Wikistats-to-CSV (wikistats2csv) downloads Wikipedia statistics for a given Wikipedia in CSV format.

SaiedAlshahrani/Wikistats-to-CSV

Wikistats-to-CSV (wikistats2csv)

Wikistats-to-CSV (wikistats2csv) is a Python package (installable with pip) and command-line interface (CLI) that downloads Wikipedia statistics for a given Wikipedia in CSV format from the Wikimedia Statistics project.

Install:

Wikistats-to-CSV (wikistats2csv) requires Python >=3 and a few Python packages: lxml==4.9.1, rich==12.5.1, numpy==1.23.2, pandas==1.4.3, selenium==3.141.0, and geckodriver-autoinstaller==0.1.0. For convenience, these packages are installed as part of the Wikistats-to-CSV (wikistats2csv) setup process. If you encounter installation errors, you may need to install them manually with pip:

python3 -m pip install -r requirements.txt
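
If installation errors persist, it can help to see which of the pinned packages are missing or at a different version. The following is a minimal sketch, assuming Python 3.8 or newer (for `importlib.metadata`); the helper name `check_pins` is hypothetical and not part of wikistats2csv.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the requirements listed above.
PINNED = {
    "lxml": "4.9.1",
    "rich": "12.5.1",
    "numpy": "1.23.2",
    "pandas": "1.4.3",
    "selenium": "3.141.0",
    "geckodriver-autoinstaller": "0.1.0",
}


def check_pins(pins=PINNED):
    """Return {package: (wanted, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_pins().items():
        print(f"{name}: wanted {wanted}, found {installed or 'not installed'}")
```

An empty result means every pinned package is installed at the listed version.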

To install Wikistats-to-CSV (wikistats2csv) with pip, we highly recommend upgrading pip to the latest version first.

python3 -m pip install --upgrade pip
python3 -m pip install wikistats2csv

If you see the warning "WARNING: the script is installed in '/Users/.../.../bin' which is not on path", add the displayed path "/Users/.../.../bin" to the $PATH variable with this command:

export PATH="/Users/.../.../bin:$PATH"

Usage:

* As CLI:

>> Long Flags:

$ wikistats2csv --wiki en --metric content --query pages-to-date --period all-years --filter page-type-all --interval monthly

              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║WIKISTATS-TO-CSV║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌

## Downloaded `english--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)

** Quick glance at `english--pages-to-date--page-type-all--all-years--monthly.csv` file:
                        month  total.non-content  total.content           timeRange.start             timeRange.end
0    2001-01-01T00:00:00.000Z                 28             37  2001-01-01T00:00:00.000Z  2001-02-01T00:00:00.000Z
1    2001-02-01T00:00:00.000Z                 51            175  2001-02-01T00:00:00.000Z  2001-03-01T00:00:00.000Z
..                        ...                ...            ...                       ...                       ...
257  2022-06-01T00:00:00.000Z           36945305        6518484  2022-06-01T00:00:00.000Z  2022-07-01T00:00:00.000Z
258  2022-07-01T00:00:00.000Z           37088260        6534151  2022-07-01T00:00:00.000Z  2022-08-01T00:00:00.000Z
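
To script several downloads, the long-flag command above can be assembled from Python. This is a minimal sketch: `build_command` is a hypothetical helper, it uses only the flags shown in this README, and it prints the commands rather than running them.

```python
import shlex


def build_command(wiki, metric, query, period, filter_, interval):
    """Assemble the long-flag argument list for one wikistats2csv call."""
    return [
        "wikistats2csv",
        "--wiki", wiki,
        "--metric", metric,
        "--query", query,
        "--period", period,
        "--filter", filter_,
        "--interval", interval,
    ]


if __name__ == "__main__":
    for wiki in ("en", "ar", "es"):
        cmd = build_command(wiki, "content", "pages-to-date",
                            "all-years", "page-type-all", "monthly")
        # Print only; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
        print(shlex.join(cmd))
```

Keeping the arguments as a list avoids shell quoting issues if the command is later handed to `subprocess.run`.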

>> Short Flags:

$ wikistats2csv -w ar -m content -q pages-to-date -p all-years -f page-type-all -i monthly

              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║WIKISTATS-TO-CSV║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌

## Downloaded `arabic--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)

** Quick glance at `arabic--pages-to-date--page-type-all--all-years--monthly.csv` file:
                        month  total.non-content  total.content           timeRange.start             timeRange.end
0    2001-01-01T00:00:00.000Z                  0            591  2001-01-01T00:00:00.000Z  2001-02-01T00:00:00.000Z
1    2001-02-01T00:00:00.000Z                  0            591  2001-02-01T00:00:00.000Z  2001-03-01T00:00:00.000Z
..                        ...                ...            ...                       ...                       ...
257  2022-06-01T00:00:00.000Z            5508072        1173410  2022-06-01T00:00:00.000Z  2022-07-01T00:00:00.000Z
258  2022-07-01T00:00:00.000Z            5538121        1180401  2022-07-01T00:00:00.000Z  2022-08-01T00:00:00.000Z 
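
The filenames in the sample runs appear to follow the pattern `<language>--<query>--<filter>--<period>--<interval>.csv`. The sketch below reproduces that pattern so a script can predict where a download will land. The pattern is inferred only from the sample output in this README, `expected_csv_name` is a hypothetical helper, and the language mapping covers only the codes whose filenames are shown here.

```python
# Language names as they appear in the sample filenames in this README.
LANGUAGE_NAMES = {"en": "english", "ar": "arabic", "es": "spanish"}


def expected_csv_name(wiki, query, filter_, period, interval):
    """Guess the output filename from the pattern seen in the sample runs."""
    language = LANGUAGE_NAMES[wiki]  # raises KeyError for codes not listed above
    return "--".join([language, query, filter_, period, interval]) + ".csv"


print(expected_csv_name("en", "pages-to-date", "page-type-all", "all-years", "monthly"))
# prints: english--pages-to-date--page-type-all--all-years--monthly.csv
```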

* As Python Package:

>>> from wikistats2csv import Content
>>> Content.pages_to_date(wiki='es', period='all-years', filter='page-type-all', interval='monthly')

## Downloaded `spanish--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)

** Quick glance at `spanish--pages-to-date--page-type-all--all-years--monthly.csv` file:
                        month  total.non-content  total.content           timeRange.start             timeRange.end
0    2001-01-01T00:00:00.000Z                  0              0  2001-01-01T00:00:00.000Z  2001-02-01T00:00:00.000Z
1    2001-02-01T00:00:00.000Z                  0              0  2001-02-01T00:00:00.000Z  2001-03-01T00:00:00.000Z
..                        ...                ...            ...                       ...                       ...
257  2022-06-01T00:00:00.000Z            3896209        1786321  2022-06-01T00:00:00.000Z  2022-07-01T00:00:00.000Z
258  2022-07-01T00:00:00.000Z            3903963        1792329  2022-07-01T00:00:00.000Z  2022-08-01T00:00:00.000Z 
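
Once a file is downloaded, it can be read back for analysis. The sketch below parses the columns shown in the quick glance using only the standard library. It assumes the CSV has a header row with those column names, no index column, and integer counts; the actual file layout may differ. The sample rows are the last two Spanish rows from the output above, and `load_pages_to_date` is a hypothetical helper.

```python
import csv
import io
from datetime import datetime

# Two rows copied from the Spanish sample output above, laid out as the
# CSV is assumed to be stored (header row, no index column).
SAMPLE = """month,total.non-content,total.content,timeRange.start,timeRange.end
2022-06-01T00:00:00.000Z,3896209,1786321,2022-06-01T00:00:00.000Z,2022-07-01T00:00:00.000Z
2022-07-01T00:00:00.000Z,3903963,1792329,2022-07-01T00:00:00.000Z,2022-08-01T00:00:00.000Z
"""


def load_pages_to_date(handle):
    """Parse rows into dicts with a datetime month and integer counts."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "month": datetime.strptime(record["month"], "%Y-%m-%dT%H:%M:%S.%fZ"),
            "non_content": int(record["total.non-content"]),
            "content": int(record["total.content"]),
        })
    return rows


rows = load_pages_to_date(io.StringIO(SAMPLE))
for row in rows:
    # Total pages = content pages + non-content pages for that month.
    print(row["month"].date(), row["non_content"] + row["content"])
# prints:
# 2022-06-01 5682530
# 2022-07-01 5696292
```

To read a real download instead of the sample, open the file with `open(filename, newline="")` and pass the handle to `load_pages_to_date`.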

Supported Features:

Content Metrics/Class:

| Queries* / Functions** | Periods | Filters*** | Intervals |
|---|---|---|---|
| absolute-bytes-difference* / absolute_bytes_difference** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| edited-pages* / edited_pages** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all, activity-level-1-to-4-edits, activity-level-5-to-24-edits, activity-level-25-to-99-edits, activity-level-100-or-more-edits, activity-level-all | daily, monthly |
| net-bytes-difference* / net_bytes_difference** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| pages-to-date* / pages_to_date** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| total-media-requests* / total_media_requests** | all-years, one-year, two-years, three-months, one-month | no-filter, media-type-image, media-type-video, media-type-audio, media-type-document, media-type-other, media-type-all, agent-type-user, agent-type-spider, agent-type-all | daily, monthly |
| top-media-requests* / top_media_requests** | last-month | no-filter, media-type-image, media-type-video, media-type-audio, media-type-document, media-type-other, media-type-all | daily, monthly |

* CLI queries.        ** Python functions.        *** More complex filters are planned for future versions.
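
Because each query accepts only certain periods and filters, a script can check a combination before calling the tool. The sketch below encodes the Content metrics listed above as plain Python sets. The names `CONTENT_QUERIES` and `is_supported` are hypothetical; wikistats2csv may perform its own validation differently.

```python
STANDARD_PERIODS = {"all-years", "one-year", "two-years", "three-months", "one-month"}
INTERVALS = {"daily", "monthly"}

PAGE = {"page-type-content", "page-type-non-content", "page-type-all"}
EDITOR = {"editor-type-user", "editor-type-name-bot", "editor-type-anonymous",
          "editor-type-group-bot", "editor-type-all"}
ACTIVITY = {"activity-level-1-to-4-edits", "activity-level-5-to-24-edits",
            "activity-level-25-to-99-edits", "activity-level-100-or-more-edits",
            "activity-level-all"}
MEDIA = {"media-type-image", "media-type-video", "media-type-audio",
         "media-type-document", "media-type-other", "media-type-all"}
AGENT = {"agent-type-user", "agent-type-spider", "agent-type-all"}
NONE = {"no-filter"}

# CLI query -> (allowed periods, allowed filters), transcribed from the Content list.
CONTENT_QUERIES = {
    "absolute-bytes-difference": (STANDARD_PERIODS, NONE | PAGE | EDITOR),
    "edited-pages": (STANDARD_PERIODS, NONE | PAGE | EDITOR | ACTIVITY),
    "net-bytes-difference": (STANDARD_PERIODS, NONE | PAGE | EDITOR),
    "pages-to-date": (STANDARD_PERIODS, NONE | PAGE | EDITOR),
    "total-media-requests": (STANDARD_PERIODS, NONE | MEDIA | AGENT),
    "top-media-requests": ({"last-month"}, NONE | MEDIA),
}


def is_supported(query, period, filter_, interval):
    """Return True if the combination appears in the Content list above."""
    if query not in CONTENT_QUERIES:
        return False
    periods, filters = CONTENT_QUERIES[query]
    return period in periods and filter_ in filters and interval in INTERVALS


print(is_supported("pages-to-date", "all-years", "page-type-all", "monthly"))  # True
print(is_supported("top-media-requests", "all-years", "no-filter", "monthly"))  # False
```

The same structure extends to the Contributing and Reading metrics by adding their queries to a second and third dictionary.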

Contributing Metrics/Class:

| Queries* / Functions** | Periods | Filters*** | Intervals |
|---|---|---|---|
| editors* / editors** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all, activity-level-1-to-4-edits, activity-level-5-to-24-edits, activity-level-25-to-99-edits, activity-level-100-or-more-edits, activity-level-all | daily, monthly |
| active-editors* / active_editors** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all | daily, monthly |
| edits* / edits** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| user-edits* / user_edits** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all | daily, monthly |
| new-pages* / new_pages** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| new-registered-users* / new_registered_users** | all-years, one-year, two-years, three-months, one-month | no-filter | daily, monthly |
| top-editors* / top_editors** | last-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| top-edited-pages* / top_edited_pages** | last-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| active-editors-by-country* / active_editors_by_country** | last-month | activity-level-5-to-99-edits, activity-level-100-or-more-edits | daily, monthly |

* CLI queries.        ** Python functions.        *** More complex filters are planned for future versions.

Reading Metrics/Class:

| Queries* / Functions** | Periods | Filters*** | Intervals |
|---|---|---|---|
| total-page-views* / total_page_views** | all-years, one-year, two-years, three-months, one-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all, agent-type-user, agent-type-spider, agent-type-automated, agent-type-all | daily, monthly |
| legacy-page-views* / legacy_page_views** | all-years, one-year, two-years, three-months, one-month | no-filter, access-site-mobile-site, access-site-desktop-site, access-site-all | daily, monthly |
| page-views-by-country* / page_views_by_country** | last-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all | daily, monthly |
| unique-devices* / unique_devices** | all-years, one-year, two-years, three-months, one-month | no-filter, access-site-mobile-site, access-site-desktop-site, access-site-all | daily, monthly |
| top-viewed-articles* / top_viewed_articles** | last-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all | daily, monthly |

* CLI queries.        ** Python functions.        *** More complex filters are planned for future versions.

Extra Features:

List All Wikipedia Languages with their Codes:

* As CLI:

To list all supported Wikipedia languages with their codes, run one of these commands:

$ wikistats2csv -lw
# OR
$ wikistats2csv --list-wikis

* As Python Package:

from wikistats2csv import Helper
Helper.get_Wikis_Codes()
