To update the data, follow the steps below.
Make sure you have a working environment with R and Python 3 installed. We recommend R >= 4.0.2 and Python >= 3.7.
You can check:
$ python --version
and
$ R --version
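The two checks above can also be scripted. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes the first line of `R --version` has the usual form `R version X.Y.Z (...)`.

```python
import re
import subprocess
import sys

# Minimum versions recommended above.
MIN_PYTHON = (3, 7)
MIN_R = (4, 0, 2)


def parse_r_version(output):
    """Extract (major, minor, patch) from the text printed by `R --version`."""
    match = re.search(r"R version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("could not find an R version string")
    return tuple(int(part) for part in match.groups())


def python_ok(version_info=sys.version_info):
    """True if the given Python version meets the recommended minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def r_ok(output):
    """True if the R version found in `output` meets the recommended minimum."""
    return parse_r_version(output) >= MIN_R


def check_environment():
    """Run both checks against the local installation (requires R on PATH)."""
    r_output = subprocess.run(
        ["R", "--version"], capture_output=True, text=True, check=True
    ).stdout
    return python_ok() and r_ok(r_output)
```

Calling `check_environment()` returns `True` only when both interpreters meet the recommended versions; it raises an error if R is not on the PATH.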
In your environment (shell), run:
$ pip install -r requirements.txt
In your R console, run:
install.packages(c("data.table", "googledrive", "googlesheets4", "httr", "imputeTS", "lubridate", "pdftools", "retry",
"rjson", "rvest", "stringr", "tidyr", "rio", "plyr", "bit64"))
Note: pdftools requires poppler. On macOS, run brew install poppler.
Create a file testing_dataset_config.json with all required parameters:
{
"google_credentials_email": "[OWID_MAIL]",
"covid_time_series_gsheet": "[COVID_TS_GSHEET]",
"attempted_countries_ghseet": "[COUNTRIES_GSHEET]",
"audit_gsheet": "[AUDIT_GSHEET]",
"owid_cloud_table_post": "[OWID_TABLE_POST]"
}
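Before running the scripts, it can help to confirm that the config file contains every parameter. The sketch below is illustrative and not part of the repository: the helper names are hypothetical, the key names are copied verbatim from the template above, and treating an empty value as missing is an assumption.

```python
import json

# Key names copied verbatim from the template above.
REQUIRED_KEYS = [
    "google_credentials_email",
    "covid_time_series_gsheet",
    "attempted_countries_ghseet",
    "audit_gsheet",
    "owid_cloud_table_post",
]


def missing_config_keys(config):
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def load_config(path="testing_dataset_config.json"):
    """Load the JSON config and fail early if a parameter is missing."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = missing_config_keys(config)
    if missing:
        raise KeyError(f"missing or empty config keys: {missing}")
    return config
```

Calling `load_config()` from the directory that holds the file either returns the parsed parameters or names the ones still to be filled in.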
$ git pull
$ python3 run_python_scripts.py [option]
$ Rscript run_r_scripts.R [option]
Note: Accepted values for option are "quick" and "update". The "quick" option runs automatically twice a day, and its results are pushed to the repo by @edomt. Manual execution with the "update" option is required about twice a week (e.g. Tuesday and Friday).
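The two batch commands can be wrapped so that an invalid option is rejected before anything runs. This is a minimal sketch under stated assumptions: `build_commands` and `run_batch` are hypothetical helpers, not part of the repository, and the script names and accepted options are taken from the text above.

```python
import subprocess

# Accepted values for [option], as listed in the note above.
ACCEPTED_OPTIONS = ("quick", "update")


def build_commands(option):
    """Return the Python and R batch commands for the given option."""
    if option not in ACCEPTED_OPTIONS:
        raise ValueError(
            f"option must be one of {ACCEPTED_OPTIONS}, got {option!r}"
        )
    return [
        ["python3", "run_python_scripts.py", option],
        ["Rscript", "run_r_scripts.R", option],
    ]


def run_batch(option):
    """Run both batch scripts in order, stopping at the first failure."""
    for command in build_commands(option):
        subprocess.run(command, check=True)
```

For a manual update, `run_batch("update")` would be called from the directory that contains both scripts, after `git pull`.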
Run generate_dataset.R. Using RStudio is recommended for easier debugging.
Create your own version of test_update.sh.template, adapted to your local paths, and run it to update the COVID megafile and push the testing update to the repo.