This package calculates and plots GP contact rates for cardiovascular complaints before and during the COVID-19 pandemic. It was developed specifically for the ZonMw ‘Zorgmijding’ project and works ONLY with data exported from GP registration databases managed by five University Medical Centres in the Netherlands (EMC Rotterdam, UMC Utrecht, UMC Amsterdam, UMC Maastricht and UMC Groningen).
There are four functions in the package, covering the entire pipeline from cleaning the raw data to plotting the final results. Those functions are:
- clean_data: takes raw data in csv format and cleans it for further analysis
- denominators: calculates the population size (i.e. the number of patients in the database)
- n_visits: calculates the consultation rates in the pre-pandemic period and in 2020
- plot_visits: plots the consultation rates
The functions have to be run in this specific order, as the output of
one is used as the input to the next. However, it is entirely possible
(and recommended) to re-use output produced earlier: all functions
except plot_visits store their output on your computer. For example,
you can run n_visits using the output from clean_data and denominators
that is already stored on your computer.
To install the package, first make sure the ‘devtools’ package is installed. Then you can download and install the zorgmijding package.
if (!require(devtools)) install.packages("devtools")
devtools::install_github("frenkxs/zorgmijding")
If, for any reason, this fails, install the package from a local file. In this case, save the ‘tar.gz’ file on your computer and then specify the path to the saved file:
devtools::install_local("/path/to/folder/zorgmijding_0.0.0.9000.tar.gz", dependencies = TRUE)
After installing, you load it into R:
library(zorgmijding)
The clean_data function takes raw data in csv format and cleans it for
further analysis. This is the first function in the pipeline, so it
should be run first. It makes sure the variables are consistently named,
are in the same order and have the right format. It also checks that
there are no missing or nonsensical data.
After running the function, you will be asked to provide two raw data files in csv format: one with GP contacts and one with all patients. The cleaned data are automatically saved in the ‘results’ folder, created as a sub-folder of the folder in which the raw data are located.
The function has two arguments:
- umc: the UMC that provided the data; it can take the following values: “utrecht”, “maastricht”, “amsterdam”, “groningen”, “rotterdam”.
- clean_types: whether the list of eligible contact types should be used when cleaning the data. It only makes sense for data that do not come from Rotterdam, so when it is set to TRUE and umc is set to ‘rotterdam’, it is automatically changed to FALSE and a warning message is printed. The default value is FALSE.
clean_data(umc = "rotterdam")
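The interaction between the two arguments can be sketched in a few lines of base R. This is only an illustration of the behaviour described above, not the package's actual code; the helper name resolve_clean_args is hypothetical.

```r
# Hypothetical helper, NOT part of zorgmijding: it only mirrors the documented
# behaviour of the `umc` and `clean_types` arguments of clean_data().
resolve_clean_args <- function(umc, clean_types = FALSE) {
  valid_umcs <- c("utrecht", "maastricht", "amsterdam", "groningen", "rotterdam")
  # match.arg() stops with an error if `umc` is not one of the valid values
  umc <- match.arg(tolower(umc), valid_umcs)
  if (clean_types && umc == "rotterdam") {
    warning("clean_types is not used for Rotterdam data; setting it to FALSE.")
    clean_types <- FALSE
  }
  list(umc = umc, clean_types = clean_types)
}

# clean_types is kept for Utrecht ...
resolve_clean_args("utrecht", clean_types = TRUE)
# ... and switched off (with a warning) for Rotterdam
resolve_clean_args("rotterdam", clean_types = TRUE)
```

In practice you do not need such a helper; clean_data performs its own checks.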
The denominators function takes the cleaned patient data, saved by the
clean_data function, and calculates the population size for each period
(month, week and day) from 2016 to 2020. It does so for all patients,
and also by age group, by sex, and by age and sex. The results are
automatically saved in the ‘results’ folder, created as a sub-folder of
the folder in which the raw data are located.
After running the function, you will be asked to provide the cleaned patient data (in .RData format).
The function has no arguments.
NOTE: Running the function may take a while (up to several hours), depending on the size of your database.
denominators()
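To give an idea of what a population size per period means, here is a minimal base R sketch that counts registered patients per month. It assumes each patient has a registration start and end date (the column names start and end are invented) and that a patient counts towards a month if registered on at least one day of it. The package's own definition may differ.

```r
# Toy patient table; the column names are assumptions for this illustration.
patients <- data.frame(
  id    = 1:4,
  start = as.Date(c("2015-06-01", "2016-01-20", "2016-02-15", "2016-03-05")),
  end   = as.Date(c("2020-12-31", "2016-02-10", "2020-12-31", "2016-03-20"))
)

# First and last day of each month in the period of interest
month_start <- seq(as.Date("2016-01-01"), as.Date("2016-03-01"), by = "month")
month_end   <- seq(as.Date("2016-02-01"), as.Date("2016-04-01"), by = "month") - 1

# A patient is counted if the registration interval overlaps the month
population <- vapply(seq_along(month_start), function(i) {
  sum(patients$start <= month_end[i] & patients$end >= month_start[i])
}, integer(1))

data.frame(month = format(month_start, "%Y-%m"), population = population)
```

Repeating such a count for every day, week and month over five years, and for every age and sex group, is what makes the real function slow on large databases.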
The n_visits function takes the cleaned visit data (output of the
clean_data function) and the denominators data (output of the
denominators function), counts the number of GP contacts per period
(month, week, day) and expresses it per 100,000 patients. The output is
stored as separate data frames for daily, weekly and monthly contact
rates, stratified by sex, by age, by sex and age, and as a total. You
can also specify whether you want to see all data, or only data for
patients 40 and older. In total 18 different data frames are saved.
The function has three arguments:
- averages: whether pre-pandemic averages should be computed. If set to TRUE (the default), the output is a dataset with two numbers for each period (month, week, day) in a year: the pre-pandemic average (2017-2019), weighted by the population size, and the observed value in 2020. If set to FALSE, data for the entire period (2016-2020) are saved, and the function plot_visits will not work.
- remove_counts: if set to TRUE (the default), the resulting data frames will not contain the absolute counts; only the rates will be returned.
- filename: a string with the name of the file in which the results will be saved (e.g. “contact_rates”). The default value is “results”.
n_visits(averages = TRUE, remove_counts = TRUE, filename = "contact_rates")
n_visits(averages = FALSE, remove_counts = TRUE, filename = "contact_rates_long")
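As a worked example of the two quantities involved, the sketch below computes a rate per 100,000 patients and a population-weighted pre-pandemic average for one period. The numbers are made up, and reading "weighted by the population size" as a weighted mean of the yearly rates is an assumption, not a description of the package internals.

```r
# Made-up contact counts and population sizes for the same week in four years
counts <- data.frame(
  year       = c(2017, 2018, 2019, 2020),
  contacts   = c(120, 150, 130, 90),
  population = c(10000, 12000, 11000, 11500)
)

# Contact rate per 100,000 patients
counts$rate <- counts$contacts / counts$population * 1e5

# Pre-pandemic average (2017-2019), weighting each year by its population
pre <- counts[counts$year %in% 2017:2019, ]
pre_avg <- weighted.mean(pre$rate, w = pre$population)

# Observed rate in 2020, to be compared with the pre-pandemic average
rate_2020 <- counts$rate[counts$year == 2020]

c(pre_pandemic_average = pre_avg, observed_2020 = rate_2020)
```

With these weights the average equals the pooled rate, i.e. all 2017-2019 contacts divided by all 2017-2019 patients, times 100,000.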
The function returns the path to the RData file with the resulting data frames. You can therefore use it as an input to the plotting function (see below). (If the path is not defined, the function will ask the user to select the relevant file.)
path_to_res <- n_visits(averages = TRUE)
plot_visits("Weekly rate of GP contacts for cardiovascular complaints by sex",
stratum = "sex", periodicity = "w", segment = "full",
show_40plus = TRUE,
path = path_to_res)
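If you want to look at the saved data frames yourself, you can load the RData file into a separate environment so that nothing in your workspace is overwritten. The sketch below uses a temporary stand-in file with invented object names, because the names of the data frames written by n_visits are not listed here.

```r
# Stand-in for the file written by n_visits(); the object names are invented.
path_to_res <- tempfile(fileext = ".RData")
weekly_total  <- data.frame(week = 1:3, rate_2020 = c(800, 760, 510))
monthly_total <- data.frame(month = 1:2, rate_2020 = c(790, 640))
save(weekly_total, monthly_total, file = path_to_res)

# Load into a fresh environment instead of the global workspace
results <- new.env()
loaded <- load(path_to_res, envir = results)

# load() returns the names of the objects it restored
print(loaded)

# Inspect one of the data frames
str(get("weekly_total", envir = results))
```

With a real results file, replace the first four lines with the path returned by n_visits.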
The plot_visits function plots the visit data, i.e. the output of the
n_visits function. It only works when the pre-pandemic averages have
been computed, i.e. when the argument “averages” in the n_visits
function was set to TRUE.
There are several arguments that need to be specified:
- title: title of the plot
- stratum: The stratification variable, possible values are ‘sex’, ‘age’, ‘sex_age’, ‘total’ (the default). Plot by sex will be shown on one panel, plots by age will be faceted across multiple panels
- periodicity: The period of counts, can either be daily, weekly or monthly. The values are ‘d’: day, ‘w’: week or ‘m’: month (the default). If plotting daily rate, it is highly recommended to restrict the time period to be plotted (see segment)
- segment: the time period to be plotted. By default, the data are plotted for the entire year, with the pre-pandemic averages plotted against the 2020 data. If plotting daily data, it is recommended to use a shorter period. The period is specified as a vector with two dates: the start and end date of the period to be plotted (e.g. segment = c(“2020-02-15”, “2020-04-30”)).
- show_40plus: whether you want to plot data for the entire population or only for patients of 40 years and older
- path: path to the data to be plotted. Optional, if not provided you will be prompted to select the data via GUI after running the function.
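Because segment takes dates as strings, it is easy to pass a date that does not exist (for example 31 April). A small check like the one below can catch this before plotting. The helper check_segment is hypothetical and not part of the package; it only applies to two-date vectors, not to the default full-year setting.

```r
# Hypothetical helper, NOT part of zorgmijding: validates a two-date segment.
check_segment <- function(segment) {
  # With an explicit format, as.Date() returns NA for impossible dates
  dates <- as.Date(segment, format = "%Y-%m-%d")
  if (length(dates) != 2 || anyNA(dates)) {
    stop("segment must contain two valid dates in YYYY-MM-DD format")
  }
  if (dates[1] > dates[2]) {
    stop("the start date must not be later than the end date")
  }
  dates
}

# A valid segment is returned as a Date vector
check_segment(c("2020-02-15", "2020-04-30"))

# An impossible date (April has 30 days) is rejected
tryCatch(
  check_segment(c("2020-02-15", "2020-04-31")),
  error = function(e) message("Invalid segment: ", conditionMessage(e))
)
```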
Here is a series of plots for visually inspecting the data. The code below saves each plot to the ‘results’ folder (the same folder in which the data are saved).
library(ggplot2) # provides labs() and ggsave() used below
# Replace with your region (e.g. "Maastricht", "Utrecht", ...)
region <- "Amsterdam"
plus40 <- TRUE
path_data <- n_visits(averages = TRUE)
# save the images to the same folder which stores the data
folder <- paste0(dirname(path_data), "/")
# Daily by sex
plot_visits("Daily rate of GP contacts for cardiovascular complaints",
stratum = "sex", periodicity = "d", segment = c("2020-02-20", "2020-04-30"),
show_40plus = plus40,
path = path_data) +
labs(subtitle = region)
ggsave(paste0(folder, "daily_sex.png"), width = 2000, height = 1500, units = "px")
# Daily by sex and age
plot_visits("Daily rate of GP contacts for cardiovascular complaints",
stratum = "sex_age", periodicity = "d", segment = c("2020-02-20", "2020-04-30"),
show_40plus = plus40,
path = path_data) +
labs(subtitle = region)
ggsave(paste0(folder, "daily_sex_age.png"), width = 2000, height = 1500, units = "px")
# Daily total
plot_visits("Daily rate of GP contacts for cardiovascular complaints",
stratum = "total", periodicity = "d", segment = c("2020-02-20", "2020-04-30"),
show_40plus = plus40,
path = path_data) +
labs(subtitle = region)
ggsave(paste0(folder, "daily_total.png"), width = 2000, height = 1500, units = "px")
# -----------------------------------
# Weekly by sex
plot_visits("Weekly rate of GP contacts for cardiovascular complaints",
stratum = "sex", periodicity = "w",
show_40plus = plus40,
path = path_data) +
labs(subtitle = region)
ggsave(paste0(folder, "weekly_sex.png"), width = 2000, height = 1500, units = "px")
# Weekly by sex and age
plot_visits("Weekly rate of GP contacts for cardiovascular complaints",
stratum = "sex_age", periodicity = "w",
show_40plus = plus40,
path = path_data) +
labs(subtitle = region)
ggsave(paste0(folder, "weekly_age_sex.png"), width = 2000, height = 1500, units = "px")
# Weekly total
plot_visits("Weekly rate of GP contacts for cardiovascular complaints",
stratum = "total", periodicity = "w",
show_40plus = plus40,
path = path_data) +
labs(subtitle = region)
ggsave(paste0(folder, "weekly_total.png"), width = 2000, height = 1500, units = "px")
# -----------------------------------
# Monthly by sex
plot_visits("Monthly rate of GP contacts for cardiovascular complaints",
stratum = "sex", periodicity = "m",
show_40plus = plus40,
path = path_data) +
labs(subtitle = region)
ggsave(paste0(folder, "monthly_sex.png"), width = 2000, height = 1500, units = "px")
# Monthly by sex and age
plot_visits("Monthly rate of GP contacts for cardiovascular complaints",
stratum = "sex_age", periodicity = "m",
show_40plus = plus40,
path = path_data) +
labs(subtitle = region)
ggsave(paste0(folder, "monthly_age_sex.png"), width = 2000, height = 1500, units = "px")
# Monthly total
plot_visits("Monthly rate of GP contacts for cardiovascular complaints",
stratum = "total", periodicity = "m",
show_40plus = plus40,
path = path_data) +
labs(subtitle = region)
ggsave(paste0(folder, "monthly_total.png"), width = 2000, height = 1500, units = "px")
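The nine calls above differ only in stratum, periodicity, title and file name, so they could also be generated from a small table. The sketch below is one possible way to do that, not the package's recommended workflow. The function make_all_plots is hypothetical, the file names follow a uniform period_stratum pattern (slightly different from some of the names used above), and the function is only defined here, not run, because it needs the package and a results file.

```r
# All combinations of stratum and periodicity (3 x 3 = 9 plots)
combos <- expand.grid(
  stratum     = c("sex", "sex_age", "total"),
  periodicity = c("d", "w", "m"),
  stringsAsFactors = FALSE
)

# Human-readable period names for file names and titles
period_file  <- c(d = "daily", w = "weekly", m = "monthly")
period_title <- c(d = "Daily", w = "Weekly", m = "Monthly")
combos$file  <- paste0(period_file[combos$periodicity], "_", combos$stratum, ".png")
combos$title <- paste(period_title[combos$periodicity],
                      "rate of GP contacts for cardiovascular complaints")

# Hypothetical wrapper: loops over the table and saves one image per row.
make_all_plots <- function(path_data, region, plus40 = TRUE) {
  folder <- dirname(path_data)
  for (i in seq_len(nrow(combos))) {
    args <- list(combos$title[i],
                 stratum = combos$stratum[i],
                 periodicity = combos$periodicity[i],
                 show_40plus = plus40,
                 path = path_data)
    # Daily plots are restricted to a shorter period, as recommended above
    if (combos$periodicity[i] == "d") {
      args$segment <- c("2020-02-20", "2020-04-30")
    }
    p <- do.call(zorgmijding::plot_visits, args) + ggplot2::labs(subtitle = region)
    ggplot2::ggsave(file.path(folder, combos$file[i]), plot = p,
                    width = 2000, height = 1500, units = "px")
  }
  invisible(combos$file)
}

combos[, c("stratum", "periodicity", "file")]
```

With the package installed, a call such as make_all_plots(path_data, region = "Amsterdam") would then replace the repeated blocks.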