The zorgmijding package

This package takes calculates and plots GP contact rate for cardiovascular complaints before and during the COVID-19 pandemic. It was developed specifically for the ZonMw ‘Zorgmijding’ project and works ONLY with data exported form GP registration databases managed by five University Medical Centres in the Netherlands (EMC Rotterdam, UMC Utrecht, UMC Amsterdam, UMC Maastricht and UMC Groningen).

Using the package

There are four different functions in the package, covering the entire pipeline from cleaning the raw data to plotting the final results. Those function are:

clean_data takes raw data in csv format and cleans it for further analysis
denominators calculates the population size (ie. the number of patients in the database)
n_visits calculates the consultation rates in the pre-pandemic period and in 2020
plot_visits plots the consultation rate

The functions have to be run in this specific order, as the output in one is used as an input to the subsequent function. However, it is entirely possible (and recommended) to re-use the output produced by the functions and all functions (except plot_visit) store their output on your computer. For example, you can run the n_visit using the output form the clean and denominators that is stored on your computer.

Installation

Install the package you first need to make sure the ‘devtools’ package is installed. Then you can download and install the zorgmijding package.

if (!require(devtools)) install.packages("devtools")
devtools::install_github("frenkxs/zorgmijding")

If - for any reasons - this fails, install the package using a local file. In this case, you have to save the ‘tar.gz’ file in your computer and then specify a path to the folder in which you saved it:

devtools::install_local("/path/to/folder/zorgmijding_0.0.0.9000.tar.gz", dependencies = TRUE)

After installing, you load it into R:

library(zorgmijding)

Cleaning

The clean_data function takes raw data and format and cleans it for further analysis This is the fist function in this pipeline, it should be run first. It makes sure the variables are consistently named, are in the same order and have the right format. It also check there are no missing or nonsensical data.

After running the function, you will be asked to provide two raw data files in the csv format: the one with GP contacts and one with all patients. The cleaned data are automatically saved in the ‘results’ folder created as a sub-folder in the folder in which the raw data are located.

The function has two arguments:

umc: the UMC that provided the data; it can take the following values: “utrecht”, “maastricht”, “amsterdam”, “groningen”, “rotterdam”.
clean_types: whether the list of eligible contact types should be used when cleaning the data. It only makes sense to use for data that does not come from Rotterdam, so when it’s set to TRUE and the umc is set to ‘rotterdam’, the it changes automatically to FALSE with a warning message printed out. The default value is FALSE.

clean_data(umc = "rotterdam")

Calculating sample size

The denominators function takes the cleaned patient data - saved by the clean_data function - to calculate the population size for each period - month, week and day - from 2016 to 2020. It does it for all patients, but also by age groups, by sex and by age and sex. The results are automatically saved in the results’ folder created as a subfolder in the folder in which the raw data are located.

After runinng the function, you will be asked to provide the cleaned patients data (in .RData format).

The function has no arguments.

NOTE: Running the function may take a while (up to several hours), depending on the size of your database.

denominators()

Counting GP contacts

The n_visits function takes the cleaned visit data (output of the clean_data function) and the denominators data (output of the denominators function) and counts the number of GP contacts per period (month, week, day) and per 100,000 patients. The output is stored as separate data frames for daily, weekly and monthly contact rates, stratified by sex, age, sex and age and as a total number. You can also specify whether you want to see all data, or only data for patients 40 and older. In total 18 different data frames are saved.

The function has two arguments:

averages: an indicator whether a pre-pandemic averages should be computed. If set to TRUE (the default), then the output is a dataset in which there are two numbers for each period (month, week, day) in a year: one is the pre-pandemic average (2017-2019) weighted by the population size, and the other is the observed values in 2020. If set to FALSE, then data for the entire period (2016-2020) is saved. If set to FALSE, the function plot_visits will not work.
remove_counts: if set to TRUE (the default), the resulting dataframes will not contain the absolute counts; only the rates will be returned.
filename: a string with the name of the file in which the results will be stored and saved (e.g “contact_rates”). The default value is “results”.

n_visits(averages = TRUE, remove_counts = TRUE, filename = "contact_rates")
n_visits(averages = FALSE, remove_counts = TRUE, filename = "contact_rates_long")

The function returns the path to the RData file with the resulting data frames. You can therefore use it as an input to the plotting function (see below). (If the path is not defined, the function will ask the user to select the relevant file.)

path_to_res <- n_visits(averages = TRUE)

plot_visits("Weekly rate of GP contacts for cardiovascular complaints by sex",
  stratum = "sex", periodicity = "w", segment = "full",
  show_40plus = TRUE, 
  path = path_to_res)

Plotting

The function plot_visits plots the visit counts data, the output of the n_visit function. It only works when the the pre-pandemic averages are computed, ie. when the argument “averages” in the n_visits function was set to “TRUE”

There are several arguments that need to be specified:

title: title of the plot
stratum: The stratification variable, possible values are ‘sex’, ‘age’, ‘sex_age’, ‘total’ (the default). Plot by sex will be shown on one panel, plots by age will be faceted across multiple panels
periodicity: The period of counts, can either be daily, weekly or monthly. The values are ‘d’: day, ‘w’: week or ‘m’: month (the default). If plotting daily rate, it is highly recommended to restrict the time period to be plotted (see segment)
segment: the ime period to be plotted. By default, the data are plotted for the entire year, with the pre-pandemic averages plotted against the 2020 data. If plotting daily data, it is recommended to use shorter period. The period is specified in a vector with two dates: the start and end date of the period to be plotted. (e.g. segment = c(“2020-02-15”, “2020-04-31”))
show_40plus: whether you want to plot data for the entire population or only for patients of 40 years and older
path: path to the data to be plotted. Optional, if not provided you will be prompted to select the data via GUI after running the function.

Here’s a series of plots to make to visually inspect the data. The plots will be automatically saved to the ‘results’ folder (the same folder to which the data are saved)

# Replace with your region (e.g. "Maastricht", "Utrecht", ...)
region <- "Amsterdam"
plus40 <- TRUE

path_data <- n_visits(averages = TRUE)

# save the image to the same folder which stores the data
folder <- paste0(substr(path_data, 1, nchar(path_data) - 13))


# Daily by sex
plot_visits("Daily rate of GP contacts for cardiovascular complaints",
            stratum = "sex", periodicity = "d", segment = c("2020-02-20", "2020-04-30"),
            show_40plus = plus40,
            path = path_data) +
  labs(subtitle = region)

ggsave(paste0(folder, "daily_sex.png"), width = 2000, height = 1500, units = "px")

# Daily by sex and age
plot_visits("Daily rate of GP contacts for cardiovascular complaints",
            stratum = "sex_age", periodicity = "d", segment = c("2020-02-20", "2020-04-30"),
            show_40plus = plus40,
            path = path) +
  labs(subtitle = region)

ggsave(paste0(folder, "daily_sex_age.png"), width = 2000, height = 1500, units = "px")

# Daily total
plot_visits("Daily rate of GP contacts for cardiovascular complaints",
            stratum = "total", periodicity = "d", segment = c("2020-02-20", "2020-04-30"),
            show_40plus = plus40,
            path = path) +
  labs(subtitle = region)

ggsave(paste0(folder, "daily_total.png"), width = 2000, height = 1500, units = "px")

# -----------------------------------

# Weekly by sex
plot_visits("Daily rate of GP contacts for cardiovascular complaints",
            stratum = "sex", periodicity = "w",
            show_40plus = plus40,
            path = path) +
  labs(subtitle = region)

ggsave(paste0(folder, "weekly_sex.png"), width = 2000, height = 1500, units = "px")

# Weekly by sex and age
plot_visits("Daily rate of GP contacts for cardiovascular complaints",
            stratum = "sex_age", periodicity = "w",
            show_40plus = plus40,
            path = path) +
  labs(subtitle = region)

ggsave(paste0(folder, "weekly_age_sex.png"), width = 2000, height = 1500, units = "px")

# Weekly total
plot_visits("Daily rate of GP contacts for cardiovascular complaints",
            stratum = "total", periodicity = "w",
            show_40plus = plus40,
            path = path) +
  labs(subtitle = region)

ggsave(paste0(folder, "weekly_total.png"), width = 2000, height = 1500, units = "px")

# -----------------------------------

# Monthly by sex
plot_visits("Daily rate of GP contacts for cardiovascular complaints",
            stratum = "sex", periodicity = "m",
            show_40plus = plus40,
            path = path) +
  labs(subtitle = region)

ggsave(paste0(folder, "monthly_sex.png"), width = 2000, height = 1500, units = "px")


# # Monthly by sex and age
plot_visits("Daily rate of GP contacts for cardiovascular complaints",
            stratum = "sex_age", periodicity = "m",
            show_40plus = plus40,
            path = path) +
  labs(subtitle = region)

ggsave(paste0(folder, "monthly_age_sex.png"), width = 2000, height = 1500, units = "px")


# # Monthly total
plot_visits("Daily rate of GP contacts for cardiovascular complaints",
            stratum = "total", periodicity = "m",
            show_40plus = plus40,
            path = path) +
  labs(subtitle = region)

ggsave(paste0(folder, "monthly_total.png"), width = 2000, height = 1500, units = "px")

Name		Name	Last commit message	Last commit date
Latest commit History 35 Commits
R		R
data-raw		data-raw
data		data
man		man
tests		tests
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md
zorgmijding.Rproj		zorgmijding.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

The zorgmijding package

Using the package

Installation

Cleaning

Calculating sample size

Counting GP contacts

Plotting

About

Licenses found

Releases

Packages

Languages

License

Licenses found

frenkxs/zorgmijding

Folders and files

Latest commit

History

Repository files navigation

The zorgmijding package

Using the package

Installation

Cleaning

Calculating sample size

Counting GP contacts

Plotting

About

Resources

License

Licenses found

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages