Commit

Merge pull request #42 from nhejazi/fix_nfolds
consistent use of `nfolds` for CV
nhejazi authored Sep 22, 2024
2 parents 7a553e4 + 13ef481 commit 58822ff
Showing 21 changed files with 72 additions and 62 deletions.
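
The commit title concerns the spelling of the cross-validation fold argument. As a minimal sketch of why the spelling matters, the base R snippet below uses a hypothetical stand-in function, `cv_stub()` (not part of `glmnet`, `hal9001`, or `haldensify`), whose formal argument is `nfolds`. R matches named arguments exactly or by prefix, so `n_folds` does not match `nfolds`; in this stand-in it falls into `...` and the default is used instead. How the real packages treat an unmatched name is not shown here.

```r
# Hypothetical stand-in for a CV routine whose formal argument is `nfolds`.
cv_stub <- function(x, nfolds = 10L, ...) {
  extra <- list(...)
  list(nfolds = nfolds, unmatched = names(extra))
}

# control lists as they might be assembled before the call
ctrl_old <- list(n_folds = 3L) # misspelled: not a prefix of `nfolds`
ctrl_new <- list(nfolds = 3L)  # matches the formal argument exactly

res_old <- do.call(cv_stub, c(list(x = 1:10), ctrl_old))
res_new <- do.call(cv_stub, c(list(x = 1:10), ctrl_new))

res_old$nfolds    # the default is used; `n_folds` lands in `...`
res_old$unmatched # name of the argument that was not matched
res_new$nfolds    # the requested number of folds
```

Because nothing errors in the misspelled case, a mismatch of this kind can go unnoticed unless the effective number of folds is checked.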
4 changes: 2 additions & 2 deletions .github/workflows/draft-pdf.yml
@@ -14,15 +14,15 @@ jobs:
     name: Paper Draft
     steps:
       - name: Checkout
-        uses: actions/checkout@v2
+        uses: actions/checkout@v4
       - name: Build draft PDF
         uses: openjournals/openjournals-draft-action@master
         with:
           journal: joss
           # This should be the path to the paper within your repo.
           paper-path: paper/paper.md
       - name: Upload
-        uses: actions/upload-artifact@v1
+        uses: actions/upload-artifact@v4
         with:
           name: paper
           # This is the output path where Pandoc will write the compiled
30 changes: 15 additions & 15 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: haldensify
Title: Highly Adaptive Lasso Conditional Density Estimation
-Version: 0.2.6
+Version: 0.2.7
Authors@R: c(
person("Nima", "Hejazi", email = "nh@nimahejazi.org",
role = c("aut", "cre", "cph"),
@@ -19,19 +19,19 @@ Maintainer: Nima Hejazi <nh@nimahejazi.org>
Description: An algorithm for flexible conditional density estimation based on
application of pooled hazard regression to an artificial repeated measures
dataset constructed by discretizing the support of the outcome variable. To
-facilitate non/semi-parametric estimation of the conditional density, the
-highly adaptive lasso, a nonparametric regression function shown to
-reliably estimate a large class of functions at a fast convergence rate, is
-utilized. The pooled hazards formulation implemented was first described
-by Díaz and van der Laan (2011) <doi:10.2202/1557-4679.1356>. To complement
-the conditional density estimation utilities, nonparametric inverse
-probability weighted (IPW) estimators of the causal effects of additive
-modified treatment policies are implemented, using the conditional density
-estimation procedure to estimate the generalized propensity score. Per
-Hejazi, Benkeser, Díaz, and van der Laan <>10.48550/arXiv.2205.05777>,
-these nonparametric IPW estimators can be coupled with sieve estimation
-(undersmoothing) of the generalized propensity score estimators to attain
-the non/semi-parametric efficiency bound.
+facilitate flexible estimation of the conditional density, the highly
+adaptive lasso, a non-parametric regression function shown to estimate
+cadlag (RCLL) functions at a suitably fast convergence rate, is used. The
+use of pooled hazards regression for conditional density estimation as
+implemented here was first described by Díaz and van der Laan (2011)
+<doi:10.2202/1557-4679.1356>. Building on the conditional density estimation
+utilities, non-parametric inverse probability weighted (IPW) estimators of
+the causal effects of additive modified treatment policies are implemented,
+using conditional density estimation to estimate the generalized propensity
+score. Non-parametric IPW estimators based on this can be coupled with sieve
+estimation (undersmoothing) of the generalized propensity score to attain
+the semi-parametric efficiency bound (per Hejazi, Benkeser, Díaz, and van
+der Laan <doi:10.48550/arXiv.2205.05777>).
Depends: R (>= 3.2.0)
Imports:
stats,
@@ -60,5 +60,5 @@ URL: https://github.com/nhejazi/haldensify
BugReports: https://github.com/nhejazi/haldensify/issues
Encoding: UTF-8
VignetteBuilder: knitr
-RoxygenNote: 7.2.3
+RoxygenNote: 7.3.2
RdMacros: Rdpack
8 changes: 8 additions & 0 deletions NEWS.md
@@ -1,9 +1,17 @@
+# haldensify 0.2.7
+
+As of September 2024:
+* Continue fixing issues with incorrectly passing `n_folds` to `glmnet`, which
+has argument `nfolds` (https://github.com/nhejazi/haldensify/issues/41)
+
# haldensify 0.2.6

As of February 2024:
* Updated versions of `hal9001` and `origami` in `DESCRIPTION` to match the
latest CRAN releases, resolving bugs related to `Matrix` v1.6-2 as reported
at <https://github.com/tlverse/hal9001/issues/109>.
+* Catch and fix incorrect internal references (as `n_folds`) to `glmnet` formal
+argument `nfolds`, previously dropped by `hal9001`.

# haldensify 0.2.5

5 changes: 2 additions & 3 deletions R/confint.R
@@ -36,13 +36,12 @@
#' # fit the IPW estimator
#' est_ipw <- ipw_shift(
#' W = cbind(W1, W2, W3), A = A, Y = Y,
-#' delta = 0.5, cv_folds = 2L,
+#' delta = 0.5, cv_folds = 3L,
#' n_bins = 5L, bin_type = "equal_range",
#' lambda_seq = exp(seq(-1, -10, length = 100L)),
#' # arguments passed to hal9001::fit_hal()
-#' max_degree = 3,
+#' max_degree = 2,
#' smoothness_orders = 0,
-#' num_knots = NULL,
#' reduce_basis = 1 / sqrt(n_obs)
#' )
#' confint(est_ipw)
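
The tuning arguments in the example above can be inspected on their own in base R. This sketch only evaluates the `lambda_seq` and `reduce_basis` expressions; it assumes `n_obs <- 50` (the sample size used in the package's other simulated examples) and does not call `ipw_shift()` or `hal9001::fit_hal()`.

```r
n_obs <- 50 # assumption: matches the simulated sample size used elsewhere

# `length` partially matches the `length.out` argument of seq()
lambda_seq <- exp(seq(-1, -10, length = 100L))

length(lambda_seq)        # number of candidate regularization values
range(lambda_seq)         # smallest and largest value on the grid
all(diff(lambda_seq) < 0) # grid runs from larger to smaller values

# threshold supplied as `reduce_basis` in the example
reduce_basis <- 1 / sqrt(n_obs)
reduce_basis
```

The grid is evenly spaced on the log scale, so successive values shrink by a constant ratio rather than a constant difference.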
13 changes: 7 additions & 6 deletions R/haldensify.R
@@ -194,7 +194,7 @@ cv_haldensify <- function(fold,
#'
#' @examples
#' # simulate data: W ~ U[-4, 4] and A|W ~ N(mu = W, sd = 0.5)
-#' set.seed(429153)
+#' set.seed(11249)
#' n_train <- 50
#' w <- runif(n_train, -4, 4)
#' a <- rnorm(n_train, w, 0.5)
@@ -248,7 +248,7 @@ haldensify <- function(A, W,

# extract n_bins/grid_type index that is empirical loss minimizer
emp_risk_per_lambda <- lapply(select_out, `[[`, "emp_risks")
-min_loss_idx <- lapply(emp_risk_per_lambda, which.min)
+#min_loss_idx <- lapply(emp_risk_per_lambda, which.min)
min_risk <- lapply(emp_risk_per_lambda, min)
cv_selected_params <- tune_grid[which.min(min_risk), , drop = FALSE]
cv_selected_fits <- select_out[[which.min(min_risk)]]
@@ -270,12 +270,12 @@ haldensify <- function(A, W,
# data (bootstrap); disadvantage: non-sample-split nuisance estimates
if (!any(grepl("fit_control", names(fit_hal_args)))) {
fit_hal_args$fit_control <- list(
-cv_select = FALSE, weights = as.numeric(long_data$wts), n_folds = 1
+cv_select = FALSE, weights = as.numeric(long_data$wts), nfolds = 1L
)
} else {
fit_hal_args$fit_control$cv_select <- FALSE
fit_hal_args$fit_control$weights <- as.numeric(long_data$wts)
-fit_hal_args$fit_control$n_folds <- 1L
+fit_hal_args$fit_control$nfolds <- 1L
}
fit_hal_args$X <- as.matrix(long_data[, -c("obs_id", "in_bin", "wts")])
fit_hal_args$Y <- as.numeric(long_data$in_bin)
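
The if/else in this hunk sets the same three `fit_control` entries in both branches. A minimal sketch of a similar pattern using base R's `utils::modifyList()` follows; `fit_hal_args` and `wts` are made-up stand-ins here, and this is an illustration under those assumptions rather than the package's code.

```r
# Stand-in inputs (hypothetical values for illustration only)
wts <- c(0.5, 1, 1.5)
fit_hal_args <- list(
  max_degree = 2L,
  fit_control = list(use_min = TRUE, cv_select = TRUE)
)

# entries that are enforced regardless of what the user supplied
forced <- list(cv_select = FALSE, weights = as.numeric(wts), nfolds = 1L)

# start from the user's fit_control (or an empty list) and overwrite by name
user_ctrl <- fit_hal_args$fit_control
if (is.null(user_ctrl)) user_ctrl <- list()
fit_hal_args$fit_control <- utils::modifyList(user_ctrl, forced)

str(fit_hal_args$fit_control)
```

User-supplied entries that are not overridden (here `use_min`) are preserved, which mirrors what the `else` branch does.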
@@ -301,7 +301,7 @@ haldensify <- function(A, W,

###############################################################################

-#' Fit Conditional Density Estimation for a Sequence of HAL Models
+#' Fit Conditional Density Estimation over a Sequence of HAL Models
#'
#' @details Estimation of the conditional density of A|W via a cross-validated
#' highly adaptive lasso, used to estimate the conditional hazard of failure
@@ -350,6 +350,7 @@ haldensify <- function(A, W,
#'
#' @examples
#' # simulate data: W ~ U[-4, 4] and A|W ~ N(mu = W, sd = 0.5)
+#' set.seed(11249)
#' n_train <- 50
#' w <- runif(n_train, -4, 4)
#' a <- rnorm(n_train, w, 0.5)
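
This hunk adds a `set.seed()` call to the example's simulation. As a small sketch of what that buys, the snippet below wraps the same simulation in a hypothetical helper, `simulate_aw()` (not part of the package), and checks that two runs with the same seed produce identical data.

```r
# Hypothetical helper wrapping the simulation used in the example:
# W ~ U[-4, 4] and A|W ~ N(mu = W, sd = 0.5)
simulate_aw <- function(seed, n_train = 50) {
  set.seed(seed)
  w <- runif(n_train, -4, 4)
  a <- rnorm(n_train, w, 0.5)
  data.frame(w = w, a = a)
}

run1 <- simulate_aw(11249)
run2 <- simulate_aw(11249)

# same seed, same draws
identical(run1, run2)
```

Without a fixed seed, each run of the example would fit the estimator to different simulated data.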
@@ -368,7 +369,7 @@ fit_haldensify <- function(A, W,
smoothness_orders = 0L,
...) {
# capture dot arguments for reference
-dot_args <- list(...)
+#dot_args <- list(...)

# re-format input data into long hazards structure
reformatted_output <- format_long_hazards(
14 changes: 7 additions & 7 deletions R/ipw_shift.R
@@ -47,6 +47,7 @@ utils::globalVariables(c("lambda_idx", "se_est", "l1_norm", "type"))
#'
#' @examples
#' # simulate data
+#' set.seed(11249)
#' n_obs <- 50
#' W1 <- rbinom(n_obs, 1, 0.6)
#' W2 <- rbinom(n_obs, 1, 0.2)
@@ -57,13 +58,12 @@ utils::globalVariables(c("lambda_idx", "se_est", "l1_norm", "type"))
#' # fit the IPW estimator
#' est_ipw <- ipw_shift(
#' W = cbind(W1, W2, W3), A = A, Y = Y,
-#' delta = 0.5, cv_folds = 2L,
-#' n_bins = 5L, bin_type = "equal_range",
+#' delta = 0.5, cv_folds = 3L,
+#' n_bins = 4L, bin_type = "equal_range",
#' lambda_seq = exp(seq(-1, -10, length = 100L)),
#' # arguments passed to hal9001::fit_hal()
-#' max_degree = 3,
+#' max_degree = 1L,
#' smoothness_orders = 0,
-#' num_knots = NULL,
#' reduce_basis = 1 / sqrt(n_obs)
#' )
ipw_shift <- function(W, A, Y,
@@ -124,13 +124,13 @@ ipw_shift <- function(W, A, Y,
# fit outcome mechanism Qn via CV-HAL
Qn_fit <- hal9001::fit_hal(
X = cbind(A, W), Y = Y,
-max_degree = 5L,
-smoothness_orders = 1L,
+max_degree = 3L,
+smoothness_orders = 0L,
reduce_basis = 1 / sqrt(n_obs),
family = outcome_family,
fit_control = list(
cv_select = TRUE,
-n_folds = cv_folds
+nfolds = cv_folds
),
yolo = FALSE
)
2 changes: 1 addition & 1 deletion R/plots.R
@@ -31,7 +31,7 @@ utils::globalVariables(c("lambda", "risk"))
#' A = a, W = w, n_bins = 3,
#' lambda_seq = exp(seq(-1, -10, length = 50)),
#' # the following arguments are passed to hal9001::fit_hal()
-#' max_degree = 3, reduce_basis = 0.1
+#' max_degree = 2L, smoothness_orders = 0L, reduce_basis = 0.1
#' )
#' plot(haldensify_fit)
plot.haldensify <- function(x, ..., type = c("risk", "density")) {
2 changes: 1 addition & 1 deletion R/predict.R
@@ -58,7 +58,7 @@ utils::globalVariables(c("wts"))
#' haldensify_fit <- haldensify(
#' A = a, W = w, n_bins = 10L, lambda_seq = exp(seq(-1, -10, length = 100)),
#' # the following arguments are passed to hal9001::fit_hal()
-#' max_degree = 3, reduce_basis = 1 / sqrt(length(a))
+#' max_degree = 2, smoothness_orders = 0L, reduce_basis = 1 / sqrt(length(a))
#' )
#' # predictions to recover conditional density of A|W
#' new_a <- seq(-4, 4, by = 0.1)
7 changes: 4 additions & 3 deletions R/utils.R
@@ -270,7 +270,7 @@ make_bins <- function(grid_var,
#'
#' @examples
#' # simulate data: W ~ U[-4, 4] and A|W ~ N(mu = W, sd = 0.5)
-#' set.seed(429153)
+#' set.seed(11249)
#' n_train <- 50
#' w <- runif(n_train, -4, 4)
#' a <- rnorm(n_train, w, 0.5)
@@ -279,7 +279,7 @@ make_bins <- function(grid_var,
#' haldensify_fit <- haldensify(
#' A = a, W = w, n_bins = c(3, 5),
#' lambda_seq = exp(seq(-1, -15, length = 50L)),
-#' max_degree = 3, reduce_basis = 0.1
+#' max_degree = 2, smoothness_orders = 0, reduce_basis = 0.1
#' )
#' print(haldensify_fit)
print.haldensify <- function(x, ...) {
@@ -326,6 +326,7 @@ print.haldensify <- function(x, ...) {
#'
#' @examples
#' # simulate data
+#' set.seed(11249)
#' n_obs <- 50
#' W1 <- rbinom(n_obs, 1, 0.6)
#' W2 <- rbinom(n_obs, 1, 0.2)
@@ -335,7 +336,7 @@ print.haldensify <- function(x, ...) {
#' # fit the IPW estimator
#' est_ipw_shift <- ipw_shift(
#' W = cbind(W1, W2), A = A, Y = Y,
-#' delta = 0.5, n_bins = 3L, cv_folds = 2L,
+#' delta = 0.5, n_bins = 3L, cv_folds = 3L,
#' lambda_seq = exp(seq(-1, -10, length = 100L)),
#' # arguments passed to hal9001::fit_hal()
#' max_degree = 1,
5 changes: 2 additions & 3 deletions man/confint.ipw_haldensify.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 3 additions & 2 deletions man/fit_haldensify.Rd

2 changes: 1 addition & 1 deletion man/haldensify.Rd

8 changes: 4 additions & 4 deletions man/ipw_shift.Rd

2 changes: 1 addition & 1 deletion man/plot.haldensify.Rd

2 changes: 1 addition & 1 deletion man/predict.haldensify.Rd

4 changes: 2 additions & 2 deletions man/print.haldensify.Rd

3 changes: 2 additions & 1 deletion man/print.ipw_haldensify.Rd

4 changes: 2 additions & 2 deletions tests/testthat/test-density_standard.R
@@ -46,8 +46,8 @@ haldensify_fit_cntrl <- haldensify(
A = a, W = w,
n_bins = c(3, 5),
lambda_seq = exp(seq(-1, -13, length = n_lambda)),
-max_degree = 2,
-fit_control = list(cv_select = TRUE, n_folds = 3L, use_min = TRUE)
+max_degree = 1L,
+fit_control = list(cv_select = TRUE, nfolds = 3L, use_min = TRUE)
)
cv_lambda_idx <- haldensify_fit_cntrl$cv_tuning_results$lambda_loss_min_idx
