doubleml / doubleml-for-r

DoubleML - Double Machine Learning in R

Home Page: https://docs.doubleml.org

License: Other

Language: R (100.00%)

Topics: machine-learning, r, mlr3, causal-inference, statistics, econometrics, data-science, double-machine-learning

doubleml-for-r's Introduction

DoubleML - Double Machine Learning in R


The R package DoubleML provides an implementation of the double / debiased machine learning framework of Chernozhukov et al. (2018). It is built on top of mlr3 and the mlr3 ecosystem (Lang et al., 2019).

Note that the R package was developed together with a Python twin based on scikit-learn. The Python package is also available on GitHub and PyPI.

Documentation and maintenance

Documentation of functions in R: https://docs.doubleml.org/r/stable/reference/index.html

User guide: https://docs.doubleml.org

DoubleML is currently maintained by @PhilippBach and @SvenKlaassen.

Main Features

Double / debiased machine learning framework of Chernozhukov et al. (2018) for

  • Partially linear regression models (PLR)
  • Partially linear IV regression models (PLIV)
  • Interactive regression models (IRM)
  • Interactive IV regression models (IIVM)

The object-oriented implementation of DoubleML, which is based on the R6 package for R, is very flexible. The model classes DoubleMLPLR, DoubleMLPLIV, DoubleMLIRM and DoubleMLIIVM implement the estimation of the nuisance functions via machine learning methods and the computation of the Neyman orthogonal score function. All other functionality is implemented in the abstract base class DoubleML, in particular the functionality to estimate double machine learning models and to perform statistical inference via the methods fit, bootstrap, confint, p_adjust and tune (a minimal usage sketch is given after the lists below). This object-oriented implementation allows a high degree of flexibility in the model specification in terms of …

  • … the machine learning methods for estimation of the nuisance functions,
  • … the resampling schemes,
  • … the double machine learning algorithm,
  • … the Neyman orthogonal score functions.

It can further be readily extended with regard to

  • … new model classes that come with Neyman orthogonal score functions being linear in the target parameter,
  • … alternative score functions via callables,
  • … alternative resampling schemes.
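
A minimal usage sketch (assuming the simulated data generator and the ranger learner from the mlr3 ecosystem):

library(DoubleML)
library(mlr3)
library(mlr3learners)

set.seed(1111)
# simulated data of Chernozhukov et al. (2018) with true effect alpha = 0.5
dml_data = make_plr_CCDDHNR2018(alpha = 0.5, n_obs = 500)
ml_g = lrn("regr.ranger", num.trees = 100)
ml_m = lrn("regr.ranger", num.trees = 100)
dml_plr = DoubleMLPLR$new(dml_data, ml_g = ml_g, ml_m = ml_m, n_folds = 5)
dml_plr$fit()                  # cross-fitted estimation of the causal parameter
dml_plr$summary()              # coefficient, standard error, t- and p-value
dml_plr$bootstrap()            # multiplier bootstrap
dml_plr$confint(joint = TRUE)  # (joint) confidence interval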

OOP structure of the DoubleML package (figure)

Installation

Install the latest release from CRAN:

install.packages("DoubleML")

Install the development version from GitHub:

remotes::install_github("DoubleML/doubleml-for-r")

DoubleML requires

  • R (>= 3.5.0)
  • R6 (>= 2.4.1)
  • data.table (>= 1.12.8)
  • stats
  • checkmate
  • mlr3 (>= 0.5.0)
  • mlr3tuning (>= 0.3.0)
  • mlr3learners (>= 0.3.0)
  • mvtnorm
  • utils
  • clusterGeneration
  • readstata13

Contributing

DoubleML is a community effort. Everyone is welcome to contribute. To get started for your first contribution we recommend reading our contributing guidelines and our code of conduct.

Citation

If you use the DoubleML package, a citation is highly appreciated:

Bach, P., Chernozhukov, V., Kurz, M. S., Spindler, M. and Klaassen, S. (2024), DoubleML - An Object-Oriented Implementation of Double Machine Learning in R, Journal of Statistical Software, 108(3): 1-56, doi:10.18637/jss.v108.i03.

Bibtex-entry:

@article{DoubleML2024,
      title={{DoubleML} -- {A}n Object-Oriented Implementation of Double Machine Learning in {R}},
      author={P. Bach and V. Chernozhukov and M. S. Kurz and M. Spindler and S. Klaassen},
      journal={Journal of Statistical Software},
      year={2024},
      volume={108},
      number={3},
      pages={1--56},
      doi={10.18637/jss.v108.i03},
      note={arXiv:\href{https://arxiv.org/abs/2103.09603}{2103.09603} [stat.ML]}
}

Acknowledgements

Funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) is acknowledged – Project Number 431701914.

References

  • Bach, P., Chernozhukov, V., Kurz, M. S., Spindler, M. and Klaassen, S. (2024), DoubleML - An Object-Oriented Implementation of Double Machine Learning in R, Journal of Statistical Software, 108(3): 1-56, doi:10.18637/jss.v108.i03, arXiv:2103.09603.

  • Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. and Robins, J. (2018), Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21: C1-C68, https://doi.org/10.1111/ectj.12097.

  • Lang, M., Binder, M., Richter, J., Schratz, P., Pfisterer, F., Coors, S., Au, Q., Casalicchio, G., Kotthoff, L., Bischl, B. (2019), mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software, 4(44): 1903, https://doi.org/10.21105/joss.01903.

doubleml-for-r's People

Contributors: maltekurz, michaelquinn32, mllg, philippbach


doubleml-for-r's Issues

Patrick 0.1.0 will have backwards incompatible changes

Hi Double ML team!

Thanks for using patrick for creating parameterized tests.

I am going to start the process of releasing a backwards incompatible change in the package.

  • In the past, the undocumented test_name parameter could be used in test case data frames and as an argument for naming tests.
  • I am moving this to a documented argument in with_parameters_test_that(). The argument is also being renamed to .test_name in order to distinguish it from test cases passed by a user.

In version 0.1.0, patrick will throw a warning about this change and rename the input as appropriate. In a future version, this warning will be dropped. Addressing it requires changing your use of test_name to .test_name.

Apologies for any inconvenience that this causes. Please let me know how else I can help.

Best wishes,
Michael

[Bug]: Tuning fails with a non-meaningful error message when `tune_settings$measure` is a proper subset of the nuisance parts

Describe the bug

The tune method allows the user to specify a nuisance-specific measure. If the provided list contains a name that is not a nuisance part, a meaningful error message is produced, i.e., tune_settings[['measure']] = list(ml_m = "regr.mae", ml_wrong_name = "regr.rmse") results in something like:

Error in private$assert_tune_settings(tune_settings) : 
  Invalid name of measure ml_m, ml_r 
 measure must be a named list with elements named ml_g, ml_m 

However, if the list of measures is a proper subset of the nuisance parts (e.g., tune_settings[['measure']] = list(ml_m = "regr.mae")), it fails with a non-meaningful error message:

Error in default_measures(task_type)[[1L]] : subscript out of bounds 

Minimum reproducible code snippet

library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
set.seed(2)
ml_g = lrn("regr.ranger", num.trees = 10, max.depth = 2)
ml_m = ml_g$clone()
obj_dml_data = make_plr_CCDDHNR2018(alpha = 0.5)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_g, ml_m)
par_grids = list("ml_g" = paradox::ParamSet$new(list(
    paradox::ParamInt$new("num.trees", lower = 5, upper = 6, default = 5))),
    "ml_m" = paradox::ParamSet$new(list(
    paradox::ParamInt$new("num.trees", lower = 5, upper = 6, default = 5))))
default_tune_settings = list(
    n_folds_tune = 5,
    rsmp_tune = mlr3::rsmp("cv", folds = 5),
    measure = NULL,
    terminator = mlr3tuning::trm("evals", n_evals = 20),
    algorithm = mlr3tuning::tnr("grid_search"),
    resolution = 5)

tune_settings = default_tune_settings
tune_settings[['measure']] = list(ml_m = "regr.mae")

dml_plr_obj$tune(param_set=par_grids, tune_settings = tune_settings)

Expected Result

I would expect the ML methods to be tuned successfully: for every nuisance part where a measure was actively set, I expect it to be used, and for every other nuisance part I would expect a fallback to the default measure. I expect this behavior because otherwise it wouldn't make sense to check for a subset here

doubleml-for-r/R/double_ml.R

Lines 1316 to 1323 in acb9d46

if (!test_names(names(tune_settings$measure),
  subset.of = valid_learner)) {
  stop(paste(
    "Invalid name of measure", paste0(names(tune_settings$measure),
      collapse = ", "),
    "\n measure must be a named list with elements named",
    paste0(valid_learner, collapse = ", ")))
}

Alternative expected behavior: as an alternative, we could enforce that either a measure is set for every nuisance part or for none (resulting in default measures being used for every nuisance part). If we go for this alternative, we should check for exactly matching list keys instead of a subset, which would then produce a meaningful error message. However, I prefer the solution described above, where we fall back to the default measure for every nuisance part where no measure was actively set; implementing this selective fallback would be easy, as the sketch below suggests.
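
A hypothetical sketch of the selective fallback (the helper and argument names are assumptions, not the package's actual internals):

# keep user-specified measures and fill the remaining nuisance parts with
# mlr3's default measure for the respective task type
fill_default_measures = function(measure, valid_learner, task_type) {
  if (is.null(measure)) measure = list()
  for (nuisance in valid_learner) {
    if (is.null(measure[[nuisance]])) {
      measure[[nuisance]] = mlr3::default_measures(task_type[[nuisance]])[[1L]]
    }
  }
  return(measure)
}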

Actual Result

Error in default_measures(task_type)[[1L]] : subscript out of bounds 

Versions

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 21.10

> packageVersion('DoubleML')
[1] ‘0.4.1’
> packageVersion('mlr3')
[1] ‘0.11.0.9000’

Reduce unit test times

Some of our unit tests simply take too long: with ON_CRAN='false' the suite takes around 30 minutes on GitHub Actions. We should find a better parametrization while keeping a similar level of coverage.

Make DoubleML available for R (≥ 4.0.2)

Describe the feature you want to propose or implement

I cannot install DoubleML in the configurations below.

SessionInfo (Microsoft R Open 4.0.2)

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.3 (Ootpa)

Matrix products: default
BLAS:   /opt/microsoft/ropen/4.0.2/lib64/R/lib/libRblas.so
LAPACK: /opt/microsoft/ropen/4.0.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RevoUtils_11.0.2     RevoUtilsMath_11.0.0

loaded via a namespace (and not attached):
[1] compiler_4.0.2 parallel_4.0.2 tools_4.0.2

Commands Ran and outputs

> install.packages("DoubleML")
Installing package into ‘<XXX>/app/R40_Library’
(as ‘lib’ is unspecified)
Warning in install.packages :
  package ‘DoubleML’ is not available (for R version 4.0.2)

Propose a possible solution or implementation

No response

Did you consider alternatives to the proposed solution. If yes, please describe

No response

Comments, context or references

No response

Inconsistent initialization of `task_type` between PLIV and other models (PLR, IRM, IIVM)

PLR, IRM and IIVM

For PLR, IRM and IIVM, we first initialize the private property task_type to NULL for all necessary nuisance parts; during the call to assert_learner it is then filled with content. See

private$task_type = list(
  "ml_g" = NULL,
  "ml_m" = NULL)
ml_g = private$assert_learner(ml_g, "ml_g", Regr = TRUE, Classif = FALSE)
ml_m = private$assert_learner(ml_m, "ml_m", Regr = TRUE, Classif = TRUE)

private$task_type = list(
  "ml_g" = NULL,
  "ml_m" = NULL)
ml_g = private$assert_learner(ml_g, "ml_g", Regr = TRUE, Classif = TRUE)
ml_m = private$assert_learner(ml_m, "ml_m", Regr = FALSE, Classif = TRUE)

private$task_type = list(
  "ml_g" = NULL,
  "ml_m" = NULL,
  "ml_r" = NULL)
ml_g = private$assert_learner(ml_g, "ml_g", Regr = TRUE, Classif = TRUE)
ml_m = private$assert_learner(ml_m, "ml_m", Regr = FALSE, Classif = TRUE)
ml_r = private$assert_learner(ml_r, "ml_r", Regr = FALSE, Classif = TRUE)

PLIV

For PLIV this initialization does not happen, but it still seems to work as expected; see

ml_g = private$assert_learner(ml_g, "ml_g", Regr = TRUE, Classif = FALSE)
ml_m = private$assert_learner(ml_m, "ml_m", Regr = TRUE, Classif = FALSE)
ml_r = private$assert_learner(ml_r, "ml_r", Regr = TRUE, Classif = FALSE)

Possible solution

In the base class DoubleML, the private property task_type is initialized to an empty list, which in my view suffices:

task_type = list(),

It is then filled with meaningful content when assert_learner is called for the learners assigned to the different nuisance parts. Therefore, I guess we could simplify by removing the additional nuisance-part-specific initialization to NULL done for PLR, IRM and IIVM.

Miscellaneous

I would furthermore suggest adding some sort of assertion to the helper function dml_cv_predict: basically, I wouldn't accept anything other than "regr" or "classif". The code will fail anyway for any other choice, like NULL, because then the variable resp_name would never be assigned; see the snippet below and the assertion sketch after it:

doubleml-for-r/R/helper.R

Lines 162 to 166 in acb9d46

if (task_type == "regr") {
  resp_name = "response"
} else if (task_type == "classif") {
  resp_name = "prob.1"
}
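
A sketch of the suggested assertion (exact placement at the top of dml_cv_predict() is an assumption), using checkmate, which is already a dependency:

# fail early with a clear message for anything other than "regr" / "classif"
checkmate::assert_choice(task_type, choices = c("regr", "classif"))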

Logarithmic spacing in grid search during tuning

When tuning the LASSO, I didn't find a way to specify the grid with logarithmic spacing, even though that seems natural to me; the default is equal spacing. (A possible workaround is sketched after the snippet.)

library(DoubleML)
library(mlr3)
library(paradox)
library(mlr3tuning)
library(dplyr)  # for %>% and arrange() used at the end of the snippet

# set logger to omit messages during tuning and fitting
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")

set.seed(3141)
n_obs = 500
n_vars = 100
theta = rep(3, 3)
# generate matrix-like objects and use the corresponding wrapper
X = matrix(stats::rnorm(n_obs * n_vars), nrow = n_obs, ncol = n_vars)
y = X[, 1:3, drop = FALSE] %*% theta  + stats::rnorm(n_obs)
df = data.frame(y, X)

doubleml_data = double_ml_data_from_data_frame(df,
                                               y_col = "y",
                                               d_cols = c("X1"),
                                               x_cols = c("X2","X3"))

set.seed(1234)
ml_g = lrn("regr.glmnet")
ml_m = lrn("regr.glmnet")
doubleml_plr = DoubleMLPLR$new(doubleml_data, ml_g, ml_m)

par_grids = list(
  "ml_g" = ParamSet$new(list(
    ParamDbl$new("lambda", lower = 0.0001, upper = 10))),  # I WANT LOGARITHMIC SPACING HERE, eg. 1e-5, 1e-4, 1e-3, etc
  "ml_m" =  ParamSet$new(list(
    ParamDbl$new("lambda", lower = 0.05, upper = 0.1))))

tune_settings = list(terminator = trm("evals", n_evals = 100),
                     algorithm = tnr("grid_search", resolution = 11),
                     rsmp_tune = rsmp("cv", folds = 5),
                     measure = list("ml_g" = msr("regr.mse"),
                                    "ml_m" = msr("regr.mse")))

doubleml_plr$tune(param_set = par_grids, tune_settings = tune_settings)

doubleml_plr$tuning_res

# BUT THE SPACING ON THE GRID IS LINEAR
doubleml_plr$tuning_res$X1$ml_g[[1]]$tuning_result[[1]]$tuning_archive %>% arrange(lambda)
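
A possible workaround (not from the original post): paradox parameter sets support a trafo, so one can tune lambda on an equally spaced grid in log(lambda) and exponentiate before the value reaches the learner. A sketch for ml_g:

ps_log = ParamSet$new(list(
  ParamDbl$new("lambda", lower = log(1e-5), upper = log(10))))
ps_log$trafo = function(x, param_set) {
  x$lambda = exp(x$lambda)  # grid search is linear in log(lambda)
  return(x)
}
par_grids[["ml_g"]] = ps_log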


Support for Categorical D in PLIV

I'm having some trouble with PLIV in R. It doesn't appear to support binary treatments, as it doesn't let you use a classifier for ml_r. Am I doing something wrong here? Thanks

Missing exception handling for infinite / missing predictions

There is no exception handling in place in case a learner produces infinite or missing predictions. Basically, the estimates silently become NAs without a warning or exception.

See for example:

library(DoubleML)

g = function(x) {
  res = sin(x)^2
  return(res)
}

m = function(x, nu = 0, gamma = 1) {
  xx = sinh(gamma) / (cosh(gamma) - cos(x - nu))
  res = 0.5 / pi * xx
  return(res)
}

dgp1_irmiv = function(theta, N, k) {
  
  b = 1 / (1:k)
  sigma = clusterGeneration::genPositiveDefMat(k, "unifcorrmat")$Sigma
  
  X = mvtnorm::rmvnorm(N, sigma = sigma)
  G = g(as.vector(X %*% b))
  M = m(as.vector(X %*% b))
  
  pr_z = 1 / (1 + exp(-(1) * X[, 1] * b[5] + X[, 2] * b[2] + rnorm(N)))
  z = rbinom(N, 1, pr_z)
  
  U = rnorm(N)
  pr = 1 / (1 + exp(-(1) * (0.5 * z + X[, 1] * (-0.5) + X[, 2] * 0.25 - 0.5 * U + rnorm(N))))
  d = rbinom(N, 1, pr)
  err = rnorm(N)
  
  y = theta * d + G + 4 * U + err
  
  data = data.frame(y, d, z, X)
  
  return(data)
}

set.seed(1282)
df = dgp1_irmiv(0.5, 1000, 20)
Xnames = names(df)[names(df) %in% c("y", "d", "z") == FALSE]
dml_data = double_ml_data_from_data_frame(df,
                                          y_col = "y",
                                          d_cols = "d", x_cols = Xnames, z_col = "z")

ml_g = mlr3::lrn("regr.rpart", cp = 0.01, minsplit = 20)
ml_m = mlr3::lrn("classif.rpart", cp = 0.01, minsplit = 20)
ml_r = mlr3::lrn("classif.rpart", cp = 0.01, minsplit = 20)

set.seed(3141)
double_mliivm_obj = DoubleMLIIVM$new(
  data = dml_data,
  n_folds = 5,
  ml_g = ml_g,
  ml_m = ml_m,
  ml_r = ml_r,
  dml_procedure = "dml2",
  trimming_threshold = 0,
  score = "LATE")
double_mliivm_obj$fit()
print(double_mliivm_obj$coef)
print(double_mliivm_obj$se)

It gets even more confusing if one then calls the method bootstrap(), which results in the exception

double_mliivm_obj$bootstrap()
Error in double_mliivm_obj$bootstrap(): Apply fit() before bootstrap().

which obviously does not point to the root cause; the advice to apply fit() will obviously not fix the issue either.

I propose to implement a check for finite predictions similar to the check in the Python package: https://github.com/DoubleML/doubleml-for-py/blob/b3cbdb572fce435c18ec67ca323645900fc901b5/doubleml/_utils.py#L204-L208

Score function with weighting

The PDF documentation (link) suggests that a user can specify the "score" parameter (at least for the PLM estimator) in the call to DoubleMLPLR$new(), "for example, to adjust the DML estimators in terms of a re-weighting".

This is exactly my situation. How can I pass the weights?

The template for the score function in the example doesn't have a weight parameter. I want to do something like the following; the only change I made to the code is adding the Ws (a closure-based approach is sketched after the snippet):

# Here:
# y: dependent variable
# d: treatment variable
# g_hat: predicted values from regression of Y on X's
# m_hat: predicted values from regression of D on X's
# smpls: sample split under consideration, can be ignored in this example

score_manual = function(y, d, g_hat, m_hat, smpls) {
  resid_y = y - g_hat
  resid_d = d - m_hat
  psi_a = -1 * resid_d * resid_d * W   # HERE
  psi_b = resid_d * resid_y * W      # and HERE
  psis = list(psi_a = psi_a, psi_b = psi_b)
  return(psis)
}
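
One possible approach (an assumption on my side, not a documented API): since the score callable's signature is fixed, capture the weight vector via a closure:

make_weighted_score = function(W) {
  # W is captured from the enclosing environment; length must equal nrow(data)
  function(y, d, g_hat, m_hat, smpls) {
    resid_y = y - g_hat
    resid_d = d - m_hat
    psi_a = -1 * resid_d * resid_d * W
    psi_b = resid_d * resid_y * W
    return(list(psi_a = psi_a, psi_b = psi_b))
  }
}
# dml_plr = DoubleMLPLR$new(data, ml_g, ml_m, score = make_weighted_score(W))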

R CMD check Note about doi

Found the following URLs which should use \doi (with the DOI name only):
  File ‘fetch_401k.Rd’:
    https://doi.org/10.1111/ectj.12097
  File ‘fetch_bonus.Rd’:
    https://doi.org/10.1111/ectj.12097
  File ‘make_iivm_data.Rd’:
    http://dx.doi.org/10.2139/ssrn.3619201
  File ‘make_plr_CCDDHNR2018.Rd’:
    https://doi.org/10.1111/ectj.12097

Failing unit test on CRAN solaris

See https://www.r-project.org/nosvn/R.check/r-patched-solaris-x86/DoubleML-00check.html

checking tests ... [139s/138s] ERROR
  Running ‘testthat_regression_tests.R’ [138s/138s]
Running the tests in ‘tests/testthat_regression_tests.R’ failed.
Complete output:
  >
  > library("testthat")
  > library("patrick")
  > library("DoubleML")
  >
  > testthat::test_check("DoubleML")
  ── ERROR (test-double_ml_iivm.R:33:3): Unit tests for IIVM: cv_glmnet_dml2_LATE_
  Error: 'NA' indices are not (yet?) supported for sparse Matrices
  Backtrace:
       █
    1. ├─rlang::eval_tidy(code, args)
    2. └─DoubleML:::dml_irmiv(...) test-double_ml_iivm.R:33:2
    3. └─mlr3::resample(task_p, ml_p, resampling_p, store_models = TRUE) helper-11-dml_irmiv.R:91:2
    4. └─future.apply::future_lapply(...)
    5. └─future.apply:::future_xapply(...)
    6. ├─future::value(fs)
    7. └─future:::value.list(fs)
    8. ├─future::resolve(...)
    9. └─future:::resolve.list(...)
   10. └─future:::signalConditionsASAP(obj, resignal = FALSE, pos = ii)
   11. └─future:::signalConditions(...)
  
  ── ERROR (test-double_ml_irm.R:33:3): Unit tests for IRM: cv_glmnet_dml2_ATE_1_0
  Error: missing value where TRUE/FALSE needed
  Backtrace:
       █
    1. ├─rlang::eval_tidy(code, args)
    2. └─DoubleML:::dml_irm(...) test-double_ml_irm.R:33:2
    3. └─mlr3::resample(task_m, ml_m, resampling_m, store_models = TRUE) helper-10-dml_irm.R:69:2
    4. └─future.apply::future_lapply(...)
    5. └─future.apply:::future_xapply(...)
    6. ├─future::value(fs)
    7. └─future:::value.list(fs)
    8. ├─future::resolve(...)
    9. └─future:::resolve.list(...)
   10. └─future:::signalConditionsASAP(obj, resignal = FALSE, pos = ii)
   11. └─future:::signalConditions(...)

Calculating RMSE using D and Y nuisance model residuals

Does the DoubleML package have an option to output the residuals of the nuisance models, for example when computing the RMSE for predicting D and Y, in order to compare different methods for estimating them? Maybe there is an existing code example somewhere that I couldn't find.

Thank you
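
One possible approach (a sketch, not an official feature): fit with store_predictions = TRUE and compute the residuals from the exported nuisance predictions, e.g. for a PLR model (object names assumed):

dml_plr$fit(store_predictions = TRUE)
g_hat = dml_plr$predictions$ml_g[, 1, 1]  # E[Y|X] predictions, first repetition/treatment
m_hat = dml_plr$predictions$ml_m[, 1, 1]  # E[D|X] predictions
rmse_y = sqrt(mean((dml_data$data$y - g_hat)^2))
rmse_d = sqrt(mean((dml_data$data$d - m_hat)^2))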

Support for ensembles of multiple learners for ml_g and ml_m

Thanks for developing this great package.

I was wondering whether you support estimating E[Y|X] or E[D|X] with super learners, i.e., using multiple learners to cross-fit E[Y|X] and E[D|X] as below, where the weight of each learner is estimated based on its cross-fitting performance. I was also wondering how the double ML framework could work together with the SuperLearner package. (A possible route is sketched below.)

learner = lrns(c("regr.glm","regr.gam","regr.bart"), k=2)
ml_g = learner$clone()

Many thanks!!!
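
One possible route (a sketch under the assumption that mlr3pipelines is acceptable as an alternative to the SuperLearner package): build a stacking pipeline and wrap it into a GraphLearner, which can then be passed as ml_g / ml_m:

library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

# base learners are cross-validated and combined by a linear super learner
base_learners = lrns(c("regr.glmnet", "regr.ranger"))
stack = ppl("stacking", base_learners = base_learners,
            super_learner = lrn("regr.lm"), folds = 3)
ml_g = GraphLearner$new(stack)
ml_m = ml_g$clone(deep = TRUE)
# dml_plr = DoubleMLPLR$new(dml_data, ml_g = ml_g, ml_m = ml_m)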

[Bug]: Tuning with default `tune_settings` fails

Describe the bug

Tuning with the default tune_settings fails due to a typo in

terminator = mlr3tunin::trm("evals", n_evals = 20),

Minimum reproducible code snippet

library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
set.seed(2)
ml_g = lrn("regr.ranger", num.trees = 10, max.depth = 2)
ml_m = ml_g$clone()
obj_dml_data = make_plr_CCDDHNR2018(alpha = 0.5)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_g, ml_m)
par_grids = list("ml_g" = paradox::ParamSet$new(list(
    paradox::ParamInt$new("num.trees", lower = 1, upper = 10, default = 5))),
    "ml_m" = paradox::ParamSet$new(list(
    paradox::ParamInt$new("num.trees", lower = 1, upper = 10, default = 5))))

dml_plr_obj$tune(param_set=par_grids)

Expected Result

No exception

Actual Result

Exception

 Error in loadNamespace(name) : there is no package called ‘mlr3tunin’ 

Versions

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 21.10

> packageVersion('DoubleML')
[1] ‘0.4.1’
> packageVersion('mlr3')
[1] ‘0.11.0.9000’

Minor inconsistency between user guide notation and the code?

I have a question about a potential inconsistency between the notation in the user guide and the code. If it is not an inconsistency, then it must reflect my own misunderstanding of the notation in the user guide (if so, my apologies in advance).

Looking at the documentation for estimating the variance of the estimator, I would describe the expression as J_{0}^{-2} multiplied by the mean of \psi^2, where the latter term is represented by the double sum over folds and observations within each fold. The N^{-1} here serves to compute the mean over this double sum.

However, in the code here, the quantity above is premultiplied by an additional N^{-1} term.

I suspect the code is correct, so this seems more like an issue with the notation in the documentation. I looked at Theorem 3.2 in the published paper but had trouble identifying where the extra N^{-1} term comes from.

Is this a notation problem or am I missing something?

Thanks,
Brett

different learners for different treatments in Simultaneous Inference

Hi,
I have an idea for extending the package's simultaneous inference functionality.

When the natures of the treatments differ (continuous vs. binary), it is not possible to run, for example, the function DoubleMLPLR, because there is only one choice of the argument ml_m to estimate the related nuisance function. To elaborate, consider two treatments d1 and d2 which are continuous and binary, respectively.
To estimate the nuisance function for causal inference on d1, we must apply a machine learning method for the Gaussian family, while for causal inference on d2 we must apply a machine learning method for logistic regression. Thus, users must define a continuous version of d2 or convert d1 to a binary treatment in order to have treatments of the same nature.

However, in some cases the program automatically detects the nature of the treatments (for example the regr.gbm learner from the package gbm).

If the argument ml_m could be a list of the same length as d_cols, we could run DoubleMLPLR in situations with treatments of different natures.

Thanks for your hot pkg!

Dimensions of properties like the estimated coefficients in docu (R)

This issue might be relevant for R as well:

DoubleML/doubleml-for-py#85

We should ...

  • ... briefly explain the dimensions of all_coef (etc.) in the documentation
  • ... add column and row names to the dimensions of the object

In R, we should think about the (currently private) get__...-methods in this context, for example:

get__psi_a = function() self$psi_a[, private$i_rep, private$i_treat],

Method `set_ml_nuisance_params` overwrites hyperparameters from initialization

Setting hyperparameters via the method set_ml_nuisance_params results in a call to

lrn$param_set$values = params

which, according to the mlr3 docs (I also tested it), results in replacing all hyperparameters by their defaults except the ones in the list params ("Note that this operation replaces all previously set hyperparameter values.").

Is that the intended behavior? I would favor the following, which replaces only the explicitly mentioned hyperparameters:

lrn$param_set$values = mlr3misc::insert_named(lrn$param_set$values, params)

Bug in the aggregation of standard errors from repeated cross-fitting

I think there is a bug in the aggregation of standard errors from repeated cross-fitting.

Description

Implementation

Unit Tests

  • We don't seem to have unit tests being sensitive for the bug fix in the aggregation formula. In my ongoing major update of the unit test framework I will add this extension.
  • In our R vs. Python package tests we so far didn't have a test case with repeated cross-fitting, and therefore the difference between the implementations didn't become visible. I added such a test case in DoubleML/doubleml-py-vs-r#4. As the tests are now sensitive to the aggregation formula, they also fail in that PR, which will be resolved once the R package gets its bug fix.

Documentation

[Unit Test Extension]: Implement "default setting unit tests"

In the python package DoubleML, we do have unit tests for model defaults, see https://github.com/DoubleML/doubleml-for-py/blob/master/doubleml/tests/test_doubleml_model_defaults.py. The intention behind such "default setting unit tests" is twofold:

  1. It should assert that the defaults are valid / meaningful, i.e., the code runs through successfully with default values for the input parameters.
  2. The unit tests serve as a reminder to update the documentation of defaults in case a default value is being changed.

Such "default setting unit tests" could be done for the initialization of the model classes as well as for the most important methods.

Note: Such "default setting unit tests" would have been sensitive for bugs like #155 & #156

Rename of column "row_id" -> "row_ids"

The next mlr3 version will include a refactoring which breaks your package.
The column "row_id" of as.data.table.Prediction() will be renamed to "row_ids" (cf. mlr-org/mlr3#547).
It would be great if you could update your package accordingly and implement a workaround along the lines of the following to ease the transition:

tab = as.data.table(prediction)
data.table::setnames(tab, old = "row_id", new = "row_ids", skip_absent = TRUE) # rename col for mlr3 <= 0.10.0

Thanks and let us know if you are missing some getters or converters.

Messages in DoubleML

Handle messages in DoubleML during instantiation, fitting, tuning etc. of models

Categorical D

Hi,
thanks for developing this!
this might be a silly question, but would it be possible for D to be categorical?
Best,
Hans

mlr3tuning API change

We will upload a new version of mlr3tuning to CRAN.
This line will no longer work.

tuning_archive = tuning_instance$archive$data()

You have to use tuning_instance$archive$data, since the data table is now accessible via a public field instead of a method.
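
In short:

tuning_archive = tuning_instance$archive$data()  # old API: method call
tuning_archive = tuning_instance$archive$data    # new API: public field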

Misleading entries in evaluated score functions & predictions in case of estimation without cross-fitting (`apply_cross_fitting = FALSE`)

Description

When a DoubleML model is estimated with apply_cross_fitting = FALSE and n_folds = 2, there are misleading entries in the evaluated score functions as well as in the exported predictions. For all indices in the test set the entries are correct and are used for estimating the causal parameter(s), etc. However, for all indices which are not part of the test set, the predictions are filled with zeros. These zero predictions are then also used when evaluating the score functions. The resulting entries in psi, psi_a and psi_b are never used, but in my view they are still misleading. I would propose to fill the predictions and the evaluated score function values with NA instead of zeros and non-meaningful values, respectively (a sketch follows the example below).

Example

> ml_g = lrn("regr.ranger", num.trees = 10, max.depth = 2)
> ml_m = ml_g$clone()
> obj_dml_data = make_plr_CCDDHNR2018(alpha = 0.5)
> dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_g, ml_m,
+                               n_folds=2, apply_cross_fitting = FALSE)
> dml_plr_obj$fit(store_predictions = TRUE)
> dml_plr_obj$predictions$ml_g[1:10,,]
 [1]  0.0000000  0.5718869  0.7672342  0.6698870  0.0000000  1.5471172  1.1006015  0.0000000  0.0000000
[10] -0.2258972
> dml_plr_obj$psi[1:10]
 [1] -0.5875342  0.8229460 -0.3105735  0.6203550  0.2614734  0.7999844  1.1656477  0.3464782 -0.6397427
[10]  0.7832788
> obj_dml_data$data$y[1:10]*obj_dml_data$data$d[1:10] == dml_plr_obj$psi_b[1:10]
 [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE
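
A hypothetical sketch of the proposed fix (container and variable names assumed): initialize the export containers with NA instead of zero so that unused entries are visibly missing:

preds = rep(NA_real_, n_obs)           # instead of a zero-initialized vector
preds[test_ids] = prediction$response  # only test-set entries get filled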

[Bug]: the results of the Lasso learner differ from the other learners

Describe the bug

Hi, the DML package is really useful for me and I am using it to conduct my master thesis. I have tried LightGBM/RF/XGBoost/Lasso as learners. The results of LightGBM/RF/XGBoost are similar, but the results of Lasso are rather different. Below is a part of the results. Can you help me with this issue?

Minimum reproducible code snippet

LassoFormula = xnames[1]

for (name in xnames[-1]) {
  LassoFormula = paste0(LassoFormula, '+', name)
}
LassoFormula = paste0('~(', LassoFormula, ")^2")

LassoFormula = formula(LassoFormula)  # create the formula
# features_flex = data.frame(model.matrix(LassoFormula, dataS))  # second-order terms
model_data = data.table("y" = dataS[, ynames],
                        "d" = dataS[, "IndShareSuccessful"],
                        features_flex)

################################ Lasso

DMLLasso = function(yname) {
  set.seed(123)
  lasso = lrn("regr.cv_glmnet", nfolds = 5, s = "lambda.min")        # g model
  lasso_class = lrn("classif.cv_glmnet", nfolds = 5, s = "lambda.min")  # m model

  data_dml_flex = DoubleMLData$new(model_data,
                                   y_col = paste0('y.', yname),
                                   d_cols = 'd.IndShareSuccessful')
  dml_plr_lasso = DoubleMLPLR$new(data_dml_flex,
                                  ml_g = lasso,
                                  ml_m = lasso_class,
                                  n_folds = 3)
  dml_plr_lasso$fit()
  dml_plr_lasso$summary()
}

Expected Result

I think the results of different learners should be similar.

Actual Result

indicator    Lasso     lightGBM      Xgboost       RF
a           -0.105    -13.424***    -13.410***    -13.025***
b            0.001      0.265***      0.275***      0.259***
c            0.003      0.186***      0.187***      0.185***
d           -0.017     20.600***     21.417***     20.165***
e            1.701     13.227***     13.282***     12.853***
f           16.672     10.549*       15.637**       8.339

Versions

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.5.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] zh_CN.UTF-8/zh_CN.UTF-8/zh_CN.UTF-8/C/zh_CN.UTF-8/zh_CN.UTF-8

attached base packages:
[1] grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] xtable_1.8-4 mlr3tuning_0.9.0 paradox_0.7.1
[4] mlr3learners.lightgbm_0.0.10.9001 xgboost_1.5.0.2 WRS2_1.1-3
[7] tmcn_0.2-13 forcats_0.5.1 stringr_1.4.0
[10] purrr_0.3.4 readr_2.0.1 tidyr_1.1.3
[13] tibble_3.1.3 tidyverse_1.3.1 stargazer_5.2.2
[16] sm_2.2-5.7 scales_1.1.1 readxl_1.3.1
[19] randomForest_4.6-14 ranger_0.13.1 qdapRegex_0.7.2
[22] plotrix_3.8-2 plotly_4.10.0 ggplot2_3.3.5
[25] plm_2.4-3 np_0.60-11 nnet_7.3-16
[28] nlme_3.1-153 MultiGHQuad_1.2.0 mvtnorm_1.1-3
[31] mlr3_0.13.1 MatchIt_4.3.2 lubridate_1.7.10
[34] lm.beta_1.5-1 lfe_2.8-7.1 lawstat_3.4
[37] knitr_1.33 kSamples_1.2-9 SuppDists_1.1-9.7
[40] kknn_1.3.1 gridExtra_2.3 grf_2.0.2
[43] gmm_1.6-6 glmnet_4.1-3 ggridges_0.5.3
[46] frequentdirections_0.1.0 fixest_0.10.1 expm_0.999-6
[49] Matrix_1.3-4 DoubleML_0.4.1 dgof_1.2
[52] data.table_1.14.2 contextual_0.9.8.4 coda_0.19-4
[55] BTYDplus_1.2.0 BTYD_2.4.3 dplyr_1.0.7
[58] optimx_2021-6.12 hypergeo_1.2-13 broom_0.7.11
[61] bit64_4.0.5 bit_4.0.4 beepr_1.3
[64] AER_1.2-9 survival_3.2-13 sandwich_3.0-1
[67] lmtest_0.9-38 zoo_1.8-9 car_3.0-12
[70] carData_3.0-5 devtools_2.4.3 usethis_2.1.3

loaded via a namespace (and not attached):
[1] SparseM_1.81 ModelMetrics_1.2.2.2 R.methodsS3_1.8.1 maxLik_1.5-2
[5] clusterGeneration_1.3.7 R.utils_2.11.0 rpart_4.1-15 doParallel_1.0.16
[9] generics_0.1.0 callr_3.7.0 future_1.23.0 tzdb_0.1.2
[13] xml2_1.3.2 assertthat_0.2.1 gower_0.2.2 xfun_0.24
[17] hms_1.1.0 fansi_0.5.0 dbplyr_2.1.1 igraph_1.2.6
[21] DBI_1.1.1 htmlwidgets_1.5.3 reshape_0.8.8 stats4_4.1.2
[25] ellipsis_0.3.2 backports_1.2.1 vctrs_0.3.8 remotes_2.4.1
[29] quantreg_5.86 abind_1.4-5 caret_6.0-78 cachem_1.0.5
[33] withr_2.4.2 itertools_0.1-3 mlr3learners_0.5.1 vroom_1.5.4
[37] bdsmatrix_1.3-4 checkmate_2.0.0 prettyunits_1.1.1 cluster_2.1.2
[41] lazyeval_0.2.2 crayon_1.4.1 elliptic_1.4-0 recipes_0.1.17
[45] pkgconfig_2.0.3 pkgload_1.2.3 rlang_0.4.11 globals_0.14.0
[49] lifecycle_1.0.0 MatrixModels_0.5-0 palmerpenguins_0.1.0 modelr_0.1.8
[53] Kendall_2.2 cellranger_1.1.0 rprojroot_2.0.2 matrixStats_0.61.0
[57] mc2d_0.1-21 boot_1.3-28 reprex_2.0.1 base64enc_0.1-3
[61] processx_3.5.2 png_0.1-7 viridisLite_0.4.0 rjson_0.2.21
[65] R.oo_1.24.0 shape_1.4.6 parallelly_1.28.1 jpeg_0.1-9
[69] memoise_2.0.1 magrittr_2.0.1 plyr_1.8.6 audio_0.1-10
[73] compiler_4.1.2 miscTools_0.6-26 RColorBrewer_1.1-2 cli_3.1.0
[77] listenv_0.8.0 ps_1.6.0 htmlTable_2.3.0 Formula_1.2-4
[81] MASS_7.3-54 tidyselect_1.1.1 stringi_1.7.3 latticeExtra_0.6-29
[85] tools_4.1.2 mlr3misc_0.10.0 future.apply_1.8.1 parallel_4.1.2
[89] rstudioapi_0.13 uuid_0.1-4 foreign_0.8-81 foreach_1.5.1
[93] cubature_2.0.4.2 prodlim_2019.11.13 digest_0.6.27 lava_1.6.10
[97] quadprog_1.5-8 Rcpp_1.0.7 R.devices_2.17.0 httr_1.4.2
[101] contfrac_1.1-12 Rdpack_2.1.3 colorspace_2.0-2 rvest_1.0.1
[105] fs_1.5.0 readstata13_0.10.0 splines_4.1.2 lgr_0.4.3
[109] bbotk_0.4.0 conquer_1.2.1 sessioninfo_1.2.1 dreamerr_1.2.3
[113] jsonlite_1.7.2 timeDate_3043.102 testthat_3.1.0 ipred_0.9-12
[117] R6_2.5.1 Hmisc_4.6-0 pillar_1.6.2 htmltools_0.5.1.1
[121] glue_1.4.2 fastmap_1.1.0 deSolve_1.30 class_7.3-19
[125] codetools_0.2-18 pkgbuild_1.2.0 utf8_1.2.2 lattice_0.20-45
[129] numDeriv_2016.8-1.1 curl_4.3.2 desc_1.4.0 munsell_0.5.0
[133] iterators_1.0.13 haven_2.4.3 reshape2_1.4.4 gtable_0.3.0
[137] rbibutils_2.2.7

packageVersion('DoubleML')
[1] ‘0.4.1’
packageVersion('mlr3')
[1] ‘0.13.1’

Pass score function for IRM

Follow-up to #124 (comment), but for the IRM model.

I want to modify the score function for IRM to allow weights. In the example provided in the manual, I only see how to pass g_hat and m_hat for the PLM; however, IRM requires passing g0_hat and g1_hat. How do I do that? (A hedged sketch follows the snippet below.)

The PLM score function from the manual:

# Here:
# y: dependent variable
# d: treatment variable
# g_hat: predicted values from regression of Y on X's
# m_hat: predicted values from regression of D on X's
# smpls: sample split under consideration, can be ignored in this example

score_manual = function(y, d, g_hat, m_hat, smpls) {
  resid_y = y - g_hat
  resid_d = d - m_hat
  psi_a = -1 * resid_d * resid_d 
  psi_b = resid_d * resid_y 
  psis = list(psi_a = psi_a, psi_b = psi_b)
  return(psis)
}
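
A hedged sketch (the argument names g0_hat, g1_hat and m_hat are assumptions based on the question; please check the reference manual): the ATE-type IRM score with weights, with W again captured via a closure:

make_weighted_irm_score = function(W) {
  function(y, d, g0_hat, g1_hat, m_hat, smpls) {
    u0 = y - g0_hat
    u1 = y - g1_hat
    # weighted doubly robust ATE score
    psi_b = (g1_hat - g0_hat + d * u1 / m_hat - (1 - d) * u0 / (1 - m_hat)) * W
    psi_a = rep(-1.0, length(y)) * W
    return(list(psi_a = psi_a, psi_b = psi_b))
  }
}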

[Bug]: Unable to perform ensemble learners

Describe the bug

Hello, I am very fascinated by this great algorithm for causal machine learning analysis.
But when I tried to test ensemble learners in R, I got the error below, indicating that the learners for ml_g and ml_m must be of class 'LearnerRegr'.

I first got this error when trying it on the interactive IV model.
I also tried the exact same code described on the user guide website; the link to the page is posted below.
https://docs.doubleml.org/stable/examples/R_double_ml_pipeline.html?highlight=ensemble

I tried every solution I could think of, but was unable to get past the error message below.
Please help.

Thanks in advance!

Error in private$assert_learner(ml_g, "ml_g", Regr = TRUE, Classif = FALSE) :
Invalid learner provided for ml_g: must be of class 'LearnerRegr'

Minimum reproducible code snippet

# Initiate a new DoubleML object and estimate with the graph learner

set.seed(123)
obj_dml_plr_sim_pipe_ensemble = DoubleMLPLR$new(dml_data_sim, ml_g = ensemble_pipe_regr, ml_m = ensemble_pipe_regr)
Error in private$assert_learner(ml_g, "ml_g", Regr = TRUE, Classif = FALSE) :
Invalid learner provided for ml_g: must be of class 'LearnerRegr'
obj_dml_plr_sim_pipe_ensemble$fit()
Error: object 'obj_dml_plr_sim_pipe_ensemble' not found

Expected Result

Results of the Double ML with ensemble learner

Actual Result

Error in private$assert_learner(ml_g, "ml_g", Regr = TRUE, Classif = FALSE) :
Invalid learner provided for ml_g: must be of class 'LearnerRegr'

Versions

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=Korean_Korea.949 LC_CTYPE=Korean_Korea.949 LC_MONETARY=Korean_Korea.949 LC_NUMERIC=C LC_TIME=Korean_Korea.949

attached base packages:
[1] splines stats graphics grDevices utils datasets methods base

other attached packages:
[1] mlr3pipelines_0.4.0 data.table_1.14.2 mlr3learners_0.5.2 mlr3_0.13.3 DoubleML_0.4.1 sandwich_3.0-1 lmtest_0.9-39 zoo_1.8-9
[9] MASS_7.3-54 glmnet_4.1-3 Matrix_1.3-4 rpart_4.1-15 fastDummies_1.6.3 np_0.60-11 causalweight_1.0.2 ranger_0.13.1
[17] openxlsx_4.2.5 ivreg_0.6-1 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4 readr_2.1.1 tidyr_1.1.4
[25] tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1

loaded via a namespace (and not attached):
[1] paradox_0.8.0 cubature_2.0.4.4 colorspace_2.0-2 ellipsis_0.3.2 class_7.3-19 rprojroot_2.0.2 fs_1.5.2
[8] rstudioapi_0.13 proxy_0.4-26 listenv_0.8.0 remotes_2.4.2 MatrixModels_0.5-0 mlr3tuning_0.13.0 fansi_0.5.0
[15] mvtnorm_1.1-3 lubridate_1.8.0 xml2_1.3.3 codetools_0.2-18 knitr_1.37 pkgload_1.2.4 Formula_1.2-4
[22] jsonlite_1.7.2 broom_0.7.11 dbplyr_2.1.1 hdm_0.3.1 compiler_4.1.2 httr_1.4.2 backports_1.4.1
[29] assertthat_0.2.1 fastmap_1.1.0 cli_3.1.0 prettyunits_1.1.1 quantreg_5.88 htmltools_0.5.2 tools_4.1.2
[36] igraph_1.2.11 gtable_0.3.0 glue_1.6.0 clusterGeneration_1.3.7 Rcpp_1.0.7 carData_3.0-5 SuperLearner_2.0-28
[43] cellranger_1.1.0 vctrs_0.3.8 iterators_1.0.14 xfun_0.29 ps_1.6.0 globals_0.14.0 testthat_3.1.1
[50] rvest_1.0.2 lifecycle_1.0.1 future_1.24.0 scales_1.1.1 lgr_0.4.3 hms_1.1.1 parallel_4.1.2
[57] SparseM_1.81 readstata13_0.10.0 curl_4.3.2 yaml_2.2.1 gam_1.20.1 stringi_1.7.6 desc_1.4.0
[64] foreach_1.5.2 e1071_1.7-9 checkmate_2.0.0 palmerpenguins_0.1.0 pkgbuild_1.3.1 boot_1.3-28 zip_2.2.0
[71] shape_1.4.6 rlang_0.4.12 pkgconfig_2.0.3 evaluate_0.14 lattice_0.20-45 processx_3.5.2 tidyselect_1.1.1
[78] parallelly_1.31.0 magrittr_2.0.1 R6_2.5.1 generics_0.1.1 nnls_1.4 DBI_1.1.2 pillar_1.6.4
[85] haven_2.4.3 withr_2.4.3 survival_3.2-13 abind_1.4-5 future.apply_1.8.1 modelr_0.1.8 crayon_1.4.2
[92] car_3.0-12 xgboost_1.5.2.1 uuid_1.0-3 utf8_1.2.2 tzdb_0.2.0 rmarkdown_2.11 grid_4.1.2
[99] readxl_1.3.1 callr_3.7.0 mlr3misc_0.10.0 bbotk_0.5.1 reprex_2.0.1 digest_0.6.29 LARF_1.4
[106] munsell_0.5.0 quadprog_1.5-8

packageVersion('DoubleML')
[1] ‘0.4.1’
packageVersion('mlr3')
[1] ‘0.13.3’

[Bug]: Tuning with default `tune_settings` fails

Describe the bug

Tuning with the default tune_settings fails (even after fixing #155). If tune_settings$measure is set to NULL, then according to the docs (https://docs.doubleml.org/r/stable/reference/DoubleML.html#method-tune) default measures should be used.

Minimum reproducible code snippet

library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
set.seed(2)
ml_g = lrn("regr.ranger", num.trees = 10, max.depth = 2)
ml_m = ml_g$clone()
obj_dml_data = make_plr_CCDDHNR2018(alpha = 0.5)
dml_plr_obj = DoubleMLPLR$new(obj_dml_data, ml_g, ml_m)
par_grids = list("ml_g" = paradox::ParamSet$new(list(
    paradox::ParamInt$new("num.trees", lower = 5, upper = 6, default = 5))),
    "ml_m" = paradox::ParamSet$new(list(
    paradox::ParamInt$new("num.trees", lower = 5, upper = 6, default = 5))))
tune_settings = list(
    n_folds_tune = 5,
    rsmp_tune = mlr3::rsmp("cv", folds = 5),
    measure = NULL,
    terminator = mlr3tuning::trm("evals", n_evals = 20),
    algorithm = mlr3tuning::tnr("grid_search"),
    resolution = 5)

dml_plr_obj$tune(param_set=par_grids, tune_settings = tune_settings)

Expected Result

No exception

Actual Result

Exception

Error in private$assert_tune_settings(tune_settings) : 
  Assertion on 'tune_settings$measure' failed: Must be of type 'list', not 'NULL'. 

Versions

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 21.10

> packageVersion('DoubleML')
[1] ‘0.4.1’
> packageVersion('mlr3')
[1] ‘0.11.0.9000’

Meaningful error message if sample splitting was not yet set

Calling

dml_plr_obj = DoubleMLPLR$new(make_plr_CCDDHNR2018(),
                              lrn("regr.ranger"), lrn("regr.ranger"),
                              draw_sample_splitting = FALSE)
dml_plr_obj$fit()

produces the error message

 Error in .__ResamplingCustom__instantiate(self = self, private = private,  : 
  Assertion on 'train_sets' failed: Must be of type 'list', not 'NULL'. 

Something more meaningful along the lines of https://github.com/DoubleML/doubleml-for-py/blob/a574e0afcab0e7cce475925f1344399e75dd4a11/doubleml/double_ml.py#L238-L239 would be preferable. (A sketch of such a guard follows.)
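
A sketch of such a guard at the top of fit() (the private field name is an assumption):

if (is.null(private$smpls)) {
  stop(paste("Sample splitting not specified. Either draw samples via",
    "draw_sample_splitting() or set them via set_sample_splitting()."))
}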

Use active bindings in the R6 OOP implementation

For public fields, R6 active bindings (https://r6.r-lib.org/articles/Introduction.html#active-bindings) are pretty similar to properties (with getters and setters) in Python. I am considering using them in our implementation as well. Currently we have a lot of public fields, many of which actually shouldn't be settable. Basically, you can pretty easily screw things up by setting some of the properties to an invalid value after initialization, say

> dml_plr = DoubleMLPLR$new(dml_data, ml_g, ml_m)
> dml_plr$dml_procedure = 'an_invalid_algo_name'
> dml_plr$fit()
Controls variables do not include other treatment variables
Set treatment variable d to d1.
Error in self$all_coef[private$i_treat, private$i_rep] = value : 
  number of items to replace is not a multiple of replacement length

This does not result in a meaningful error message. In Python we already rely heavily on properties with (or without) setters, so we can use this as a basis to move towards active bindings. (A minimal sketch follows.)
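
A minimal sketch of a read-only active binding in R6 (an illustration, not the package's actual code):

Example = R6::R6Class("Example",
  private = list(.dml_procedure = "dml2"),
  active = list(
    dml_procedure = function(value) {
      if (missing(value)) {
        private$.dml_procedure  # getter
      } else {
        stop("can't set field dml_procedure")  # read-only: no setter
      }
    }
  )
)

e = Example$new()
e$dml_procedure          # "dml2"
# e$dml_procedure = "x"  # errors instead of silently corrupting state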
