| Title: | Cram Method for Efficient Simultaneous Learning and Evaluation |
|---|---|
| Description: | Performs the Cram method, a general and efficient approach to simultaneous learning and evaluation using a generic machine learning algorithm. In a single pass of batched data, the proposed method repeatedly trains a machine learning algorithm and tests its empirical performance. Because it utilizes the entire sample for both learning and evaluation, cramming is significantly more data-efficient than sample-splitting. Unlike cross-validation, Cram evaluates the final learned model directly, providing sharper inference aligned with real-world deployment. The method naturally applies to both policy learning and contextual bandits, where decisions are based on individual features to maximize outcomes. The package includes cram_policy() for learning and evaluating individualized binary treatment rules, cram_ml() to train and assess the population-level performance of machine learning models, and cram_bandit() for on-policy evaluation of contextual bandit algorithms. For all three functions, the package provides estimates of the average outcome that would result if the model were deployed, along with standard errors and confidence intervals for these estimates. Details of the method are described in Jia, Imai, and Li (2024) <https://www.hbs.edu/ris/Publication%20Files/2403.07031v1_a83462e0-145b-4675-99d5-9754aa65d786.pdf> and Jia et al. (2025) <doi:10.48550/arXiv.2403.07031>. |
| Authors: | Yanis Vandecasteele [cre, aut], Michael Lingzhi Li [ctb], Kosuke Imai [ctb], Zeyang Jia [ctb], Longlin Wang [ctb] |
| Maintainer: | Yanis Vandecasteele <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.0 |
| Built: | 2026-06-09 08:47:33 UTC |
| Source: | https://github.com/yanisvdc/cramr |
Batch Contextual Epsilon-Greedy Policy
Implements an epsilon-greedy exploration strategy for contextual bandits with batched updates.
cramR::NA
epsilon: Probability of selecting a random arm (exploration rate).
batch_size: Number of rounds per batch before updating model parameters.
A_cc: List of Gram matrices (one per arm), used to accumulate sufficient statistics across batches.
b_cc: List of reward-weighted context sums (one per arm), updated batch-wise.
class_name: Internal class name identifier.
new()
Constructor for the Batch Epsilon-Greedy policy.
BatchContextualEpsilonGreedyPolicy$new(epsilon = 0.1, batch_size = 1)
epsilon: Numeric between 0 and 1. Probability of random arm selection.
batch_size: Integer. Number of observations between parameter updates.
set_parameters()
Initializes the parameter structures for each arm.
BatchContextualEpsilonGreedyPolicy$set_parameters(context_params)
context_params: A list with at least 'd' (number of features) and 'k' (number of arms).
get_action()
Chooses an arm based on epsilon-greedy logic and the current estimates.
BatchContextualEpsilonGreedyPolicy$get_action(t, context)
t: Integer time step.
context: A list with contextual features and arm count.
A list with the selected action.
set_reward()
Updates model statistics based on observed reward. Updates occur once per batch.
BatchContextualEpsilonGreedyPolicy$set_reward(t, context, action, reward)
t: Integer time step.
context: List of contextual features used for the action.
action: A list with the chosen arm.
reward: A list with the observed reward.
Updated parameter estimates.
clone()
The objects of this class are cloneable with this method.
BatchContextualEpsilonGreedyPolicy$clone(deep = FALSE)
deep: Whether to make a deep clone.
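The epsilon-greedy rule described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper `choose_arm_eps_greedy()` is hypothetical, and it assumes per-arm coefficient estimates of the form theta_a = A_a^{-1} b_a with identity-initialized Gram matrices.

```r
# Hypothetical helper (not part of cramR): epsilon-greedy choice for one context.
# A: list of d x d Gram matrices, b: list of length-d vectors (one entry per arm).
choose_arm_eps_greedy <- function(x, A, b, epsilon = 0.1) {
  k <- length(A)
  if (runif(1) < epsilon) {
    return(sample.int(k, 1))  # explore: pick an arm uniformly at random
  }
  # exploit: predicted reward x' theta_a, with theta_a = A_a^{-1} b_a (assumed form)
  scores <- vapply(seq_len(k), function(a) {
    theta <- solve(A[[a]], b[[a]])
    sum(x * theta)
  }, numeric(1))
  which.max(scores)
}

set.seed(1)
d <- 3
k <- 2
A <- replicate(k, diag(d), simplify = FALSE)  # identity start is an assumption
b <- list(c(1, 0, 0), c(0, 0, 0))
x <- c(1, 0.5, -0.2)

# With epsilon = 0 the choice is purely greedy
greedy_arm <- choose_arm_eps_greedy(x, A, b, epsilon = 0)
```

In a batched policy, `A` and `b` would only be refreshed once every `batch_size` rounds, so several consecutive contexts are scored with the same estimates.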
Batch Contextual Thompson Sampling Policy
Implements Thompson Sampling for linear contextual bandits with batch updates.
- 'initialize(v = 0.2, batch_size = 1)': Constructor, sets variance and batch size.
- 'set_parameters(context_params)': Initializes arm-level matrices.
- 'get_action(t, context)': Samples from the posterior and selects action.
- 'set_reward(t, context, action, reward)': Updates posterior statistics using observed feedback.
cramR::NA
sigma: Numeric, posterior variance scale parameter.
batch_size: Integer, size of mini-batches before parameter updates.
A_cc: List of accumulated Gram matrices per arm.
b_cc: List of reward-weighted context sums per arm.
class_name: Internal name of the class.
new()
Constructor for the batch-based Thompson Sampling policy.
BatchContextualLinTSPolicy$new(v = 0.2, batch_size = 1)
v: Numeric. Standard deviation scaling parameter for posterior sampling.
batch_size: Integer. Number of rounds before parameters are updated.
set_parameters()
Initializes per-arm sufficient statistics.
BatchContextualLinTSPolicy$set_parameters(context_params)
context_params: List with entries: 'unique' (feature vector), 'k' (number of arms).
get_action()
Samples from the posterior distribution of expected rewards and selects an action.
BatchContextualLinTSPolicy$get_action(t, context)
t: Integer. Time step.
context: List containing the current context and arm information.
A list with the chosen arm ('choice').
set_reward()
Updates Gram matrix and response vector for the chosen arm. Parameters are refreshed every 'batch_size' rounds.
BatchContextualLinTSPolicy$set_reward(t, context, action, reward)
t: Integer. Time step.
context: Context object containing feature info.
action: Chosen action (arm index).
reward: Observed reward for the action.
Updated internal parameters.
clone()
The objects of this class are cloneable with this method.
BatchContextualLinTSPolicy$clone(deep = FALSE)
deep: Whether to make a deep clone.
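The posterior sampling step can be illustrated with a small base R sketch. It assumes the standard linear Thompson Sampling posterior, theta_a ~ N(A_a^{-1} b_a, v^2 A_a^{-1}); the helpers `sample_theta()` and `choose_arm_lints()` are hypothetical names and may differ from what the package does internally.

```r
# Hypothetical helpers (not part of cramR).
# Draw one coefficient vector from N(A^{-1} b, v^2 A^{-1}) (assumed posterior form).
sample_theta <- function(A, b, v = 0.2) {
  A_inv <- solve(A)
  mu <- as.vector(A_inv %*% b)
  # chol() returns an upper-triangular R with t(R) %*% R = v^2 * A_inv
  R <- chol(v^2 * A_inv)
  mu + as.vector(t(R) %*% rnorm(length(mu)))
}

# Score each arm with its own posterior draw and pick the best one.
choose_arm_lints <- function(x, A, b, v = 0.2) {
  scores <- vapply(seq_along(A), function(a) {
    sum(x * sample_theta(A[[a]], b[[a]], v))
  }, numeric(1))
  which.max(scores)
}

set.seed(2)
d <- 3
A <- list(diag(d), diag(d))          # identity start is an assumption
b <- list(c(1, 0, 0), c(0, 0, 0))
x <- c(1, 0, 0)

# A very small v makes the draw concentrate on the posterior mean
arm_ts <- choose_arm_lints(x, A, b, v = 0.001)
```

Larger values of `v` widen the posterior draws and therefore increase exploration.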
Batch Disjoint LinUCB Policy with Epsilon-Greedy
Implements the disjoint LinUCB algorithm with upper confidence bounds and epsilon-greedy exploration, using batched updates.
- 'initialize(alpha = 1.0, epsilon = 0.1, batch_size = 1)': Constructor.
- 'set_parameters(context_params)': Initializes sufficient statistics for each arm.
- 'get_action(t, context)': Selects an arm using UCB scores and epsilon-greedy rule.
- 'set_reward(t, context, action, reward)': Updates statistics and refreshes model at batch intervals.
cramR::NA
alpha: Numeric, UCB exploration strength parameter.
epsilon: Numeric, probability of taking a random exploratory action.
batch_size: Integer, number of rounds per batch update.
A_cc: List of Gram matrices per arm, accumulated across batch.
b_cc: List of reward-weighted context vectors per arm.
class_name: Internal class name identifier.
new()
Constructor for batched LinUCB with epsilon-greedy exploration.
BatchLinUCBDisjointPolicyEpsilon$new(alpha = 1, epsilon = 0.1, batch_size = 1)
alpha: Numeric. UCB width parameter (exploration strength).
epsilon: Numeric. Probability of selecting a random arm.
batch_size: Integer. Number of rounds before updating parameters.
set_parameters()
Initialize arm-specific parameter containers.
BatchLinUCBDisjointPolicyEpsilon$set_parameters(context_params)
context_params: List containing at least 'unique' (feature size) and 'k' (number of arms).
get_action()
Chooses an arm based on UCB and epsilon-greedy sampling.
BatchLinUCBDisjointPolicyEpsilon$get_action(t, context)
t: Integer timestep.
context: List containing the context for the decision.
A list with the selected action.
set_reward()
Updates arm-specific sufficient statistics based on observed reward. Parameter updates occur only at the end of a batch.
BatchLinUCBDisjointPolicyEpsilon$set_reward(t, context, action, reward)
t: Integer timestep.
context: Context object used for decision-making.
action: List containing the chosen action.
reward: List containing the observed reward.
Updated internal model parameters.
clone()
The objects of this class are cloneable with this method.
BatchLinUCBDisjointPolicyEpsilon$clone(deep = FALSE)
deep: Whether to make a deep clone.
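For intuition, the UCB score used by disjoint LinUCB can be sketched in base R. The helper `linucb_scores()` is hypothetical and uses the textbook form x' theta_a + alpha * sqrt(x' A_a^{-1} x); the package's internal computation may differ in details.

```r
# Hypothetical helper (not part of cramR): disjoint LinUCB scores for one context.
linucb_scores <- function(x, A, b, alpha = 1) {
  vapply(seq_along(A), function(a) {
    A_inv <- solve(A[[a]])
    theta <- as.vector(A_inv %*% b[[a]])                      # per-arm estimate
    mean_reward <- sum(x * theta)                             # predicted reward
    width <- alpha * sqrt(as.numeric(t(x) %*% A_inv %*% x))   # confidence width
    mean_reward + width
  }, numeric(1))
}

A <- list(diag(3), diag(3))          # identity start is an assumption
b <- list(c(1, 0, 0), c(0, 0, 0))
x <- c(1, 0, 0)

ucb <- linucb_scores(x, A, b, alpha = 1)
best_arm <- which.max(ucb)
```

With the epsilon-greedy variant, `best_arm` would be used with probability 1 - epsilon and a uniformly random arm otherwise.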
Contextual Linear Bandit Environment
An R6 class for simulating a contextual linear bandit environment with normally distributed rewards.
- 'initialize(k, d, list_betas, sigma = 0.1, binary_rewards = FALSE)': Constructor.
- 'post_initialization()': Loads correct coefficients based on 'sim_id'.
- 'get_context(t)': Returns context and sets internal reward vector.
- 'get_reward(t, context_common, action)': Returns observed reward for an action.
cramR::NA -> ContextualLinearBandit
rewards: A vector of rewards for each arm in the current round.
betas: Coefficient matrix of the linear reward model (one column per arm).
sigma: Standard deviation of the Gaussian noise added to rewards.
binary: Logical, indicating whether to convert rewards into binary outcomes.
weights: The latent reward scores before noise and/or binarization.
list_betas: A list of coefficient matrices, one per simulation.
sim_id: Index for selecting which simulation's coefficients to use.
class_name: Name of the class for internal tracking.
new()
ContextualLinearBandit$new(k, d, list_betas, sigma = 0.1, binary_rewards = FALSE)
k: Number of arms.
d: Number of features.
list_betas: A list of true beta matrices, one per simulation.
sigma: Standard deviation of the Gaussian noise.
binary_rewards: Logical. Whether to use binary rewards.
post_initialization()
Set the simulation-specific coefficients for the current simulation.
ContextualLinearBandit$post_initialization()
No return value; modifies the internal state of the object.
get_context()
ContextualLinearBandit$get_context(t)
t: Current time step.
A list containing context vector 'X' and arm count 'k'
get_reward()
ContextualLinearBandit$get_reward(t, context_common, action)
t: Current time step.
context_common: Context shared across arms.
action: Action taken by the policy.
A list with reward and optimal arm/reward info
clone()
The objects of this class are cloneable with this method.
ContextualLinearBandit$clone(deep = FALSE)
deep: Whether to make a deep clone.
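One round of such an environment could look like the following base R sketch. It is not the class's actual code: `simulate_round()` is a hypothetical function, the standard normal context distribution and the thresholding used for binary rewards are assumptions, and only the field names (`betas`, `sigma`, `weights`, `rewards`) follow the description above.

```r
# Hypothetical function (not part of cramR): one round of a linear bandit.
# betas: d x k coefficient matrix, one column per arm.
simulate_round <- function(betas, sigma = 0.1, binary_rewards = FALSE) {
  d <- nrow(betas)
  k <- ncol(betas)
  X <- rnorm(d)                        # context vector (distribution is an assumption)
  weights <- as.vector(X %*% betas)    # latent reward score per arm
  rewards <- weights + rnorm(k, sd = sigma)
  if (binary_rewards) {
    rewards <- as.numeric(rewards > 0) # thresholding rule is an assumption
  }
  list(X = X, k = k, rewards = rewards, optimal_arm = which.max(weights))
}

set.seed(3)
betas <- matrix(runif(3 * 4, -1, 1), nrow = 3, ncol = 4)
round_out <- simulate_round(betas, sigma = 0.1)
```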
Performs the Cram method for On-policy Statistical Evaluation in Contextual Bandits
cram_bandit(pi, arm, reward, batch = 1, alpha = 0.05)
pi |
An array of shape (T × B, T, K) or (T × B, T), where T is the number of learning steps (or policy updates), B is the batch size, K is the number of arms, T x B is the total number of contexts. If 3D, pi[j, t, a] gives the probability that the policy pi_t assigns arm a to context X_j. If 2D, pi[j, t] gives the probability that the policy pi_t assigns arm A_j (arm actually chosen under X_j in the history) to context X_j. Please see vignette for more details. |
arm |
A vector of length T x B indicating which arm was selected in each context |
reward |
A vector of observed rewards of length T x B |
batch |
(Optional) A vector or integer. If a vector, gives the batch assignment for each context. If an integer, interpreted as the batch size and contexts are assigned to a batch in the order of the dataset. Default is 1. |
alpha |
Significance level of the confidence intervals used to compute the empirical coverage. Default is 0.05 (95% confidence). |
A list containing:
raw_results |
A data frame summarizing key metrics: Empirical Bias on Policy Value, Average relative error on Policy Value, RMSE using relative errors on Policy Value, Empirical Coverage of Confidence Intervals. |
interactive_table |
An interactive table summarizing the same key metrics in a user-friendly interface. |
# Example with batch size of 1

# Set random seed for reproducibility
set.seed(42)

# Define parameters
T <- 100  # Number of timesteps
K <- 4    # Number of arms

# Simulate a 3D array pi of shape (T, T, K)
# - First dimension: Individuals (context Xj)
# - Second dimension: Time steps (pi_t)
# - Third dimension: Arms (depth)
pi <- array(runif(T * T * K, 0.1, 1), dim = c(T, T, K))

# Normalize probabilities so that each row sums to 1 across arms
for (t in 1:T) {
  for (j in 1:T) {
    pi[j, t, ] <- pi[j, t, ] / sum(pi[j, t, ])
  }
}

# Simulate arm selections (randomly choosing an arm)
arm <- sample(1:K, T, replace = TRUE)

# Simulate rewards (assume normally distributed rewards)
reward <- rnorm(T, mean = 1, sd = 0.5)

result <- cram_bandit(pi, arm, reward, batch = 1, alpha = 0.05)
result$raw_results
result$interactive_table
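The relationship between the 3D and 2D forms of pi described in the arguments can be sketched in base R. This is only an illustration under the stated definition pi[j, t] = pi[j, t, A_j]; the variable names `n_steps`, `pi_3d`, and `pi_2d` are hypothetical, and the sketch does not call any cramR function.

```r
set.seed(42)
n_steps <- 5   # number of policy updates (batch size 1, so also the number of contexts)
K <- 3         # number of arms

# 3D array: pi_3d[j, t, a] = probability that policy pi_t assigns arm a to context X_j
pi_3d <- array(runif(n_steps * n_steps * K, 0.1, 1), dim = c(n_steps, n_steps, K))
pi_3d <- sweep(pi_3d, c(1, 2), apply(pi_3d, c(1, 2), sum), "/")  # normalize over arms

# Arms actually chosen in the history
arm <- sample(1:K, n_steps, replace = TRUE)

# 2D matrix: pi_2d[j, t] = pi_3d[j, t, arm[j]], using matrix indexing on the array
pi_2d <- sapply(seq_len(n_steps), function(t) {
  pi_3d[cbind(seq_len(n_steps), t, arm)]
})
```

The 2D form stores only the probabilities of the arms that were actually chosen, which reduces memory by a factor of K.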
This function implements on-policy evaluation for contextual bandits by computing the policy value estimate.
cram_bandit_est(pi, reward, arm, batch = 1)
pi |
An array of shape (T × B, T, K) or (T × B, T), where T is the number of learning steps (or policy updates), B is the batch size, K is the number of arms, T x B is the total number of contexts. If 3D, pi[j, t, a] gives the probability that the policy pi_t assigns arm a to context X_j. If 2D, pi[j, t] gives the probability that the policy pi_t assigns arm A_j (arm actually chosen under X_j in the history) to context X_j. Please see vignette for more details. |
reward |
A vector of observed rewards of length T x B |
arm |
A vector of length T x B indicating which arm was selected in each context |
batch |
(Optional) A vector or integer. If a vector, gives the batch assignment for each context. If an integer, interpreted as the batch size and contexts are assigned to a batch in the order of the dataset. Default is 1. |
The estimated policy value.
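When batch is given as an integer, contexts are assigned to batches in dataset order. A minimal sketch of that mapping is shown below; `batch_from_size()` is a hypothetical helper, and how the package treats a final incomplete batch is not stated in this documentation, so the truncation here is an assumption.

```r
# Hypothetical helper (not part of cramR): turn a batch size into a
# per-context batch assignment vector, following the dataset order.
batch_from_size <- function(n, batch_size) {
  n_batches <- ceiling(n / batch_size)
  # Repeat each batch label batch_size times, then keep the first n labels
  rep(seq_len(n_batches), each = batch_size)[seq_len(n)]
}

assignment <- batch_from_size(n = 10, batch_size = 5)
```

Passing such a vector explicitly is equivalent in spirit to passing the integer batch size, and makes the grouping easy to inspect.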
This function runs on-policy simulation for contextual bandit algorithms using the Cram method. It evaluates the statistical properties of policy value estimates.
cram_bandit_sim(horizon, simulations, bandit, policy, alpha = 0.05, do_parallel = FALSE, seed = 42)
horizon |
An integer specifying the number of timesteps (rounds) per simulation. |
simulations |
An integer specifying the number of independent Monte Carlo simulations to perform. |
bandit |
A contextual bandit environment object that generates contexts (feature vectors) and observed rewards for each arm chosen. |
policy |
A policy object that takes in a context and selects an arm (action) at each timestep. |
alpha |
Significance level of the confidence intervals used to compute the empirical coverage. Default is 0.05 (95% confidence). |
do_parallel |
Whether to parallelize the simulations. Defaults to FALSE. We recommend keeping it FALSE unless necessary; please see the vignette. |
seed |
An optional integer to set the random seed for reproducibility. If NULL, no seed is set. |
A list containing:
estimates |
A table containing the detailed history of estimates and errors for each simulation. |
raw_results |
A data frame summarizing key metrics: Empirical Bias on Policy Value, Average relative error on Policy Value, RMSE using relative errors on Policy Value, Empirical Coverage of Confidence Intervals. |
interactive_table |
An interactive table summarizing the same key metrics in a user-friendly interface. |
# Number of time steps
horizon <- 500L

# Number of simulations
simulations <- 100L

# Number of arms
k <- 4

# Number of context features
d <- 3

# Reward beta parameters of linear model (the outcome generation models,
# one for each arm, are linear with arm-specific parameters betas)
list_betas <- cramR::get_betas(simulations, d, k)

# Define the contextual linear bandit, where sigma is the scale
# of the noise in the outcome linear model
bandit <- cramR::ContextualLinearBandit$new(k = k, d = d, list_betas = list_betas, sigma = 0.3)

# Define the policy object (choose between Contextual Epsilon Greedy,
# UCB Disjoint and Thompson Sampling)
policy <- cramR::BatchContextualEpsilonGreedyPolicy$new(epsilon = 0.1, batch_size = 5)
# policy <- cramR::BatchLinUCBDisjointPolicyEpsilon$new(alpha = 1.0, epsilon = 0.1, batch_size = 1)
# policy <- cramR::BatchContextualLinTSPolicy$new(v = 0.1, batch_size = 1)

sim <- cram_bandit_sim(horizon, simulations, bandit, policy,
                       alpha = 0.05, do_parallel = FALSE)

# Summary metrics (see the return value described above)
sim$raw_results
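The summary metrics named in raw_results can be illustrated with a short base R sketch. The exact definitions used by the package are not given here, so the formulas below (bias as mean error, relative error as error divided by the true value, coverage from normal-approximation intervals) are assumptions, and `summarise_sims()` is a hypothetical helper.

```r
# Hypothetical helper (not part of cramR): summary metrics across simulations.
summarise_sims <- function(estimates, std_errors, truths, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  rel_err <- (estimates - truths) / truths
  covered <- abs(estimates - truths) <= z * std_errors  # truth inside the interval
  c(
    bias = mean(estimates - truths),
    avg_rel_error = mean(rel_err),
    rmse_rel = sqrt(mean(rel_err^2)),
    coverage = mean(covered)
  )
}

metrics <- summarise_sims(
  estimates  = c(1.1, 0.9),
  std_errors = c(0.1, 0.01),
  truths     = c(1, 1)
)
```

In this toy input the first interval contains the true value and the second does not, so the empirical coverage is one half.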
This function computes the crammed variance estimate of the policy value estimate in the on-policy evaluation setting for contextual bandits.
cram_bandit_var(pi, reward, arm, batch = 1)
pi |
An array of shape (T × B, T, K) or (T × B, T), where T is the number of learning steps (or policy updates), B is the batch size, K is the number of arms, T x B is the total number of contexts. If 3D, pi[j, t, a] gives the probability that the policy pi_t assigns arm a to context X_j. If 2D, pi[j, t] gives the probability that the policy pi_t assigns arm A_j (arm actually chosen under X_j in the history) to context X_j. Please see vignette for more details. |
reward |
A vector of observed rewards of length T x B |
arm |
A vector of length T x B indicating which arm was selected in each context |
batch |
(Optional) A vector or integer. If a vector, gives the batch assignment for each context. If an integer, interpreted as the batch size and contexts are assigned to a batch in the order of the dataset. Default is 1. |
The crammed variance estimate of the policy value estimate.
This function returns the cram policy estimator for the policy value difference (delta).
cram_estimator(X, Y, D, pi, batch_indices, propensity = NULL)
X |
A matrix or data frame of covariates for each sample. |
Y |
A vector of outcomes for the n individuals. |
D |
A vector of binary treatments for the n individuals. |
pi |
A matrix of n rows and (nb_batch + 1) columns, where n is the sample size and nb_batch is the number of batches, containing the policy assignment for each individual for each policy. The first column represents the baseline policy. |
batch_indices |
A list where each element is a vector of indices corresponding to the individuals in each batch. |
propensity |
The propensity score function |
The estimated policy value difference (Delta).
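The batch_indices argument is a list with one vector of row indices per batch. A minimal base R sketch of building such a list from a random batch assignment is shown below; the variable names are illustrative, and the random assignment mirrors the "created by random sampling" behavior described for the batch argument elsewhere in this manual.

```r
set.seed(123)
n <- 12          # sample size
nb_batch <- 3    # number of batches

# Randomly assign each individual to one of nb_batch equally sized batches
batch_assignment <- sample(rep(seq_len(nb_batch), length.out = n))

# One vector of row indices per batch
batch_indices <- split(seq_len(n), batch_assignment)
```

Every individual appears in exactly one batch, so the indices across all list elements cover `1:n` without overlap.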
This function computes the Cram ML expected loss estimator based on the given loss matrix and batch indices.
cram_expected_loss(loss, batch_indices)
loss |
A matrix of loss values with N rows (data points) and K+1 columns (batches). We assume that the first column of the loss matrix contains only zeros. The following nb_batch columns contain the losses of each trained model for each individual. |
batch_indices |
A list where each element is a vector of indices corresponding to a batch. |
The Cram ML expected loss estimate
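To make the expected shape of the loss matrix concrete, here is a base R sketch that builds one with a leading column of zeros followed by one column per trained model. It assumes that the j-th model is trained on the first j batches (cumulative training) and uses squared error with `lm()` purely for illustration; it does not call cram_expected_loss().

```r
set.seed(1)
n <- 60
nb_batch <- 3

dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n)

# Sequential batches of equal size (illustrative assignment)
batch_indices <- split(seq_len(n), rep(seq_len(nb_batch), each = n / nb_batch))

# First column is all zeros; column j + 1 holds the losses of model j
loss <- matrix(0, nrow = n, ncol = nb_batch + 1)
for (j in seq_len(nb_batch)) {
  # Assumption: model j is trained on batches 1..j
  train_idx <- unlist(batch_indices[seq_len(j)], use.names = FALSE)
  fit <- lm(y ~ x, data = dat[train_idx, ])
  loss[, j + 1] <- (dat$y - predict(fit, newdata = dat))^2  # per-individual squared error
}
```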
This function performs the learning part of the Cram Policy method.
cram_learning(X, D, Y, batch, model_type = "causal_forest", learner_type = "ridge", baseline_policy = NULL, parallelize_batch = FALSE, model_params = NULL, custom_fit = NULL, custom_predict = NULL, n_cores = detectCores() - 1, propensity = NULL)
X |
A matrix or data frame of covariates for each sample. |
D |
A vector of binary treatment indicators (1 for treated, 0 for untreated). |
Y |
A vector of outcome values for each sample. |
batch |
Either an integer specifying the number of batches (which will be created by random sampling) or a vector of length equal to the sample size providing the batch assignment (index) for each individual in the sample. |
model_type |
The model type for policy learning. Options include |
learner_type |
The learner type for the chosen model. Options include |
baseline_policy |
A list providing the baseline policy (binary 0 or 1) for each sample. If |
parallelize_batch |
Logical. Whether to parallelize batch processing. The cram method learns T policies, where T is the number of batches; they are learned in parallel when parallelize_batch is TRUE, and sequentially using the efficient data.table structure when parallelize_batch is FALSE (recommended for lightweight training). Defaults to |
model_params |
A list of additional parameters to pass to the model, which can be any parameter defined in the model reference package. Defaults to |
custom_fit |
A custom, user-defined function that outputs a fitted model given training data (allows flexibility). Defaults to |
custom_predict |
A custom, user-defined function for making predictions given a fitted model and test data (allows flexibility). Defaults to |
n_cores |
Number of cores to use for parallelization when parallelize_batch is set to TRUE. Defaults to detectCores() - 1. |
propensity |
The propensity score |
A list containing:
final_policy_model |
The final fitted policy model, depending on |
policies |
A matrix of learned policies, where each column represents a batch's learned policy and the first column is the baseline policy. |
batch_indices |
The indices for each batch, either as generated (if |
causal_forest, cv.glmnet, keras_model_sequential
Performs the Cram method for simultaneous machine learning and evaluation.
cram_ml(data, batch, formula = NULL, caret_params = NULL, parallelize_batch = FALSE, loss_name = NULL, custom_fit = NULL, custom_predict = NULL, custom_loss = NULL, alpha = 0.05, classify = FALSE)
data |
A matrix or data frame of covariates. For supervised learning, must include the target variable specified in formula. |
batch |
Integer specifying number of batches or vector of pre-defined batch assignments. |
formula |
Formula for supervised learning (e.g., y ~ .). |
caret_params |
List of parameters for caret::train() containing:
|
parallelize_batch |
Logical indicating whether to parallelize batch processing (default = FALSE). |
loss_name |
Name of loss metric (supported: "se", "logloss", "accuracy"). |
custom_fit |
Optional custom model training function. |
custom_predict |
Optional custom prediction function. |
custom_loss |
Optional custom loss function. |
alpha |
Significance level for confidence intervals (default = 0.05, i.e. 95% confidence). |
classify |
Indicate if this is a classification problem. Defaults to FALSE. |
A list containing:
raw_results: Data frame with performance metrics
interactive_table: The same performance metrics in a user-friendly interface
final_ml_model: Trained model object
train for model training parameters
# Load necessary libraries
library(caret)

# Set seed for reproducibility
set.seed(42)

# Generate example dataset
X_data <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
Y_data <- rnorm(100)  # Continuous target variable for regression
data_df <- data.frame(X_data, Y = Y_data)  # Ensure target variable is included

# Define caret parameters for simple linear regression (no cross-validation)
caret_params_lm <- list(
  method = "lm",
  trControl = trainControl(method = "none")
)

nb_batch <- 5

# Run ML learning function
result <- cram_ml(
  data = data_df,
  formula = Y ~ .,  # Linear regression model
  batch = nb_batch,
  loss_name = 'se',
  caret_params = caret_params_lm
)

result$raw_results
result$interactive_table
result$final_ml_model
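The three supported loss_name values ("se", "logloss", "accuracy") correspond to per-observation losses. The base R sketch below shows textbook versions of these quantities; the function names are hypothetical and the package's internal definitions (for example, clipping in the log loss or the sign convention for accuracy) may differ.

```r
# Hypothetical helpers (not part of cramR): per-observation losses.

# Squared error for regression
loss_se <- function(y, pred) (y - pred)^2

# Log loss for binary labels y in {0, 1} and predicted probabilities of class 1
loss_logloss <- function(y, prob, eps = 1e-15) {
  prob <- pmin(pmax(prob, eps), 1 - eps)  # clip to avoid log(0)
  -(y * log(prob) + (1 - y) * log(1 - prob))
}

# Accuracy indicator: 1 if the predicted class matches the label, 0 otherwise
loss_accuracy <- function(y, pred_class) as.numeric(y == pred_class)

se_vals  <- loss_se(c(1, 2), c(0, 2))
ll_val   <- loss_logloss(1, 0.5)
acc_vals <- loss_accuracy(c("a", "b"), c("a", "a"))
```

A custom_loss function would play the same role for metrics outside this list; its required signature is not described in this section.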
This function performs the cram method (simultaneous policy learning and evaluation) for binary policies on data including covariates (X), binary treatment indicator (D) and outcomes (Y).
cram_policy(X, D, Y, batch, model_type = "causal_forest", learner_type = "ridge", baseline_policy = NULL, parallelize_batch = FALSE, model_params = NULL, custom_fit = NULL, custom_predict = NULL, alpha = 0.05, propensity = NULL)
X |
A matrix or data frame of covariates for each sample. |
D |
A vector of binary treatment indicators (1 for treated, 0 for non-treated). |
Y |
A vector of outcome values for each sample. |
batch |
Either an integer specifying the number of batches (which will be created by random sampling) or a vector of length equal to the sample size providing the batch assignment (index) for each individual in the sample. |
model_type |
The model type for policy learning. Options include |
learner_type |
The learner type for the chosen model. Options include |
baseline_policy |
A list providing the baseline policy (binary 0 or 1) for each sample. If |
parallelize_batch |
Logical. Whether to parallelize batch processing. The cram method learns T policies, where T is the number of batches; they are learned in parallel when parallelize_batch is TRUE, and sequentially using the efficient data.table structure when parallelize_batch is FALSE (recommended for lightweight training). Defaults to |
model_params |
A list of additional parameters to pass to the model, which can be any parameter defined in the model reference package. Defaults to |
custom_fit |
A custom, user-defined function that outputs a fitted model given training data (allows flexibility). Defaults to |
custom_predict |
A custom, user-defined function for making predictions given a fitted model and test data (allows flexibility). Defaults to |
alpha |
Significance level for confidence intervals. Default is 0.05 (95% confidence). |
propensity |
The propensity score function for binary treatment indicator (D) (probability for each unit to receive treatment). Defaults to 0.5 (random assignment). |
A list containing:
raw_results: A data frame summarizing key metrics with truncated decimals:
Delta Estimate: The estimated treatment effect (delta).
Delta Standard Error: The standard error of the delta estimate.
Delta CI Lower: The lower bound of the confidence interval for delta.
Delta CI Upper: The upper bound of the confidence interval for delta.
Policy Value Estimate: The estimated policy value.
Policy Value Standard Error: The standard error of the policy value estimate.
Policy Value CI Lower: The lower bound of the confidence interval for policy value.
Policy Value CI Upper: The upper bound of the confidence interval for policy value.
Proportion Treated: The proportion of individuals treated under the final policy.
interactive_table: An interactive table summarizing key metrics for detailed exploration.
final_policy_model: The final fitted policy model based on model_type and learner_type or custom_fit.
causal_forest, cv.glmnet, keras_model_sequential
# Example data
X_data <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
D_data <- as.integer(sample(c(0, 1), 100, replace = TRUE))
Y_data <- rnorm(100)
nb_batch <- 5

# Perform CRAM policy
result <- cram_policy(X = X_data,
                      D = D_data,
                      Y = Y_data,
                      batch = nb_batch)

# Access results
result$raw_results
result$interactive_table
result$final_policy_model
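The CI Lower and CI Upper entries relate to the estimate, its standard error, and alpha. A minimal sketch of a two-sided interval is given below, assuming a normal approximation; `normal_ci()` is a hypothetical helper and the numbers are illustrative, not output of cram_policy().

```r
# Hypothetical helper (not part of cramR): two-sided normal-approximation interval.
normal_ci <- function(estimate, std_error, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)  # about 1.96 for alpha = 0.05
  c(lower = estimate - z * std_error,
    upper = estimate + z * std_error)
}

ci <- normal_ci(estimate = 0.8, std_error = 0.1, alpha = 0.05)
```

A smaller alpha gives a wider interval: alpha = 0.01 corresponds to 99% confidence.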
This function returns the cram estimator for the policy value (psi).
cram_policy_value_estimator(X, Y, D, pi, batch_indices, propensity = NULL)
X |
A matrix or data frame of covariates for each sample. |
Y |
A vector of outcomes for the n individuals. |
D |
A vector of binary treatments for the n individuals. |
pi |
A matrix of n rows and (nb_batch + 1) columns, where n is the sample size and nb_batch is the number of batches, containing the policy assignment for each individual for each policy. The first column represents the baseline policy. |
batch_indices |
A list where each element is a vector of indices corresponding to the individuals in each batch. |
propensity |
Propensity score function |
The estimated policy value.
This function performs the cram method (simultaneous learning and evaluation) on simulation data, for which the data generation process (DGP) is known. The data generation process for X can be given directly as a function or induced by a provided dataset via row-wise bootstrapping. Results are averaged across Monte Carlo replicates for the given DGP.
cram_simulation(X = NULL, dgp_X = NULL, dgp_D, dgp_Y, batch, nb_simulations, nb_simulations_truth = NULL, sample_size, model_type = "causal_forest", learner_type = "ridge", alpha = 0.05, baseline_policy = NULL, parallelize_batch = FALSE, model_params = NULL, custom_fit = NULL, custom_predict = NULL, propensity = NULL)
X |
Optional. A matrix or data frame of covariates for each sample inducing empirically the DGP for covariates. |
dgp_X |
Optional. A function to generate covariate data for simulations. |
dgp_D |
A vectorized function to generate binary treatment assignments for each sample. |
dgp_Y |
A vectorized function to generate the outcome variable for each sample given the treatment and covariates. |
batch |
Either an integer specifying the number of batches (which will be created by random sampling) or a vector of length equal to the sample size providing the batch assignment (index) for each individual in the sample. |
nb_simulations |
The number of simulations (Monte Carlo replicates) to run. |
nb_simulations_truth |
Optional. The number of additional simulations (Monte Carlo replicates) beyond nb_simulations to use when calculating the true policy value difference (delta) and the true policy value (psi). |
sample_size |
The number of samples in each simulation. |
model_type |
The model type for policy learning. Options include |
learner_type |
The learner type for the chosen model. Options include |
alpha |
Significance level for confidence intervals. Default is 0.05 (95% confidence). |
baseline_policy |
A list providing the baseline policy (binary 0 or 1) for each sample.
If |
parallelize_batch |
Logical. Whether to parallelize batch processing.
The cram method learns T policies, where T is the number of batches.
When parallelize_batch is TRUE, the policies are learned in parallel;
when it is FALSE, they are learned sequentially using
the efficient data.table structure, which is recommended
for lightweight training. Defaults to |
model_params |
A list of additional parameters to pass to the model,
which can be any parameter defined in the model reference package.
Defaults to |
custom_fit |
A custom, user-defined function that returns a fitted model given training data
(allows flexibility). Defaults to |
custom_predict |
A custom, user-defined function that returns predictions given a fitted model
and test data (allows flexibility). Defaults to |
propensity |
The propensity score model |
A list containing:
avg_proportion_treatedThe average proportion of treated individuals across simulations.
avg_delta_estimateThe average delta estimate across simulations.
avg_delta_standard_errorThe average standard error of delta estimates.
delta_empirical_biasThe empirical bias of delta estimates.
delta_empirical_coverageThe empirical coverage of delta confidence intervals.
avg_policy_value_estimateThe average policy value estimate across simulations.
avg_policy_value_standard_errorThe average standard error of policy value estimates.
policy_value_empirical_biasThe empirical bias of policy value estimates.
policy_value_empirical_coverageThe empirical coverage of policy value confidence intervals.
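The bias and coverage summaries above can be illustrated with a minimal base R sketch. It does not call the package: the per-replicate estimates and standard errors are simulated here, and the names true_value, estimates and std_errors are hypothetical. It assumes normal-approximation confidence intervals at level 1 - alpha, which may differ in detail from the package's internal computation.

```r
# Hypothetical Monte Carlo output: one estimate and one standard error per replicate.
set.seed(42)
true_value <- 0.3                      # assumed known from the DGP
nb_simulations <- 200
std_errors <- rep(0.1, nb_simulations)
estimates <- rnorm(nb_simulations, mean = true_value, sd = std_errors)

# Normal-approximation confidence intervals at level 1 - alpha.
alpha <- 0.05
z <- qnorm(1 - alpha / 2)
lower <- estimates - z * std_errors
upper <- estimates + z * std_errors

# Summaries analogous to the returned list elements.
empirical_bias <- mean(estimates) - true_value
empirical_coverage <- mean(lower <= true_value & true_value <= upper)
avg_standard_error <- mean(std_errors)
```

With well-calibrated standard errors, the empirical coverage should be close to 1 - alpha and the empirical bias close to zero.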
causal_forest, cv.glmnet, keras_model_sequential
set.seed(123)

# dgp_X <- function(n) {
#   data.table::data.table(
#     binary = rbinom(n, 1, 0.5),
#     discrete = sample(1:5, n, replace = TRUE),
#     continuous = rnorm(n)
#   )
# }

n <- 100

X_data <- data.table::data.table(
  binary = rbinom(n, 1, 0.5),
  discrete = sample(1:5, n, replace = TRUE),
  continuous = rnorm(n)
)

dgp_D <- function(X) rbinom(nrow(X), 1, 0.5)

dgp_Y <- function(D, X) {
  theta <- ifelse(
    X[, binary] == 1 & X[, discrete] <= 2, # Group 1: High benefit
    1,
    ifelse(X[, binary] == 0 & X[, discrete] >= 4, # Group 3: Negative benefit
           -1,
           0.1) # Group 2: Neutral effect
  )
  Y <- D * (theta + rnorm(length(D), mean = 0, sd = 1)) +
    (1 - D) * rnorm(length(D)) # Outcome for untreated
  return(Y)
}

# Parameters
nb_simulations <- 100
nb_simulations_truth <- 200
batch <- 5

# Perform CRAM simulation
result <- cram_simulation(
  X = X_data,
  dgp_D = dgp_D,
  dgp_Y = dgp_Y,
  batch = batch,
  nb_simulations = nb_simulations,
  nb_simulations_truth = nb_simulations_truth,
  sample_size = 500
)

result$raw_results
result$interactive_table
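When X is supplied instead of dgp_X, the covariate DGP is induced by row-wise bootstrapping. The following base R sketch shows one way such a generator could be built from a dataset. The helper bootstrap_dgp_X is hypothetical and not part of the package; it only illustrates the idea of resampling rows with replacement.

```r
# Hypothetical helper: turn a dataset into a covariate generator
# by resampling its rows with replacement.
bootstrap_dgp_X <- function(X) {
  function(n) {
    idx <- sample.int(nrow(X), size = n, replace = TRUE)
    X_new <- X[idx, , drop = FALSE]
    rownames(X_new) <- NULL
    X_new
  }
}

set.seed(123)
X_data <- data.frame(
  binary = rbinom(100, 1, 0.5),
  discrete = sample(1:5, 100, replace = TRUE),
  continuous = rnorm(100)
)

dgp_X <- bootstrap_dgp_X(X_data)
X_sim <- dgp_X(500)   # 500 rows drawn from the 100 observed rows
```

A generator built this way can produce samples larger than the original dataset, which is consistent with the example using n = 100 observed rows and sample_size = 500.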
This function computes the variance estimator based on the given loss matrix and batch indices.
cram_var_expected_loss(loss, batch_indices)
loss |
A matrix of loss values with N rows (data points) and K+1 columns, where K is the number of batches (nb_batch). The first column is assumed to contain only zeros. The remaining K columns contain the losses of each trained model for each individual. |
batch_indices |
A list where each element is a vector of indices corresponding to a batch. |
The variance estimate of the crammed expected loss estimate
This function estimates the asymptotic variance of the cram estimator for the policy value difference (delta).
cram_variance_estimator(X, Y, D, pi, batch_indices, propensity = NULL)
X |
A matrix or data frame of covariates for each sample. |
Y |
A vector of outcomes for the n individuals. |
D |
A vector of binary treatments for the n individuals. |
pi |
A matrix of n rows and (nb_batch + 1) columns, where n is the sample size and nb_batch is the number of batches, containing the policy assignment for each individual for each policy. The first column represents the baseline policy. |
batch_indices |
A list where each element is a vector of indices corresponding to the individuals in each batch. |
propensity |
The propensity score function |
The estimated variance of the policy value difference (Delta)
This function estimates the asymptotic variance of the cram estimator for the policy value (psi).
cram_variance_estimator_policy_value( X, Y, D, pi, batch_indices, propensity = NULL )
X |
A matrix or data frame of covariates for each sample. |
Y |
A vector of outcomes for the n individuals. |
D |
A vector of binary treatments for the n individuals. |
pi |
A matrix of n rows and (nb_batch + 1) columns, where n is the sample size and nb_batch is the number of batches, containing the policy assignment for each individual for each policy. The first column represents the baseline policy. |
batch_indices |
A list where each element is a vector of indices corresponding to the individuals in each batch. |
propensity |
Propensity score function |
The variance estimate of the crammed Policy Value estimate (Psi)
This function trains a given unfitted model with the provided data and parameters, according to model type and learner type.
fit_model(model, X, Y, D, model_type, learner_type, model_params, propensity)
model |
An unfitted model object, as returned by 'set_model'. |
X |
A matrix or data frame of covariates for the samples. |
Y |
A vector of outcome values. |
D |
A vector of binary treatment indicators (1 for treated, 0 for untreated). |
model_type |
The model type for policy learning. Options include |
learner_type |
The learner type for the chosen model. Options include |
model_params |
A list of additional parameters to pass to the model, which can be any parameter defined in the model reference package. Defaults to |
propensity |
The propensity score |
The fitted model object.
This function trains a model on the provided data, according to the given formula and caret parameters.
fit_model_ml(data, formula, caret_params, classify)
data |
The dataset |
formula |
The formula |
caret_params |
The parameters for caret model |
classify |
Logical. Indicates whether this is a classification problem. Defaults to FALSE. |
The fitted model object.
Creates a list of matrices representing the arm-specific reward-generating parameters (betas) used in contextual linear bandit simulations. Each matrix corresponds to one simulation and contains normalized random coefficients.
get_betas(simulations, d, k)
simulations |
Integer. Number of simulations. |
d |
Integer. Number of features (context dimensions). |
k |
Integer. Number of arms. |
A list of length simulations + 1 (first element being discarded in the underlying
simulation package), each containing a d x k matrix of normalized reward parameters.
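As a rough illustration of the returned structure, the sketch below builds a list of simulations + 1 matrices of size d x k in base R. The helper sketch_betas is hypothetical, and the specific choices (uniform draws on [-1, 1], scaling by the spectral norm) are assumptions made for this example; the package's actual sampling and normalization may differ.

```r
# Hypothetical stand-in for get_betas(): the sampling distribution and
# the normalization below are assumptions, not the package's confirmed logic.
sketch_betas <- function(simulations, d, k) {
  lapply(seq_len(simulations + 1), function(i) {
    betas <- matrix(runif(d * k, min = -1, max = 1), nrow = d, ncol = k)
    betas / norm(betas, type = "2")   # scale so the spectral norm equals 1
  })
}

set.seed(1)
betas_list <- sketch_betas(simulations = 3, d = 4, k = 2)
length(betas_list)      # simulations + 1 elements
dim(betas_list[[1]])    # d rows (features) by k columns (arms)
```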
LinUCB Disjoint Policy with Epsilon-Greedy Exploration
Implements the disjoint LinUCB algorithm with upper confidence bounds and epsilon-greedy exploration.
- 'initialize(alpha = 1.0, epsilon = 0.1)': Create a new LinUCBDisjointPolicyEpsilon object.
- 'set_parameters(context_params)': Initialize arm-level parameters.
- 'get_action(t, context)': Selects an arm using epsilon-greedy UCB.
- 'set_reward(t, context, action, reward)': Updates internal statistics based on observed reward.
alphaNumeric, exploration parameter controlling the width of the confidence bound.
epsilonNumeric, probability of selecting a random action (exploration).
class_nameInternal class name.
new()
Initializes the policy with UCB parameter alpha and exploration rate epsilon.
LinUCBDisjointPolicyEpsilon$new(alpha = 1, epsilon = 0.1)
alphaNumeric. Controls width of the UCB bonus.
epsilonNumeric between 0 and 1. Probability of random action selection.
set_parameters()
Set arm-specific parameter structures.
LinUCBDisjointPolicyEpsilon$set_parameters(context_params)
context_paramsA list with context information, typically including the number of unique features.
get_action()
Selects an arm using epsilon-greedy Upper Confidence Bound (UCB).
LinUCBDisjointPolicyEpsilon$get_action(t, context)
tInteger time step.
contextA list with contextual features and number of arms.
A list containing the selected action.
set_reward()
Updates internal statistics using the observed reward for the selected arm.
LinUCBDisjointPolicyEpsilon$set_reward(t, context, action, reward)
tInteger time step.
contextContextual features for all arms at time t.
actionA list containing the chosen arm.
rewardA list containing the observed reward for the selected arm.
Updated internal parameters.
clone()
The objects of this class are cloneable with this method.
LinUCBDisjointPolicyEpsilon$clone(deep = FALSE)
deepWhether to make a deep clone.
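The arm selection and update steps described above can be sketched in base R using the standard disjoint LinUCB formulas combined with an epsilon-greedy draw. This is an illustration only, not the class's actual code: the helpers init_arms, choose_arm and update_arm are hypothetical, a single context vector shared across arms is assumed, and ties are broken by taking the first maximum.

```r
# Each arm keeps a d x d matrix A (initialized to the identity) and a vector b.
init_arms <- function(k, d) {
  lapply(seq_len(k), function(a) list(A = diag(d), b = rep(0, d)))
}

# With probability epsilon pick a random arm; otherwise pick the arm
# with the largest upper confidence bound.
choose_arm <- function(arms, x, alpha = 1, epsilon = 0.1) {
  k <- length(arms)
  if (runif(1) < epsilon) {
    return(sample.int(k, 1))
  }
  ucb <- vapply(arms, function(arm) {
    A_inv <- solve(arm$A)
    theta <- A_inv %*% arm$b                        # ridge-type estimate
    expected <- sum(theta * x)                      # predicted reward
    bonus <- alpha * sqrt(sum(x * (A_inv %*% x)))   # confidence width
    expected + bonus
  }, numeric(1))
  which.max(ucb)
}

# Update the statistics of the chosen arm with the observed reward.
update_arm <- function(arms, action, x, reward) {
  arms[[action]]$A <- arms[[action]]$A + outer(x, x)
  arms[[action]]$b <- arms[[action]]$b + reward * x
  arms
}

# Small simulated run with two arms and three features.
set.seed(7)
d <- 3
k <- 2
true_theta <- list(c(1, 0, 0), c(0, 1, 0))   # hypothetical reward parameters
arms <- init_arms(k, d)
for (t in 1:200) {
  x <- rnorm(d)
  a <- choose_arm(arms, x, alpha = 1, epsilon = 0.1)
  r <- sum(true_theta[[a]] * x) + rnorm(1, sd = 0.1)
  arms <- update_arm(arms, a, x, r)
}
```

Larger values of alpha widen the confidence bonus and favor arms with little data, while epsilon adds uniform random exploration on top of the UCB rule.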
This function performs batch-wise learning for machine learning models.
ml_learning( data, formula = NULL, batch, parallelize_batch = FALSE, loss_name = NULL, caret_params = NULL, custom_fit = NULL, custom_predict = NULL, custom_loss = NULL, n_cores = detectCores() - 1, classify = FALSE )
data |
A matrix or data frame of features. Must include the target variable. |
formula |
Formula specifying the relationship between the target and predictors for supervised learning. |
batch |
Either an integer specifying the number of batches (randomly sampled) or a vector of length equal to the sample size indicating batch assignment for each observation. |
parallelize_batch |
Logical. Whether to parallelize batch processing. Defaults to 'FALSE'. |
loss_name |
The name of the loss function to be used (e.g., '"se"', '"logloss"'). |
caret_params |
A list of parameters to pass to the 'caret::train()' function. Required: 'method' (e.g., '"glm"', '"rf"'). |
custom_fit |
A custom function for training user-defined models. Defaults to 'NULL'. |
custom_predict |
A custom function for making predictions from user-defined models. Defaults to 'NULL'. |
custom_loss |
Optional custom function for computing the loss of a trained model on the data. Should return a vector containing per-instance losses. |
n_cores |
Number of CPU cores to use for parallel processing ('parallelize_batch = TRUE'). Defaults to 'detectCores() - 1'. |
classify |
Logical. Indicates whether this is a classification problem. Defaults to FALSE. |
A list containing:
final_ml_model |
The final trained ML model. |
losses |
A matrix of losses where each column represents a batch's trained model. The first column contains zeros (baseline model). |
batch_indices |
The indices of observations in each batch. |
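The shape of the returned objects can be illustrated with a base R sketch of batch-wise learning. It assumes that the model for batch t is trained on the cumulative data from batches 1 to t and that the loss is the per-observation squared error; it uses lm() in place of caret and does not reproduce the package's internals.

```r
set.seed(10)
n <- 120
data <- data.frame(x = rnorm(n))
data$y <- 2 * data$x + rnorm(n)

# Random assignment of observations to batches.
nb_batch <- 4
assignment <- sample(rep(seq_len(nb_batch), length.out = n))
batch_indices <- unname(split(seq_len(n), assignment))

# Loss matrix: first column stays at zero (baseline),
# column t + 1 holds the losses of the model trained on batches 1..t.
losses <- matrix(0, nrow = n, ncol = nb_batch + 1)
final_ml_model <- NULL
for (t in seq_len(nb_batch)) {
  train_idx <- unlist(batch_indices[seq_len(t)])
  final_ml_model <- lm(y ~ x, data = data[train_idx, ])
  preds <- predict(final_ml_model, newdata = data)
  losses[, t + 1] <- (data$y - preds)^2   # squared error per observation
}

result <- list(
  final_ml_model = final_ml_model,
  losses = losses,
  batch_indices = batch_indices
)
```

After the loop, final_ml_model is the model trained on all batches, which corresponds to the model that would be deployed.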
This function performs inference using a trained model, providing flexibility for different types of models such as Causal Forest, Ridge Regression, and Feedforward Neural Networks (FNNs).
model_predict(model, X, D, model_type, learner_type, model_params)
model |
A trained model object returned by the 'fit_model' function. |
X |
A matrix or data frame of covariates for which predictions are required. |
D |
A vector of binary treatment indicators (1 for treated, 0 for untreated). Optional, depending on the model type. |
model_type |
The model type for policy learning. Options include |
learner_type |
The learner type for the chosen model. Options include |
model_params |
A list of additional parameters to pass to the model, which can be any parameter defined in the model reference package. Defaults to |
A vector of binary policy assignments, depending on the model_type and learner_type.
This function performs inference using a trained model.
model_predict_ml( model, data, formula, caret_params, cram_policy_handle = FALSE )
model |
A trained model object returned by the 'fit_model_ml' function. |
data |
The dataset |
formula |
The formula |
caret_params |
The parameters of the caret model |
cram_policy_handle |
Internal use. Post-process predictions differently for cram policy use. Defaults to FALSE. |
Predictions of the model on the data
This function maps the model type and learner type to the corresponding model function.
set_model(model_type, learner_type, model_params)
model_type |
The model type for policy learning. Options include |
learner_type |
The learner type for the chosen model. Options include |
model_params |
A list of additional parameters to pass to the model, which can be any parameter defined in the model reference package. Defaults to
For other learners (e.g., |
The instantiated model object or the corresponding model function.
This function validates a provided baseline policy or sets a default baseline policy of zeros for all individuals.
test_baseline_policy(baseline_policy, n)
baseline_policy |
A list representing the baseline policy for each individual. If |
n |
An integer specifying the number of individuals in the population. |
A validated or default baseline policy as a list of numeric values.
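A minimal base R sketch of this kind of check is shown below. The helper check_baseline_policy is hypothetical, and the exact validation rules and error messages of the package function may differ; the sketch only assumes that a missing policy defaults to all zeros and that a supplied policy must be a list of n binary values.

```r
# Hypothetical validation helper, not the package function.
check_baseline_policy <- function(baseline_policy, n) {
  if (is.null(baseline_policy)) {
    return(as.list(rep(0, n)))   # default: nobody is treated
  }
  if (!is.list(baseline_policy)) {
    stop("baseline_policy must be a list.")
  }
  if (length(baseline_policy) != n) {
    stop("baseline_policy must have one entry per individual.")
  }
  values <- unlist(baseline_policy)
  if (!is.numeric(values) || !all(values %in% c(0, 1))) {
    stop("baseline_policy must contain only 0 or 1.")
  }
  baseline_policy
}

default_policy <- check_baseline_policy(NULL, n = 3)
custom_policy <- check_baseline_policy(list(1, 0, 1), n = 3)
```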
This function validates a provided batch assignment or generates random batch assignments for individuals.
test_batch(batch, n)
batch |
Either an integer specifying the number of batches or a vector/list of batch assignments for all individuals. |
n |
An integer specifying the number of individuals in the population. |
A list containing:
batchesA list where each element contains the indices of individuals assigned to a specific batch.
nb_batchThe total number of batches.
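The two accepted forms of batch can be sketched in base R as follows. The helper make_batches is hypothetical and only mirrors the documented behavior: a single integer yields a random assignment into that many batches, while a vector or list of length n is treated as a given assignment. The package's own checks may be stricter.

```r
# Hypothetical helper mirroring the documented return value.
make_batches <- function(batch, n) {
  if (length(batch) == 1L) {
    # Integer: number of batches, assigned at random with near-equal sizes.
    nb_batch <- as.integer(batch)
    assignment <- sample(rep(seq_len(nb_batch), length.out = n))
  } else {
    # Vector or list: one batch index per individual.
    assignment <- unlist(batch)
    if (length(assignment) != n) {
      stop("batch must provide one assignment per individual.")
    }
  }
  batches <- unname(split(seq_len(n), assignment))
  list(batches = batches, nb_batch = length(batches))
}

set.seed(3)
random_split <- make_batches(5, n = 100)
given_split <- make_batches(c(1, 1, 2, 2, 3), n = 5)
```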
This function validates user-provided parameters against the formal arguments of a specified model function. It ensures that all user-specified parameters are recognized by the model and raises an error for invalid parameters.
validate_params(model_function, model_type, learner_type, user_params)
model_function |
The model function for which parameters are being validated (e.g., |
model_type |
The model type for policy learning. Options include |
learner_type |
The learner type for the chosen model. Options include |
user_params |
A named list of parameters provided by the user. |
A named list of validated parameters that are safe to pass to the model function.
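Checking user parameters against the formal arguments of a function can be sketched with base R's formals(). The names toy_model and check_params are hypothetical and introduced only for this example; the sketch ignores model_type and learner_type and does not handle functions that accept arbitrary arguments through '...'.

```r
# Hypothetical model function standing in for a real learner.
toy_model <- function(x, y, lambda = 1, intercept = TRUE) NULL

# Hypothetical check: every user-supplied name must be a formal argument.
check_params <- function(model_function, user_params) {
  valid_names <- names(formals(model_function))
  unknown <- setdiff(names(user_params), valid_names)
  if (length(unknown) > 0) {
    stop("Invalid parameter(s): ", paste(unknown, collapse = ", "))
  }
  user_params
}

ok <- check_params(toy_model, list(lambda = 0.5))

# An unrecognized name raises an error, captured here as a message.
msg <- tryCatch(
  check_params(toy_model, list(gamma = 1)),
  error = function(e) conditionMessage(e)
)
```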
This function validates user-provided parameters for a Feedforward Neural Network (FNN) model.
It ensures the correct structure for input_layer, layers, output_layer,
compile_args and fit_params.
validate_params_fnn(model_type, learner_type, model_params, X)
model_type |
The model type for policy learning. Options include |
learner_type |
The learner type for the chosen model. Options include |
model_params |
A named list of parameters provided by the user for configuring the FNN model. |
X |
A matrix or data frame of covariates for which the parameters are validated. |
A named list of validated parameters merged with defaults for any missing values.