---
title: "The Power of Randomization"
share:
permalink: "https://book.martinez.fyi/rct_basic.html"
description: "Business Data Science: What Does it Mean to Be Data-Driven?"
linkedin: true
email: true
mastodon: true
---
<img src="img/randomization.png" align="right" height="280" alt="Randomization" />
Randomization isn't magic; it's statistics. When a sufficiently large
population is randomly divided into two groups, the groups will be remarkably
similar in both observable and unobservable characteristics. Assign one group
to a treatment and keep the other as a control, and any difference in the
outcome of interest can be confidently attributed to the treatment. As we
discussed in Chapter 1, causal inference can be conceptualized as a missing
data problem. In a randomized experiment, that problem is simplified: the data
are missing at random by design, which allows us to make unbiased estimates of
causal effects.
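
To see this balancing property in action, here is a minimal simulation in base
R (the numbers are invented for illustration): we randomly split 10,000 units
into two groups and compare group means on one covariate we observe and one we
pretend not to observe.

```{r balance-simulation}
set.seed(42)
n <- 10000

# One characteristic we measure, and one we pretend is unobservable
observed   <- rnorm(n, mean = 50, sd = 10)
unobserved <- rbinom(n, size = 1, prob = 0.3)

# Random assignment: each unit lands in treatment or control with equal odds
treated <- sample(c(TRUE, FALSE), n, replace = TRUE)

# Differences in group means are close to zero for both characteristics
mean(observed[treated]) - mean(observed[!treated])
mean(unobserved[treated]) - mean(unobserved[!treated])
```

Both differences hover near zero: with a large sample, random assignment
balances the characteristics we never measured just as well as the ones we did.
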
## The Importance of SUTVA: The Cornerstone of Valid Inference
However, the success of both RCTs and A/B tests hinges on a fundamental
assumption: the Stable Unit Treatment Value Assumption (SUTVA). SUTVA has two
main components:

- **No Interference (or No Spillover):** The treatment applied to one unit
should not affect the outcome of another unit. This means that the outcome
for any unit is unaffected by the treatments received by other units.
- **Treatment Variation Irrelevance (or Consistency):** The potential outcome
of a unit under a specific treatment should be the same regardless of how
that treatment is assigned. This implies that if a unit receives a
particular treatment, the outcome should only depend on that treatment, not
on how or why it was assigned.

In simpler terms, SUTVA ensures that the effect of the treatment is solely due
to the treatment itself and not influenced by other factors or interactions
between units.

### SUTVA Violations: When the Ideal Meets Reality

While SUTVA is often assumed, it can be easily violated:

- **Network Effects:** Consider an A/B test of a new social media feature. If
  users in the treatment group interact with users in the control group, the
  feature's impact might spread beyond the intended group, violating SUTVA (a
  stylized simulation of this kind of spillover follows this list).
- **Market Competition:** Testing a new pricing strategy might trigger
competitor reactions, indirectly affecting the outcome even for users not
exposed to the new price.
- **Spillover Effects:** In advertising, a targeted campaign for one product
might unintentionally increase awareness or sales of related products.
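
To make the first of these violations concrete, here is a stylized simulation
(the effect sizes are invented for illustration): the treatment raises each
treated unit's outcome by 2, but interference also raises control units'
outcomes by 1, so the naive difference in means understates the true effect.

```{r sutva-spillover}
set.seed(42)
n <- 10000
treated <- sample(c(TRUE, FALSE), n, replace = TRUE)

true_effect <- 2  # effect of the treatment on treated units
spillover   <- 1  # assumed leakage from treated units onto controls

outcome <- rnorm(n) + true_effect * treated + spillover * (!treated)

# The naive estimate converges to true_effect - spillover = 1, not 2
mean(outcome[treated]) - mean(outcome[!treated])
```

Because the control group is contaminated, the experiment recovers an effect of
roughly 1 instead of the true 2.
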
### Mitigating SUTVA Violations

Sometimes the solution to a SUTVA violation can be as simple as **changing the
unit of randomization.** For instance, running geo-experiments in
geographically isolated markets can minimize interaction between groups; a
sketch of this idea follows below. In other cases, solutions require more
intricate study designs. When a violation cannot be eliminated entirely, it is
important to **acknowledge and mitigate** its potential impact on your
conclusions.
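
In practice, changing the unit of randomization often means assigning whole
clusters, such as geos, to a single arm so that within-cluster interactions
stay inside one group. A minimal sketch in base R (the geo names and user
counts are invented for illustration):

```{r geo-randomization}
set.seed(42)
geos <- c("NYC", "LA", "Chicago", "Houston", "Phoenix", "Philly")

# Randomize at the geo level: half the geos go to treatment
treated_geos <- sample(geos, size = length(geos) / 2)

# Every user inherits the assignment of their geo
users <- data.frame(
  user_id = 1:600,
  geo     = sample(geos, 600, replace = TRUE)
)
users$treated <- users$geo %in% treated_geos

table(users$geo, users$treated)
```

Every row of the resulting table has all of its users in a single column: no
geo is split across arms, so interactions between neighbors cannot leak the
treatment across groups.
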
::: {.callout-note title="Key Takeaway:"}
Understanding and addressing SUTVA is essential for designing experiments and
drawing valid conclusions. By carefully considering the potential for
interference and inconsistency, researchers and practitioners can design more
robust experiments and make more informed decisions based on their findings.
:::

## An example using code
<img src="https://github.com/google/imt/blob/main/man/figures/logo.png?raw=true" align="right" height="138" alt="imt logo" />
The [{imt}](https://github.com/google/imt) package provides a convenient way to
randomize while ensuring baseline equivalence. Its `imt::randomizer` class
iteratively re-randomizes until it achieves a specified level of baseline
equivalence (see @sec-baseline) or reaches a maximum number of attempts.

```{r imt-randomizer}
# Simulate baseline data for 10,000 units with two covariates
my_data <- tibble::tibble(
  x1 = rnorm(10000),
  x2 = rnorm(10000)
)

# Randomize, re-drawing assignments until the groups are balanced on x1 and x2
randomized <- imt::randomizer$new(
  data = my_data,
  seed = 12345,
  max_attempts = 1000,
  variables = c("x1", "x2"),
  standard = "Not Concerned"
)

# Get the randomized data
randomized$data

# Summarize balance across the covariates
randomized$balance_summary

# Visualize balance
randomized$balance_plot
```

## Stratified Randomization

While random assignment is effective, unforeseen factors can sometimes lead to
imbalanced groups. Stratified randomization addresses this by dividing users
into subgroups (strata) based on relevant characteristics that are believed to
influence the outcome metric. Randomization is then performed within each
stratum, ensuring that both treatment and control groups have a similar
proportion of users from each subgroup.

This approach strengthens experiments by creating balanced groups. For
instance, if user location is expected to affect the outcome metric, users can
be stratified by location (e.g., urban vs. rural), followed by randomization
within each location. This ensures a similar distribution of user attributes
across treatment and control groups, controlling for confounding factors, that
is, user traits that affect both exposure to the new feature and the desired
outcome. With balanced groups, any observed differences in the outcome metric
are more likely due to the new feature itself, leading to more precise and
reliable results.
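
Before reaching for a package, it helps to see the mechanics laid bare. The
sketch below (base R, invented data) splits users by location and randomizes
within each stratum, so each location ends up divided roughly 50/50 between
treatment and control.

```{r stratified-by-hand}
set.seed(42)
users <- data.frame(
  user_id  = 1:1000,
  location = sample(c("urban", "rural"), 1000, replace = TRUE, prob = c(0.7, 0.3))
)

# Randomize within each stratum: shuffle a balanced TRUE/FALSE vector per location
users$treated <- NA
for (loc in unique(users$location)) {
  idx <- which(users$location == loc)
  users$treated[idx] <- sample(rep(c(TRUE, FALSE), length.out = length(idx)))
}

# Each location is split almost exactly in half
table(users$location, users$treated)
```
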
### Examples
- **Targeting Mobile App Engagement:** In an RCT to evaluate a new in-app
notification, user location (urban vs. rural) is suspected to influence user
response. Stratification by location, followed by randomization within each
stratum, can control for this factor.
- **Personalizing a Recommendation Engine:** When A/B testing a revamped
recommendation engine, past purchase history is hypothesized to influence
user response. Stratification by purchase history categories (e.g., frequent
buyers of clothing vs. electronics), followed by randomization within each
category, can account for this.

### Advantages
- **Reduced bias:** Stratification helps isolate the true effect of the new
feature by controlling for the influence of confounding factors. This leads
to more reliable conclusions about the feature's impact on user behavior.
- **Improved decision-making:** By pinpointing the feature's effect on
specific user groups (e.g., urban vs. rural in the notification example),
stratified randomization can inform decisions about targeted rollouts or
further iterations based on subgroup performance.

### Disadvantages
- **Increased complexity:** Designing and implementing stratified randomization
requires careful planning to choose the right stratification factors and ensure
enough users within each stratum for valid analysis.
- **Need for larger sample sizes:** Maintaining balance across strata might
necessitate a larger overall sample size compared to simple random assignment.

### Example with code
Let's illustrate stratified randomization with a practical example. Consider a
scenario where we have data on 10,000 individuals, each described by two
continuous variables (`x1` and `x2`) and two categorical variables (`x3` and
`x4`). We suspect that `x3` might be a confounding factor influencing the
outcome of our experiment.

```{r stratified}
# Simulate data with two continuous and two categorical covariates
my_data <- tibble::tibble(
  x1 = rnorm(10000),
  x2 = rnorm(10000),
  x3 = rep(c("A", "B"), 5000),
  x4 = rep(c("C", "D"), 5000)
)

# Create a randomizer object that stratifies by x3
randomized <- imt::randomizer$new(
  data = my_data,
  seed = 12345,
  max_attempts = 1000,
  variables = c("x1", "x2"),
  standard = "Not Concerned",
  group_by = "x3"
)

# Visualize balance
randomized$balance_plot
```

In this code, passing `group_by = "x3"` tells `imt::randomizer` to stratify by
`x3`, ensuring that the treatment and control groups have a balanced
distribution of individuals from both categories of `x3`.

By incorporating stratified randomization into our experimental design, we can
effectively control for the influence of `x3`, enhancing the internal validity
of our study and allowing for more precise estimates of causal effects.
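
As a quick sanity check, you can tabulate treatment assignment within each
level of `x3`; the counts should be close to balanced. A sketch, assuming the
returned data stores the assignment in a column named `treated` (the actual
column name may differ across {imt} versions, hence `eval=FALSE`):

```{r check-strata, eval=FALSE}
# Treatment/control counts within each level of x3 should be roughly equal
table(randomized$data$x3, randomized$data$treated)
```
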
In conclusion, stratified randomization offers a powerful way to enhance the
rigor and precision of experiments, particularly when dealing with potential
confounding factors. While it may introduce some additional complexity and
potentially require larger sample sizes, the benefits in terms of internal
validity and the ability to draw more nuanced conclusions often outweigh these
drawbacks. The thoughtful use of stratified randomization can be a valuable
asset in the causal inference toolkit.

::: {.callout-tip}
## Learn more
- @chernozhukov2024applied, *Applied Causal Inference Powered by ML and AI*.
:::