---
title: "Bayesian Linear Regression"
share:
permalink: "https://book.martinez.fyi/blm.html"
description: "Business Data Science: What Does it Mean to Be Data-Driven?"
linkedin: true
email: true
mastodon: true
---
While Ordinary Least Squares (OLS) is a popular frequentist method for linear
regression, the Bayesian approach is arguably better suited for informing
business decisions.
OLS aims to find the line that minimizes the sum of squared differences between
observed and predicted values. It treats model parameters as fixed but unknown
quantities, estimating them by minimizing residuals. Inference relies on
hypothesis testing and p-values to assess the significance of relationships.
However, this approach can lead to a rigid focus on statistical significance
rather than practical relevance.
A Bayesian Linear Model, while similar in structure to OLS, views parameters as
random variables with probability distributions reflecting uncertainty. Prior
distributions incorporate existing knowledge or assumptions, and Bayes' theorem
combines this prior information with observed data to estimate the posterior
distribution of parameters. Inference focuses on posterior probabilities to
quantify uncertainty and interpret the strength of evidence.
The Bayesian approach offers several advantages for business decision-making:
- **Incorporating Prior Knowledge:** Bayesian models allow you to explicitly
include prior knowledge or beliefs about the parameters, which can be
valuable in business contexts where historical data or expert opinions
exist.
- **Learning from New Data:** The Bayesian framework naturally shows how new
data updates and refines your understanding of the relationships between
variables.
- **Thinking in Bets:** Instead of relying on the binary and often arbitrary
concept of statistical significance, Bayesian analysis encourages thinking
in terms of probabilities and bets. This aligns well with business
decisions, where you often need to weigh potential risks and rewards.
- **Practical Significance:** While anything can be statistically significant
with a large enough sample size, Bayesian analysis focuses on the magnitude
and probability of effects that are practically meaningful for your business
goals. Even if a result isn't statistically significant, it could still be a
good bet if the posterior probability of a meaningful impact is sufficiently
high.
The Bayesian approach embraces the inherent uncertainty in data analysis,
providing a richer and more nuanced understanding of the relationships between
variables, ultimately leading to more informed and effective business decisions.
## An example with synthetic data:
Imagine that you are faced with a decision: should you discontinue a product?
You would like to keep the product if, and only if, its impact on your outcome
of interest is at least 0.01. To help you make this decision, you've conducted a
well-designed experiment. Let's illustrate this with some synthetic data:
```{r fake_data, message=FALSE}
library(dplyr)
set.seed(9782)
N <- 200
fake_data <- tibble::tibble(
  x = rnorm(n = N, mean = 0, sd = 1),
  t = sample(x = c(TRUE, FALSE), size = N, replace = TRUE, prob = c(0.5, 0.5)),
  e = rnorm(n = N, mean = 0, sd = 0.4)
) %>%
  mutate(y = 7.1 + 0.6 * x + 0.02 * t + e) # <1>
```
1. Note that the true impact is 0.02, suggesting that the correct decision
would be to not discontinue the product. However, what happens if you
analyze this data using a traditional frequentist approach?
### Frequentist approach:
```{r OLS}
library(ggplot2)
library(broom)
lm1 <- lm(data = fake_data, formula = y ~ x + t) %>%
  tidy(., conf.int = TRUE, conf.level = 0.95) %>%
  filter(term == "tTRUE")

plot <- ggplot(data = lm1, aes(y = estimate, x = term)) +
  geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) +
  geom_hline(yintercept = 0, linetype = "dotted", color = "blue") +
  scale_y_continuous(breaks = seq(-0.2, 0.2, by = 0.02)) +
  theme_bw(base_size = 18) +
  xlab("") +
  ylab("Impact") +
  theme(
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank()
  )

plot
```
In this case, the point estimate is `r round(lm1$estimate, 2)`, the p-value
`r round(lm1$p.value,2)` is greater than 0.05, and the 95% confidence interval
ranges from `r round(lm1$conf.low,2)` to `r round(lm1$conf.high,2)`. How would a
decision-maker typically use this information? Unfortunately, many might decide
to discontinue the product, misinterpreting the results [see
@chandler2020speaking].
::: {.callout-important title="The null ritual, @gigerenzer2004null:"}
1. Set up a statistical null hypothesis of "no mean difference" or "zero
correlation." Don't specify the predictions of your research hypothesis or
of any alternative substantive hypotheses.
2. Use 5% as a convention for rejecting the null. If significant, accept your
research hypothesis.
3. Always perform this procedure.
:::
This problem was so widespread that in 2016, the American Statistical
Association issued a statement cautioning against this practice [see
@wasserstein2016asa]. Confidence intervals are also frequently misinterpreted
[see @hoekstra2014robust].
::: {.callout-important title="Incorrect interpretations:"}
1. The probability that the true mean is greater than 0 is at least 95%.
2. The probability that the true mean equals 0 is smaller than 5%.
3. The “null hypothesis” that the true mean equals 0 is likely to be incorrect.
4. There is a 95% probability that the true mean lies between
`r round(lm1$conf.low,2)` and `r round(lm1$conf.high,2)`.
5. We can be 95% confident that the true mean lies between
`r round(lm1$conf.low,2)` and `r round(lm1$conf.high,2)`.
6. If we were to repeat the experiment over and over, then 95% of the time the
   true mean falls between `r round(lm1$conf.low,2)` and `r round(lm1$conf.high,2)`.
:::
::: {.callout-tip title="Correct interpretations:"}
A particular procedure, when used repeatedly across a series of hypothetical
data sets, yields intervals that contain the true parameter value in 95% of the
cases. The key point is that CIs do not provide a statement about the parameter
as it relates to the particular sample at hand.
:::
This example starkly illustrates the disconnect between what decision-makers
want to say and what a frequentist approach allows them to say. The good news?
Bayesian methods offer a way to answer business questions directly and in plain
language.
### Bayesian approach:
The Bayesian approach to linear regression fundamentally shifts how we interpret
and utilize data in decision-making. Rather than relying on point estimates and
p-values, it focuses on understanding the probability distributions of
parameters, providing a richer, more nuanced picture.
In a Bayesian Linear Model, parameters are viewed as random variables with their
own probability distributions. This perspective allows us to incorporate prior
knowledge into the model: prior distributions reflect existing knowledge or
beliefs about parameters before observing the current data, which can be based
on historical data, expert opinions, or theoretical considerations. The
likelihood represents the probability of the observed data given the parameters,
similar to the frequentist approach. Posterior distributions combine the prior
distribution and the likelihood using Bayes' theorem, reflecting updated beliefs
about the parameters after observing the data. The beauty of the Bayesian
approach lies in its flexibility and adaptability. As new data becomes
available, the posterior distribution from one analysis can serve as the prior
for the next, continually refining our understanding.
Business decisions often leverage historical data and expert judgment, and
Bayesian models explicitly incorporate this information, leading to more
informed and credible inferences. Bayesian analysis naturally adapts to new
information. As fresh data is collected, the model updates its estimates,
providing a dynamic and current understanding of the business environment.
Instead of fixating on binary outcomes (significant vs. non-significant),
Bayesian analysis assesses probabilities, aligning perfectly with the real-world
decision-making process, which is inherently probabilistic and involves weighing
risks and rewards. Bayesian models emphasize the magnitude and probability of
effects that matter in practice. This focus is crucial in business, where even
small but reliable improvements can have substantial impacts.
The {im} package fits a Bayesian linear model using [weakly informative
priors](https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations) for
the covariates and allows the user to set more informative priors for the impact
of the intervention. If $y$ is the outcome of interest, the model is specified
as follows:
$$
\begin{aligned}
y & \sim N(\mu, \sigma) \\
\mu &= \alpha + X\beta + \color{red}{\eta} t
\end{aligned}
$$
We standardize the data as follows:
$$
\begin{aligned}
y^\star & = \frac{y - \mu_y}{\sigma_y} \\
& \sim N(\mu^\star, \sigma^\star) \\
\mu^\star & = \alpha^\star + \frac{X - \mu_X}{\sigma_X} \beta^\star +
\eta^\star t \\
\alpha^\star & \sim N(0,1) \\
\beta^\star & \sim N(0,1) \\
\color{red}{\eta^\star} & \color{red}{\sim N(\mu_\eta, \sigma_\eta)} \\
\sigma^\star & \sim N^+(0,1) \\
\end{aligned}
$$
Therefore
$$
\begin{aligned}
\frac{y - \mu_y}{\sigma_y} & = \alpha^\star +
\frac{X - \mu_X}{\sigma_X} \beta^\star + \eta^\star t \\
y & = (\alpha^\star +
\frac{X - \mu_X}{\sigma_X} \beta^\star + \eta^\star t) \sigma_y + \mu_y \\
\color{red}{\eta} &= \eta^\star \sigma_y
\end{aligned}
$$
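Because standardizing $y$ simply rescales every coefficient by $\sigma_y$, you
can verify the identity $\eta = \eta^\star \sigma_y$ numerically. The sketch
below uses plain OLS fits on the synthetic data purely for illustration; the
algebra is the same for the Bayesian model, and this is not how {im} works
internally:

```{r eta_rescaling}
# Fit on standardized data, then rescale the treatment effect by sd(y);
# it should match the treatment coefficient from the raw-scale fit.
y_star <- (fake_data$y - mean(fake_data$y)) / sd(fake_data$y)
x_star <- (fake_data$x - mean(fake_data$x)) / sd(fake_data$x)
eta_star <- coef(lm(y_star ~ x_star + t, data = fake_data))["tTRUE"]
eta_star * sd(fake_data$y)                      # eta recovered from eta_star
coef(lm(y ~ x + t, data = fake_data))["tTRUE"]  # raw-scale eta, for comparison
```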
Notice that if you have better priors, you should use them. To fit this simple
model, you just need to run the following code:
```{r blm, message=FALSE, results = "hide"}
library(im)
fitted_blm <- blm$new(
  y = "y",
  x = c("x"),
  treatment = "t",
  data = fake_data,
  eta_mean = 0,
  eta_sd = 0.5
)
```
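The `eta_mean` and `eta_sd` arguments correspond to $\mu_\eta$ and
$\sigma_\eta$ in the prior for the standardized effect $\eta^\star$ shown
above. To build intuition for what a $N(0, 0.5)$ prior implies, you can plot
its density; a minimal base-R sketch, not part of {im}:

```{r prior_density}
# Density of the N(0, 0.5) prior on the standardized treatment effect
curve(dnorm(x, mean = 0, sd = 0.5), from = -2, to = 2,
      xlab = "eta_star", ylab = "Prior density")
```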
It is always a good idea to look at the traceplot. A traceplot is a diagnostic
tool used to visualize the "path" that a Markov Chain Monte Carlo (MCMC)
sampler takes as it explores the parameter space. It helps assess the
convergence and mixing of the chains, which is crucial for ensuring
reliable inference from the model.
```{r tracePlot}
fitted_blm$tracePlot()
```
1. **Assessing Convergence:** A well-converged chain should exhibit a "hairy
   caterpillar" pattern, where the trace fluctuates around a stable value
   without any trends or drifts. This indicates that the sampler has adequately
   explored the parameter space and reached a stationary distribution.
   Conversely, non-converging chains might show trends, jumps, or slow mixing,
   suggesting that the sampler is stuck in a local region or hasn't adequately
   explored the posterior distribution. Inferences drawn from such chains can
   be unreliable and misleading.
2. **Diagnosing Mixing:** Good mixing implies that the chains effectively
   explore the entire parameter space and don't get stuck in local regions.
   This is visually represented by well-intertwined lines from different chains
   on the traceplot. Poorly mixed chains show distinct separation among lines,
   indicating they haven't adequately explored the entire posterior
   distribution. This can lead to biased and inaccurate estimates of the
   parameters and their uncertainty.
3. **Identifying Issues:** Traceplots can reveal potential issues in the model
   specification, priors, or MCMC settings. For example, highly correlated
   parameters might exhibit synchronized movement in the traceplot, suggesting
   a dependence relationship that needs further investigation.
Overall, examining traceplots is a valuable diagnostic step in Bayesian
statistical analysis. They provide insight into the convergence and mixing of
MCMC chains, aiding in the valid and reliable interpretation of the model
results.
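Traceplots pair well with numeric diagnostics such as $\widehat{R}$ and the
effective sample size, which the {posterior} package computes. The sketch below
uses simulated draws, since extracting the draws from the fitted {im} object is
not covered here:

```{r numeric_diagnostics}
library(posterior)

# Simulated, well-mixed draws standing in for one parameter:
# 1000 iterations from each of 4 chains (illustrative only)
draws <- matrix(rnorm(4000), nrow = 1000, ncol = 4)
rhat(draws)      # values close to 1 indicate convergence
ess_bulk(draws)  # low values signal poor mixing
```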
It is prudent to verify that our model's data generating process is compatible
with the data used to fit the model. To do this, we can compare the kernel
density of draws from the posterior distribution to the density of our data.
```{r ppcDensOverlay}
fitted_blm$ppcDensOverlay(n = 50)
```
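Under the hood, this is a standard posterior predictive check: each posterior
draw generates a simulated data set whose density is overlaid on the density of
the observed outcome. For intuition, here is a minimal hand-rolled version with
{bayesplot}, drawing replications from a crude normal approximation rather than
the actual posterior (illustrative only):

```{r ppc_sketch}
library(bayesplot)

# 50 simulated replications of y (rows) from a normal approximation
yrep <- t(replicate(50, rnorm(nrow(fake_data),
                              mean = mean(fake_data$y),
                              sd = sd(fake_data$y))))
ppc_dens_overlay(y = fake_data$y, yrep = yrep)
```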
The next step is to use the fitted Bayesian model to answer our business
question directly. In this example, we want to determine the probability that the
product's impact is at least $0.01$. We can calculate this probability with a
single line of code:
```{r posterior}
fitted_blm$posteriorProb(threshold = 0.01)
```
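Conceptually, this probability is just the share of posterior draws of $\eta$
that exceed the threshold. A hand-rolled equivalent on simulated draws
(illustrative only; not the {im} extraction API):

```{r prob_sketch}
# Simulated draws standing in for the posterior of eta (illustrative only)
set.seed(1)
eta_draws <- rnorm(4000, mean = 0.02, sd = 0.05)
mean(eta_draws > 0.01)  # P(impact > 0.01)
mean(eta_draws > 0.2)   # P(impact > 0.2), relevant for the next paragraph
```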
With this information, we can make a much more informed decision about whether
to keep the product than if we were merely assessing the rejection of a null
hypothesis. Moreover, we may care about multiple thresholds for this decision.
For instance, if the impact exceeds $0.2$, we might consider doubling our
investment.
::: {.content-visible when-format="html"}
The {im} package enables the creation of interactive visualizations that
effectively demonstrate our data insights and summarize the risks associated
with various decisions.
```{r}
#| eval: !expr knitr::is_html_output()
fitted_blm$vizdraws(
  breaks = c(0.01, 0.2),
  break_names = c("Discontinue", "Keep", "Double down"),
  display_mode_name = TRUE
)
```
The plot generated by this code not only answers our business question directly
but also illustrates how much we have learned from the data and how our initial
priors have evolved. This comprehensive view is crucial for making better
business decisions.
:::
Bayesian analysis provides probabilities directly aligned with decision-making
needs. For example, if the probability that the product's impact exceeds $0.01$
is low, we can confidently discontinue it. Conversely, if there's a reasonable
probability of a positive impact, we might decide to retain the product,
potentially conducting further investigations or collecting more data.
In conclusion, the Bayesian approach offers a powerful, flexible, and intuitive
framework for business decision-making. By focusing on probabilities and
incorporating prior knowledge, it provides a clearer and more practical basis
for making informed decisions in an uncertain world. This methodology enhances
our ability to navigate uncertainty, ultimately leading to more effective and
strategic business outcomes.