STA 101L - Summer I 2022
Raphael Morsomme
Statistical inference
Five cases
Hypothesis test
Confidence interval
A first glimpse of modern statistics
Population parameter: proportion \(p\)
Sample statistic: sample proportion \(\hat{p}\)
Hypothesis testing:
\(H_0:p=p_0\) where \(p_0\) is a number between \(0\) and \(1\)
\(H_a:p\neq p_0\)
Confidence interval: range of plausible values for \(p\).
What proportion of US adults support legalizing marijuana?
Sample: 900/1500 support it.
\[ \hat{p} = \dfrac{900}{1500}=0.6 \]
\(H_0:p=0.5\)
\(H_a:p\neq 0.5\)
Simulate many samples under \(H_0\).
Determine if the observed data could have plausibly arisen under \(H_0\).
\(\hat{p}=0.6\) is extremely unlikely to happen under \(H_0:p=0.5\).
We therefore reject the null hypothesis that half of the people support legalizing marijuana.
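The two steps above can be sketched in a few lines of base R. This is only an illustration: it assumes each simulated sample is 1500 independent respondents who each support legalization with probability \(0.5\), and the seed and number of simulations are arbitrary choices.

```r
set.seed(1)               # arbitrary seed, for reproducibility only
n     <- 1500             # sample size from the survey
p0    <- 0.5              # value of p under H0
p_obs <- 900 / n          # observed sample proportion (0.6)

# Each simulated sample: n respondents who each support with probability p0
p_hat_sim <- rbinom(1e4, size = n, prob = p0) / n

range(p_hat_sim)          # the simulated proportions cluster around 0.5

# Share of simulated samples at least as far from p0 as the observed sample
share_as_extreme <- mean(abs(p_hat_sim - p0) >= abs(p_obs - p0))
share_as_extreme
```

If essentially no simulated sample is as far from \(0.5\) as the observed \(\hat{p}=0.6\), the observed data are hard to reconcile with \(H_0\).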
What if we do not have a clear-cut case?
p-value: probability that the sample statistic of a sample simulated under \(H_0\) is at least as extreme as that of the observed sample.
# A tibble: 10,000 x 2
p_hat is_more_extreme
<dbl> <lgl>
1 0.475 TRUE
2 0.485 FALSE
3 0.483 FALSE
4 0.528 TRUE
5 0.51 FALSE
6 0.493 FALSE
7 0.503 FALSE
8 0.5 FALSE
9 0.495 FALSE
10 0.503 FALSE
# ... with 9,990 more rows
In practice, if the p-value is smaller than 0.05, we reject \(H_0\).
\(\alpha = 0.05\) is called the significance level.
If the p-value is below 0.05, the observed sample is among the 5% most extreme samples simulated under \(H_0\).
It is then highly unlikely that the observed sample could have arisen if \(H_0\) were true.
We say that the difference \(\hat{p}-p_0\) is statistically significant.
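As a rough sketch of how a p-value could be computed and compared with \(\alpha\), the base R code below uses a hypothetical less clear-cut sample. The sample size `n = 400` and the observed count `x_obs = 208` are illustrative assumptions, not values taken from these slides.

```r
set.seed(1)          # arbitrary seed
n     <- 400         # hypothetical sample size (assumption)
x_obs <- 208         # hypothetical number of supporters (assumption), p_hat = 0.52
p0    <- 0.5         # value of p under H0
alpha <- 0.05        # significance level

x_sim     <- rbinom(1e4, size = n, prob = p0)  # simulated counts under H0
p_hat_sim <- x_sim / n                         # simulated sample proportions

# Compare counts rather than proportions to avoid floating-point ties
is_more_extreme <- abs(x_sim - n * p0) >= abs(x_obs - n * p0)

p_value <- mean(is_more_extreme)   # proportion of simulations at least as extreme
p_value
p_value < alpha                    # TRUE would mean: reject H0 at the 5% level
```

The p-value is simply the proportion of `TRUE` values in the `is_more_extreme` column.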
Sample with replacement from the observed sample to construct many bootstrap samples.
Bootstrap samples \(\Rightarrow\) sampling distribution \(\Rightarrow\) CI
Source: IMS
# A tibble: 5 x 1
support
<dbl>
1 0
2 0
3 1
4 1
5 0
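A minimal base R sketch of a bootstrap confidence interval for the marijuana example follows. It assumes the observed sample is coded as 900 ones (support) and 600 zeros, and it uses the percentile method for a 95% interval. Both are illustrative choices rather than the exact code behind these slides.

```r
set.seed(1)                               # arbitrary seed
support <- c(rep(1, 900), rep(0, 600))    # observed sample: 1 = supports, 0 = does not

# Each bootstrap sample has the same size as the observed sample
# and is drawn from it with replacement
p_hat_boot <- replicate(1e4, {
  boot <- sample(support, size = length(support), replace = TRUE)
  mean(boot)                              # bootstrap statistic
})

# Percentile interval: the middle 95% of the bootstrap distribution
ci_95 <- quantile(p_hat_boot, probs = c(0.025, 0.975))
ci_95
```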
A population divided into two groups.
Population parameter: difference in proportions
\[ p_{diff}=p_1-p_2 \]
Sample statistic: difference in proportions in the sample
\[ \hat{p}_{diff}=\hat{p}_1-\hat{p}_2 \]
\(H_0:p_{diff}=0\) (no difference between the two groups)
\(H_a:p_{diff}\neq0\)
Are individuals who identify as female discriminated against in promotion decisions?
sex | decision: promoted | decision: not promoted | Total |
---|---|---|---|
male | 21 | 3 | 24 |
female | 14 | 10 | 24 |
Total | 35 | 13 | 48 |
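Plugging the counts from the table into the sample statistic gives the observed difference. Taking group 1 to be male and group 2 to be female is an assumption about the ordering.
\[ \hat{p}_{diff}=\hat{p}_1-\hat{p}_2=\dfrac{21}{24}-\dfrac{14}{24}=\dfrac{7}{24}\approx 0.292 \]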
\(H_0:p_{diff}=0\)
\(H_a:p_{diff}\neq 0\)
Simulate many samples under \(H_0\) (no discrimination)
Determine if the observed data could have plausibly arisen under \(H_0\)
Under \(H_0\), there is no discrimination
\(\Rightarrow\) whether someone receives a promotion is independent of their sex,
\(\Rightarrow\) randomly re-assign the \(35\) promotions among the \(48\) individuals, independently of their sex.
Source: IMS
sex | decision: promoted | decision: not promoted | Total |
---|---|---|---|
male | 18 | 6 | 24 |
female | 17 | 7 | 24 |
Total | 35 | 13 | 48 |
Source: IMS
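library(tidyverse) # assumed: provides %>%, mutate() and tibble()
library(openintro) # assumed source of the gender_discrimination data
d <- gender_discrimination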
d_sim <- d %>% mutate(decision = sample(decision)) # shuffle the promotions
d_sim
# A tibble: 48 x 2
gender decision
<fct> <fct>
1 male not promoted
2 male not promoted
3 male promoted
4 male promoted
5 male promoted
6 male promoted
7 male promoted
8 male not promoted
9 male promoted
10 male not promoted
# ... with 38 more rows
# Setup
results <- tibble(p_diff_hat = numeric())
d <- gender_discrimination
# Simulations
set.seed(0)
for(i in 1 : 1e4){
d_sim <- d %>% mutate(decision = sample(decision)) # simulate under H0
p_diff_hat <- compute_p_diff(d_sim) # test statistic
results <- results %>% add_row(p_diff_hat = p_diff_hat)
}
p-value: probability that the sample statistic of a sample simulated under \(H_0\) is at least as extreme as that of the observed sample.
Using the usual significance level \(\alpha = 0.05\), we reject the null hypothesis.
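The whole test can be sketched end to end in base R. The data frame below is rebuilt from the counts in the promotion table, and `compute_p_diff()` is a hypothetical stand-in for the helper used in the simulation code (its actual definition is not shown in these slides). It is assumed to return the male promotion rate minus the female promotion rate.

```r
# Rebuild the data from the table counts: one row per individual
d <- data.frame(
  gender   = rep(c("male", "female"), each = 24),
  decision = c(rep("promoted", 21), rep("not promoted", 3),    # males
               rep("promoted", 14), rep("not promoted", 10))   # females
)

# Hypothetical helper: difference in promotion rates (male minus female)
compute_p_diff <- function(d) {
  mean(d$decision[d$gender == "male"]   == "promoted") -
    mean(d$decision[d$gender == "female"] == "promoted")
}

p_diff_obs <- compute_p_diff(d)          # observed statistic: 21/24 - 14/24

set.seed(0)
p_diff_sim <- replicate(1e4, {
  d_sim <- d
  d_sim$decision <- sample(d$decision)   # shuffle the promotions (simulate under H0)
  compute_p_diff(d_sim)                  # statistic of the simulated sample
})

# Two-sided p-value; the small tolerance guards against floating-point ties
p_value <- mean(abs(p_diff_sim) >= abs(p_diff_obs) - 1e-12)
p_value
```

The resulting `p_value` is then compared with \(\alpha\). Because it is estimated by simulation, it varies slightly with the seed and the number of simulations.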
Same idea as before: sample with replacement from the observed data to construct many bootstrap samples.
Bootstrap samples \(\Rightarrow\) sampling distribution \(\Rightarrow\) CI
Source: IMS
sample_observed_m <- gender_discrimination %>% filter(gender == "male" )
sample_observed_f <- gender_discrimination %>% filter(gender == "female")
set.seed(0)
sample_bootstrap(sample_observed_m) # bootstrap sample
# A tibble: 24 x 2
gender decision
<fct> <fct>
1 male promoted
2 male promoted
3 male promoted
4 male promoted
5 male promoted
6 male not promoted
7 male promoted
8 male promoted
9 male promoted
10 male promoted
# ... with 14 more rows
sample_bootstrap(sample_observed_f) # bootstrap sample
# A tibble: 24 x 2
gender decision
<fct> <fct>
1 female promoted
2 female promoted
3 female promoted
4 female promoted
5 female promoted
6 female not promoted
7 female promoted
8 female not promoted
9 female promoted
10 female promoted
# ... with 14 more rows
results <- tibble(p_diff_hat = numeric())
for(i in 1 : 1e4){
d_boot_m <- sample_bootstrap(sample_observed_m) # bootstrap sample
d_boot_f <- sample_bootstrap(sample_observed_f) # bootstrap sample
p_diff_hat <- compute_p_diff(rbind(d_boot_m, d_boot_f)) # bootstrap statistic
results <- results %>% add_row(p_diff_hat = p_diff_hat)
}
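To turn the bootstrap statistics into a confidence interval, one option is the percentile method, sketched below in base R. `sample_bootstrap()` and `compute_p_diff()` are hypothetical re-implementations of the helpers used above, since their definitions are not shown in these slides, and the data frame is rebuilt from the counts in the promotion table.

```r
# Rebuild the data from the table counts: one row per individual
d <- data.frame(
  gender   = rep(c("male", "female"), each = 24),
  decision = c(rep("promoted", 21), rep("not promoted", 3),    # males
               rep("promoted", 14), rep("not promoted", 10))   # females
)

# Hypothetical helper: resample the rows of d with replacement
sample_bootstrap <- function(d) d[sample(nrow(d), replace = TRUE), ]

# Hypothetical helper: difference in promotion rates (male minus female)
compute_p_diff <- function(d) {
  mean(d$decision[d$gender == "male"]   == "promoted") -
    mean(d$decision[d$gender == "female"] == "promoted")
}

sample_observed_m <- d[d$gender == "male", ]
sample_observed_f <- d[d$gender == "female", ]

set.seed(0)
p_diff_boot <- replicate(1e4, {
  d_boot_m <- sample_bootstrap(sample_observed_m)   # resample within each group
  d_boot_f <- sample_bootstrap(sample_observed_f)
  compute_p_diff(rbind(d_boot_m, d_boot_f))         # bootstrap statistic
})

# Percentile interval: the middle 95% of the bootstrap distribution
ci_95 <- quantile(p_diff_boot, probs = c(0.025, 0.975))
ci_95
```

Resampling within each group separately keeps the group sizes fixed at 24, as in the loop above.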
Remember that if we could obtain multiple samples, they’d all be a bit different.
\(\Rightarrow\) the corresponding CIs would also be a bit different
A 90% CI procedure captures the true value of the parameter for about 90% of samples.
A 95% CI is wider and thus more likely to capture the truth (for about 95% of samples).
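The coverage claim can be checked with a small simulation in which the true parameter is known. Everything in the sketch below is an illustrative assumption: the true proportion `p_true = 0.6`, the sample size, and the numbers of repetitions. It also uses a shortcut: resampling a 0/1 sample of size `n` containing `x` ones with replacement gives a Binomial(`n`, `x / n`) count of ones, so `rbinom()` stands in for explicit resampling.

```r
set.seed(1)         # arbitrary seed
p_true    <- 0.6    # hypothetical true proportion (assumption)
n         <- 500    # size of each sample
n_samples <- 1000   # number of samples drawn from the population
n_boot    <- 1000   # bootstrap samples per observed sample

covered <- replicate(n_samples, {
  x <- rbinom(1, size = n, prob = p_true)                 # one new observed sample
  p_hat_boot <- rbinom(n_boot, size = n, prob = x / n) / n  # bootstrap proportions
  ci_90 <- quantile(p_hat_boot, c(0.05, 0.95),   names = FALSE)
  ci_95 <- quantile(p_hat_boot, c(0.025, 0.975), names = FALSE)
  # Does each interval capture the true proportion?
  c(ci_90 = ci_90[1] <= p_true && p_true <= ci_90[2],
    ci_95 = ci_95[1] <= p_true && p_true <= ci_95[2])
})

coverage <- rowMeans(covered)   # share of samples whose CI captures p_true
coverage
```

The two coverage rates should land near the nominal 90% and 95%, with the 95% interval capturing the truth at least as often because it contains the 90% interval.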
Source: IMS
One proportion
HT via simulation
CI via bootstrap
Two proportions
HT via simulation
CI via bootstrap