`03:00`

STA 101L - Summer I 2022

Raphael Morsomme

Tuesday: lecture + QA

Wednesday: work on project (online OH)

Thursday: work on project (online OH)

Friday: presentations

HT via simulation

CI via bootstrap

5 cases

one proportion

two proportions

one mean

two means

linear regression

- Normal approximation
- Classical approach to statistical inference
- Standard error
- Case 6 – many proportions (\(\chi^2\) test)
- Case 7 – many means (ANOVA)

`03:00`

\(\Rightarrow\) unimodal, symmetric, thin tails – bell-shaped

Source: IMS

Source: IMS

The normal distribution describes the variability of the different statistics: \(\hat{p}\), \(\bar{x}\), \(\hat{\beta}\)

To see this, simply look at all the histograms we constructed from simulated samples (HT) and bootstrap samples (CI)!

**Classical approach**: instead of obtaining the sampling distribution via simulation (HT) or bootstrapping (CI), we approximate it with a normal distribution.

We have seen that if a numerical variable \(X\) is normally distributed

\[ X\sim N(\mu, \sigma^2) \]

then the sample average is also normally distributed

\[ \bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \]

In practice, we cannot assume that the variable \(X\) is *exactly* normally distributed.

But as long as

the sample is large, or

the variable is *approximately* normal: unimodal, roughly symmetric, and with no serious outliers

\(\bar{x}\) is well approximated by a normal distribution

\[ \bar{x} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \]

See the numerous histograms for case 3 (one mean) where the distribution of \(\bar{x}\) always looks pretty normal.
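This behavior is easy to reproduce with a short simulation (a sketch; the exponential population is an arbitrary, clearly non-normal choice):

```r
set.seed(1)
# draw 5000 samples of size n = 50 from an exponential population
# (mean 1, sd 1) and record the sample average of each
xbar <- replicate(5000, mean(rexp(50, rate = 1)))
# hist(xbar) looks bell-shaped even though the population is skewed
mean(xbar) # close to the population mean, 1
sd(xbar)   # close to sigma / sqrt(n) = 1 / sqrt(50), about 0.14
```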

If

the observations are independent – the **independence** condition

\(p\) is not extreme and \(n\) is not small \((pn\ge 10 \text{ and } (1-p)n\ge 10)\) – the **success-failure** condition
the distribution of \(\hat{p}\) can be approximated by a normal distribution

\[ \hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right) \]
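A quick simulation check of this approximation (a sketch; \(p = 0.3\) and \(n = 100\) are arbitrary choices that satisfy the success-failure condition):

```r
set.seed(1)
p <- 0.3; n <- 100
# 5000 simulated samples, each giving one value of p-hat
p_hat <- replicate(5000, mean(rbinom(n, 1, p)))
mean(p_hat) # close to p = 0.3
sd(p_hat)   # close to sqrt(p * (1 - p) / n), about 0.046
```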

Step 1: we are interested in the distribution of the statistic under \(H_0\).

**Modern approach**: *simulate* from this distribution

**Classical approach**: *approximate* this distribution with a normal distribution

Step 2: we want to compute the p-value

**Modern approach**: the p-value is the proportion of simulations with a statistic at least as extreme as that of the observed sample

**Classical approach**: the p-value is the *area under the curve* of the normal distribution that is at least as extreme as the observed statistic.

`R` will compute the p-value for you. Here is what `R` does behind the scenes:
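A sketch of that computation for the one-proportion example (780 successes out of \(n = 1500\), testing \(H_0: p = 0.5\)); note that `prop.test` applies a continuity correction, so its p-value differs slightly from this one:

```r
n <- 1500; x <- 780            # observed data: 780 successes out of 1500
p0 <- 0.5                      # value under the null hypothesis
p_hat <- x / n                 # observed statistic: 0.52
se <- sqrt(p0 * (1 - p0) / n)  # SE of p-hat under H0
z <- (p_hat - p0) / se         # standardized statistic
2 * pnorm(-abs(z))             # two-sided area under the normal curve, about 0.12
```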

Step 2: identify the upper and lower bounds of the CI

**Modern approach**: find the appropriate percentiles among the simulated values

**Classical approach**: find the appropriate percentiles of the normal approximation

`R` will compute the upper and lower bounds for you. Here is what `R` does behind the scenes:
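A sketch of those bounds for the same one-proportion example (780 out of 1500, 99% confidence); `prop.test` uses a slightly different (score) interval, so the numbers differ in the third decimal:

```r
n <- 1500; x <- 780
p_hat <- x / n
se <- sqrt(p_hat * (1 - p_hat) / n)          # SE evaluated at the observed p-hat
z_star <- qnorm(0.995)                       # 99% confidence: 0.5% in each tail
c(p_hat - z_star * se, p_hat + z_star * se)  # about (0.487, 0.553)
```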

```
n <- 1500 # sample size
x <- 780  # number of successes
prop.test(
  x, n,             # observed data
  p = 0.5,          # value in the null hypothesis
  conf.level = 0.99 # confidence level for CI
)
```

```
1-sample proportions test with continuity correction
data: x out of n, null probability 0.5
X-squared = 2.3207, df = 1, p-value = 0.1277
alternative hypothesis: true p is not equal to 0.5
99 percent confidence interval:
0.4864251 0.5533970
sample estimates:
p
0.52
```

The simulation-based HT yielded a p-value of 0.127.

**Conditions**: independence, success-failure condition

`05:00`

Consider the gender discrimination study.

```
n_m <- 24; n_f <- 24 # sample sizes
x_m <- 14; x_f <- 21 # numbers of promotions
prop.test(c(x_m, x_f), c(n_m, n_f))
```

```
2-sample test for equality of proportions with continuity correction
data: c(x_m, x_f) out of c(n_m, n_f)
X-squared = 3.7978, df = 1, p-value = 0.05132
alternative hypothesis: two.sided
95 percent confidence interval:
-0.57084188 -0.01249145
sample estimates:
prop 1 prop 2
0.5833333 0.8750000
```

Independence within groups (same as case 1)

Independence between groups

Success-failure condition for each group (10 successes and 10 failures in each group)

Using the simulation-based HT, we found a p-value of 0.0435.

`06:00`

Independence

Normality – can be relaxed for larger samples \((n\ge30)\)

`03:00`

There are two implementations; which one is more convenient depends on the structure of the data.
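The two implementations correspond to the two interfaces of `t.test`: a formula (one numerical column split by a grouping column, as with `hwy` by `year` below) or two separate numerical vectors (as with `cty` and `hwy`). A sketch with made-up data:

```r
set.seed(1)
d <- data.frame(group = rep(c("a", "b"), each = 30),
                value = c(rnorm(30, mean = 10), rnorm(30, mean = 12)))
# long format: formula interface
t.test(value ~ group, data = d)
# wide format: two-vector interface (same result)
t.test(d$value[d$group == "a"], d$value[d$group == "b"])
```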

```
Welch Two Sample t-test
data: hwy by year
t = -0.032864, df = 231.64, p-value = 0.9738
alternative hypothesis: true difference in means between group 1999 and group 2008 is not equal to 0
95 percent confidence interval:
-1.562854 1.511572
sample estimates:
mean in group 1999 mean in group 2008
23.42735 23.45299
```

```
Welch Two Sample t-test
data: d$cty and d$hwy
t = -13.755, df = 421.79, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-7.521683 -5.640710
sample estimates:
mean of x mean of y
16.85897 23.44017
```

Independence within groups

Independence between groups

Normality in each group (same as case 3 – one mean)

`01:00`

```
Paired t-test
data: d$cty and d$hwy
t = -44.492, df = 233, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-6.872628 -6.289765
sample estimates:
mean of the differences
-6.581197
```
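The output above comes from `t.test` with `paired = TRUE`. A paired test is equivalent to a one-sample test on the differences; a sketch with made-up mileage-style data:

```r
cty <- c(16, 18, 21, 15, 19)  # hypothetical city mileage
hwy <- c(23, 24, 28, 22, 26)  # hypothetical highway mileage, same cars
t.test(cty, hwy, paired = TRUE)
# same p-value as a one-sample test on the differences:
t.test(cty - hwy)
```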

Paired observations

Independence between pairs

Normality

`01:00`

```
d <- heart_transplant %>% mutate(survived_binary = survived == "alive")
m <- glm(survived_binary ~ age + transplant, family = "binomial", data = d)
tidy(m)
```

```
# A tibble: 3 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.973 1.08 0.904 0.366
2 age -0.0763 0.0255 -2.99 0.00277
3 transplanttreatment 1.82 0.668 2.73 0.00635
```

Linearity

Independence

Normality

Equal variability (homoskedasticity)

\(\Rightarrow\) verify with a residual plot!
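A sketch of such a residual plot, with made-up data (look for no pattern and a constant spread around zero):

```r
set.seed(1)
d <- data.frame(x = runif(50, 0, 10))
d$y <- 2 + 3 * d$x + rnorm(50)  # linear trend plus noise
m <- lm(y ~ x, data = d)
plot(fitted(m), resid(m))       # residuals vs fitted values
abline(h = 0, lty = 2)          # residuals should hover around this line
```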

`05:00`

**Standard error (SE)**: standard deviation of the normal approximation.

The SE measures the variability of the statistic.

\(SE(\hat{p})=\sqrt{\frac{p(1-p)}{n}}\)

\(SE(\hat{p}_{diff})=\sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}\)

\(SE(\bar{x}) = \sqrt{\frac{\sigma^2}{n}}\)

\(SE(\bar{x}_{diff}) = \sqrt{\frac{\sigma^2_1}{n_1}+\frac{\sigma^2_2}{n_2}}\)

\(SE(\hat{\beta})\) has a complicated form.
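A sketch of these formulas in `R`, with made-up values:

```r
# one proportion: p = 0.5, n = 1500
sqrt(0.5 * (1 - 0.5) / 1500)             # about 0.0129
# two proportions: p1 = 0.6, p2 = 0.4, n1 = n2 = 100
sqrt(0.6 * 0.4 / 100 + 0.4 * 0.6 / 100)  # about 0.069
# one mean: sigma = 6, n = 100
sqrt(6^2 / 100)                          # 0.6
```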

`02:00`

```
ask <- openintro::ask %>%
mutate(
response = if_else(response == "disclose", "Disclose problem", "Hide problem"),
question_class = case_when(
question_class == "general" ~ "General",
question_class == "neg_assumption" ~ "Negative assumption",
question_class == "pos_assumption" ~ "Positive assumption"
),
question_class = fct_relevel(question_class, "General", "Positive assumption", "Negative assumption")
)
```

Question | Disclose problem | Hide problem | Total
---|---|---|---
General | 2 | 71 | 73
Positive assumption | 23 | 50 | 73
Negative assumption | 36 | 37 | 73
Total | 61 | 158 | 219

Source: IMS

\(H_0\): the response is independent of the question asked

\(H_a\): the response depends on the question asked

We will not quantify the differences between the three questions with CIs.

Question | Disclose problem | Hide problem | Total
---|---|---|---
General | 2 (20.33) | 71 (52.67) | 73
Positive assumption | 23 (20.33) | 50 (52.67) | 73
Negative assumption | 36 (20.33) | 37 (52.67) | 73
Total | 61 | 158 | 219

*Expected* counts under \(H_0\) in parentheses.

Source: IMS

Is the difference between the *expected* and *observed* counts due to

chance alone, or

the fact that the way people responded depended on the question asked?

\(\chi^2\) (“chi-squared”) statistic:

\[ \chi^2 = \dfrac{(O_{11} - E_{11})^2}{E_{11}} + \dfrac{(O_{21} - E_{21})^2}{E_{21}} + \dots + \dfrac{(O_{32} - E_{32})^2}{E_{32}} \]

\[ \begin{aligned} &\text{General formula} && \frac{(\text{observed count } - \text{expected count})^2} {\text{expected count}} \\ &\text{Row 1, Col 1} && \frac{(2 - 20.33)^2}{20.33} = 16.53 \\ &\text{Row 2, Col 1} && \frac{(23 - 20.33)^2}{20.33} = 0.35 \\ & \hspace{9mm}\vdots && \hspace{13mm}\vdots \\ &\text{Row 3, Col 2} && \frac{(37 - 52.67)^2}{52.67} = 4.66 \end{aligned} \]

\[\chi^2 = 16.53 + 0.35 + \dots + 4.66 = 40.13\]
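The arithmetic above can be reproduced in `R` from the table of observed counts:

```r
O <- matrix(c(2, 71,
              23, 50,
              36, 37),
            nrow = 3, byrow = TRUE)            # observed counts (rows = questions)
E <- outer(rowSums(O), colSums(O)) / sum(O)    # expected counts under H0
chi2 <- sum((O - E)^2 / E)
chi2                                           # about 40.13
pchisq(chi2, df = 2, lower.tail = FALSE)       # classical p-value, about 1.9e-09
```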

Source: IMS

When the conditions of

independence

\(>5\) expected counts per cell

are satisfied, the \(\chi^2\) statistic approximately follows a \(\chi^2\) distribution.

Source: IMS

`R`

```
# A tibble: 6 x 3
question_class question response
<fct> <chr> <chr>
1 General What can you tell me about it? Hide problem
2 Positive assumption It doesn't have any problems, does it? Hide problem
3 Positive assumption It doesn't have any problems, does it? Disclose problem
4 Negative assumption What problems does it have? Disclose problem
5 General What can you tell me about it? Hide problem
6 Negative assumption What problems does it have? Disclose problem
```

```
Pearson's Chi-squared test
data: ask$response and ask$question_class
X-squared = 40.128, df = 2, p-value = 0.000000001933
```
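The same test can also be run directly from the table of counts (a sketch; `chisq.test(ask$response, ask$question_class)` on the raw data, as in the output above, gives the same result):

```r
counts <- matrix(c(2, 71,
                   23, 50,
                   36, 37),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(
                   c("General", "Positive assumption", "Negative assumption"),
                   c("Disclose problem", "Hide problem")))
chisq.test(counts)  # X-squared = 40.128, df = 2
```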

`01:30`

When the conditions are not met, you need to conduct a HT via *simulation*.

- Simulate artificial samples under \(H_0\) by shuffling the response variable
- Compute the \(\chi^2\) statistic of each simulated sample
- Determine how extreme the \(\chi^2\) statistic of the observed sample is by computing a p-value

See Section 18.1 for an example.
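The three steps above can be sketched in base `R`; the data here are made up purely for illustration:

```r
set.seed(1)
response <- sample(c("disclose", "hide"), 90, replace = TRUE)
question <- rep(c("general", "positive", "negative"), each = 30)
chi2 <- function(x, g) {                      # Pearson chi-squared statistic
  O <- table(g, x)
  E <- outer(rowSums(O), colSums(O)) / sum(O)
  sum((O - E)^2 / E)
}
obs <- chi2(response, question)
# shuffle the response to break any association, as H0 asserts
sims <- replicate(1000, chi2(sample(response), question))
p_value <- mean(sims >= obs)                  # proportion at least as extreme
```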

Source: IMS

\(H_0: \mu_{OF} = \mu_{IF} = \mu_{C}\) (the batting performance is the same across all three positions)

\(H_a\): at least one mean is different

We will not quantify the differences between the three positions with CIs.

`R`

Independence within

Independence between

Normality (sample size and outliers)

Constant variance

Verify assumptions 3 and 4 with side-by-side histograms

`03:00`

When the conditions are not met, you need to conduct a HT via *simulation*.

- Simulate artificial samples under \(H_0\) by shuffling the response variable
- Compute the \(F\) statistic of each simulated sample (see Section 22.2)
- Determine how extreme the \(F\) statistic of the observed sample is by computing a p-value

See Section 22.2 for an example.
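A sketch of the analogous procedure for many means, again with made-up data:

```r
set.seed(1)
y <- rnorm(60)                                   # response
g <- factor(rep(c("C", "IF", "OF"), each = 20))  # three positions
f_stat <- function(y, g) anova(lm(y ~ g))[["F value"]][1]
obs <- f_stat(y, g)
sims <- replicate(1000, f_stat(y, sample(g)))    # shuffle the group labels
p_value <- mean(sims >= obs)
```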

- Normal approximation
- Classical approach to statistical inference
- Standard error
- Case 6 – many proportions (\(\chi^2\) test)
- Case 7 – many means (ANOVA)