STA 101L - Summer I 2022
Raphael Morsomme
Stents are known to reduce the risk of an additional heart attack or death after a cardiac event.
We have an experiment with 451 at-risk patients:
patient | group | 30 days | 365 days |
---|---|---|---|
1 | treatment | no event | no event |
2 | treatment | no event | no event |
3 | control | no event | no event |
4 | control | no event | no event |
5 | control | no event | no event |
Group | Stroke (30 days) | No event (30 days) | Stroke (365 days) | No event (365 days) |
---|---|---|---|---|
Control | 13 | 214 | 28 | 199 |
Treatment | 33 | 191 | 45 | 179 |
Total | 46 | 405 | 73 | 378 |
Contrary to expectation, we observe more strokes in the treatment group
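To make the comparison concrete, the sketch below turns the counts from the summary table into stroke proportions per group. The dictionary layout and the function name `stroke_proportion` are illustrative choices, not part of the original slides.

```python
# Counts copied from the summary table: (strokes, no events) per group and period.
counts = {
    "control":   {"30 days": (13, 214), "365 days": (28, 199)},
    "treatment": {"30 days": (33, 191), "365 days": (45, 179)},
}

def stroke_proportion(strokes, no_events):
    """Proportion of patients in a group who had a stroke."""
    return strokes / (strokes + no_events)

# Stroke proportion for every group and follow-up period.
proportions = {
    group: {period: stroke_proportion(*cells) for period, cells in periods.items()}
    for group, periods in counts.items()
}

for group, periods in proportions.items():
    for period, p in periods.items():
        print(f"{group:9s} {period:8s} {p:.3f}")
```

At 30 days the proportions are about 5.7% (control) and 14.7% (treatment); at 365 days they are about 12.3% and 20.1%.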
This type of question is central in statistics.
Suppose I flip a coin \(100\) times and count the number of times I obtain heads.
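The coin experiment can also be simulated. The sketch below assumes a fair coin (the slide does not say so explicitly) and repeats the 100-flip experiment 1000 times; the seed and the number of repetitions are arbitrary choices made for illustration.

```python
import random

def count_heads(n_flips, rng):
    """Flip a fair coin n_flips times and return the number of heads."""
    return sum(rng.random() < 0.5 for _ in range(n_flips))

# Fixed seed so the run is reproducible (the value 101 is arbitrary).
rng = random.Random(101)

# Repeat the 100-flip experiment many times to see how much the count varies.
results = [count_heads(100, rng) for _ in range(1000)]

share_40_to_60 = sum(40 <= r <= 60 for r in results) / len(results)

print("smallest count:", min(results))
print("largest count: ", max(results))
print("share of experiments with 40 to 60 heads:", share_40_to_60)
```

Comparing the printed range with your own guess is one way to check your gut feeling about randomness.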
Group exercise - gut feeling about randomness
05:00
patient | group | 30 days | 365 days |
---|---|---|---|
1 | treatment | stroke | stroke |
2 | treatment | no event | no event |
3 | treatment | no event | no event |
Observational units: individuals, families, student cohorts, cities, counties, countries, cells (biology), animals, books, courses, apples
Variables: height, weight, age, size, year, latitude, longitude, type, sex, diet, number of pages, genre, level, color
We are typically interested in the relation between variables in some population.
The population of interest is often large, but with well-defined limits
There are two ways to learn about the relation between variables in a given population.
ideal
…but typically impractical, expensive
Group exercise - observation and variables
07:00
When you make soup, there is no need to drink the whole pot (population) to know if it is seasoned enough.
Group exercise - sampling
Back to the study on the effect of diet on sleep among Duke students. How would you obtain a sample of students for your study if you had (i) 1 hour, (ii) 1 week to collect your data?
03:00
Are all samples created equal? No!
What can go wrong?
Sampling is an art.
The gold standard is a random sample
🛑 Obtaining a representative sample is difficult.
✅ But surprisingly small representative samples can do the job!
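A small simulation can illustrate why modest random samples are often enough. The population below is hypothetical (100,000 individuals, 30% of whom have some trait); the sample sizes and the seed are also arbitrary choices.

```python
import random

# Fixed seed so the run is reproducible (the value 2022 is arbitrary).
rng = random.Random(2022)

# Hypothetical population: 1 = has the trait, 0 = does not. True proportion is 0.30.
population = [1] * 30_000 + [0] * 70_000

estimates = {}
for n in (50, 500, 5000):
    # Simple random sample without replacement: every individual is equally likely.
    sample = rng.sample(population, n)
    estimates[n] = sum(sample) / n

for n, estimate in estimates.items():
    print(f"sample size {n:5d}: estimated proportion {estimate:.3f}")
```

Even the sample of 5000, only 5% of this population, typically lands within a couple of percentage points of the true value. This holds only because the sample is random; a convenience sample of the same size carries no such guarantee.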
Warning
Not all numbers are numerical variables, e.g. zip code, phone number.
Heuristic: is the average meaningful? Yes!
Numerical variables are either
Warning
Some numbers are categorical variables, e.g. zip code, phone number.
Heuristic: is the average meaningful? No!
Categorical variables are either
Breakdown of variables into their respective types.
Source: IMS
Group exercise - types of variables
05:00
Two variables can either be independent or associated.
If two variables are associated, the association can be
Group exercise - types of associations
Provide two numerical variables which you expect to be
02:00
Why are most of the shaded counties in the middle of the country?
04:00
Source: Bayesian Data Analysis
🛑 We cannot always use experiments:
✅ But when experiments can be implemented, they lead to causal claims and are therefore the gold standard.
🛑 Does not easily lead to causal claims due to the potential presence of confounding variables
Source: IMS
…but they can lead to causal claims in certain cases!
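The contrast between the two study designs can be sketched with a simulation. Everything below is hypothetical: a binary confounder drives both the exposure and the outcome, while the exposure itself has no effect at all. The probabilities 0.8 and 0.2, the effect size 2.0, and the seed are arbitrary.

```python
import random
from statistics import mean

# Fixed seed so the run is reproducible (the value 7 is arbitrary).
rng = random.Random(7)
n = 10_000

def difference_in_means(exposed, y):
    """Mean outcome among exposed units minus mean outcome among unexposed units."""
    treated = [yi for e, yi in zip(exposed, y) if e]
    untreated = [yi for e, yi in zip(exposed, y) if not e]
    return mean(treated) - mean(untreated)

# Hypothetical binary confounder, present in about half of the units.
confounder = [rng.random() < 0.5 for _ in range(n)]

# The outcome depends on the confounder only, never on the exposure.
y = [2.0 * c + rng.gauss(0, 1) for c in confounder]

# Observational study: units with the confounder are more likely to be exposed.
observed_exposure = [rng.random() < (0.8 if c else 0.2) for c in confounder]

# Experiment: exposure is assigned by a coin flip, independently of the confounder.
randomized_exposure = [rng.random() < 0.5 for _ in range(n)]

obs_diff = difference_in_means(observed_exposure, y)
exp_diff = difference_in_means(randomized_exposure, y)

print(f"observational difference in means: {obs_diff:.2f}")
print(f"experimental difference in means:  {exp_diff:.2f}")
```

The observational comparison shows a clear difference even though the exposure does nothing, while random assignment balances the confounder across groups and the difference is close to zero.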
Group exercise - experiment and observational study
You want to investigate the effect of caffeine on class participation among Duke students
Provide an example of an observational study that you would not turn into an experiment due to:
Exercise 2.12
06:00