STA 101L - Summer I 2022
Raphael Morsomme
Stents are known to reduce the risk of an additional heart attack or death after a cardiac event.
We have an experiment with 451 at-risk patients:
patient | group | 30 days | 365 days |
---|---|---|---|
1 | treatment | no event | no event |
2 | treatment | no event | no event |
3 | control | no event | no event |
4 | control | no event | no event |
5 | control | no event | no event |
Group | 30 days: stroke | 30 days: no event | 365 days: stroke | 365 days: no event |
---|---|---|---|---|
Control | 13 | 214 | 28 | 199 |
Treatment | 33 | 191 | 45 | 179 |
Total | 46 | 405 | 73 | 378 |
Contrary to expectation, we observe more strokes in the treatment group
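To compare the two groups fairly, it helps to turn the counts into proportions, since the groups differ slightly in size. The sketch below is a minimal illustration using the counts from the summary table; the names `counts` and `stroke_proportion` are made up for this example.

```python
# Counts copied from the summary table: (strokes, no events) per group and period.
counts = {
    "control":   {"30 days": (13, 214), "365 days": (28, 199)},
    "treatment": {"30 days": (33, 191), "365 days": (45, 179)},
}


def stroke_proportion(strokes, no_events):
    """Share of patients in a group who had a stroke."""
    return strokes / (strokes + no_events)


# Stroke proportion for every group and follow-up period.
proportions = {
    group: {period: stroke_proportion(*cells) for period, cells in periods.items()}
    for group, periods in counts.items()
}

for group, periods in proportions.items():
    for period, p in periods.items():
        print(f"{group:9s} {period:8s} {p:.1%}")
```

After 365 days, about 12.3% of control patients (28 of 227) and about 20.1% of treatment patients (45 of 224) had a stroke.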
This type of question is central in statistics.
Suppose I flip a coin \(100\) times and count the number of times I obtain heads.
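One way to get a feel for how much the number of heads varies by chance alone is to simulate the experiment many times. The sketch below assumes a fair coin; the function `count_heads`, the seed, and the 1,000 repetitions are arbitrary choices made for illustration.

```python
import random


def count_heads(n_flips=100, p_heads=0.5, rng=None):
    """Flip a coin n_flips times and return the number of heads."""
    rng = rng or random.Random()
    return sum(rng.random() < p_heads for _ in range(n_flips))


rng = random.Random(101)  # arbitrary seed so the run is reproducible

# Repeat the 100-flip experiment 1,000 times (assumed number of repetitions).
head_counts = [count_heads(rng=rng) for _ in range(1000)]

# Share of experiments in which the head count fell between 40 and 60.
share_40_60 = sum(40 <= h <= 60 for h in head_counts) / len(head_counts)

print("smallest count:", min(head_counts))
print("largest count:", max(head_counts))
print("share between 40 and 60:", share_40_60)
```

Counts far from 50 appear rarely in such a simulation, which gives a reference point for deciding whether an observed count is surprising.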
patient | group | 30 days | 365 days |
---|---|---|---|
1 | treatment | stroke | stroke |
2 | treatment | no event | no event |
3 | treatment | no event | no event |
Observational units: individuals, families, student cohort, cities, counties, countries, cells (biology), animals, books, courses, apples
Variables: height, weight, age, size, year, latitude, longitude, type, sex, diet, number of pages, genre, level, color
We are typically interested in the relation between variables in some population.
The population of interest is often large, but with well-defined limits
There are two ways to learn about the relation between variables in a given population.
ideal
…but typically impractical, expensive
When you make soup, there is no need to drink the whole pot (population) to know if it is seasoned enough.
Are all samples created equal? No!
What can go wrong?
Sampling is an art.
The gold standard is a random sample
🛑 Obtaining a representative sample is difficult.
✅ But surprisingly small representative samples can do the job!
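The point about small samples can be checked with a quick simulation. The sketch below builds a hypothetical population of 100,000 individuals in which 30% have some trait, then draws a simple random sample of 1,000; all of these numbers are assumptions chosen for illustration.

```python
import random

# Hypothetical population: 1 = has the trait, 0 = does not (30% have it).
population = [1] * 30_000 + [0] * 70_000
true_proportion = sum(population) / len(population)

rng = random.Random(2022)  # arbitrary seed so the run is reproducible

# Simple random sample without replacement: 1% of the population.
sample = rng.sample(population, 1_000)
sample_proportion = sum(sample) / len(sample)

print("population proportion:", true_proportion)
print("sample proportion:", sample_proportion)
```

Even though the sample covers only 1% of the population, its proportion typically lands within a few percentage points of the true value. This holds because the sample is random; a biased sample of any size offers no such guarantee.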
Numerical variables are either
Source: IMS
Two variables can either be independent or associated.
If two variables are associated, the association can be
Why are most of the shaded counties in the middle of the country?
Source: Bayesian Data Analysis
🛑 We cannot always use experiments:
✅ But when experiments can be implemented, they lead to causal claims and are therefore the gold standard.
🛑 Does not easily lead to causal claims due to the potential presence of confounding variables
Source: IMS
…but they can lead to causal claims in certain cases!