HW 4 - Multiple linear regression
Due Thursday, May 26, 9:00pm on Gradescope
This assignment needs to be completed with RMarkdown and submitted as a PDF on Gradescope. Feel free to re-use the template provided for HW1.
When submitting your work on Gradescope, please assign a page for each question.
Problem set and applied exercises (20 points)
- 8.6 (2 points)
- 8.10 (11 points)
  - Read in the data in R with the following command: `d <- palmerpenguins::penguins`. How many observations and variables are there? Identify the type of each variable.
  - Fit the model in R using the `lm` command. You should obtain the same estimates as in the book. Identify the baseline level of the categorical predictors.
  - Do parts a-d.
  - Compute \(R^2\) by hand (do not use `glance`). Hint: compute SSR and SST. (A minimal sketch of this computation appears after the problem list.)
- 8.12 (1 point)
- 8.14 (3 points)
  - Fit the candidate model in R and compute its adjusted-\(R^2\) value. You should obtain the same value as in the book. (This check is also included in the sketch after the problem list.)
  - Do exercise 8.14.
- 25.10 (3 points)
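In case the mechanics of the \(R^2\)-by-hand step (and the adjusted-\(R^2\) check in 8.14) are unclear, here is a minimal sketch. The model formula below is only a placeholder, not the model from the book; substitute the exercise's actual predictors.

```r
# Minimal sketch -- the formula below is a placeholder, not the book's model.
d <- palmerpenguins::penguins

fit <- lm(body_mass_g ~ flipper_length_mm + species, data = d)

# R^2 by hand: 1 - SSR/SST.
y   <- model.response(model.frame(fit))  # response values actually used by lm (NAs dropped)
ssr <- sum(residuals(fit)^2)             # sum of squared residuals
sst <- sum((y - mean(y))^2)              # total sum of squares
r2  <- 1 - ssr / sst

# Adjusted R^2 by hand, with n observations and p estimated predictors.
n      <- length(y)
p      <- length(coef(fit)) - 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

c(r2 = r2, adj_r2 = adj_r2)  # compare with summary(fit)$r.squared and $adj.r.squared
```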
Lab exercises - Grading the professor (15 points)
You can find the lab here.
- Exercise 1 (2 points)
- Exercise 2 (2 points)
- Exercise 3 (1 point)
Skip exercises 4, 5, 6, 7, 8
- Exercise 9 (2 points)
- Exercise 10 (2 points)
Skip exercises 11, 12
- Exercise 13 (1 point)
- Exercise 14 (5 points)
  - Drop one variable at a time and check the adjusted \(R^2\). Removing which variable increases the adjusted \(R^2\) the most? (A sketch of this drop-one-variable loop appears after the exercise list.)
  - Skip the rest of the question.
Skip the remaining exercises.
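For the drop-one-variable comparison in Exercise 14, a minimal sketch of the loop is below. It assumes the lab's data frame is `evals` from the openintro package with response `score`; the predictor set in `full_formula` is a placeholder for the lab's full model, so substitute the variables you actually used.

```r
# Minimal sketch -- `full_formula` is a placeholder; use the lab's full model.
library(openintro)

full_formula <- score ~ bty_avg + age + gender
predictors   <- all.vars(full_formula)[-1]   # predictor names, response dropped

# Refit once per dropped predictor and record the adjusted R^2.
adj_r2 <- sapply(predictors, function(v) {
  reduced <- reformulate(setdiff(predictors, v), response = "score")
  summary(lm(reduced, data = evals))$adj.r.squared
})

sort(adj_r2, decreasing = TRUE)  # the top name is the variable whose removal helps most
```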
Function, for-loop and cross-validation in R (15 points)
Function – Write a function in R that computes the average \(\bar{x}\) of a numerical variable from scratch (do not use the command `mean`). Your function should be called `my_mean`, should take a vector of numbers as input, and should output its average. Show that your function works by applying it to the variable `cty` of the `ggplot2::mpg` data. You should obtain the same value as the command `mean`. (5 points) (Minimal sketches of the function and the for-loop appear after the Fibonacci description below.)

For-loop – Write a for-loop that computes the first 20 Fibonacci numbers. The first few Fibonacci numbers are \(0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, \dots\). These numbers follow a simple rule: each is the sum of the previous two; that is,
\[ x_n = x_{n-1}+x_{n-2}, \]
and the first two numbers are \(x_1=0\) and \(x_2=1\). For instance, the third number is \(1 = x_3 = x_2 + x_1 = 1 + 0\) and the seventh number is \(8 = x_7 = x_6 + x_5 = 5 + 3\) (5 points).
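In case the shape of these two tasks is unclear, here are minimal sketches (one possible approach; the variable names are arbitrary).

```r
# Minimal sketch of a hand-rolled mean: accumulate the sum with a loop,
# then divide by the number of elements (no call to mean()).
my_mean <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value
  }
  total / length(x)
}

my_mean(ggplot2::mpg$cty)  # should match mean(ggplot2::mpg$cty)
```

And a sketch of the Fibonacci loop, filling a length-20 vector:

```r
# Minimal sketch of the Fibonacci for-loop: x[1] = 0, x[2] = 1, and each
# later entry is the sum of the previous two.
x <- numeric(20)
x[1] <- 0
x[2] <- 1
for (n in 3:20) {
  x[n] <- x[n - 1] + x[n - 2]
}
x
```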
Cross-validation – Use the code from lecture to implement \(5\)-fold cross-validation on the prediction project data to compare the simple regression model with `gestation_week` as the predictor and the full model with all 15 raw predictors. (5 points) (A sketch of the fold-splitting logic follows.)
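A minimal sketch of the fold-splitting logic is below. The data frame name `project_data`, the response name `birth_weight`, and the full-model formula are placeholders (they are not given here), and the sketch compares models by held-out RMSE; substitute the names and error metric used in lecture.

```r
# Minimal sketch -- `project_data`, `birth_weight`, and the full-model formula
# are placeholders; substitute the names used in lecture / the project.
set.seed(1)

k     <- 5
n     <- nrow(project_data)
folds <- sample(rep(1:k, length.out = n))  # randomly assign each row to one of k folds

simple_formula <- birth_weight ~ gestation_week
full_formula   <- birth_weight ~ .         # stand-in for the 15 raw predictors

cv_rmse <- function(form) {
  fold_rmse <- sapply(1:k, function(j) {
    train <- project_data[folds != j, ]
    test  <- project_data[folds == j, ]
    fit   <- lm(form, data = train)
    pred  <- predict(fit, newdata = test)
    sqrt(mean((test$birth_weight - pred)^2))  # RMSE on the held-out fold
  })
  mean(fold_rmse)
}

c(simple = cv_rmse(simple_formula), full = cv_rmse(full_formula))  # smaller is better
```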