HW 4 - Multiple linear regression
Due Thursday, May 26, 9:00pm on Gradescope
This assignment needs to be completed with RMarkdown and submitted as a PDF on Gradescope. Feel free to re-use the template provided for HW1.
When submitting your work on Gradescope, please assign a page for each question.
Problem set and applied exercises (20 points)
- 8.6 (2 points)
- 8.10 (11 points)
Read in the data in R with the following command: `d <- palmerpenguins::penguins`. How many observations and variables are there? Identify the type of each variable.
Fit the model in R using the `lm` command. You should obtain the same estimates as in the book. Identify the baseline level of the categorical predictors.
Do parts a-d.
Compute \(R^2\) by hand (do not use `glance`). Hint: compute SSR and SST.
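As a minimal sketch of the by-hand computation, here is the same idea on the built-in `mtcars` data (a stand-in: the homework uses the penguins model instead). SSR is taken here to mean the residual sum of squares, so \(R^2 = 1 - \mathrm{SSR}/\mathrm{SST}\):

```r
# Sketch: R^2 by hand on stand-in data (mtcars), not the homework model.
fit <- lm(mpg ~ wt + hp, data = mtcars)
y   <- mtcars$mpg
ssr <- sum(residuals(fit)^2)    # residual sum of squares
sst <- sum((y - mean(y))^2)     # total sum of squares
r2  <- 1 - ssr / sst
r2
```

The value should agree with `summary(fit)$r.squared`, which is a useful sanity check even though `glance` and `summary` are off-limits for the final answer.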
- 8.12 (1 point)
- 8.14 (3 points)
Fit the candidate model in R (with `d <- palmerpenguins::penguins`) and compute its adjusted \(R^2\) value. You should obtain the same value as in the book. Do exercise 8.14.
- 25.10 (3 points)
Lab exercises - Grading the professor (15 points)
You can find the lab here.
- Exercise 1 (2 points)
- Exercise 2 (2 points)
- Exercise 3 (1 point)
Skip exercises 4, 5, 6, 7, 8
- Exercise 9 (2 points)
- Exercise 10 (2 points)
Skip exercises 11, 12
- Exercise 13 (1 point)
- Exercise 14 (5 points)
Drop one variable at a time and compare the resulting adjusted \(R^2\) values. Removing which variable increases adjusted \(R^2\) the most?
Skip the rest of the question.
Skip the remaining exercises.
Function, for-loop and cross-validation in R (15 points)
Function – Write a function in R that computes the average \(\bar{x}\) of a numerical variable from scratch (do not use the command `mean`). Your function should be called `my_mean`, should take a vector of numbers as input, and should output their average. Show that your function works by applying it to the variable `cty` of the `ggplot2::mpg` data. You should obtain the same value as the command `mean`. (5 points)
For-loop – Write a for-loop that computes the first 20 Fibonacci numbers. The first few Fibonacci numbers are \(0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, \dots\). These numbers follow a simple rule: each is the sum of the previous two; that is,
\[ x_n = x_{n-1}+x_{n-2}, \]
and the first two numbers are \(x_1=0\) and \(x_2=1\). For instance, the third number is \(1 = x_3 = x_2 + x_1 = 1 + 0\) and the seventh number is \(8 = x_7 = x_6 + x_5 = 5 + 3\). (5 points)
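The function task above can be sketched as follows, accumulating the sum in a loop so that neither `mean` nor any shortcut hides the computation (NA handling is left out for simplicity):

```r
# my_mean: average of a numeric vector, computed without mean()
my_mean <- function(x) {
  total <- 0
  for (xi in x) total <- total + xi  # running sum of the elements
  total / length(x)                  # average = sum / count
}

# Check against the built-in mean(), e.g. on ggplot2::mpg$cty:
#   my_mean(ggplot2::mpg$cty) should equal mean(ggplot2::mpg$cty)
my_mean(c(1, 2, 3, 4))
```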
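The recurrence above translates directly into a for-loop: seed the first two values, then fill each later entry from the two before it.

```r
# First 20 Fibonacci numbers via the rule x_n = x_{n-1} + x_{n-2},
# with x_1 = 0 and x_2 = 1.
fib <- numeric(20)
fib[1] <- 0
fib[2] <- 1
for (n in 3:20) {
  fib[n] <- fib[n - 1] + fib[n - 2]
}
fib
```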
Cross-validation – Use the code from lecture to implement \(5\)-fold cross-validation on the prediction project data to compare the simple regression model with `gestation_week` as the predictor and the full model with all 15 raw predictors. (5 points)
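The fold-splitting logic can be sketched as below. The prediction project data (and `gestation_week`) are not available here, so the built-in `mtcars` data stands in, with `mpg ~ wt` playing the role of the simple model and `mpg ~ .` the full model; only the cross-validation mechanics carry over.

```r
# 5-fold cross-validation sketch on stand-in data (mtcars).
set.seed(42)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(mtcars)))  # random fold labels
rmse_simple <- numeric(k)
rmse_full   <- numeric(k)
for (i in 1:k) {
  train <- mtcars[folds != i, ]   # fit on 4 folds
  test  <- mtcars[folds == i, ]   # evaluate on the held-out fold
  m_simple <- lm(mpg ~ wt, data = train)
  m_full   <- lm(mpg ~ ., data = train)
  rmse_simple[i] <- sqrt(mean((test$mpg - predict(m_simple, test))^2))
  rmse_full[i]   <- sqrt(mean((test$mpg - predict(m_full, test))^2))
}
c(simple = mean(rmse_simple), full = mean(rmse_full))
```

Averaging the per-fold RMSEs gives one out-of-sample error estimate per model; the model with the smaller average wins the comparison.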