HW 4 - Multiple linear regression

Due Thursday, May 26, 9:00pm on Gradescope

This assignment needs to be completed with RMarkdown and submitted as a PDF on Gradescope. Feel free to re-use the template provided for HW1.

When submitting your work on Gradescope, please assign a page for each question.

Problem set and applied exercises (20 points)

  • 8.6 (2 points)
  • 8.10 (11 points)
    1. Read the data into R with the following command. How many observations and variables are there?

      d <- palmerpenguins::penguins
    2. Identify the type of each variable.

    3. Fit the model in R using the lm command. You should obtain the same estimates as in the book.

    4. Identify the baseline level of the categorical predictors.

    5. Do parts a-d.

    6. Compute \(R^2\) by hand (do not use glance). Hint: compute SSR and SST.

  • 8.12 (1 point)
  • 8.14 (3 points)
    • Fit the candidate model in R and compute its adjusted-\(R^2\) value. You should obtain the same value as in the book.

      d <- palmerpenguins::penguins
    • Do exercise 8.14.

  • 25.10 (3 points)
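
As a rough illustration of the hint in 8.10 (part 6), here is a minimal sketch of computing \(R^2\) and adjusted \(R^2\) by hand. It uses the built-in mtcars data and an arbitrary model rather than the penguins model from the exercises, and it assumes that SSR denotes the sum of squared residuals and SST the total sum of squares; check the lecture notation before relying on it.

```r
# Illustrative model on built-in data (not the homework model)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Response values for the rows lm actually used (rows with NAs are dropped)
y <- model.response(model.frame(fit))

# SSR: sum of squared residuals (assumed meaning of "SSR")
ssr <- sum(residuals(fit)^2)

# SST: total sum of squares of the response around its mean
sst <- sum((y - mean(y))^2)

# R^2: proportion of the variability in y explained by the model
r2 <- 1 - ssr / sst

# Adjusted R^2 penalizes the number of estimated slopes p
n <- length(y)
p <- length(coef(fit)) - 1
adj_r2 <- 1 - (ssr / (n - p - 1)) / (sst / (n - 1))
```

Taking the response from model.frame(fit) rather than from the raw data frame matters for data with missing values, such as penguins, because lm silently drops incomplete rows.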

Lab exercises - Grading the professor (15 points)

You can find the lab here

  • Exercise 1 (2 points)
  • Exercise 2 (2 points)
  • Exercise 3 (1 point)

Skip exercises 4, 5, 6, 7, 8

  • Exercise 9 (2 points)
  • Exercise 10 (2 points)

Skip exercises 11, 12

  • Exercise 13 (1 point)

  • Exercise 14 (5 points)

    • Drop one variable at a time and peek at the adjusted \(R^2\). Removing which variable increases adjusted \(R^2\) the most?

    • Skip the rest of the question.

Skip the remaining exercises.
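
The drop-one-variable comparison in Exercise 14 can be sketched roughly as follows. This is only an illustration on the built-in mtcars data with an arbitrary set of predictors, not the lab's evaluation data or model; the object names full, predictors and adj_r2_drop are made up for the example.

```r
# Illustrative full model on built-in data (not the lab model)
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
predictors <- attr(terms(full), "term.labels")

# Storage for the adjusted R^2 obtained after dropping each predictor
adj_r2_drop <- numeric(length(predictors))
names(adj_r2_drop) <- predictors

for (v in predictors) {
  # Refit the model without predictor v
  reduced <- update(full, as.formula(paste(". ~ . -", v)))
  adj_r2_drop[v] <- summary(reduced)$adj.r.squared
}

# Change relative to the full model; a positive value means dropping helps
adj_r2_drop - summary(full)$adj.r.squared
```

The predictor with the largest positive difference is the one whose removal increases adjusted \(R^2\) the most.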

Function, for-loop and cross-validation in R (15 points)

  • Function – Write a function in R that computes the average \(\bar{x}\) of a numerical variable from scratch (do not use the command mean). Your function should be called my_mean, should take a vector of numbers as input and should output its average. Show that your function works by applying it to the variable cty of the ggplot2::mpg data. You should obtain the same value as with the command mean. (5 points)

  • For-loop – Write a for-loop that computes the first 20 Fibonacci numbers. The first few Fibonacci numbers are \(0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, …\). These numbers obey the following rule: each is the sum of the previous two; that is,

    \[ x_n = x_{n-1}+x_{n-2}, \]

    and the first two numbers are \(x_1=0\) and \(x_2=1\). For instance, the third number is \(1 = x_3 = x_2 + x_1 = 1 + 0\) and the seventh number is \(8 = x_7 = x_6 + x_5 = 5 + 3\). (5 points)

  • Cross-validation – Use the code from lecture to implement \(5\)-fold cross-validation on the prediction project data to compare the simple regression model with gestation_week as the predictor and the full model with all 15 raw predictors. (5 points)
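
For orientation, here is a minimal sketch of what \(5\)-fold cross-validation of two linear models might look like in base R. It is not the code from lecture: it uses the built-in mtcars data instead of the prediction project data, the helper cv_rmse is a hypothetical name, and the choice of root mean squared error as the comparison metric is an assumption. The helper also shows the general shape of an R function wrapping a for-loop.

```r
set.seed(1)  # make the random fold assignment reproducible

# Randomly assign each row to one of k folds of near-equal size
k <- 5
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

# Hypothetical helper: cross-validated RMSE of an lm formula for given folds
cv_rmse <- function(formula, data, folds) {
  response <- all.vars(formula)[1]  # name of the response variable
  sq_err <- numeric(0)
  for (i in sort(unique(folds))) {
    train <- data[folds != i, ]     # fit on all folds except fold i
    test <- data[folds == i, ]      # evaluate on the held-out fold i
    fit <- lm(formula, data = train)
    pred <- predict(fit, newdata = test)
    sq_err <- c(sq_err, (test[[response]] - pred)^2)
  }
  sqrt(mean(sq_err))
}

# Compare a simple model with a larger one on the same folds
rmse_simple <- cv_rmse(mpg ~ wt, mtcars, folds)
rmse_larger <- cv_rmse(mpg ~ wt + hp + qsec, mtcars, folds)
```

Passing the same folds vector to both calls means the two models are evaluated on identical train/test splits, so any difference in the estimated error reflects the models rather than the random partition.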