Practice Problems 1.2 Key

Here are some practice problems to explore the penguins data set. First, we need to load the palmerpenguins package, which provides the penguins data set.

library("palmerpenguins")

penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
  1. What are the mean and standard deviation of bill length of the penguins?
# create a variable that contains only the column for bill length

billLength<-penguins$bill_length_mm


# mean 

mean(billLength, na.rm=TRUE)
[1] 43.92193
# standard deviation

sd(billLength, na.rm=TRUE)
[1] 5.459584
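The `na.rm=TRUE` argument is needed because the bill length column contains missing values (row 4 of the printed tibble is `NA`). A minimal sketch of the difference, using a made-up vector `x` whose values are hypothetical and not taken from the data set:

```r
# Hypothetical toy vector with one missing value (not real penguin data)
x <- c(39.1, 39.5, NA, 40.3)

# Without na.rm, a single NA makes the result NA
mean(x)

# With na.rm = TRUE, the NA is dropped before the mean is computed
mean(x, na.rm = TRUE)

# Count how many values are missing
sum(is.na(x))
```

The same argument works for `sd()`, `median()`, `min()`, `max()`, and `var()`.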
  2. What are the mean and standard deviation of the body mass of the penguins?
# create a variable that contains only the column for body mass

bodyMass<-penguins$body_mass_g

# mean

mean(bodyMass, na.rm=TRUE)
[1] 4201.754
# standard deviation

sd(bodyMass, na.rm = TRUE)
[1] 801.9545
  3. What are the mean and median flipper length of the penguins?
# create a variable that contains only the column for flipper length

flipperLength<-penguins$flipper_length_mm

# mean

mean(flipperLength, na.rm=TRUE)
[1] 200.9152
# median

median(flipperLength, na.rm=TRUE)
[1] 197
  4. How long are the longest flippers in this data set? How long are the shortest? (Hint: Google how to find the minimum and maximum values in a vector.) You can use the same variable that you created in problem 3!
# longest flipper length

max(flipperLength, na.rm=TRUE)
[1] 231
# shortest flipper length

min(flipperLength, na.rm=TRUE)
[1] 172
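As an optional shortcut, base R's `summary()` reports the minimum, median, mean, and maximum in one call, along with the number of missing values. This is only a sketch on a made-up vector; `flippers` and its values are hypothetical, not the real data:

```r
# Hypothetical toy vector with one missing value (not real penguin data)
flippers <- c(181, 186, 195, NA, 193, 172, 231)

# summary() reports min, quartiles, median, mean, max, and the NA count
flipperSummary <- summary(flippers)
flipperSummary

# Individual entries can be pulled out by name
flipperSummary[["Min."]]
flipperSummary[["Max."]]
```

Unlike `min()` and `max()`, `summary()` skips missing values on its own and lists how many it found.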
  5. What is the range of bill depths of the penguins? (Range is the maximum value minus the minimum value.)
# create a variable that contains only the column for bill depth

billDepth<-penguins$bill_depth_mm

# deepest bill depth

deepest<-max(billDepth, na.rm=TRUE)


# shallowest bill depth

shallowest<-min(billDepth, na.rm=TRUE)

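# range of bill depth (named billDepthRange so it does not share a name with base R's range() function)

billDepthRange<-deepest - shallowest

billDepthRange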
[1] 8.4
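Base R also has a built-in `range()` function that returns the minimum and maximum together, and `diff()` can turn that pair into a single number. A small sketch, assuming a made-up vector `depths` (hypothetical values, not the real data):

```r
# Hypothetical toy vector with one missing value (not real penguin data)
depths <- c(18.7, 17.4, NA, 19.3, 20.6)

# range() returns a vector of length two: c(minimum, maximum)
depthLimits <- range(depths, na.rm = TRUE)
depthLimits

# diff() subtracts the first element from the second: maximum - minimum
depthSpread <- diff(depthLimits)
depthSpread
```

This gives the same kind of answer as subtracting `min()` from `max()` by hand, in fewer steps.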
  6. How many species of penguins are in this data set? (Hint: there is a function to find distinct values in a vector.) List them using comments.
# get unique values
species<-unique(penguins$species)

# count them
length(species)
[1] 3
# The three species are Adelie, Chinstrap, and Gentoo.
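Because `species` is stored as a factor (`<fct>` in the tibble header), the factor helpers in base R offer another way to inspect its categories. A sketch on a made-up factor; `colors` and its values are hypothetical:

```r
# Hypothetical toy factor (not real penguin data)
colors <- factor(c("red", "blue", "red", "green", "blue"))

# The distinct category labels, in sorted order
levels(colors)

# The number of levels
nlevels(colors)

# How many observations fall in each level
table(colors)
```

One difference to keep in mind: `levels()` lists every level the factor defines, even one with no observations, while `unique()` only returns values that actually appear.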
  7. Find the value for the longest bill in the data set. Assign it to the variable longestBill.
longestBill<-max(penguins$bill_length_mm, na.rm=TRUE)

longestBill
[1] 59.6
  8. Calculate the variance of body mass. (Hint: Google variance in R.)
var(penguins$body_mass_g, na.rm=TRUE)
[1] 643131.1
  9. Calculate the variance of flipper length.
var(penguins$flipper_length_mm, na.rm=TRUE)
[1] 197.7318
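Variance and standard deviation are closely related: the variance is the square of the standard deviation. For example, squaring the body mass standard deviation reported earlier (801.9545) gives roughly 643131, which matches the body mass variance above up to rounding. The sketch below checks this relationship on a made-up vector `x` (hypothetical values) and also computes the sample variance by hand:

```r
# Hypothetical toy vector (not real penguin data)
x <- c(181, 186, 195, 193, 190)

v <- var(x)  # sample variance
s <- sd(x)   # sample standard deviation

# The variance equals the standard deviation squared
all.equal(s^2, v)

# Sample variance by hand: sum of squared deviations divided by (n - 1)
manualVar <- sum((x - mean(x))^2) / (length(x) - 1)
all.equal(manualVar, v)
```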
  10. Calculate and compare the standard deviation of bill length and bill depth. Which has a larger standard deviation?
# standard deviation of bill length

sd(penguins$bill_length_mm, na.rm=TRUE)
[1] 5.459584
# standard deviation of bill depth

sd(penguins$bill_depth_mm, na.rm=TRUE)
[1] 1.974793
# Which has a larger standard deviation?

# Answer: the standard deviation of bill length is larger (about 5.46 mm vs. about 1.97 mm).
  11. Challenge: Calculate the mean and standard deviation of bill length for Gentoo penguins.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
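# keep only the rows where species is Gentoo (filter() comes from dplyr, which is loaded with the tidyverse)
gentoo<-filter(.data=penguins, species=="Gentoo")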

mean(gentoo$bill_length_mm, na.rm=TRUE)
[1] 47.50488
sd(gentoo$bill_length_mm, na.rm=TRUE)
[1] 3.081857
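The same kind of subsetting can be done without the tidyverse, using logical indexing in base R. This is one possible alternative and not part of the original key; the data frame `birds` and its values are hypothetical:

```r
# Hypothetical toy data frame (not real penguin data)
birds <- data.frame(
  species = c("A", "B", "A", "B", "A"),
  bill    = c(40.0, 47.5, 38.0, NA, 42.0)
)

# Keep only the bill values from rows where species is "B"
billB <- birds$bill[birds$species == "B"]
mean(billB, na.rm = TRUE)

# tapply() computes the same statistic for every group at once
groupMeans <- tapply(birds$bill, birds$species, mean, na.rm = TRUE)
groupMeans
```

`tapply()` is handy when the statistic is wanted for every species, not just one.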