Practice Problems 1.3 Key

First we need to load the penguin dataset, just like last week. The dataset is called penguins. These data were collected by real scientists! They were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network.

library(palmerpenguins)

penguins
# A tibble: 344 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           NA            NA                  NA          NA
 5 Adelie  Torgersen           36.7          19.3               193        3450
 6 Adelie  Torgersen           39.3          20.6               190        3650
 7 Adelie  Torgersen           38.9          17.8               181        3625
 8 Adelie  Torgersen           39.2          19.6               195        4675
 9 Adelie  Torgersen           34.1          18.1               193        3475
10 Adelie  Torgersen           42            20.2               190        4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
library(tidyverse) # to make tidyverse commands available 
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
  1. Create a data frame that is the subset of the data containing only penguins that live on Torgersen. How many penguins is this?
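# drop every row with a missing value in any column (344 rows -> 333); this changes the counts below
penguins <- na.omit(penguins)

# keep only the Torgersen penguins; the printout shows 47 rows, so the answer is 47 penguins
torg <- filter(penguins, island == "Torgersen")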
torg
# A tibble: 47 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           36.7          19.3               193        3450
 5 Adelie  Torgersen           39.3          20.6               190        3650
 6 Adelie  Torgersen           38.9          17.8               181        3625
 7 Adelie  Torgersen           39.2          19.6               195        4675
 8 Adelie  Torgersen           41.1          17.6               182        3200
 9 Adelie  Torgersen           38.6          21.2               191        3800
10 Adelie  Torgersen           34.6          21.1               198        4400
# ℹ 37 more rows
# ℹ 2 more variables: sex <fct>, year <int>
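The answer to question 1 depends on the na.omit() step: the raw table printed earlier has 344 rows, including row 4, a Torgersen penguin whose measurements are all NA. The base-R sketch below uses a small hypothetical data frame (made-up values; only the column names match penguins) to count missing values per column, drop incomplete rows, and count what remains for one island with nrow().

```r
# Hypothetical mini data set: made-up values, same column names as penguins
toy <- data.frame(
  island = c("Torgersen", "Torgersen", "Torgersen", "Biscoe", "Dream"),
  flipper_length_mm = c(181, NA, 195, 217, 190),
  sex = c("female", NA, "male", "female", "male")
)

# Count the missing values in each column
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# na.omit() drops every row that has an NA in any column
toy_complete <- na.omit(toy)

# Keep one island, then count the remaining rows
torg_toy <- subset(toy_complete, island == "Torgersen")
n_torg <- nrow(torg_toy)
print(n_torg)
```

On the real data, nrow(torg) returns the count directly instead of reading it off the tibble header.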
  2. Of the penguins that live on Torgersen, how many have flippers shorter than 190 mm?
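# keep Torgersen penguins with flippers strictly shorter than 190 mm; the printout shows 16 of them
torgShort <- filter(torg, flipper_length_mm < 190)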
torgShort
# A tibble: 16 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           38.9          17.8               181        3625
 4 Adelie  Torgersen           41.1          17.6               182        3200
 5 Adelie  Torgersen           36.6          17.8               185        3700
 6 Adelie  Torgersen           34.4          18.4               184        3325
 7 Adelie  Torgersen           37.2          19.4               184        3900
 8 Adelie  Torgersen           36.2          16.1               187        3550
 9 Adelie  Torgersen           34.6          17.2               189        3200
10 Adelie  Torgersen           36.7          18.8               187        3800
11 Adelie  Torgersen           38.6          17                 188        2900
12 Adelie  Torgersen           35.7          17                 189        3350
13 Adelie  Torgersen           41.1          18.6               189        3325
14 Adelie  Torgersen           36.2          17.2               187        3150
15 Adelie  Torgersen           40.2          17                 176        3450
16 Adelie  Torgersen           35.2          15.9               186        3050
# ℹ 2 more variables: sex <fct>, year <int>
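Because a comparison such as flipper_length_mm < 190 returns TRUE or FALSE for each penguin, summing it counts the matches without building a subset. A base-R sketch on hypothetical flipper lengths (made-up values):

```r
# Hypothetical flipper lengths (mm) for one island, with one missing value
flipper_length_mm <- c(181, 186, 195, NA, 190, 176)

# TRUE counts as 1 and FALSE as 0, so sum() counts the flippers under 190 mm
n_short <- sum(flipper_length_mm < 190, na.rm = TRUE)
print(n_short)

# Without na.rm = TRUE, a single NA makes the whole count NA
n_short_with_na <- sum(flipper_length_mm < 190)
print(n_short_with_na)
```

dplyr's filter() drops rows whose condition evaluates to NA, which is why the filter() call in the key needs no na.rm argument.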
  3. Of the penguins that live on Torgersen, what percentage are female?
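# n()/nrow(torg) is a proportion, not a percentage: 0.511 means about 51.1% of Torgersen penguins are female
torgFemalePerc <- torg %>% 
  group_by(sex) %>% 
  summarise(percent=n()/nrow(torg))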

torgFemalePerc
# A tibble: 2 × 2
  sex    percent
  <fct>    <dbl>
1 female   0.511
2 male     0.489
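The question asks for a percentage, so the proportions need to be multiplied by 100. A base-R cross-check with table() and prop.table(), on hypothetical sex values (made-up), does that conversion:

```r
# Hypothetical sex values for the penguins on one island
sex <- c("female", "male", "female", "female", "male")

# table() counts each category; prop.table() divides each count by the total
proportions <- prop.table(table(sex))

# Multiply by 100 and round to one decimal place to get percentages
percentages <- round(100 * proportions, 1)
print(percentages)
```

table() leaves out NA values unless useNA = "ifany" is given, so missing sex values would silently shrink the denominator if na.omit() had not been run first.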

There are three different species of penguins in this dataset. We can see from the photo below that they may have different body dimensions.

  4. What are the mean and standard deviation of body mass for each penguin species? (Hint: use group_by()/summarize())
massSummary <- penguins %>% 
  group_by(species) %>% 
  summarize(avgMass=mean(body_mass_g), sdMass=sd(body_mass_g))

massSummary
# A tibble: 3 × 3
  species   avgMass sdMass
  <fct>       <dbl>  <dbl>
1 Adelie      3706.   459.
2 Chinstrap   3733.   384.
3 Gentoo      5092.   501.
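For comparison, base R's tapply() applies a function within each group, much like group_by() followed by summarize(). A sketch on hypothetical body masses (made-up values, not from the dataset):

```r
# Hypothetical body masses (g) and matching species labels
body_mass_g <- c(3750, 3800, 3250, 5000, 5400, 3700, 3600)
species <- c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo",
             "Chinstrap", "Chinstrap")

# tapply() splits body_mass_g by species and applies the function to each piece
avg_mass <- tapply(body_mass_g, species, mean)
sd_mass  <- tapply(body_mass_g, species, sd)

# Combine into one table with one row per species
mass_summary <- data.frame(species = names(avg_mass),
                           avgMass = as.numeric(avg_mass),
                           sdMass  = as.numeric(sd_mass))
print(mass_summary)
```

sd() of a group containing a single penguin returns NA, since a standard deviation needs at least two values.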
  5. What are the mean and standard deviation of bill length for each penguin species?
lengthSummary <- penguins %>% 
  group_by(species) %>% 
  summarize(avgLength=mean(bill_length_mm), sdLength=sd(bill_length_mm))

lengthSummary
# A tibble: 3 × 3
  species   avgLength sdLength
  <fct>         <dbl>    <dbl>
1 Adelie         38.8     2.66
2 Chinstrap      48.8     3.34
3 Gentoo         47.6     3.11

The penguins live on different islands. The islands differ in size and lie in different parts of the Palmer Archipelago, which could affect the availability of prey, habitat, etc.

6. Do the Adelie penguins living on Torgersen Island have a different mean body mass than the Adelie penguins living on Biscoe?

# one way is to filter first and then calculate the mean mass

# first create data frames that contain 1. only Adelie on Torgersen and 2. only Adelie on Biscoe
adelieTorgersen <- filter(penguins, island=="Torgersen" & species=="Adelie")
adelieTorgersen
# A tibble: 47 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.1          18.7               181        3750
 2 Adelie  Torgersen           39.5          17.4               186        3800
 3 Adelie  Torgersen           40.3          18                 195        3250
 4 Adelie  Torgersen           36.7          19.3               193        3450
 5 Adelie  Torgersen           39.3          20.6               190        3650
 6 Adelie  Torgersen           38.9          17.8               181        3625
 7 Adelie  Torgersen           39.2          19.6               195        4675
 8 Adelie  Torgersen           41.1          17.6               182        3200
 9 Adelie  Torgersen           38.6          21.2               191        3800
10 Adelie  Torgersen           34.6          21.1               198        4400
# ℹ 37 more rows
# ℹ 2 more variables: sex <fct>, year <int>
adelieBiscoe <- filter(penguins, island=="Biscoe" & species=="Adelie")
adelieBiscoe
# A tibble: 44 × 8
   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
 1 Adelie  Biscoe           37.8          18.3               174        3400
 2 Adelie  Biscoe           37.7          18.7               180        3600
 3 Adelie  Biscoe           35.9          19.2               189        3800
 4 Adelie  Biscoe           38.2          18.1               185        3950
 5 Adelie  Biscoe           38.8          17.2               180        3800
 6 Adelie  Biscoe           35.3          18.9               187        3800
 7 Adelie  Biscoe           40.6          18.6               183        3550
 8 Adelie  Biscoe           40.5          17.9               187        3200
 9 Adelie  Biscoe           37.9          18.6               172        3150
10 Adelie  Biscoe           40.5          18.9               180        3950
# ℹ 34 more rows
# ℹ 2 more variables: sex <fct>, year <int>
# then calculate the average mass of each of those subsets
adelieTorgersenMass <- mean(adelieTorgersen$body_mass_g)
adelieTorgersenMass 
[1] 3708.511
adelieBiscoeMass <- mean(adelieBiscoe$body_mass_g)
adelieBiscoeMass
[1] 3709.659
# the two means differ by only about 1 g (3708.5 vs 3709.7), so there is essentially no difference
# another way is to group by both island and species, then read off the rows for Adelie on Torgersen and on Biscoe
massByIslandSpecies <- penguins %>% 
  group_by(island, species) %>% 
  summarize(avgMass=mean(body_mass_g), sdMass=sd(body_mass_g))
`summarise()` has grouped output by 'island'. You can override using the
`.groups` argument.
massByIslandSpecies
# A tibble: 5 × 4
# Groups:   island [3]
  island    species   avgMass sdMass
  <fct>     <fct>       <dbl>  <dbl>
1 Biscoe    Adelie      3710.   488.
2 Biscoe    Gentoo      5092.   501.
3 Dream     Adelie      3701.   449.
4 Dream     Chinstrap   3733.   384.
5 Torgersen Adelie      3709.   452.
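Comparing two means by eye does not say whether a gap is larger than the natural spread within each island. One option the key does not use is a Welch two-sample t-test with base R's t.test(). The sketch below uses only the first six body masses printed in each tibble above, so it is an illustration rather than the answer; on the full data you would pass adelieTorgersen$body_mass_g and adelieBiscoe$body_mass_g instead.

```r
# First six Adelie body masses (g) printed above for each island
torgersen_mass <- c(3750, 3800, 3250, 3450, 3650, 3625)
biscoe_mass    <- c(3400, 3600, 3800, 3950, 3800, 3800)

# Difference in sample means, Torgersen minus Biscoe
mean_diff <- mean(torgersen_mass) - mean(biscoe_mass)
print(mean_diff)

# t.test() runs a Welch test by default (var.equal = FALSE)
result <- t.test(torgersen_mass, biscoe_mass)
print(result$p.value)
```

A large p-value means the data give no strong evidence that the means differ; it does not prove that they are equal.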
  7. Do the female Adelie penguins living on Torgersen Island have a different mean body mass than the female Adelie penguins living on Biscoe? Calculate both the mean and standard deviation of body mass for both groups.
# one way is to filter first and then calculate the mean and standard deviation of mass

# first create data frames that contain 1. only female Adelie on Torgersen and 2. only female Adelie on Biscoe
adelieTorgersenF <- filter(penguins, island=="Torgersen" & species=="Adelie" & sex=="female")
adelieTorgersenF
# A tibble: 24 × 8
   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
 1 Adelie  Torgersen           39.5          17.4               186        3800
 2 Adelie  Torgersen           40.3          18                 195        3250
 3 Adelie  Torgersen           36.7          19.3               193        3450
 4 Adelie  Torgersen           38.9          17.8               181        3625
 5 Adelie  Torgersen           41.1          17.6               182        3200
 6 Adelie  Torgersen           36.6          17.8               185        3700
 7 Adelie  Torgersen           38.7          19                 195        3450
 8 Adelie  Torgersen           34.4          18.4               184        3325
 9 Adelie  Torgersen           35.9          16.6               190        3050
10 Adelie  Torgersen           33.5          19                 190        3600
# ℹ 14 more rows
# ℹ 2 more variables: sex <fct>, year <int>
adelieBiscoeF <- filter(penguins, island=="Biscoe" & species=="Adelie" & sex=="female")
adelieBiscoeF
# A tibble: 22 × 8
   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
   <fct>   <fct>           <dbl>         <dbl>             <int>       <int>
 1 Adelie  Biscoe           37.8          18.3               174        3400
 2 Adelie  Biscoe           35.9          19.2               189        3800
 3 Adelie  Biscoe           35.3          18.9               187        3800
 4 Adelie  Biscoe           40.5          17.9               187        3200
 5 Adelie  Biscoe           37.9          18.6               172        3150
 6 Adelie  Biscoe           39.6          17.7               186        3500
 7 Adelie  Biscoe           35            17.9               190        3450
 8 Adelie  Biscoe           34.5          18.1               187        2900
 9 Adelie  Biscoe           39            17.5               186        3550
10 Adelie  Biscoe           36.5          16.6               181        2850
# ℹ 12 more rows
# ℹ 2 more variables: sex <fct>, year <int>
# then calculate the mean and standard deviation of mass for each of those subsets
adelieTorgersenMassF <- mean(adelieTorgersenF$body_mass_g)
adelieTorgersenSdF <- sd(adelieTorgersenF$body_mass_g)
adelieTorgersenMassF
[1] 3395.833
adelieTorgersenSdF
[1] 259.1444
adelieBiscoeMassF <- mean(adelieBiscoeF$body_mass_g)
adelieBiscoeSdF <- sd(adelieBiscoeF$body_mass_g)
adelieBiscoeMassF
[1] 3369.318
adelieBiscoeSdF
[1] 343.4707
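The female means above (3395.8 g on Torgersen, 3369.3 g on Biscoe) differ by about 26.5 g, much less than either standard deviation. Base R's aggregate() can produce both summaries per island in one pass. This sketch uses only the first six female body masses printed for each island above, so its numbers will not match the full-data results.

```r
# First six female Adelie body masses (g) printed above for each island
toy <- data.frame(
  island = rep(c("Torgersen", "Biscoe"), each = 6),
  body_mass_g = c(3800, 3250, 3450, 3625, 3200, 3700,
                  3400, 3800, 3800, 3200, 3150, 3500)
)

# aggregate() splits body_mass_g by island and applies FUN to each piece
means <- aggregate(body_mass_g ~ island, data = toy, FUN = mean)
sds   <- aggregate(body_mass_g ~ island, data = toy, FUN = sd)

# Join the two summaries on island; suffixes label the mean and sd columns
summary_by_island <- merge(means, sds, by = "island",
                           suffixes = c("_mean", "_sd"))
print(summary_by_island)
```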
  8. What is the maximum bill depth of penguins for each island?
billByIsland <- penguins %>% 
  group_by(island) %>% 
  summarize(maxDepth=max(bill_depth_mm)) 
billByIsland
# A tibble: 3 × 2
  island    maxDepth
  <fct>        <dbl>
1 Biscoe        21.1
2 Dream         21.2
3 Torgersen     21.5
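max() works directly in the key only because na.omit() removed the missing values earlier. A base-R sketch with hypothetical bill depths (made-up values) shows what na.rm = TRUE changes when a group contains an NA:

```r
# Hypothetical bill depths (mm) with one missing value on Torgersen
bill_depth_mm <- c(18.7, 17.4, NA, 21.5, 19.2, 21.1)
island <- c("Torgersen", "Torgersen", "Torgersen", "Torgersen",
            "Biscoe", "Biscoe")

# Without na.rm = TRUE, any NA in a group makes that group's max NA
max_keep_na <- tapply(bill_depth_mm, island, max)
print(max_keep_na)

# Extra arguments after the function are passed on to max()
max_drop_na <- tapply(bill_depth_mm, island, max, na.rm = TRUE)
print(max_drop_na)
```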
  9. What percentage of the penguins in the entire dataset are female?
# n()/nrow(penguins) is a proportion: 0.495 means about 49.5% of all penguins are female
femalePerc <- penguins %>% 
  group_by(sex) %>% 
  summarise(percent=n()/nrow(penguins))

femalePerc
# A tibble: 2 × 2
  sex    percent
  <fct>    <dbl>
1 female   0.495
2 male     0.505
  10. During which year did the scientists measure the most penguins? (Hint: how many penguins are in the dataset per year?)
# count penguins per year: 2009 has the most (117)
yearCounts <- penguins %>% 
  group_by(year) %>% 
  summarize(count=n())
yearCounts
# A tibble: 3 × 2
   year count
  <int> <int>
1  2007   103
2  2008   113
3  2009   117
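Reading the answer off the table works for three years; with many groups, which.max() can pick the largest count. A base-R sketch on hypothetical years (made-up values):

```r
# Hypothetical survey years, one entry per measured penguin
year <- c(2007, 2007, 2008, 2009, 2009, 2009, 2008)

# table() counts penguins per year
counts <- table(year)
print(counts)

# which.max() returns the position of the largest count; names() gives its year
busiest_year <- names(which.max(counts))
print(busiest_year)
```

which.max() returns the first maximum, so a tie would report only the earlier year.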
  11. Which species of penguin live on each island?
speciesByIsland <- penguins %>% 
  group_by(species) %>% 
  reframe(penguinSp=unique(island)) # reframe() allows several rows per group; summarise() now warns about this
speciesByIsland
# A tibble: 5 × 2
  species   penguinSp
  <fct>     <fct>    
1 Adelie    Torgersen
2 Adelie    Biscoe   
3 Adelie    Dream    
4 Chinstrap Dream    
5 Gentoo    Biscoe   
# on Torgersen
# Adelie only 

# on Biscoe
# Adelie and Gentoo

# on Dream
#Adelie and Chinstrap
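A cross-tabulation answers the question from the island side and also shows how many penguins of each species live on each island. A base-R sketch on hypothetical island/species pairs (made-up values); with dplyr, count(penguins, island, species) gives a similar long-format count.

```r
# Hypothetical island/species pairs, one entry per penguin
island  <- c("Torgersen", "Biscoe", "Biscoe", "Dream", "Dream", "Torgersen")
species <- c("Adelie", "Adelie", "Gentoo", "Adelie", "Chinstrap", "Adelie")

# Rows are islands, columns are species, cells count penguins
counts <- table(island, species)
print(counts)

# split() groups the species labels by island; unique() lists each species once
species_on_island <- lapply(split(species, island), unique)
print(species_on_island)
```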