<- c(1:10) #numbers<- c(1,2,3,4,5,6,7,8,9,10) also works! numbers
Introduction to Base R
Welcome
Welcome to this interactive tutorial for students new to base R. You’ll learn about creating vectors, data frames, summarizing data, understanding data structures, and basic plotting. This tutorial also introduces the use of the $ operator to reference variables in dataframes.
Functions in Focus
Before we dive into the exercises, let’s introduce the key functions we will be working with and provide some guidance on their syntax:
c()
: This function is used for combining values into a vector. It is a fundamental function in R, often used in data analysis for creating vectors which can be used as input for other functions or for data manipulation.
Syntax: c(element1, element2, ...)
summary()
: This function provides statistical summaries of the contents of an R object, such as mean, median, min, max, and quartiles for numeric data, or frequency counts for factors. It’s a quick way to get an overview of your data.
Syntax when used with a dataframe: summary(df)
plot()
: This function is used for creating basic types of graphs such as scatterplots, line plots, histograms, etc. It’s a versatile function for quick and easy visualization of data in a dataframe.
Syntax for a simple scatter plot: plot(df$column1, df$column2)
(plotting data from two columns of df against each other) These functions are essential building blocks in R for data manipulation and analysis. They provide foundational capabilities for handling, exploring, and visualizing data.
Bracket Indexing Review
Indexing in base R is a fundamental concept for accessing and manipulating elements within vectors and dataframes. It primarily uses brackets []
to specify the elements you want to extract or modify.
For Vectors:
Indexing by Position: You can access elements of a vector by their position. For instance, if
v
is a vector,v[2]
will give you the second element of v.Logical Indexing: You can also use logical vectors to index. If
v
is a vectorv[v>3]
will return elements ofv
greater than 3.
For Dataframes: Dataframes can be indexed by row and column.
- Indexing by Position: To access specific elements, you can specify the row and column numbers. For instance,
df[2, 3]
will give you the element in the second row and third column of the dataframedf
. - Logical Indexing: Similar to vectors, you can use logical statements for indexing. For example,
df[df$column_name > 10, ]
will give you all rows ofdf
where the values incolumn_name
are greater than 10.
Our Dataset
We will work with a dataset with size measurements for 3 different species of penguins on islands in the Palmer Archipelago, Antarctica. We’ll be focusing on the measurements for bill length (mm) and body mass (g). Here’s more information about the data set!
Here is a preview of the data you will be working with:
Exercises
Here are some practice exercises. You can check if your code works using the “check your work” button after each problem.
If you get stuck, you can also click for hints and for an example answer. Remember, there’s many ways to solve each problem, so the code in the “answer” box isn’t the only way to solve it!
Exercise 1: Creating a Vector
Create a vector named numbers containing the first ten numbers.
Exercise 2
Exercise 2: Summarizing Data. Apply the summary()
function to penguin_sample
.
Exercise 3
Create a simple plot of bill_length
vs body_mass
from penguin_sample
.
Exercise 4
Using the $ Operator: Extract the bill_length
column from penguin_sample
using the $ operator. Assign it to a new stand-alone vector called bill_length
.
Exercise 5
Using the $ operator to reference the vector (variable) bill_length
in the dataframe penguin_sample
, create a new object called long_bill
that takes on only the bill length values of survey respondents greater than or equal to 50mm.
Exercise 6
This time, extract the entire dataset (all columns, including body mass and species info) for the group of penguins with a bill length over 50mm. Call this object long_bill_data
.
Congratulations, you finished this tutorial!