Biodiversity tutorial
Grace Kazmir, edited for web version by Olivia Spagnuolo
Introduction
Today we’ll be looking at biodiversity!
We’re going to use coding to help us think about biodiversity and make graphs.
What is biodiversity?
Biodiversity is the variety of all life on Earth. It is often measured by the number of species in an ecosystem.
Here’s a video to learn more about biodiversity and how we can calculate it using Simpson’s Diversity Index.
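As a rough illustration of the calculation, here is a minimal sketch in R of one common form of Simpson’s Diversity Index: D = 1 - sum(n(n-1)) / (N(N-1)), where n is the number of individuals of each species and N is the total number of individuals. The function name simpson_index and the counts below are made up for illustration, and the video may present a slightly different version of the formula.

```r
# One common form of Simpson's Diversity Index (an assumption; other forms exist).
# counts: number of individuals observed for each species
simpson_index <- function(counts) {
  N <- sum(counts)  # total number of individuals
  1 - sum(counts * (counts - 1)) / (N * (N - 1))
}

# Made-up example counts, not real field data
even_forest   <- c(10, 5, 5)  # individuals spread across three species
uneven_forest <- c(18, 1, 1)  # one species dominates

simpson_index(even_forest)    # higher value: more diverse
simpson_index(uneven_forest)  # lower value: less diverse
```

Both forests have three species and 20 individuals, but the index is higher when the individuals are spread more evenly across species.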
Collecting data
To better understand the biodiversity in an ecosystem, scientists perform field work where they record information about an ecosystem. Let’s learn about the Borneo rainforest, where scientists collect data about the plants, animals, and ecological factors there.
After scientists collect all this data, other people analyze it. They import and process data and then generate graphs and perform statistical analyses to learn more about what is happening in that ecosystem.
Today, we’re going to learn more about how to analyze data. We’ll use a coding language called R to process and analyze biodiversity data.
What is R and what can it do?
R is a programming language that helps us work with large amounts of data.
You can import data from various sources, clean and process it, perform statistical analyses, and generate reports or visualizations.
Today, we are going to create graphs from a big data set!
Fun Fact: R is like a fancy calculator
You can use R!
This tutorial lets you use R to work with data. Throughout the tutorial, there are gray boxes with a green “run code” button. These are called code chunks. You can type commands into these boxes and it will tell the computer to do things! Let’s try it out.
Press the green play button on the top right of the code chunk.
(The code chunk is the gray rectangular box below.)
You Try: Replace 4+3 with any math expression.
Press the green play button when done.
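If you are not sure what to type, here are a few expressions you might try. This is only a sketch of the kind of thing a code chunk can hold; the numbers are arbitrary examples.

```r
# R works like a calculator: type an expression and run it
4 + 3         # addition
(12 - 2) * 5  # parentheses are evaluated first
2^3           # 2 raised to the power of 3
```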
Data Set
We are going to be using data from BioTIME, which is a database with a focus on biodiversity. They include free open-access data in an effort to make data more accessible and usable by anyone.
- We are specifically looking at data on the Biodiversity of Terrestrial Plants and Invertebrates in the Temperate Coniferous Forest and in Tropical Forests.
We’ve pre-loaded a data set from the BioTIME data into this tutorial.
Biodiversity_Index is the name of the data set.
Let’s look at it!
The kable() function displays a data set as a formatted table so you can look at it.
Press the green play button to run the code.
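To show roughly what kable() does outside this tutorial, here is a minimal sketch that assumes the knitr package is installed. Biodiversity_Index is only available inside the tutorial, so the sketch uses a tiny made-up data frame called toy; its numbers are invented and are not BioTIME values.

```r
library(knitr)  # kable() comes from the knitr package

# Hypothetical toy data with made-up numbers (NOT the real BioTIME values)
toy <- data.frame(
  YEAR = c(2000, 2001, 2000, 2001),
  Biome = c("Tropical Forest", "Tropical Forest",
            "Temperate Coniferous Forest", "Tropical Forest"),
  Taxa = c("Terrestrial Plants", "Terrestrial Plants",
           "Terrestrial Plants", "Terrestrial Invertebrates"),
  numspecies = c(12, 15, 30, 28),
  index = c(0.71, 0.75, 0.88, 0.86)
)

# Turn the data frame into a formatted table and display it
tbl <- kable(toy)
tbl
```

Notice the vertical lines | separating the columns in the output, just like in the tutorial’s table.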
This is A LOT of data!! It can be hard to understand by just looking at it. But luckily we can use R/the computer to better understand the data. That’s what we’ll do in the next sections.
First, though, let’s see what we can observe just by looking at the data.
Stop and Think:
How many columns does this table have? How many rows?
✗ 148 columns and 11 rows
✗ 11 columns and 10 rows
✓ 11 columns and 148 rows
✗ 52 columns and 14 rows
At the top you’ll see the names of all the columns. Try scrolling all the way to the bottom of the data to see how many rows there are.
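Counting by scrolling works, but R can also count for you. Here is a small sketch using the base R functions ncol(), nrow(), and names() on a made-up data frame called toy (the numbers are invented, not BioTIME values). In the tutorial you could try the same functions on Biodiversity_Index.

```r
# Hypothetical toy data with made-up numbers (NOT the real BioTIME values)
toy <- data.frame(
  YEAR = c(2000, 2001, 2000, 2001),
  Biome = c("Tropical Forest", "Tropical Forest",
            "Temperate Coniferous Forest", "Tropical Forest"),
  Taxa = c("Terrestrial Plants", "Terrestrial Plants",
           "Terrestrial Plants", "Terrestrial Invertebrates"),
  numspecies = c(12, 15, 30, 28),
  index = c(0.71, 0.75, 0.88, 0.86)
)

ncol(toy)   # number of columns
nrow(toy)   # number of rows
names(toy)  # the column names
```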
Choosing our columns
How can we narrow down all the data and variables? Let’s consider our goal, which is to graph biodiversity.
We will focus on the biodiversity index and the number of different species to help us understand biodiversity in this data set.
Stop and Think:
Discuss these questions with your neighbor:
How do biodiversity index and number of different species represent biodiversity in an area?
What is the difference between biodiversity index and number of different species?
Why would we want to look at both biodiversity index and number of different species?
Dataset format
- The column names are variables in the data.
- The rows show individual data entries.
Which columns in the dataset do you think represent biodiversity index and number of different species?
✗ Taxa and total
✗ Biome and total
✗ entry and YEAR
✓ index and numspecies
Go back to the Biodiversity_Index data set code chunk to look at the columns again! At the top you’ll see the names of all the columns. Each column is separated by a vertical line | .
Narrowing down the data
We’ve determined that we want to work with the index and numspecies columns.
The index column represents the biodiversity index per year.
The numspecies column represents the number of different species observed per year.
Choosing the biome and taxa
Now let’s look closer at each biome and taxa in the data set. In order to make a better graph, we want to narrow down the data we’re looking at. We’ll need to choose a biome and taxa to focus on.
Stop and Think
How many different biomes does the data set have?
(Go back and look at the Biodiversity_Index data set to try to find the answer.)
Notice this is HARD to find by just looking at the data set, especially if the data set is huge.
Let’s use R to answer this question instead! The unique() function lists the distinct values within a column, showing each value only once.
Press the green play button to run the code.
There are two biomes in this data set: Temperate Coniferous Forest and Tropical Forest.
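Here is a minimal sketch of unique() on a made-up data frame called toy (invented numbers, not BioTIME values). The $ picks one column out of the data set, the same way it does with Biodiversity_Index in the tutorial.

```r
# Hypothetical toy data with made-up numbers (NOT the real BioTIME values)
toy <- data.frame(
  YEAR = c(2000, 2001, 2000, 2001),
  Biome = c("Tropical Forest", "Tropical Forest",
            "Temperate Coniferous Forest", "Tropical Forest"),
  Taxa = c("Terrestrial Plants", "Terrestrial Plants",
           "Terrestrial Plants", "Terrestrial Invertebrates"),
  numspecies = c(12, 15, 30, 28),
  index = c(0.71, 0.75, 0.88, 0.86)
)

# "Tropical Forest" appears three times in the column,
# but unique() lists it only once
biomes <- unique(toy$Biome)
biomes
length(biomes)  # how many different biomes there are
```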
Unique Taxa
How many different taxa does the data set have?
- Now you know what code to use to find unique values within a column. See if you can reuse the code to find the unique values for the Taxa column.
Edit this code to find the unique values for the Taxa column. Replace the red line ______ with the column name we’re looking at.
Press the green play button to run the code chunk once you edit the code.
The code should look like unique(Biodiversity_Index$"Taxa")
There are two different taxa in this data set: Terrestrial Plants and Terrestrial Invertebrates.
Now let’s combine all the columns we just looked at: the index, total, Biome, and Taxa columns. Let’s also add the YEAR column, since it’s the independent variable we will be using in our graphs. Below, we’ll look at just the columns we want to graph:
Press Green play button to run the code chunk
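One way to keep only some columns is the select() function from the dplyr package. This is a sketch under assumptions: the tutorial’s own chunk may use a different function or a different column list, and toy is a made-up data frame (invented numbers, not BioTIME values) with an extra entry column to drop.

```r
library(dplyr)  # select() comes from the dplyr package

# Hypothetical toy data with made-up numbers (NOT the real BioTIME values)
toy <- data.frame(
  entry = 1:4,
  YEAR = c(2000, 2001, 2000, 2001),
  Biome = c("Tropical Forest", "Tropical Forest",
            "Temperate Coniferous Forest", "Tropical Forest"),
  Taxa = c("Terrestrial Plants", "Terrestrial Plants",
           "Terrestrial Plants", "Terrestrial Invertebrates"),
  numspecies = c(12, 15, 30, 28),
  index = c(0.71, 0.75, 0.88, 0.86)
)

# Keep only the named columns; the entry column is dropped
small <- toy |> select(YEAR, Biome, Taxa, numspecies, index)
small
```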
Now that you have extracted the columns you need, let’s graph!
Creating a Graph
We have A LOT of data. It would be really hard to graph by hand. Luckily, we can use the computer (R) to graph it.
Our goal is to create a graph that tracks the biodiversity of a specific location over time. We have two methods of measuring biodiversity: the number of species and the biodiversity index. We will also narrow down our data by choosing a specific biome and taxa.
Stop and Think
Before you make the graph, make a prediction about what it will look like:
How do you think biodiversity is changing over time?
- Will there be more or less biodiversity now compared to 50 years ago?
Will biodiversity look different in different biomes and taxa? How?
Go through the following steps to create your graph.
Step 1: Choosing a Biome and Taxa
First, choose a biome and taxa.
Here are your options again:
Biomes
Temperate Coniferous Forest
Tropical Forest
Taxa
Terrestrial Plants
Terrestrial Invertebrates
Now, you’ll need to tell the computer what you chose. In the code chunk below:
Type in the name of your Biome and Taxa. Replace the red line ______ and don’t delete the quotes "". Make sure you spell everything correctly and use the exact capitalization in the dataset!
Then press “run code”.
Run this code. If it outputs the Biome and Taxa you chose spelled and capitalized EXACTLY as they are spelled in the data, then you can continue!
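Why does exact spelling matter? R compares text character by character, so "tropical forest" and "Tropical Forest" count as different values. The sketch below shows this with the %in% operator on a made-up data frame called toy. The variable names myBiome and myTaxa are hypothetical; the tutorial’s chunk may use different names.

```r
# Hypothetical toy data with made-up numbers (NOT the real BioTIME values)
toy <- data.frame(
  YEAR = c(2000, 2001, 2000, 2001),
  Biome = c("Tropical Forest", "Tropical Forest",
            "Temperate Coniferous Forest", "Tropical Forest"),
  Taxa = c("Terrestrial Plants", "Terrestrial Plants",
           "Terrestrial Plants", "Terrestrial Invertebrates"),
  numspecies = c(12, 15, 30, 28),
  index = c(0.71, 0.75, 0.88, 0.86)
)

# Hypothetical variable names for your choices
myBiome <- "Tropical Forest"
myTaxa  <- "Terrestrial Plants"

# TRUE means the spelling matches a value in the column
myBiome %in% unique(toy$Biome)
myTaxa %in% unique(toy$Taxa)

# FALSE: the capitalization does not match the data
"tropical forest" %in% unique(toy$Biome)
```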
Step 2: Choosing Your X and Y variables
In the previous step, you narrowed down what specific data you want to look at.
Now, you’ll need to choose what variables you want to plot on the graph. Recall our goal is to plot the biodiversity index OR number of species over time.
Stop and Think
What is the X variable? What is the Y variable?
✓ X variable: time; Y variable: biodiversity index/number of species
✗ X variable: biodiversity index/number of species; Y variable: time
Remember:
The x variable is the independent variable. It is NOT affected by the other variable. It’s like the “cause” variable.
The y variable is the dependent variable. It IS affected by the other variable. It’s like the “effect” variable.
The x variable is time because it is independent - it will change and progress no matter what!
The y variable is biodiversity index or number of species (whichever you choose) because it is dependent on time. As time moves forward, the biodiversity index will also change.
You Choose your Variable
Now, you get to input which y variable you want to graph - either biodiversity index or number of species.
Recall the column names for each of these variables: index and numspecies.
Type in the name of your y variable below. Replace the red line _____ with its name. Be sure to spell it exactly as it is spelled in the data!
Then press “run code”.
Everyone’s x variable will be year so you don’t need to change that.
Step 3: Choosing the type of graph
In this tutorial, we’re going to make a bar graph. A bar graph allows us to see how biodiversity changes over time in a simple format. Bar Graphs are used to display and compare the values of different categories and color can help distinguish the different categories.
If you want to try making a different kind of graph, refer to the Challenge coding section.
Step 4: Assigning Color to Bar Graph
You can now choose a custom color for your bar graph. Here are some color options:
"darkseagreen"
"salmon3"
"seagreen"
"brown"
"blue"
"yellow"
"purple"
Now, type the color name in place of the red line ______. Be sure to keep it inside the quotes "".
Then click “run code”.
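R only understands color names it already knows. The built-in colors() function lists all of them, so you can check a name before using it. This short sketch checks the options above; the vector name color_options is made up for illustration.

```r
# The color options from this step
color_options <- c("darkseagreen", "salmon3", "seagreen",
                   "brown", "blue", "yellow", "purple")

# colors() returns every color name R knows;
# %in% checks each option against that list
color_options %in% colors()

# Spelling matters here too: this is not a name R knows
"Dark Sea Green" %in% colors()
```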
Step 5: Adding a Title and Axis Labels
Now, you can add a title and axis labels to your graph.
- A good graph should always have at least a Title and Axis labels (X-Axis, Y-axis)
In the code chunk below, add a Title, Subtitle, x-axis label, and y-axis label. Then click “run code”.
Be sure to keep quotations "" around each name. Example: xLabel="year"
Step 6: Create the graph!
Now, you can create your graph! Click “run” on the code chunk below.
Did you get an error message in red?
If the graph also displayed and looks right, don’t worry about the error message!
If the graph is not displaying, go back and make sure you ran all the code chunks previously where you defined the variables and labels.
Did you put everything in quotes when you defined the variables and labels?
Did you press “Run code” in each code chunk?
If it still isn’t working or if you accidentally deleted something, you can click the “refresh” button at the top of a code chunk. Or you can refresh the page to fully restart, but this will delete all your work.
Stop and think
Talk with your group about the graphs!
What patterns do you notice?
Do you see any outliers?
How is the data changing over time?
What do you wonder?
Does this data give you all the information you want? What is the data missing?
What else would you like to know?
Congratulations!
You have created a data visualization from a big data set!
Stop and Think:
What were some challenges?
What have you learned about Data Science?
What other kind of data would you like to look at in the future?
What kind of graphs do you want to make or see more of?
If you want to learn more about the code we used to create the graph, you can click “continue” to go to the Code Challenge. This is OPTIONAL. Otherwise you are all done! Great work!!
Code challenge
In this code challenge, we’ll have you directly edit the entire code chunk used to create your graph. Previously, you inputted your variables and we then put them into the graph code. So this time, you get to put your variables directly into the graph code and learn about what each command does!
Here’s the code we used to create your graph. We’ll go over it in detail below.
Step 1: Selecting a data set to graph
Let’s look at line 1: Biodiversity_Index |>
This line is telling the computer which data set we want to create a graph from.
You do not need to make any changes to this line.
Step 2: Identifying which Biome and which Taxa
Looking at line 2: filter(Biome=="Insert Biome Name Here", Taxa=="Insert Taxa Name Here")
This line is telling R which specific rows you want displayed in your graph based on the categories you tell it.
The filter() function picks the rows you want to keep based on the instructions you give it.
Since there are multiple biomes and taxa within the data set, we need to specify which ones we want to graph.
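To see filter() at work, here is a minimal sketch that assumes the dplyr package is installed. It uses a made-up data frame called toy (invented numbers, not BioTIME values) in place of Biodiversity_Index. The double equals sign == means "is exactly equal to".

```r
library(dplyr)  # filter() comes from the dplyr package

# Hypothetical toy data with made-up numbers (NOT the real BioTIME values)
toy <- data.frame(
  YEAR = c(2000, 2001, 2000, 2001),
  Biome = c("Tropical Forest", "Tropical Forest",
            "Temperate Coniferous Forest", "Tropical Forest"),
  Taxa = c("Terrestrial Plants", "Terrestrial Plants",
           "Terrestrial Plants", "Terrestrial Invertebrates"),
  numspecies = c(12, 15, 30, 28),
  index = c(0.71, 0.75, 0.88, 0.86)
)

# Keep only the rows where BOTH conditions are true
kept <- toy |>
  filter(Biome == "Tropical Forest", Taxa == "Terrestrial Plants")
kept
```

Only the rows that match both the biome and the taxa remain; every other row is left out.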
You Try:
Based on your assigned graph, insert the corresponding Biome and Taxa in the code chunk below.
Be sure you spell the Biome and Taxa exactly as it is spelled in the dataset with proper capitalizations. Here are the options:
- Biome:
- “Temperate Coniferous Forest”
- “Tropical Forest”
- Taxa:
- “Terrestrial Invertebrates”
- “Terrestrial Plants”
DO NOT Press play!
Make sure to keep your answers in the quotations ""!
Step 3: Choose your X and Y variables
Looking at line 3: ggplot(aes(x= YEAR, y="numspecies or index"))
The ggplot() function is used to create a wide variety of data visualizations, including scatter plots, bar graphs, line graphs, and many others.
Within the ggplot() function we use the aes() function to assign the features of the graph (the X and Y values).
The X value (x=YEAR) will be the same for all eight graphs, so DO NOT change that part.
You Try:
Based on your assigned graph, insert the corresponding y value in the code chunk below.
This should be either numspecies if your graph is looking at the number of species, or index if your graph is looking at the biodiversity index.
NOTE: In this part it is important to NOT include quotations "". Example: y=numspecies or y=index
DO NOT press the green play button.
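Here is a small sketch of ggplot() and aes() with the y value written without quotes. It assumes the ggplot2 package is installed and uses a made-up data frame called toy_plants (invented numbers, not BioTIME values). Without quotes, R looks for a column with that name; with quotes, R would treat it as a piece of text instead of a column.

```r
library(ggplot2)

# Hypothetical toy data with made-up numbers (NOT the real BioTIME values)
toy_plants <- data.frame(
  YEAR = c(2000, 2001, 2002),
  numspecies = c(12, 15, 14),
  index = c(0.71, 0.75, 0.73)
)

# x and y are column names, written WITHOUT quotes
p <- ggplot(toy_plants, aes(x = YEAR, y = numspecies)) +
  geom_col()
p
```

To graph the biodiversity index instead, you would change y = numspecies to y = index.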
Step 4: Choosing the type of graph
Looking at line 5, this is where we choose the type of graph we want to use. Currently it says geom_col(________) with some words inside the parentheses that you can ignore for now.
You can change the part where it says col to adjust the type of plot. For example, you can create a scatter plot by replacing col with point. Here’s how that would look:
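A minimal sketch of a scatter plot, assuming the ggplot2 package is installed and using a made-up data frame called toy_plants (invented numbers, not BioTIME values). The only change from a bar graph is geom_point in place of geom_col. The color choice is just an example.

```r
library(ggplot2)

# Hypothetical toy data with made-up numbers (NOT the real BioTIME values)
toy_plants <- data.frame(
  YEAR = c(2000, 2001, 2002),
  numspecies = c(12, 15, 14)
)

# geom_point() draws one dot per row instead of one bar per row
p_points <- ggplot(toy_plants, aes(x = YEAR, y = numspecies)) +
  geom_point(color = "seagreen")
p_points
```

For dots, the color setting controls the dot color, so it is used here instead of fill.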
Step 5: Assigning corresponding color to the bars
Looking at line 5: geom_col(fill= "Insert Color Name Here", color="grey37")
The geom_col() function is used to tell R that we want to make a bar graph.
Bar graphs are used to display and compare the values of different categories, and color can help distinguish the different categories.
Within the geom_col() function we can specify the color of the bars using the code fill="Name of Color"
Ignore color="grey37". That is specifying the color for the border of the bars. DO NOT change that.
You can now choose a custom color for your bar graph. Here are some color options:
"darkseagreen"
"salmon3"
"seagreen"
"brown"
"blue"
"yellow"
"purple"
Your Turn:
Once you’ve identified which color you need to use, insert the color where it says "Insert Color Name Here".
NOTE: It is important to keep the quotations ""
Step 6: Labels
Looking at lines 5, 6, and 7: labs(title = "Insert Title Name", subtitle = "Insert Subtitle", x= "Year", y= "Number of species or Biodiversity index")
The labs() function is used to specify labels on the graph (title, subtitle, axis labels, captions, and more).
A good graph should always have at least a title and axis labels (x-axis, y-axis).
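Here is a short sketch of labs() added to a bar graph. It assumes the ggplot2 package is installed and uses a made-up data frame called toy_plants (invented numbers, not BioTIME values). The title and subtitle text are only examples.

```r
library(ggplot2)

# Hypothetical toy data with made-up numbers (NOT the real BioTIME values)
toy_plants <- data.frame(
  YEAR = c(2000, 2001, 2002),
  numspecies = c(12, 15, 14)
)

# Every label is text, so each one goes inside quotes
p_labeled <- ggplot(toy_plants, aes(x = YEAR, y = numspecies)) +
  geom_col(fill = "darkseagreen", color = "grey37") +
  labs(title = "Number of Plant Species Over Time",
       subtitle = "Made-up example data",
       x = "Year",
       y = "Number of species")
p_labeled
```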
Your Turn:
Insert the title of your graph where it says "Insert Title Name".
Insert the subtitle of your graph where it says "Insert Subtitle".
Keep the x-axis labeled "Year". DO NOT change this code.
Pick "Number of species" or "Biodiversity index" as your y-axis label based on your corresponding graph.
NOTE: Keep quotations "" around each name. Example: x="Year"
DO NOT press the green play button.
Step 7: Putting all the lines together!
Now that you have edited all the necessary code lines, let’s put them together to create our graphs.
Your Turn:
- Look back at your work in steps 1-5. Edit the following code based on your work in those steps.
If you’re having trouble with the code, click below to see the answers and edit your code accordingly.
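For reference, here is one possible filled-in version of the full graph code. It is a sketch under assumptions, not the tutorial’s answer key: it needs the dplyr and ggplot2 packages, it runs on a made-up data frame called toy (invented numbers, not BioTIME values) in place of Biodiversity_Index, and the biome, taxa, color, and labels are just one set of choices.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data with made-up numbers (NOT the real BioTIME values)
toy <- data.frame(
  YEAR = c(2000, 2001, 2000, 2001),
  Biome = c("Tropical Forest", "Tropical Forest",
            "Temperate Coniferous Forest", "Tropical Forest"),
  Taxa = c("Terrestrial Plants", "Terrestrial Plants",
           "Terrestrial Plants", "Terrestrial Invertebrates"),
  numspecies = c(12, 15, 30, 28),
  index = c(0.71, 0.75, 0.88, 0.86)
)

my_graph <- toy |>                                   # which data set to use
  filter(Biome == "Tropical Forest",
         Taxa == "Terrestrial Plants") |>            # which rows to keep
  ggplot(aes(x = YEAR, y = numspecies)) +            # x and y variables
  geom_col(fill = "seagreen", color = "grey37") +    # bar graph and its colors
  labs(title = "Plant Species in the Tropical Forest",
       subtitle = "Made-up example data",
       x = "Year",
       y = "Number of species")                      # title and axis labels

my_graph
```

Notice that the data steps are joined with |> while the graph layers are joined with +.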
Congratulations!
You have created a data visualization from a big data set!
Nice work!