# Here's how you start:
climate_data <- climate_data_longer |>
  pivot_wider(names_from = _______, values_from = ______)
Data Transformation in R
Introduction
In this tutorial, we’ll explore several essential tidyverse functions for data transformation, using environmental data. The data are fictional measurements from different countries concerning carbon emissions, temperature changes, and participation in various environmental treaties.
Here are the functions you will need to use:
bind_rows(): This function is used to combine two or more data frames by row, binding them together. Syntax: combined_data <- bind_rows(df1, df2)
pivot_longer(): Reshape data when variables are spread across columns. Syntax: pivot_longer(data, cols_to_pivot, names_to = "name", values_to = "value")
pivot_wider(): Reshape data when you need to spread rows across new columns. Syntax: pivot_wider(data, names_from = "category", values_from = "amount")
left_join(): Joins two datasets, retaining all observations from the first (left) data frame. Syntax: left_join(df1, df2, by = "key_variable")
anti_join(): Identifies rows in one data frame that do not have a corresponding match in another data frame. Syntax: result <- anti_join(df1, df2, by = "key_variable")
Our Dataset
Let’s create our synthetic dataset first.
This dataset includes fictional information about carbon emissions and temperature changes over four years for various countries, as well as whether they participated in environmental treaties. Here is your first data frame, called climate_data_longer:
Exercises
Here are some practice exercises. You can check if your code works using the “check your work” button after each problem.
If you get stuck, you can also click for hints and for an example answer. Remember, there are many ways to solve each problem, so the code in the “answer” box isn’t the only way to solve it!
Exercise 1: Widen the Data
This data is too long, with multiple variables sharing the same columns. Convert climate_data_longer to a wider format, where each column represents a different type of measurement. Use the current measurement variable to create new variable names. Name the wider data frame climate_data.
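As a rough illustration of the widening step, here is a minimal sketch on a toy data frame. The object names (toy_long, toy_wide) and the column names (country, year, measurement, value) are assumptions made up for this example; they are not necessarily the names used in climate_data_longer.

```r
library(tidyr)

# Hypothetical long-format data: one row per country, year, and measurement type.
# These names and values are invented for illustration only.
toy_long <- tibble(
  country     = c("A", "A", "B", "B"),
  year        = c(2020, 2020, 2020, 2020),
  measurement = c("carbon_emissions", "temperature_change",
                  "carbon_emissions", "temperature_change"),
  value       = c(5.1, 0.8, 3.4, 1.1)
)

# Each distinct value of `measurement` becomes its own column,
# and the cells are filled from `value`.
toy_wide <- toy_long |>
  pivot_wider(names_from = measurement, values_from = value)

toy_wide
```

In this sketch the four long rows collapse into two wide rows, one per country and year, with one column per measurement type.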
Exercise 2: Lengthen the Data
For practice purposes, can you convert climate_data back to its original, longer format, where one column represents the type of measurement (carbon emission or temperature change), and another column represents the values? Call this new data frame climate_data_reversion.
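The reverse operation might look like the sketch below, again on a toy data frame. The object and column names are hypothetical and chosen only for this example; the real climate_data may use different ones.

```r
library(tidyr)

# Hypothetical wide-format data: one column per measurement type.
toy_wide <- tibble(
  country            = c("A", "B"),
  year               = c(2020, 2020),
  carbon_emissions   = c(5.1, 3.4),
  temperature_change = c(0.8, 1.1)
)

# Stack the two measurement columns: their names go into `measurement`
# and their contents go into `value`.
toy_long <- toy_wide |>
  pivot_longer(
    cols      = c(carbon_emissions, temperature_change),
    names_to  = "measurement",
    values_to = "value"
  )

toy_long
```

Columns not listed in cols (here country and year) are kept as identifiers and repeated for each stacked row.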
Exercise 3: Adding Rows
Suppose we have measured data for another country, “Elbonia,” and want to add this data to our existing climate_data. Use bind_rows() to do this.
First, we will create the new data with the following code:
Now, bind this new data to climate_data and call your new data set all_climate_data.
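For reference, a minimal sketch of row binding on toy data follows. The data frames and their columns are invented for illustration and do not reproduce the Elbonia data. The second data frame deliberately lacks one column to show how bind_rows() handles that case.

```r
library(dplyr)

# Hypothetical existing data.
toy_existing <- tibble(
  country            = c("A", "B"),
  year               = c(2020, 2020),
  carbon_emissions   = c(5.1, 3.4),
  temperature_change = c(0.8, 1.1)
)

# Hypothetical new rows; note that temperature_change is absent here.
toy_new <- tibble(
  country          = "C",
  year             = 2020,
  carbon_emissions = 2.7
)

# bind_rows() matches columns by name and fills columns
# missing from one input with NA.
toy_all <- bind_rows(toy_existing, toy_new)

toy_all
```

Because columns are matched by name rather than by position, the column order of the two inputs does not need to agree.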
Exercise 4: Joining Data
We also have another dataset which tells us which countries have signed an environmental treaty. The dataset is called treaty_data.
Let’s find out which countries in our climate_data have signed an environmental treaty using left_join().
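A left join can be sketched as follows. The key column country and the signed_treaty column are assumptions for this example; check the actual column names in climate_data and treaty_data before adapting it.

```r
library(dplyr)

# Hypothetical climate measurements.
toy_climate <- tibble(
  country          = c("A", "B", "C"),
  carbon_emissions = c(5.1, 3.4, 2.7)
)

# Hypothetical treaty information; country "D" has no climate data,
# and country "C" has no treaty record.
toy_treaty <- tibble(
  country       = c("A", "B", "D"),
  signed_treaty = c(TRUE, FALSE, TRUE)
)

# Keep every row of toy_climate and attach matching treaty columns.
toy_joined <- left_join(toy_climate, toy_treaty, by = "country")

toy_joined
```

Every row of the left data frame is kept: country "C" gets NA for signed_treaty, while country "D" does not appear because it exists only in the right data frame.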
Exercise 5: Anti-Join
Now, identify which countries listed in treaty_data are not present in climate_data using anti_join(). Save the mismatched countries in an object called missing_countries.
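The argument order matters for anti_join(): it returns the rows of the first data frame that have no match in the second. Here is a minimal sketch on toy data, with hypothetical object and column names.

```r
library(dplyr)

# Hypothetical climate measurements.
toy_climate <- tibble(
  country          = c("A", "B", "C"),
  carbon_emissions = c(5.1, 3.4, 2.7)
)

# Hypothetical treaty information, including a country ("D")
# that has no climate data.
toy_treaty <- tibble(
  country       = c("A", "B", "D"),
  signed_treaty = c(TRUE, FALSE, TRUE)
)

# Rows of toy_treaty whose country does not appear in toy_climate.
toy_missing <- anti_join(toy_treaty, toy_climate, by = "country")

toy_missing
```

Swapping the two arguments would instead return the rows of toy_climate with no treaty record (country "C" in this sketch).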
Conclusion
Great work! In this tutorial, you’ve learned how to use various tidyverse functions to manipulate a climate change dataset. These functions are versatile and can be applied to many data manipulation tasks in R. Keep practicing and exploring their different parameters and capabilities.