Home

1 What is ggplot2?

The grammar of graphics package in R called ggplot2 is used to generate data visualizations. It breaks down the components of a plot, so you can adjust its aesthetics (size or intensity of plot points, colors, spatial arrangement of the elements) for a variety of graph types. Here, we will use ggplot2 to code:

  • Histograms

  • Box plots

  • Scatter plots

  • Bar plots

2 Installing ggplot2

ggplot is a sub-package included in the tidyverse package. To install ggplot into your environment, run the code

install.packages("tidyverse")

To load ggplot into your environment, run

library(tidyverse)

In this tutorial, don’t worry about installing tidyverse. However, you must install it on any of your personal projects.

3 Elements of a graph

There are key features in all graphs, including:

A title: A good title should be the conclusion of your data analysis, or the relationship of your X and Y axes.

Axes: The X and Y axis should have their respective types of data. Each graph tutorial will guide you on what type of data goes on which axis. Always include axes measurements!

Colors: Using colors can help guide your audience on what trend, if any, you want to emphasize. It is important to note that they should be used sparingly. While they may look pretty, having too many colors can be distracting of your data’s big picture.

Legends: To prevent the graph from being too busy, consider using a legend to explain your graph’s features.

4 What makes a graph good?

Anyone should be able to look at a graph and understand your data without the context of your research. Take a step back from your code and ask yourself:

What is this graph telling me? What is the relationship between the X and Y variables?

The key to making a graph comprehensible is to make its story as clear as possible. Use the elements of your graph as tools to convey this story.

5 Types of data

Identifying the types of data you have will help you identify the type of graph you’ll need. There are two types of data with two subcategories for each:

Categorical: data arranged in groups

Nominal: No priority or ranking (e.g., yes or no, favorite colors)

Ordinal: Structure within groups (e.g., small, medium, large; cold, warm, hot)

Numerical: all things numbers

Continuous: Decimal/fractional integers

Discrete: Whole integers

6 Data in this tutorial

We will be using three different data sets to hone in on our data science skills.

Palmer penguins: This data set gives insight on the bill length, bill depth, flipper length, body mass, sex, and year of data collection of various Antarctic penguins.

Chick weight: This data set tracks the weight of chicks in grams on different diets over 21 days. The diet types are as follows:

Diet 1: Normal feed with a 10% protein replacement

Diet 2: Normal feed with a 20% protein replacement

Diet 3: Normal feed with a 40% protein replacement

Diet 4: Normal feed, controlled group

Take a look at each data set. Do you notice any trends? What variables interest you?

7 ggplot Code Framework

In order to use ggplot, there here is a skeletal framework:

ggplot(data=dataset, mapping=aes(x=xVariable, y=yVariable))+
geom_typeOfPlot()+ 
labs(title="Your title", x="Name of X variable", y="Your Y variable")

Let’s break down each part of this graph:

  • First, are calling on the ggplot function.

  • Then, we are asking R to look through our dataset.

  • The mapping=aes is used to change elements in your graph that are in the “Tips and Tricks” section of this site.

  • After that, we are explicitly telling R what variables we want on each axis.

  • Next, we are specifying which type of graph we want R to make.

  • Finally, we call on the labs function to rename our axes with our variable names, making sure to include units when necessary.

Now, we are ready to start coding! Let’s learn about some plots we can make with R.