Bio110 Cheatsheet

1 Basic R Commands

Basic arithmetic
- Examples: 5*6, sqrt(9), abs(-3)
- Type help(command) to find information about any command
Create variables: Use <- or = to assign values to a variable.
- Example: x <- 7
Create a vector:
```
vector <- c(1,3,7)
vector
```
```
[1] 1 3 7
```

1.1 Basic Statistics

mean(vector) : Calculates the mean of a given set of values.
median(vector) : Calculates the median of a given set of values.
var(vector): Calculates the variance of a given set of values.
sd(vector) : Calculates the standard deviation of a given set of values.
std.error(vector): Calculates the standard error of a given set of values.

IMPORTANT: This command is stored in a package plotrix, so you must install and load plotrix to find standard error.
t.test(data$variableName): Calculates a t-test for a given set of values. Also outputs the 95% confidence interval.
nrow(data): Calculates the total number of rows in a dataset
na.rm = TRUE : Remove NA values. Add this as an argument to any of the statistics calculations. E.g. mean(vector, na.rm=TRUE)

2 Installing packages

install.packages("package"): Install a package. IMPORTANT: Only run this once in a single R session. Do not rerun unless you restart R.
library(package): Load a package that has already been installed.

3 Working with a data set in the tidyverse

We use the tidyverse package to analyze data in these tutorials. See directly above for how to install.

3.1 Uploading and Viewing a data set

dataFrame <- read_csv("myCSV.csv") : Creates a data frame from a file called myCSV.csv
data: View your data - type the name and run code
head(dataFrame): View the first few entries in your data
str(dataFrame): Gives the structure of data frame
dataFrame$columnName: Calls up specific column from a data frame
summary(dataFrame): Returns min, max, mean, meadian, 1st/3rd quartiles for all vectors in dataFrame

3.2 Grouping and Summarizing data

group_by(.data, column): Takes a dataset and groups it by a column/variable
summarize(.data, summaryStat = statistic formula): Takes a dataset and outputs summary statistics that you define.
- n(): Calculates current group size. Can be used in summarize and group_by
- Combine group_by and summarize using the pipe (|>) to see summary statistics for specific groups/variables.

The pipe: Use |> OR %>% to string functions and data together. Read as “AND THEN”.

Example:

  dataFrame |> 
    group_by(firstColumn)|> 
    summarize(mean_of_secondColumn = mean(secondColumn), 
              standard_deviation_of_secondColumn = sd(secondColumn))

3.3 Filtering data

filter(data, columnName == "some value"): Extract data with a specific condition, from a given column.
- Use logical operators to combine conditions: & (and), | (or), ! (not)
- Use comparison operators to describe conditions: <, >, ==, !=, <=, >=

4 GGPlot

4.1 Resources:

4.2 GGPlot Basics:

Basic structure:

ggplot(data, mapping=aes()) +
      geom_function()

Connect different properties using a +
Basic Components:
- Data: data=dataSet: Define your data set
- Aesthetics: mapping = aes(variables): Define the variables. Can also specify color/fill for your graph and geometries. For example: mapping=aes(x=____, y=_____, color="____")
- Geometry: geom_object(): Define the type of plot

4.3 Geometries

geom_histogram(): Creates a histogram
- Syntax: geom_histogram(bins=X) Specify number of bins
geom_boxplot(): Creates a boxplot
- Add error bars: stat_boxplot(geom="errorbar")
geom_point(): Creates points (scatterplot) for each data point
- Can specify color = ___, shape = ____, size = ____
geom_col() : Creates a bar graph with pre-aggregated data that you input
- Add error bars: geom_errorbar(mapping=aes(ymin, ymax), width)
geom_smooth(method="lm", se=FALSE): Creates a line of best fit

4.4 Plot Customization

Chain these onto your functions with a + to customize your plot:

labs(x="____", y="____", title="______") : Add a title and axes labels to your graph
scale_x_discrete(labels=c("firstLabel", "secondLabel")): Add labels for individual categories on the x axis
xlim(minLimit, maxLimit) and ylim(minLimit, maxLimit): Specify x and y minimum and maximum values
facet_wrap(~ variable): Create separate plots for each aspect of a given variable. Creates a clustered plot.
color="____" and fill="_____": Change the outline color (color) and filled in color (fill) of your plot. Add these commands in the geom_object() parentheses.
And lots more!

5 Inferential statistics

5.1 Regression Analysis

lm(data$yVariable ~ data$xVariable): Create a linear model by performing regression analysis.
summary(model): View multiple statistics, including p-values, of a given model.

5.2 2 Sample t-test

Compare two samples that are normally distributed.

t.test(data$depVar ~ data$indVar): two sample t-test, when your 2 groups are listed in the same dependent variable. They will be grouped by the independent variable.
t.test(varA, varB): two sample t-test, when your 2 groups are in different variables/columns

5.3 Wilcoxon test

Compare 2 samples that are not normally distributed.

wilcox.test(data$depVar~data$indVar): Wilcoxon test

5.4 ANOVA test and Tukey’s test

Compare more than 2 groups

model <- aov(depvar~indvar): Perform an ANOVA test for multiple groups and save it to a variable model.
summary(model): Use this to view the p-value for your ANOVA test.
TukeyHSD(model): Perform a Tukey’s test on your ANOVA model.
model<-aov(data$depvar ~ data$indVar1*data$indVar2): Perform a two way ANOVA test (with 2 independent variables) and save it to a variable model.