<- c(1,3,7)
vector vector
[1] 1 3 7
Basic arithmetic
Examples: 5*6
, sqrt(9)
, abs(-3)
Type help(command)
to find information about any command
Create variables: Use <-
or =
to assign values to a variable.
x <- 7
Create a vector:
mean(vector)
: Calculates the mean of a given set of values.
median(vector)
: Calculates the median of a given set of values.
var(vector)
: Calculates the variance of a given set of values.
sd(vector)
: Calculates the standard deviation of a given set of values.
std.error(vector)
: Calculates the standard error of a given set of values.
IMPORTANT: This command is stored in a package plotrix
, so you must install and load plotrix to find standard error.
t.test(data$variableName)
: Calculates a t-test for a given set of values. Also outputs the 95% confidence interval.
nrow(data)
: Calculates the total number of rows in a dataset
na.rm = TRUE
: Remove NA
values. Add this as an argument to any of the statistics calculations. E.g. mean(vector, na.rm=TRUE)
install.packages("package")
: Install a package. IMPORTANT: Only run this once in a single R session. Do not rerun unless you restart R.
library(package)
: Load a package that has already been installed.
We use the tidyverse
package to analyze data in these tutorials. See directly above for how to install.
dataFrame <- read_csv("myCSV.csv")
: Creates a data frame from a file called myCSV.csv
data
: View your data - type the name and run code
head(dataFrame)
: View the first few entries in your data
str(dataFrame)
: Gives the structure of data frame
dataFrame$columnName
: Calls up specific column from a data frame
summary(dataFrame)
: Returns min, max, mean, meadian, 1st/3rd quartiles for all vectors in dataFrame
group_by(.data, column)
: Takes a dataset and groups it by a column/variable
summarize(.data, summaryStat = statistic formula)
: Takes a dataset and outputs summary statistics that you define.
n()
: Calculates current group size. Can be used in summarize
and group_by
Combine group_by
and summarize
using the pipe (|>
) to see summary statistics for specific groups/variables.
The pipe: Use |>
OR %>%
to string functions and data together. Read as “AND THEN”.
Example:
filter(data, columnName == "some value")
: Extract data with a specific condition, from a given column.
Use logical operators to combine conditions: &
(and), |
(or), !
(not)
Use comparison operators to describe conditions: <
, >
, ==
, !=
, <=
, >=
Basic structure:
Connect different properties using a +
Basic Components:
Data: data=dataSet
: Define your data set
Aesthetics: mapping = aes(variables)
: Define the variables. Can also specify color/fill for your graph and geometries. For example: mapping=aes(x=____, y=_____, color="____")
Geometry: geom_object()
: Define the type of plot
geom_histogram()
: Creates a histogram
geom_histogram(bins=X)
Specify number of binsgeom_boxplot()
: Creates a boxplot
stat_boxplot(geom="errorbar")
geom_point()
: Creates points (scatterplot) for each data point
color = ___
, shape = ____
, size = ____
geom_col()
: Creates a bar graph with pre-aggregated data that you input
geom_errorbar(mapping=aes(ymin, ymax), width)
geom_smooth(method="lm", se=FALSE)
: Creates a line of best fit
Chain these onto your functions with a +
to customize your plot:
labs(x="____", y="____", title="______")
: Add a title and axes labels to your graph
scale_x_discrete(labels=c("firstLabel", "secondLabel"))
: Add labels for individual categories on the x axis
xlim(minLimit, maxLimit)
and ylim(minLimit, maxLimit)
: Specify x and y minimum and maximum values
facet_wrap(~ variable)
: Create separate plots for each aspect of a given variable. Creates a clustered plot.
color="____"
and fill="_____"
: Change the outline color (color) and filled in color (fill) of your plot. Add these commands in the geom_object()
parentheses.
And lots more!
lm(data$yVariable ~ data$xVariable)
: Create a linear model by performing regression analysis.
summary(model)
: View multiple statistics, including p-values, of a given model.
Compare two samples that are normally distributed.
t.test(data$depVar ~ data$indVar)
: two sample t-test, when your 2 groups are listed in the same dependent variable. They will be grouped by the independent variable.
t.test(varA, varB)
: two sample t-test, when your 2 groups are in different variables/columns
Compare 2 samples that are not normally distributed.
wilcox.test(data$depVar~data$indVar)
: Wilcoxon testCompare more than 2 groups
model <- aov(depvar~indvar)
: Perform an ANOVA test for multiple groups and save it to a variable model
.
summary(model)
: Use this to view the p-value for your ANOVA test.
TukeyHSD(model)
: Perform a Tukey’s test on your ANOVA model.
model<-aov(data$depvar ~ data$indVar1*data$indVar2)
: Perform a two way ANOVA test (with 2 independent variables) and save it to a variable model
.