# this is a comment, use it to explain your code for a human to understand
# each new line of a comment needs a new `#`
10 R functions quick reference sheet
10.1 Goals
- Help make your own quick reference sheet of the most important R functions we’ve gone over so far
This lab will be a little different from the others. In the Learning the Tools section we’ll take a guided tour of the start of a quick reference sheet. Then in the Questions section, you’ll be tasked with completing the reference sheet.
10.2 Learning the Tools
After completing this lab you will have a quick reference guide for the following functions/commands:
“Base R” stuff (i.e. you don’t need to load a package for these functions/commands)
#
library
<-
c
_______________________[q1]:
mean
____________________[q2]sd
______________________[q3]sum
_____________________[q4]length
data.frame
nrow
____________________[q5]head
View
____________________[q6]$
_______________________[q7]==
<=
______________________[q8]<
_______________________[q9]>=
______________________[q10]>
_______________________[q11]!=
______________________[q12]subset
[
sample
__________________[q13]
Functions from package dplyr
group_by
summarize
Functions from package ggplot2
ggplot
aes
geom_point
geom_bar
________________[q14]geom_boxplot
____________[q15]geom_histogram
__________[q16]vars
facet_grid
scale_color_viridis_c
scale_color_viridis_d
___[q17]
Functions and commands that have [qXYZ] next to them will be saved for you as questions, all the rest we will cover together here in Learning the Tools. Let’s dive in!
10.2.1 #
Definition: The special “comment” character, this charachter tells R to ignore what you’re typing, those words are for human eyes only
Examples:
10.2.2 library
Definition: Use library
to load a package, this allows you to use the functions defined by that package, and any data contained in that package.
Examples:
# load a package for its data
library(palmerpenguins)
# now we can, e.g., use the `penguins` data from *palmerpenguins*
# in a plot
# first load the package *ggplot2*
library(ggplot2)
ggplot(penguins, aes(x = bill_length_mm)) +
geom_histogram()
10.2.3 <-
Definition: The special “assignment” character, tells R to assign the stuff on the right hand side to the name on the left hand side
Examples:
<- "anything you want, like this string of characters"
any_name_you_want any_name_you_want
[1] "anything you want, like this string of characters"
10.2.4 :
Definition: A quick way to make a vector of consecutive integers
Examples:
<- 1:3
x x
[1] 1 2 3
10.2.5 length
Definition: Tells you the length of a vector
Examples:
<- c("a", "b", "c")
y length(y)
[1] 3
10.2.6 data.frame
Definition: A function to create a data.frame object. You tell it what columns you want by passing the names and values of the columns as arguments
Examples:
# note: we will make columns called "group" and "measurement" and to give
# values to those columns we use the equal size (=), **NOT** the
# assignment character (<-)
<- data.frame(group = c("a", "b", "c"),
my_data measurement = 1:3)
my_data
group measurement
1 a 1
2 b 2
3 c 3
10.2.7 head
Definition: Look at the first few rows a data.frame
Examples:
<- data.frame(group = c("a", "b", "c"),
my_data measurement = 1:3)
# with this short data.frame `head` will just print all the rows,
# for a longer data.frame, only the first 6 rows will be printed
head(my_data)
group measurement
1 a 1
2 b 2
3 c 3
10.2.8 ==
Definition: Asks a “yes/no” question: does the object on the left hand size equal the object on the right hand side, yes (TRUE
) or no (FALSE
)?
Examples:
1 == 2
[1] FALSE
# also works for each element of vectors, but the vectors need to be the
# same length, and be the same type (e.g. both numeric)
<- c(1, 5, 7)
z <- c(1, 2, 7)
w
== w z
[1] TRUE FALSE TRUE
10.2.9 &
Definition: Combines “TRUE/FALSE” questions with “and”. Like “is this TRUE, and this TRUE?” If either is not true, the &
gives us FALSE
Examples:
<- data.frame(group = c("a", "b", "c"),
my_data measurement = 1:3)
$group == "a" & my_data$measurement <= 2 my_data
[1] TRUE FALSE FALSE
10.2.10 |
Definition: Combines “TRUE/FALSE” questions with “or”. Like “is this TRUE, or this TRUE?” If either is true, the |
gives us TRUE
Examples:
<- data.frame(group = c("a", "b", "c"),
my_data measurement = 1:3)
$group == "a" | my_data$measurement <= 2 my_data
[1] TRUE TRUE FALSE
10.2.11 is.na
Definition: Asks a “TRUE/FALSE” question: is a value missing (NA
) or not?
Examples:
<- c(1.1, 4.3, NA)
my_values is.na(my_values)
[1] FALSE FALSE TRUE
10.2.12 subset
Definition: Returns a subset of a data.frame based on some condition being met. The condition is usually whether the values in one of the vectors meet some kind of criterion
Examples:
<- data.frame(group = c("a", "b", "c"),
my_data measurement = 1:3)
<- subset(my_data, my_data$measurement <= 2)
my_data_sub my_data_sub
group measurement
1 a 1
2 b 2
10.2.13 [
Definition: Called the “square bracket” this command is used to extract specific elements of a vector or data.frame
Examples:
<- c("a", "b", "c")
y 1:2] y[
[1] "a" "b"
10.2.14 group_by
Definition: A function from the dplyr package, this function tells R to split a data.frame into groups according to one or more of the columns in the data.frame. This function is almost always used in combination with summarize
. We will show an example of both working together
10.2.15 summarize
Definition: A function provided by the package dyplyr to calculate a statistic from the column or columns of a data.frame. This function is almost always used in combination with group_by
Examples:
# load package to get access to `penguins` data set
library(palmerpenguins)
# load package to get access to functions
library(dplyr)
# group the penguins by their species ID
<- group_by(penguins, species)
penguis_by_species
# calculate a statistic like the mean bill length
# notice that we're telling the `mean` function to ignore NA values by
# specifying `na.rm = TRUE`
<- summarize(penguis_by_species,
bill_by_species mean_length = mean(bill_length_mm, na.rm = TRUE))
10.2.16 ggplot
Definition: The main function from package ggplot2 for setting up a plot. This function doesn’t make the plot itself, but it tells R what data.frame to use and which columns from that data.frame will go on the x and y axes. The x and y axes are set up wit the aes
function. We will show an example after we define aes
and the geom_*
family of functions
10.2.17 aes
Definition: The function from package ggplot2 that tells R what data will go across the x and y axes
10.2.18 geom_point
Definition: The function from package ggplot2 that draws a scatter plot of points
Examples:
library(palmerpenguins)
library(ggplot2)
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point()
10.2.19 vars
Definition: A function from package ggplot2 that is used to refer to a specific column when doing complex plotting like adding facets or colors that display information from the data. This will be more clear with the example from the facet_grid
function
10.2.20 facet_grid
Definition: A function from package ggplot2 that allows us to make multiple plots all at once based on a categorical column in our data set (each unique category in the data will get its own plot)
Examples:
library(palmerpenguins)
library(ggplot2)
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point() +
facet_grid(vars(species)) # notice that we have to use `vars` here
10.2.21 scale_color_viridis_c
Definition: A function from package ggplot2 that creates a color scale corresponding to a continuous variable. The “viridis” color palette is a very useful color scale because is very well aligned to how humans perceive color changes
Examples:
library(palmerpenguins)
library(ggplot2)
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm,
color = bill_length_mm)) +
geom_point() +
scale_color_viridis_c()
# note that the above example doesn't make efficient use of color because
# the color scale simply shows the same info as the x axis, a better use
# of color would be to show a third data dimension like this:
ggplot(penguins, aes(x = bill_length_mm, y = bill_depth_mm,
color = body_mass_g)) +
geom_point() +
scale_color_viridis_c()
10.3 Questions
Now it’s your turn! In the reprot.R
script, create a reference guide for yourself following the same format, like this:
## <function name>
## **definition:** <write your definition here>
## **examples:**
<write some R code here giving examples>
# <use comments to help explain your examples>
The report.R
script will already be formatted for you
Here’s the functions you’ll need to define and give examples for:
c
mean
sd
sum
nrow
View
$
<=
<
>=
>
!=
sample
geom_bar
geom_boxplot
geom_histogram
scale_color_viridis_d