diff --git a/_bookdown_files/Module1_files/figure-html/unnamed-chunk-310-1.png b/_bookdown_files/Module1_files/figure-html/unnamed-chunk-310-1.png
new file mode 100644
index 0000000..eba5306
Binary files /dev/null and b/_bookdown_files/Module1_files/figure-html/unnamed-chunk-310-1.png differ
diff --git a/_bookdown_files/Module1_files/figure-html/unnamed-chunk-311-1.png b/_bookdown_files/Module1_files/figure-html/unnamed-chunk-311-1.png
new file mode 100644
index 0000000..eba5306
Binary files /dev/null and b/_bookdown_files/Module1_files/figure-html/unnamed-chunk-311-1.png differ
diff --git a/_bookdown_files/Module1_files/figure-html/unnamed-chunk-312-1.png b/_bookdown_files/Module1_files/figure-html/unnamed-chunk-312-1.png
new file mode 100644
index 0000000..9c48ca1
Binary files /dev/null and b/_bookdown_files/Module1_files/figure-html/unnamed-chunk-312-1.png differ
diff --git a/_bookdown_files/Module1_files/figure-html/unnamed-chunk-313-1.png b/_bookdown_files/Module1_files/figure-html/unnamed-chunk-313-1.png
new file mode 100644
index 0000000..9c48ca1
Binary files /dev/null and b/_bookdown_files/Module1_files/figure-html/unnamed-chunk-313-1.png differ
diff --git a/_bookdown_files/Module1_files/figure-html/unnamed-chunk-314-1.png b/_bookdown_files/Module1_files/figure-html/unnamed-chunk-314-1.png
new file mode 100644
index 0000000..d91889b
Binary files /dev/null and b/_bookdown_files/Module1_files/figure-html/unnamed-chunk-314-1.png differ
diff --git a/_bookdown_files/Module1_files/figure-html/unnamed-chunk-315-1.png b/_bookdown_files/Module1_files/figure-html/unnamed-chunk-315-1.png
new file mode 100644
index 0000000..cd73061
Binary files /dev/null and b/_bookdown_files/Module1_files/figure-html/unnamed-chunk-315-1.png differ
diff --git a/assignment_templates/assignment_2.rmd b/assignment_templates/assignment_2.rmd
new file mode 100644
index 0000000..84e2f68
--- /dev/null
+++ b/assignment_templates/assignment_2.rmd
@@ -0,0 +1,102 @@
+---
+title: "Assignment 2"
+author: "Your Name Here"
+date: "`r format(Sys.time(), '%d %h, %Y, %I:%M %p')`"
+output: html_document
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+# Instructions:
+
+1. Update your name in the header block, example: `author: "Alex Fout" `
+1. Select `File > Save as` and save the file by adding your last name at the beginning with an underscore, example: `fout_assignment_2.rmd`
+1. Follow the instructions below to fill in the assignment.
+1. Be sure to _run your code chunks when you make them_, to make sure everything works!
+1. When you've completed the assignment, __knit__ the document and make sure the resulting HTML or PDF file looks alright.
+1. Upload the PDF or HTML file to Canvas (Don't upload the Rmd document).
+
+
+# Assignment
+
+In this assignment, we'll review R programming fundamentals.
+Remember to place all code inside a code chunk.
+
+
+## Problem 1.
+
+Create an empty numeric vector of length 3 called `a`.
+Then assign the elements of `a` the values 5, 6, and 6, respectively.
+
+
+
+## Problem 2.
+
+Create a new numeric vector called `b` which is the result of squaring each element of `a`.
+Then change the value of the second element of `b` so that it is the same as the first element of `b`.
+
+
+
+## Problem 3.
+
+Find the result of `a` times `b` and assign it to a new variable, `c`.
+Then print out the sum of the elements of `c`.
+
+
+
+## Problem 4.
+
+Re-visit problems 1, 2, and 3, and add comments to your code explaining what the code does.
+
+
+## Problem 5.
+
+Change the third element of `b` to -2.
+Print out the values of `b` and `c`.
+`c` was defined using `b`; did the value of `c` change?
+
+
+## Problem 6.
+
+Create a new variable `d` which is the result of converting `a` into a character vector.
+Combine `a`, `b`, `c`, and `d` into a list called `letters`.
+
+
+## Problem 7.
+
+Create a data frame from `letters` and call it `df`.
+Change the column names of `df` to have names `a`, `b`, `c`, and `d`, respectively.
+Then print out the dimension of `df`.
+
+
+## Problem 8.
+
+Create a logical vector indicating which elements of `a` are equal to 6.
+
+
+## Problem 9.
+
+Remove the `letters` variable from R.
+Did this also remove `a`, `b`, `c`, and `d`?
+
+
+
+## Problem 10.
+
+Create a numeric vector called `two` which has elements 1 and 2.
+Create a numeric vector called `four` which has four ones.
+Normally you can't add vectors of different lengths, but let's try anyway.
+Print the result of `two` plus `four`.
+What did R do?
+
+
+
+
+## End
+
+This is the end of the assignment!
+You can knit the document and upload it to Canvas.
+
+
diff --git a/assignment_templates/assignment_3.rmd b/assignment_templates/assignment_3.rmd
new file mode 100644
index 0000000..31113b2
--- /dev/null
+++ b/assignment_templates/assignment_3.rmd
@@ -0,0 +1,36 @@
+---
+title: "Assignment 3"
+author: "Your Name Here"
+date: "`r format(Sys.time(), '%d %h, %Y, %I:%M %p')`"
+output: html_document
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+```
+
+# Instructions:
+
+1. Update your name in the header block, example: `author: "Alex Fout" `
+1. Select `File > Save as` and save the file by adding your last name at the beginning with an underscore, example: `fout_assignment_3.rmd`
+1. Follow the instructions below to fill in the assignment.
+1. Be sure to _run your code chunks when you make them_, to make sure everything works!
+1. When you've completed the assignment, __knit__ the document and make sure the resulting HTML or PDF file looks alright.
+1. Upload the PDF or HTML file to Canvas (Don't upload the Rmd document).
+
+
+# Assignment
+
+In this assignment, we'll explore working with data in R.
+
+
+## Problem 1.
+
+
+
+## End
+
+This is the end of the assignment!
+You can knit the document and upload it to Canvas.
+
+
diff --git a/docs/Module1_files/figure-html/unnamed-chunk-310-1.png b/docs/Module1_files/figure-html/unnamed-chunk-310-1.png
new file mode 100644
index 0000000..eba5306
Binary files /dev/null and b/docs/Module1_files/figure-html/unnamed-chunk-310-1.png differ
diff --git a/docs/Module1_files/figure-html/unnamed-chunk-311-1.png b/docs/Module1_files/figure-html/unnamed-chunk-311-1.png
new file mode 100644
index 0000000..eba5306
Binary files /dev/null and b/docs/Module1_files/figure-html/unnamed-chunk-311-1.png differ
diff --git a/docs/Module1_files/figure-html/unnamed-chunk-312-1.png b/docs/Module1_files/figure-html/unnamed-chunk-312-1.png
new file mode 100644
index 0000000..9c48ca1
Binary files /dev/null and b/docs/Module1_files/figure-html/unnamed-chunk-312-1.png differ
diff --git a/docs/Module1_files/figure-html/unnamed-chunk-313-1.png b/docs/Module1_files/figure-html/unnamed-chunk-313-1.png
new file mode 100644
index 0000000..9c48ca1
Binary files /dev/null and b/docs/Module1_files/figure-html/unnamed-chunk-313-1.png differ
diff --git a/docs/Module1_files/figure-html/unnamed-chunk-314-1.png b/docs/Module1_files/figure-html/unnamed-chunk-314-1.png
new file mode 100644
index 0000000..d91889b
Binary files /dev/null and b/docs/Module1_files/figure-html/unnamed-chunk-314-1.png differ
diff --git a/docs/Module1_files/figure-html/unnamed-chunk-315-1.png b/docs/Module1_files/figure-html/unnamed-chunk-315-1.png
new file mode 100644
index 0000000..cd73061
Binary files /dev/null and b/docs/Module1_files/figure-html/unnamed-chunk-315-1.png differ
diff --git a/docs/advanced-control-flow.html b/docs/advanced-control-flow.html
index 9e1e47a..41477f0 100644
--- a/docs/advanced-control-flow.html
+++ b/docs/advanced-control-flow.html
@@ -222,21 +222,23 @@
Course Credits: 1. Because this course lasts four weeks, it should “feel” like a 3 credit course for four weeks. Normally this means ~3 hours of lecture and 12 hours of work outside of lecture per week. Because this course is online, there will be 1 hour or less of “lecture” (see below), and about 14 hours of outside work per week.
Textbook: You’re reading it right now. The textbook will be your primary learning resource. You’ll be expected to read through the required sections, watch any relevant videos, and complete any reflections, progress checks, and assessments along the way. On days when a quiz is due, you should complete the reading before you take the quiz.
A vector is just an ordered set of elements (in other words, data), all of which have the same data type.
Vectors can be created for the logical, numeric (double or integer), or character data types.
Here’s an example of a vector:
-
x <-c(1, 2, 3) # this is a vector of numeric types
-print(x)
+
x <-c(1, 2, 3) # this is a vector of numeric types
+print(x)
[1] 1 2 3
Note that to create a vector, we use the c function, where c stands for combine.
This makes sense, because we are combining three numeric objects into a numeric vector.
We may determine the length of any atomic vector like so:
-
length(x)
+
length(x)
[1] 3
The class function will tell us what type of data is stored in a vector (which makes sense, because all elements of the vector have the same data type).
What’s happening in the above code is an example of type conversion, which we will talk more about later.
For now, remember that every element in an R vector is the same type.
You can create empty vectors as placeholders, by indicating the data type and how many elements there are:
-
empty <-numeric(10) # this creates a numeric vector of length 10
+
empty <-numeric(10) # this creates a numeric vector of length 10
This is the first instance of us using a name which is longer than a single character! This new vector is called empty.
Let’s print the contents of the vector:
-
print(empty)
+
print(empty)
[1] 0 0 0 0 0 0 0 0 0 0
Even though we didn’t tell R what data to put in the vector, it put a 0 in each element.
This is the default value for a new vector.
Here’s how you can create new vectors of other types:
-
empty_int <-integer(45) # create integer vector with 45 elements
-empty_cha <-character(2) # create character vector with 2 elements
-empty_log <-logical(1000) # create logical vector with 1000 elements!!
+
empty_int <-integer(45) # create integer vector with 45 elements
+empty_cha <-character(2) # create character vector with 2 elements
+empty_log <-logical(1000) # create logical vector with 1000 elements!!
We saw that the default value for a numeric vector is 0. Use the code above to create empty integer, character, and logical vectors, then print them out to see what default values R has given to each element. Do these make sense?
@@ -362,9 +364,9 @@
4.3.1 Vectors
What happens if we create a vector of length 1?
It turns out this is the same as just creating a single instance of that data type.
Observe how the following are the same.
-
a <-numeric(1) # create vector of length 1 (default value is 0, right?)
-b <-0# create single numeric with value 0
-a ==b # compare a and b to see if they are the same.
+
a <-numeric(1) # create vector of length 1 (default value is 0, right?)
+b <-0# create single numeric with value 0
+a ==b # compare a and b to see if they are the same.
[1] TRUE
@@ -375,14 +377,14 @@
4.3.1 Vectors
4.3.1.1 Accessing and Changing Elements
After you’ve created a vector, how do you put your data in it?
Here’s how you can change the value of a specific element:
-
a <-c(1, 2, 3) # create numeric vector of length 3
-a[2] <-4# change the value of the second element of a to 4
-a # print the result
+
a <-c(1, 2, 3) # create numeric vector of length 3
+a[2] <-4# change the value of the second element of a to 4
+a # print the result
[1] 1 4 3
See how the second element of a has changed?
So you can access a specific element using square brackets: [ and ].
In fact, if you want to know the value of the third element (without changing anything), just use:
-
a[3] # access the third element
+
a[3] # access the third element
[1] 3
@@ -403,27 +405,27 @@
4.3.1.1 Accessing and Changing El
4.3.1.2 Working with vectors
You can do many things with vectors that you can with single instances of each data type.
Recall, you can add a number to a numeric object:
-
a <-3# create a numeric object
-a +4# add a number to the object.
+
a <-3# create a numeric object
+a +4# add a number to the object.
[1] 7
The same thing is possible with numeric vectors:
-
a <-c(1, 2, 3) # create a numeric vector
-a +4# add a number to EACH ELEMENT of the vector!
+
a <-c(1, 2, 3) # create a numeric vector
+a +4# add a number to EACH ELEMENT of the vector!
[1] 5 6 7
This type of behavior is called elementwise behavior. That is, the operation is performed on each element separately.
Here are some other elementwise operations:
-
a -3
+
a -3
[1] -2 -1 0
-
a *1.5
+
a *1.5
[1] 1.5 3.0 4.5
-
a ^2
+
a ^2
[1] 1 4 9
-
a ==2
+
a ==2
[1] FALSE TRUE FALSE
R has some functions which summarize the values in a vector.
One such function is the sum function, which adds the values of each element in the vector:
-
print(a) # print the elements of a as a reminder
-sum(a) # add all the elements of a together.
+
print(a) # print the elements of a as a reminder
+sum(a) # add all the elements of a together.
[1] 1 2 3
[1] 6
@@ -432,35 +434,35 @@
4.3.1.2 Working with vectors
Some operations work on two vectors, as long as they are the same length:
-
b <-c(1, 0, 1)
-a +b
+
b <-c(1, 0, 1)
+a +b
[1] 2 2 4
-
b *a
+
b *a
[1] 1 0 3
-
a ^b
+
a ^b
[1] 1 1 3
You can even compare two vectors, and the result will be a logical vector:
-
z <-a >b # compare a and b, element by element, assign the result to z
-z # print the value of z
+
z <-a >b # compare a and b, element by element, assign the result to z
+z # print the value of z
[1] FALSE TRUE TRUE
The first logical value is the result of a[1] > b[1], the second logical value is the result of a[2] > b[2], etc.
What operations can we perform on logical vectors?
Here are some examples:
-
z ==TRUE# which elements are TRUE?
+
z ==TRUE# which elements are TRUE?
[1] FALSE TRUE TRUE
This just produces z again (Do you see why?).
Here’s how to get the logical “opposite” of z:
-
z ==FALSE
+
z ==FALSE
[1] TRUE FALSE FALSE
Or, as we saw before, we can use !, which operates on each element of z:
-
!z
+
!z
[1] TRUE FALSE FALSE
Remember how logical objects can be treated as numeric objects (either a 0 or 1)?
We can use this with the sum function to determine how many elements are TRUE:
-
sum(z)
+
sum(z)
[1] 2
Here’s another example of using the sum function on a logical vector:
-
sum(a ==b) # how many elements do a and b have in common?
+
sum(a ==b) # how many elements do a and b have in common?
[1] 1
So there is exactly one element where a and b are the same.
@@ -469,17 +471,17 @@
4.3.1.2 Working with vectors
Let’s create some character vectors and explore a few things we can do with them:
a +b # Can you add a numeric vector to a character vector?
+
Error in a + b: non-numeric argument to binary operator
+
+
a +c # can you add a numeric vector to a logical vector?
[1] 2 3 3
@@ -514,12 +517,12 @@
4.3.1.3 Vectors of different type
4.3.1.4 Special Numeric Vectors
There are a few special ways of creating a numeric vector which can be very useful, so we’ll mention them here.
The first way creates a sequence of all integers between a starting and ending point:
-
d <-1:5# create sequence starting at 1 and ending at 5
-d
+
d <-1:5# create sequence starting at 1 and ending at 5
+d
[1] 1 2 3 4 5
Here’s a longer example:
-
d <-1:100# create sequence starting at 1 and ending at 5
-d
+
+d <-1:100# create sequence starting at 1 and ending at 100
+d
Or you can also specify how long you want the vector to be, and seq will determine the appropriate interval to make the elements evenly spaced.
-
seq(1, 10, length.out=3)
+
seq(1, 10, length.out=3)
[1] 1.0 5.5 10.0
-
seq(1, 10, length.out=5)
+
seq(1, 10, length.out=5)
[1] 1.00 3.25 5.50 7.75 10.00
4.3.1.5 Another Data Type: Factor
In the previous section, we avoided talking about the factor data type, because we need the concept of vectors to appreciate their purpose, but now we are equipped to talk about them.
Consider the following example of a character vector:
There are seven elements in this vector (length(cha_vec) is 7), but there are only two unique elements, “cheese” and “crackers”.
Imagine having to write down this vector on a piece of paper, and the space it would take.
Now imagine writing down instead:
@@ -565,17 +568,17 @@
4.3.1.5 Another Data Type: Factor
This is the essence of what a factor data type is: A character vector stored more efficiently on the computer.
For a factor vector, R stores an integer vector (which often takes less space than a character vector), and also maintains a “lookup table” which keeps track of which integers correspond with which character strings.
To illustrate, let’s create a factor variable:
-
# create a new factor variable from our existing character vector:
-fac_vec <-factor(cha_vec)
+
# create a new factor variable from our existing character vector:
+fac_vec <-factor(cha_vec)
Notice how we started with a character vector and used the factor function to create a factor from it.
If we print the new vector,
it displays the elements as we would expect, but also includes another line of output giving Levels.
This shows that there are only two unique character strings, which are called factor levels.
Since R is using integers “behind the scenes” to store the vector, we can see those integers by using the as.integer function:
-
as.integer(fac_vec)
+
as.integer(fac_vec)
[1] 1 2 1 2 1 2 1
This is another example of type conversion, which we will discuss soon.
@@ -592,9 +595,9 @@
4.3.1.5 Another Data Type: Factor
There are a few neat things you can do with factor vectors.
By changing the levels, you can quickly change all occurrences of a string at once.
For example:
-
print(fac_vec)
-levels(fac_vec) <-c("peas", "carrots") # change the levels of fac_vec
-fac_vec
+
print(fac_vec)
+levels(fac_vec) <-c("peas", "carrots") # change the levels of fac_vec
+fac_vec
Given two vectors, it’s easy to combine them into one vector:
-
a <-c(1, 2, 3)
-b <-c(4, 5, 6, 7)
-c(a, b) # combine vectors a and b
+
a <-c(1, 2, 3)
+b <-c(4, 5, 6, 7)
+c(a, b) # combine vectors a and b
[1] 1 2 3 4 5 6 7
The combine function (c) is smart enough to recognize that a and b are vectors, and performs concatenation to create the resultant longer vector.
You can also use the combine function to add a single element to the end of a vector:
-
a <-c("CEO", "CFO") # initialize
-a <-c(a, "CTO") # redefine a by combining a with a new element
-a
+
a <-c("CEO", "CFO") # initialize
+a <-c(a, "CTO") # redefine a by combining a with a new element
+a
[1] "CEO" "CFO" "CTO"
@@ -625,9 +628,9 @@
4.3.1.6 Combining Vectors
What if you try to combine vectors of different types?
-
a <-c(1, 2, 3)
-b <-c("four", "five")
-c(a, b)
+
a <-c(1, 2, 3)
+b <-c("four", "five")
+c(a, b)
[1] "1" "2" "3" "four" "five"
Again, we see that the c function has converted all elements to be character strings, and the resultant vector is a character vector.
Since we’ve seen type conversion arise a few times now, it’s appropriate to talk more explicitly about how it works.
@@ -638,8 +641,9 @@
4.3.1.7 Type conversion
There may be times when you’d like to convert from one type of data into another.
An example would be the character string "1", which R does not view as a number.
Therefore, the following does not work:
-
"1"+ "2"# R can't add two character string
-
Error in "1" + "2": non-numeric argument to binary operator
+
+"1"+ "2"# R can't add two character strings
+
Error in "1" + "2": non-numeric argument to binary operator
+
To remedy issues like this, R provides functions in order to convert from one data type into another:
- as.character: converts to character
- as.numeric: converts to numeric
@@ -648,19 +652,19 @@
4.3.1.7 Type conversion
Using these functions, R will “do its best” to convert whatever you start with into the desired data type, but it’s not always possible to make the conversion.
Below are a few examples which do and don’t work well.
Converting from a numeric to a character vector is always possible:
-
x <-c(3, 2, 1)
-
y <-as.character(x) # Here's how to convert to a character vector
-print(x)
-print(y)
+
x <-c(3, 2, 1)
+
y <-as.character(x) # Here's how to convert to a character vector
+print(x)
+print(y)
[1] 3 2 1
[1] "3" "2" "1"
However, converting from a character vector to a numeric only works if the characters represent numbers.
Any element that won’t convert will be given the value NA.
-
w <-c("1", "12.3", "-5", "22") # this character vector can be converted to numeric
-as.numeric(w)
+
w <-c("1", "12.3", "-5", "22") # this character vector can be converted to numeric
+as.numeric(w)
[1] 1.0 12.3 -5.0 22.0
-
v <-c("frank", "went", "to", "mars") # this character vector can't be converted to numeric
-as.numeric(v)
+
v <-c("frank", "went", "to", "mars") # this character vector can't be converted to numeric
+as.numeric(v)
Warning: NAs introduced by coercion
[1] NA NA NA NA
None of the elements can be converted into a number, so R prints a warning message, and the result is an NA in each element, which stands for “not available”.
@@ -680,17 +684,17 @@
4.3.1.7 Type conversion
If only part of a vector can be converted, then the result will contain some converted values and some NA’s:
-
u <-c("1.2", "chicken", "33")
-as.numeric(u)
+
u <-c("1.2", "chicken", "33")
+as.numeric(u)
Warning: NAs introduced by coercion
[1] 1.2 NA 33.0
What other conversions are possible?
Character vectors can also be converted into logical:
-
s <-c("TRUE", "FALSE", "T", "F", "cat") # all but the last element can be converted to logical
-as.logical(s)
+
s <-c("TRUE", "FALSE", "T", "F", "cat") # all but the last element can be converted to logical
+as.logical(s)
[1] TRUE FALSE TRUE FALSE NA
Based on the examples we’ve seen before, it should make sense that numeric vectors containing 0 or 1 can also be converted into a logical vector:
-
as.logical(c(1, 0, 1, 0)) # here we create the vector and convert it in the same line
+
as.logical(c(1, 0, 1, 0)) # here we create the vector and convert it in the same line
[1] TRUE FALSE TRUE FALSE
@@ -704,7 +708,7 @@
4.3.1.7 Type conversion
Remember that “solo” objects are just vectors of length 1, so any of these type conversions should work on a single object as well, like so:
-
as.numeric("99")
+
as.numeric("99")
[1] 99
Along with the conversion functions as...., there are companion functions which simply check whether a vector is of a certain type:
@@ -714,15 +718,15 @@
4.3.1.7 Type conversion
is.factor: checks if factor
Here are some examples:
-
a <-c("1", "2", "3")
-is.character(a)
+
a <-c("1", "2", "3")
+is.character(a)
[1] TRUE
-
is.numeric(a)
+
is.numeric(a)
[1] FALSE
-
a <-as.numeric(a)
-is.character(a)
+
a <-as.numeric(a)
+is.character(a)
[1] FALSE
-
is.numeric(a)
+
is.numeric(a)
[1] TRUE
@@ -732,7 +736,6 @@
4.3.1.7 Type conversion
To understand more about this, try typing ?c to bring up the documentation, and have a look at the “Details” section.
-
TODO: include conversion b/t character vec and factor vec.
@@ -740,9 +743,9 @@
4.3.2 Matrices
Not all data can be arranged as an ordered set of elements, so R has other data structures besides vectors.
Another data structure is the matrix, which can be thought of as a grid of numbers.
Here’s an example of creating a grid:
-
data <-c(1, 2, 3, 4, 5, 6, 7, 8, 9)
-A <-matrix(data, 3, 3)
-A
If you’ve worked with matrices in a math class, you may have talked about some of the following operations:
Here we can find the transpose of a matrix (the rows become columns and the columns become rows):
-
t(A) # find the transpose
+
t(A) # find the transpose
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
-
# Find the trace:
-sum(diag(A)) # get the diagonal elements of A, then sum them.
+
# Find the trace:
+sum(diag(A)) # get the diagonal elements of A, then sum them.
[1] 15
Here are some things you can do with two matrices:
-
B <-matrix(1, 3, 3) # create a 3x3 matrix of all 1's (notice how we only need one 1?)
-
-A +B # Add two matrices together
+
B <-matrix(1, 3, 3) # create a 3x3 matrix of all 1's (notice how we only need one 1?)
+
+A +B # Add two matrices together
[,1] [,2] [,3]
[1,] 2 5 8
[2,] 3 6 9
[3,] 4 7 10
-
A *B # multiply the elements of A together
+
+A *B # multiply the matching elements of A and B together
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
-
A %*%B # Perform matrix multiplication between A and B
+
A %*%B # Perform matrix multiplication between A and B
The error message: non-conformable arrays tells us that A and C have different shapes, so it’s impossible to multiply their matching elements together.
But you can still perform matrix multiplication between them:
A matrix is just a special case of a data structure called an array. Matrices have two dimensions (row and column), and arrays can have any number of dimensions (1, 2, 3, 4, 5, etc.). We won’t discuss arrays in this course much.
+
-
Try running the following code in R, which should produce an error message:
-
-
data <-c(4.5, 6.1, 3.3, 2.0)
- A <-matrix(data, 2, 3)
+
+
data <-c(4.5, 6.1, 3.3, 2.0)
+ A <-matrix(data, 2, 3)
Read the error message and the code carefully, and see if you can figure out the problem.
What change would you make to the above code so that it runs?
@@ -890,17 +894,17 @@
4.3.2 Matrices
[1] TRUE
Here we see each component of the list printed in order, with [[1]], [[2]], and [[3]] indicating the first, second, and third components.
To access just one of the components, use double square brackets ([[ and ]]):
-
# Get the second component of A
-A[[2]]
+
# Get the second component of A
+A[[2]]
[1] "chicken"
Notice that each component of A is a different data type (numeric, character, logical), which is not a problem for lists.
Nothing was converted automatically, as we saw happen with vectors.
Here’s how to add a component to an existing list:
-
A[[4]] <-matrix(c(1, 2, 3, 4, 5, 6), 2, 3)
+
A[[4]] <-matrix(c(1, 2, 3, 4, 5, 6), 2, 3)
Notice how we accessed component 4, which didn’t exist yet, and assigned it a value.
We actually added a matrix as the fourth component; this is not possible with vectors!
Now A has four components:
-
A
+
A
[[1]]
[1] 42
@@ -930,20 +934,20 @@
4.3.2 Matrices
List components can also have names.
Here we add a component with a name:
-
A[["color"]] <- "yellow"
+
A[["color"]] <- "yellow"
Notice how this new component displays differently?
Instead of showing [[5]], the component is labeled with a dollar sign, then its name: $color.
You can access components using their name in two ways:
-
A[["color"]] # use double square brackets to access a named element
+
A[["color"]] # use double square brackets to access a named element
[1] "yellow"
-
A$color # use dollar sign to access a named element
+
A$color # use dollar sign to access a named element
[1] "yellow"
But the color component is also the fifth component of the list, so we can access it like this as well:
-
A[[5]]
+
A[[5]]
[1] "yellow"
Here’s a new list created by giving names to each element:
-
person <-list(name ="Millard Fillmore", occupation ="President", birth_year=1800)
-person
+
person <-list(name ="Millard Fillmore", occupation ="President", birth_year=1800)
+person
-When defining the rocks list, we’ve spread the command accross multiple lines for clarity. The commas at the end of some of the lines indicate that the list has more components, so R will continue reading the next line until it finds the closing parenthesis, ’.
+When defining the rocks list, we’ve spread the command across multiple lines for clarity. The commas at the end of some of the lines indicate that the list has more components, so R will continue reading the next line until it finds the closing parenthesis, ).
There are so many sets of data that fit into this pattern, that R has a special data type called a data frame, which we will discuss in the next section.
@@ -1060,10 +1067,10 @@
4.3.2.2 Lists of Vectors
4.3.3 Data Frames
At their core, data frames are just lists of vectors, but they have some extra features as well.
Here, we’ll re-define the rocks list from the previous section, but this time we’ll create it as a data frame:
-
rocks <-data.frame(type=c("igneous", "metamorphic", "sedimentary"),
-weight=c(21.2, 56.7, 3.8),
-age=c(120, 10000, 5000000))
-rocks # we'll add the specimen names later
+
rocks <-data.frame(type=c("igneous", "metamorphic", "sedimentary"),
+weight=c(21.2, 56.7, 3.8),
+age=c(120, 10000, 5000000))
+rocks # we'll add the specimen names later
-
rocks[,2] # get the second column, this gives a vector.
+
rocks[,2] # get the second column, this gives a vector.
[1] 21.2 56.7 3.8
Here’s how to get the shape of a data frame (number of rows and columns):
-
dim(rocks)
+
dim(rocks)
[1] 3 3
If we start with a list of vectors, we can convert it to a data frame with as.data.frame:
R comes preloaded with several data frames, such as mtcars, which contains data from the 1974 Motor Trend Magazine for 32 different automobiles:
-
mtcars
+
mtcars
-
names(rocks)[[1]] <- "rock type"
-rocks
+
names(rocks)[[1]] <- "rock type"
+rocks
+
-
-Row and column names are allowed to have spaces in them, but you must be careful how you access them. The following code will not work: rocks\(rock type</code> , because R will stop looking for the name you are referencing once it encounters a space. To access this column, you must enclose the reference in “backticks” ( <code>`</code> ) like so: <code>rocks\)rock type.
-
+
Row and column names are allowed to have spaces in them, but you must be careful how you access them.
+The following code will not work: rocks$rock type, because R will stop looking for the name you are referencing once it encounters a space.
+To access this column, you must enclose the reference in “backticks” ( ` ) like so: rocks$`rock type`.
+
Look at the set of available data sets in R, and pick 2 data sets.
For each data set, answer the following questions:
R can be very precise when performing computations.
However, viewing all of the digits stored by R can be distracting and hard to read.
You can show just some of the digits by using the round function:
-
a
-
[1] -11
-
round(a, 3)
-
[1] -11
+
a <-0.123456
+
round(a, 3)
+
[1] 0.123
It also turns out that R stores more digits than what it shows when it prints, though we won’t go into detail on that now.
4.2.2 Integer
In general, numeric data in R are treated as if they can be any decimal number (technically, they are a double precision number, if you know what that means; if not, it’s not important right now).
However, there is a way to specify that a specific numeric object is an integer, by placing an “L” at the end of it, like so:
-
x <-20# x will be a numeric object
-y <-20L # y will be an integer object
-
class(x)
+
x <-20# x will be a numeric object
+y <-20L # y will be an integer object
+
class(x)
[1] "numeric"
-
class(y)
+
class(y)
[1] "integer"
Integers take half of the space in a computer’s memory or hard drive, so if you are working with or storing a lot of numbers which are integers, it might make sense to declare them as integer type in R.
This will make more sense when we discuss vectors later.
@@ -354,57 +355,62 @@
4.2.3 Character
Not all data are numbers!
R also has the capability to store strings of characters, and this is the aptly named character type (or sometimes called a character string or just string).
Here are some examples:
-
d <- "Hello"# This string is defined with *double* quotes
-e <- 'how are you?'# This string is defined with *single* quotes!
-print(d)
-print(e)
+
d <- "Hello"# This string is defined with *double* quotes
+e <- 'how are you?'# This string is defined with *single* quotes!
+print(d)
+print(e)
[1] "Hello"
[1] "how are you?"
Notice how we can define character strings using single quotes or double quotes, as long as we are consistent.
So this is not valid:
-
# Note the mismatched single/double quotes:
-f <- "this does not work'
-
Error: <text>:2:6: unexpected INCOMPLETE_STRING
+
# Note the mismatched single/double quotes:
+f <- "this does not work'
+
Error: :2:6: unexpected INCOMPLETE_STRING
1: # Note the mismatched single/double quotes:
-2: f <- "this does not work'
- ^
+2: f <- "this does not work'
+ ^
+
So, make sure you are consistent.
However, you may see another problem with this: some strings contain quotes in them, like this:
-
g <- 'This won't work'
-
Error: <text>:1:16: unexpected symbol
-1: g <- 'This won't
- ^
+
g <- 'This won't work'
+
Error: :1:16: unexpected symbol
+1: g <- 'This won't
+ ^
+
Since single quotes are being used to define the string, they can’t be used in the string itself, because R will “think” the string is ending at the second '.
One option is to change the defining quotes to be double quotes, then the single quote will be safely included in the string:
-
g <- "I'm happy that this works!"
-print(g)
+
g <- "I'm happy that this works!"
+print(g)
[1] "I'm happy that this works!"
Another option is to use a backslash when using quotes inside the string, so that R “knows” the quote is part of the string and not ending the definition of the string:
-
g <- 'I\'ve found another way that works!'
-print(g)
+
g <- 'I\'ve found another way that works!'
+print(g)
[1] "I've found another way that works!"
Notice that when we define g we place a \' anywhere in the string where we want a ' to be, but when printed out, we see that R has interpreted it as just a '.
Notice also that we didn’t have to change the defining quotes to be double quotes in this case.
The backslash is called the escape character, and it signifies that what follows it should be interpreted literally by R, and any special meaning should be ignored.
-Since backslash also has special meaning itself, if you want a backslash in your string, you need to use another escape character, like so: g <- “here is a backslash: \”
+Since backslash also has special meaning itself, if you want a backslash in your string, you need to use another backslash, which functions as an escape character, like so: g <- "here is a backslash: \\". You will see both backslashes when using the print function (which is meant for any data type), but if you use the special cat function (which is meant for character types specifically), all escape characters will be “processed”, and you will see just a single backslash.
+
+
+Try the same thing with the newline character, \n!
To see a list of special characters, try typing ?Quotes into the R console
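The difference between print and cat can be sketched as follows. The object name two_lines is made up for illustration.

```r
g <- "here is a backslash: \\"
print(g)       # shows the escaped form, with two backslashes
cat(g, "\n")   # shows the processed string, with a single backslash
nchar(g)       # counts the backslash once, because the string holds only one

two_lines <- "first line\nsecond line"
print(two_lines)       # shows \n as written
cat(two_lines, "\n")   # starts a new line where \n appears
```

The extra "\n" given to cat simply ends the output with a line break, since cat does not add one on its own.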
Here is an important string to know about:
-
h <- ""# This string is empty!
+
h <- ""# This string is empty!
h is a character string with no characters, called an empty string.
You can perform math on numeric data, so what can you do with strings?
The answer is, quite a lot, using some functions that R provides.
Here are some of them:
-
nchar(g) # This prints out the number of characters in a string
+
nchar(g) # This prints out the number of characters in a string
[1] 34
-
substr(g, 6, 10) # This extracts just part of a string, using the start and stop positions you provide
+
substr(g, 6, 10) # This extracts just part of a string, using the start and stop positions you provide
[1] "found"
-
strsplit(g, " ") # This splits the string up using a specified "delimiter" string, a single space in this case.
+
strsplit(g, " ") # This splits the string up using a specified "delimiter" string, a single space in this case.
When you split a string, this produces a list containing a vector of character strings. This is an example of how data can be organized in a structured way. We’ll talk more about so called data structures in the next section.
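As a rough sketch of what that structure looks like, the code below splits g and then pulls the character vector out of the list. The object names parts and words are made up for illustration, and the double square brackets are covered properly in the data structures section.

```r
g <- 'I\'ve found another way that works!'

parts <- strsplit(g, " ")      # a list containing one character vector
class(parts)                   # [1] "list"

words <- parts[[1]]            # extract the character vector from the list
words                          # the individual words
length(words)                  # how many words there are

paste(words, collapse = " ")   # glue the words back into a single string
```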
-
paste("hello", "world") # This combines multiple strings together into one string!
+
paste("hello", "world") # This combines multiple strings together into one string!
[1] "hello world"
@@ -426,45 +432,45 @@
4.2.4 Logical
Numeric objects can be any number, character objects can be any string of characters, but logical objects can take only two values: TRUE or FALSE.
Logical data types are also known as “boolean” data types.
Here we define some Logical objects:
-
a <-TRUE
-b <-FALSE
-c <-T
-d <-F
-
print(a)
+
a <-TRUE
+b <-FALSE
+c <-T
+d <-F
+
print(a)
[1] TRUE
-
print(b)
+
print(b)
[1] FALSE
-
print(c)
+
print(c)
[1] TRUE
-
print(d)
+
print(d)
[1] FALSE
So you can see that we can define a logical object using the full name or just the first letter.
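One caution about the single-letter shortcuts: TRUE and FALSE are reserved words, while T and F are ordinary objects that can be reassigned. The sketch below shows how that could cause confusion; it is an illustration of the pitfall, not something you should do in real code.

```r
T <- 0       # allowed: T is just an object name
isTRUE(T)    # [1] FALSE, because T now holds the number 0

rm(T)        # remove our T so the built-in one is visible again
isTRUE(T)    # [1] TRUE
```

Writing out TRUE and FALSE in full avoids this problem.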
Here’s how to get the “opposite” of a logical object:
-
!a
+
!a
[1] FALSE
Logical data are the simplest type, but there are actually some clever things you can do with them.
You can test whether simple mathematical expressions are true or false.
-
# create x and y
-x <-3
-y <-4
-# check: is x less than y? (should give TRUE)
-x <y
+
# create x and y
+x <-3
+y <-4
+# check: is x less than y? (should give TRUE)
+x <y
[1] TRUE
The third command is a way to check if the value of x is less than the value of y.
The result of this comparison is a logical, in this case, TRUE.
Here are other ways of making comparisons:
-
x <=y # check if x is less or equal to y
+
x <=y # check if x is less or equal to y
[1] TRUE
-
x ==y # check if x is equal to y (note how you need two equals signs)
+
x ==y # check if x is equal to y (note how you need two equals signs)
[1] FALSE
-
x >=y # check if x is greater or equal to y
+
x >=y # check if x is greater or equal to y
[1] FALSE
-
x >=y # check if x is greater than y
+
x >y # check if x is greater than y
[1] FALSE
Comparisons can be made using strings as well:
-
x <- "Hello"
-y <- "hello"
-x ==y
+
x <- "Hello"
+y <- "hello"
+x ==y
[1] FALSE
@@ -472,15 +478,15 @@
4.2.4 Logical
Of course any object (like x) will be equal to itself:
-
x ==x
+
x ==x
[1] TRUE
Surprisingly, logicals can be treated as numerics, where TRUE is treated as 1 and FALSE is treated as 0.
Here are some examples:
-
TRUE+TRUE# TRUE is treated as 1
+
TRUE+TRUE# TRUE is treated as 1
[1] 2
-
FALSE*7# FALSE is treated as 0
+
FALSE*7# FALSE is treated as 0
[1] 0
-
(2<3) +(1==2) # What's going on here?
+
(2<3) +(1==2) # What's going on here?
[1] 1
The last example deserves some thought. Start with each expression in parentheses, and decide whether it will evaluate to true or false.
Then remember how logicals are treated as numbers, and determine what happens when you add them together.
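If you want to check your reasoning, the steps can be run one at a time. This sketch uses as.numeric to make the conversion from logical to number visible; R does the same conversion automatically when you add logicals.

```r
(2 < 3)               # [1] TRUE
(1 == 2)              # [1] FALSE

as.numeric(2 < 3)     # [1] 1
as.numeric(1 == 2)    # [1] 0

(2 < 3) + (1 == 2)    # [1] 1, the same as 1 + 0
```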
diff --git a/docs/downloading-and-saving.html b/docs/downloading-and-saving.html
index b19165b..7f9f84a 100644
--- a/docs/downloading-and-saving.html
+++ b/docs/downloading-and-saving.html
@@ -4,11 +4,11 @@
- 5.2 Downloading and Saving | R Module 1
+ 5.3 Downloading and Saving | R Module 1
-
+
@@ -16,7 +16,7 @@
-
+
@@ -222,21 +222,23 @@
This section is still under construction
diff --git a/docs/getoutoftheclass.html b/docs/getoutoftheclass.html
index dbf4120..b6f893e 100644
--- a/docs/getoutoftheclass.html
+++ b/docs/getoutoftheclass.html
@@ -222,21 +222,23 @@
Click “Download R for Windows”, then click “base”.
-
Finally, Click “Download R X.Y.Z for Windows”, where X, Y, and Z will be numbers. These numbers indicate which version of R you’ll be installing. As of the publishing of this book, R is on version r_version.
+
Finally, click “Download R X.Y.Z for Windows”, where X, Y, and Z will be numbers. These numbers indicate which version of R you’ll be installing. As of the publishing of this book, R is on version 4.0.2.
Your computer might prompt for the location on your computer that you would like to save the file. Select a location (reasonable options are your Downloads folder or the Desktop) and select “save”.
When the download completes, find the downloaded file in the File Explorer and double click to run it. This will start the installation process.
Follow the on screen prompts. For the most part you can click “next” and “install” as appropriate, and you don’t have to worry about changing any installation settings.
@@ -298,7 +300,7 @@
3.2.1.1 Windows
3.2.1.2 Mac OS X
Click “Download R for (Mac) OS X”
-
Click “R-X.Y.Z.pkg”, where X, Y, and Z will be numbers. These numbers indicate which version of R you’ll be installing. As of the publishing of this book, R is on version r_version.
+
Click “R-X.Y.Z.pkg”, where X, Y, and Z will be numbers. These numbers indicate which version of R you’ll be installing. As of the publishing of this book, R is on version 4.0.2.
Your computer might prompt for the location on your computer that you would like to save the file. Select a location and select “save”.
When the download completes, find the downloaded file in the Finder and double click to run it. This will start the installation process.
Follow the on screen prompts. For the most part you can click “continue”, “agree”, “install”, as appropriate, and you don’t have to worry about changing any installation settings.
This section is still under construction
diff --git a/docs/prelim.html b/docs/prelim.html
index 9ccd63d..e9ed6fb 100644
--- a/docs/prelim.html
+++ b/docs/prelim.html
@@ -222,21 +222,23 @@
R also has rules that must be followed in order for a human ( you ) to communicate with a computer, in order to tell the computer what to do.
In human language, grammar is often fluid and evolving, and two people may have to adapt their use of the language in order to communicate.
-With R, the fules are fixed, and the computer “knows” them perfectly.
+With R, the rules are fixed, and the computer “knows” them perfectly.
It is up to you to learn the rules in order to make the computer do exactly what you want it to do.
Since any computer programming language will do exactly what you tell it to do,
it’s important to cover some of the basic rules of the R programming language
@@ -291,29 +293,29 @@
4.1.1 R Commands
Like most programming languages, R consists of a set of commands which form the sequence of instructions which the computer completes. You can think of commands
as the verbs of R, they are the actions the computer will take.
Here is an example of a command, followed by the result.
-
print("hello, world!")
+
print("hello, world!")
[1] "hello, world!"
This command is telling R to print out a message.
R code usually contains more than one command, and typically each command is put on a separate line.
Here are multiple commands, each on a separate line:
-
print("The air is fine!")
-print(1+1)
-print(4>5)
+
print("The air is fine!")
+print(1+1)
+print(4>5)
[1] "The air is fine!"
[1] 2
[1] FALSE
The first command prints another message, the second command does some math then prints the result, and the third command evaluates whether the statement is true or false and prints the result.
Generally, it’s a good idea to put separate commands on separate lines, but you can put multiple commands on the same line, as long as you separate them by a semicolon.
See this code for example:
-
x <-1+1; print(x); print(x^2)
+
x <-1+1; print(x); print(x^2)
[1] 2
[1] 4
In this example, three commands are given on one line.
The first command creates a new variable called x, the second command prints the value of x, and the third command prints the value of x squared.
We see that the semicolon, ;, serves as the command termination, because it tells R where one command ends and another begins.
-When a line contains a single command, no semicolon in necessary at the end, but including a semicolon doesn’t have any effect either.
-
print("This line doesn't have a semicolon")
-print("This line does have a semicolon");
+When a line contains a single command, no semicolon is necessary at the end, but including a semicolon doesn’t have any effect either.
+
print("This line doesn't have a semicolon")
+print("This line does have a semicolon");
[1] "This line doesn't have a semicolon"
[1] "This line does have a semicolon"
@@ -323,7 +325,7 @@
4.1.1 R Commands
-You’ve just seen your first example of assignment. That is, we created a thing called x , and assigned to it the value of 1+1 using the assignment operator, <-. Formally x is called an object, but we’ll talk about that more objects and assignments later.
+You’ve just seen your first example of assignment. That is, we created a thing called x , and assigned to it the value of 1+1 using the assignment operator, <-. Formally x is called an object, but we’ll talk more about objects and assignment later.
@@ -339,22 +341,22 @@
4.1.1 R Commands
Lastly, commands can be “grouped together” using left and right curly braces: { and }.
Here’s an example:
-
{
-print("here's some code that's all grouped together")
-print(2^3-7)
- w <- "hello"
-print(w)
-}
+
{
+print("here's some code that's all grouped together")
+print(2^3-7)
+ w <- "hello"
+print(w)
+}
[1] "here's some code that's all grouped together"
[1] 1
[1] "hello"
The above grouped code is indented so that it looks nice, but it doesn’t have to be:
-
{
-print("here's some code that's all grouped together")
-print(2^3-7)
-w <- "hello"
-print(w)
-}
+
{
+print("here's some code that's all grouped together")
+print(2^3-7)
+w <- "hello"
+print(w)
+}
[1] "here's some code that's all grouped together"
[1] 1
[1] "hello"
@@ -368,10 +370,10 @@
4.1.1 R Commands
However, code grouping will become very important later on, when we discuss control flow.
-There are several helpful shortcuts that you can use in R. If you forget to put quotes around something, you can highlight and press the quote key and it will add quotes to both sides. This works with parenthesis too.
+There are several helpful shortcuts that you can use in R. If you forget to put quotes around something, you can highlight and press the quote key and it will add quotes to both sides. This works with parentheses too.
-You can also use tab completion with functions and defined variables. Tab completion allows you to use the same amount of time using a longer, descriptive variable name as a short, meaningless, and easily confused one. This can save you a lot of time and reduce mistakes!
+You can also use tab completion with functions and defined variables. Tab completion allows you to use longer, more descriptive variable names without the additional typing time. This can save you a lot of time and reduce mistakes!
@@ -386,10 +388,10 @@
4.1.2 Comments
This can be done with comments, which are ignored by R when it is running the code.
The “#” comment
Here’s an example of some comments:
-
# Let's define y and z
-y <-8
-z <-y +5# adding 5 to y and assigning the result to z
-## This is still a comment, even though we're using two #'s
+
# Let's define y and z
+y <-8
+z <-y +5# adding 5 to y and assigning the result to z
+## This is still a comment, even though we're using two #'s
Notice that it’s possible for a line to contain only a comment, or for part of a line to be a comment.
R decides which part of a line is a comment by looking for the first “#”, and everything after that will be treated as a comment and ignored.
@@ -401,10 +403,10 @@
4.1.2 Comments
4.1.3 Blank Lines
Blank lines in R are ignored, but they can be used to organize code and enhance readability:
-
print("The sky is blue")
-# the blank line below here is ignored
-
-print("The grass is green")
+
print("The sky is blue")
+# the blank line below here is ignored
+
+print("The grass is green")
[1] "The sky is blue"
[1] "The grass is green"
@@ -413,18 +415,19 @@
4.1.4 CaSe SeNsItIvItY
In R, variables, functions, and other objects (all of which we’ll talk about later), have names.
These names are case sensitive, so you must be careful when referencing an object by name.
Here we create two variables and give them different values, notice how they are different from each other:
-
A <-4
-a <-5
-
-print(a)
-print(A)
+
A <-4
+a <-5
+
+print(a)
+print(A)
[1] 5
[1] 4
This may seem obvious, but case sensitivity applies to functions (which we’ll talk about later) too.
We’ve been using the print function a lot in the above examples, which begins with a lower case p.
There is no Print function:
-
Print("testing")
-
Error in Print("testing"): could not find function "Print"
+
Print("testing")
+
Error in Print("testing"): could not find function "Print"
+
Before diving into detail, let’s do a quick example so you can begin to see what is possible with data in R.
+As we mentioned in the last chapter, R includes some pre-packaged data sets, which you can access with the data() command.
+One of the data sets is Seatbelts, which documents road casualties in Great Britain between 1969 and 1984.
+Firstly, we need to convert Seatbelts to a dataframe, because it starts out as a “Time-Series”, which we haven’t discussed.
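The conversion code is not shown here. One way it might be done is sketched below, under the assumption that the dataframe holds the eight original columns plus a date column containing the time of each observation in years. The name seatbelts_df is made up for this sketch so that the built-in data set is not overwritten; the examples in this section use the name Seatbelts for the converted dataframe.

```r
# Assumption: convert the built-in time series to a dataframe,
# then add the observation times as a numeric column called date
seatbelts_df <- as.data.frame(datasets::Seatbelts)
seatbelts_df$date <- as.numeric(time(datasets::Seatbelts))

class(seatbelts_df)   # [1] "data.frame"
dim(seatbelts_df)     # [1] 192   9
```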
Next, let’s look at the dimensions of the data set with the dim command:
+
dim(Seatbelts) # get the dimensions (rows, columns) of the Seatbelts dataframe
+
[1] 192 9
+
This shows that there are 192 rows (months), and 9 columns (variables measured each month).
+We could also determine the number of rows and columns separately using the nrow and ncol functions.
+To view the first few rows of the Seatbelts dataframe, use the head command:
+
head(Seatbelts) # view first few rows of the Seatbelts dataset
+
+
+
+
This is a good way to learn which variables are being measured (columns) and see some example observations (rows) for each variable.
+Because these data are included with R, more information about each variable can be found with:
+
?Seatbelts
+
Next, let’s view a summary of each column with the summary function:
+
summary(Seatbelts)
+
DriversKilled drivers front rear
+ Min. : 60.0 Min. :1057 Min. : 426.0 Min. :224.0
+ 1st Qu.:104.8 1st Qu.:1462 1st Qu.: 715.5 1st Qu.:344.8
+ Median :118.5 Median :1631 Median : 828.5 Median :401.5
+ Mean :122.8 Mean :1670 Mean : 837.2 Mean :401.2
+ 3rd Qu.:138.0 3rd Qu.:1851 3rd Qu.: 950.8 3rd Qu.:456.2
+ Max. :198.0 Max. :2654 Max. :1299.0 Max. :646.0
+ kms PetrolPrice VanKilled law
+ Min. : 7685 Min. :0.08118 Min. : 2.000 Min. :0.0000
+ 1st Qu.:12685 1st Qu.:0.09258 1st Qu.: 6.000 1st Qu.:0.0000
+ Median :14987 Median :0.10448 Median : 8.000 Median :0.0000
+ Mean :14994 Mean :0.10362 Mean : 9.057 Mean :0.1198
+ 3rd Qu.:17203 3rd Qu.:0.11406 3rd Qu.:12.000 3rd Qu.:0.0000
+ Max. :21626 Max. :0.13303 Max. :17.000 Max. :1.0000
+ date
+ Min. :1969
+ 1st Qu.:1973
+ Median :1977
+ Mean :1977
+ 3rd Qu.:1981
+ Max. :1985
+
Since each column is numeric, R shows a five number summary (minimum, first quartile, median, third quartile, maximum) and mean for each column.
+We learn, for example, that the average number of drivers killed per month is about 123, and that the greatest price of petrol was £0.13 per litre!
+Let’s view a histogram of DriversKilled:
+
hist(Seatbelts$DriversKilled, breaks=20)
+
+
+
+Figure 5.1: Histogram of Drivers Killed in Seatbelt data
+
+
+
We see that in some months, more than 150 drivers were killed!
+We can calculate how many exactly like so:
+
sum(Seatbelts$DriversKilled >150)
+
[1] 33
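This works because the comparison produces one logical value per month, and sum treats each TRUE as 1 and each FALSE as 0. A minimal sketch of the two steps, reading the column straight from the built-in data set (the names killed and over_150 are made up for illustration):

```r
killed <- datasets::Seatbelts[, "DriversKilled"]   # monthly counts of drivers killed

over_150 <- killed > 150   # one TRUE or FALSE for each month
sum(over_150)              # number of months where the comparison is TRUE
mean(over_150)             # proportion of months where the comparison is TRUE
```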
+
To investigate the effect of the seat belt law, let’s create a scatter plot of drivers killed against time:
+
plot(Seatbelts$date, Seatbelts$DriversKilled)
+
+
+
+Figure 5.2: UK Seatbelt deaths vs time
+
+
+
By adding a col argument to the plot function, we can color the points based on whether the law was in effect:
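The plotting code is not shown here, so the following is only a sketch of one way to do it. It assumes the law column is 1 when the law was in effect and 0 otherwise, and it rebuilds a dataframe under the made-up name seatbelts_df so the block runs on its own.

```r
# Assumption: same conversion as sketched earlier, under a made-up name
seatbelts_df <- as.data.frame(datasets::Seatbelts)
seatbelts_df$date <- as.numeric(time(datasets::Seatbelts))

# Choose a color for each month: green if the law was in effect, red if not
point_colors <- ifelse(seatbelts_df$law == 1, "green", "red")

plot(seatbelts_df$date, seatbelts_df$DriversKilled, col = point_colors)
```

The col argument accepts one color per point, so point_colors needs to have the same length as the data being plotted.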
+Figure 5.3: UK Seatbelt deaths vs time, red = no seatbelt law, green = seatbelt law
+
+
+
There do appear to be fewer deaths, but there is so much fluctuation in deaths each year that it’s difficult to tell.
+Let’s change the x-axis to reflect month of the year instead of date:
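Again the code is not shown, so here is a hedged sketch. It assumes the month can be recovered from the built-in time series with the cycle function, which gives the position of each observation within the year (1 for January through 12 for December). The names seatbelts_df, month, and point_colors are made up for illustration.

```r
seatbelts_df <- as.data.frame(datasets::Seatbelts)

# Assumption: cycle() gives the month number (1 to 12) of each observation
seatbelts_df$month <- as.numeric(cycle(datasets::Seatbelts))

# Same coloring as before: green if the law was in effect, red if not
point_colors <- ifelse(seatbelts_df$law == 1, "green", "red")

plot(seatbelts_df$month, seatbelts_df$DriversKilled, col = point_colors)
```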
This plot shows that there is a clear seasonal effect in the number of deaths with higher deaths occurring in the Fall/Winter compared to Spring/Summer.
+We can also see that within each month, the traffic deaths after enacting the Seatbelt law are among the lowest.
+
+
+Another data set included in R is mtcars. Following the example above, find the dimension of mtcars and have R print out a summary of each column, then create a scatter plot of fuel economy (mpg) to engine displacement. What do you observe about the relationship between these two variables?
+
+
+
This concludes the quick example.
+In the rest of this chapter, we’ll talk in more detail about the different steps of working with data, and how to complete them using R!
+
+
+People often use data in order to answer questions, but often times, learning about data can generate even more questions than it answers. Take a moment to think of a question that you have about the Seatbelts dataset. Do you think the question can be answered using the data alone? If not, what other sources of data might be available which can help answer the question?
+
Any object can be assigned to a variable, as we’ve been doing already.
Here’s an example:
-
a <- "pink pineapple"
+
a <- "pink pineapple"
The <- is called an assignment operator.
This is the most common way of assigning objects in R, but there are others.
Sometimes you may see:
-
a = "pink pineapple"
+
a = "pink pineapple"
which in most cases, has the exact same effect as using the <-, but in a few instances, it has a different effect.
Our recommendation is to always use <- when making object assignments.
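One place where the two behave differently is inside a function call. The sketch below uses paste, whose sep argument sets the separator; the object names with_equals and with_arrow are made up for illustration.

```r
# Inside a call, = names an argument: sep is the separator, no object is created
with_equals <- paste("pink", "pineapple", sep = "-")
with_equals   # [1] "pink-pineapple"

# Inside a call, <- still assigns: it creates an object called sep,
# and "-" is passed to paste as a third string to combine
with_arrow <- paste("pink", "pineapple", sep <- "-")
with_arrow    # [1] "pink pineapple -"
sep           # [1] "-"
```

So a simple rule of thumb is to use <- for assigning objects and = only for naming arguments inside function calls.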
@@ -310,10 +312,10 @@
4.4.2 Assigning Objects
One neat thing you can do is assign multiple variables at the same time:
-
a <-b <- "Hello"
-
a
+
a <-b <- "Hello"
+
a
[1] "Hello"
-
b
+
b
[1] "Hello"
@@ -325,11 +327,11 @@
4.4.2 Assigning Objects
4.4.3 Attributes
Every object in R has attributes, extra information that’s “attached” to the object.
Every object has a length attribute:
-
a <-c(1, 2, 3, 4)
-b <-c("bonjour", "au revoir")
-
length(a)
+
a <-c(1, 2, 3, 4)
+b <-c("bonjour", "au revoir")
+
length(a)
[1] 4
-
length(b)
+
length(b)
[1] 2
@@ -338,9 +340,9 @@
4.4.3 Attributes
Every R object has a mode as well, which tells you what type of object you have.
Here are some examples:
-
mode(a)
+
mode(a)
[1] "numeric"
-
mode(b)
+
mode(b)
[1] "character"
@@ -348,7 +350,7 @@
4.4.3 Attributes
Aside from these two attributes, you can list all attributes of an object like this:
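The listing code is not shown at this point. A minimal sketch, using the built-in attributes function on the mtcars dataframe and on a plain numeric vector:

```r
attributes(mtcars)          # a list of all attributes attached to mtcars
names(attributes(mtcars))   # just the attribute names

a <- c(1, 2, 3, 4)
attributes(a)               # NULL: a plain vector has no extra attributes attached
```

Notice that length and mode do not appear in these listings, which is why they are treated separately above.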
To access a specific attribute of an object, you can use the dollar sign ($):
-
attr(mtcars, "class") # get the class attribute for the mtcars dataframe
+
To access a specific attribute of an object, you can do this:
+
attr(mtcars, "class") # get the class attribute for the mtcars dataframe
[1] "data.frame"
4.4.4 Null Objects
There is a special object called the NULL object, which really just represents “nothing”.
It’s used mainly if you want to remove an element from a list:
-
a <-list(1, 2, 3)
-a[[2]] <-NULL# replace component 2 with "nothing"
-a
+
a <-list(1, 2, 3)
+a[[2]] <-NULL# replace component 2 with "nothing"
+a
[[1]]
[1] 1
@@ -390,14 +392,15 @@
4.4.4 Null Objects
4.4.5 Removing Objects
Sometimes you want to get rid of an object!
In R, you can use the rm function like so:
-
a <- "an object"
-rm(a)
-a
-
Error in eval(expr, envir, enclos): object 'a' not found
+
a <- "an object"
+rm(a)
+a
+
Error in eval(expr, envir, enclos): object 'a' not found
+
As you can see, the error message indicates that a has been removed.
Sometimes, you’d like to remove all the objects in your environment.
To do this, you can use the command:
-
rm(list=ls())
+
rm(list=ls())
Any feedback for this section? Click here
diff --git a/docs/r-programming-fundamentals.html b/docs/r-programming-fundamentals.html
index eead80e..28f4c0c 100644
--- a/docs/r-programming-fundamentals.html
+++ b/docs/r-programming-fundamentals.html
@@ -222,21 +222,23 @@
Now that you’re somewhat familiar with RStudio, let’s run the same code as we ran on the website, but this time let’s run it in R.
3.4.1 The R Console:
-
In the R console, type 1+1 and press enter.
+
In the R console, type 1+1 and press enter.
The output in the console should look like the following:
@@ -290,15 +292,15 @@
3.4.2 R scripts
Figure 3.2: Click this button to create a new file
-
In the script window which opens, type 1+1 and press enter.
+
In the script window which opens, type 1+1 and press enter.
Notice how now, the code did not run?
In a script, you are free to write R code on several lines before you run it.
You can even save the script and load it later in order to run the code it contains.
There are multiple ways to run R code in a script.
To run a single line of code, do one of the following:
-
Place the cursor on the desired line, hold the <control> key, and press enter. On Mac OS X, hold <command> key and press enter instead
-
Place the curse on the desired line and click the Run button that looks like this:
+
Place the cursor on the desired line, hold the <control> key, and press enter. On Mac OS X, hold <command> key and press return instead
+
Place the cursor on the desired line and click the Run button that looks like this:
@@ -307,7 +309,7 @@
3.4.2 R scripts
To run multiple lines of code, do one of the following:
-- Highlight all the code you’d like to run, hold the <control> key, and press enter. On Mac OS X, hold the <command> key and press enter instead.
+- Highlight all the code you’d like to run, hold the <control> key, and press enter. On Mac OS X, hold the <command> key and press return instead.
- Highlight all the code you’d like to run, and click the Run button.
Run the 1+1 code using one of the methods above, and observe the output.
Notice how the output is still in the console window, even though you ran the code in a script!
The box comes with some code entered already, but we want to use our own code instead, so delete all the text, from library(ggplot2) to factor(cyl)).
+
The box comes with some code entered already, but we want to use our own code instead, so delete all the text, from before library(ggplot2) to after factor(cyl)).
In its place, type 1+1, then click the big green “Run” button.
You should see the [1] 2 displayed below.
So if you give R a math expression, it will evaluate it and give the result.
diff --git a/docs/search_index.json b/docs/search_index.json
index 71e817e..5b1c4bf 100644
--- a/docs/search_index.json
+++ b/docs/search_index.json
@@ -1,30 +1,31 @@
[
-["index.html", "R Module 1 Chapter 1 Welcome!", " R Module 1 Alex Fout1 09 Jul, 2020, 04:20 PM Chapter 1 Welcome! Hi, and welcome to the R Module 1 (AKA STAT 158) course at Colorado State University! This course is the first of three 1 credit courses intended to introduce the R programming language to those with little or no programming experience. Through these Modules (courses), we’ll explore how R can be used to do the following: Perform basic computations and logic, just like any other programming language Load, clean, analyze, and visualise data Run scripts Create reproducible reports so you can explain your work in a narrative form In addition, you’ll also be exposed to some aspects of the broader R community, including: R as free, open source software The RStudio free software Publicly available packages which extend the capability of R Events and community groups which advocate for the use of R and the support of R users More detail will be provided in the Course Topics laid out in the next chapter. 1.0.1 How To Navigate This Book To move quickly to different portions of the book, click on the appropriate chapter or section in the the table of contents on the left. The buttons at the top of the page allow you to show/hide the table of contents, search the book, change font settings, download a pdf or ebook copy of this book, or get hints on various sections of the book. The faint left and right arrows at the sides of each page (or bottom of the page if it’s narrow enough) allow you to step to the next/previous section. Here’s what they look like: Figure 1.1: Left and right navigation arrows Department of Statistics, Colorado State University, fout@colostate.edu↩︎ "],
+["index.html", "R Module 1 Chapter 1 Welcome!", " R Module 1 Alex Fout1 17 Jul, 2020, 10:28 AM Chapter 1 Welcome! Hi, and welcome to the R Module 1 (AKA STAT 158) course at Colorado State University! This course is the first of three 1 credit courses intended to introduce the R programming language to those with little or no programming experience. Through these Modules (courses), we’ll explore how R can be used to do the following: Perform basic computations and logic, just like any other programming language Load, clean, analyze, and visualise data Run scripts Create reproducible reports so you can explain your work in a narrative form In addition, you’ll also be exposed to some aspects of the broader R community, including: R as free, open source software The RStudio free software Publicly available packages which extend the capability of R Events and community groups which advocate for the use of R and the support of R users More detail will be provided in the Course Topics laid out in the next chapter. 1.0.1 How To Navigate This Book To move quickly to different portions of the book, click on the appropriate chapter or section in the the table of contents on the left. The buttons at the top of the page allow you to show/hide the table of contents, search the book, change font settings, download a pdf or ebook copy of this book, or get hints on various sections of the book. The faint left and right arrows at the sides of each page (or bottom of the page if it’s narrow enough) allow you to step to the next/previous section. Here’s what they look like: Figure 1.1: Left and right navigation arrows Department of Statistics, Colorado State University, fout@colostate.edu↩︎ "],
["associated-csu-course.html", "1.1 Associated CSU Course", " 1.1 Associated CSU Course This bookdown book is intended to accompany the associated course at Colorado State University, but the curriculum is free for anyone to access and use. If you’re reading the PDF or EPUB version of this book, you can find the “live” version at https://csu-r.github.io/Module1/, and all of the source files for this book can be found at https://github.com/CSU-R/Module1. If you’re not taking the CSU course, you will periodically encounter instructions and references which are not relevant to you. For example, we will make reference to the Canvas website, which only CSU students enrolled in the course have access to. "],
["prelim.html", "Chapter 2 Course Preliminaries", " Chapter 2 Course Preliminaries This course is presented as a bookdown document, and is divided into chapters and sections Each week, you’ll be expected to read through the chapter and complete any associated exercises, quizzes, or assignments. 2.0.1 Special Boxes Throughout the book, you’ll encounter special boxes, each with a special meaning. Here is an example of each type of box: This box will prompt you to pause and reflect on your experience and/or learning. No feedback will be given, but this may be graded on completion. This box will signify a quiz or assignment which you will turn in for grading, on which the instructor will provide feedback. This box is for checking your understanding, to make sure you are ready for what follows. This box is for displaying/linking to videos in order to help illustrate or communicate concepts. This box will warn you of possible problems or pitfalls you may encounter! This box is to provide material going beyond the main course content, or material which will be revisited later in more depth. This box will prompt for your feedback on the organization of the course, so we can improve the material for everyone! Any of the boxes may include hyperlinks like this: I am a link or code like this This is code. 2.0.2 How This Book Displays Code In addition, you may see R code either as part of a sentence like this: 1+1, or as a separate block like so: 1+1 [1] 2 Sometimes (as in this example) we will also show the output (in yellow), that is, the result of running the R code. In this case the code 1+1 produced the output 2. If you hover over a code block with your mouse, you will see the option to copy the code to your clipboard, like this: Figure 2.1: copying code from this book This will be useful when you are asked to run code on your computer. 2.0.3 Next Steps When you’re ready, go to the next section to learn about the course syllabus and grading policies. 
Any feedback for this section? Click here "],
-["course-topics-syllabus.html", "2.1 Course Topics & Syllabus", " 2.1 Course Topics & Syllabus Broadly speaking, the topics of this course are described by the Chapter Titles. Here’s what each entails: - Course Preliminaries: Introduction to R and the world of R - Installing R: Like it sounds, setting up your computer so you can work with R. - R Programming Fundamentals: The basics of programming in R, the building blocks that you need in order to do anything more interesting. - Working with Data: How to do meaningful things with data sets. Probably the most useful Chapter of the book. - Creating R Programs: More programming concepts to increase your R Power! 2.1.1 Syllabus First, some important details: Instructor: Alex Fout Office Hours: TBD on Google Meet (look for email invite) Webpages: Canvas, this textbook Course Credits: 1. Because this week lasts four weeks, this course should “feel” like a 3 credit course for four weeks. Normally this means ~3 hours of lecture and 12 hours of work outside of lecture per week. Because this course is online, there will be 1 hour or less of “lecture” (see below), and about 14 hours of outside work per week. Textbook: You’re reading it right now. The textbook will be your primary learning resource. You’ll be expected to read through the required sections, watch any relevant videos, and complete any reflections, progress checks, and assessments along the way. On days when a quiz is due, you should complete the reading before you take the quiz. Prerequisites: None Progress Checks: As you work your way through the textbook, you’ll encounter purple “Progress Check” boxes. For Week 1, you’ll submit your responses directly to canvas. For weeks 2-4, you’ll fill in a R Markdown document and submit it to canvas. You’ll be provided a template to fill in as you complete the progress checks. To turn in the document, you’ll knit the document to HTML or PDF and upload to Canvas. (More details coming later in the book!). 
Progress checks will be graded on completion, organization, and correctness. Homework: About once per week, you’ll complete an assignment using R. Homeworks must be turned in by 11:59pm (Mountain) on the day they are due. Exams: There will be no exams in this course Quizzes: Once per week, there will be a 15 minute Canvas quiz. Quizzes must be completed by 11:59pm (Mountain) on the day they are due. Lectures: Since we aren’t having in-person lectures, we will hold short mini-lectures instead (more details on Canvas). These will be shorter than a traditional lecture (approximately 10-30 minutes), and the purpose will be to allow some interaction between everyone in the course and to allow the instructor to introduce any relevant topics and address any challenges that students are having. Grading: The grading for the course is apportioned like so: Progress Checks: 30% Homework: 40% Quizzes: 30% 2.1.2 Schedule Week Weekday Date Reading Due 1 Monday July 13 1, 2 Progress Check 1 1 Wednesday July 15 3 Quiz 1 1 Friday July 17 4.1, 4.2 Assignment 1 2 Monday July 20 4.3 Progress Check 2 2 Wednesday July 22 4.4, 4.5 Quiz 1 2 Friday July 24 5.X (TBD) Assignment 2 3 Monday July 27 5.X (TBD) Progress Check 3 3 Wednesday July 29 5.X (TBD) Quiz 1 3 Friday July 31 5.X (TBD) Assignment 3 4 Monday August 03 6.X (TBD) Progress Check 4 4 Wednesday August 05 6.X (TBD) Quiz 4 4 Friday August 07 6.X (TBD) Assignment 4 2.1.3 Course Policies Late Work: Homework and Progress Checks must be turned in on time to receive full credit. You may turn in Homework and Progress Checks up to 2 days late for up to 50% credit. Group Work: Students are welcome to discuss the course with each other, but all work you turn in must be your own. This means no sharing solutions to homework, progress checks, or quizzes. You may not work with other students on quizzes. You are welcome to seek help on Canvas discussion boards and during office hours. 
Students with Disabilities: The university is committed to providing support for students with disabilities. If you have an accommodation plan, please provide that to me as soon as possible so we can discuss appropriate arrangements. Growth Mindset: This phrase was coined by Carol Dweck to reflect how your learning outcomes can be affected by the way you view the learning process. To quote Dweck: “The view you adopt for yourself profoundly affects the way you lead your life… Believing that your qualities are carved in stone - the fixed mindset - creates an urgency to prove yourself over and over. If you have only a certain amount of intelligence, a certain personality, and a certain moral character — well, then you’d better prove that you have a healthy dose of them. It simply wouldn’t do to look or feel deficient in these most basic characteristics… There’s another mindset in which these traits are not simply a hand you’re dealt and have to live with, always trying to convince yourself and others that you have a royal flush when you’re secretly worried it’s a pair of tens. In this mindset, the hand you’re dealt is just the starting point for development. This growth mindset is based on the belief that your basic qualities are things you can cultivate through your efforts. Although people may differ in every which way — in their initial talents and aptitudes, interests, or temperaments — everyone can change and grow through application and experience.” Programming may be a very new, intimidating thing for you. That’s okay! View this course as a way to grow and gain new skills which you can use to do incredible and important things! Learn by doing: A wise statistics instructor once compared watching someone else solve statistics problems to watching someone else practice shooting basketball free throws. You may learn a little by watching, but at some point you won’t get any better until you try it yourself! The same can be said for programming. 
Reading a textbook and watching videos are a good start, but you’ll have to actually program in order to get any better! This textbook was designed to be interactive, and I encourage you to “code along with the book” as you read. 2.1.4 Grading Scale Grades will be assigned according to the following scale: Class_Score Letter_Grade 92%-100% A 90%-92% A- 88%-90% B+ 82%-88% B 80%-82% B- 78%-80% C+ 70%-78% C 60%-70% D 0%-60% F Any feedback for this section? Click here "],
-["running-your-first-r-code.html", "2.2 Running your first R Code", " 2.2 Running your first R Code Enough of the boring stuff, let’s run some R code! Normally you will run R on your computer, but since you may not have R installed yet, let’s run some R code using a website first. As you run code, you’ll see some of the things R can do. In a browser, navigate to rdrr.io/snippets, where you’ll see a box that looks like this: Figure 2.2: rdrr code entry box The box comes with some code entered already, but we want to use our own code instead, so delete all the text, from library(ggplot2) to factor(cyl)). In its place, type 1+1, then click the big green “Run” button. You should see the [1] 2 displayed below. So if you give R a math expression, it will evaluate it and give the result. Note: the “correct answer” to \\(1+1\\) is 2, but the output also displays [1], which we won’t explain until later, so you can ignore that for now. Next, delete the code you just wrote and type (or copy/paste) the following, and run it: factorial(10) The result should be a very large number, which is equivalent to \\(10!\\), that is, \\(10\\times9\\times8\\times7\\times6\\times5\\times4\\times3\\times2\\times1\\). This is an example of an R function, which we will discuss more later. Aside from math, R can produce plots. Try copy/pasting the following code into the website: x <- -10:10 plot(x, x^2) You should see points in a scatter plot which follow a parabola. Here’s a more complicated example, which you should copy/paste into the website and run: library(ggplot2) theme_set(theme_bw()) ggplot(mtcars, aes(y=mpg, fill=as.factor(cyl))) + geom_boxplot() + labs(title="Engine Fuel Efficiency vs. Number of Cylinders", y="MPG", fill="Cylinders") + theme(legend.position="bottom", axis.ticks.x = element_blank(), axis.text.x = element_blank()) R can be used to make many types of visualizations, which you will do more of later. 
This may be the first time you’ve seen R, so it’s okay if you don’t understand how to read this code. We’ll talk more later about what each statement is doing, but for now, here is a brief description of some of the code above: -10:10 This creates a sequence of numbers starting from -10 and ending at 10. That is, \\(-10, -9, -8, \\ldots, 8, 9, 10\\). library This is a function which loads an R package. R packages provide extra abilities to R. Any feedback for this section? Click here "],
+["course-topics-syllabus.html", "2.1 Course Topics & Syllabus", " 2.1 Course Topics & Syllabus Broadly speaking, the topics of this course are described by the Chapter Titles. Here’s what each entails: - Course Preliminaries: Introduction to R and the world of R - Installing R: Like it sounds, setting up your computer so you can work with R. - R Programming Fundamentals: The basics of programming in R, the building blocks that you need in order to do anything more interesting. - Working with Data: How to do meaningful things with data sets. Probably the most useful Chapter of the book. - Creating R Programs: More programming concepts to increase your R Power! 2.1.1 Syllabus First, some important details: Instructor: Alex Fout Office Hours: Held via Google Meet (look for email invite), schedule on Canvas. Webpages: Canvas, this textbook Course Credits: 1. Because this course lasts four weeks, it should “feel” like a 3 credit course for four weeks. Normally this means ~3 hours of lecture and 12 hours of work outside of lecture per week. Because this course is online, there will be 1 hour or less of “lecture” (see below), and about 14 hours of outside work per week. Textbook: You’re reading it right now. The textbook will be your primary learning resource. You’ll be expected to read through the required sections, watch any relevant videos, and complete any reflections, progress checks, and assessments along the way. On days when a quiz is due, you should complete the reading before you take the quiz. Prerequisites: None Progress Checks: As you work your way through the textbook, you’ll encounter purple “Progress Check” boxes. For Week 1, you’ll submit your responses directly to Canvas. For weeks 2-4, you’ll fill in an R Markdown document and submit it to Canvas. You’ll be provided a template to fill in as you complete the progress checks. To turn in the document, you’ll knit the document to HTML or PDF and upload to Canvas. 
(More details coming later in the book!). Progress checks will be graded on completion, organization, and correctness. Homework: About once per week, you’ll complete an assignment using R. Homeworks must be turned in by 11:59pm (Mountain) on the day they are due. Exams: There will be no exams in this course Quizzes: Once per week, there will be a 15 minute Canvas quiz. Quizzes must be completed by 11:59pm (Mountain) on the day they are due. Lectures: Since we aren’t having in-person lectures, we will hold short mini-lectures instead (more details on Canvas). These will be shorter than a traditional lecture (approximately 10-30 minutes), and the purpose will be to allow some interaction between everyone in the course and to allow the instructor to introduce any relevant topics and address any challenges that students are having. Grading: The grading for the course is apportioned like so: Progress Checks: 30% Homework: 40% Quizzes: 30% 2.1.2 Schedule Week Weekday Date Reading Due 1 Monday July 13 1, 2 Progress Check 1 1 Wednesday July 15 3 Quiz 1 1 Friday July 17 4.1, 4.2 Assignment 1 2 Monday July 20 4.3 Progress Check 2 2 Wednesday July 22 4.4, 4.5 Quiz 1 2 Friday July 24 5.X (TBD) Assignment 2 3 Monday July 27 5.X (TBD) Progress Check 3 3 Wednesday July 29 5.X (TBD) Quiz 1 3 Friday July 31 5.X (TBD) Assignment 3 4 Monday August 03 6.X (TBD) Progress Check 4 4 Wednesday August 05 6.X (TBD) Quiz 4 4 Friday August 07 6.X (TBD) Assignment 4 2.1.3 Course Policies Late Work: Homework and Progress Checks must be turned in on time to receive full credit. You may turn in Homework and Progress Checks up to 2 days late for up to 50% credit. Group Work: Students are welcome to discuss the course with each other, but all work you turn in must be your own. This means no sharing solutions to homework, progress checks, or quizzes. You may not work with other students on quizzes. You are welcome to seek help on Canvas discussion boards and during office hours. 
Students with Disabilities: The university is committed to providing support for students with disabilities. If you have an accommodation plan, please provide that to me as soon as possible so we can discuss appropriate arrangements. Growth Mindset: This phrase was coined by Carol Dweck to reflect how your learning outcomes can be affected by the way you view the learning process. To quote Dweck: “The view you adopt for yourself profoundly affects the way you lead your life… Believing that your qualities are carved in stone - the fixed mindset - creates an urgency to prove yourself over and over. If you have only a certain amount of intelligence, a certain personality, and a certain moral character — well, then you’d better prove that you have a healthy dose of them. It simply wouldn’t do to look or feel deficient in these most basic characteristics… There’s another mindset in which these traits are not simply a hand you’re dealt and have to live with, always trying to convince yourself and others that you have a royal flush when you’re secretly worried it’s a pair of tens. In this mindset, the hand you’re dealt is just the starting point for development. This growth mindset is based on the belief that your basic qualities are things you can cultivate through your efforts. Although people may differ in every which way — in their initial talents and aptitudes, interests, or temperaments — everyone can change and grow through application and experience.” Programming may be a very new, intimidating thing for you. That’s okay! View this course as a way to grow and gain new skills which you can use to do incredible and important things! Learn by doing: A wise statistics instructor once compared watching someone else solve statistics problems to watching someone else practice shooting basketball free throws. You may learn a little by watching, but at some point you won’t get any better until you try it yourself! The same can be said for programming. 
Reading a textbook and watching videos are a good start, but you’ll have to actually program in order to get any better! This textbook was designed to be interactive, and I encourage you to “code along with the book” as you read. 2.1.4 Grading Scale Grades will be assigned according to the following scale: Class_Score Letter_Grade 92%-100% A 90%-92% A- 88%-90% B+ 82%-88% B 80%-82% B- 78%-80% C+ 70%-78% C 60%-70% D 0%-60% F Any feedback for this section? Click here "],
+["running-your-first-r-code.html", "2.2 Running your first R Code", " 2.2 Running your first R Code Enough of the boring stuff, let’s run some R code! Normally you will run R on your computer, but since you may not have R installed yet, let’s run some R code using a website first. As you run code, you’ll see some of the things R can do. In a browser, navigate to rdrr.io/snippets, where you’ll see a box that looks like this: Figure 2.2: rdrr code entry box The box comes with some code entered already, but we want to use our own code instead, so delete all the text, from before library(ggplot2) to after factor(cyl)). In its place, type 1+1, then click the big green “Run” button. You should see the [1] 2 displayed below. So if you give R a math expression, it will evaluate it and give the result. Note: the “correct answer” to \\(1+1\\) is 2, but the output also displays [1], which we won’t explain until later, so you can ignore that for now. Next, delete the code you just wrote and type (or copy/paste) the following, and run it: factorial(10) The result should be a very large number, which is equivalent to \\(10!\\), that is, \\(10\\times9\\times8\\times7\\times6\\times5\\times4\\times3\\times2\\times1\\). This is an example of an R function, which we will discuss more later. Aside from math, R can produce plots. Try copy/pasting the following code into the website: x <- -10:10 plot(x, x^2) You should see points in a scatter plot which follow a parabola. Here’s a more complicated example, which you should copy/paste into the website and run: library(ggplot2) theme_set(theme_bw()) ggplot(mtcars, aes(y=mpg, fill=as.factor(cyl))) + geom_boxplot() + labs(title="Engine Fuel Efficiency vs. Number of Cylinders", y="MPG", fill="Cylinders") + theme(legend.position="bottom", axis.ticks.x = element_blank(), axis.text.x = element_blank()) R can be used to make many types of visualizations, which you will do more of later. 
This may be the first time you’ve seen R, so it’s okay if you don’t understand how to read this code. We’ll talk more later about what each statement is doing, but for now, here is a brief description of some of the code above: -10:10 This creates a sequence of numbers starting from -10 and ending at 10. That is, \\(-10, -9, -8, \\ldots, 8, 9, 10\\). library This is a function which loads an R package. R packages provide extra abilities to R. Any feedback for this section? Click here "],
["getoutoftheclass.html", "2.3 What do you hope to get out of this course?", " 2.3 What do you hope to get out of this course? To close out this chapter, it would be healthy for you to reflect on what you’d like to get from this course. Take some time to think through each question below, and write down your answers. It is fine if your honest answer is “I don’t know”. In that case, try to come up with some possible answers that might be true. Why are you taking this course? If this course is required for your major, how do you think it is supposed to benefit you in your studies? What types of data sets related to your field of study may require data analysis? What skills do you hope to develop in this course, and how might they be applied in your major and career? Submit your answers to the above reflection to Canvas. Store your answers in a safe place, and refer to them periodically as you progress through the course. You may find that you aren’t achieving your goals and that some adjustment to how you are approaching the course may be necessary. Or you may find that your goals have changed, which is fine! Just update your goals so that you have something to refer back to. Any feedback for this section? Click here "],
["what-is-r.html", "2.4 What is R?", " 2.4 What is R? What is R? This question can be answered several different ways. Here are a few of them: Any feedback for this section? Click here 2.4.1 R is a Programming Language A programming language is a way of providing instructions to a computer. Some popular languages (in no particular order) are C, C++, Java, Python, PHP, Visual Basic, and Swift. Much like other types of languages, programming languages combine text and punctuation (syntax) to create statements which provide meaningful instructions (semantics) to be performed by a computer. These instructions are called “code”. R code can be used to do many things, but primarily R was designed to easily work with data and produce graphics. The R language can be used to use a computer to do the following: - Read and process a set of data in a file or database - Use data to compute statistics and perform statistical tests - Produce nice looking visualizations of data - Save data for others to use. But this list is just the tip of the iceberg. As you will see, R can be used to do so much more! After the instructions are written, the R code is run, that is, the code is provided to the computer, and the computer performs the instructions to produce the desired results. Many other programming languages use different syntax for the same purpose. # comments out a line in R and python % comments out a line in matlab // comments out a line in C++ and javascript Similar to learning a foreign language, learning your first programming language will make it easier to understand other similar ones. 2.4.2 R is software R can also be thought of as the software program which runs R code. In other words, if R code is the computer language, then the R software is what interprets the language and makes the computer follow the instructions laid out in the code. This is sometimes called “base R”. 
2.4.3 R is Free The R software is free, so anyone can download R, write R code, and run the R code in order to produce results on their computer. 2.4.4 R is Open Source The R software, which runs R code, is also made up of a bunch of code called source code. In addition to being free, R is also open source, meaning that anyone can look at the source code and understand the “deep-down nuts-and-bolts” of how R works. In addition, anyone is able to contribute to R, in order to improve it and add new features to it. What are the advantages of open-source software? What are some potential downsides? Why do you think the creators of R decided to make it open source? 2.4.5 R is an ecosystem Another way of thinking about R is to include not only the R language and the R software, but also the community of R users and programmers, and the various “add on” software they have created for R. These add-ons are called “packages”. 2.4.6 R Packages An R package is software written to extend the capabilities of base R. R packages are often written in R code, so anyone who knows how to write R code can also create R packages. The importance of packages cannot be overstated. One of the reasons for the incredible popularity of R is the fact that members of the community can write new packages which enable R to do more. Sometimes packages are written to help folks in particular disciplines (e.g. psychology, geosciences, microbiology, education) do their jobs better. Other times, packages are written to extend the capability of R so that people from many disciplines can use them. R can be used to make web sites, interactive applications, dynamic reproducible reports, and even textbooks (like this one!). The inclusion of R packages, combined with the free and open source nature of R software, has led to the development of an active, diverse, and supportive community of R users who can easily share their code, data, and results with one another. 
skimr is one example of a package. It provides a frictionless approach to summary statistics which conforms to the principle of least surprise, displaying summary statistics the user can skim quickly to understand their data. 2.4.7 R Interfaces The R software can be run in many different places, including personal computers, remote servers, and websites (as you have seen!). R works on Windows, Mac OS X, and Linux, and R can be run using a terminal or command line (if you know what those are), or using a graphical user interface (with buttons you can click and such). One of the most popular ways of using R is with RStudio, which is also free and open source software. For this course, you’ll be using RStudio. Any feedback for this section? Click here "],
["the-r-community.html", "2.5 The R Community", " 2.5 The R Community We already mentioned that there is an active community of R users around the world, ranging from novice to expert level. Here is a partial list of venues where R users interact (aside from the official websites, none of these links should be considered an official endorsement): R Project: The official website for R. R Project Mailing Lists: Various email lists to stay informed on R related activities. The R-announce list is a good starting point, which will keep you updated on the latest releases of the R software. Twitter #rstats: Many R users are active on Twitter, and you can find them under the #rstats hashtag. Tidy Tuesday is a weekly online project that focuses on understanding how to summarize, arrange, and make meaningful charts with open source data. You can see the projects others have done by following #tidytuesday on Twitter. R-Ladies is a global group dedicated to promoting gender equality in the R community. They have an elaborate list of resources for learning and host educational and networking events. R-Podcast: A periodic podcast with practical advice for using R, and the latest R news. R-Bloggers: A blog website where authors can post examples of code, data analysis, and visualization. 2.5.1 Places to Get Help (If you’re a student taking this class for credit) Students taking the course for credit should seek help from these places, in order: Canvas Discussion boards, then Office Hours. I will not answer homework/quiz/textbook related questions via email. 2.5.2 Places to Get Help (anyone) If you find yourself stuck, there are many options available to you. Here are a few: Stack Overflow is a message board where users can post questions about issues they’re having. If you search for your error, there’s likely already an answered question about it. If not, you can submit one with a reproducible example that the active community can help you with. 
R Manuals: With so many R resources available on the internet, sometimes information gets “boiled down” or simplified for ease of communication. If you need the “official answer” to a question, these manuals are the place to go. Check out “An Introduction to R” for a good reference. Any feedback for this section? Click here "],
["installing-r.html", "Chapter 3 Installing R", " Chapter 3 Installing R In the previous chapter, you ran R code on a website. The purpose of this chapter is to install R on your own computer, so that you can run R without needing access to the internet. Any feedback for this section? Click here "],
["computer-basics.html", "3.1 Computer Basics", " 3.1 Computer Basics If you’re new to computers, this section will be important for you to get set up. We’ll briefly introduce some computer concepts and discuss how they’re relevant to R. If you understand the basics of operating systems, directory structures on your computer, and downloading/installing files, then you can probably skim this section, but be sure to pay attention to the R-specific information. 3.1.1 Operating Systems An operating system is a set of programs that allow you to interact with the computer, and the most popular operating systems are Windows, Mac OS X, and Linux. R works on Windows, Mac OS X, and several Linux-based operating systems, so if you have one of these operating systems, you’ll be able to install and use R. At least, this is mostly true: Some versions of Windows that run on ARM processors cannot install R, and installing R on a Chromebook will likely be more complicated (see here). If you’re in this situation, contact the instructor immediately. R isn’t designed to work on tablets or phones which run mobile/tablet operating systems (like iOS, iPadOS, Android, ChromeOS), so these are not an option for R. 3.1.2 Files & Directory Structures A file is a collection of data stored on your computer’s hard drive. Examples of files include: A music file A video A slide presentation A text document Different types of files are often treated differently by your computer. For example, a music file is played with a music player program, a video can be viewed with a video player, and a slide presentation might be viewed with Powerpoint. Most operating systems know the type of a file by looking at the extension, which is at the very end of the file’s name. Examples include “.mp3”, “.doc”, “.txt”, and “.ppt”. When using R, we can write scripts which contain R code, and RMarkdown documents, which include human readable text and code. 
R scripts usually have either a “.R” or “.r” extension, and we’ll also be using RMarkdown documents, which use either a “.Rmd” or “.rmd” extension. A directory, or folder, is a collection of files, and computers use directories to logically organize sets of files. When working with R, you may have to organize several different types of files, including R code, data files, and images. It will be important to stay organized when using R, and we will address this more later in the chapter. With the increasing prevalence of the internet in everyday life, it’s becoming less common for files to exist on your computer. When writing R code, however, you’ll be working with files on your computer, not accessing them over the internet. 3.1.3 Downloads and Installations To install R, you’ll have to download a file from the internet which performs the installation. After you install R, you shouldn’t have to download anything to run R. The specific steps to install R will be different depending on your operating system, and this will be addressed in the next section. Any feedback for this section? Click here "],
-["install-r-r-studio.html", "3.2 Install R & R Studio", " 3.2 Install R & R Studio Here’s where you install R on your personal computer, but you’ll actually be installing two separate programs. The first is the R programming language. The second is a separate program called R Studio, which will be the primary way in which you interact with R in this class, we will say more about this later. 3.2.1 Installing R Installation will look slightly different depending on the operating system, but the major steps are the same. First, navigate to the CRAN Mirrors Site, which lists several locations from which R can be downloaded. Find a location near you (or not, this isn’t critical) and click on the link to be brought to the mirror site. From this point, this will change depending on your operating system. 3.2.1.1 Windows Click “Download R for Windows”, then click “base”. Finally, Click “Download R X.Y.Z for Windows”, where X, Y, and Z will be numbers. These numbers indicate which version of R you’ll be installing. As of the publishing of this book, R is on version r_version. Your computer might prompt for the location on your computer that you would like to save the file. Select a location (reasonable options are your Downloads folder or the Desktop) and select “save”. When the download completes, find the downloaded file in the File Explorer and double click to run it. This will start the installation process. Follow the on screen prompts. For the most part you can click “next” and “install” as appropriate, and you don’t have to worry about changing any installation settings. Click “Finish” to complete the installation! This video shows the installation process for Windows 3.2.1.2 Mac OS X Click “Download R for (Mac) OS X” Click “R-X.Y.Z.pkg”, where X, Y, and Z will be numbers. These numbers indicate which version of R you’ll be installing. As of the publishing of this book, R is on version r_version. 
Your computer might prompt for the location on your computer that you would like to save the file. Select a location and select “save”. When the download completes, find the downloaded file in the Finder and double click to run it. This will start the installation process. Follow the on screen prompts. For the most part you can click “continue”, “agree”, “install”, as appropriate, and you don’t have to worry about changing any installation settings. Click “Close” to complete the installation! 3.2.1.3 Linux We will not provide details on installing R for Linux, because the process varies depending on your distribution, and because if you’re using Linux, chances are you’re more computer proficient than the average user. Suffice it to say, The first step is: Click “Download R for Linux” And you can probably figure things out from there. 3.2.1.4 Conclusion You should now have R installed! Technically speaking, nothing further is required to work with R. You can open the RGui, and start coding immediately. However, for this course we will be using RStudio, which is a very popular program with an incredibly rich set of features, which will enhance your R programming experience. 3.2.2 Installing RStudio Navigate to the RStudio Download Page, and find the download file that matches your operating system. Click the link to download the installer, which starts with “RStudio-” or “rstudio-”. Your computer might prompt for the location on your computer that you would like to save the file. Select a location (reasonable options are your Downloads folder or the Desktop) and select “save”. When the download completes, find the downloaded file and double click to run it. This will start the installation process. From this point, this will change depending on your operating system. 3.2.2.1 Windows Follow the on screen prompts. For the most part you can click “next” and “install” as appropriate, and you don’t have to worry about changing any installation settings. 
You should now be able to open the start menu, open the RStudio folder, and click on the RStudio icon to open RStudio This video shows the installation process for Windows 3.2.2.2 Mac OS X In the window which opens, drag the RStudio icon into the “Applications” folder. You may need to enter your password (click the “Authenticate” button) in order to do so. You should now be able to navigate to the Applications folder in Finder, and click on the RStudio icon to open RStudio. 3.2.2.3 Conclusion Rstudio also offers a cloud service that allows you to work with R in your browser. We’ll use the desktop version but you can check out the interactive primers on the cloud site. Any feedback for this section? Click here "],
+["install-r-r-studio.html", "3.2 Install R & R Studio", " 3.2 Install R & R Studio Here’s where you install R on your personal computer, but you’ll actually be installing two separate programs. The first is the R programming language. The second is a separate program called RStudio, which will be the primary way in which you interact with R in this class; we will say more about this later. 3.2.1 Installing R Installation will look slightly different depending on the operating system, but the major steps are the same. First, navigate to the CRAN Mirrors Site, which lists several locations from which R can be downloaded. Find a location near you (or not, this isn’t critical) and click on the link to be brought to the mirror site. From this point, the steps depend on your operating system. 3.2.1.1 Windows Click “Download R for Windows”, then click “base”. Finally, click “Download R X.Y.Z for Windows”, where X, Y, and Z will be numbers. These numbers indicate which version of R you’ll be installing. As of the publishing of this book, R is on version 4.0.2. Your computer might prompt for the location on your computer that you would like to save the file. Select a location (reasonable options are your Downloads folder or the Desktop) and select “save”. When the download completes, find the downloaded file in the File Explorer and double click to run it. This will start the installation process. Follow the on screen prompts. For the most part you can click “next” and “install” as appropriate, and you don’t have to worry about changing any installation settings. Click “Finish” to complete the installation! This video shows the installation process for Windows. 3.2.1.2 Mac OS X Click “Download R for (Mac) OS X”, then click “R-X.Y.Z.pkg”, where X, Y, and Z will be numbers. These numbers indicate which version of R you’ll be installing. As of the publishing of this book, R is on version 4.0.2. 
Your computer might prompt you for a location in which to save the file. Select a location and select “save”. When the download completes, find the downloaded file in the Finder and double click to run it. This will start the installation process. Follow the on-screen prompts. For the most part you can click “continue”, “agree”, and “install” as appropriate, and you don’t have to worry about changing any installation settings. Click “Close” to complete the installation! 3.2.1.3 Linux We will not provide details on installing R for Linux, because the process varies depending on your distribution, and because if you’re using Linux, chances are you’re more computer proficient than the average user. Suffice it to say, the first step is to click “Download R for Linux”, and you can probably figure things out from there. 3.2.1.4 Conclusion You should now have R installed! Technically speaking, nothing further is required to work with R. You can open the RGui and start coding immediately. However, for this course we will be using RStudio, a very popular program with a rich set of features that will enhance your R programming experience. 3.2.2 Installing RStudio Navigate to the RStudio Download Page, and find the download file that matches your operating system. Click the link to download the installer, which starts with “RStudio-” or “rstudio-”. Your computer might prompt you for a location in which to save the file. Select a location (reasonable options are your Downloads folder or the Desktop) and select “save”. When the download completes, find the downloaded file and double click to run it. This will start the installation process. From this point, the steps depend on your operating system. 3.2.2.1 Windows Follow the on-screen prompts. For the most part you can click “next” and “install” as appropriate, and you don’t have to worry about changing any installation settings. 
You should now be able to open the start menu, open the RStudio folder, and click on the RStudio icon to open RStudio. This video shows the installation process for Windows. 3.2.2.2 Mac OS X In the window that opens, drag the RStudio icon into the “Applications” folder. You may need to enter your password (click the “Authenticate” button) in order to do so. You should now be able to navigate to the Applications folder in Finder, and click on the RStudio icon to open RStudio. 3.2.2.3 Conclusion RStudio also offers a cloud service that allows you to work with R in your browser. We’ll use the desktop version, but you can check out the interactive primers on the cloud site. Any feedback for this section? Click here "],
["successfull-installation.html", "3.3 Successfull Installation", " 3.3 Successfull Installation When you successfully install R and RStudio, you should now be able to program in R! Before moving further, you should become acquainted with the different parts of RStudio. To do so, watch the video below: This video gives an introduction to some of the main pieces of RStudio Any feedback for this section? Click here "],
-["running-code-in-rstudio.html", "3.4 Running Code in RStudio", " 3.4 Running Code in RStudio Now that you’re somewhat familiar with RStudio, let’s run the same code as we ran on the website, but this time let’s run it in R. 3.4.1 The R Console: In the R console, type 1+1 and press enter. The output in the console should look like the following: Figure 3.1: code in the console Notice that the output 2 is displayed, and the cursor is on a blank line, waiting for more input. This is how coding in the console works. 3.4.2 R scripts Now let’s run the same code, but in an R script. If you haven’t already, create a new R script by clicking on the New File icon, then selecting R Script like so: Figure 3.2: Click this button to create a new file In the script window which opens, type 1+1 and press enter. Notice how now, the code did not run? In a script, you are free to write R code on several lines before you run it. You can even save the script and load it later in order to run the code it contains. There are multiple ways to run R code in a script. To run a single line of code, do one of the following: Place the cursor on the desired line, hold the <control> key, and press enter. On Mac OS X, hold <command> key and press enter instead Place the curse on the desired line and click the Run button that looks like this: Figure 3.3: code in the console To run multiple lines of code, do one of the following: - Highlight all the code you’d like to run, hold the <control> key, and press enter. On Mac OS X, hold the <command> key and press enter instead. - Highlight all the code you’d like to run, and click the Run button. Run the 1+1 code using one of the methods above, and observe the output. Notice how the output is still in the console window, even you ran the code in a script! Even though running R code from the console and an R script are done differently, they should produce the same results. Both are running R! 
Now that you’ve run some code in the console and from an R script, let’s try some of the other code we wrote previously. 3.4.3 Same Examples, On Your Computer! In the console, type the command factorial(10). Did you get the same result as you got on the website? Now type the following two lines in an R script and run them: x <- -10:10 plot(x, x^2) This code produces a plot, which should show up in the lower right corner in the “Plots” window. Finally, copy the following code, paste it into your script, and run it: install.packages("ggplot2") library(ggplot2) theme_set(theme_bw()) ggplot(mtcars, aes(y=mpg, fill=as.factor(cyl))) + geom_boxplot() + labs(title="Engine Fuel Efficiency vs. Number of Cylinders", y="MPG", fill="Cylinders") + theme(legend.position="bottom", axis.ticks.x = element_blank(), axis.text.x = element_blank()) You’re now running R code on your computer! The above code block includes a command to install an R package! ggplot2 is a very popular plotting package that can create sophisticated and (arguably) aesthetically pleasing graphs. Imagine you are practicing programming in R and your classmate tells you they heard about an interesting new R command which they’d like you to try out. Would you run the command in an R script, or the R console? How might your answer change if you wanted to keep a record of all the interesting R command you found? 3.4.4 R Markdown You’ve seen how to run R code in the R console, and from an R script, but there’s one more way to run R that we need to talk about: R Markdown R scripts are convenient because they can store multiple R commands in one file. R Markdown takes this idea further and stores code alongside human readable text. There is much that could be said about R Markdown, but for now, we’ll just stick with the basics. To start, watch this video: This Video gives a basic introduction to RMarkdown. 
As the video stated, there are three types of sections to an RMarkdown document: Header Human readable text Code Chunks There’s only one header, but there can be many blocks of human readable text and many code chunks. See here for more things you can do with RMarkdown. As part of this class, you’ll be filling in an R Markdown document as you complete the progress checks in the book (except for the first progress check box, which you completed already) On Canvas, download the progress_check_2.rmd file and follow the instructions. The next box should be the first code chunk you will include in the document! Run the command 8 / (2*(2+2)) and observe the output! This video should help get you started with the Progress Check Assignments! Any feedback for this section? Click here "],
-["workspace-setup.html", "3.5 Workspace setup", " 3.5 Workspace setup Whenever you are programming in R, and especially for this class, it’s important to stay organized. This section will give you some instructions and tips for how to organize material for this R course 3.5.1 Recommended Settings First of all, let’s set some settings in RStudio. At the top of the R window, click Tools, then Global Options, and do the following: On the left side of the window that pops up, and make sure it’s on the “General” tab Find the “Workspace” section on the right, make the following changes: – uncheck “Restore .RData into workspace on startup” – Change the “Save workspace to .RData on exit” option to never On the left side, select the “Appearance” tab. (Optional) Change the Zoom setting to increase or decrease the size, to fit your screen best. (Optional) Change the “Editor theme:” setting to find a color scheme that looks good to you. Click “Apply”, then “OK” at the bottom of the window. Step 2 ensures that each time you open RStudio, there’s no “memory” of anything you may have been doing in R previously. This is a good option for R beginners to avoid confusion and mistakes. Step 4 can also be done using the shortcuts <control> <shift> + (to increase size) and <control> - (to decrease size). On Mac OS X, the commands are <command> <shift> + and <command> -. 3.5.2 Setting working directory Every time R runs, it has a working directory, which is the folder where R “looks” when loading and saving files. In RStudio, the Files window contains the “More” menu, which has options to set as working directory or go to working directory. This will become more relevant when you start loading data and saving results later in the course. For this course, you’ll be using an RStudio project, which automatically sets the working directory. See here for more information about working directories. 
3.5.3 Create RStudio Project and directories for class RStudio also has a feature called projects, which is a way of compartmentalizing your R code. This makes it easy to switch between different projects without having to For this class, you should set up a new project, so all of your project related files are in one place. 3.5.3.1 Create RStudio Project To create an RStudio project, follow these steps: Click on the “new project” button Figure 3.4: Click this button to create a new project In the window that pops up, click on “New Directory” then “New Project”. In the box after “Directory name”, type “RModule1”, which will be the name of the project. Then click the “Browse” button to select where to place the project. You are free to choose any location on your computer that makes sense to you. It might be most convenient to place it on your desktop for now. Click on “Create Project” You should now be in your newly created project. If you look at the Files window in the lower right part of RStudio, you should see the files in your new project directory, which should only be one file, called \"RModule1.rproj. This file is the project file, which tells RStudio that this directory contains an R Project. When you’re working on this course, you should be working in this project. The easiest way to open up the project is to use your operating system’s file explorer and click on the project file. This will automatically set the working directory to the project directory. 3.5.3.2 Create Directory Structure To stay organized, you should also create the following folders inside your project directory scripts data_raw data_clean output You can create these either using your operating system, or the “New Folder” command in the file window within RStudio. 3.5.3.3 Video Check out this video to watch me set up a project and the new directories. 
3.5.4 Some useful commands you should know As you program in R, you’ll end up creating many different R objects (more on this later), and sometimes you might want to clear all objects in your R environment. This will reduce the amount of memory that is taken up rm(list=ls()) # Clear everything in your workspace gc() # perform garbage collection used (Mb) gc trigger (Mb) max used (Mb) Ncells 813632 43.5 1556887 83.2 1215792 65 Vcells 1426117 10.9 8388608 64.0 1962768 15 You might also want to clear the R console, which you can do by placing your cursor in the R console and typing <control> l. Any feedback for this section? Click here "],
+["running-code-in-rstudio.html", "3.4 Running Code in RStudio", " 3.4 Running Code in RStudio Now that you’re somewhat familiar with RStudio, let’s run the same code as we ran on the website, but this time let’s run it in R. 3.4.1 The R Console: In the R console, type 1+1 and press enter. The output in the console should look like the following: Figure 3.1: code in the console Notice that the output 2 is displayed, and the cursor is on a blank line, waiting for more input. This is how coding in the console works. 3.4.2 R scripts Now let’s run the same code, but in an R script. If you haven’t already, create a new R script by clicking on the New File icon, then selecting R Script like so: Figure 3.2: Click this button to create a new file In the script window which opens, type 1+1 and press enter. Notice how, this time, the code did not run? In a script, you are free to write R code on several lines before you run it. You can even save the script and load it later in order to run the code it contains. There are multiple ways to run R code in a script. To run a single line of code, do one of the following: - Place the cursor on the desired line, hold the <control> key, and press enter. On Mac OS X, hold the <command> key and press return instead. - Place the cursor on the desired line and click the Run button that looks like this: Figure 3.3: code in the console To run multiple lines of code, do one of the following: - Highlight all the code you’d like to run, hold the <control> key, and press enter. On Mac OS X, hold the <command> key and press return instead. - Highlight all the code you’d like to run, and click the Run button. Run the 1+1 code using one of the methods above, and observe the output. Notice how the output is still in the console window, even though you ran the code in a script! Even though running R code from the console and from an R script are done differently, they should produce the same results. Both are running R! 
Now that you’ve run some code in the console and from an R script, let’s try some of the other code we wrote previously. 3.4.3 Same Examples, On Your Computer! In the console, type the command factorial(10). Did you get the same result as you got on the website? Now type the following two lines in an R script and run them: x <- -10:10 plot(x, x^2) This code produces a plot, which should show up in the lower right corner in the “Plots” window. Finally, copy the following code, paste it into your script, and run it: install.packages("ggplot2") library(ggplot2) theme_set(theme_bw()) ggplot(mtcars, aes(y=mpg, fill=as.factor(cyl))) + geom_boxplot() + labs(title="Engine Fuel Efficiency vs. Number of Cylinders", y="MPG", fill="Cylinders") + theme(legend.position="bottom", axis.ticks.x = element_blank(), axis.text.x = element_blank()) You’re now running R code on your computer! The above code block includes a command to install an R package! ggplot2 is a very popular plotting package that can create sophisticated and (arguably) aesthetically pleasing graphs. Imagine you are practicing programming in R and your classmate tells you they heard about an interesting new R command which they’d like you to try out. Would you run the command in an R script, or the R console? How might your answer change if you wanted to keep a record of all the interesting R commands you found? 3.4.4 R Markdown You’ve seen how to run R code in the R console, and from an R script, but there’s one more way to run R that we need to talk about: R Markdown. R scripts are convenient because they can store multiple R commands in one file. R Markdown takes this idea further and stores code alongside human readable text. There is much that could be said about R Markdown, but for now, we’ll just stick with the basics. To start, watch this video: This video gives a basic introduction to RMarkdown. 
As the video stated, there are three types of sections in an RMarkdown document: the Header, Human readable text, and Code Chunks. There’s only one header, but there can be many blocks of human readable text and many code chunks. See here for more things you can do with RMarkdown. As part of this class, you’ll be filling in an R Markdown document as you complete the progress checks in the book (except for the first progress check box, which you completed already). On Canvas, download the progress_check_2.rmd file and follow the instructions. The next box should be the first code chunk you will include in the document! Run the command 8 / (2*(2+2)) and observe the output! This video should help get you started with the Progress Check Assignments! Any feedback for this section? Click here "],
+["workspace-setup.html", "3.5 Workspace setup", " 3.5 Workspace setup Whenever you are programming in R, and especially for this class, it’s important to stay organized. This section will give you some instructions and tips for how to organize material for this R course. 3.5.1 Recommended Settings First of all, let’s set some settings in RStudio. At the top of the RStudio window, click Tools, then Global Options, and do the following: On the left side of the window that pops up, make sure the “General” tab is selected. Find the “Workspace” section on the right and make the following changes: – uncheck “Restore .RData into workspace on startup” – change the “Save workspace to .RData on exit” option to “Never”. On the left side, select the “Appearance” tab. (Optional) Change the Zoom setting to increase or decrease the size, to fit your screen best. (Optional) Change the “Editor theme:” setting to find a color scheme that looks good to you. Click “Apply”, then “OK” at the bottom of the window. Step 2 ensures that each time you open RStudio, there’s no “memory” of anything you may have been doing in R previously. This is a good option for R beginners to avoid confusion and mistakes. Step 4 can also be done using the shortcuts <control> <shift> + (to increase size) and <control> - (to decrease size). On Mac OS X, the commands are <command> <shift> + and <command> -. 3.5.2 Setting working directory Every time R runs, it has a working directory, which is the folder where R “looks” when loading and saving files. In RStudio, the Files window contains the “More” menu, which has options to set the current folder as the working directory or to go to the working directory. This will become more relevant when you start loading data and saving results later in the course. For this course, you’ll be using an RStudio project, which automatically sets the working directory. See here for more information about working directories. 
3.5.3 Create RStudio Project and directories for class RStudio also has a feature called projects, which is a way of compartmentalizing your R code. This makes it easy to switch between different projects without having to For this class, you should set up a new project, so all of your project related files are in one place. 3.5.3.1 Create RStudio Project To create an RStudio project, follow these steps: Click on the “new project” button Figure 3.4: Click this button to create a new project In the window that pops up, click on “New Directory” then “New Project”. In the box after “Directory name”, type “RModule1”, which will be the name of the project. Then click the “Browse” button to select where to place the project. You are free to choose any location on your computer that makes sense to you. It might be most convenient to place it on your desktop for now. Click on “Create Project”. You should now be in your newly created project. If you look at the Files window in the lower right part of RStudio, you should see the contents of your new project directory, which should be a single file called “RModule1.Rproj”. This file is the project file, which tells RStudio that this directory contains an R Project. When you’re working on this course, you should be working in this project. The easiest way to open up the project is to use your operating system’s file explorer and click on the project file. This will automatically set the working directory to the project directory. 3.5.3.2 Create Directory Structure To stay organized, you should also create the following folders inside your project directory: scripts, data_raw, data_clean, and output. You can create these either using your operating system, or the “New Folder” command in the Files window within RStudio. 3.5.3.3 Video Check out this video to watch me set up a project and the new directories. 
3.5.4 Some useful commands you should know As you program in R, you’ll end up creating many different R objects (more on this later), and sometimes you might want to clear all objects in your R environment. This will reduce the amount of memory that is taken up: rm(list=ls()) # Clear everything in your workspace gc() # perform garbage collection used (Mb) gc trigger (Mb) max used (Mb) Ncells 814179 43.5 1558457 83.3 1215810 65.0 Vcells 1429846 11.0 8388608 64.0 1989490 15.2 You might also want to clear the R console, which you can do by placing your cursor in the R console and pressing <control> l (careful! that’s a lowercase L). Here's a [more complete list](https://support.rstudio.com/hc/en-us/articles/200711853) of RStudio shortcuts. Any feedback for this section? Click here "],
["reflection.html", "3.6 Reflection", " 3.6 Reflection You should now be set up to Before moving on to the next section, take a note of all you’ve done so far. Did your R installation go smoothly? If not, could you troubleshoot the errors or find help online? Does using R remind you of other programs you have experience with? What could be some reasons that using R code written by someone else might not work on your computer? "],
["r-programming-fundamentals.html", "Chapter 4 R Programming Fundamentals ", " Chapter 4 R Programming Fundamentals "],
-["programming-preliminaries.html", "4.1 Programming Preliminaries", " 4.1 Programming Preliminaries Look at a sentence in a language you don’t know, look carefully at the symbols, spacing and characters. Recall learning a foreign language, how you had to learn the syntax and grammar rules. Now think about English (or another language you know well) and think about the syntax and grammar rules that you take for granted. All human languages rely on a set of rules called grammar, which describe how the language should be used to communicate. When two humans communicate with a language, they both must agree on the the rules of that language. R also has rules that must be followed in order for a human ( you ) to communicate with a computer, in order to tell the computer what to do. In human language, grammar is often fluid and evolving, and two people may have to adapt their use of the language in order to communicate. With R, the fules are fixed, and the computer “knows” them perfectly. It is up to you to learn the rules in order to make the computer do exactly what you want it to do. Since any computer programming language will do exactly what you tell it to do, it’s important to cover some of the basic rules of the R programming language before you can learn what it can do. So let’s get started: 4.1.1 R Commands Like most programming languages, R consists of a set of commands which form the sequence of instructions which the computer completes. You can think of commands as the verbs of R, they are the actions the computer will take. Here is an example of a command, followed by the result. print("hello, world!") [1] "hello, world!" This command is telling R to print out a message. R code usually contains more than one command, and typically each command is put on a separate line. Here are multiple commands, each on a separate line: print("The air is fine!") print(1+1) print(4 > 5) [1] "The air is fine!" 
[1] 2 [1] FALSE The first command prints another message, the second command does some math then then prints the result, and the third command evaluates whether the statement is true or false and prints the result. Generally, it’s a good idea to put separate commands on separate lines, but you can put multiple commands on the same line, as long as you separate them by a semicolon. See this code for example: x <- 1+1; print(x); print(x^2) [1] 2 [1] 4 In this example, three commands are given on one line. The first command creates a new variable called x, the second command prints the value of x, and the third command prints the value of x squared. We see that the semicolon, ;, serves as the command termination, because it tells R where one command ends and another begins. When a line contains a single command, no semicolon in necessary at the end, but including a semicolon doesn’t have any effect either. print("This line doesn't have a semicolon") print("This line does have a semicolon"); [1] "This line doesn't have a semicolon" [1] "This line does have a semicolon" Including multiple semicolons (e.g. print(“hello”);;) does not work! You’ve just seen your first example of assignment. That is, we created a thing called x , and assigned to it the value of 1+1 using the assignment operator, <-. Formally x is called an object, but we’ll talk about that more objects and assignments later. So far, we’ve seen that you can place one command on one line, multiple commands on multiple lines, multiple commands on one line, so you may ask: can you can place one command on multiple lines? The answer is sometimes, depending on the command, but we will not discuss this now. At this point, we’ve introduced several new types of R commands (assigning a variable, squaring a number, etc.), and we will talk more specifically about these later. 
The important part of this section is how R code is arranged into different commands Lastly, commands can be “grouped together” using left and right curly braces: { and }. Here’s an example: { print("here's some code that's all grouped together") print(2^3 - 7) w <- "hello" print(w) } [1] "here's some code that's all grouped together" [1] 1 [1] "hello" The above grouped code is indented so that it looks nice, but it doesn’t have to be: { print("here's some code that's all grouped together") print(2^3 - 7) w <- "hello" print(w) } [1] "here's some code that's all grouped together" [1] 1 [1] "hello" Indenting is an example of coding style, which are formatting decisions which don’t affect the results of the code, but are meant to enhance readability. We’ll talk more about coding style later. In some programming languages, Python for example, white space matters. That is, code indents and other spaces change the way the code runs. In R, white space does not matter, so things like indents are used purely for readability. What does it mean to “group” code? At this point there is no practical difference, each command gets executed whether or not it is grouped inside curly braces. However, code grouping will become very important later on, when we discuss control flow later. There are several helpful shortcuts that you can use in R. If you forget to put quotes around something, you can highlight and press the quote key and it will add quotes to both sides. This works with parenthesis too. You can also use tab completion with functions and defined variables. Tab completion allows you to use the same amount of time using a longer, descriptive variable name as a short, meaningless, and easily confused one. This can save you a lot of time and reduce mistakes! In RStudio, open a new R script and type in all the R commands from this section, to verify that you get the same result. It’s good practice! 
4.1.2 Comments When writing R code, you may wish to include notes which explain the code to your future self or to other humans. This can be done with comments, which are ignored by R when it is running the code. The “#” comment Here’s an example of some comments: # Let's define y and z y <- 8 z <- y + 5 # adding 5 to y and assigning the result to z ## This is still a comment, even though we're using two #'s Notice that it’s possible for a line to contain only a comment, or for part of a line to be a comment. R decides which part of a line is a comment by looking for the first “#”, and everything after that will be treated as a comment and ignored. R ignores comments, but you should not! If you’re reading code that someone else has written, it’s likely that also paying attention to their comments will greatly help you to understand what their code is doing. It’s also courteous to make good comments in your own code, if only because you may have to return to your own code in the future and re-learn what it is doing! In this book, we will use comments to help explain the R code that you will see. 4.1.3 Blank Lines Blank lines in R are ignored, but they can be used to organize code and enhance readability: print("The sky is blue") # the blank line below here is ignored print("The grass is green") [1] "The sky is blue" [1] "The grass is green" 4.1.4 CaSe SeNsItIvItY In R, variables, functions, and other objects (all of which we’ll talk about later), have names. These names are case sensitive, so you must be careful when referencing an object by name. Here we create two variables and give them different values, notice how they are different from each other: A <- 4 a <- 5 print(a) print(A) [1] 5 [1] 4 This may seem obvious, but case sensitivity applies to functions (which we’ll talk about later) too. We’ve been using the print function a lot in the above examples, which begins with a lower case p. 
There is no Print function: Print("testing") Error in Print("testing"): could not find function "Print" 4.1.5 ? One very nice thing in R is the documentation that accompanies it. Every function included in R (like print) has documentation that explains how that function works. To access the documentation, use a ? followed by the name of the function, like so: ?print The output of the above code chunk is not shown, because the result of this code is best viewed in RStudio. Go to R Studio and type in ?print and observe what happens! 4.1.6 ?? If you don’t remember the exact name of a function, or would like to search for general matches to a topic, then you can use ??. For example, trying ?Print produces an error, because there is not Print function (remember, R is case sensitive), so there’s no documentation to go with it. However, the following should still work: ??Print Programmers have a sense of humor, too! Try running ????print to see a small joke. Remember, comedic taste varies! This is a lot to remember, but luckily you can use a cheat sheet while you’re learning. As you get more familiar with R, you’ll begin to memorize basic funtions - and google is always there for the rest. Want to know more about R syntax? Try typing ?Syntax in the R console (then press Enter). As we’ve seen, symbols and characters have specific meaning in R. You must be careful not to ignore things like semicolons, curly braces, parentheses, when reading R code. This takes practice! Okay, now that we’ve covered some of the basics, it’s time to start learning how to do useful things in R! The next few sections will describe the different types of data that R can handle. Any feedback for this section? Click here "],
-["data-types.html", "4.2 Data Types", " 4.2 Data Types Think of all the things you might be expected to remember. These different items can probably be categorized into different types of information, like phone numbers, passwords, birthdays, historical events, and math theorems for example. R was designed to handle different types of data as well, though the types are different from the examples just given. R can store and manipulate different pieces of information, called data, and these data can be of several different types. Here are some examples of different types of data: a <- 12.34 # a is a number b <- "Hello" # b is a string of characters c <- TRUE # c is a special type of data that is either true or false R has special names for these examples, and there are other types of data as well. Below, we’ll talk about each data type, one at a time. The term “data” is actually plural! A single piece of data is called a “datum”. So to refer to a set of data, you would say “these data”, and to refer to a single piece of data, you would say “this datum”. 4.2.1 Numeric Many data exist as numbers, and R has a specific data type for storing those numbers, called the numeric data type. Here are some examples: a <- -11 b <- 13.37 c <- 1/137 Note that integers, decimals, and fractions are all examples of numeric data in R. We can prove that these are all the same data type using the class function: class(a) [1] "numeric" class(b) [1] "numeric" class(c) [1] "numeric" So far, we’ve defined the a object a few different times, which is allowed! Every time we define a, R forgets the old value. Therefore we should reuse object names with caution, because it can become difficult to remember what the latest value is! When we discuss loops later, however, we will use code to automatically change the value of an object several times in order to do useful things! When you have numeric objects, you may want to perform math operations on them. 
R has a number of built-in functions to deal with numeric data. Here are some examples: print(a + b) # Add two numeric values print(b - c) # subtract two numeric values print(a * b) # multiply two numeric values print(a^3) # take the cube of a numeric value [1] 2.37 [1] 13.3627 [1] -147.07 [1] -1331 When performing math on numeric objects, R will obey order of operations, so the following two examples will give different results: a + b * c # R will perform the multiplication before the addition [1] -10.90241 (a + b) * c # R will perform the addition first, then the multiplication [1] 0.01729927 Notice that we’ve added extra spaces in the code to help you understand what’s going on. This is another example of code style, which we’ll talk more about later. Wait a second, we didn’t use the print function just now, but R still displayed the results of the calculations! What is going on? This behavior is called auto-printing. It happens in the R console and also in R Markdown, which is what we used to create this book (yes, this book was created using R! Pretty cool, huh?). If a command produces a result, and you don’t assign that result to anything (using <-), then R will print out that result. This means we don’t always have to use the print function when we want to display R output. Notice all the decimal points? R can be very precise when performing computations. However, viewing all of the digits stored by R can be distracting and hard to read. You can show just some of the digits by using the round function: a [1] -11 round(a, 3) [1] -11 It also turns out that R stores more digits than what it shows when it prints, though we won’t go into detail on that now. 4.2.2 Integer In general, numeric data in R are treated as if they can be any decimal number (technically, they are a double precision number, if you know what that means; if not, it’s not important right now). 
However, there is a way to specify that a specific numeric object is an integer, by placing an “L” at the end of it, like so: x <- 20 # x will be a numeric object y <- 20L # y will be an integer object class(x) [1] "numeric" class(y) [1] "integer" Integers take half of the space in a computer’s memory or hard drive, so if you are working with or storing a lot of numbers which are integers, it might make sense to declare them as integer type in R. This will make more sense when we discuss vectors later. 4.2.3 Character Not all data are numbers! R also has the capability to store strings of characters, and this is the aptly named character type (or sometimes called a character string or just string). Here are some examples: d <- "Hello" # This string is defined with *double* quotes e <- 'how are you?' # This string is defined with *single* quotes! print(d) print(e) [1] "Hello" [1] "how are you?" Notice how we can define character strings using single quotes or double quotes, as long as we are consistent. So this is not valid: # Note the mismatched single/double quotes: f <- "this does not work' Error: <text>:2:6: unexpected INCOMPLETE_STRING 1: # Note the mismatched single/double quotes: 2: f <- "this does not work' ^ So, make sure you are consistent. However, you may see another problem with this: some strings contain quotes in them, like this: g <- 'This won't work' Error: <text>:1:16: unexpected symbol 1: g <- 'This won't ^ Since single quotes are being used to define the string, they can’t be used in the string itself, because R will “think” the string is ending at the second '. One option is to change the defining quotes to be double quotes, then the single quote will be safely included in the string: g <- "I'm happy that this works!" print(g) [1] "I'm happy that this works!" 
Another option is to use a backslash when using quotes inside the string, so that R “knows” the quote is part of the string and not ending the definition of the string: g <- 'I\'ve found another way that works!' print(g) [1] "I've found another way that works!" Notice that when we define g we place a \' anywhere in the string where we want a ' to be, but when printed out, we see that R has interpreted it as just a '. Notice also that we didn’t have to change the defining quotes to be double quotes in this case. The backslash is called the escape character, and it signifies that what follows it should be interpreted literally by R, and any special meaning should be ignored. Since backslash also has special meaning itself, if you want a backslash in your string, you need to use another escape character, like so: g <- "here is a backslash: \\" To see a list of special characters, try typing ?Quotes into the R console. Here is an important string to know about: h <- "" # This string is empty! h is a character string with no characters, called an empty string. You can perform math on numeric data, so what can you do with strings? The answer is quite a lot, using some functions that R provides. Here are some of them: nchar(g) # This prints out the number of characters in a string [1] 34 substr(g, 6, 10) # This extracts just part of a string, using the start and stop positions you provide [1] "found" strsplit(g, " ") # This splits the string up using a specified "delimiter" string, a single space in this case. [[1]] [1] "I've" "found" "another" "way" "that" "works!" When you split a string, this produces a list containing a vector of character strings. This is an example of how data can be organized in a structured way. We’ll talk more about so-called data structures in the next section. paste("hello", "world") # This combines multiple strings together into one string! [1] "hello world" Remember that you can learn more about a function using ?. 
Type ?paste into R and read the documentation carefully. Can you determine what the “sep” argument does? What do you think would happen if we ran the code paste("hello", "world", sep="-")? There are other ways of manipulating strings, but we’ll return to this later. 4.2.4 Logical Numeric objects can be any number, character objects can be any string of characters, but logical objects can only be one of two values: TRUE or FALSE. Logical data types are also known as “boolean” data types. Here we define some logical objects: a <- TRUE b <- FALSE c <- T d <- F print(a) [1] TRUE print(b) [1] FALSE print(c) [1] TRUE print(d) [1] FALSE So you can see that we can define a logical object using the full name or just the first letter. Here’s how to get the “opposite” of a logical object: !a [1] FALSE Logical data are the simplest type, but there are actually some clever things you can do with them. You can test whether simple mathematical expressions are true or false. # create x and y x <- 3 y <- 4 # check: is x less than y? (should give TRUE) x < y [1] TRUE The third command is a way to check if the value of x is less than the value of y. The result of this comparison is a logical, in this case, TRUE. Here are other ways of making comparisons: x <= y # check if x is less or equal to y [1] TRUE x == y # check if x is equal to y (note how you need two equals signs) [1] FALSE x >= y # check if x is greater or equal to y [1] FALSE x > y # check if x is greater than y [1] FALSE Comparisons can be made using strings as well: x <- "Hello" y <- "hello" x == y [1] FALSE Remember that R is case sensitive, and two strings must be exactly the same to be considered equal. Of course any object (like x) will be equal to itself: x == x [1] TRUE Surprisingly, logicals can be treated as numerics, where TRUE is treated as 1 and FALSE is treated as 0. 
Here are some examples: TRUE + TRUE # TRUE is treated as 1 [1] 2 FALSE * 7 # FALSE is treated as 0 [1] 0 (2 < 3) + (1 == 2) # What's going on here? [1] 1 The last example deserves some thought. Start with each expression in parentheses, and decide whether it will evaluate to true or false. Then remember how logicals are treated as numbers, and determine what happens when you add them together. Numeric, integer, character, and logical data types are probably the most important data types to know in R, but there are others that were not covered here. These include complex, factor, and raw. At least one of these (factor) will be covered later, but you can find more information about the other types here. In the R console, type the following R commands and observe the result: x <- "5" y <- 5 z <- (x == y) What data type is x? (check with R using the class function) What data type is y? What data type is z? What is the value of z, and why does this make sense? Now that we’ve discussed different types of data, we’ll see how they can be structured together in meaningful ways. What about dates? R actually has three built-in date classes. This can be confusing at first, but libraries like lubridate make it easy to work with dates in R. "],
-["data-structures.html", "4.3 Data Structures", " 4.3 Data Structures Imagine a grocery list, shopping list, or to-do list. That list consists of a set of items in a specified order, and the list also has a length. Why do you think it’s useful to organize these items into a list, rather than in some other fashion? Can you think of why it might be useful to store data in a list? Often, you will need to work with many related data, for example: - A sequence of measurements through time - A grid of values - A set of phone numbers In these circumstances, it would make sense to organize the data into a data structure. R provides multiple data structures, each of which are appropriate in various situations. By far the most popular data structure in R is the data frame, but in order to talk about data frames, we must talk about some simpler data structures first. 4.3.1 Vectors A vector is just an ordered set of elements (in other words, data), all of which have the same data type. Vectors can be created for the logical, numeric (double or integer), or character data types. Here’s an example of a vector: x <- c(1, 2, 3) # this is a vector of numeric types print(x) [1] 1 2 3 Note that to create a vector, we use the c function, where c stands for combine. This makes sense, because we are combining three numeric objects into a numeric vector. We may determine the length of any atomic vector like so: length(x) [1] 3 The class function will tell us what type of data is stored in a vector (which makes sense, because all elements of the vector have the same data type). 
class(x) [1] "numeric" Here’s how to create logical or character vectors: y <- c(TRUE, TRUE, FALSE, TRUE) z <- c("to", "be", "or", "not", "to", "be") class(y) [1] "logical" length(y) [1] 4 class(z) [1] "character" length(z) [1] 6 The definition above states that all elements of a vector must have the same data type, so what do you think will happen if you try to create a vector using elements from different data types? Here are some possibilities; can you think of another one? R will produce an error. R will combine the elements somehow, but the result won’t be a vector. Something else? Whatever happens, humans were behind the decision of how R should behave in this situation. If you were in charge of making this decision, what would make the most sense? Let’s try to create a vector of mixed type and see what happens. Run the following commands in R and think about the output: m <- c(TRUE, "Hello", 5) class(m) print(m) What changes did R make when creating the vector? What’s happening in the above code is an example of type conversion, which we will talk more about later. For now, remember that every element in an R vector is the same type. You can create empty vectors as placeholders, by indicating the data type and how many elements there are: empty <- numeric(10) # this creates a numeric vector of length 10 This is the first instance of us using a name which is longer than a single character! This new vector is called empty. Let’s print the contents of the vector: print(empty) [1] 0 0 0 0 0 0 0 0 0 0 Even though we didn’t tell R what data to put in the vector, it put a 0 in each element. This is the default value for a new numeric vector. Here’s how you can create new vectors of other types: empty_int <- integer(45) # create integer vector with 45 elements empty_cha <- character(2) # create character vector with 2 elements empty_log <- logical(1000) # create logical vector with 1000 elements!! We saw that the default value for a numeric vector is 0. 
Use the code above to create empty integer, character, and logical vectors, then print them out to see what default values R has given to each element. Do these make sense? What happens if we create a vector of length 1? It turns out this is the same as just creating a single instance of that data type. Observe how the following are the same. a <- numeric(1) # create vector of length 1 (default value is 0, right?) b <- 0 # create single numeric with value 0 a == b # compare a and b to see if they are the same. [1] TRUE It turns out, you can create a vector of length 0, which contains 0 elements. This may sound odd, but can happen sometimes! However, you cannot create a vector of negative length (e.g. logical(-1) won’t work), or a fractional length (e.g. character(12.7) won’t work). 4.3.1.1 Accessing and Changing Elements After you’ve created a vector, how do you put your data in them? Here’s how you can change the value of a specific element: a <- c(1, 2, 3) # create numeric vector of length 3 a[2] <- 4 # change the value of the second element of a to 4 a # print the result [1] 1 4 3 See how the second element of a has changed? So you can access a specific element using square brackets: [ and ]. In fact, if you want to know the value of the third element (without changing anything), just use: a[3] # access the third element [1] 3 What do you think will be the result of the following code (hint: the result will either be TRUE or FALSE)? vec <- c(4, 5, 6) # create a vector vec[3] == 6 # Remember what == does? Once you make a guess, try it in R and see if you were correct. 4.3.1.2 Working with vectors You can do many things with vectors that you can with single instances of each data type. Recall, you can add a number to a numeric object: a <- 3 # create a numeric object a + 4 # add a number to the object. [1] 7 The same thing is possible with numeric vectors: a <- c(1, 2, 3) # create a numeric vector a + 4 # add a number to EACH ELEMENT of the vector! 
[1] 5 6 7 This type of behavior is called elementwise behavior. That is, the operation is performed on each element separately. Here are some other elementwise operations: a - 3 [1] -2 -1 0 a * 1.5 [1] 1.5 3.0 4.5 a ^ 2 [1] 1 4 9 a == 2 [1] FALSE TRUE FALSE R has some functions which summarize the values in a vector. One such function is the sum function, which adds the values of each element in the vector: print(a) # print the elements of a as a reminder sum(a) # add all the elements of a together. [1] 1 2 3 [1] 6 Other examples of summary functions include max, min, mean, and sd. We’ll talk about these and other summary functions later. Some operations work on two vectors, as long as they are the same length: b <- c(1, 0, 1) a + b [1] 2 2 4 b * a [1] 1 0 3 a ^ b [1] 1 1 3 You can even compare two vectors, and the result will be a logical vector: z <- a > b # compare a and b, element by element, assign the result to z z # print the value of z [1] FALSE TRUE TRUE The first logical value is the result of a[1] > b[1], the second logical value is the result of a[2] > b[2], etc. What operations can we perform on logical vectors? Here are some examples: z == TRUE # which elements are TRUE? [1] FALSE TRUE TRUE This just produces z again (Do you see why?). Here’s how to get the logical “opposite” of z: z == FALSE [1] TRUE FALSE FALSE Or, as we saw before, we can use !, which operates on each element of z: !z [1] TRUE FALSE FALSE Remember how logical objects can be treated as numeric objects (either a 0 or 1)? We can use this with the sum function to determine how many elements are TRUE: sum(z) [1] 2 Here’s another example of using the sum function on a logical vector: sum(a == b) # how many elements do a and b have in common? [1] 1 So there is exactly one element (the first) where a and b are the same. Logical vectors can also be used to access all elements of a vector for which a certain condition is true. We’ll see how to do this later on. 
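As a recap, here is a minimal sketch that gathers the comparison and counting steps into one runnable chunk. It reuses the vectors a and b from the text; the names n_greater and n_equal are ours, chosen only for illustration.

```r
# Recap sketch: elementwise comparison, then counting with sum()
a <- c(1, 2, 3)
b <- c(1, 0, 1)

z <- a > b              # compares a[1] with b[1], a[2] with b[2], and so on
n_greater <- sum(z)     # TRUE counts as 1 and FALSE as 0, so this counts the TRUEs
n_equal <- sum(a == b)  # number of positions where a and b hold the same value

print(z)
print(n_greater)
print(n_equal)
```

Because sum treats each TRUE as 1, the same pattern should work for counting how often any condition holds in a vector.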
Let’s create some character vectors and explore a few things we can do with them: a <- c("I", "have", "to", "have", "a", "donkey") b <- c("You", "want", "to", "sell", "a", "donkey") First, we can do elementwise comparison (assuming equal length), just as we did for numeric vectors: a == b [1] FALSE FALSE TRUE FALSE TRUE TRUE To search for specific character strings in a character vector, you can use the grep function: grep("have", a) # search the vector a for the phrase "have" [1] 2 4 This result shows that the phrase “have” occurs in elements 2 and 4 of a! What if we search for a phrase that doesn’t occur? grep("raddish", a) integer(0) The result is an integer vector of length 0, meaning there are no elements that match the phrase! 4.3.1.3 Vectors of different types What if we try to perform operations between vectors of different types? This will work in some cases, but not others. Here are a few examples: a <- c(1, 2, 3) b <- c("I", "am", "sam") c <- c(TRUE, TRUE, FALSE) a + b # Can you add a numeric vector to a character vector? Error in a + b: non-numeric argument to binary operator a + c # can you add a numeric vector to a logical vector? [1] 2 3 3 We see that you can’t add a numeric vector to a character vector, but you can add a numeric vector to a logical vector. Why is this? Predict whether the following are possible: Can you multiply a character vector with a numeric vector? Can you multiply a logical vector with a numeric vector? Check whether you are correct by creating some vectors in R and attempting to multiply them together. Can you make sense of the answer? 4.3.1.4 Special Numeric Vectors There are a few special ways of creating a numeric vector which can be very useful, so we’ll mention them here. 
The first way creates a sequence of all integers between a starting and ending point: d <- 1:5 # create sequence starting at 1 and ending at 5 d [1] 1 2 3 4 5 Here’s a longer example: d <- 1:100 # create sequence starting at 1 and ending at 100 d [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 [91] 91 92 93 94 95 96 97 98 99 100 In this example, the R output can’t be shown on a single line, so it must be placed on multiple lines. Notice that each line has a different number in brackets: [1], [19], [37] etc. This number indicates which element of the vector is the start of that line. So we finally have an explanation for the [1] which is displayed with all R output. It’s simply indicating that this is the first element of the output. This also reflects the fact stated earlier that a single R object can be considered a vector of length 1! When you’re working with large data sets, it’s often helpful to see just the first few results instead of printing the entire thing. You can use head() to print the first six elements. Another way to create a numeric vector is using the seq function, which allows you to specify the interval between each vector element. For example: e <- seq(2, 100, 2) e [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 [20] 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 [39] 78 80 82 84 86 88 90 92 94 96 98 100 Or you can also specify how long you want the vector to be, and seq will determine the appropriate interval to make the elements evenly spaced. 
seq(1, 10, length.out=3) [1] 1.0 5.5 10.0 seq(1, 10, length.out=5) [1] 1.00 3.25 5.50 7.75 10.00 4.3.1.5 Another Data Type: Factor In the previous section, we avoided talking about the factor data type, because we need the concept of vectors to appreciate its purpose, but now we are equipped to talk about it. Consider the following example of a character vector: cha_vec <- c("cheese", "crackers", "cheese", "crackers", "cheese", "crackers", "cheese") There are seven elements in this vector (length(cha_vec) is 7), but there are only two unique elements, “cheese” and “crackers”. Imagine having to write down this vector on a piece of paper, and the space it would take. Now imagine writing down instead: 1, 2, 1, 2, 1, 2, 1 1 = “cheese” 2 = “crackers” This second method writes down numbers instead of character strings, but also keeps a record of which numbers correspond to which character strings. The total amount of space taken up on the piece of paper is smaller for the second method, and the amount of space saved would be even larger if the character vector were longer and had more repeated elements. This is the essence of what a factor data type is: A character vector stored more efficiently on the computer. For a factor vector, R stores an integer vector (which often takes less space than a character vector), and also maintains a “lookup table” which keeps track of which integers correspond with which character strings. To illustrate, let’s create a factor variable: # create a new factor variable from our existing character vector: fac_vec <- factor(cha_vec) Notice how we started with a character vector and used the factor function to create a factor from it. If we print the new vector, fac_vec [1] cheese crackers cheese crackers cheese crackers cheese Levels: cheese crackers it displays the elements as we would expect, but also includes another line of output giving Levels. 
This shows that there are only two unique character strings, which are called factor levels. Since R is using integers “behind the scenes” to store the vector, we can see those integers by using the as.integer function: as.integer(fac_vec) [1] 1 2 1 2 1 2 1 This is another example of type conversion, which we will discuss soon. In some situations, numbers may get treated as characters, like so: x <- c("4", "5", "6") This may pose an issue if this character vector gets converted to a factor, because the “behind the scenes” integers may not agree with the Levels, which represent the original data. This can easily happen when reading in data from a file on your computer, if you’re not careful. We’ll talk more about this later. There are a few neat things you can do with factor vectors. By changing the levels, you can quickly change all occurrences of a string at once. For example: print(fac_vec) levels(fac_vec) <- c("peas", "carrots") # change the levels of fac_vec fac_vec [1] cheese crackers cheese crackers cheese crackers cheese Levels: cheese crackers [1] peas carrots peas carrots peas carrots peas Levels: peas carrots There is more to be said about factors, but this is all we will explore at this point. In newer versions of R, all strings are treated like factors behind the scenes, meaning there’s really no difference between factor and character types in terms of how much space they take up in the computer’s memory. However, R still treats the two types differently, so it’s important to remember that they are different! 4.3.1.6 Combining Vectors Given two vectors, it’s easy to combine them into one vector: a <- c(1, 2, 3) b <- c(4, 5, 6, 7) c(a, b) # combine vectors a and b [1] 1 2 3 4 5 6 7 The combine function (c) is smart enough to recognize that a and b are vectors, and performs concatenation to create the resultant longer vector. 
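A small sketch of what concatenation means in practice: the combined vector keeps the elements in order, and its length is the sum of the two original lengths. The name combined is ours, used only for this illustration.

```r
# Sketch: concatenating two numeric vectors with c()
a <- c(1, 2, 3)
b <- c(4, 5, 6, 7)

combined <- c(a, b)   # elements of a come first, followed by elements of b

print(combined)
print(length(combined) == length(a) + length(b))  # lengths add up
print(class(combined))                            # both inputs are numeric, so no conversion is needed
```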
You can also use the combine function to add a single element to the end of a vector: a <- c("CEO", "CFO") # initialize a <- c(a, "CTO") # redefine a by combining a with a new element a [1] "CEO" "CFO" "CTO" In R, there may sometimes be more than one way to do the same thing, and one of the ways might be much faster or take much less computer memory. In other words, two sets of R commands may both be correct, but one may perform better than the other. Writing “performant” (high performance) code is an advanced topic that we will not discuss much in this introductory course. You’ve just seen one way to add an element to the end of a vector, but if you do this a lot (perhaps in a for loop, which we’ll talk about later), it can be very slow. In this situation you’re better off creating the whole vector at once and updating each element as needed. What if you try to combine vectors of different types? a <- c(1, 2, 3) b <- c("four", "five") c(a, b) [1] "1" "2" "3" "four" "five" Again, we see that the c function has converted all elements to be character strings, and the resultant vector is a character vector. Since we’ve seen type conversion arise a few times now, it’s appropriate to talk more explicitly about how it works. We’ll do that in the next section. 4.3.1.7 Type conversion There may be times when you’d like to convert from one type of data into another. An example would be the character string "1", which R does not view as a number. 
Therefore, the following does not work: "1" + "2" # R can't add two character strings Error in "1" + "2": non-numeric argument to binary operator To remedy issues like this, R provides functions to convert from one data type into another: - as.character: converts to character - as.numeric: converts to numeric - as.logical: converts to logical - as.factor: converts to factor Using these functions, R will “do its best” to convert whatever you start with into the desired data type, but it’s not always possible to make the conversion. Below are a few examples which do and don’t work well. Converting from a numeric to a character vector is always possible: x <- c(3, 2, 1) y <- as.character(x) # Here's how to convert to a character vector print(x) print(y) [1] 3 2 1 [1] "3" "2" "1" However, converting from a character vector to a numeric only works if the characters represent numbers. Any element that won’t convert will be given the value NA: w <- c("1", "12.3", "-5", "22") # this character vector can be converted to numeric as.numeric(w) [1] 1.0 12.3 -5.0 22.0 v <- c("frank", "went", "to", "mars") # this character vector can't be converted to numeric as.numeric(v) Warning: NAs introduced by coercion [1] NA NA NA NA None of the elements can be converted into a number, so R prints a warning message, and the result is an NA in each element, which stands for “not available”. NA indicates that a value is missing, and can arise in many different ways, which we will not explain here. NA values have interesting behavior in R. Generally, anything that “touches” an NA becomes an NA. You can try out these commands for yourself to see what happens: NA * 0 NA - NA c(NA, 1, 2) If only part of a vector can be converted, then the result will contain some converted values and some NA’s: u <- c("1.2", "chicken", "33") as.numeric(u) Warning: NAs introduced by coercion [1] 1.2 NA 33.0 What other conversions are possible? 
Character vectors can also be converted into logical: s <- c("TRUE", "FALSE", "T", "F", "cat") # all but the last element can be converted to logical as.logical(s) [1] TRUE FALSE TRUE FALSE NA Based on the examples we’ve seen before, it should make sense that numeric vectors containing 0 or 1 can also be converted into a logical vector: as.logical(c(1, 0, 1, 0)) # here we create the vector and convert it in the same line [1] TRUE FALSE TRUE FALSE Logical vectors can also be converted into character or numeric vectors. Based on what you know, make a prediction about what the following commands will produce: as.numeric(c(T, F, F, T)) as.character(c(T, F, F, T)) Check your predictions by running the commands in R. Remember that “solo” objects are just vectors of length 1, so any of these type conversions should work on a single object as well, like so: as.numeric("99") [1] 99 Along with the conversion functions as...., there are companion functions which simply check whether a vector is of a certain type: is.character: checks if character is.numeric: checks if numeric is.logical: checks if logical is.factor: checks if factor Here are some examples: a <- c("1", "2", "3") is.character(a) [1] TRUE is.numeric(a) [1] FALSE a <- as.numeric(a) is.character(a) [1] FALSE is.numeric(a) [1] TRUE As we’ve seen, type conversion is sometimes performed automatically, specifically when using the combine function (c). To understand more about this, try typing ?c to bring up the documentation, and have a look at the “Details” section. 4.3.2 Matrices Not all data can be arranged as an ordered set of elements, so R has other data structures besides vectors. Another data structure is the matrix, which can be thought of as a grid of numbers. 
Here’s an example of creating a grid: data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9) A <- matrix(data, 3, 3) A [,1] [,2] [,3] [1,] 1 4 7 [2,] 2 5 8 [3,] 3 6 9 Here we’ve made a matrix with three rows and three columns, by first creating a vector called data, and then using the matrix function, giving it the data, the number of rows, and the number of columns. Notice that R fills the matrix one column at a time, from left to right. Here’s how you access the data within a matrix: A[1,1] # Get the first element of the first row [1] 1 A[2,3] # Get the third element of the second row [1] 8 A[1,] # Get the entire first row [1] 1 4 7 A[,3] # Get the entire third column [1] 7 8 9 Just like with vectors, square brackets must be used to access the elements of a matrix. Don’t use parentheses like this: A(1,2). diag(A) # get the diagonal elements of A [1] 1 5 9 You can get the shape of a matrix with the dim function: dim(A) # how many rows & columns does A have? [1] 3 3 This gives an integer vector telling us A has three rows and three columns. In R, create the matrix A above, and write code to compute the first element of the second row times the third element of the third row. You can do some simple math with matrices, like this: A + 1 # Add a number to each element of the matrix [,1] [,2] [,3] [1,] 2 5 8 [2,] 3 6 9 [3,] 4 7 10 A * 2 # Multiply each element by a number [,1] [,2] [,3] [1,] 2 8 14 [2,] 4 10 16 [3,] 6 12 18 A ^ 2 # Square each element [,1] [,2] [,3] [1,] 1 16 49 [2,] 4 25 64 [3,] 9 36 81 If you’ve worked with matrices in a math class, you may have talked about some of the following operations. Here we can find the transpose of a matrix (the rows become columns and the columns become rows): t(A) # find the transpose [,1] [,2] [,3] [1,] 1 2 3 [2,] 4 5 6 [3,] 7 8 9 # Find the trace: sum(diag(A)) # get the diagonal elements of A, then sum them. 
[1] 15 Here are some things you can do with two matrices: B <- matrix(1, 3, 3) # create a 3x3 matrix of all 1's (notice how we only need one 1?) A + B # Add two matrices together [,1] [,2] [,3] [1,] 2 5 8 [2,] 3 6 9 [3,] 4 7 10 A * B # multiply the matching elements of A and B together [,1] [,2] [,3] [1,] 1 4 7 [2,] 2 5 8 [3,] 3 6 9 A %*% B # Perform matrix multiplication between A and B [,1] [,2] [,3] [1,] 12 12 12 [2,] 15 15 15 [3,] 18 18 18 Notice the difference between the last two examples? Just using * multiplies the matching elements of A and B together, while the new operator %*% performs matrix multiplication, like you may have seen in a linear algebra class. In R, perform matrix multiplication between A and the transpose of A. If two matrices don’t have the same shape, you won’t be able to add or multiply their matching elements: C <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), 3, 4) A * C Error in A * C: non-conformable arrays The error message: non-conformable arrays tells us that A and C have different shapes, so it’s impossible to multiply their matching elements together. But you can still perform matrix multiplication between them, because A has as many columns as C has rows: A %*% C [,1] [,2] [,3] [,4] [1,] 30 66 102 138 [2,] 36 81 126 171 [3,] 42 96 150 204 Any data type (numeric, character, etc.) can be stored in a vector. The same is true of a matrix, although, just like a vector, a single matrix can only hold one data type, and numeric matrices are by far the most common. A matrix is just a special case of a data structure called an array. Matrices have two dimensions (row and column), and arrays can have any number of dimensions (1, 2, 3, 4, 5, etc.). We won’t discuss arrays in this course much. Try running the following code in R, which should produce a warning message: data <- c(4.5, 6.1, 3.3, 2.0) A <- matrix(data, 2, 3) Read the warning message and the code carefully, and see if you can figure out the problem. What change would you make to the above code so that it runs cleanly? Remember everything inside a vector must have the same data type. Matrices have the same restriction, and the ones we’ve seen here all hold numeric data. 
Wouldn't it be nice if there were a way to store objects of different types (without doing type conversion)? This is what lists can do! ### Lists <div class="bonus"> <p><a href="https://purrr.tidyverse.org/">purrr</a> is a very useful library for working with lists.</p> </div> A list is an ordered set of _components_. This may sound similar to a vector, but the important difference is that with lists there is no requirement that the components have the same data type. Here is an example of a list: ```{.r .chunk-style} A <- list(42, "chicken", TRUE) A [[1]] [1] 42 [[2]] [1] "chicken" [[3]] [1] TRUE Here we see each component of the list printed in order, with [[1]], [[2]], and [[3]] indicating the first, second, and third components. To access just one of the components, use double square brackets ([[ and ]]): # Get the second component of A A[[2]] [1] "chicken" Notice that each component of A is a different data type (numeric, character, logical), which is not a problem for lists. Nothing was converted automatically, as we saw happen with vectors. Here’s how to add a component to an existing list: A[[4]] <- matrix(c(1, 2, 3, 4, 5, 6), 2, 3) Notice how we accessed component 4, which didn’t exist yet, and assigned it a value. We actually added a matrix as the fourth component, which is not possible with vectors! Now A has four components: A [[1]] [1] 42 [[2]] [1] "chicken" [[3]] [1] TRUE [[4]] [,1] [,2] [,3] [1,] 1 3 5 [2,] 2 4 6 Lists can even contain other lists! If you try to assign a list to be one of its own components (e.g. A[[5]] <- A), then R will make a copy of A and assign the copy to be one of the components of A. Thus there is no “self reference”, and no issue with Russell’s Paradox. So far we’ve seen vectors, lists, matrices, and arrays. How are they different and how are they similar? List components can also have names. Here we add a component with a name: A[["color"]] <- "yellow" Notice how this new component displays differently? 
Instead of showing [[5]], the component is labeled with a dollar sign, then its name: $color. You can access components using their name in two ways: A[["color"]] # use double square brackets to access a named element [1] "yellow" A$color # use dollar sign to access a named element [1] "yellow" But the color component is also the fifth component of the list, so we can access it like this as well: A[[5]] [1] "yellow" Here’s a new list created by giving names to each element: person <- list(name = "Millard Fillmore", occupation = "President", birth_year=1800) person $name [1] "Millard Fillmore" $occupation [1] "President" $birth_year [1] 1800 Below is some R code: C$year <- A[2,2] + B[[“age”]] Assuming this code works, what are the data types of A, B, and C? 4.3.2.1 Lists and Vectors Lists and vectors are different data types, but in some ways they behave the same: Find the length of a list: length(person) # same for vectors and lists! [1] 3 Combine two lists: c(A, person) # same for vectors and lists! [[1]] [1] 42 [[2]] [1] "chicken" [[3]] [1] TRUE [[4]] [,1] [,2] [,3] [1,] 1 3 5 [2,] 2 4 6 $color [1] "yellow" $name [1] "Millard Fillmore" $occupation [1] "President" $birth_year [1] 1800 A == "chicken" # compare against a character color FALSE TRUE FALSE FALSE FALSE However, there are some things that vectors can do that lists can’t: A + 1 # Add a number to each component (won't work) Error in A + 1: non-numeric argument to binary operator A == T # compare against a logical (won't work) Error in eval(expr, envir, enclos): 'list' object cannot be coerced to type 'logical' A == 12 # compare against a numeric (won't work) Error in eval(expr, envir, enclos): 'list' object cannot be coerced to type 'double' So there are trade-offs when deciding whether a list or a vector is most appropriate. 
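As a rough illustration of that trade-off, the sketch below stores the same three numbers as a vector and as a list. The names scores_vector, scores_list, and list_result are invented for this example, and tryCatch and unlist are base R functions that this section has not introduced, so treat their use here as an assumption about what you may find handy rather than as part of the lesson.

```r
scores_vector <- c(90, 85, 70)
scores_list   <- list(90, 85, 70)

# Both structures have a length, and both can be extended with c().
length(scores_vector)
length(c(scores_list, list(100)))

# Arithmetic works directly on the vector...
scores_vector + 1

# ...but not on the list. tryCatch() captures the error instead of stopping,
# and here we simply record that an error happened.
list_result <- tryCatch(scores_list + 1, error = function(e) "error")

# When every component is a single number, unlist() turns the list into
# a numeric vector, and arithmetic works again.
unlist(scores_list) + 1
```

A vector is the simpler choice when every element has the same type; a list earns its flexibility when the components really are different kinds of objects.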
4.3.2.2 Lists of Vectors Certain types of lists show up all the time in R, namely lists of vectors: vec_1 <- c("Alice", "Bob", "Charlie") vec_2 <- c(99.4, 87.6, 22.1) vec_3 <- c("F", "M", "M") special_list <- list(name=vec_1, grade=vec_2, sex=vec_3) special_list $name [1] "Alice" "Bob" "Charlie" $grade [1] 99.4 87.6 22.1 $sex [1] "F" "M" "M" Here, each vector in the list stores a different piece of information about several people. Here’s another example: rocks <- list(specimen=c("A", "B", "C"), type=c("igneous", "metamorphic", "sedimentary"), weight=c(21.2, 56.7, 3.8), age=c(120, 10000, 5000000) ) rocks $specimen [1] "A" "B" "C" $type [1] "igneous" "metamorphic" "sedimentary" $weight [1] 21.2 56.7 3.8 $age [1] 120 10000 5000000 When defining the rocks list, we’ve spread the command across multiple lines for clarity. The commas at the end of some of the lines indicate that the list has more components, so R will continue reading the next line until it finds the closing parenthesis, ). There are so many sets of data that fit into this pattern that R has a special data type called a data frame, which we will discuss in the next section. Create a matrix, a character vector, and a boolean object, then place them all in a new list called “my_list”, with the names “my_matrix”, “my_vector”, and “my_boolean”. 4.3.3 Data Frames At their core, data frames are just lists of vectors, but they have some extra features as well. Here, we’ll re-define the rocks list from the previous section, but this time we’ll create it as a data frame: rocks <- data.frame(type=c("igneous", "metamorphic", "sedimentary"), weight=c(21.2, 56.7, 3.8), age=c(120, 10000, 5000000)) rocks # we'll add the specimen names later Now when R displays rocks, it arranges the data in rows and columns, similar to how it displays matrices. Unlike matrices, however, different columns can hold different types of data! 
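One way to convince yourself that a data frame really is a list of vectors is sketched below. The names rocks_list, rocks_df, and column_types are chosen for this example only, and sapply is a base R function not covered in this section; stringsAsFactors = FALSE is passed so the text column stays a character vector in older versions of R as well.

```r
# Start from a plain list of equal-length vectors.
rocks_list <- list(type = c("igneous", "metamorphic", "sedimentary"),
                   weight = c(21.2, 56.7, 3.8),
                   age = c(120, 10000, 5000000))

# Convert the list to a data frame.
rocks_df <- as.data.frame(rocks_list, stringsAsFactors = FALSE)

# A data frame still counts as a list...
is.list(rocks_df)

# ...and each column keeps its own data type.
column_types <- sapply(rocks_df, class)
print(column_types)

# Three rows (one per rock) and three columns (one per vector).
dim(rocks_df)
```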
Remember that a data frame is basically a list of vectors, so even though it can contain different types of data (because it is a list), each column is a vector, which means each column must have all elements of the same type. The names of the columns are the names of the components of rocks, and the rows contain the data from each component vector. Because a data frame is basically a list of vectors, we can access a component by its position or name: rocks[[1]] [1] "igneous" "metamorphic" "sedimentary" rocks$weight [1] 21.2 56.7 3.8 However, we can also access a data frame as if it were a matrix: rocks[1,3] # get the datum from the first row, third column. [1] 120 rocks[1,] # get the first row; this gives another data frame with a single row. rocks[,2] # get the second column; this gives a vector. [1] 21.2 56.7 3.8 Here’s how to get the shape of a data frame (number of rows and columns): dim(rocks) [1] 3 3 If we start with a list of vectors, we can convert it to a data frame with as.data.frame: people <- list(name=c("Alice", "Bob", "Charlie"), grade=c(99.4, 87.6, 22.1), sex=c("F", "M", "M")) as.data.frame(people) R comes preloaded with several data frames, such as mtcars, which contains data from the 1974 Motor Trend Magazine for 32 different automobiles: mtcars A list of included data sets in R can be found by running data(). Look at the column of car names on the left side of the mtcars data frame. It doesn’t have a column name (like mpg, cyl, etc.), because it’s not actually a column. 
These are row names, and you can access them like this: row.names(mtcars) [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" [4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant" [7] "Duster 360" "Merc 240D" "Merc 230" [10] "Merc 280" "Merc 280C" "Merc 450SE" [13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood" [16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128" [19] "Honda Civic" "Toyota Corolla" "Toyota Corona" [22] "Dodge Challenger" "AMC Javelin" "Camaro Z28" [25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2" [28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino" [31] "Maserati Bora" "Volvo 142E" You can also access the column names like this: names(mtcars) [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" [11] "carb" These are two examples of attributes, which are extra pieces of information attached to an object. We’ll discuss attributes more later when we discuss R objects. The column names and row names are just vectors, and you can access / modify them as such: row.names(rocks) <- c("A", "B", "C") rocks names(rocks)[[1]] <- "rock type" rocks Row and column names are allowed to have spaces in them, but you must be careful how you access them. The following code will not work: rocks$rock type, because R will stop looking for the name you are referencing once it encounters a space. To access this column, you must enclose the name in “backticks” (`) like so: rocks$`rock type`. Look at the set of available data sets in R, and pick 2 data sets. For each data set, answer the following questions: What are the column names? What are the row names? What is the data type for each column? How many rows are in the data frame? How many columns are in the data frame? Any feedback for this section? Click here "],
-["r-objects.html", "4.4 R Objects", " 4.4 R Objects Wherever you are right now, look around your environment. Pick an object and study its attributes. It probably has a shape, a color, a weight, and many other ways of describing it. Now pick another object, and note how it is different than the first in terms of its attributes. What does the word “object” really mean? It’s often easier to give examples than to give a precise definition, but generally objects are “things you can do things with”. That is, you can usually look at them, touch them, smell them, and move them around (when appropriate/possible, of course!). Another useful definition is that objects are nouns. Different objects have different purposes and attributes. Many of these ideas will be true for R objects as well. We’ve already introduced the concepts of objects in R in passing, but here we briefly focus on what they are and how to work with them. 4.4.1 Everything is an object in R What exactly is an object in R? As in real life, it can be difficult to give a definition, but easier to give examples. Here are some examples of objects in R: A numeric variable A vector A matrix A list A data frame A function This list is not exhaustive, but most objects we deal with will look like one of these. In many programming languages, functions are handled differently from other types of objects (i.e. they are not “first class” objects). In R, they are treated the same as any other type of object. You can assign them to variables, pass them to other functions, and can be returned from a function. This is similar to the behavior of Java and Python, but different from C. 4.4.2 Assigning Objects Any object can be assigned to a variable, as we’ve been doing already. Here’s an example: a <- "pink pineapple" The <- is called an assignment operator. This is the most common way of assigning objects in R, but there are others. 
Sometimes you may see: a = "pink pineapple" which, in most cases, has the exact same effect as using <-, but in a few instances behaves differently. Our recommendation is to always use <- when making object assignments. There are other assignment operators as well, <<-, ->>, and ->, but we will not discuss these. You can find out more with the command ?assignOps. One neat thing you can do is assign multiple variables at the same time: a <- b <- "Hello" a [1] "Hello" b [1] "Hello" Even though a and b were assigned at the same time, they are still different! So if you change a with a <- “goodbye”, then the value of b will still be “Hello”. 4.4.3 Attributes Every object in R has attributes, extra information that’s “attached” to the object. Every object has a length attribute: a <- c(1, 2, 3, 4) b <- c("bonjour", "au revoir") length(a) [1] 4 length(b) [1] 2 Every object has a length. Try creating an example of the following and examining the length: 1. A boolean vector with 5 elements 1. A matrix with two rows and two columns 1. A list with two objects in it. Every R object has a mode as well, which tells you what type of object you have. Here are some examples: mode(a) [1] "numeric" mode(b) [1] "character" Every object has a mode. Try creating an example of the following and examining the mode: 1. A boolean vector with 5 elements 1. A 2 x 2 matrix 1. 
The mtcars dataframe Aside from these two attributes, you can list all attributes of an object like this: attributes(mtcars) $names [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" [11] "carb" $row.names [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" [4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant" [7] "Duster 360" "Merc 240D" "Merc 230" [10] "Merc 280" "Merc 280C" "Merc 450SE" [13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood" [16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128" [19] "Honda Civic" "Toyota Corolla" "Toyota Corona" [22] "Dodge Challenger" "AMC Javelin" "Camaro Z28" [25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2" [28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino" [31] "Maserati Bora" "Volvo 142E" $class [1] "data.frame" To access a specific attribute of an object, you can use the attr function: attr(mtcars, "class") # get the class attribute for the mtcars dataframe [1] "data.frame" 4.4.4 Null Objects There is a special object called the NULL object, which really just represents “nothing”. It’s used mainly if you want to remove an element from a list: a <- list(1, 2, 3) a[[2]] <- NULL # replace component 2 with "nothing" a [[1]] [1] 1 [[2]] [1] 3 It’s also used when a function is supposed to return something but doesn’t have an object to return (more on this later when we discuss functions). 4.4.5 Removing Objects Sometimes you want to get rid of an object! In R, you can use the rm function like so: a <- "an object" rm(a) a Error in eval(expr, envir, enclos): object 'a' not found As you can see, the error message indicates that a has been removed. Sometimes, you’d like to remove all the objects in your environment. To do this, you can use the command: rm(list=ls()) Any feedback for this section? Click here "],
+["programming-preliminaries.html", "4.1 Programming Preliminaries", " 4.1 Programming Preliminaries Look at a sentence in a language you don’t know, and look carefully at the symbols, spacing and characters. Recall learning a foreign language, how you had to learn the syntax and grammar rules. Now think about English (or another language you know well) and think about the syntax and grammar rules that you take for granted. All human languages rely on a set of rules called grammar, which describe how the language should be used to communicate. When two humans communicate with a language, they both must agree on the rules of that language. R also has rules that must be followed in order for a human (you) to communicate with a computer, in order to tell the computer what to do. In human language, grammar is often fluid and evolving, and two people may have to adapt their use of the language in order to communicate. With R, the rules are fixed, and the computer “knows” them perfectly. It is up to you to learn the rules in order to make the computer do exactly what you want it to do. Since any computer programming language will do exactly what you tell it to do, it’s important to cover some of the basic rules of the R programming language before you can learn what it can do. So let’s get started: 4.1.1 R Commands Like most programming languages, R consists of a set of commands which form the sequence of instructions which the computer completes. You can think of commands as the verbs of R; they are the actions the computer will take. Here is an example of a command, followed by the result. print("hello, world!") [1] "hello, world!" This command is telling R to print out a message. R code usually contains more than one command, and typically each command is put on a separate line. Here are multiple commands, each on a separate line: print("The air is fine!") print(1+1) print(4 > 5) [1] "The air is fine!" 
[1] 2 [1] FALSE The first command prints another message, the second command does some math then prints the result, and the third command evaluates whether the statement is true or false and prints the result. Generally, it’s a good idea to put separate commands on separate lines, but you can put multiple commands on the same line, as long as you separate them by a semicolon. See this code for example: x <- 1+1; print(x); print(x^2) [1] 2 [1] 4 In this example, three commands are given on one line. The first command creates a new variable called x, the second command prints the value of x, and the third command prints the value of x squared. We see that the semicolon, ;, serves as the command terminator, because it tells R where one command ends and another begins. When a line contains a single command, no semicolon is necessary at the end, but including a semicolon doesn’t have any effect either. print("This line doesn't have a semicolon") print("This line does have a semicolon"); [1] "This line doesn't have a semicolon" [1] "This line does have a semicolon" Including multiple semicolons (e.g. print(“hello”);;) does not work! You’ve just seen your first example of assignment. That is, we created a thing called x, and assigned to it the value of 1+1 using the assignment operator, <-. Formally x is called an object, but we’ll talk more about objects and assignment later. So far, we’ve seen that you can place one command on one line, multiple commands on multiple lines, and multiple commands on one line, so you may ask: can you place one command on multiple lines? The answer is sometimes, depending on the command, but we will not discuss this now. At this point, we’ve introduced several new types of R commands (assigning a variable, squaring a number, etc.), and we will talk more specifically about these later. 
The important part of this section is how R code is arranged into different commands. Lastly, commands can be “grouped together” using left and right curly braces: { and }. Here’s an example: { print("here's some code that's all grouped together") print(2^3 - 7) w <- "hello" print(w) } [1] "here's some code that's all grouped together" [1] 1 [1] "hello" The above grouped code is indented so that it looks nice, but it doesn’t have to be: { print("here's some code that's all grouped together") print(2^3 - 7) w <- "hello" print(w) } [1] "here's some code that's all grouped together" [1] 1 [1] "hello" Indenting is an example of coding style, which refers to formatting decisions that don’t affect the results of the code, but are meant to enhance readability. We’ll talk more about coding style later. In some programming languages, Python for example, white space matters. That is, code indents and other spaces change the way the code runs. In R, white space does not matter, so things like indents are used purely for readability. What does it mean to “group” code? At this point there is no practical difference: each command gets executed whether or not it is grouped inside curly braces. However, code grouping will become very important later on, when we discuss control flow. There are several helpful shortcuts that you can use in RStudio. If you forget to put quotes around something, you can highlight it and press the quote key, and it will add quotes to both sides. This works with parentheses too. You can also use tab completion with functions and defined variables. Tab completion allows you to use longer, more descriptive variable names without the additional typing time. This can save you a lot of time and reduce mistakes! In RStudio, open a new R script and type in all the R commands from this section, to verify that you get the same result. It’s good practice! 
4.1.2 Comments When writing R code, you may wish to include notes which explain the code to your future self or to other humans. This can be done with comments, which are ignored by R when it is running the code. The “#” comment Here’s an example of some comments: # Let's define y and z y <- 8 z <- y + 5 # adding 5 to y and assigning the result to z ## This is still a comment, even though we're using two #'s Notice that it’s possible for a line to contain only a comment, or for part of a line to be a comment. R decides which part of a line is a comment by looking for the first “#”, and everything after that will be treated as a comment and ignored. R ignores comments, but you should not! If you’re reading code that someone else has written, it’s likely that also paying attention to their comments will greatly help you to understand what their code is doing. It’s also courteous to make good comments in your own code, if only because you may have to return to your own code in the future and re-learn what it is doing! In this book, we will use comments to help explain the R code that you will see. 4.1.3 Blank Lines Blank lines in R are ignored, but they can be used to organize code and enhance readability: print("The sky is blue") # the blank line below here is ignored print("The grass is green") [1] "The sky is blue" [1] "The grass is green" 4.1.4 CaSe SeNsItIvItY In R, variables, functions, and other objects (all of which we’ll talk about later), have names. These names are case sensitive, so you must be careful when referencing an object by name. Here we create two variables and give them different values, notice how they are different from each other: A <- 4 a <- 5 print(a) print(A) [1] 5 [1] 4 This may seem obvious, but case sensitivity applies to functions (which we’ll talk about later) too. We’ve been using the print function a lot in the above examples, which begins with a lower case p. 
There is no Print function: Print("testing") Error in Print(\"testing\"): could not find function \"Print\" 4.1.5 ? One very nice thing in R is the documentation that accompanies it. Every function included in R (like print) has documentation that explains how that function works. To access the documentation, use a ? followed by the name of the function, like so: ?print The output of the above code chunk is not shown, because the result of this code is best viewed in RStudio. Go to RStudio and type in ?print and observe what happens! 4.1.6 ?? If you don’t remember the exact name of a function, or would like to search for general matches to a topic, then you can use ??. For example, trying ?Print produces an error, because there is no Print function (remember, R is case sensitive), so there’s no documentation to go with it. However, the following should still work: ??Print Programmers have a sense of humor, too! Try running ????print to see a small joke. Remember, comedic taste varies! This is a lot to remember, but luckily you can use a cheat sheet while you’re learning. As you get more familiar with R, you’ll begin to memorize basic functions - and Google is always there for the rest. Want to know more about R syntax? Try typing ?Syntax in the R console (then press Enter). As we’ve seen, symbols and characters have specific meaning in R. You must be careful not to ignore things like semicolons, curly braces, and parentheses when reading R code. This takes practice! Okay, now that we’ve covered some of the basics, it’s time to start learning how to do useful things in R! The next few sections will describe the different types of data that R can handle. Any feedback for this section? Click here "],
+["data-types.html", "4.2 Data Types", " 4.2 Data Types Think of all the things you might be expected to remember. These different items can probably be categorized into different types of information, like phone numbers, passwords, birthdays, historical events, and math theorems for example. R was designed to handle different types of data as well, though the types are different from the examples just given. R can store and manipulate different pieces of information, called data, and these data can be of several different types. Here are some examples of different types of data: a <- 12.34 # a is a number b <- "Hello" # b is a string of characters c <- TRUE # c is a special type of data that is either true or false R has special names for these examples, and there are other types of data as well. Below, we’ll talk about each data type, one at a time. The term “data” is actually plural! A single piece of data is called a “datum”. So to refer to a set of data, you would say “these data”, and to refer to a single piece of data, you would say “this datum”. 4.2.1 Numeric Many data exist as numbers, and R has a specific data type for storing those numbers, called the numeric data type. Here are some examples: a <- -11 b <- 13.37 c <- 1/137 Note that integers, decimals, and fractions are all examples of numeric data in R. We can prove that these are all the same data type using the class function: class(a) [1] "numeric" class(b) [1] "numeric" class(c) [1] "numeric" So far, we’ve defined the a object a few different times, which is allowed! Every time we define a, R forgets the old value. Therefore we should reuse object names with caution, because it can become difficult to remember what the latest value is! When we discuss loops later, however, we will use code to automatically change the value of an object several times in order to do useful things! When you have numeric objects, you may want to perform math operations on them. 
R has a number of built-in operators and functions for numeric data. Here are some examples: print(a + b) # Add two numeric values print(b - c) # subtract two numeric values print(a * b) # multiply two numeric values print(a^3) # take the cube of a numeric value [1] 2.37 [1] 13.3627 [1] -147.07 [1] -1331 When performing math on numeric objects, R will obey order of operations, so the following two examples will give different results: a + b * c # R will perform the multiplication before the addition [1] -10.90241 (a + b) * c # R will perform the addition first, then the multiplication [1] 0.01729927 Notice that we’ve added extra spaces in the code to help you understand what’s going on. This is another example of code style, which we’ll talk more about later. Wait a second, we didn’t use the print function just now, but R still displayed the results of the calculations! What is going on? This behavior is peculiar to something called R Markdown, which is what we used to create this book (yes, this book was created using R! Pretty cool, huh?). If the last command given in a code block produces a result, and you don’t assign that result to anything (using <-), then R will print out that result. This means we don’t always have to use the print function when we want to display R output. Notice all the decimal points? R can be very precise when performing computations. However, viewing all of the digits stored by R can be distracting and hard to read. You can show just some of the digits by using the round function: a <- 0.123456 round(a, 3) [1] 0.123 It also turns out that R stores more digits than what it shows when it prints, though we won’t go into detail on that now. 4.2.2 Integer In general, numeric data in R are treated as if they can be any decimal number (technically, they are a double precision number, if you know what that means; if not, it’s not important right now). 
However, there is a way to specify that a specific numeric object is an integer, by placing an “L” at the end of it, like so: x <- 20 # x will be a numeric object y <- 20L # y will be an integer object class(x) [1] "numeric" class(y) [1] "integer" Integers take half of the space in a computer’s memory or hard drive, so if you are working with or storing a lot of numbers which are integers, it might make sense to declare them as integer type in R. This will make more sense when we discuss vectors later. 4.2.3 Character Not all data are numbers! R also has the capability to store strings of characters, and this is the aptly named character type (or sometimes called a character string or just string). Here are some examples: d <- "Hello" # This string is defined with *double* quotes e <- 'how are you?' # This string is defined with *single* quotes! print(d) print(e) [1] "Hello" [1] "how are you?" Notice how we can define character strings using single quotes or double quotes, as long as we are consistent. So this is not valid: # Note the mismatched single/double quotes: f <- "this does not work' Error: :2:6: unexpected INCOMPLETE_STRING 1: # Note the mismatched single/double quotes: 2: f So, make sure you are consistent. However, you may see another problem with this: some strings contain quotes in them, like this: g <- 'This won't work' Error: :1:16: unexpected symbol 1: g Since single quotes are being used to define the string, they can’t be used in the string itself, because R will “think” the string is ending at the second '. One option is to change the defining quotes to be double quotes, then the single quote will be safely included in the string: g <- "I'm happy that this works!" print(g) [1] "I'm happy that this works!" Another option is to use a backslash when using quotes inside the string, so that R “knows” the quote is part of the string and not ending the definition of the string: g <- 'I\\'ve found another way that works!' 
print(g) [1] "I've found another way that works!" Notice that when we define g we place a \\' anywhere in the string where we want a ' to be, but when printed out, we see that R has interpreted it as just a '. Notice also that we didn’t have to change the defining quotes to be double quotes in this case. The backslash is called the escape character, and it signifies that what follows it should be interpreted literally by R, and any special meaning should be ignored. Since backslash also has special meaning itself, if you want a backslash in your string, you need to use another backslash, which functions as an escape character, like so: g <- “here is a backslash: \\\\”. You will see both backslashes when using the print function (which is meant for any data type), but if you use the special cat function (which is meant for character types specifically), all escape characters will be “processed”, and you will see just a single backslash. Try the same thing with the newline character, \\n! To see a list of special characters, try typing ?Quotes into the R console. Here is an important string to know about: h <- "" # This string is empty! h is a character string with no characters, called an empty string. You can perform math on numeric data, so what can you do with strings? The answer is, quite a lot, using some functions that R provides. Here are some of them: nchar(g) # This prints out the number of characters in a string [1] 34 substr(g, 6, 10) # This extracts just part of a string, using the start and stop positions you provide [1] "found" strsplit(g, " ") # This splits the string up using a specified "delimiter" string, a single space in this case. [[1]] [1] "I've" "found" "another" "way" "that" "works!" When you split a string, this produces a list containing a vector of character strings. This is an example of how data can be organized in a structured way. We’ll talk more about so-called data structures in the next section. 
paste("hello", "world") # This combines multiple strings together into one string! [1] "hello world" Remember that you can learn more about a function using ?. Type ?paste into R and read the documentation carefully. Can you determine what the “sep” argument does? What do you think would happen if we ran the code print(“hello”, “world”, sep=“-”)? There are other ways of manipulating strings, but we’ll return to this later. 4.2.4 Logical Numeric objects can be any number, character objects can be any string of characters, but logical objects can only be one of two values: TRUE or FALSE. Logical data types are also known as “boolean” data types. Here we define some logical objects: a <- TRUE b <- FALSE c <- T d <- F print(a) [1] TRUE print(b) [1] FALSE print(c) [1] TRUE print(d) [1] FALSE So you can see that we can define a logical object using the full name or just the first letter. Here’s how to get the “opposite” of a logical object: !a [1] FALSE Logical data are the simplest type, but there are actually some clever things you can do with them. You can test whether simple mathematical expressions are true or false. # create x and y x <- 3 y <- 4 # check: is x less than y? (should give TRUE) x < y [1] TRUE The third command is a way to check if the value of x is less than the value of y. The result of this comparison is a logical, in this case, TRUE. Here are other ways of making comparisons: x <= y # check if x is less than or equal to y [1] TRUE x == y # check if x is equal to y (note how you need two equals signs) [1] FALSE x >= y # check if x is greater than or equal to y [1] FALSE x > y # check if x is greater than y [1] FALSE Comparisons can be made using strings as well: x <- "Hello" y <- "hello" x == y [1] FALSE Remember that R is case sensitive, and two strings must be exactly the same to be considered equal. 
Of course any object (like x) will be equal to itself: x == x [1] TRUE Surprisingly, logicals can be treated as numerics, where TRUE is treated as 1 and FALSE is treated as 0. Here are some examples: TRUE + TRUE # TRUE is treated as 1 [1] 2 FALSE * 7 # FALSE is treated as 0 [1] 0 (2 < 3) + (1 == 2) # What's going on here? [1] 1 The last example deserves some thought. Start with each expression in parentheses, and decide whether it will evaluate to true or false. Then remember how logicals are treated as numbers, and determine what happens when you add them together. Numeric, integer, character, and logical data types are probably the most important data types to know in R, but there are others that were not covered here. These include: complex factor raw At least one of these (factor) will be covered later, but you can find more information about the other types here In the R console, type the following R commands and observe the result x <- \"5\" y <- 5 z <- (x == y) What data type is x? (check with R using the class function) What data type is y? What data type is z? What is the value of z, and why does this make sense? Now that we’ve discussed different types of data, we’ll now see how they can be structured together in meaningful ways. What about dates? R actually has three built-in date classes. This can be confusing at first, but libraries like lubridate make it easy to work with dates in R. Any feedback for this section? Click here "],
+["data-structures.html", "4.3 Data Structures", " 4.3 Data Structures Imagine a grocery list, shopping list, or to-do list. That list consists of a set of items in a specified order, and the list also has a length. Why do you think it’s useful to organize these items into a list, rather than in some other fashion? Can you think of why it might be useful to store data in a list? Often, you will need to work with many related data, for example: - A sequence of measurements through time - A grid of values - A set of phone numbers In these circumstances, it would make sense to organize the data into a data structure. R provides multiple data structures, each of which are appropriate in various situations. By far the most popular data structure in R is the data frame, but in order to talk about data frames, we must talk about some simpler data structures first. 4.3.1 Vectors A vector is just an ordered set of elements (in other words, data), all of which have the same data type. Vectors can be created for the logical, numeric (double or integer), or character data types. Here’s an example of a vector: x <- c(1, 2, 3) # this is a vector of numeric types print(x) [1] 1 2 3 Note that to create a vector, we use the c function, where c stands for combine. This makes sense, because we are combining three numeric objects into a numeric vector. We may determine the length of any atomic vector like so: length(x) [1] 3 The class function will tell us what type of data is stored in a vector (which makes sense, because all elements of the vector have the same data type). 
class(x) [1] "numeric" Here’s how to create logical or character vectors: y <- c(TRUE, TRUE, FALSE, TRUE) z <- c("to", "be", "or", "not", "to", "be") class(y) [1] "logical" length(y) [1] 4 class(z) [1] "character" length(z) [1] 6 We stated above that all elements of a vector must have the same data type, so what do you think will happen if you try to create a vector using elements from different data types? Here are some possibilities (can you think of another one?): R will produce an error; R will combine the elements somehow, but the result won’t be a vector; something else. Whatever happens, humans were behind the decision of how R should behave in this situation. If you were in charge of making this decision, what would make the most sense? Let’s try to create a vector of mixed type and see what happens. Run the following commands in R and think about the output: m <- c(TRUE, "Hello", 5) class(m) print(m) What changes did R make when creating the vector? What’s happening in the above code is an example of type conversion, which we will talk more about later. For now, remember that every element in an R vector is the same type. You can create empty vectors as placeholders, by indicating the data type and how many elements there are: empty <- numeric(10) # this creates a numeric vector of length 10 This is the first instance of us using a name which is longer than a single character! This new vector is called empty. Let’s print the contents of the vector: print(empty) [1] 0 0 0 0 0 0 0 0 0 0 Even though we didn’t tell R what data to put in the vector, it put a 0 in each element. This is the default value for a new numeric vector. Here’s how you can create new vectors of other types: empty_int <- integer(45) # create integer vector with 45 elements empty_cha <- character(2) # create character vector with 2 elements empty_log <- logical(1000) # create logical vector with 1000 elements!! We saw that the default value for a numeric vector is 0. 
Use the code above to create empty integer, character, and logical vectors, then print them out to see what default values R has given to each element. Do these make sense? What happens if we create a vector of length 1? It turns out this is the same as just creating a single instance of that data type. Observe how the following are the same. a <- numeric(1) # create vector of length 1 (default value is 0, right?) b <- 0 # create single numeric with value 0 a == b # compare a and b to see if they are the same. [1] TRUE It turns out, you can create a vector of length 0, which contains 0 elements. This may sound odd, but can happen sometimes! However, you cannot create a vector of negative length (e.g. logical(-1) won’t work), or a fractional length (e.g. character(12.7) won’t work). 4.3.1.1 Accessing and Changing Elements After you’ve created a vector, how do you put your data in them? Here’s how you can change the value of a specific element: a <- c(1, 2, 3) # create numeric vector of length 3 a[2] <- 4 # change the value of the second element of a to 4 a # print the result [1] 1 4 3 See how the second element of a has changed? So you can access a specific element using square brackets: [ and ]. In fact, if you want to know the value of the third element (without changing anything), just use: a[3] # access the third element [1] 3 What do you think will be the result of the following code (hint: the result will either be TRUE or FALSE)? vec <- c(4, 5, 6) # create a vector vec[3] == 6 # Remember what == does? Once you make a guess, try it in R and see if you were correct. 4.3.1.2 Working with vectors You can do many things with vectors that you can with single instances of each data type. Recall, you can add a number to a numeric object: a <- 3 # create a numeric object a + 4 # add a number to the object. [1] 7 The same thing is possible with numeric vectors: a <- c(1, 2, 3) # create a numeric vector a + 4 # add a number to EACH ELEMENT of the vector! 
[1] 5 6 7 This type of behavior is called elementwise behavior. That is, the operation is performed on each element separately. Here are some other elementwise operations: a - 3 [1] -2 -1 0 a * 1.5 [1] 1.5 3.0 4.5 a ^ 2 [1] 1 4 9 a == 2 [1] FALSE TRUE FALSE R has some functions which summarize the values in a vector. One such function is the sum function, which adds the values of each element in the vector: print(a) # print the elements of a as a reminder sum(a) # add all the elements of a together. [1] 1 2 3 [1] 6 Other examples of summary functions include max, min, mean, and sd. We’ll talk about these and other summary functions later. Some operations work on two vectors, as long as they are the same length: b <- c(1, 0, 1) a + b [1] 2 2 4 b * a [1] 1 0 3 a ^ b [1] 1 1 3 You can even compare two vectors, and the result will be a logical vector: z <- a > b # compare a and b, element by element, assign the result to z z # print the value of z [1] FALSE TRUE TRUE The first logical value is the result of a[1] > b[1], the second logical value is the result of a[2] > b[2], etc. What operations can we perform on logical vectors? Here are some examples: z == TRUE # which elements are TRUE? [1] FALSE TRUE TRUE This just produces z again (Do you see why?). Here’s how to get the logical “opposite” of z: z == FALSE [1] TRUE FALSE FALSE Or, as we saw before, we can use !, which operates on each element of z: !z [1] TRUE FALSE FALSE Remember how logical objects can be treated as numeric objects (either a 0 or 1)? We can use this with the sum function to determine how many elements are TRUE: sum(z) [1] 2 Here’s another example of using the sum function on a logical vector: sum(a == b) # how many elements do a and b have in common? [1] 1 So there is exactly one position where a and b have the same value. Logical vectors can also be used to access all elements of a vector for which a certain condition is true. We’ll see how to do this later on. 
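The counting trick above can be sketched together with a few related base R functions. The functions which, any, and all are not introduced in the text; they are shown here only as an illustration of what logical vectors make possible.

```r
a <- c(1, 2, 3)
b <- c(1, 0, 1)

z <- a > b                  # elementwise comparison gives a logical vector

n_true    <- sum(z)         # TRUE counts as 1, so this counts the TRUE values
positions <- which(z)       # positions of the TRUE elements
some_equal <- any(a == b)   # is at least one pair of elements equal?
never_less <- all(a >= b)   # is every element of a at least as big as b's?

n_true
positions
some_equal
never_less
```

Each of these functions reduces a logical vector to a shorter answer, which is often easier to read than the full vector.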
Let’s create some character vectors and explore a few things we can do with them: a <- c("I", "have", "to", "have", "a", "donkey") b <- c("You", "want", "to", "sell", "a", "donkey") First, we can do elementwise comparison (assuming equal length), just as we did for numeric vectors: a == b [1] FALSE FALSE TRUE FALSE TRUE TRUE To search for specific character strings in a character vector, you can use the grep function: grep("have", a) # search the vector a for the phrase "have" [1] 2 4 This result shows that the phrase “have” occurs in elements 2 and 4 of a! What if we search for a phrase that doesn’t occur? grep("raddish", a) integer(0) The result is an integer vector of length 0, meaning there are no elements that match the phrase! 4.3.1.3 Vectors of different types What if we try to perform operations between vectors of different types? This will work in some cases, but not others. Here are a few examples: a <- c(1, 2, 3) b <- c("I", "am", "sam") c <- c(TRUE, TRUE, FALSE) a + b # Can you add a numeric vector to a character vector? Error in a + b: non-numeric argument to binary operator a + c # can you add a numeric vector to a logical vector? [1] 2 3 3 We see that you can’t add a numeric vector to a character vector, but you can add a numeric vector to a logical vector. Why is this? Predict whether the following are possible: Can you multiply a character vector with a numeric vector? Can you multiply a logical vector with a numeric vector? Check whether you are correct by creating some vectors in R and attempting to multiply them together. Can you make sense of the answer? 4.3.1.4 Special Numeric Vectors There are a few special ways of creating a numeric vector which can be very useful, so we’ll mention them here. 
The first way creates a sequence of all integers between a starting and ending point: d <- 1:5 # create sequence starting at 1 and ending at 5 d [1] 1 2 3 4 5 Here’s a longer example: d <- 1:100 # create sequence starting at 1 and ending at 100 d [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 [19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 [55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 [73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 [91] 91 92 93 94 95 96 97 98 99 100 In this example, the R output can’t be shown on a single line, so it must be placed on multiple lines. Notice that each line has a different number in brackets: [1], [19], [37] etc. This number indicates which element of the vector is the start of that line. So we finally have an explanation for the [1] which is displayed with all R output. It’s simply indicating that this is the first element of the output. This also reflects the fact stated earlier that any R object can be considered a vector of length 1! When you’re working with large data sets, it’s often helpful to see just the first few results instead of printing the entire thing. You can use head() to print the first six elements of a vector (or the first six rows of a data frame). Another way to create a numeric vector is using the seq function, which allows you to specify the interval between each vector element. For example: e <- seq(2, 100, 2) e [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 [20] 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 [39] 78 80 82 84 86 88 90 92 94 96 98 100 Or you can also specify how long you want the vector to be, and seq will determine the appropriate interval to make the elements evenly spaced. 
seq(1, 10, length.out=3) [1] 1.0 5.5 10.0 seq(1, 10, length.out=5) [1] 1.00 3.25 5.50 7.75 10.00 4.3.1.5 Another Data Type: Factor In the previous section, we avoided talking about the factor data type, because we need the concept of vectors to appreciate its purpose, but now we are equipped to talk about it. Consider the following example of a character vector: cha_vec <- c("cheese", "crackers", "cheese", "crackers", "cheese", "crackers", "cheese") There are seven elements in this vector (length(cha_vec) is 7), but there are only two unique elements, “cheese” and “crackers”. Imagine having to write down this vector on a piece of paper, and the space it would take. Now imagine writing down instead: 1, 2, 1, 2, 1, 2, 1 1 = “cheese” 2 = “crackers” This second method writes down numbers instead of character strings, but also keeps a record of which numbers correspond to which character strings. The total amount of space taken up on the piece of paper is smaller for the second method, and the amount of space saved would be even larger if the character vector were longer and had more repeated elements. This is the essence of what a factor data type is: A character vector stored more efficiently on the computer. For a factor vector, R stores an integer vector (which often takes less space than a character vector), and also maintains a “lookup table” which keeps track of which integers correspond with which character strings. To illustrate, let’s create a factor variable: # create a new factor variable from our existing character vector: fac_vec <- factor(cha_vec) Notice how we started with a character vector and used the factor function to create a factor from it. If we print the new vector, fac_vec [1] cheese crackers cheese crackers cheese crackers cheese Levels: cheese crackers it displays the elements as we would expect, but also includes another line of output giving Levels. 
This shows that there are only two unique character strings, which are called factor levels. Since R is using integers “behind the scenes” to store the vector, we can see those integers by using the as.integer function: as.integer(fac_vec) [1] 1 2 1 2 1 2 1 This is another example of type conversion, which we will discuss soon. In some situations, numbers may get treated as characters, like so: x <- c(“4”, “5”, “6”) This may pose an issue if this character vector gets converted to a factor, because the “behind the scenes” integers may not agree with the Levels, which represent the original data. This can easily happen when reading in data from a file on your computer, if you’re not careful. We’ll talk more about this later. There are a few neat things you can do with factor vectors. By changing the levels, you can quickly change all occurrences of a string at once. For example: print(fac_vec) levels(fac_vec) <- c("peas", "carrots") # change the levels of fac_vec fac_vec [1] cheese crackers cheese crackers cheese crackers cheese Levels: cheese crackers [1] peas carrots peas carrots peas carrots peas Levels: peas carrots There is more to be said about factors, but this is all we will explore at this point. In newer versions of R, all strings are treated like factors behind the scenes, meaning there’s really no difference between factor and character types in terms of how much space they take up in the computer’s memory. However, R still treats the two types differently, so it’s important to remember that they are different! 4.3.1.6 Combining Vectors Given two vectors, it’s easy to combine them into one vector: a <- c(1, 2, 3) b <- c(4, 5, 6, 7) c(a, b) # combine vectors a and b [1] 1 2 3 4 5 6 7 The combine function (c) is smart enough to recognize that a and b are vectors, and performs concatenation to create the resultant longer vector. 
You can also use the combine function to add a single element to the end of a vector: a <- c("CEO", "CFO") # initialize a <- c(a, "CTO") # redefine a by combining a with a new element a [1] "CEO" "CFO" "CTO" In R, there may sometimes be more than one way to do the same thing, and one of the ways might be much faster or take much less computer memory. In other words, two sets of R commands may both be correct, but one may perform better than the other. Writing “performant” (high performance) code is an advanced topic that we will not discuss much in this introductory course. You’ve just seen one way to add an element to the end of a vector, but if you do this a lot (perhaps in a for loop, which we’ll talk about later), it can be very slow. In this situation you’re better off creating the whole vector at once and updating each element as needed. What if you try to combine vectors of different types? a <- c(1, 2, 3) b <- c("four", "five") c(a, b) [1] "1" "2" "3" "four" "five" Again, we see that the c function has converted all elements to be character strings, and the resultant vector is a character vector. Since we’ve seen type conversion arise a few times now, it’s appropriate to talk more explicitly about how it works. We’ll do that in the next section. 4.3.1.7 Type conversion There may be times when you’d like to convert from one type of data into another. An example would be the character string \"1\", which R does not view as a number. 
Therefore, the following does not work: "1" + "2" # R can't add two character strings Error in \"1\" + \"2\": non-numeric argument to binary operator To remedy issues like this, R provides functions to convert from one data type into another: - as.character: converts to character - as.numeric: converts to numeric - as.logical: converts to logical - as.factor: converts to factor Using these functions, R will “do its best” to convert whatever you start with into the desired data type, but it’s not always possible to make the conversion. Below are a few examples which do and don’t work well. Converting from a numeric to a character vector is always possible: x <- c(3, 2, 1) y <- as.character(x) # Here's how to convert to a character vector print(x) print(y) [1] 3 2 1 [1] "3" "2" "1" However, converting from a character vector to a numeric only works if the characters represent numbers. Any element that won’t convert will be given the value NA. w <- c("1", "12.3", "-5", "22") # this character vector can be converted to numeric as.numeric(w) [1] 1.0 12.3 -5.0 22.0 v <- c("frank", "went", "to", "mars") # this character vector can't be converted to numeric as.numeric(v) Warning: NAs introduced by coercion [1] NA NA NA NA None of the elements can be converted into a number, so R prints a warning message, and the result is an NA in each element, which stands for “not available”. NA indicates that a value is missing, and can arise in many different ways, which we will not explain here. NA values have interesting behavior in R. Generally, anything that “touches” an NA becomes an NA. You can try out these commands for yourself to see what happens: NA * 0 NA - NA c(NA, 1, 2) If only part of a vector can be converted, then the result will contain some converted values and some NAs: u <- c("1.2", "chicken", "33") as.numeric(u) Warning: NAs introduced by coercion [1] 1.2 NA 33.0 What other conversions are possible? 
Character vectors can also be converted into logical: s <- c("TRUE", "FALSE", "T", "F", "cat") # all but the last element can be converted to logical as.logical(s) [1] TRUE FALSE TRUE FALSE NA Based on the examples we’ve seen before, it should make sense that numeric vectors containing 0 or 1 can also be converted into a logical vector: as.logical(c(1, 0, 1, 0)) # here we create the vector and convert it in the same line [1] TRUE FALSE TRUE FALSE Logical vectors can also be converted into character or numeric vectors. Based on what you know, make a prediction about what the following commands will produce: as.numeric(c(T, F, F, T)) as.character(c(T, F, F, T)) Check your predictions by running the commands in R. Remember that “solo” objects are just vectors of length 1, so any of these type conversions should work on a single object as well, like so: as.numeric("99") [1] 99 Along with the conversion functions as...., there are companion functions which simply check whether a vector is of a certain type: is.character: checks if character is.numeric: checks if numeric is.logical: checks if logical is.factor: checks if factor Here are some examples: a <- c("1", "2", "3") is.character(a) [1] TRUE is.numeric(a) [1] FALSE a <- as.numeric(a) is.character(a) [1] FALSE is.numeric(a) [1] TRUE As we’ve seen, type conversion is sometimes performed automatically, specifically when using the combine function (c). To understand more about this, try typing ?c to bring up the documentation, and have a look at the “Details” section. 4.3.2 Matrices Not all data can be arranged as an ordered set of elements, so R has other data structures besides vectors. Another data structure is the matrix, which can be thought of as a grid of numbers. 
Here’s an example of creating a grid: data <- c(1, 2, 3, 4, 5, 6, 7, 8, 9) A <- matrix(data, 3, 3) A [,1] [,2] [,3] [1,] 1 4 7 [2,] 2 5 8 [3,] 3 6 9 Here we’ve made a matrix with three rows and three columns, by first creating a vector called data, then calling the matrix function and giving it the data, the number of rows, and the number of columns. Notice that R fills the matrix one column at a time, from left to right. Here’s how you access the data within a matrix: A[1,1] # Get the first element of the first row [1] 1 A[2,3] # Get the third element of the second row [1] 8 A[1,] # Get the entire first row [1] 1 4 7 A[,3] # Get the entire third column [1] 7 8 9 Just like with vectors, square brackets must be used to access the elements of a matrix. Don’t use parentheses like this: A(1,2). diag(A) # get the diagonal elements of A [1] 1 5 9 You can get the shape of a matrix with the dim function: dim(A) # how many rows & columns does A have? [1] 3 3 This gives an integer vector telling us that A has three rows and three columns. In R, create the matrix A above, and write code to compute the first element of the second row times the third element of the third row. You can do some simple math with matrices, like this: A + 1 # Add a number to each element of the matrix [,1] [,2] [,3] [1,] 2 5 8 [2,] 3 6 9 [3,] 4 7 10 A * 2 # Multiply each element by a number [,1] [,2] [,3] [1,] 2 8 14 [2,] 4 10 16 [3,] 6 12 18 A ^ 2 # Square each element [,1] [,2] [,3] [1,] 1 16 49 [2,] 4 25 64 [3,] 9 36 81 If you’ve worked with matrices in a math class, you may have talked about some of the following operations: Here we can find the transpose of a matrix (the rows become columns and the columns become rows): t(A) # find the transpose [,1] [,2] [,3] [1,] 1 2 3 [2,] 4 5 6 [3,] 7 8 9 # Find the trace: sum(diag(A)) # get the diagonal elements of A, then sum them. 
[1] 15 Here are some things you can do with two matrices: B <- matrix(1, 3, 3) # create a 3x3 matrix of all 1's (notice how we only need one 1?) A + B # Add two matrices together [,1] [,2] [,3] [1,] 2 5 8 [2,] 3 6 9 [3,] 4 7 10 A * B # multiply the matching elements of A and B [,1] [,2] [,3] [1,] 1 4 7 [2,] 2 5 8 [3,] 3 6 9 A %*% B # Perform matrix multiplication between A and B [,1] [,2] [,3] [1,] 12 12 12 [2,] 15 15 15 [3,] 18 18 18 Notice the difference between the last two examples? Just using * multiplies the matching elements of A and B together, while the new operator %*% performs matrix multiplication, like you may have seen in a linear algebra class. In R, perform matrix multiplication between A and the transpose of A. If two matrices don’t have the same shape, you won’t be able to add or multiply their matching elements: C <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), 3, 4) A * C Error in A * C: non-conformable arrays The error message: non-conformable arrays tells us that A and C have different shapes, so it’s impossible to multiply their matching elements together. But you can still perform matrix multiplication between them, because A has as many columns as C has rows: A %*% C [,1] [,2] [,3] [,4] [1,] 30 66 102 138 [2,] 36 81 126 171 [3,] 42 96 150 204 Any data type (numeric, character, etc.) can be represented as a vector. A matrix, like a vector, must hold a single data type; in this course we only use numeric matrices. A matrix is just a special case of a data structure called an array. Matrices have two dimensions (row and column), and arrays can have any number of dimensions (1, 2, 3, 4, 5, etc.). We won’t discuss arrays in this course much. Try running the following code in R, which should produce a warning message: data <- c(4.5, 6.1, 3.3, 2.0) A <- matrix(data, 2, 3) Read the warning message and the code carefully, and see if you can figure out the problem. What change would you make to the above code so that it runs without a warning? Remember everything inside a vector must have the _same data type_. Here we've seen that matrices likewise _hold a single data type_ (numeric, in all of our examples). 
Wouldn't it be nice if there were a way to store objects of different types (without doing type conversion)? This is what lists can do! ### Lists <div class="bonus"> <p><a href="https://purrr.tidyverse.org/">purrr</a> is a very useful library for working with lists.</p> </div> A List is an ordered set of _components_. This may sound similar to a vector, but the important difference is that with lists there is no requirement that the components have the same data type. Here is an example of a list: ```{.r .chunk-style} A <- list(42, "chicken", TRUE) A [[1]] [1] 42 [[2]] [1] "chicken" [[3]] [1] TRUE Here we see each component of the list printed in order, with [[1]], [[2]], and [[3]] indicating the first, second, and third components. To access just one of the components, use double square brackets ([[ and ]]): # Get the second component of A A[[2]] [1] "chicken" Notice that each component of A is a different data type (numeric, character, boolean), which is not a problem for lists. Nothing was converted automatically, as we saw happen with vectors. Here’s how to add a component to an existing list: A[[4]] <- matrix(c(1, 2, 3, 4, 5, 6), 2, 3) Notice how we accessed component 4, which didn’t exist yet, and assigned it a value. We actually added a matrix as the fourth component, which is not possible with vectors! Now A has four components: A [[1]] [1] 42 [[2]] [1] "chicken" [[3]] [1] TRUE [[4]] [,1] [,2] [,3] [1,] 1 3 5 [2,] 2 4 6 Lists can even contain other lists! If you try to assign a list to be one of its own components (e.g. A[[5]] <- A), then R will make a copy of A and assign the copy to be one of the components of A. Thus there is no “self reference”, and no issue with Russell’s Paradox. So far we’ve seen vectors, lists, matrices, and arrays. How are they different and how are they similar? List components can also have names. Here we add a component with a name: A[["color"]] <- "yellow" Notice how this new component displays differently? 
Instead of showing [[5]], the component is labeled with a dollar sign, then its name: $color. You can access components using their name in two ways: A[["color"]] # use double square brackets to access a named element [1] "yellow" A$color # use dollar sign to access a named element [1] "yellow" But the color component is also the fifth component of the list, so we can access it like this as well: A[[5]] [1] "yellow" Here’s a new list created by giving names to each element: person <- list(name = "Millard Fillmore", occupation = "President", birth_year=1800) person $name [1] "Millard Fillmore" $occupation [1] "President" $birth_year [1] 1800 Below is some R code: C$year <- A[2,2] + B[["age"]] Assuming this code works, what are the data types of A, B, and C? 4.3.2.1 Lists and Vectors Lists and Vectors are different data types, but in some ways they behave the same: Find the length of a list: length(person) # same for vectors and lists! [1] 3 Combine two lists: c(A, person) # same for vectors and lists! [[1]] [1] 42 [[2]] [1] "chicken" [[3]] [1] TRUE [[4]] [,1] [,2] [,3] [1,] 1 3 5 [2,] 2 4 6 $color [1] "yellow" $name [1] "Millard Fillmore" $occupation [1] "President" $birth_year [1] 1800 A == "chicken" # compare against a character color FALSE TRUE FALSE FALSE FALSE However, there are some things that vectors can do that lists can’t: A + 1 # Add a number to each component (won't work) Error in A + 1: non-numeric argument to binary operator A == T # compare against a logical (won't work) Error in eval(expr, envir, enclos): 'list' object cannot be coerced to type 'logical' A == 12 # compare against a numeric (won't work) Error in eval(expr, envir, enclos): 'list' object cannot be coerced to type 'double' So there are trade-offs when deciding whether a list or a vector is most appropriate. 
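One way around the arithmetic limitation above is sketched below. It assumes a list whose components are all numeric, and it uses the base R functions lapply, unlist, and tryCatch, none of which have been introduced in the text. The names scores, attempt, bumped, and flat are hypothetical.

```r
# Hypothetical list with numeric components only
scores <- list(first = 1, second = 2, third = 3)

# Direct arithmetic on a list fails, so we catch the error to keep the script running
attempt <- tryCatch(scores + 1, error = function(e) "arithmetic on a list failed")

# lapply applies a function to each component and returns a new list
bumped <- lapply(scores, function(v) v + 1)

# unlist flattens the list into a vector, which does support arithmetic
flat <- unlist(scores) + 1

attempt
bumped$second
flat
```

If every component really is the same type, storing the data as a vector in the first place is usually the simpler choice.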
4.3.2.2 Lists of Vectors One type of list shows up all the time in R, the list of vectors: vec_1 <- c("Alice", "Bob", "Charlie") vec_2 <- c(99.4, 87.6, 22.1) vec_3 <- c("F", "M", "M") special_list <- list(name=vec_1, grade=vec_2, sex=vec_3) special_list $name [1] "Alice" "Bob" "Charlie" $grade [1] 99.4 87.6 22.1 $sex [1] "F" "M" "M" Here, each component of the list stores a different piece of information about several people. Here’s another example: rocks <- list(specimen=c("A", "B", "C"), type=c("igneous", "metamorphic", "sedimentary"), weight=c(21.2, 56.7, 3.8), age=c(120, 10000, 5000000) ) rocks $specimen [1] "A" "B" "C" $type [1] "igneous" "metamorphic" "sedimentary" $weight [1] 21.2 56.7 3.8 $age [1] 120 10000 5000000 When defining the rocks list, we’ve spread the command across multiple lines for clarity. The commas at the end of some of the lines indicate that the list has more components, so R will continue reading the next line until it finds the closing parenthesis, ). So many sets of data fit into this pattern that R has a special data type called a data frame, which we will discuss in the next section. Create a matrix, a character vector, and a boolean object, then place them all in a new list called “my_list”, with the names “my_matrix”, “my_vector”, and “my_boolean”. 4.3.3 Data Frames At their core, data frames are just lists of vectors, but they also have some extra features. Here, we’ll re-define the rocks list from the previous section, but this time we’ll create it as a data frame: rocks <- data.frame(type=c("igneous", "metamorphic", "sedimentary"), weight=c(21.2, 56.7, 3.8), age=c(120, 10000, 5000000)) rocks # we'll add the specimen names later Now when R displays rocks, it arranges the data in rows and columns, similar to how it displays matrices. Unlike a matrix, however, different columns can hold different types of data, including non-numeric data! 
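To check the type of each column, something like the following sketch may help. It rebuilds the same data under the hypothetical name rocks_df, and it passes stringsAsFactors = FALSE explicitly, because older versions of R (before 4.0) would otherwise turn the character column into a factor. sapply is a base R function that applies another function to each column.

```r
# Same data as above, under a hypothetical name to avoid overwriting rocks
rocks_df <- data.frame(type = c("igneous", "metamorphic", "sedimentary"),
                       weight = c(21.2, 56.7, 3.8),
                       age = c(120, 10000, 5000000),
                       stringsAsFactors = FALSE)  # keep strings as character

col_types <- sapply(rocks_df, class)  # class of each column, one per name

col_types
is.list(rocks_df)   # a data frame is a special kind of list
nrow(rocks_df)      # number of rows
ncol(rocks_df)      # number of columns
```

The str function gives a similar overview in a single call, if you prefer a more compact display.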
Remember that a data frame is basically a list of vectors, so even though it can contain different types of data (because it is a list), each column is a vector, which means each column must have all elements of the same type. The names of the columns are the names of the components of rocks, and the rows contain the data from each component vector. Because a data frame is a list, we can access a component by its position or name: rocks[[1]] [1] "igneous" "metamorphic" "sedimentary" rocks$weight [1] 21.2 56.7 3.8 However, we can also access a data frame as if it were a matrix: rocks[1,3] # get the datum from the first row, third column. [1] 120 rocks[1,] # get the first row, this gives another data frame with a single row. rocks[,2] # get the second column, this gives a vector. [1] 21.2 56.7 3.8 Here’s how to get the shape of a data frame (number of rows and columns): dim(rocks) [1] 3 3 If we start with a list of vectors, we can convert it to a data frame with as.data.frame: people <- list(name=c("Alice", "Bob", "Charlie"), grade=c(99.4, 87.6, 22.1), sex=c("F", "M", "M")) as.data.frame(people) R comes preloaded with several data frames, such as mtcars, which contains data from the 1974 Motor Trend Magazine for 32 different automobiles: mtcars A list of included data sets in R can be found by running data(). Look at the column of car names on the left side of the mtcars data frame. It doesn’t have a column name (like mpg, cyl, etc.), because it’s not actually a column. 
These are row names, and you can access them like this: row.names(mtcars) [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" [4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant" [7] "Duster 360" "Merc 240D" "Merc 230" [10] "Merc 280" "Merc 280C" "Merc 450SE" [13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood" [16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128" [19] "Honda Civic" "Toyota Corolla" "Toyota Corona" [22] "Dodge Challenger" "AMC Javelin" "Camaro Z28" [25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2" [28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino" [31] "Maserati Bora" "Volvo 142E" You can also access the column names like this: names(mtcars) [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" [11] "carb" These are two examples of attributes, which are extra pieces of information attached to an object. We’ll say more about attributes when we discuss R objects. The column names and row names are just vectors, and you can access and modify them accordingly: row.names(rocks) <- c("A", "B", "C") rocks names(rocks)[[1]] <- "rock type" rocks Row and column names are allowed to have spaces in them, but you must be careful how you access them. The following code will not work: rocks$rock type , because R will stop looking for the name you are referencing once it encounters a space. To access this column, you must enclose the reference in “backticks” ( ` ) like so: rocks$`rock type`. Look at the set of available data sets in R, and pick 2 data sets. For each data set, answer the following questions: What are the column names? What are the row names? What is the data type for each column? How many rows are in the data frame? How many columns are in the data frame? Any feedback for this section? Click here "],
+["r-objects.html", "4.4 R Objects", " 4.4 R Objects Wherever you are right now, look around your environment. Pick an object and study its attributes. It probably has a shape, a color, a weight, and many other ways of describing it. Now pick another object, and note how it differs from the first in terms of its attributes. What does the word “object” really mean? It’s often easier to give examples than to give a precise definition, but generally objects are “things you can do things with”. That is, you can usually look at them, touch them, smell them, and move them around (when appropriate/possible, of course!). Another useful definition is that objects are nouns. Different objects have different purposes and attributes. Many of these ideas will be true for R objects as well. We’ve already introduced the concept of objects in R in passing, but here we briefly focus on what they are and how to work with them. 4.4.1 Everything is an object in R What exactly is an object in R? As in real life, it can be difficult to give a definition, but easier to give examples. Here are some examples of objects in R: A numeric variable A vector A matrix A list A data frame A function This list is not exhaustive, but most objects we deal with will look like one of these. In many programming languages, functions are handled differently from other types of objects (i.e. they are not “first class” objects). In R, they are treated the same as any other type of object. You can assign them to variables, pass them to other functions, and return them from functions. This is similar to the behavior of Java and Python, but different from C. 4.4.2 Assigning Objects Any object can be assigned to a variable, as we’ve been doing already. Here’s an example: a <- "pink pineapple" The <- is called an assignment operator. This is the most common way of assigning objects in R, but there are others.
Sometimes you may see: a = "pink pineapple" which, in most cases, has the exact same effect as using the <-, but in a few instances, it has a different effect. Our recommendation is to always use <- when making object assignments. There are other assignment operators as well, <<-, ->>, and ->, but we will not discuss these. You can find out more with the command ?assignOps. One neat thing you can do is assign multiple variables at the same time: a <- b <- "Hello" a [1] "Hello" b [1] "Hello" Even though a and b were assigned at the same time, they are still separate variables! So if you change a with a <- “goodbye”, then the value of b will still be “Hello”. 4.4.3 Attributes Every object in R has attributes, extra information that’s “attached” to the object. Every object has a length attribute: a <- c(1, 2, 3, 4) b <- c("bonjour", "au revoir") length(a) [1] 4 length(b) [1] 2 Every object has a length. Try creating an example of the following and examining the length: 1. A boolean vector with 5 elements 1. A matrix with two rows and two columns 1. A list with two objects in it. Every R object has a mode as well, which tells you what type of object you have. Here are some examples: mode(a) [1] "numeric" mode(b) [1] "character" Every object has a mode. Try creating an example of the following and examining the mode: 1. A boolean vector with 5 elements 1. A 2 x 2 matrix 1. 
The mtcars dataframe Aside from these two attributes, you can list all attributes of an object like this: attributes(mtcars) $names [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" [11] "carb" $row.names [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" [4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant" [7] "Duster 360" "Merc 240D" "Merc 230" [10] "Merc 280" "Merc 280C" "Merc 450SE" [13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood" [16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128" [19] "Honda Civic" "Toyota Corolla" "Toyota Corona" [22] "Dodge Challenger" "AMC Javelin" "Camaro Z28" [25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2" [28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino" [31] "Maserati Bora" "Volvo 142E" $class [1] "data.frame" To access a specific attribute of an object, use the attr function: attr(mtcars, "class") # get the class attribute for the mtcars dataframe [1] "data.frame" 4.4.4 Null Objects There is a special object called the NULL object, which really just represents “nothing”. It’s used mainly if you want to remove an element from a list: a <- list(1, 2, 3) a[[2]] <- NULL # remove component 2 by assigning "nothing" to it a [[1]] [1] 1 [[2]] [1] 3 It’s also used when a function is supposed to return something but doesn’t have an object to return (more on this later when we discuss functions). 4.4.5 Removing Objects Sometimes you want to get rid of an object! In R, you can use the rm function like so: a <- "an object" rm(a) a Error in eval(expr, envir, enclos): object 'a' not found As you can see, the error message indicates that a has been removed. Sometimes, you’d like to remove all the objects in your environment. To do this, you can use the command: rm(list=ls()) Any feedback for this section? Click here "],
["wrap-up.html", "4.5 Wrap Up", " 4.5 Wrap Up This concludes the chapter. Next we’ll use the R skills you’ve developed in this chapter to do interesting things with Data! "],
-["working-with-data.html", "Chapter 5 Working with Data", " Chapter 5 Working with Data This section is still under construction "],
-["loading-saving-data.html", "5.1 Loading / Saving Data", " 5.1 Loading / Saving Data This section is still under construction 5.1.1 “Taster”(?) list of file forms and sources 5.1.2 reading/writing csv 5.1.3 Best practices principles of Tidy Data (wickham 2014) raw data as read only Give me some feedback about the content in this section. "],
-["downloading-and-saving.html", "5.2 Downloading and Saving", " 5.2 Downloading and Saving This section is still under construction downloading and saving example csv from canvas or website Give me some feedback about the content in this section. "],
-["working-with-data-1.html", "5.3 Working With Data", " 5.3 Working With Data This section is still under construction 5.3.1 Basic indexing matrices: can use single index nested indexing: [[1]][3] 5.3.2 Advanced indexing dplyr example Give me some feedback about the content in this section. 5.3.3 Summarizing vectors Mean, std, etc. 5.3.4 Summarizing matrices 5.3.5 Summarizing vectors 5.3.6 Basic Plotting ggplot example Remember to pay close attention to the syntax when using ggplot. R won’t work if parenthesis or quotes are not paired. If you’re adding multiple layers, put each on a new line. Adding structure to make your code more readable is like doing your future self a favor (and anyone reading your code). "],
-["practice.html", "5.4 Practice", " 5.4 Practice This section is still under construction asdf "],
+["working-with-data.html", "Chapter 5 Working with Data", " Chapter 5 Working with Data This chapter is still under construction In this Chapter, we’ll use R to do interesting things with data. This is by far the most popular use of the R programming language, and arguably the most fun! You’ll learn how to load data sets into R, do interesting things with them, and save your results. "],
+["quick-example.html", "5.1 Quick Example", " 5.1 Quick Example Before diving into detail, let’s do a quick example so you can begin to see what is possible with data in R. As we mentioned in the last chapter, R includes some pre-packaged data sets, which you can access with the data() command. One of the data sets is Seatbelts, which documents road casualties in Great Britain between 1969 and 1984. First, we need to convert Seatbelts to a dataframe, because it starts out as a “Time-Series”, which we haven’t discussed. Seatbelts <- data.frame(as.matrix(Seatbelts), date=time(Seatbelts)) We’ve also added a date column, which records the time of each observation. Next, look at the dimensions of the data set with the dim command: dim(Seatbelts) # get the dimensions (rows, columns) of the Seatbelts dataframe [1] 192 9 This shows that there are 192 rows (months), and 9 columns (variables measured each month). We could also determine the number of rows and columns separately using the nrow and ncol functions. To view the first few rows of the Seatbelts dataframe, use the head command: head(Seatbelts) # view first few rows of the Seatbelts dataset This is a good way to learn which variables are being measured (columns) and see some example observations (rows) for each variable. Because these data are included with R, more information about each variable can be found with: ?Seatbelts Next, let’s view a summary of each column with the summary function: summary(Seatbelts) DriversKilled drivers front rear Min. : 60.0 Min. :1057 Min. : 426.0 Min. :224.0 1st Qu.:104.8 1st Qu.:1462 1st Qu.: 715.5 1st Qu.:344.8 Median :118.5 Median :1631 Median : 828.5 Median :401.5 Mean :122.8 Mean :1670 Mean : 837.2 Mean :401.2 3rd Qu.:138.0 3rd Qu.:1851 3rd Qu.: 950.8 3rd Qu.:456.2 Max. :198.0 Max. :2654 Max. :1299.0 Max. :646.0 kms PetrolPrice VanKilled law Min. : 7685 Min. :0.08118 Min. : 2.000 Min. 
:0.0000 1st Qu.:12685 1st Qu.:0.09258 1st Qu.: 6.000 1st Qu.:0.0000 Median :14987 Median :0.10448 Median : 8.000 Median :0.0000 Mean :14994 Mean :0.10362 Mean : 9.057 Mean :0.1198 3rd Qu.:17203 3rd Qu.:0.11406 3rd Qu.:12.000 3rd Qu.:0.0000 Max. :21626 Max. :0.13303 Max. :17.000 Max. :1.0000 date Min. :1969 1st Qu.:1973 Median :1977 Mean :1977 3rd Qu.:1981 Max. :1985 Since each column is numeric, R shows a five number summary (minimum, first quartile, median, third quartile, maximum) and mean for each column. We learn, for example, that the average number of drivers killed per month is about 123 (the DriversKilled column), and that the greatest price of petrol was £0.13 per litre! Let’s view a histogram of DriversKilled: hist(Seatbelts$DriversKilled, breaks=20) Figure 5.1: Histogram of Drivers Killed in Seatbelt data We see that in some months, more than 150 drivers were killed! We can count exactly how many such months there were like so: sum(Seatbelts$DriversKilled > 150) [1] 33 To investigate the effect of the seat belt law, let’s create a scatter plot of drivers killed against time: plot(Seatbelts$date, Seatbelts$DriversKilled) Figure 5.2: UK Seatbelt deaths vs time By adding a col argument to the plot function, we can color the points based on whether the law was in effect: plot(Seatbelts$date, Seatbelts$DriversKilled, col=(Seatbelts$law+2)) Figure 5.3: UK Seatbelt deaths vs time, red = no seatbelt law, green = seatbelt law There do appear to be fewer deaths, but there is so much fluctuation in deaths each year that it’s difficult to tell. Let’s change the x-axis to reflect month of the year instead of date: plot((Seatbelts$date %% 1) * 12 + 1, Seatbelts$DriversKilled, xlab = "Month", col=Seatbelts$law + 2) Figure 5.4: UK Driver Deaths vs. Month This plot shows that there is a clear seasonal effect in the number of deaths, with more deaths occurring in the Fall/Winter than in the Spring/Summer. We can also see that within each month, the traffic deaths after enacting the Seatbelt law are among the lowest.
Another data set included in R is mtcars. Following the example above, find the dimension of mtcars and have R print out a summary of each column, then create a scatter plot of fuel economy (mpg) to engine displacement. What do you observe about the relationship between these two variables? This concludes the quick example. In the rest of this chapter, we’ll talk in more detail about the different steps of working with data, and how to complete them using R! People often use data in order to answer questions, but often times, learning about data can generate even more questions than it answers. Take a moment to think of a question that you have about the Seatbelts dataset. Do you think the question can be answered using the data alone? If not, what other sources of data might be available which can help answer the question? Any feedback for this section? Click here "],
+["loading-saving-data.html", "5.2 Loading / Saving Data", " 5.2 Loading / Saving Data This section is still under construction 5.2.1 “Taster”(?) list of file forms and sources 5.2.2 Data Format Talk about what columns and rows mean (long vs. wide? probably not…) 5.2.3 reading/writing csv 5.2.4 Best practices principles of Tidy Data (wickham 2014) raw data as read only Give me some feedback about the content in this section. "],
+["downloading-and-saving.html", "5.3 Downloading and Saving", " 5.3 Downloading and Saving This section is still under construction downloading and saving example csv from canvas or website Give me some feedback about the content in this section. "],
+["working-with-data-1.html", "5.4 Working With Data", " 5.4 Working With Data This section is still under construction 5.4.1 Basic indexing matrices: can use single index nested indexing: [[1]][3] 5.4.2 Advanced indexing dplyr example Give me some feedback about the content in this section. 5.4.3 Summarizing vectors Mean, std, etc. 5.4.4 Summarizing matrices 5.4.5 Summarizing vectors 5.4.6 Basic Plotting ggplot example Remember to pay close attention to the syntax when using ggplot. R won’t run your code if parentheses or quotes are not paired. If you’re adding multiple layers, put each on a new line. Adding structure to make your code more readable is a favor to your future self (and to anyone else reading your code). "],
+["practice.html", "5.5 Practice", " 5.5 Practice This section is still under construction asdf "],
["performing-data-analysis.html", "Chapter 6 Performing Data Analysis", " Chapter 6 Performing Data Analysis This section is still under construction "],
["basic-control-flow.html", "6.1 Basic Control Flow", " 6.1 Basic Control Flow This section is still under construction If/else loops (for, while) switch Give me some feedback about the content in this section. "],
["advanced-control-flow.html", "6.2 Advanced Control Flow", " 6.2 Advanced Control Flow This section is still under construction *apply family Give me some feedback about the content in this section. "],
diff --git a/docs/src/style.css b/docs/src/style.css
index a6a7d5d..b0b6ec8 100644
--- a/docs/src/style.css
+++ b/docs/src/style.css
@@ -33,7 +33,13 @@ pre {
.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre.output-style,
.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre.output-style,
.book .book-body .page-wrapper .page-inner section.normal pre.output-style{
- /*background: #d1fff8;*/
background: #fffbcf;
}
+
+/* Set background of code error messages */
+.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre.error-style,
+.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre.error-style,
+.book .book-body .page-wrapper .page-inner section.normal pre.error-style{
+ background: #ffd4d4;
+}
\ No newline at end of file
diff --git a/docs/successfull-installation.html b/docs/successfull-installation.html
index d534312..7a90f42 100644
--- a/docs/successfull-installation.html
+++ b/docs/successfull-installation.html
@@ -222,21 +222,23 @@
diff --git a/docs/working-with-data-1.html b/docs/working-with-data-1.html
index 6a31979..a314a09 100644
--- a/docs/working-with-data-1.html
+++ b/docs/working-with-data-1.html
@@ -4,11 +4,11 @@
- 5.3 Working With Data | R Module 1
+ 5.4 Working With Data | R Module 1
-
+
@@ -16,7 +16,7 @@
-
+
@@ -222,21 +222,23 @@
-This section is still under construction
+This chapter is still under construction
+
In this Chapter, we’ll use R to do interesting things with data.
+This is by far the most popular use of the R programming language, and arguably the most fun!
+You’ll learn how to load data sets into R, do interesting things with them, and save your results.
You might also want to clear the R console, which you can do by placing your cursor in the R console and typing <control> l (careful! that’s a lowercase L).
+
Here's a [more complete list](https://support.rstudio.com/hc/en-us/articles/200711853) of RStudio shortcuts.
Any feedback for this section? Click here
diff --git a/docs/wrap-up.html b/docs/wrap-up.html
index 36f1d67..e324192 100644
--- a/docs/wrap-up.html
+++ b/docs/wrap-up.html
@@ -222,21 +222,23 @@
diff --git a/index.Rmd b/index.Rmd
index e92c915..a374899 100644
--- a/index.Rmd
+++ b/index.Rmd
@@ -22,6 +22,12 @@ biblio-style: apalike
# class.output="output-style" Use the CSS style "output-style" to style output of code chunks
knitr::opts_chunk$set(fig.align="center", comment=NA, results="hold", class.source="chunk-style", class.output="output-style")
+knitr::knit_hooks$set(
+ error = function(x, options){
+    paste0('\n<pre class="error-style">', x, '</pre>')
+ }
+)
+
# A recent change in RMarkdown required this change.
diff --git a/rscripts/create_feedback_blocks.R b/rscripts/create_feedback_blocks.R
index 314605c..c501314 100644
--- a/rscripts/create_feedback_blocks.R
+++ b/rscripts/create_feedback_blocks.R
@@ -16,7 +16,8 @@ section_headings <- c(
"Programming Preliminaries",
"Data Types",
"Data Structures",
- "R Objects"
+ "R Objects",
+ "Quick Example"
)
link_prefix <- "https://docs.google.com/forms/d/e/1FAIpQLSePQZ3lIaCIPo9J2owXImHZ_9wBEgTo21A0s-A1ty28u4yfvw/viewform?entry.1684471501="
diff --git a/src/01-CoursePreliminaries.Rmd b/src/01-CoursePreliminaries.Rmd
index c73a0e1..0cf3dea 100644
--- a/src/01-CoursePreliminaries.Rmd
+++ b/src/01-CoursePreliminaries.Rmd
@@ -78,7 +78,7 @@ First, some important details:
- __Instructor__: [Alex Fout](mailto:fout@colostate.edu)
-- __Office Hours__: TBD on Google Meet (look for email invite)
+- __Office Hours__: Held via Google Meet (look for email invite), schedule on Canvas.
- __Webpages__: [Canvas](https://canvas.colostate.edu), [this textbook](https://csu-r.github.io/Module1/)
@@ -153,7 +153,7 @@ In a browser, navigate to [rdrr.io/snippets](https://rdrr.io/snippets/){target="
knitr::include_graphics("src/images/rdrr.png")
```
-The box comes with some code entered already, but we want to use our own code instead, so delete all the text, from `library(ggplot2)` to `factor(cyl))`.
+The box comes with some code entered already, but we want to use our own code instead, so delete all the text, from before `library(ggplot2)` to after `factor(cyl))`.
In its place, type `1+1`, then click the big green "Run" button.
You should see the `[1] 2` displayed below.
So if you give R a math expression, it will evaluate it and give the result.
diff --git a/src/02-InstallingR.Rmd b/src/02-InstallingR.Rmd
index 249b068..2bc2a8c 100644
--- a/src/02-InstallingR.Rmd
+++ b/src/02-InstallingR.Rmd
@@ -82,7 +82,7 @@ From this point, this will change depending on your operating system.
#### Windows
- Click "Download R for Windows", then click "base".
-- Finally, Click "Download R X.Y.Z for Windows", where X, Y, and Z will be numbers. These numbers indicate which version of R you'll be installing. As of the publishing of this book, R is on version `r_version`.
+- Finally, Click "Download R X.Y.Z for Windows", where X, Y, and Z will be numbers. These numbers indicate which version of R you'll be installing. As of the publishing of this book, R is on version `r r_version`.
- Your computer might prompt for the location on your computer that you would like to save the file. Select a location (reasonable options are your `Downloads` folder or the `Desktop`) and select "save".
- When the download completes, find the downloaded file in the File Explorer and double click to run it. This will start the installation process.
- Follow the on screen prompts. For the most part you can click "next" and "install" as appropriate, and you don't have to worry about changing any installation settings.
@@ -95,7 +95,7 @@ From this point, this will change depending on your operating system.
#### Mac OS X
- Click "Download R for (Mac) OS X"
-- Click "R-X.Y.Z.pkg", where X, Y, and Z will be numbers. These numbers indicate which version of R you'll be installing. As of the publishing of this book, R is on version `r_version`.
+- Click "R-X.Y.Z.pkg", where X, Y, and Z will be numbers. These numbers indicate which version of R you'll be installing. As of the publishing of this book, R is on version `r r_version`.
- Your computer might prompt for the location on your computer that you would like to save the file. Select a location and select "save".
- When the download completes, find the downloaded file in the Finder and double click to run it. This will start the installation process.
- Follow the on screen prompts. For the most part you can click "continue", "agree", "install", as appropriate, and you don't have to worry about changing any installation settings.
@@ -175,7 +175,7 @@ Now that you're somewhat familiar with RStudio, let's run the same code as we ra
### The R Console:
-In the _R console_, type `1+1` and press enter.
+In the _R console_, type `1+1` and press `enter`.
The output in the console should look like the following:
```{r, fig.cap="code in the console", echo=F}
@@ -193,22 +193,22 @@ If you haven't already, create a new R script by clicking on the `New File` icon
knitr::include_graphics("src/images/rstudio_newfile.png")
```
-In the script window which opens, type `1+1` and press enter.
+In the script window which opens, type `1+1` and press `enter`.
Notice how now, the code did _not_ run?
In a script, you are free to write R code on several lines before you run it.
You can even save the script and load it later in order to run the code it contains.
There are multiple ways to run R code in a script.
To run a single line of code, do one of the following:
-- Place the cursor on the desired line, hold the `<control>` key, and press enter. On Mac OS X, hold `<command>` key and press enter instead
-- Place the curse on the desired line and click the `Run` button that looks like this:
+- Place the cursor on the desired line, hold the `<control>` key, and press `enter`. On Mac OS X, hold the `<command>` key and press `return` instead
+- Place the cursor on the desired line and click the `Run` button that looks like this:
```{r, fig.cap="code in the console", echo=F}
knitr::include_graphics("src/images/rstudio_run.png")
```
To run multiple lines of code, do one of the following:
-- Highlight all the code you'd like to run, hold the `<control>` key, and press enter. On Mac OS X, hold the `<command>` key and press enter instead.
+- Highlight all the code you'd like to run, hold the `<control>` key, and press `enter`. On Mac OS X, hold the `<command>` key and press `return` instead.
- Highlight all the code you'd like to run, and click the `Run` button.
Run the `1+1` code using one of the methods above, and observe the output.
@@ -395,7 +395,11 @@ rm(list=ls()) # Clear everything in your workspace
gc() # perform garbage collection
```
-You might also want to clear the R console, which you can do by placing your cursor in the R console and typing `<control> l`.
+You might also want to clear the R console, which you can do by placing your cursor in the R console and typing `<control> l` (careful! that's a lowercase L).
+
+```{bonus}
+Here's a [more complete list](https://support.rstudio.com/hc/en-us/articles/200711853) of RStudio shortcuts.
+```
```{block, type='feedback'}
Any feedback for this section? Click [here](https://docs.google.com/forms/d/e/1FAIpQLSePQZ3lIaCIPo9J2owXImHZ_9wBEgTo21A0s-A1ty28u4yfvw/viewform?entry.1684471501=Workspace%20Setup)
diff --git a/src/04-RProgrammingFundamentals.Rmd b/src/04-RProgrammingFundamentals.Rmd
index a9411f4..584e20d 100644
--- a/src/04-RProgrammingFundamentals.Rmd
+++ b/src/04-RProgrammingFundamentals.Rmd
@@ -18,7 +18,7 @@ When two humans communicate with a language, they both must agree on the the rul
R also has rules that must be followed in order for a human ( _you_ ) to communicate with a computer, in order to tell the computer what to do.
In human language, grammar is often fluid and evolving, and two people may have to adapt their use of the language in order to communicate.
-With R, the fules are fixed, and the computer "knows" them perfectly.
+With R, the rules are fixed, and the computer "knows" them perfectly.
It is up to you to learn the rules in order to make the computer do exactly what you want it to do.
Since any computer programming language will do exactly what you tell it to do,
@@ -58,7 +58,7 @@ x <- 1+1; print(x); print(x^2)
In this example, three commands are given on one line.
The first command creates a new _variable_ called `x`, the second command prints the value of `x`, and the third command prints the value of `x` _squared_.
We see that the semicolon, `;`, serves as the command _termination_, because it tells R where one command ends and another begins.
-When a line contains a single command, no semicolon in necessary at the end, but including a semicolon doesn't have any effect either.
+When a line contains a single command, no semicolon is necessary at the end, but including a semicolon doesn't have any effect either.
```{r, class.source="chunk-style"}
print("This line doesn't have a semicolon")
@@ -72,7 +72,7 @@ Including multiple semicolons (e.g. `print("hello");;`) does not work!
```{block, type="bonus"}
You've just seen your first example of _assignment_.
That is, we created a thing called `x` , and _assigned_ to it the value of `1+1` using the _assignment operator_, `<-`.
-Formally `x` is called an object, but we'll talk about that more objects and assignments later.
+Formally `x` is called an object, but we'll talk more about objects and assignment later.
```
@@ -119,9 +119,9 @@ At this point there is no practical difference, each command gets executed wheth
However, code grouping will become very important later on, when we discuss _control flow_ later.
```{block, type="bonus"}
-There are several helpful shortcuts that you can use in R. If you forget to put quotes around something, you can highlight and press the quote key and it will add quotes to both sides. This works with parenthesis too.
+There are several helpful shortcuts that you can use in R. If you forget to put quotes around something, you can highlight and press the quote key and it will add quotes to both sides. This works with parentheses too.
-You can also use tab completion with functions and defined variables. Tab completion allows you to use the same amount of time using a longer, descriptive variable name as a short, meaningless, and easily confused one. This can save you a lot of time and reduce mistakes!
+You can also use tab completion with functions and defined variables. Tab completion allows you to use longer, more descriptive variable names without the additional typing time. This can save you a lot of time and reduce mistakes!
```
```{block, type="progress"}
@@ -230,7 +230,11 @@ As we've seen, symbols and characters have specific meaning in R. You must be ca
Okay, now that we've covered some of the basics, it's time to start learning how to do useful things in R!
The next few sections will describe the different types of data that R can handle.
+```{block2, type="reflect"}
+
+A common mistake is forgetting to put things in quotes. Try highlighting a word, then pressing the quote key. What would you expect to happen, and what actually happens? Try again with parentheses. Do you think this shortcut makes it easier to write code in R?
+```
```{block, type='feedback'}
Any feedback for this section? Click [here](https://docs.google.com/forms/d/e/1FAIpQLSePQZ3lIaCIPo9J2owXImHZ_9wBEgTo21A0s-A1ty28u4yfvw/viewform?entry.1684471501=Programming%20Preliminaries)
```
@@ -328,7 +332,7 @@ However, viewing all of the digits stored by R can be distracting and hard to re
You can show just some of the digits by using the `round` function:
```{r}
-a
+a <- 0.123456
```
```{r}
@@ -400,7 +404,9 @@ Notice also that we didn't have to change the defining quotes to be double quote
The backslash is called the _escape character_, and it signifies that what follows it should be interpreted _literally_ by R, and any special meaning should be ignored.
```{block, type="bonus"}
-Since backslash also has special meaning itself, if you want a backslash in your string, you need to use another escape character, like so: `g <- "here is a backslash: \\"`
+Since backslash also has special meaning itself, if you want a backslash in your string, you need to use another backslash, which functions as an escape character, like so: `g <- "here is a backslash: \\\\"`. You will see both backslashes when using the `print` function (which is meant for any data type), but if you use the special `cat` function (which is meant for character types specifically), all escape characters will be "processed", and you will see just a single backslash.
+
+Try the same thing with the newline character, `\\n`!
To see a list of special characters, try typing `?Quotes` into the R console
```
@@ -1136,13 +1142,13 @@ is.character(a)
is.numeric(a)
```
+
```{block, type="bonus"}
As we've seen, type conversion is sometimes performed automatically, specifically when using the combine function (`c`).
To understand more about this, try typing `?c` to bring up the documentation, and have a look at the "Details" section.
```
-TODO: include conversion b/t character vec and factor vec.
### Matrices
@@ -1265,7 +1271,7 @@ Matrices have two dimensions (row and column), and arrays can have any number of
We won't discuss arrays in this course much.
```
-```{block, type="progress"}
+```{block2, type="progress"}
Try running the following code in R, which should produce an error message:
```{r, eval=F}
data <- c(4.5, 6.1, 3.3, 2.0)
@@ -1416,7 +1422,7 @@ rocks
```{block, type="caution"}
When defining the `rocks` list, we've spread the command accross multiple lines for clarity.
-The commas at the end of some of the lines indicate that the list has more components, so R will continue reading the next line until it finds the closing parenthesis, `'`.
+The commas at the end of some of the lines indicate that the list has more components, so R will continue reading the next line until it finds the closing parenthesis, `)`.
```
There are so many sets of data that fit into this pattern, that R has a special data type called a _data frame_, which we will discuss in the next section.
@@ -1512,7 +1518,7 @@ names(rocks)[[1]] <- "rock type"
rocks
```
-```{block, type="caution"}
+```{block2, type="caution"}
Row and column names are allowed to have spaces in them, but you must be careful how you access them.
The following code will not work: `rocks$rock type` , because R will stop looking for the name you are referencing once it encounters a space.
To access this column, you must enclose the reference in "backticks" ( `` ` `` ) like so: ``rocks$`rock type` ``.
@@ -1667,7 +1673,7 @@ Aside from these two attributes, you can list all attributes of an object like t
attributes(mtcars)
```
-To access a specific attribute of an object, you can use the dollar sign (`$`):
+To access a specific attribute of an object, use the `attr` function with the name of the attribute:
```{r}
attr(mtcars, "class") # get the class attribute for the mtcars dataframe
```
diff --git a/src/05-WorkingWithData.Rmd b/src/05-WorkingWithData.Rmd
index fb2dbf3..fefc865 100644
--- a/src/05-WorkingWithData.Rmd
+++ b/src/05-WorkingWithData.Rmd
@@ -1,30 +1,155 @@
# Working with Data
+Congratulations! You're all set up and ready to work with data. That's what we're all here for, right?
+
+There's a lot to take in while you're getting started, so don't worry if you need to look up the command names or documentation.
+
+It's important to build good habits when you're starting out, but keep in mind that R is constantly being updated and new packages come out frequently. There are also often many ways to get to the same place, so feel free to get your hands dirty experimenting!
+
```{block, type="caution"}
-This section is still under construction
+This chapter is still under construction
+```
+
+In this chapter, we'll use R to do interesting things with _data_.
+This is by far the most popular use of the R programming language, and arguably the most fun!
+You'll learn how to load data sets into R, do interesting things with them, and save your results.
+
+## Quick Example
+
+Before diving into detail, let's do a quick example so you can begin to see what is possible with data in R.
+As we mentioned in the last chapter, R includes some pre-packaged data sets, which you can access with the `data()` command.
+One of the data sets is `Seatbelts`, which documents road casualties in Great Britain between 1969 and 1984.
+Firstly, we need to convert `Seatbelts` to a dataframe, because it starts out as a "Time-Series", which we haven't discussed.
+```{r}
+Seatbelts <- data.frame(as.matrix(Seatbelts), date=time(Seatbelts))
+```
+
+We've also added a `date` column, which stores the time of each observation as a decimal year.
+
+Let's look at the dimensions of the data set with the `dim` command:
+```{r}
+dim(Seatbelts) # get the number of rows and columns in the Seatbelts dataframe
+```
+This shows that there are 192 rows (months), and 9 columns (variables measured each month).
+We could also determine the number of rows and columns separately using the `nrow` and `ncol` functions.
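+
+For instance, a short sketch using the two functions just named. Assuming `Seatbelts` has been converted as above, the results should match the two numbers reported by `dim`:
+
+```{r}
+nrow(Seatbelts) # number of rows (months)
+ncol(Seatbelts) # number of columns (variables)
+```
+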
+To view the first few rows of the Seatbelts dataframe, use the `head` command:
+
+```{r}
+head(Seatbelts) # view first few rows of the Seatbelts dataset
```
+This is a good way to learn which variables are being measured (columns) and see some example observations (rows) for each variable.
+Because these data are included with R, more information about each variable can be found with:
+
+```{r, eval=F}
+?Seatbelts
+```
+
+Next, let's view a summary of each column with the `summary` function:
+
+```{r}
+summary(Seatbelts)
+```
+
+Since each column is numeric, R shows a five-number summary (minimum, first quartile, median, third quartile, maximum) and the mean for each column.
+We learn, for example, that the average number of drivers killed or seriously injured per month (the `drivers` column) is 1670, and that the greatest price of petrol was £0.13 per litre!
+Let's view a histogram of `DriversKilled`:
+
+```{r, fig.cap="Histogram of Drivers Killed in Seatbelt data"}
+hist(Seatbelts$DriversKilled, breaks=20)
+```
+We see that in some months, more than 150 drivers were killed!
+We can count exactly how many such months there were like so:
+
+```{r}
+sum(Seatbelts$DriversKilled > 150)
+```
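+
+This works because a comparison such as `>` returns a logical vector, and `sum` counts each `TRUE` as 1. A small sketch with made-up numbers (the vector `killed_example` is hypothetical, not taken from the data):
+
+```{r}
+killed_example <- c(120, 163, 98, 151) # made-up monthly counts, for illustration only
+killed_example > 150                   # FALSE TRUE FALSE TRUE
+sum(killed_example > 150)              # 2: the number of TRUE values
+mean(killed_example > 150)             # 0.5: the proportion of TRUE values
+```
+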
+To investigate the effect of the seat belt law, let's create a scatter plot of drivers killed against time:
+```{r, fig.keep='all', fig.cap="UK Seatbelt deaths vs time"}
+plot(Seatbelts$date, Seatbelts$DriversKilled)
+```
+By adding a `col` argument to the `plot` function, we can color the points based on whether the law was in effect:
+
+```{r, fig.keep='all', fig.cap="UK Seatbelt deaths vs time, red = no seatbelt law, green = seatbelt law"}
+plot(Seatbelts$date, Seatbelts$DriversKilled, col=(Seatbelts$law+2))
+```
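+
+Because the colors come from numeric codes (`law + 2` is 2 or 3), a legend can make the plot easier to read. The sketch below assumes the default color palette and rebuilds the data frame under a hypothetical name, `sb`, so that it runs on its own; the legend labels and position are our own choices.
+
+```{r, fig.keep='all', fig.cap="UK Seatbelt deaths vs time, with a legend"}
+# same conversion as above, stored under a hypothetical name
+sb <- data.frame(as.matrix(datasets::Seatbelts),
+                 date = as.numeric(time(datasets::Seatbelts)))
+plot(sb$date, sb$DriversKilled, col = sb$law + 2)
+# color 2 marks months without the law, color 3 marks months with it
+legend("topright", legend = c("no seatbelt law", "seatbelt law"),
+       col = c(2, 3), pch = 1)
+```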
+
+There do appear to be fewer deaths once the law is in effect, but deaths fluctuate so much within each year that it's difficult to tell.
+Let's change the x-axis to reflect month of the year instead of date:
+
+```{r, fig.cap="UK Driver Deaths vs. Month"}
+plot((Seatbelts$date %% 1) * 12 + 1, Seatbelts$DriversKilled,
+ xlab = "Month", col=Seatbelts$law + 2)
+```
+This plot shows that there is a clear seasonal effect in the number of deaths, with more deaths occurring in the Fall/Winter than in the Spring/Summer.
+We can also see that within each month, the traffic deaths after enacting the Seatbelt law are among the lowest.
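+
+The x-axis expression in the plot above relies on the `date` column holding decimal years: `%% 1` keeps the fractional part of the year, multiplying by 12 turns it into months elapsed, and adding 1 makes January month 1. A small sketch with made-up dates (the vector `example_dates` is hypothetical, chosen so the arithmetic is exact):
+
+```{r}
+example_dates <- c(1969.0, 1969.5, 1970.25) # hypothetical decimal years
+(example_dates %% 1) * 12 + 1               # 1 7 4: January, July, April
+```
+
+For real dates the result can be off by a tiny fraction because of floating-point rounding, so wrapping the expression in `round()` is a reasonable precaution if whole month numbers are needed.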
+
+```{block, type="progress"}
+Another data set included in R is `mtcars`. Following the example above, find the dimensions of `mtcars` and have R print out a summary of each column, then create a scatter plot of fuel economy (`mpg`) against engine displacement (`disp`).
+What do you observe about the relationship between these two variables?
+```
+
+This concludes the quick example.
+In the rest of this chapter, we'll talk in more detail about the different steps of working with data, and how to complete them using R!
+
+```{block, type="reflect"}
+People often use data to answer questions, but learning about data can generate even _more_ questions than it answers.
+Take a moment to think of a question that you have about the `Seatbelts` dataset.
+Do you think the question can be answered using the data alone?
+If not, what other sources of data might be available which can help answer the question?
+```
+
+```{block, type='feedback'}
+Any feedback for this section? Click [here](https://docs.google.com/forms/d/e/1FAIpQLSePQZ3lIaCIPo9J2owXImHZ_9wBEgTo21A0s-A1ty28u4yfvw/viewform?entry.1684471501=Quick%20Example)
+```
+
## Loading / Saving Data
+
```{block, type="caution"}
This section is still under construction
```
+
### "Taster"(?) list of file forms and sources
+### Data Format
+
+Talk about what columns and rows mean
+
+(long vs. wide? probably not...)
+
### reading/writing csv
+Let's import some data about Animal Crossing from the TidyTuesday GitHub repo. We'll use the `readr` package here. Its `read_csv()` function is an example of a command that is typically faster, and often more convenient, than the similar `read.csv()` function in base R.
+
+```{r, message = FALSE}
+library(tidyverse) # this loads all of the packages in the tidyverse
+villagers <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-05/villagers.csv')
+# `readr::` tells us this command comes from the readr package.
+# The prefix is only necessary when there is a conflicting command
+# (e.g. another loaded package also has a command called `read_csv()`),
+# but here it gives us context. Using plain read_csv() will work just as well.
+```
+
+Okay, so we have an object named `villagers` now, but how do we see what's inside?
+
+(Would you rather have this in this section, in "Practice", or as an assignment? -liz)
+
+```{r}
+str(villagers)
+```
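+
+Since this section also covers writing csv files, here is a minimal round-trip sketch with `readr`. The data frame `small_example` and the temporary file path are made up for illustration; it assumes the `readr` package is installed.
+
+```{r, message = FALSE}
+library(readr)
+small_example <- data.frame(name = c("a", "b"), value = c(1.5, 2.5)) # hypothetical data
+csv_path <- tempfile(fileext = ".csv") # temporary file in R's session temp directory
+write_csv(small_example, csv_path)     # save the data frame as a csv file
+read_back <- read_csv(csv_path)        # load it again
+read_back
+```
+
+Writing to a temporary file keeps the sketch from touching any of your own files; in practice you would pass a path inside your project instead.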
+
### Best practices
principles of Tidy Data (wickham 2014)
-raw data as read only
+raw data as read only
```{block, type="feedback"}
Give me some [feedback](https://forms.gle/2Fw8y4aY51ijdmyE9) about the content in this section.
```
-## Downloading and Saving
+## Downloading and Saving
+
+
```{block, type="caution"}
This section is still under construction
diff --git a/src/style.css b/src/style.css
index a6a7d5d..b0b6ec8 100644
--- a/src/style.css
+++ b/src/style.css
@@ -33,7 +33,13 @@ pre {
.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre.output-style,
.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre.output-style,
.book .book-body .page-wrapper .page-inner section.normal pre.output-style{
- /*background: #d1fff8;*/
background: #fffbcf;
}
+
+/* Set background of code error messages */
+.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre.error-style,
+.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre.error-style,
+.book .book-body .page-wrapper .page-inner section.normal pre.error-style{
+ background: #ffd4d4;
+}
\ No newline at end of file