### Introduction to R notes.

*Rachael McBride*

*30 September 2015*

## R - the basics

### Interactive console

Unlike some languages, R has an interactive console. This allows you to try out your code before you execute it as a script. It is particularly useful for exploratory data analysis i.e. when you first get a data set and are trying to understand its properties and characteristics.

`10 + 5`

`## [1] 15`

### Primitive data types and variables

Primitive data types are the basic building blocks of a programming language. We will examine numerical, textual and boolean types here. We also introduce variables. A variable is a symbolic name associated with a value, and whose associated value may be changed.

#### Numerical:

```
a = 10
b = 5.1
```

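Above, `10` and `5.1` are both numeric values. R stores a number typed this way as a *double* (a floating-point number, or *float*), even when it has no decimal part. To store a true *int* or *integer*, add the suffix `L`, as in `10L`. We have assigned the value 10 to a variable named `a` and the value 5.1 to a variable named `b`.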
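One way to check how R stores a number is `typeof()`. The short sketch below is an illustration added to these notes; the variable names `x_double`, `x_integer` and `x_decimal` are made up for the example.

```r
x_double  = 10    # a plain number literal is stored as a double
x_integer = 10L   # the L suffix requests an integer
x_decimal = 5.1

typeof(x_double)    # "double"
typeof(x_integer)   # "integer"
typeof(x_decimal)   # "double"
class(x_double)     # "numeric"
```

For everyday arithmetic the distinction rarely matters, since R converts between the two as needed.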

We can now use the variables `a` and `b` to perform a number of operations or calculations:

`a + b`

`## [1] 15.1`

`a * b`

`## [1] 51`

`a / b`

`## [1] 1.960784`

#### Textual:

Text-based values are stored in a primitive data type known as a string (R itself calls this type *character*).

`"learning R"`

`## [1] "learning R"`

```
test_string_1 = "learning R"
test_string_1
```

`## [1] "learning R"`

Here, we have assigned the text “learning R” to the string variable `test_string_1`.

#### Boolean:

The boolean primitive data type can have one of two values: `TRUE` or `FALSE`.

```
test_boolean_1 = TRUE
test_boolean_2 = T
test_boolean_3 = FALSE
test_boolean_4 = F
```

For booleans in R, `T` can be used as shorthand for `TRUE`, and `F` as shorthand for `FALSE`.
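One caveat is worth a quick illustration: `TRUE` and `FALSE` are reserved words, whereas `T` and `F` are ordinary variables that can be overwritten. The sketch below is an added example, and `shadow_demo` is a made-up name. It uses `local()` so the reassignment does not leak into the rest of the session.

```r
# T and F are ordinary variables that start out holding TRUE and FALSE,
# so they can be overwritten; TRUE and FALSE themselves cannot.
shadow_demo = local({
  T = FALSE            # legal, but T no longer means TRUE inside this block
  identical(T, TRUE)   # evaluates to FALSE here
})
shadow_demo            # FALSE
```

For this reason, spelling out `TRUE` and `FALSE` in scripts is arguably the safer habit.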

`test_boolean_1`

`## [1] TRUE`

`test_boolean_2`

`## [1] TRUE`

`test_boolean_3`

`## [1] FALSE`

`test_boolean_4`

`## [1] FALSE`

### Variable names

In R, variable names:

- Are case sensitive, e.g. variable ‘a’ is not the same as variable ‘A’.
- Cannot begin with a number, e.g. a variable called ‘1a’ is not accepted by R, but a variable called ‘a1’ is.

### Calling a built-in function

Let’s try R’s built-in function for calculating the square root of a number:

`sqrt(9)`

`## [1] 3`

`sqrt(25)`

`## [1] 5`

We can assign the results of the function to variables that we can use later:

```
sqrt_of_9 = sqrt(9)
sqrt_of_25 = sqrt(25)
```

`sqrt_of_9`

`## [1] 3`

`sqrt_of_25`

`## [1] 5`

## Finding your way around R

**Q.** How do I find out more about my current session in R?

**A.** Try:

` sessionInfo()`

This is useful to know when installing libraries, as not all libraries are available for every version of R.

**Q.** How do I see what libraries are already installed?

**A.** Try:

` library()`

**Q.** How do I find out more about a library, a topic etc.?

**A.** Use R’s help system. If you know the specific name:

` help("mean")`

or

` ?mean`

If you have an idea of what you are looking for, but not quite sure what it is called, try:

` help.search("cluster")`

or

` ??clust`

This is similar to running a fuzzy matching search.

**Q.** Ok, the ‘cluster’ library looks interesting. How do I use it?

**A.** Use `library()` to load the library of interest. For example:

`library("cluster") `

**Q.** I would like to use a package or library that is currently not installed on my computer. How can I install it?

**A.** Use `install.packages("name_of_package", dependencies = TRUE)`. See `?install.packages` for more.

**Q.** I want to play around with R more. Are there any test data sets I can use?

**A.** R comes with test data sets. See `data()` for more. For example, to use data on the survival of passengers on the Titanic, run `data(Titanic)`.
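As a brief illustration (added here, not part of the original session), the following sketch loads the built-in `Titanic` data set and takes a first look at it. It uses only base R functions.

```r
data(Titanic)        # load the built-in Titanic data set into the session
class(Titanic)       # "table": a multi-way table of counts
dimnames(Titanic)    # the classifying factors and their levels
sum(Titanic)         # total number of people counted: 2201
```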

## Primary data structures in R

The primary data structures are:

- vector
- dataframe
- matrix
- list

### The vector

#### Create a vector

A vector allows you to store a collection of elements of the same type.

```
transport = c("car", "bus", "plane")
lotto = c(7, 22, 32, 34, 40, 42)
```

#### Add names to the elements of the vector

To add names to an existing vector:

`names(transport) = c("road", "bus lane", "plane")`

To check:

`transport`

```
## road bus lane plane
## "car" "bus" "plane"
```

`names(transport)`

`## [1] "road" "bus lane" "plane"`

To add names as you create a vector:

```
ages = c("Ann" = 2, "Barry" = 4, "Bosco" = 7)
ages
```

```
## Ann Barry Bosco
## 2 4 7
```

`names(ages)`

`## [1] "Ann" "Barry" "Bosco"`

#### Get the length of a vector (or other objects)

Use `length()`. **Note:** This can be used on a number of different objects.

`length(transport)`

`## [1] 3`

`length(ages)`

`## [1] 3`

#### Exercise:

Peter, Bob and Jill each have a number of jelly babies. Peter has 6, Bob has 8 and Jill has 10. Each person eats 2 of their jelly babies. They then discover another packet of jelly babies behind the couch with 6 jelly babies in it. They decide to share it: Peter gets 4, Bob gets 2 less than Peter and Jill gets 2 less than Bob. Use vectors in R to calculate how many each person has in the end.

1. Create a vector called ‘start’ that reflects how many each person starts out with.
2. Create a vector called ‘eats’ that represents how many jelly babies each person eats. (Hint: Try `?rep`)
3. Create a vector called ‘gets’ that represents how many jelly babies from the new bag each person gets. (Hint: Try `?seq`)
4. Subtract ‘eats’ from ‘start’ and add ‘gets’.

#### Access elements of the vector

**By condition:** Access elements by filtering on a particular condition.

`lotto`

`## [1] 7 22 32 34 40 42`

`lotto > 33`

`## [1] FALSE FALSE FALSE TRUE TRUE TRUE`

`lotto[lotto > 33]`

`## [1] 34 40 42`

**By location:** Access elements by location.

`lotto[1]`

`## [1] 7`

`lotto[2:3]`

`## [1] 22 32`

`lotto[3:length(lotto)]`

`## [1] 32 34 40 42`

`lotto[length(lotto):4]`

`## [1] 42 40 34`
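Two further ways of accessing elements may be useful here: negative positions, which drop elements, and names, which work on named vectors such as `ages`. The sketch below is an added illustration that recreates the `lotto` and `ages` vectors from earlier so that it runs on its own.

```r
lotto = c(7, 22, 32, 34, 40, 42)
ages = c("Ann" = 2, "Barry" = 4, "Bosco" = 7)

lotto[-1]                 # everything except the first element: 22 32 34 40 42
lotto[c(1, 6)]            # the first and last elements: 7 42
ages["Barry"]             # look up one element by its name
ages[c("Ann", "Bosco")]   # look up several elements by name
```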

#### Concatenate vectors

To join two or more vectors together:

```
a = c(1, 2, 3, 4)
b = c(5, 6, 7, 8)
together = c(a, b)
together
```

`## [1] 1 2 3 4 5 6 7 8`

### The list

A data container that can store different types of data structures at the same time.

#### Create a list

```
age = 3
allergies = TRUE
friends = c("Joe", "Tyler", "Nina")
child_1 = list("age" = age, "allergies" = allergies , "friends" = friends)
child_2 = list("age" = 2, "allergies" = FALSE, friends = "James", "note" = "Will not eat fish fingers")
```

`child_1`

```
## $age
## [1] 3
##
## $allergies
## [1] TRUE
##
## $friends
## [1] "Joe" "Tyler" "Nina"
```

`child_2`

```
## $age
## [1] 2
##
## $allergies
## [1] FALSE
##
## $friends
## [1] "James"
##
## $note
## [1] "Will not eat fish fingers"
```

#### Add to a list i.e. create a list of lists

`children = list("Ann" = child_1, "Tomasz" = child_2)`

`children`

```
## $Ann
## $Ann$age
## [1] 3
##
## $Ann$allergies
## [1] TRUE
##
## $Ann$friends
## [1] "Joe" "Tyler" "Nina"
##
##
## $Tomasz
## $Tomasz$age
## [1] 2
##
## $Tomasz$allergies
## [1] FALSE
##
## $Tomasz$friends
## [1] "James"
##
## $Tomasz$note
## [1] "Will not eat fish fingers"
```

#### Accessing elements in a list

**By name:**

`names(child_1)`

`## [1] "age" "allergies" "friends"`

`child_1[["age"]]`

`## [1] 3`

`child_1$age`

`## [1] 3`

`names(children)`

`## [1] "Ann" "Tomasz"`

`children[["Tomasz"]]`

```
## $age
## [1] 2
##
## $allergies
## [1] FALSE
##
## $friends
## [1] "James"
##
## $note
## [1] "Will not eat fish fingers"
```

`children$"Tomasz"`

```
## $age
## [1] 2
##
## $allergies
## [1] FALSE
##
## $friends
## [1] "James"
##
## $note
## [1] "Will not eat fish fingers"
```

`children$Tomasz$note`

`## [1] "Will not eat fish fingers"`
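The examples above use double brackets `[[ ]]` and `$`. Single brackets `[ ]` also work on lists, but they return a smaller list rather than the element itself. The following sketch is an added illustration of the difference; `as_list` and `as_value` are made-up names.

```r
child_1 = list("age" = 3, "allergies" = TRUE, "friends" = c("Joe", "Tyler", "Nina"))

as_list  = child_1["age"]     # single brackets return a list of length 1
as_value = child_1[["age"]]   # double brackets return the element itself

class(as_list)    # "list"
class(as_value)   # "numeric"
as_value + 1      # 4; the same arithmetic on as_list would raise an error
```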

### The dataframe

A ‘table’ of data.

#### Create a dataframe

Use `data.frame()`

```
dframe = data.frame(transport, ages)
dframe
```

```
## transport ages
## road car 2
## bus lane bus 4
## plane plane 7
```

#### Explore

Get summary statistics, row and column names, dimensions and the first rows of the dataframe (`head()` shows up to 6 rows by default):

`summary(dframe)`

```
## transport ages
## bus :1 Min. :2.000
## car :1 1st Qu.:3.000
## plane:1 Median :4.000
## Mean :4.333
## 3rd Qu.:5.500
## Max. :7.000
```

`rownames(dframe)`

`## [1] "road" "bus lane" "plane"`

`colnames(dframe)`

`## [1] "transport" "ages"`

`dim(dframe)`

`## [1] 3 2`

`head(dframe)`

```
## transport ages
## road car 2
## bus lane bus 4
## plane plane 7
```

#### Access particular entries in a dataframe

You can access particular entries in a dataframe by specifying the names and/or locations of the row(s) and column(s) of interest.

**By location:**

`dframe[, 2] # Access the 2nd column`

`## [1] 2 4 7`

`dframe[3, ] # Access the 3rd row`

```
## transport ages
## plane plane 7
```

`dframe[3, 2] # Access the value in the third row, second column`

`## [1] 7`

`dframe[1:2,] # Access the first 2 rows`

```
## transport ages
## road car 2
## bus lane bus 4
```

**By name:**

```
dframe[,'ages'] # Access the 'ages' column
dframe['plane',] # Access the 'plane' row
```

**Or a mixture of both:**

`dframe[2:3, 'ages'] # Access the 2nd and 3rd values in the 'ages' column`

`## [1] 4 7`

**Note:** To access values in a data frame, remember `[row, column]` or ‘RC’.

Note that the row returned is itself a data frame, which R stores internally as a list:

```
result = dframe[3,]
result
```

```
## transport ages
## plane plane 7
```

`typeof(result)`

`## [1] "list"`

`names(result)`

`## [1] "transport" "ages"`

`result$ages`

`## [1] 7`
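`typeof()` reports how an object is stored, while `class()` reports how R treats it. The sketch below is an added illustration that rebuilds a small `dframe` so it can run on its own. It also shows that selecting a single column simplifies the result to a plain vector unless `drop = FALSE` is given.

```r
transport = c("road" = "car", "bus lane" = "bus", "plane" = "plane")
ages = c("Ann" = 2, "Barry" = 4, "Bosco" = 7)
dframe = data.frame(transport, ages)

result = dframe[3, ]
typeof(result)              # "list": the underlying storage
class(result)               # "data.frame": how R treats the object
is.data.frame(result)       # TRUE

dframe[, 2]                 # a single column is simplified to a plain vector
dframe[, 2, drop = FALSE]   # drop = FALSE keeps it as a one-column data frame
```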

#### Add another column

The data frame `dframe` contains ages. Let’s add the names of the people for these ages.

`ages`

```
## Ann Barry Bosco
## 2 4 7
```

Extract the names associated with each age:

`names(ages)`

`## [1] "Ann" "Barry" "Bosco"`

Add the column to `dframe` using `cbind()`:

```
dframe = cbind(dframe, names(ages))
head(dframe)
```

```
## transport ages names(ages)
## road car 2 Ann
## bus lane bus 4 Barry
## plane plane 7 Bosco
```

#### Re-name the columns

Use `colnames()`. Row names can be renamed in a similar fashion using `rownames()`.

`colnames(dframe) = c("modes of transport", "ages", "names")`

### The matrix

#### Create a matrix

`m = matrix(data = seq(1, 8), nrow = 2, ncol = 4, byrow = T)`

See `?matrix` for more details.
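The `byrow` argument controls the order in which the values fill the matrix. As an added illustration (the names `by_row` and `by_col` are made up), the sketch below compares the two settings.

```r
by_row = matrix(data = seq(1, 8), nrow = 2, ncol = 4, byrow = TRUE)
by_col = matrix(data = seq(1, 8), nrow = 2, ncol = 4)   # byrow defaults to FALSE

by_row[1, ]   # 1 2 3 4: values fill the first row, then the second
by_col[1, ]   # 1 3 5 7: values fill the first column, then the next
```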

#### Give it row names and column names

In two separate steps:

```
rownames(m) = c("row1", "row2")
colnames(m) = c("column1", "column2", "column3", "column4")
```

Or in one single step:

```
dimnames(m) = list(c("row1", "row2"),
c("column1", "column2", "column3", "column4"))
```

#### Accessing particular values in a matrix

This is similar to accessing particular values in a data frame.

Recall, `m` is:

`m`

```
## column1 column2 column3 column4
## row1 1 2 3 4
## row2 5 6 7 8
```

Get the contents of the row named *“row1”*

`m["row1",]`

```
## column1 column2 column3 column4
## 1 2 3 4
```

Get the contents of the column named *“column2”*

`m[, "column2"]`

```
## row1 row2
## 2 6
```

Get the contents of the cell in row *“row1”*, column *“column2”*

`m["row1", "column2"]`

`## [1] 2`

Similarly, location can be used.

Get the contents of the first row:

`m[1,]`

```
## column1 column2 column3 column4
## 1 2 3 4
```

Get the contents of the second column:

`m[,2]`

```
## row1 row2
## 2 6
```

Get the contents of the cell in the first row and second column

`m[1, 2]`

`## [1] 2`

## Data I/O

There are many ways to get data in and out of R.

### R objects

#### To save:

To save one object:

`save(dframe, file = "dframe_output.RData")`

To save multiple R objects to a single file:

`save(list = c("dframe", "children"), file = "dframe_and_children_output.RData")`

#### To load:

First remove traces of existing versions from your R session:

```
rm(list = c("dframe", "children")) # Remove the existing variables
ls() # Verify that they have been successfully removed
```

```
## [1] "a" "age" "ages" "allergies"
## [5] "b" "child_1" "child_2" "friends"
## [9] "lotto" "m" "result" "sqrt_of_25"
## [13] "sqrt_of_9" "test_boolean_1" "test_boolean_2" "test_boolean_3"
## [17] "test_boolean_4" "test_string_1" "together" "transport"
```

Next read in the fresh variables

```
load("dframe_and_children_output.RData") # Load the variables back into R
ls() # Verify that they have been loaded successfully
```

```
## [1] "a" "age" "ages" "allergies"
## [5] "b" "child_1" "child_2" "children"
## [9] "dframe" "friends" "lotto" "m"
## [13] "result" "sqrt_of_25" "sqrt_of_9" "test_boolean_1"
## [17] "test_boolean_2" "test_boolean_3" "test_boolean_4" "test_string_1"
## [21] "together" "transport"
```

### Text files

Vector and data frame contents can be stored and read in from text files. For this, R has a collection of built-in functions for text files of various formats.

#### Writing contents to a text file

The base function is `write.table()`. Its help file provides a detailed description of the different function arguments available to you. The other data output functions in this help file are variants of `write.table()` with different default argument values.

`?write.table()`

**Tab-delimited files:**

To write out to a tab-delimited file with column names and rownames:

```
# write.table() is called for its side effect (creating the file), so its result is not assigned
write.table(x = dframe,
            file = "dframe-tab_delim.txt",
            quote = FALSE,
            sep = "\t")
```

**Comma-separated files:** To write out to a comma-separated file with column names and row names:

```
# write.csv() is called for its side effect (creating the file), so its result is not assigned
write.csv(x = dframe,
          file = "dframe-comma_separated.csv",
          quote = FALSE)
```

#### Reading in contents from a text file:

The base function is `read.table()`. Its help file provides a detailed description of the different function arguments available to you. The other data input functions in this help file are variants of `read.table()` with different default argument values.

`?read.table()`

**Tab-delimited files:**

To read in a tab-delimited file with column names and rownames:

```
file_contents = read.table(file = "dframe-tab_delim.txt",
                           header = TRUE,
                           sep = "\t")
```

**Comma-separated files:** To read in a comma-separated file with column names and row names:

```
file_contents = read.csv(file = "dframe-comma_separated.csv",
row.names = 1)
```
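To check that writing and reading back preserve the data, a round trip can be sketched as follows. This is an added illustration: the `scores` data frame is made up, and a temporary file from `tempfile()` is used so nothing is left in the working directory.

```r
# Illustrative data; the values and row names are invented for this example
scores = data.frame(score = c(1.5, 2.5, 3.5), row.names = c("r1", "r2", "r3"))
tmp_file = tempfile(fileext = ".csv")

write.csv(x = scores, file = tmp_file)                   # row names become the first column
scores_back = read.csv(file = tmp_file, row.names = 1)   # use that first column as row names

all.equal(scores, scores_back)   # TRUE if the round trip preserved the data
unlink(tmp_file)                 # remove the temporary file
```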

### Reading from a database:

R has libraries that allow you to connect to a number of databases, so that you can read data from and write data to them. Below is a list of database types and the R libraries that can be used to connect to them.

- **MySQL:** RMySQL
- **Microsoft SQL:** RODBC
- **PostgreSQL:** RPostgreSQL
- **MongoDB:** RMongo, rmongodb

Please see individual libraries for more.

## Control structures

Control structures allow you to execute different code depending on the condition of a variable or parameter.

### If…else:

Allows you to execute a piece of code if, and only if, a given condition is met. Otherwise, another piece of code is executed.

In other words,

*if (condition 1 is TRUE) {*

*then execute this piece of code….*

*} else {*

*execute this code instead….*

*}*

For example,

```
current_bank_balance = 10
if(current_bank_balance > 0){
  print(paste("You have E", current_bank_balance, " in your account", sep = "")) # Execute if current_bank_balance is greater than zero
}else{
  print("Oh-oh! You are out of money") # Otherwise, execute this
}
```

`## [1] "You have E10 in your account"`

Re-run the above, varying the value of `current_bank_balance`.

### ifelse:

`ifelse` is a more concise version of `if...else`, but its use may decrease the readability of your code.

```
ifelse(current_bank_balance > 0,
paste("You have E", current_bank_balance, " in your account", sep = ""),
"Oh-oh! You are out of money")
```

`## [1] "You have E10 in your account"`
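One further difference is that `ifelse()` works element by element on a whole vector, whereas a plain `if` expects a single `TRUE` or `FALSE`. The sketch below is an added illustration with made-up balances.

```r
balances = c(10, 0, -5)   # hypothetical balances for three accounts

status = ifelse(balances > 0, "in credit", "out of money")
status   # "in credit" "out of money" "out of money"
```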

### The switch function:

`if...else` is useful when there are two scenarios or cases to consider. If there are more than two scenarios or cases, consider using `switch`.

In other words,

*switch (input,*

*“case1” = return_value_for_case1,*

*“case2” = return_value_for_case2,*

*“case3” = return_value_for_case3,*

*…..)*

For example,

```
animal = "horse"
type = switch(animal, "horse" = "mammal", "snake" = "reptile", "trout" = "fish")
print(type)
```

`## [1] "mammal"`

Re-run with `animal` set to "snake" and then to "trout".
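When the input matches none of the cases, `switch` returns `NULL` invisibly. An unnamed final argument acts as a default value instead. The sketch below is an added illustration; the function name `classify` and the value "unknown" are made up.

```r
classify = function(animal) {
  switch(animal,
         "horse" = "mammal",
         "snake" = "reptile",
         "trout" = "fish",
         "unknown")   # unnamed last argument: returned when nothing matches
}

classify("snake")    # "reptile"
classify("dragon")   # "unknown"
```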

#### Exercise

A bank has 3 different account types: current, savings_1 and savings_2 account. Each account has the following characteristics:

- *Current*: current interest rate = 0.05%
- *Savings_1*: current interest rate = 1.2%
- *Savings_2*: current interest rate = 2%

Create a switch statement that will return the correct rate for each bank account type.

## Loops

Loops are a way to re-run code until a given condition is met. Two types of loops in R are ‘for’ and ‘while’

### For loop:

Repeats a portion of code for each element in a vector

In other words,

*for (each_element in a_vector) {*

*execute this code*

*}*

For example,

```
# Print out each element of the transport vector
transport = c("car", "bus", "plane")
names(transport) = c("road", "bus lane", "plane")
for(each in transport){
  print("====================") # Acts as a visual separator
  print(each)
}
```

```
## [1] "===================="
## [1] "car"
## [1] "===================="
## [1] "bus"
## [1] "===================="
## [1] "plane"
```

`for` loops can also be used to keep track of the index or location of a vector, as in the following example:

```
# Print out each element of the transport vector along with its index
for(index in 1:length(transport)){
  print("====================") # Acts as a visual separator
  print(paste("Element ", index, ": ", transport[index], sep = ""))
}
```

```
## [1] "===================="
## [1] "Element 1: car"
## [1] "===================="
## [1] "Element 2: bus"
## [1] "===================="
## [1] "Element 3: plane"
```
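A caveat with `1:length(x)` is that it misbehaves when the vector is empty, because `1:0` counts down from 1 to 0. `seq_along()` is a base R alternative that handles this case. The sketch below is an added illustration; `empty` and `iterations` are made-up names.

```r
empty = character(0)   # a vector with no elements

1:length(empty)        # 1 0: the loop body would run twice
seq_along(empty)       # integer(0): the loop body runs zero times

iterations = 0
for (index in seq_along(empty)) {
  iterations = iterations + 1
}
iterations             # 0
```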

### While loop:

Continues to execute a portion of code as long as a given condition is met. When the condition is no longer met, the loop is exited.

In other words,

*while (condition is met) {*

*execute this code*

*}*

For example,

```
count = 0
while(count < 10){
  print("====================") # Acts as a visual separator
  print(count) # Print the current value of count
  count = count + 2 # Increment count by 2
}
```

```
## [1] "===================="
## [1] 0
## [1] "===================="
## [1] 2
## [1] "===================="
## [1] 4
## [1] "===================="
## [1] 6
## [1] "===================="
## [1] 8
```

Another example:

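```
remainder = TRUE # Initialise the starting condition
count = 0 # Initialise a counter
while(remainder){
  print("====================") # Acts as a visual separator
  print(count) # Print the current value of count
  count = count + 1
  if(count %% 6 == 0){ # TRUE when count divides by 6 with no remainder
    remainder = FALSE
    print("No remainder. Exiting loop...")
  }
}
```

```
## [1] "===================="
## [1] 0
## [1] "===================="
## [1] 1
## [1] "===================="
## [1] 2
## [1] "===================="
## [1] 3
## [1] "===================="
## [1] 4
## [1] "===================="
## [1] 5
## [1] "No remainder. Exiting loop..."
```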

## Functions

Functions allow you to re-use code. They are useful when your code performs the same operations repeatedly.

Functions can take inputs and return outputs.

An example of a function without specified inputs or outputs:

```
if_finished = function(){ # No specified inputs
  print("Complete!") # No specified outputs. Rather, a message is printed to the console
}
if_finished()
```

`## [1] "Complete!"`

An example of a function with a specified input and output:

```
get_squared_root = function(value)
{
# Return the square root of given value
result = value^0.5
return(result)
}
answer = get_squared_root(9) # input is 9. Function output captured by 'answer'
answer
```

`## [1] 3`

**Note:** If there is no `return()` at the end of the function, the value of the last evaluated expression is returned instead.

This allows us to re-write `get_squared_root()` more succinctly:

```
get_squared_root = function(value)
{
# Return the square root of given value
value^0.5
}
answer = get_squared_root(9) # input is 9. Function output captured by 'answer'
answer
```

`## [1] 3`

Returning multiple results from a function: combine the results into one variable, such as a vector or list, and return that one variable.

```
get_squared_root_and_squared_values = function(value)
{
# Return the square root and squared value of given value
squared_root = value^0.5
squared = value*value
return(c("sqrt" = squared_root, "squared" = squared))
}
answer = get_squared_root_and_squared_values(9) # input is 9. Function output captured by 'answer'
answer
```

```
## sqrt squared
## 3 81
```

`answer["sqrt"]`

```
## sqrt
## 3
```

`answer["squared"]`

```
## squared
## 81
```
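A vector works above because both results are numbers. When the results have different types, a list is one option, since it can hold them side by side. The sketch below is an added illustration; the function `describe_value` and its `is_even` element are made up.

```r
describe_value = function(value) {
  # A list can hold results of different types together
  list("sqrt" = value^0.5,
       "squared" = value * value,
       "is_even" = value %% 2 == 0)
}

answer = describe_value(9)
answer$sqrt      # 3
answer$squared   # 81
answer$is_even   # FALSE
```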

End of Session 1