Getting Started with R

A tutorial about data analysis using R (Website Version)

Author: Jon Yearsley

Affiliation: School of Biology and Environmental Science, UCD

Published: January 1, 2024


How to Read this Tutorial

This tutorial is a mixture of R code chunks and explanations of the code. The R code chunks will appear in boxes.

Below is an example of a chunk of R code:

# This is a chunk of R code. All text after a # symbol is a comment
# Set working directory using setwd() function
setwd('Enter the path to my working directory')

# Clear all variables in R's memory
rm(list=ls())    # Standard code to clear R's memory

Sometimes the output from running this R code will be displayed after the chunk of code.

Here is a chunk of code followed by its R output:

2 + 4            # Use R to add two numbers
[1] 6

What is R?

R is powerful, open-source software for data analysis. It has become the software of choice for analysing data in research, industry, public bodies and beyond. R has a homepage at https://www.r-project.org with links to the software download site and R tutorials.

The internet is your friend: There are many sites on the internet offering discussion forums and tutorials in R.

Some good internet resources are:

What is RStudio?

We will always be using R from within RStudio.

RStudio is a program (https://www.rstudio.com) that provides a user-friendly environment for R. RStudio will automatically start R (if R is installed on your computer). The basic RStudio desktop is free, works well, and simplifies many data analysis tasks with R.

Video Tutorial: A brief introduction to RStudio (2 mins)

When you open RStudio for the very first time, go to the global settings (Tools -> Global Options) and:

  1. Set “Save workspace to .RData on exit” to Never.
  2. Uncheck the option “Restore .RData into workspace at startup”.


R’s console window

When you start RStudio (or R) for the first time you will see a window with a welcome message and the command prompt (>).


Figure 1: What you will see when you first open RStudio. The R console is on the left-hand side. RStudio looks very similar on all platforms.

The command prompt, ‘>’

The command prompt is a > symbol in R’s console window. This prompt is R’s way of saying “I’m waiting for you to tell me what to do”.

Typing commands at a command prompt is called a command line interface.

When you start R, the console window will display text similar to the following:

R is free software and comes with ABSOLUTELY NO WARRANTY.   
You are welcome to redistribute it under certain conditions.   
Type 'license()' or 'licence()' for distribution details.   
   
  Natural language support but running in an English locale   
   
R is a collaborative project with many contributors.   
Type 'contributors()' for more information and   
'citation()' on how to cite R or R packages in publications.   

Type 'demo()' for some demos, 'help()' for on-line help, or   
'help.start()' for an HTML browser interface to help.   
Type 'q()' to quit R.   
   
>    

At the bottom left is the command prompt, >, waiting for you to send your first command.

R’s command prompt works like this:

  1. You type a command at the command prompt and press the return key
  2. R interprets your command
  3. If R can understand the command then
    • R performs the calculations that the command specifies
    • R prints the result of its calculation in the console window underneath your original command
  4. If R cannot understand the command, you’ll receive a red error message

Why bother typing?

You are probably asking:

“Why are you asking me to use a program that requires typing the commands? Why don’t you teach a more user-friendly program with buttons to press and menus?”

The command line interface approach has three big advantages. It makes your data analysis:

  1. easily repeatable
  2. easily documented
  3. easily shareable

Being able to repeat, document and share your work is a big deal in both science and in the commercial world.

These three advantages are all captured by the R script.

R Scripts

Typing commands at the command prompt can quickly become tiring and error prone. R scripts are the solution.

An R script is a text file that contains:

  • a set of R commands (these are intended for the computer to interpret)
  • accompanying comments (these are intended for humans to read)

The commands in an R script can be sent to R’s command prompt to be executed all at once.
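As a minimal sketch of running a whole script at once, the code below writes a made-up three-line script to a temporary file and then runs every command in it with R's source() function. The script contents and the temporary file name are assumptions for illustration only; with your own script you would pass its file name to source() instead. RStudio's Source button does essentially the same thing.

```r
# Write a short, made-up example script to a temporary file
# (tempfile() generates a hypothetical file name for this example)
script_file = tempfile(fileext = '.R')
writeLines(c('# A tiny example R script',
             'x = 2 + 4     # add two numbers',
             'y = x * 10    # use the result in a second calculation'),
           script_file)

# Send every command in the script to R at once
source(script_file)
y                  # y was created by the script: 60
```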

The comments in an R script briefly explain the commands, explain the broader goal of the R script (e.g. logical sections like chapters of a book) and add structure to the script that makes it more readable.

An R script allows you to collaborate and communicate your data analysis methods, because it makes it easy to share, repeat and modify. For most of this module we will be using R scripts to share R commands.

To make it easier to execute an R script we will be using R with the RStudio software (https://www.rstudio.com).

Your first R script

Every R script should start with a short header which explains what the R script does, who wrote the script and when it was written.

Below is a header template you can use at the start of your own R scripts.

# ********** Start of header **************
# Title: <The title of your R script> 
#
# Add a short description of the R script here.
#
# Author: <your name>  (email address)
# Date: <today's date>
#
# *********** End of header ****************

# Two common commands at the start of an R script are:
rm(list=ls())         # Clear R's memory

setwd('~/DataModule') # Set the working directory 
# Replace '~/DataModule' with the name of your own directory

# ******************************************
# Write your commands below. 
# Remember to use comments to explain your commands

After the header you can start typing your commands.

Here is a link to an example R script, showing the use of a header followed by R commands annotated with comments. The R script generates an image of a Mandelbrot fractal: http://www.ucd.ie/ecomodel/Resources/Sheet1_RScriptExample.R

Since R scripts contain only text, they can be edited in any text editor (for example, Notepad on Windows, TextEdit on macOS, or gedit and nano on Linux).

Video Tutorial: Creating a new R script with RStudio (1 min)

RStudio makes it very easy to send the commands in an R script into the R console. Here is a video explaining how this is done.

Video Tutorial: Sending R commands into the console (2 mins)

The # symbol

The hash symbol, #, allows you to annotate an R script by adding a comment after the #.

Annotating your R scripts is important because:

  • it allows you to document what you are doing
  • it makes it easier for you to return to your work
  • it allows you to easily share your R scripts with others

All text appearing after the hash symbol is ignored by R. Text after the # is for you, not R. The # can appear at the start of a line or after a command.
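A short illustration of the two places a # can appear (the variable name x is arbitrary):

```r
# A comment on its own line: R ignores this whole line
x = 3 * 4      # A comment after a command: R runs 3 * 4 and ignores this text
x              # Display x: 12
```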

The R script template above is a good example of using annotation in an R script.

Practise using this important symbol by carefully annotating your own R scripts, and learn from how others annotate theirs.

RStudio allows comments to become section headings. The video below explains the basics of creating sections in an R script.

Video Tutorial: Inserting sections into an R script using RStudio (3 mins)

The working directory

R’s working directory is the directory on your computer where R will look for data files and where it will save files. When you start R it uses a default working directory. Type the getwd() command at the command prompt to find out which directory this is.

getwd()          # Display the current working directory

The working directory can be changed using the setwd() command.
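A minimal sketch of changing the working directory and then changing it back. tempdir() is used here only because it names a directory that exists on every computer; in your own scripts you would give setwd() the path to your project directory, as in the header template above.

```r
old_dir = getwd()     # Remember the current working directory
setwd(tempdir())      # Move to R's temporary directory (exists on every computer)
getwd()               # Confirm that the working directory has changed
setwd(old_dir)        # Return to the original working directory
```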

Recommended Practice
One of the first commands in an R script should set the working directory using the setwd() command.

Note: If you are using RStudio, you can use the Session -> Set Working Directory -> Choose Directory… menu to set the working directory. This sends a setwd() command to the R console, which you can copy into your R script.

Video Tutorial: Setting the working directory using RStudio (2 mins)

Help within R

R has built-in help pages. You can access help by typing ? followed by the name of the command.

?rm              # Display the help page for the rm command
?setwd           # Display the help page for the setwd command
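Two related ways of getting help, shown as a sketch: help() does the same job as ?, and args() prints only the arguments a function accepts, which is handy when you just need a reminder. Typing ?? before a word (for example ??regression) searches all installed help pages.

```r
help('setwd')      # The same as typing ?setwd
args(round)        # Show only the arguments that the round() function accepts
```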

Installing add-on packages

One powerful aspect of R is the ability to add functionality by loading packages. A package must be installed and then loaded before its functionality can be used in R. Packages are not loaded automatically because far too many are available.

Many of these packages are open source and written by academic researchers who are specialists in their field. A list of packages is maintained on the R website: https://cran.r-project.org/web/packages/

We will install three packages: ggplot2, reshape2 and tidyr.

# Install the plotting package ggplot2, 
# reshape2 and tidyr (requires internet access)
install.packages('ggplot2')
install.packages('reshape2')
install.packages('tidyr')

Once a package has been installed on your computer, it must be loaded with the library() command each time you start a new R session and want to use it.

# Load the ggplot2 package 
# so that its functions are available for use
library('ggplot2')

The webpage for the ggplot2 package is https://ggplot2.tidyverse.org/
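A common pattern is to install a package only when it is missing and then load it. The helper below, load_package(), is a hypothetical function written for this example (it is not part of R or of any package); it relies only on the base functions requireNamespace(), install.packages() and library().

```r
# load_package() is a hypothetical helper, not a built-in R function
load_package = function(pkg) {
  # requireNamespace() returns FALSE if the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)                # Requires internet access
  }
  library(pkg, character.only = TRUE)    # pkg is a character string, e.g. 'ggplot2'
}
```

For example, load_package('ggplot2') would install ggplot2 the first time it is run (internet access required) and simply load it on later runs.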

R the calculator (operators)

Arithmetic operators

R can do arithmetic. For example, addition is coded in R by the operator symbol +.

Try some examples…

2+3              # 2 plus 3 (addition operator)
2-3              # 2 minus 3 (subtraction operator)
2*3              # 2 multiplied by 3 (multiplication operator)
2/3              # 2 divided by 3 (division operator)
2^3              # 2 to the power of 3 (power operator)
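When several operators appear in one calculation, R follows the usual mathematical order of operations, and brackets can be used to change it. A few examples, with the results given in the comments:

```r
2 + 3 * 4        # Multiplication is done before addition: 14
(2 + 3) * 4      # Brackets are evaluated first: 20
-2^2             # The power is applied before the minus sign: -4
```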

R has other kinds of operators for other types of calculations.

Here are some examples:

Relational operators

Symbol   Definition
==       equal to (note the double equals sign)
!=       not equal to
>        greater than
>=       greater than or equal to
<        less than
<=       less than or equal to

Logical operators

Symbol   Definition
!        logical NOT
&        logical AND
|        logical OR

# ****************************************************
# Examples of logical calculations -------------------

2 == 3                # Is 2 equal to 3?
[1] FALSE
2 < 3                 # Is 2 less than 3?
[1] TRUE

Here are some more logical statements for you to try

2 != 3                # Is 2 not equal to 3?
'A' == 'B'            # Is 'A' equal to 'B'?
'A' == 'a'            # Is 'A' equal to 'a'?
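The logical operators in the table above can be tried in the same way. A short sketch using NOT (!), AND (&) and OR (|) on single values:

```r
!(2 == 3)             # NOT reverses a logical value: TRUE
(2 < 3) & (3 < 4)     # AND is TRUE only when both sides are TRUE: TRUE
(2 > 3) | (3 < 4)     # OR is TRUE when at least one side is TRUE: TRUE
```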

The continuation prompt, ‘+’

If we send only part of a command to R, it recognises that the command is incomplete and displays the continuation prompt (the symbol +). This prompt means R is waiting for the rest of the command.

For example, try typing 1*2*3* and then pressing Enter (Return). This multiplication is incomplete, so R will display the continuation prompt.

If you see the continuation prompt, you can either:

  1. continue typing the rest of the command, or
  2. press the Esc key to cancel the command and return to the main command prompt.

Mathematical functions

Try some examples…

# ****************************************************
# Mathematical functions -----------------------------

cos(2/3)              # cosine of 2/3
exp(2)                # exponential function of 2
log10(2)              # logarithm (base 10) of 2
log(2)                # logarithm (base e) of 2
sqrt(2)               # square-root of 2
2^0.5                 # square-root of 2
3^2                   # 3 to the power of 2
round(5/3, digits=3)  # round 5/3 to 3 decimal places  
signif(5/3, digits=3) # round 5/3 to 3 significant figures  
floor(5/3)            # round 5/3 down to the largest integer less than or equal to 5/3
abs(-1.4)             # absolute value, ignore the minus sign
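To see how the rounding functions differ, here is a sketch applying them to 5/3 (the name x is arbitrary). Note that floor() always rounds down, so for a negative number it moves away from zero. The last line shows that log() (base e) undoes exp().

```r
x = 5/3                  # 1.666667
round(x, digits = 1)     # 1.7: rounded to one decimal place
signif(x, digits = 1)    # 2: rounded to one significant figure
floor(x)                 # 1
floor(-x)                # -2: the largest integer less than or equal to -1.666667
log(exp(2))              # 2: log() (base e) undoes exp()
```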

Large and small numbers (exponents)

R uses exponent notation to display very large and very small numbers. Here is how R reports 3 million:

3000000
[1] 3e+06

The e+06 is shorthand for $\times 10^{6}$ (multiplied by 10 to the power 6). In a scientific report, 3e+06 should be written in scientific notation as $3 \times 10^{6}$.

The e in e+06 stands for exponent. It can be upper or lower case.

Very small numbers will have a negative exponent. For example,

3.5e-9

In scientific notation this number is written $3.5 \times 10^{-9}$.
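Exponent notation also works when typing numbers into R, and the format() function can display a number without an exponent. A brief sketch:

```r
3e6 == 3000000                    # TRUE: numbers can also be typed in exponent notation
2.5e3                             # 2500, i.e. 2.5 x 10^3
format(3e6, scientific = FALSE)   # "3000000": display 3 million without an exponent
```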

Variables and arrays

The result of a calculation can be saved to a variable. A variable can have almost any name. Below we save the result of 2+3 to a variable called ‘a’:

a = 2 + 3        # Save output to a variable called 'a'
a                # Display the value of variable 'a'
[1] 5

The variable can then be used in future calculations. For example,

a / 10          # Use the variable 'a' in a calculation
[1] 0.5

Note that variables can also be assigned using the <- operator instead of the = operator.

For example,

a <- 2 + 3       # Assign a variable called 'a' using <- operator
a                # Display the value of variable 'a'
[1] 5

You will see both the = and <- operators used in textbooks.
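Two details about variable names are worth knowing: names can contain letters, numbers, dots and underscores (but should start with a letter), and R is case sensitive, so my_total and My_total are different variables. A short sketch (the names are arbitrary):

```r
my_total = 2 + 3     # A name made of letters and an underscore
My_total = 100       # R is case sensitive: this is a different variable
my_total             # Still 5
```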

Arrays of numbers

A variable that contains several numbers is called an array.

You can find more information on working with arrays at http://www.ucd.ie/ecomodel/Resources/Manipulate_Data_WebVersion.html

Creating an array

An array can be created using the c() function, which combines several numbers (this is known as the concatenate function).

b = c(1,2,3,5,7) # An array of 1 followed by the first four prime numbers
b                # Display the value of variable 'b'
[1] 1 2 3 5 7

An array of whole numbers can be quickly created using a colon (:)

c(5:15)         # An array of whole numbers from 5 to 15
 [1]  5  6  7  8  9 10 11 12 13 14 15
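Arithmetic and many functions work on every element of an array at once, and square brackets pick out individual elements. A sketch using the array b from above (redefined here so the example runs on its own):

```r
b = c(1,2,3,5,7)
b * 2            # Arithmetic is applied to every element: 2 4 6 10 14
sum(b)           # Add up all the elements: 18
length(b)        # The number of elements: 5
b[3]             # The third element: 3
```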

Types of variables

Number (num)

A typical number (a real number) is given the data type num (num stands for numeric). A real number is quantitative data.

You can see the data type of a variable using thestr()function (this function displays the structure of the variable).

x1 = 2.45          # Numerical (R's data type='num') 
str(x1)
 num 2.45

Integer (int)

A whole number (an integer) can be given the data type num, or it can be explicitly distinguished as a whole number using the data type int (int stands for integer). A whole number is quantitative data.


x2 = 5             # Numerical whole number (R's data type='num') 
str(x2)
 num 5
x3 = as.integer(5) # Numerical whole number (R's data type='int') 
str(x3)
 int 5
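R also has a shorthand for integers: adding an L after a whole number stores it as int, just like as.integer(). A sketch (the name x9 simply continues the numbering used in this section):

```r
x2 = 5             # Stored as 'num'
x9 = 5L            # The L suffix stores 5 as an integer, like as.integer(5)
str(x9)            # int 5
is.integer(x2)     # FALSE
is.integer(x9)     # TRUE
```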

Character (chr)

Text (either a single letter/number or a series of letters/numbers) is given the data type chr (chr stands for character). A character is qualitative data (see the section labelled ‘Factor’).


x4 = 's'           # Character (R's data type='chr')
str(x4)
 chr "s"
x5 = 'hello'       # Character string (R's data type='chr')
str(x5)
 chr "hello"

Logical (logi)

Logical variables (i.e. variables that can only take the values TRUE or FALSE) are given the data type logi (logi stands for logical).


x6 = TRUE          # Logical (TRUE/FALSE) (R's data type='logi')
str(x6)
 logi TRUE
x7 = NA            # A missing value (R explicitly recognises missing data)
str(x7)
 logi NA


Factor (Factor)

A factor is a qualitative variable whose values are names drawn from a fixed set of categories, called levels. Qualitative variables are given the data type Factor.


 # An array of place names as a factor
x8 = as.factor(c('Dublin','Cork','Galway')) 
str(x8)
 Factor w/ 3 levels "Cork","Dublin",..: 2 1 3
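The numbers 2 1 3 at the end of the str() output are the integer codes R uses to store the factor: each value is stored as the position of its level, and the levels are sorted alphabetically. A sketch that makes this visible:

```r
x8 = as.factor(c('Dublin','Cork','Galway'))
levels(x8)        # The levels, sorted alphabetically: "Cork" "Dublin" "Galway"
as.integer(x8)    # The integer code stored for each value: 2 1 3
table(x8)         # Count how many times each level occurs
```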

Factors will be important when we start to analyse data because there is an important distinction between quantitative data and qualitative data.

Missing data

Missing data are data points that could not be recorded for some reason. Missing data are an important type of data that R explicitly recognises. R uses the value NA to represent missing data. Missing data should not be set to a value (e.g. 0) because this can be misinterpreted as a genuine value of zero!

Missing data should be included in data sets and explicitly represented. For example, if we had failed to record the number 5 in a data set of whole numbers it would be represented as

# Explicitly record missing data as NA
k = c(1,2,3,4,NA,6,7)    

R must account for missing data when performing calculations. Often this means that R must remove missing data before performing a calculation. Many of R’s statistical functions have an argument na.rm=TRUE, which tells R to remove missing values before performing a calculation.

mean(k, na.rm=TRUE)   # Calculate mean after removing missing data
[1] 3.833333
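Without na.rm=TRUE a single NA makes the whole result NA, because R cannot know the value that is missing. The is.na() function finds the missing values. A sketch using the array k from above:

```r
k = c(1,2,3,4,NA,6,7)
mean(k)            # NA: the missing value propagates into the result
is.na(k)           # TRUE for each missing element
sum(is.na(k))      # Count the missing values: 1
```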

In an ideal world missing data should not occur.

How would you try to avoid the following common reasons for missing data?
1. lost laboratory notebooks
2. bad handwriting
3. corrupted data files

Logical variables

A logical variable can take one of two values, TRUE or FALSE. Logical variables are very useful for manipulating data.

Single logical expressions

Here are some examples of logical expressions (using some of the operators from above) and their logical output:

2.5 > 1          # Is 2.5 greater than 1?
[1] TRUE
-1  <= 3         # Is -1 less than or equal to 3?
[1] TRUE

Some more for you to try…

5 == 2           # Is 5 equal to 2 (NOTE: logical equals is ==)?
5 != 2           # Is 5 not equal to 2?

Logical calculations can be performed on variables. In this example we use the variable b from above:

b != 5           # Is each element of b not equal to 5 
[1]  TRUE  TRUE  TRUE FALSE  TRUE

Combining logical expressions

Logical expressions can be combined. There are two basic operations for this:

  • the AND function, represented by the symbol &
  • the OR function, represented by the symbol |

An example of an AND statement

# Combine two logical expressions using & (logical AND)

(b!=5) & (b>2)   # b not equal to 5 AND b greater than 2
[1] FALSE FALSE  TRUE FALSE  TRUE

An example of an OR statement

# Combine two logical expressions using | (logical OR)

(b!=5) | (b>2)   # b not equal to 5 OR b greater than 2
[1] TRUE TRUE TRUE TRUE TRUE
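Because a logical array holds TRUE or FALSE for each element, it can be used inside square brackets to select data, which is one reason logical variables are so useful for manipulating data. A sketch using b from above (the name keep is arbitrary):

```r
b = c(1,2,3,5,7)
keep = (b != 5) & (b > 2)   # TRUE for the elements we want to keep
b[keep]                     # Select those elements: 3 7
sum(keep)                   # TRUE counts as 1, so this counts the matches: 2
```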

Closing R

Recommended practice before closing R

  1. Make sure you save all R scripts you have been working on
  2. Save important variables using the save() command (this command will be covered in the data import tutorial)
  3. When you close R you will be asked if you want tosave the workspace. This is rarely useful.

Summary of topics

  • R and the R command prompt
  • R scripts and how to annotate them using comments
  • R’s working directory and how to set its value
  • Obtaining help in R
  • Using R as an advanced calculator
  • Exponential notation for large and small numbers
  • Variables
  • Missing data
  • R packages

Further Reading

All these books can be found in UCD’s library.

  • Andrew P. Beckerman and Owen L. Petchey, 2012. Getting Started with R: An Introduction for Biologists (Oxford University Press, Oxford) [Chapters 1, 2]
  • Mark Gardner, 2012. Statistics for Ecologists Using R and Excel (Pelagic, Exeter) [Chapter 3]
  • Michael J. Crawley, 2015. Statistics: An Introduction Using R (John Wiley & Sons, Chichester) [Appendix]
  • Tenko Raykov and George A. Marcoulides, 2013. Basic Statistics: An Introduction with R (Rowman and Littlefield, Plymouth)
  • John Verzani, 2005. Using R for Introductory Statistics (Chapman and Hall, London) [Chapter 1]
