A Crash Course in Data Handling with R
Overview
This Learning Enhancement project has been funded through the HEA and the National Forum for the Enhancement of Teaching and Learning.
| MODULE TITLE: | A Crash Course in Data Handling with R |
|---|---|
| MODULE COORDINATOR: | Dr Adam Kane |
| MODULE CODE: | BIOL30030, BIOL40360, ZOOL40500 |
| STUDENT COHORT: | Third & Fourth Year Undergraduate Students, UCD School of Biology and Environmental Sciences |
| COLLABORATOR(S): | Willson Gaul and Jon Yearsley |
Background
The R statistical programming language is commonly used in many scientific fields, including ecology, biology, and environmental sciences, and is used in multiple modules within the School of Biology & Environmental Sciences (SBES), including ZOOL40500, BIOL30030, BIOL40390, ENVB40370, ZOOL40490, and BIOL40550. R is also well suited to data visualisation. Employers look for quantitative skills as transferable skills, and R skills are an important qualification for students proceeding to graduate degrees in biology and environmental sciences.
An introduction to R is taught in some SBES modules (e.g. BIOL30030), while other modules (e.g. ZOOL40500) assume that students already have some R skills. Many late-stage SBES modules progress rapidly to focus on data analysis, statistics, and subject content. Students who do not grasp R fundamentals quickly can fall behind in coursework, especially if they have not learned how to make effective use of the wide range of online help forums and other online resources. In the recent offering of ZOOL40500, the module coordinator found that some students who had previous R experience but had not used R in many months had difficulty during practical sessions because they did not remember how to do basic preparatory tasks in R, such as installing packages and importing data. We therefore designed this module alongside existing SBES modules that use R, as a complementary support to them. Our aim was that this “crash course” would help students learn or re-familiarize themselves with basic R tasks in preparation for practical work in other modules.
Goals
The main objective of the workshop was to provide additional support to students in overcoming the technical barriers to using R so that they could engage with the statistical and data analysis content of other modules they were taking. Specifically, we aimed to:
- Identify the steps of the data importing and data cleaning process that students are stuck on.
- Teach solutions to the technical challenges identified.
- Teach students how to effectively search for and read help forums and other online resources when troubleshooting coding issues.
The Innovative Approach
Before the workshop, we asked students to work through an R-based task and email us a one-sentence description of where they became stuck and could not move forward on their own. This encouraged students to articulate clearly what they were trying to do and what problems they had.
The workshop focused on breaking tasks down into small, sequential steps. We focused on clearly stating each task in normal "human" language before attempting to write the task in "computer" language.
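As an illustration of this approach, the sketch below states each step in plain language as a comment and then writes the matching line of R. The `birds` data frame, its column names, and its values are hypothetical and are not taken from the workshop materials.

```r
# Hypothetical example data: body mass measurements for a few birds.
birds <- data.frame(
  species = c("robin", "wren", "robin", "blackbird"),
  mass_g  = c(18.5, 9.2, NA, 95.0)
)

# Step 1 (human language): "Keep only the rows where a mass was recorded."
measured <- birds[!is.na(birds$mass_g), ]

# Step 2 (human language): "From those rows, keep only the robins."
robins <- measured[measured$species == "robin", ]

# Step 3 (human language): "Work out the average mass of those robins."
mean_robin_mass <- mean(robins$mass_g)
```

Writing the plain-language sentence first makes it easier to see which step fails when the code does not behave as expected.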
Workshops used a "live code-along" format in which the instructor typed a line of code, explained what it did, and then each student typed the same line of code and ran it on their own computer. As the workshop progressed, the instructor transitioned from typing complete lines of code to typing only portions of the necessary code, and then asking students to complete the code themselves.
In response to COVID-19 restrictions, we transitioned to hosting the workshops on Zoom. We prepared a series of three instruction documents that laid out step-by-step lines of code for completing common tasks. These instruction documents are now available online.
We ran "live coding" workshops on Zoom in which the instructor ran and explained each line of code from the instruction documents. We then divided students into Zoom breakout rooms, each supervised by a collaborator, where the students typed and ran the same lines of code themselves by following the instruction documents, asking questions as needed.
Results
We delivered:
- 2 in-person workshops to 21 attendees in 2020
- 5 live workshops online via Zoom to more than 35 attendees in 2020-2021
We created three step-by-step instruction documents covering:
- Setting up the R working environment
- Loading data
- Thinking like a computer, sub-setting data, and doing logical tests in R
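The three topics above might look something like the following minimal sketch in base R. It is not taken from the instruction documents: the file contents, the object names (`surveys`, `has_records`), and the package named in the commented-out install line are assumptions made for illustration.

```r
# 1. Setting up the working environment.
# install.packages("readr")  # illustrative only; run once per machine
getwd()                       # show the current working directory

# 2. Loading data. A small temporary CSV file is written first so that
#    the example runs on any computer (the data are made up).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("site,count", "A,12", "B,0", "C,7"), csv_path)
surveys <- read.csv(csv_path)

# 3. A logical test: which rows have a count greater than zero?
has_records <- surveys$count > 0

#    Sub-setting: keep only the rows where the test is TRUE.
surveys_nonzero <- surveys[has_records, ]
```

In practice, `csv_path` would be replaced by the path to a student's own data file.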
The PDF instruction documents, together with the R Markdown files and example datasets used to create them, were placed in a publicly available GitHub repository.
We identified the practice tests from the BIOL30030 "Working with biological data" module as good motivators for students. Based on the pre-workshop questions that we sent to students, questions asked during the R workshops, and statistics from Brightspace, we learned that students were taking the practice tests multiple times to practise performing those tasks in R. Given students' motivation to take the practice tests multiple times, it is worth diversifying the practice tests to include other important R skills.
The Zoom sessions were more difficult to deliver than in-person sessions, though Zoom had a few minor benefits: code was easy to share using the "chat" box, and the lecture portions of the workshop were easy to record and share with students who had scheduling conflicts. The biggest obstacle when using Zoom was that the instructor could not share their screen and demonstrate code at the same time that students were typing code in their own R scripts. This made the Zoom sessions less interactive and more like a lecture.
Despite these challenges, the workshops received very positive overall feedback from students, one of whom said:
Thank you for hosting the workshops. I learned a lot last week from the help I received.