Introduction to Data Wrangling using R and tidyverse

  • Level(s) of Study: Professional / Short course
  • Start Date(s): Wednesday 17 April to Thursday 18 April, 2024
  • Duration: 2 days, 9:30 am – 5:30 pm
  • Study Mode(s): Short course
  • Campus: City Campus
  • Entry Requirements:
    More information

Introduction:

On this two-day course, you will gain a comprehensive practical introduction to data wrangling using R. In particular, we focus on tools provided by R's `tidyverse`, including `dplyr`, `tidyr`, and `purrr`. Data wrangling is the art of taking raw, messy data and formatting and cleaning it so that data analysis, visualization, and similar tasks can be performed on it. Done poorly, it can be time-consuming, laborious, and error-prone. Fortunately, the tools provided by R's `tidyverse` allow us to do data wrangling in a fast, efficient, and high-level manner, which can have dramatic consequences for the ease and speed with which we analyse data.

This course is aimed at anyone who is involved in real-world data analysis, where the raw data is messy and complex. Data analysis of this kind is practised widely in academic scientific research, as well as throughout the public and private sectors.

Level: CPD, Advanced / Professional

The course will cover these key topics:

  • Reading data into R using tools such as `readr` and `readxl`
  • Wrangling with the powerful `dplyr` R package, focusing on filtering observations, selecting and modifying variables, and other major data manipulation operations
  • Summarising data in `dplyr` using descriptive statistics
  • Merging and joining independent data frames
  • Pivoting and reshaping data using the `tidyr` R package

The course will take 6 contact hours per day plus two 1-hour breaks.

The sessions will be as follows:

  • Session 1: 9:30am-11:30am;
  • Session 2: 12:30pm-2:30pm;
  • Session 3: 3:30pm-5:30pm

Tutor Profile: Mark Andrews is an Associate Professor at Nottingham Trent University whose research and teaching are focused on statistical methodology in research in the social and biological sciences. He is the author of a 2021 textbook on data science using R that is aimed at scientific researchers, and has a forthcoming textbook on statistics and data science that is aimed at undergraduates in science courses. His background is in computational cognitive science and mathematical psychology.

Other available online CPD courses in this series include:

Introduction to statistics using R and RStudio

Introduction to Data Visualization with R using ggplot

Introduction to Generalized Linear Models in R

Introduction to Multilevel (hierarchical, or mixed effects) Models in R

Introduction to Bayesian Data Analysis with R

Any questions?  Contact kelly.smith@ntu.ac.uk, Commercial Manager, School of Social Sciences.

The course tutor was fantastic at explaining everything, the pace was just right, and the content was exactly what I was expecting and more. I will definitely be using all of the techniques covered in the course in my own data analysis.

What you’ll study

During the course you’ll:

  • Gain a comprehensive practical introduction to data wrangling using R and its complementary tools and interrelated packages, such as tidyverse, dplyr, tidyr, and purrr
  • Discover how to read data of different types into R, and cover in detail all the dplyr tools, such as select, filter, and mutate
  • Learn how to use the pipe operator (%>%) to create data wrangling pipelines that take raw messy data at one end and return cleaned tidy data at the other
  • Discover how to perform descriptive or summary statistics on data using dplyr's summarise and group_by functionalities
  • Learn how to combine data frames, including concatenating all data files in a folder and using SQL-style join operations to merge information in different data frames
  • Develop an understanding of how to "pivot" data from a "wide" to "long" format and back using tidyr's pivot_longer and pivot_wider
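
To give a flavour of the kind of pipeline described above, here is a minimal sketch. It is not taken from the course materials: the data frame `raw`, its column names, and the assumed maximum score of 20 are all hypothetical and chosen purely for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical messy data: one row per participant, one column per session
raw <- tibble(
  id       = c(1, 2, 3, 4),
  group    = c("a", "a", "b", "b"),
  score_s1 = c(10, 12, NA, 9),
  score_s2 = c(11, 15, 8, 10)
)

# Pipeline: wide -> long, drop missing scores, add a derived variable
tidy <- raw %>%
  pivot_longer(
    cols         = starts_with("score_"),
    names_to     = "session",
    names_prefix = "score_",
    values_to    = "score"
  ) %>%
  filter(!is.na(score)) %>%
  mutate(score_pct = score / 20 * 100)  # assumes a maximum score of 20

# Descriptive statistics per group
summary_tbl <- tidy %>%
  group_by(group) %>%
  summarise(mean_score = mean(score), n = n(), .groups = "drop")

# Long -> wide again, one column per session
wide_again <- tidy %>%
  select(id, group, session, score) %>%
  pivot_wider(names_from = session, values_from = score)
```

Each step takes a data frame and returns a data frame, which is what allows the steps to be chained with `%>%`.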

What will I gain?  

By the end of the course, you’ll be able to read messy and unstructured data into R and apply the principles of data wrangling to convert these datasets into optimally structured formats. These data wrangling techniques will help you carry out data analysis tasks in a fast, efficient, robust, and high-level manner.

  • On completion of at least 80% of the course, you’ll receive a certificate of attendance.

Where you'll learn

The course is delivered through interactive online workshops via Zoom. It will be practical, hands-on, and workshop-based. There will be some brief lecture-style presentations throughout, i.e., using slides or blackboard, to introduce and explain key concepts and theories. Throughout the course, we will use real-world data sets and coding examples.

Staff Profiles

Mark Andrews - Associate Professor

School of Social Sciences

Entry requirements

This course is aimed at anyone who is interested in using R for data science or statistics, such as researchers or analysts who are studying for, or have already completed, a PhD in a field of science that involves extensive statistical analysis.

For this module, familiarity with R is assumed; however, a comprehensive introduction to R is taught in the first module, Introduction to statistics using R and RStudio.

Getting in touch

If you need more help or information, get in touch through our enquiry form.

Fees and funding

The fee for this course is £360 including VAT (£300 excluding VAT).

Payment is due at the time of booking.

How to apply

You can book your place via the NTU online store:

Book your spot here.

For queries, please contact kelly.smith@ntu.ac.uk.