Introduction to Data Wrangling using R and tidyverse
- Level(s) of Study: Short course
- Start Date(s): 26 April 2023
- Duration: Wednesday to Thursday 9.30 am - 5.30 pm
- Study Mode(s): Short course
- Campus: City Campus
Entry Requirements:
More information
Introduction:
On this two-day course, you will gain a comprehensive practical introduction to data wrangling using R. In particular, we focus on tools provided by R's `tidyverse`, including `dplyr`, `tidyr`, and `purrr`. Data wrangling is the art of taking raw and messy data and formatting and cleaning it so that data analysis, visualization, and so on may be performed on it. Done poorly, it can be time-consuming, laborious, and error-prone. Fortunately, the tools provided by R's `tidyverse` allow us to do data wrangling in a fast, efficient, and high-level manner, which can have dramatic consequences for the ease and speed with which we analyse data.
This course is aimed at anyone who is involved in real-world data analysis, where the raw data is messy and complex. Data analysis of this kind is practised widely in academic scientific research, as well as throughout the public and private sectors.
Level: CPD, Advanced / Professional
The course will cover these key topics:
- Reading data into R using tools such as `readr` and `readxl`
- Wrangling with the powerful `dplyr` R package, focusing on filtering observations, selecting and modifying variables, and other major data manipulation operations
- Summarising data in `dplyr` using descriptive statistics
- Merging and joining independent data frames
- Pivoting and reshaping data using the `tidyr` R package
The course comprises 6 contact hours per day, with two 1-hour breaks between sessions.
The sessions will be as follows:
- Session 1: 9:30am-11:30am
- Session 2: 12:30pm-2:30pm
- Session 3: 3:30pm-5:30pm
Tutor Profile: Mark Andrews is an Associate Professor at Nottingham Trent University whose research and teaching are focused on statistical methodology in research in the social and biological sciences. He is the author of a 2021 textbook on data science using R that is aimed at scientific researchers, and has a forthcoming textbook on statistics and data science that is aimed at undergraduates in science courses. His background is in computational cognitive science and mathematical psychology.
Other available online CPD courses in this series include:
Introduction to statistics using R and RStudio CPD course
Introduction to Data Visualization with R using ggplot
Introduction to Generalized Linear Models in R
Introduction to Multilevel (hierarchical, or mixed effects) Models in R
Introduction to Bayesian Data Analysis with R
Any questions? Contact kelly.smith@ntu.ac.uk, Commercial Manager, School of Social Sciences.
The course tutor was fantastic at explaining everything, the pace was just right, and the content was exactly what I was expecting and more. I will definitely be using all of the techniques covered in the course in my own data analysis.
What you’ll study
During the course you’ll:
- Gain a comprehensive practical introduction to data wrangling using R and its complementary tools and interrelated packages, such as `tidyverse`, `dplyr`, `tidyr`, and `purrr`
- Discover how to read data of different types into R, and cover in detail the main `dplyr` tools, such as `select`, `filter`, and `mutate`
- Learn how to use the pipe operator (`%>%`) to create data wrangling pipelines that take raw messy data in at one end and return cleaned tidy data at the other
- Discover how to perform descriptive or summary statistics on data using `dplyr`'s `summarise` and `group_by` functionalities
- Learn how to combine data frames, including concatenating all data files in a folder and using SQL-style join operations to merge information in different data frames
- Develop an understanding of how to "pivot" data from a "wide" to a "long" format and back using `tidyr`'s `pivot_longer` and `pivot_wider`
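As a flavour of what such a pipeline can look like, here is a minimal sketch rather than course material. The data frame `raw` and its column names (`id`, `group`, `session_1`, `session_2`) are hypothetical and invented purely for illustration; only the `dplyr` and `tidyr` functions named above are used.

```r
library(dplyr)
library(tidyr)

# Hypothetical messy data: one row per participant, one column per session,
# with a missing value in one cell.
raw <- tibble(
  id        = c(1, 2, 3, 4),
  group     = c("a", "a", "b", "b"),
  session_1 = c(10, 12, NA, 9),
  session_2 = c(11, 15, 8, 14)
)

# Wide to long: one row per participant per session, dropping missing scores.
tidy <- raw %>%
  pivot_longer(
    cols         = starts_with("session_"),
    names_to     = "session",
    names_prefix = "session_",
    values_to    = "score"
  ) %>%
  filter(!is.na(score)) %>%
  mutate(session = as.integer(session))

# Descriptive statistics per group.
summary_df <- tidy %>%
  group_by(group) %>%
  summarise(mean_score = mean(score), n = n())

print(summary_df)
```

Each step takes a data frame and returns a data frame, which is what allows the steps to be chained with `%>%`.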
What will I gain?
By the end of the course, you’ll be able to read messy and unstructured data into R and apply the principles of data wrangling to convert these datasets into well-structured formats. These data wrangling techniques will help you carry out data analysis tasks quickly, efficiently, and robustly, working at a high level.
- On completion of at least 80% of the course, you’ll receive a certificate of attendance.
Where you'll learn
The course is delivered through interactive online workshops via Zoom. It will be practical, hands-on, and workshop-based. There will be some brief lecture-style presentations throughout, i.e., using slides or blackboard, to introduce and explain key concepts and theories. Throughout the course, we will use real-world data sets and coding examples.
Campus and facilities
Fees and funding
The fee for this course is £360 (VAT Inclusive) - £300 (VAT Exclusive)
Payment is due at the time of booking.
How to apply
You can book your place via the NTU online store: