Introduction to the Tidyverse course by Johns Hopkins University
Introduction to the Tidyverse: free videos and free material uploaded by Johns Hopkins University staff.
Tidy Data
Before we can discuss all the ways in which R makes it easy to work with tidy data, we first have to be sure we know what tidy data are. Tidy datasets, by design, are easier to manipulate, model, and visualize because the tidy data principles that we’ll discuss in this course impose a general framework and a consistent set of rules on data. In fact, a well-known quote from Hadley Wickham is that “tidy datasets are all alike but every messy dataset is messy in its own way.” Using a consistent tidy data format allows tools to be built that work well within this framework, ultimately simplifying the data wrangling, visualization, and analysis processes. If you start with data that are already in a tidy format, or spend the time at the beginning of a project to get data into a tidy format, the remaining steps of your data science project will be easier.
From Non-Tidy –> Tidy
It’s important to discuss what tidy data are and what they look like because, out in the world, most data are untidy. If you are not the one entering the data but are instead handed the data by someone else for a project, more often than not those data will be untidy. Untidy data are often referred to simply as messy data. To work with these data easily, you’ll have to get them into a tidy data format. This means you’ll have to be able to recognize untidy data and understand how to get data into a tidy format. The common problems seen in messy datasets again come from Hadley Wickham’s paper on tidy data (http://vita.had.co.nz/papers/tidy-data.pdf). After briefly reviewing what each common problem is, we will take a look at a few messy datasets. Finally, we’ll touch on the concepts of tidying untidy data, but we won’t actually do any practice yet. That’s coming soon!
The Data Science Life Cycle & Tidyverse Ecosystem
With a solid understanding of tidy data and how tidy data fit into the data science life cycle, we’ll take a bit of time to introduce you to the tidyverse and tidyverse-adjacent packages that we’ll be teaching and using throughout this specialization. Taken together, these packages make up what we’re referring to as the tidyverse ecosystem. The purpose of the rest of this course is not for you to understand how to use each of these packages (that’s coming soon!), but rather to help you become familiar with which packages fit into which part of the data science life cycle.
Note that the official tidyverse packages below are bold. All other packages are tidyverse-adjacent, meaning they follow the same conventions as the official tidyverse packages and work well within the tidy framework and structure of data analysis.
Data Science Project Organization & Workflows
Data science projects vary quite a lot, so it can be difficult to give universal rules for how they should be organized. However, there are a few ways to organize projects that are commonly useful. In particular, almost all projects have to deal with files of various sorts: data files, code files, output files, etc. This section talks about how files work and how projects can be organized and customized.
Case Studies
Throughout this specialization, we’re going to make use of a number of case studies from Open Case Studies to demonstrate the concepts introduced in the course. We’ll generally use the same case studies throughout the specialization, providing continuity that lets you focus on the concepts and skills being taught (rather than the context) while working with interesting data. Each of these case studies aims to address a public-health question, and all of them use real data.
Project: Organizing a New Data Science Project
This project will allow you to create a new project and organize the files that will be needed to carry out a future data analysis.
This course introduces a powerful set of data science tools known as the Tidyverse. The Tidyverse has revolutionized the way in which data scientists do almost every aspect of their job. We will cover the simple idea of "tidy data" and how this idea serves to organize data for analysis and modeling. We will also cover how non-tidy data can be transformed to tidy data, the data science project life cycle, and the ecosystem of Tidyverse R packages that can be used to execute a data science project.
If you are new to data science, the Tidyverse ecosystem of R packages is an excellent way to learn the different aspects of the data science pipeline, from importing the data, to tidying the data into a format that is easy to work with, to exploring and visualizing the data, to fitting machine learning models. If you are already experienced in data science, the Tidyverse provides a powerful system for streamlining your workflow in a coherent manner that can easily connect with other data science tools.
In this course, it is important that you be familiar with the R programming language. If you are not yet familiar with R, we suggest you first complete R Programming before returning to complete this course.