Part 5 Thoughtful Workflow
At this point, I recommend you pause and think about your workflow. I give you permission to spend some time and energy sorting this out! It can be as important as, or more important than, learning a new R function or package. The experts don’t talk about this much, because they’ve already got a workflow; it’s something they do almost without thinking.
Working through subsequent material in R Markdown documents, possibly using Git and GitHub to track and share your progress, is a great idea and will leave you more prepared for your future data analysis projects. Typing individual lines of R code is but a small part of data analysis, and it pays off to think holistically about your workflow.
If you want a lot more detail on workflows, you can wander over to the optional bit on R basics and workflow.
5.1 R Markdown
If you are in the mood to be entertained, start the video from the beginning. But if you’d rather just get on with it, start watching at 6:52.
You can follow along with the slides here if they do not appear below.
R Markdown is an accessible way to create computational documents that combine prose with tables and figures produced by R code.
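As a rough sketch (not taken from the course materials), a minimal `.Rmd` file might look like the one below. The title, the chunk labels, and the use of R’s built-in `cars` data set are illustrative choices only. A YAML header sets the output format, ordinary Markdown carries the prose, and fenced `{r}` chunks hold the code whose results get woven into the document when you knit it (for example with the Knit button in RStudio, or with `rmarkdown::render()`).

````markdown
---
title: "My first R Markdown document"
output: html_document
---

Some prose explaining the analysis. The `cars` data set ships with R.

```{r cars-summary}
# Summarise the built-in cars data set (speed and stopping distance)
summary(cars)
```

```{r cars-plot, echo=FALSE}
# echo=FALSE hides this code and shows only the resulting figure
plot(dist ~ speed, data = cars)
```

Inline code works too: the data set has `r nrow(cars)` rows.
````

Because the prose, code, and output live in one file, re-knitting after a change regenerates every table and figure, which is what makes the document reproducible.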
An introductory R Markdown workflow, including how it intersects with Git, GitHub, and RStudio, is now maintained on the Happy Git site (Happy Git and GitHub for the useR).
5.2 Git and GitHub
First, it’s important to realize that Git and GitHub are distinct things. GitHub is an online hosting platform that provides an array of services built on top of the Git system. (Similar platforms include Bitbucket and GitLab.) Just as we don’t need RStudio to run R code, we don’t need GitHub to use Git, but it will make our lives so much easier.
Git can be very powerful and useful, but it can also take some getting used to. In this class, we are going to work with some of its most basic functions. We will do all of our interfacing with Git using the GitHub app and website.
You can follow along with the slides here if they do not appear below.
5.2.1 What is GitHub?
5.2.2 Git
Git is a distributed Version Control System (VCS). It is a useful tool for easily tracking changes to your code, collaborating, and sharing.
(Wait, what?) Okay, try this: Imagine if Dropbox and the “Track Changes” feature in MS Word had a baby. Git would be that baby. In fact, it’s even better than that, because Git is optimized for the things that social scientists and data scientists spend a lot of time working on (e.g., code).
The learning curve is worth it – I promise you!
With Git, you can track the changes you make to your project, so you always have a record of what you’ve worked on and can easily revert to an older version if need be. It also makes working with others easier: groups of people can work together on the same project and merge their changes into one final source!
GitHub lets you use the same power of Git online, through an easy-to-use interface. It’s used across the software world and beyond to collaborate and maintain the history of projects.
There’s a high probability that your favorite app, program or package is built using Git-based tools. (RStudio is a case in point.)
Scientists and academic researchers are starting to use it as well. Benefits of version control and collaboration tools aside, Git(Hub) helps to operationalize the ideals of open science and reproducibility. Journals have increasingly strict requirements regarding reproducibility and data access, and GitHub makes meeting them easier (DOI integration, off-the-shelf licenses, etc.). I run my entire lab on GitHub; this entire course runs on GitHub; these lecture notes are hosted on GitHub…
5.3 Getting Help with R
You can follow along with the slides here if they do not appear below.