Part 47 Lab 07: Make a portfolio piece
For this week’s activities, make a portfolio piece for class.
Do the following:
- Make a new GitHub repo for your project and clone it to your computer.
- Make an appropriate file structure for your portfolio project, including, e.g.:
  - An RStudio project file
  - A data folder
  - A figures folder
  - Other folders as appropriate
- Choose a dataset of interest to you.
47.1 Step 1: Make a new GitHub repo
Make a new GitHub repo for your project. It’s a good habit to make a new repo for each different project/paper you work on.
Add a link to this new project repo to your main class repo README.
Clone your new project repo down to your computer.
47.2 Step 2: Set up your folder structure
In your new repo, prepare your folder structure. Files and folders you should probably include are:
- A README.md file describing the project and the contents of the subfolders
- An RStudio Project .Rproj file in the folder root
- A data folder that holds the relevant data files for the project
  - Could also have separate data_raw and data folders
- One or more RMarkdown manuscripts
- An output folder that will hold your output files
  - Depending on how many figures and other output files you have, you might want to split this into subfolders, such as figures, reports, etc.
- Other subfolders as needed, e.g.:
  - admin for administrative documents (e.g., IRB approval, grant information)
  - doc for documentation (e.g., variable codebooks)
  - R as a place to store functions and scripts that you call from your markdown (e.g., a data import and cleaning script)
All of these folders should be described in your README.md file for this project (e.g., make a markdown table describing the folders).
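If you would rather create these folders from R than by hand, the function below is one possible sketch. `create_project_skeleton()` and its folder descriptions are hypothetical, so adjust them to your project. It uses only base R (`dir.create()` and `writeLines()`) and starts a README.md with a markdown table of the folders, but only if no README.md exists yet. The .Rproj file is not created here; RStudio makes it when you create a project in the folder.

```r
# Hypothetical helper: create suggested project folders under `root` and
# start a README.md containing a markdown table that describes them.
create_project_skeleton <- function(root = ".") {
  # Folder paths (names) and their descriptions (values); edit to taste.
  folders <- c(
    "data"           = "Data files used in the project",
    "data_raw"       = "Original, unmodified data files",
    "output/figures" = "Saved plots",
    "output/reports" = "Rendered reports and summary tables",
    "admin"          = "Administrative documents (e.g., IRB approval, grant information)",
    "doc"            = "Documentation (e.g., variable codebooks)",
    "R"              = "Functions and scripts called from the R Markdown files"
  )

  # recursive = TRUE also creates parent folders such as output/
  for (f in names(folders)) {
    dir.create(file.path(root, f), recursive = TRUE, showWarnings = FALSE)
  }

  # Markdown table: one row per folder
  readme_lines <- c(
    "# Project title",
    "",
    "| Folder | Contents |",
    "|--------|----------|",
    paste0("| `", names(folders), "` | ", unname(folders), " |")
  )

  # Never overwrite an existing README (e.g., one GitHub created for you)
  readme_path <- file.path(root, "README.md")
  if (!file.exists(readme_path)) {
    writeLines(readme_lines, readme_path)
  }
  invisible(readme_path)
}

# Usage, from the root of your cloned project repo:
# create_project_skeleton(".")
```

If your repo already has a README.md, the function leaves it alone; you can copy the table rows into it yourself.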
47.3 Step 3: Find, download, and document a dataset
In this step, you need to identify a dataset you want to work with for your final project. Some potential sources for datasets include:
- ICPSR
- Google Datasets
- Harvard Dataverse
- FiveThirtyEight’s GitHub repos
- Data from your own research work or experience
The requirements for this dataset are:
- It cannot already be available as an R package, such as gapminder or nycflights13, or as one of the datasets in the datasets package.
  - It’s okay to use an R package to download the data (such as rtweet), so long as the data isn’t already neatly packaged for you.
- It should be a dataset that needs some cleaning, reshaping, or wrangling before it’s ready for analysis. This is the case for almost any real-world data.
Talk to me if you are having trouble finding an interesting or suitable dataset!
Once you have identified a dataset, download it and place it in your data folder.
- Bonus: If this is data you are downloading from the internet, write and run a .R script to download the dataset.
  - Hint: Try to write this as a separate .R file from your homework .Rmd file and call it in your .Rmd using source().
- Your data file should have a useful descriptive name (no data.xlsx!). It should include:
  - A description of the file (e.g., “gapminder-data-all-countries”)
  - A date in a sortable format (e.g., “2020-03-04”)
  - No spaces
  - The file extension (e.g., “.csv” or “.xlsx”)
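A possible shape for the bonus download script, assuming it lives in a hypothetical R/download_data.R file. `make_data_filename()` and `download_dataset()` are illustrative names, not part of any package, and the URL in the example call is a placeholder. The generated file name follows the rules above: a lowercase description with hyphens instead of spaces, a sortable year-month-day date, and the extension.

```r
# R/download_data.R (hypothetical file name)

# Build a descriptive, dated, space-free file name for a dataset.
make_data_filename <- function(description, ext, date = Sys.Date()) {
  # Lowercase, trim, and replace runs of whitespace with a single hyphen
  slug <- gsub("[[:space:]]+", "-", trimws(tolower(description)))
  paste0(slug, "_", format(date, "%Y-%m-%d"), ".", ext)
}

# Download `url` into `dest_dir` using the naming scheme above.
# Returns the path of the saved file.
download_dataset <- function(url, description, ext = "csv", dest_dir = "data") {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(dest_dir, make_data_filename(description, ext))
  # Skip the download if today's copy already exists
  if (!file.exists(dest)) {
    # mode = "wb" keeps binary files (e.g., .xlsx) intact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}

# Example call (placeholder URL, not a real dataset):
# download_dataset("https://example.com/your-data.csv", "my data all countries")
```

Assuming hw04.Rmd sits in the project root, you could then run `source("R/download_data.R")` followed by a `download_dataset()` call in a chunk. Because the date is part of the file name, running the script on a new day saves a fresh, separately named copy instead of overwriting the old one.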
Document your data.
Describe the key variables, their types (e.g., character, integer, numeric), their possible values (e.g., for a personality item, maybe integers from 1 to 5), etc.
Describe how missing data is indicated (blank cells, “NA”, “-999”, etc.).
This documentation could go in the main project README.md file or in a README.md or other .md file in the data folder.
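Two parts of this step can be done in code: telling the importer how missing values are coded, and listing each variable’s type and missing count as a starting point for your documentation. The sketch below assumes a CSV file and uses the `na` argument of readr’s `read_csv()`; the tiny example file, its column names, and the `-999` code are made up for illustration.

```r
library(readr)

# Made-up example file in which -999 and blank cells both mean "missing"
example_path <- tempfile(fileext = ".csv")
writeLines(c("id,score,group", "1,4,a", "2,-999,b", "3,2,"), example_path)

# List every missing-value code so they all become NA on import
survey <- read_csv(example_path, na = c("", "NA", "-999"), show_col_types = FALSE)

# Codebook skeleton: one row per variable with its R type and missing count
codebook <- data.frame(
  variable  = names(survey),
  type      = vapply(survey, function(x) class(x)[1], character(1)),
  n_missing = colSums(is.na(survey)),
  row.names = NULL
)
codebook
```

`knitr::kable(codebook, format = "markdown")` turns this table into markdown you can paste into your data README. The possible values of each variable still need to be described by hand.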
47.4 Step 4: Practice input, process, output
In your .Rmd file for this homework assignment (call it hw04.Rmd), practice reading your data file into R, processing the imported data, and outputting files.
Import your file using R functions (not the buttons in RStudio).
You will want to output two things:
- A data file, such as cleaned dataset and/or a summarized dataset/table
- A plot, in several formats, including (1) a bitmap format, (2) a vector format, and (3) PDF
Save your data file and plots to your output folder (or to more specific subfolders if that is your structure).
Be sure to use descriptive file names and document what these files are with a .md file.
Save your data file using write_csv() or a similar function.
Save your images using ggsave().
Do not use the buttons in RStudio.
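A minimal input, process, output sketch. It uses the built-in mtcars data only as a stand-in so the code runs anywhere (it would not qualify as your project dataset); swap in your own file via `read_csv()`. The file names and the output/figures subfolder are assumptions following the Step 2 layout. EPS stands in for the vector format because `ggsave()` can write it with base R’s PostScript device; `.svg` is another common vector choice but needs the svglite package installed.

```r
library(tidyverse)

# Input: stand-in data. Replace with your own import, e.g.
# read_csv("data/<your-file-name>.csv")
cars_raw <- as_tibble(mtcars, rownames = "model")

# Process: a small summary table (mean mpg by number of cylinders)
mpg_by_cyl <- cars_raw %>%
  group_by(cyl) %>%
  summarize(n = n(), mean_mpg = mean(mpg), .groups = "drop")

# Make sure the output folders exist
out_dir <- "output"
dir.create(file.path(out_dir, "figures"), recursive = TRUE, showWarnings = FALSE)

# Output 1: the summarized data file
write_csv(mpg_by_cyl, file.path(out_dir, "mtcars-mean-mpg-by-cyl.csv"))

# Output 2: one plot, saved in three formats
mpg_plot <- ggplot(cars_raw, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

fig_base <- file.path(out_dir, "figures", "mtcars-mpg-vs-weight")
ggsave(paste0(fig_base, ".png"), plot = mpg_plot, width = 5, height = 4, dpi = 300) # bitmap
ggsave(paste0(fig_base, ".eps"), plot = mpg_plot, width = 5, height = 4)            # vector
ggsave(paste0(fig_base, ".pdf"), plot = mpg_plot, width = 5, height = 4)            # PDF
```

Giving every `ggsave()` call the same width and height keeps the bitmap and vector versions of the figure identical in layout.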
47.5 Step 5: Finish your data exploration report
Finish writing your RMarkdown document to describe the data, conduct your analyses, and describe your interpretations or conclusions.
# Packages used in this report
library(tidyverse)
library(tidytext)
library(textdata)
# Default chunk options for the whole document
knitr::opts_chunk$set(echo = TRUE, include = TRUE, fig.width = 5, fig.height = 4, fig.align = "center")