Part 34 Lab 06: Join those tables!
Load required packages:
library(tidyverse)
34.1 Exercise 1: singer
The package singer comes with two smallish data frames about songs.
Let’s take a look at them (after minor modifications by renaming and shuffling):
You can download the singer data from the class repo:
songs <- read_csv("https://raw.githubusercontent.com/bwiernik/progdata-class/master/data/singer/songs.csv")
locations <- read_csv("https://raw.githubusercontent.com/bwiernik/progdata-class/master/data/singer/loc.csv")

(time <- as_tibble(songs) |>
  rename(song = title))

(album <- as_tibble(locations) |>
  select(title, everything()) |>
  rename(album = release,
         song = title))
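Before working through the questions, it may help to see how the two families of dplyr joins behave on a tiny example. The sketch below uses two hypothetical tibbles, bands and instruments, which are not part of the singer data. Mutating joins (left_join, inner_join, full_join) add columns from the second table, while filtering joins (semi_join, anti_join) only keep or drop rows of the first table.

```r
library(dplyr)
library(tibble)

# Hypothetical toy tibbles for illustration only (not the singer data)
bands <- tibble(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles")
)
instruments <- tibble(
  name  = c("John", "Paul", "Keith"),
  plays = c("guitar", "bass", "guitar")
)

# Mutating joins: add the columns of `instruments` to `bands`
joined_left  <- left_join(bands, instruments, by = "name")   # every row of bands; NA where no match
joined_inner <- inner_join(bands, instruments, by = "name")  # only names found in both tables
joined_full  <- full_join(bands, instruments, by = "name")   # every name from either table

# Filtering joins: keep or drop rows of `bands`, adding no columns
kept    <- semi_join(bands, instruments, by = "name")  # rows of bands with a match
dropped <- anti_join(bands, instruments, by = "name")  # rows of bands without a match

joined_left
dropped
```

Comparing the number of rows and columns in each result is a quick way to decide which join fits a given question.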
- We really care about the songs in time. But for which of those songs do we know the corresponding album?
time |>
  FILL_THIS_IN(album, by = FILL_THIS_IN)
- Go ahead and add the corresponding albums to the time tibble, being sure to preserve rows even if album info is not readily available.
time |>
  FILL_THIS_IN(album, by = FILL_THIS_IN)
- For which songs do we have “year” but not album info?
time |>
  FILL_THIS_IN(album, by = "song")
- Which artists are in time, but not in album?
time |>
  anti_join(album, by = "FILL_THIS_IN")
- You’ve come across these two tibbles, and just wish all the info was available in one tibble. What would you do?
FILL_THIS_IN |>
  FILL_THIS_IN(FILL_THIS_IN, by = "song")
34.2 Exercise 2: LOTR
Load in three tibbles of data on the Lord of the Rings:
fell <- read_csv("https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Fellowship_Of_The_Ring.csv")
ttow <- read_csv("https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Two_Towers.csv")
retk <- read_csv("https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Return_Of_The_King.csv")
- Stack these into a single tibble.
FILL_THIS_IN(fell, FILL_THIS_IN)
- Which races are present in “The Fellowship of the Ring” (fell), but not in any of the other ones?
fell |>
  FILL_THIS_IN(FILL_THIS_IN, by = "Race") |>
  FILL_THIS_IN(FILL_THIS_IN, by = "Race")
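As a rough illustration of the two ideas in this exercise, here is a minimal sketch on made-up data. The tibbles film_a, film_b and film_c, and their values, are hypothetical stand-ins and not the LOTR files. It shows stacking tables that share the same columns, and chaining two filtering joins so that each step removes rows matched in another table.

```r
library(dplyr)
library(tibble)

# Hypothetical per-film tables with identical columns (values are invented)
film_a <- tibble(Film = "A", Race = c("Elf", "Hobbit", "Ent"), Words = c(10, 20, 5))
film_b <- tibble(Film = "B", Race = c("Elf", "Hobbit"), Words = c(7, 12))
film_c <- tibble(Film = "C", Race = c("Hobbit", "Man"), Words = c(3, 9))

# Stack the tables row-wise; columns are matched by name
stacked <- bind_rows(film_a, film_b, film_c)

# Chain filtering joins: each step drops rows of film_a whose Race
# also appears in the other table
only_in_a <- film_a |>
  anti_join(film_b, by = "Race") |>
  anti_join(film_c, by = "Race")

stacked
only_in_a
```

Because a filtering join never adds columns, the result of the chain still has exactly the columns of the first table.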
34.3 Exercise 3: Set Operations
Let’s use three set functions: intersect, union and setdiff. We’ll work with two toy tibbles named y and z, similar to those in the Data Wrangling Cheatsheet:
(y <- tibble(x1 = LETTERS[1:3], x2 = 1:3))
(z <- tibble(x1 = c("B", "C", "D"), x2 = 2:4))
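One point worth keeping in mind is that the dplyr versions of these set functions compare entire rows, not a single key column. The sketch below uses two hypothetical tibbles, a and b, chosen so that one id appears in both tables with different labels.

```r
library(dplyr)
library(tibble)

# Hypothetical tibbles: id 3 appears in both, but with different labels in b
a <- tibble(id = c(1, 2, 3), label = c("p", "q", "r"))
b <- tibble(id = c(2, 3, 3), label = c("q", "r", "s"))

both   <- intersect(a, b)  # rows identical in a and b
either <- union(a, b)      # unique rows found in a or b
only_a <- setdiff(a, b)    # rows of a with no identical row in b

both
either
only_a
```

Here the row with id 3 and label "s" is treated as distinct from the row with id 3 and label "r", because every column has to match for two rows to count as the same.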
- Rows that appear in both y and z
FILL_THIS_IN(y, z)
- You collected the data in y on Day 1, and z on Day 2. Make a data set to reflect that.
FILL_THIS_IN(
mutate(y, day = "Day 1"),
mutate(z, day = "Day 2")
)
- The rows contained in z are bad! Remove those rows from y.
FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN)