Part 34 Lab 06: Join those tables!
Load required packages:
library(tidyverse)

34.1 Exercise 1: singer
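The tasks in this exercise lean on dplyr's two families of joins: mutating joins, which add columns from a second tibble, and filtering joins, which only keep or drop rows of the first. As a rough reference, here is a minimal sketch on two made-up tibbles. The names `band` and `instrument` and their contents are hypothetical and are not part of the singer data.

```r
library(dplyr)

# Hypothetical toy tibbles, invented for illustration only
band <- tibble(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles")
)
instrument <- tibble(
  name  = c("John", "Paul", "Keith"),
  plays = c("guitar", "bass", "guitar")
)

# Mutating joins: add the columns of `instrument` to `band`
left  <- left_join(band, instrument, by = "name")   # all rows of band; plays is NA for Mick
inner <- inner_join(band, instrument, by = "name")  # only names found in both tibbles
full  <- full_join(band, instrument, by = "name")   # every name from either tibble

# Filtering joins: keep or drop rows of `band`, adding no columns
semi <- semi_join(band, instrument, by = "name")    # rows of band with a match
anti <- anti_join(band, instrument, by = "name")    # rows of band without a match
```

A quick way to tell the families apart is to compare `ncol()` before and after: a filtering join never changes the column count of the first tibble.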
The package singer comes with two smallish data frames about songs.
Download them from the class repo, then take a look (after minor modifications by renaming and shuffling):
songs <- read_csv("https://raw.githubusercontent.com/bwiernik/progdata-class/master/data/singer/songs.csv")
locations <- read_csv("https://raw.githubusercontent.com/bwiernik/progdata-class/master/data/singer/loc.csv")
(time <- as_tibble(songs) |>
  rename(song = title))
(album <- as_tibble(locations) |>
  select(title, everything()) |>
  rename(album = release,
         song = title))
- We really care about the songs in time. But for which of those songs do we know the corresponding album?
time |>
  FILL_THIS_IN(album, by = FILL_THIS_IN)
- Go ahead and add the corresponding albums to the time tibble, being sure to preserve rows even if album info is not readily available.
time |>
  FILL_THIS_IN(album, by = FILL_THIS_IN)
- For which songs do we have “year” but no album info?
time |>
  FILL_THIS_IN(album, by = "song")
- Which artists are in time, but not in album?
time |>
  anti_join(album, by = "FILL_THIS_IN")
- You’ve come across these two tibbles, and just wish all the info was available in one tibble. What would you do?
FILL_THIS_IN |>
  FILL_THIS_IN(FILL_THIS_IN, by = "song")

34.2 Exercise 2: LOTR
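This exercise combines two ideas: stacking tibbles that share the same columns, and chaining filtering joins to narrow down rows. The sketch below shows both on made-up data. The tibbles `week1`, `week2`, and `week3` are hypothetical and unrelated to the LOTR files.

```r
library(dplyr)

# Hypothetical toy tibbles with identical columns, invented for illustration
week1 <- tibble(fruit = c("apple", "kiwi", "pear"), n = c(3, 1, 2))
week2 <- tibble(fruit = c("apple", "plum"), n = c(5, 4))
week3 <- tibble(fruit = c("pear", "plum"), n = c(1, 2))

# bind_rows() stacks tibbles vertically, matching columns by name;
# .id adds a column recording which input each row came from
all_weeks <- bind_rows(week1 = week1, week2 = week2, week3 = week3, .id = "week")

# Each anti_join() drops the rows of week1 whose fruit also appears
# in the other tibble, so only fruits unique to week1 remain
only_week1 <- week1 |>
  anti_join(week2, by = "fruit") |>
  anti_join(week3, by = "fruit")
```

Because each `anti_join()` returns a tibble with the columns of its left-hand input, the calls can be piped one after another without any extra bookkeeping.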
Load in three tibbles of data on the Lord of the Rings:
fell <- read_csv("https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Fellowship_Of_The_Ring.csv")
ttow <- read_csv("https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Two_Towers.csv")
retk <- read_csv("https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Return_Of_The_King.csv")
- Stack these into a single tibble.
FILL_THIS_IN(fell, FILL_THIS_IN)
- Which races are present in “The Fellowship of the Ring” (fell), but not in any of the other ones?
fell |>
  FILL_THIS_IN(FILL_THIS_IN, by = "Race") |>
  FILL_THIS_IN(FILL_THIS_IN, by = "Race")

34.3 Exercise 3: Set Operations
Let’s use three set functions: intersect, union and setdiff. We’ll work with two toy tibbles named y and z, similar to those in the Data Wrangling Cheatsheet:
(y <- tibble(x1 = LETTERS[1:3], x2 = 1:3))
(z <- tibble(x1 = c("B", "C", "D"), x2 = 2:4))
- Rows that appear in both y and z
FILL_THIS_IN(y, z)
- You collected the data in y on Day 1, and z on Day 2. Make a data set to reflect that.
FILL_THIS_IN(
  mutate(y, day = "Day 1"),
  mutate(z, day = "Day 2")
)
- The rows contained in z are bad! Remove those rows from y.
FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN)
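For comparison, here is a minimal sketch of the three set functions on a different pair of made-up tibbles. The names `a` and `b` and their contents are hypothetical. With dplyr loaded, these functions treat each whole row as one element, so two rows match only when every column agrees.

```r
library(dplyr)

# Hypothetical toy tibbles with the same columns, invented for illustration
a <- tibble(id = 1:3, label = c("u", "v", "w"))
b <- tibble(id = 2:4, label = c("v", "w", "q"))

both   <- intersect(a, b)  # rows present in both a and b
either <- union(a, b)      # all distinct rows from a and b
only_a <- setdiff(a, b)    # rows of a that do not appear in b
```

Unlike joins, the set functions take no `by` argument and expect both inputs to have the same columns. When rows should be compared on a key column only, a filtering join such as `anti_join()` is usually the closer fit.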