Part 34 Lab 06: Join those tables!

Load required packages:

library(tidyverse)

34.1 Exercise 1: singer

The singer package comes with two smallish data frames about songs. Let’s take a look at them after some minor modifications (renaming and shuffling).

You can download the singer data from the class repo:

songs <- read_csv("https://raw.githubusercontent.com/bwiernik/progdata-class/master/data/singer/songs.csv")
locations <- read_csv("https://raw.githubusercontent.com/bwiernik/progdata-class/master/data/singer/loc.csv")

(time <- as_tibble(songs) |> 
   rename(song = title))

(album <- as_tibble(locations) |> 
   select(title, everything()) |> 
   rename(album = release,
          song  = title))
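
As a refresher before the exercises, here is a minimal sketch of how filtering joins and mutating joins differ. The tibbles `members` and `instruments` are hypothetical toy data, not the singer data, and the names are chosen only for illustration.

```r
library(dplyr)

# Hypothetical toy tables (not the singer data).
members <- tibble(
  name = c("Mick", "John", "Paul"),
  band = c("Stones", "Beatles", "Beatles")
)
instruments <- tibble(
  name  = c("John", "Paul", "Keith"),
  plays = c("guitar", "bass", "guitar")
)

# Filtering joins keep or drop rows of the left table; no columns are added.
in_both   <- semi_join(members, instruments, by = "name")  # rows with a match
left_only <- anti_join(members, instruments, by = "name")  # rows without a match

# Mutating joins add columns from the right table.
left_all <- left_join(members, instruments, by = "name")  # keeps every row of members
everyone <- full_join(members, instruments, by = "name")  # keeps every row of both
```

Here `left_all` has an `NA` in `plays` for the member with no match, while `everyone` also gains a row for the name that appears only in `instruments`.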
  1. We really care about the songs in time. But for which of those songs do we know the corresponding album?
time |> 
  FILL_THIS_IN(album, by = FILL_THIS_IN)
  2. Go ahead and add the corresponding albums to the time tibble, being sure to preserve rows even if album info is not readily available.
time |> 
  FILL_THIS_IN(album, by = FILL_THIS_IN)
  3. For which songs do we have “year” but no album info?
time |> 
  FILL_THIS_IN(album, by = "song")
  4. Which artists are in time, but not in album?
time |> 
  anti_join(album, by = "FILL_THIS_IN")
  5. You’ve come across these two tibbles, and just wish all the info were available in one tibble. What would you do?
FILL_THIS_IN |> 
  FILL_THIS_IN(FILL_THIS_IN, by = "song")

34.2 Exercise 2: LOTR

Load in three tibbles of data on the Lord of the Rings:

fell <- read_csv("https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Fellowship_Of_The_Ring.csv")
ttow <- read_csv("https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Two_Towers.csv")
retk <- read_csv("https://raw.githubusercontent.com/jennybc/lotr-tidy/master/data/The_Return_Of_The_King.csv")
  1. Stack these into a single tibble.
FILL_THIS_IN(fell, FILL_THIS_IN)
  2. Which races are present in “The Fellowship of the Ring” (fell), but not in either of the other two tibbles?
fell |> 
  FILL_THIS_IN(FILL_THIS_IN, by = "Race") |> 
  FILL_THIS_IN(FILL_THIS_IN, by = "Race")
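
The two ideas in this exercise, stacking tables and chaining filtering joins, can be sketched on toy data. The tibbles `store_a`, `store_b`, and `store_c` below are hypothetical and stand in for any set of tables that share column names; the `.id` argument is an optional extra that is not required by the exercise.

```r
library(dplyr)

# Hypothetical per-store inventories (not the LOTR data).
store_a <- tibble(item = c("apple", "pear", "plum"), qty = c(5, 2, 7))
store_b <- tibble(item = c("apple", "fig"),          qty = c(3, 1))
store_c <- tibble(item = c("pear", "kiwi"),          qty = c(4, 6))

# bind_rows() stacks tables by matching column names;
# .id adds a column recording which table each row came from.
all_stores <- bind_rows(a = store_a, b = store_b, c = store_c, .id = "store")

# Chaining anti_join() drops rows matched in each table in turn,
# leaving only items that appear in store_a and nowhere else.
only_a <- store_a |>
  anti_join(store_b, by = "item") |>
  anti_join(store_c, by = "item")
```

Because `anti_join()` never adds columns, `only_a` keeps exactly the columns of `store_a`.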

34.3 Exercise 3: Set Operations

Let’s use three set functions: intersect, union, and setdiff. We’ll work with two toy tibbles named y and z, similar to those in the Data Wrangling Cheatsheet.

(y <- tibble(x1 = LETTERS[1:3], x2 = 1:3))
(z <- tibble(x1 = c("B", "C", "D"), x2 = 2:4))
  1. Rows that appear in both y and z
FILL_THIS_IN(y, z)
  2. You collected the data in y on Day 1, and z on Day 2. Make a data set to reflect that.
FILL_THIS_IN(
  mutate(y, day = "Day 1"),
  mutate(z, day = "Day 2")
)
  3. The rows contained in z are bad! Remove those rows from y.
FILL_THIS_IN(FILL_THIS_IN, FILL_THIS_IN)
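
For reference, here is a minimal sketch of how the set functions treat whole rows, using hypothetical tibbles `a` and `b` rather than y and z. It assumes both tables have the same column names and types, which the set functions require.

```r
library(dplyr)

# Hypothetical toy tibbles with identical column names and types.
a <- tibble(id = 1:3, grp = c("x", "y", "z"))
b <- tibble(id = 2:4, grp = c("y", "z", "w"))

both   <- intersect(a, b)  # rows present in both a and b
either <- union(a, b)      # distinct rows across a and b
a_only <- setdiff(a, b)    # rows in a that are not in b

# bind_rows() keeps duplicate rows, so it returns more rows than union() here.
stacked <- bind_rows(a, b)
```

A row counts as a match only if every column agrees, so adding a distinguishing column (such as a day label) to each table before combining means no rows are treated as duplicates.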