Part 24 Lab 3: Explore gapminder with ggplot2 and dplyr

In this lab, you will explore the gapminder dataset by making summary tables and visualizations.

24.1 Getting started

Make a new RMarkdown script for this lab. In the setup chunk at the top of your scripts, load the packages needed for this lab.

These include tidyverse and gapminder.

24.2 Exercise 1: Basic dplyr

Use dplyr functions to achieve the following. For each exercise, print your result.

24.2.1 1.1

Use filter() to subset the gapminder data to three countries of your choice in the 1970’s.

24.2.2 1.2

Start with the original gapminder data and use the pipe operator |> to first do the above filter and then select the “country” and “gdpPercap” variables.

24.2.3 1.3

Make a new variable in gapminder for the change in life expectancy from the previous measurement for that country. Filter this table to show all of the entries that have experienced a drop in life expectancy. Save this result as a new object.

Hint: you might find the lag() or diff() functions useful.

24.2.4 1.4

Filter gapminder so that it shows the max GDP per capita experienced by each country.

Hint: you might find the max() function useful here.

24.2.5 1.5

Produce a scatterplot of Canada’s life expectancy vs. GDP per capita using ggplot2.

In your plot, put GDP per capita on a log scale.

24.3 Exercise 2: Explore two variables with dplyr and ggplot2

Use gapminder, palmerpenguins::penguins, or another dataset of your choice. (Check out a dataset from the datasets or psych R package if you want!

24.3.1 2.1

Pick two quantitative variables to explore.

  • Make a summary table of descriptive statistics for these variables using summarize().
    • Include whatever staistics you feel appropriate (mean, median sd, range, etc.).
  • Make a scatterplot of these variables using ggplot().

24.3.2 2.2

Pick one categorical variable and one quantitative variable to explore.

  • Make a summary table giving the sample size (hint: n()) and descriptive statistics for the quantitative variable by group.
  • Make one or more useful plots to visualize these variables.
    • Try to make a raincloud plot with points, boxplots, and densities for each group.

24.4 Bonus Exercise: Recycling (Optional)

Evaluate this code and describe the result. The goal was to get the data for Rwanda and Afghanistan. Does this work? Why or why not? If not, what is the correct way to do this?

filter(gapminder, country == c("Rwanda", "Afghanistan"))