Part 41 Import a file from the web/cloud

41.1 Import a CSV file from the internet

To import a CSV file directly from the web, assign the URL to a variable

url <- "http://gattonweb.uky.edu/sheather/book/docs/datasets/magazines.csv"

and then pass the URL to read_csv() from the readr package.

dat <- read_csv(url)

You can do this in one step if you like:

read_csv("http://gattonweb.uky.edu/sheather/book/docs/datasets/magazines.csv")
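Reading straight from a URL depends on the network, so an unreachable address stops the script with an error. A minimal sketch of one way to soften that is shown here. The helper name safe_read_csv is hypothetical (it is not part of readr), and returning NULL on failure is just one possible design.

```r
library(readr)

# Hypothetical helper (not part of readr): returns a tibble on success,
# or NULL with a warning if the source cannot be read, e.g. because the
# URL is unreachable or the file does not exist.
safe_read_csv <- function(src) {
  tryCatch(
    read_csv(src),
    error = function(e) {
      warning("Could not read ", src, ": ", conditionMessage(e))
      NULL
    }
  )
}
```

You would then call dat <- safe_read_csv(url) and check is.null(dat) before continuing.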

41.2 Import an Excel file (.xls or .xlsx) from the internet

First, load the readxl package, which provides the functions for reading Excel files:

library(readxl) 

The data files for this tutorial were obtained from: https://beanumber.github.io/sds192/lab-import.html#data_from_an_excel_file

Unlike a CSV file, an .xls or .xlsx file cannot be imported straight from the internet, because read_excel() only reads from a local file path. You first need to download the file locally.

Note: The folder you want to save the file to must already exist. If it doesn’t, you will get an error.

You can create the folder path in one of three ways:

  1. Create the folders directly in Finder/Windows Explorer
  2. Use the buttons in the Files tab in RStudio
  3. Use the dir.create() function:
if ( !dir.exists( here::here("participation", "data") ) ) {
  dir.create( here::here("participation", "data"), recursive = TRUE )
}
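If you create folders in several places, the check-then-create pattern above can be wrapped in a small function. This is only a sketch: ensure_dir is a hypothetical name, and the example uses tempdir() as the base folder so that it runs anywhere. In a project you would pass here::here("participation", "data") instead.

```r
# Hypothetical helper: create `path` (and any missing parent folders)
# if it does not exist yet, then return the path invisibly for reuse.
ensure_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE)
  }
  invisible(path)
}

# Assumption for illustration: a throwaway location under tempdir()
data_dir <- ensure_dir(file.path(tempdir(), "participation", "data"))
```

Calling ensure_dir() a second time on the same path does nothing, so it is safe to leave at the top of a script.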

Next, download the file. Store the URL in a new object called xls_url, then use download.file() to save the file to a specified path.

xls_url <- "http://gattonweb.uky.edu/sheather/book/docs/datasets/GreatestGivers.xls"
download.file(
  xls_url, 
  here::here("participation", "data", "some_file.xls"), 
  mode = "wb"
)

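NOTE: Don’t assign the result of download.file() to an object. The function returns a status code, not the data; the data are in the file it saves.

NOTE: The mode = "wb" argument at the end is really important if you are on Windows. If you omit it, you will probably get a message about the downloaded file being corrupt. More details about this behavior can be found in the help page for download.file() (run ?download.file).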
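Because the data live in the saved file, one simple way to confirm that a download worked is to check that the file exists and is not empty. The sketch below assumes a hypothetical helper name, download_checked, and treats a zero-byte file as a failure, which is a judgment call rather than a rule.

```r
# Hypothetical helper: download `url` to `dest` in binary mode, then
# stop with an error if no file (or an empty file) was written.
download_checked <- function(url, dest) {
  # mode = "wb" writes the file byte-for-byte, which matters on Windows
  download.file(url, dest, mode = "wb", quiet = TRUE)
  if (!file.exists(dest) || file.size(dest) == 0) {
    stop("Download failed or produced an empty file: ", url)
  }
  invisible(dest)
}
```

For the example in this section, the call would be download_checked(xls_url, here::here("participation", "data", "some_file.xls")).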

Naming a file “some_file” is bad practice, because generic names make files hard to keep track of. Always give a file a descriptive name, and avoid spaces in filenames. Come up with a system for naming your files and use it consistently. My file names look like this: progdata_example-dataset_2021-03-02_bmw.csv

Often, it’s a good idea to give the file the same name as the original, or a similar one. The exception is when the original name is not descriptive.

There’s a handy trick to extract the filename from the URL:

file_name <- basename(xls_url)
download.file(
  xls_url, 
  here::here("participation", "data", file_name), 
  mode = "wb"
)
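To see what basename() actually returns, and where it can go wrong, here is a short illustration. The second URL is made up for the example (it is not one of the tutorial's data files) and shows that a query string ends up in the filename unless you strip it first.

```r
xls_url <- "http://gattonweb.uky.edu/sheather/book/docs/datasets/GreatestGivers.xls"
basename(xls_url)
# "GreatestGivers.xls"

# Hypothetical URL with a query string, for illustration only
query_url <- "https://example.com/files/report.xlsx?download=1"
basename(query_url)
# "report.xlsx?download=1"

# Drop everything from the first "?" onward before taking the basename
clean_name <- basename(sub("\\?.*$", "", query_url))
clean_name
# "report.xlsx"
```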

Now we can import the file:

dat <- read_excel(
  here::here("participation", "data", file_name)
)
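If you don't need to keep a local copy, the download and import steps can be combined by saving to a temporary file. This is a sketch of that variation, not a replacement for the workflow above. The function name read_excel_url is hypothetical, and it assumes the URL ends in .xls or .xlsx so that the extension can be reused.

```r
library(readxl)

# Hypothetical helper: download an Excel file to a temporary location,
# read it, and delete the temporary copy afterwards.
read_excel_url <- function(url) {
  # Assumption: the URL ends in the file extension (.xls or .xlsx)
  tmp <- tempfile(fileext = paste0(".", tools::file_ext(url)))
  on.exit(unlink(tmp))
  download.file(url, tmp, mode = "wb", quiet = TRUE)
  read_excel(tmp)
}
```

With the URL from this section, the call would be dat <- read_excel_url(xls_url).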