Part 41 Import a file from the web/cloud
41.1 Import a CSV file from the internet
To import a CSV file directly from the web, assign the URL to a variable
url <- "http://gattonweb.uky.edu/sheather/book/docs/datasets/magazines.csv"
and then pass that variable to read_csv() (from the readr package).
dat <- read_csv(url)
You can do this in one step if you like:
read_csv("http://gattonweb.uky.edu/sheather/book/docs/datasets/magazines.csv")
41.2 Import an Excel file (.xls or .xlsx) from the internet
First, load the readxl package, which reads Excel files:
library(readxl)
The data files for this tutorial were obtained from: https://beanumber.github.io/sds192/lab-import.html#data_from_an_excel_file
Unlike with a CSV file, to import an .xls or .xlsx file from the internet, you first need to download it locally.
Note: The folder you want to save the file to has to exist! If it doesn’t, you will get an error.
You can create the folder in one of three ways:
- Create it directly in Finder/Windows Explorer
- Use the buttons in the Files tab in RStudio
- Use the dir.create() function:
if (!dir.exists(here::here("participation", "data"))) {
  dir.create(here::here("participation", "data"), recursive = TRUE)
}
Next, download the file.
Store the URL in a new object called xls_url, and then use download.file() to save the file to a specified path.
xls_url <- "http://gattonweb.uky.edu/sheather/book/docs/datasets/GreatestGivers.xls"
download.file(
  xls_url,
  here::here("participation", "data", "some_file.xls"),
  mode = "wb"
)
NOTE: Don’t assign the result of download.file() to an object. It returns a status code (0 on success), not the contents of the file.
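NOTE: The mode = "wb" argument at the end is really important if you are on Windows. It tells download.file() to write the file in binary mode.
If you omit it, you will probably get a message about downloading a corrupt file.
More details about this behavior can be found in the help page for download.file() (run ?download.file).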
Naming a file “some_file” is extremely bad practice, because it makes files hard to keep track of.
You should always give it a more descriptive name.
It’s also a good idea to avoid spaces in filenames.
You should come up with a system for naming your files and use it consistently.
My file names look like this: progdata_example-dataset_2021-03-02_bmw.csv
It is often a good idea to give the file the same name as the original file, or a similar one, unless the original name is not descriptive.
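A naming system is easier to apply consistently if a small function builds the names for you. The sketch below assumes a pattern of project, description, date, and initials separated by underscores, matching the example file name above. The function make_file_name() and its arguments are hypothetical, not part of any package.

```r
# Hypothetical helper: builds "<project>_<description>_<date>_<initials>.<ext>"
make_file_name <- function(project, description, initials, ext,
                           date = Sys.Date()) {
  # Replace runs of whitespace with hyphens so the name contains no spaces
  description <- gsub("\\s+", "-", trimws(description))
  # Format the date as YYYY-MM-DD so that file names sort chronologically
  date <- format(as.Date(date), "%Y-%m-%d")
  paste0(paste(project, description, date, initials, sep = "_"), ".", ext)
}

make_file_name("progdata", "example dataset", "bmw", "csv", date = "2021-03-02")
#> [1] "progdata_example-dataset_2021-03-02_bmw.csv"
```

Leaving out the date argument uses today's date, which is usually what you want when saving a fresh download.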
There’s a handy trick to extract the filename from the URL:
file_name <- basename(xls_url)
download.file(
  xls_url,
  here::here("participation", "data", file_name),
  mode = "wb"
)
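To see what the basename() trick returns, here is a minimal sketch. The second URL is a made-up example (not from this tutorial) showing one case where the result may need cleaning: basename() only strips the path, so a query string after the file name is kept.

```r
xls_url <- "http://gattonweb.uky.edu/sheather/book/docs/datasets/GreatestGivers.xls"

# basename() keeps everything after the last "/"
basename(xls_url)
#> [1] "GreatestGivers.xls"

# Hypothetical URL with a query string after the file name
query_url <- "https://example.com/files/report.xlsx?download=1"

# Drop anything from the first "?" or "#" onward before using it as a file name
clean_name <- sub("[?#].*$", "", basename(query_url))
clean_name
#> [1] "report.xlsx"
```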
Now we can import the file:
dat <- read_excel(
  here::here("participation", "data", file_name)
)
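If you import Excel files from the web often, the steps in this section (create the folder, download the file, read it) could be wrapped in one function. This is only a sketch: import_excel_from_url() is a hypothetical helper, not part of readxl or here, and it assumes that the last part of the URL is a usable file name.

```r
# Hypothetical helper: download an Excel file into a local folder
# (unless a copy is already there) and then read it.
import_excel_from_url <- function(url, dest_dir, ...) {
  # Make sure the destination folder exists before downloading
  if (!dir.exists(dest_dir)) {
    dir.create(dest_dir, recursive = TRUE)
  }
  dest_file <- file.path(dest_dir, basename(url))
  # Skip the download when a local copy already exists
  if (!file.exists(dest_file)) {
    # mode = "wb" writes the file in binary mode (important on Windows)
    download.file(url, dest_file, mode = "wb")
  }
  # Extra arguments (for example, sheet = 2) are passed on to read_excel()
  readxl::read_excel(dest_file, ...)
}

# Example call (needs an internet connection the first time it runs):
# dat <- import_excel_from_url(
#   "http://gattonweb.uky.edu/sheather/book/docs/datasets/GreatestGivers.xls",
#   here::here("participation", "data")
# )
```

Skipping the download when the file already exists keeps repeated runs fast, but it also means you have to delete the local copy to pick up a changed file on the server.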