Part 40 Reading data from disk
The CSV file we just saved to disk can be read back into R by passing its path to read_csv():
library(readr) # provides read_csv(); here::here() builds the file path
dat <- read_csv(here::here("participation", "data", "gap_asia_2007.csv"))
dat
Notice that the imported tibble is the same as the original one. read_csv() was smart enough to detect the column types on its own. This won't always be true, so it's worth checking! In particular, be on the lookout for any columns imported as col_character() when you expected numbers, dates, or factors.
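One way to check is readr's spec(), which returns the column specification that read_csv() guessed. The sketch below uses a small made-up file written to a temporary location (the file contents and the stray "n/a" entry are hypothetical) so it runs on its own. Because "n/a" is not in the default list of missing values, the whole pop column falls back to character.

```r
library(readr)

# Hypothetical toy data: "pop" has a stray text entry, so it cannot be parsed as a number
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "country,year,pop",
  "Afghanistan,2007,31889923",
  "Bahrain,2007,n/a"
), csv_file)

dat <- read_csv(csv_file, show_col_types = FALSE)

# spec() reports the column specification read_csv() guessed
spec(dat)

# Quick check: which columns were imported as character?
char_cols <- names(dat)[vapply(dat, is.character, logical(1))]
char_cols
```

Here char_cols should flag pop alongside country, which is the cue to go back and fix the file or the import options.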
The read_csv() function has many additional options, including the ability to specify column types (e.g., is “1990” a year or a number?), skip columns, skip rows, rename columns on import, trim whitespace, and more.
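A few of these options can be sketched together, assuming readr 2.x (where col_select is available). The file below is hypothetical: it has two note lines above the header, an unwanted notes column, and a padded country name.

```r
library(readr)

# Hypothetical file with two lines of notes above the header row
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "# Source: made-up example",
  "# Collected 2007",
  "country,continent,lifeExp,notes",
  "  Japan  ,Asia,82.6,ok",
  "Nepal,Asia,63.8,ok"
), csv_file)

dat <- read_csv(
  csv_file,
  skip = 2,                                    # ignore the two note lines
  col_select = c(country, continent, lifeExp), # drop the notes column
  trim_ws = TRUE,                              # strip leading/trailing spaces (already the default)
  show_col_types = FALSE
)
dat
```

Skipping and selecting at import time keeps the unwanted rows and columns from ever entering the tibble, so there is nothing to clean up afterwards.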
To control the column types, pass a cols() specification to the col_types argument:
dat <- read_csv(
  here::here("participation", "data", "gap_asia_2007.csv"),
  col_types = cols(
    country = col_factor(),
    continent = col_factor(),
    year = col_date(format = "%Y"),
    .default = col_double() # all other columns as numeric (double)
  )
)
dat
By default, read_csv() treats every column as col_guess() and infers its type from the data, but it's better to be explicit.
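readr also accepts a compact string form of col_types, with one letter per column in order (here f = factor, i = integer, d = double). The sketch below, using a hypothetical two-row file, reads the same data both ways and compares the resulting column classes. It treats year as an integer, which is one possible answer to the "year or number?" question; the col_date(format = "%Y") choice above is another.

```r
library(readr)

# Hypothetical two-row file with gapminder-style columns
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "country,continent,year,lifeExp",
  "Japan,Asia,2007,82.603",
  "Nepal,Asia,2007,63.785"
), csv_file)

# Full specification with cols()
dat_cols <- read_csv(
  csv_file,
  col_types = cols(
    country = col_factor(),
    continent = col_factor(),
    year = col_integer(),   # treat the year as a whole number here
    lifeExp = col_double()
  )
)

# The same specification in compact string form
dat_str <- read_csv(csv_file, col_types = "ffid")

# Both should report factor, factor, integer, numeric
sapply(dat_cols, class)
sapply(dat_str, class)
```

The cols() form is longer but self-documenting and robust to column reordering; the string form is handy for quick interactive work.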
Another important option is the na argument, which specifies which values to treat as NA on import. By default, read_csv() treats blank cells (i.e., "") and cells containing "NA" as missing. You might need to change this (e.g., if missing values are entered as -99). Note that readxl::read_excel() by default only has na = c("") (no "NA")!
dat <- read_csv(
  here::here("participation", "data", "gap_asia_2007.csv"),
  col_types = cols(
    country = col_factor(),
    continent = col_factor(),
    year = col_date(format = "%Y"),
    .default = col_double() # all other columns as numeric (double)
  ),
  na = c("", "NA", "-99", "No response") # values to treat as missing
)
dat
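To see the effect of na in isolation, the sketch below reads a small hypothetical survey file twice: once with the default na and once with a custom vector. The file contents are made up for illustration. Note that the na values apply to every column.

```r
library(readr)

# Hypothetical survey file where missing values were coded several ways
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "country,pop,answer",
  "Japan,127467972,yes",
  "Nepal,-99,No response",
  "Oman,,NA"
), csv_file)

# Default na = c("", "NA"): -99 and "No response" survive as real values
dat_default <- read_csv(csv_file, show_col_types = FALSE)

# Custom na vector: every listed string becomes NA, in every column
dat_custom <- read_csv(
  csv_file,
  na = c("", "NA", "-99", "No response"),
  show_col_types = FALSE
)

# Count missing values per column under each setting
colSums(is.na(dat_default))
colSums(is.na(dat_custom))
```

With the default settings, -99 would silently drag down any mean of pop; declaring it as missing at import time avoids that.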