read_dataset() is the function in our R package, econdatar, that downloads our data. You can view the documentation below by running ?read_dataset in R.
Our recommendation is to use tidy data rather than lists (lists are currently the default). Tidy data can be either combined or relational; relational output keeps the metadata table separate from the data table, which can be preferable for size reasons because series metadata is not repeated on every row. Tidy data can also be either wide (a separate column for each series) or long (one row per observation). Long is usually better for data analysts.
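To make the wide/long distinction concrete without needing an EconData account, here is a minimal sketch that builds a small wide table with made-up series codes and values (shaped like the wide tidy output described under Value) and converts it to long format and back using data.table's melt() and dcast(). The series names and numbers are hypothetical.

```r
library(data.table)

# Hypothetical toy table shaped like wide tidy output:
# one date column plus one column per series (codes and values are made up).
wide <- data.table(
  date = as.Date(c("2024-01-01", "2024-02-01")),
  SERIES_A = c(1.5, 1.7),
  SERIES_B = c(10, 12)
)

# Wide -> long: one row per (date, code) pair, with code/date/value columns
# similar to the long format described for wide = FALSE.
long <- melt(wide, id.vars = "date", variable.name = "code",
             value.name = "value", variable.factor = FALSE)

# Long -> wide again: one column per series code.
wide_again <- dcast(long, date ~ code, value.var = "value")
```

The long table has four rows (two dates times two series), which is why long data grows quickly with the number of series but is easier to filter, group and plot.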
read_dataset {econdatar}    R Documentation
read_dataset
Description
Returns the data for the given data set, ECONDATA:id(version), as a list or as tidy data.tables. Available data sets can be looked up using read_database() or on the web platform (https://www.econdata.co.za/). Tidying can be done directly within read_dataset() or ex post using tidy_data().
Usage
read_dataset(id, tidy = FALSE, ...)
## S3 method for class 'eds_dataset'
tidy_data(x, wide = TRUE, ...)
Arguments
| Argument | Description |
|---|---|
| id | Data set identifier. |
| x | A raw API return object to be tidied. Tidying can also be done directly in read_dataset() by setting tidy = TRUE. |
| wide | Whether the tidied data should be returned in wide (TRUE) or long (FALSE) format. |
| ... | Further optional arguments: |
Details
An EconData account (https://www.econdata.co.za/) is required to use this function. The user must provide their credentials either through the function arguments or by setting the ECONDATA_CREDENTIALS environment variable using the syntax "username;password", e.g. Sys.setenv(ECONDATA_CREDENTIALS="username;password"). If credentials are not supplied by either of these methods, a GUI dialog will prompt the user for them.
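A quick way to catch a malformed credentials string before calling read_dataset() is to check it against the documented "username;password" syntax. The helper below is hypothetical and not part of econdatar; it only checks the format, not whether the credentials are accepted by EconData, and it assumes neither the username nor the password contains a semicolon.

```r
# Hypothetical helper, not part of econdatar: returns TRUE if
# ECONDATA_CREDENTIALS follows the "username;password" syntax.
# Assumes neither part contains a ";" itself.
has_econdata_credentials <- function() {
  creds <- Sys.getenv("ECONDATA_CREDENTIALS", unset = "")
  parts <- strsplit(creds, ";", fixed = TRUE)[[1]]
  length(parts) == 2 && all(nzchar(parts))
}

Sys.setenv(ECONDATA_CREDENTIALS = "username;password")
has_econdata_credentials()  # TRUE
```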
Value
If tidy = FALSE, a list of data frames is returned, where the names of the list are the EconData series codes, and each data frame has a single column named ‘OBS_VALUE’ containing the data, with the corresponding dates attached as rownames. Each data frame further has a "metadata" attribute providing information about the series. The entire list of data frames also has a "metadata" attribute, providing information about the dataset. If multiple datasets (or versions of a dataset, if version is specified as ‘all’) are being queried, a list of such lists is returned.
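Because the dates live in the rownames, a common first step with the list format is to turn each series into an ordinary data frame with a Date column. The sketch below builds a small mock object mirroring the structure described above (the series code and metadata fields are made up) and assumes the rownames are ISO-formatted dates ("YYYY-MM-DD"), which may not hold for every frequency.

```r
# Hypothetical mock of the tidy = FALSE structure for one series:
# an 'OBS_VALUE' column, dates as rownames, and a "metadata" attribute.
# The metadata fields used here are made up for illustration.
series <- data.frame(OBS_VALUE = c(1.5, 1.7),
                     row.names = c("2024-01-01", "2024-02-01"))
attr(series, "metadata") <- list(FREQ = "M")
raw <- list(SERIES_A = series)
attr(raw, "metadata") <- list(id = "EXAMPLE")

# Move the rownames into a proper Date column (assumes ISO date strings).
to_frame <- function(df) {
  data.frame(date = as.Date(rownames(df)), value = df$OBS_VALUE)
}

# lapply keeps the series codes as list names.
frames <- lapply(raw, to_frame)
```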
If tidy = TRUE and wide = TRUE (the default), a single data.table is returned where the first column is the date and the remaining columns are series named by their EconData codes. Each series has two attributes: "label" provides a variable label combining important metadata from the "metadata" attribute in the non-tidy format, and "source.code" gives the series code assigned by the original data provider. The table has the same dataset-level "metadata" attribute as the list of data frames returned when tidy = FALSE. If multiple datasets (or versions of a dataset, if version is specified as ‘all’) are being queried, a list of such data.tables is returned.
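The per-series "label" and "source.code" attributes can be collected into a lookup table for reference. This is a sketch only: a base data.frame with made-up values stands in for the returned data.table, and the helper series_info() is hypothetical. Column access with [[ works the same way on a data.table, since both are lists of columns.

```r
# Hypothetical stand-in for wide tidy output: a date column and one series
# column carrying "label" and "source.code" attributes (values are made up).
wide <- data.frame(date = as.Date(c("2024-01-01", "2024-02-01")),
                   SERIES_A = c(1.5, 1.7))
attr(wide$SERIES_A, "label") <- "Example series label"
attr(wide$SERIES_A, "source.code") <- "SRC001"

# Hypothetical helper: one row per series column with its label and source code.
# Assumes every non-date column carries both attributes.
series_info <- function(tbl) {
  codes <- setdiff(names(tbl), "date")
  get_attr <- function(which) {
    vapply(codes, function(nm) attr(tbl[[nm]], which), character(1),
           USE.NAMES = FALSE)
  }
  data.frame(code = codes, label = get_attr("label"),
             source.code = get_attr("source.code"),
             stringsAsFactors = FALSE)
}

info <- series_info(wide)
```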
If tidy = TRUE, wide = FALSE and compact = FALSE (the default), a named list of two data.tables is returned. The first, "data", has columns ‘code’, ‘date’ and ‘value’ providing the data in long format. The second, "metadata", provides dataset- and series-level metadata, with one row for each series. If compact = TRUE, these two tables are combined, and all repetitive content is converted to factors for more efficient storage. If multiple datasets (or versions of a dataset, if version is specified as ‘all’) are being queried, compact = FALSE gives a nested list, whereas compact = TRUE binds everything together into a single long frame. In general, if wide = FALSE, no attributes are attached to the tables or to the columns in them.
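The relational long output can be combined ex post with a data.table join. The sketch below uses small hypothetical tables shaped as described above and joins on ‘code’, the key named in this section; note that the Examples section joins on a column called "data_key", so check which key column your version of the package actually returns. The final step mimics, rather than reproduces, the factor conversion described for compact = TRUE.

```r
library(data.table)

# Hypothetical toy tables shaped like the relational long output:
# "data" has code/date/value, "metadata" has one row per series.
# The metadata column 'label' is made up for illustration.
long <- list(
  data = data.table(code = c("SERIES_A", "SERIES_A", "SERIES_B"),
                    date = as.Date(c("2024-01-01", "2024-02-01", "2024-01-01")),
                    value = c(1.5, 1.7, 10)),
  metadata = data.table(code = c("SERIES_A", "SERIES_B"),
                        label = c("Series A", "Series B"))
)

# Relational -> combined: attach series metadata to every observation.
combined <- with(long, metadata[data, on = "code"])

# Store repetitive character columns as factors to save memory.
chr_cols <- names(combined)[vapply(combined, is.character, logical(1))]
combined[, (chr_cols) := lapply(.SD, factor), .SDcols = chr_cols]
```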
Examples
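library(econdatar)
library(data.table) # provides the X[i, on = ...] join syntax used below
# Sys.setenv(ECONDATA_CREDENTIALS="username;password") # replace with your own credentials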
# for ids/versions see: https://www.econdata.co.za/
# Electricity Generated
ELECTRICITY <- read_dataset(id = "ELECTRICITY")
ELECTRICITY_WIDE <- tidy_data(ELECTRICITY) # Or: read_dataset("ELECTRICITY", tidy = TRUE)
ELECTRICITY_LONG <- tidy_data(ELECTRICITY, wide = FALSE)
# Same as tidy_data(ELECTRICITY, wide = FALSE, combine = TRUE):
with(ELECTRICITY_LONG, metadata[data, on = "data_key"])
# CPI Analytical Series: Different Revisions
CPI_ANL <- read_dataset(id = "CPI_ANL_SERIES", version = "all")
CPI_ANL_WIDE <- tidy_data(CPI_ANL)
CPI_ANL_LONG <- tidy_data(CPI_ANL, wide = FALSE, combine = TRUE)
CPI_ANL_ALLMETA <- tidy_data(CPI_ANL, wide = FALSE, allmeta = TRUE) # v2.0 has some 0-obs series
# Can query a specific version by adding e.g. version = "2.0.0" to the call
# Returns the 5-10 year bond yields (daily averages) not yet contained in the latest release
# (particularly useful for daily data that is released monthly)
MARKET_RATES <- read_dataset(id = "MARKET_RATES", series_key = "CMJD003.B.A", release = "unreleased")