Options when downloading data in our R package

read_dataset() is the function in our R package, econdatar, that downloads data. The documentation reproduced below is also available by running ?read_dataset in R.

We recommend requesting tidy data (tidy = TRUE) rather than lists, even though lists are currently the default. Tidy data can be wide (wide = TRUE, with a separate column for each variable) or long (wide = FALSE); long is usually better for data analysts. Long data can in turn be combined (combine = TRUE, a single table) or relational (combine = FALSE, where the metadata table is kept separate from the data table). Relational output is smaller, so it is the better choice when size matters.
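As a minimal sketch of the recommendation above, the call below requests tidy, long, relational output. It uses only arguments documented further down (tidy, wide, combine) and the "ELECTRICITY" id from the Examples section. The guard around the call is our own addition, so that the snippet does nothing on a machine without the package or without credentials.

```r
# Sketch only: the download runs only if econdatar is installed and
# ECONDATA_CREDENTIALS is set; otherwise `electricity` stays NULL.
electricity <- NULL
can_query <- requireNamespace("econdatar", quietly = TRUE) &&
  nzchar(Sys.getenv("ECONDATA_CREDENTIALS"))

if (can_query) {
  # tidy = TRUE     -> data.tables instead of the default list
  # wide = FALSE    -> long format
  # combine = FALSE -> separate data and metadata tables (the default)
  electricity <- econdatar::read_dataset(
    id = "ELECTRICITY", tidy = TRUE, wide = FALSE, combine = FALSE
  )
  str(electricity, max.level = 1)
}
```

Switching to combine = TRUE should give a single long table instead, at the cost of repeating the metadata on every row.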

read_dataset {econdatar}    R Documentation

read_dataset

Description

Returns the data for the given data set, ECONDATA:id(version), as a list or as tidy data.tables. Available data sets can be looked up using read_database() or on the web platform (https://www.econdata.co.za/). Tidying can be done directly within read_dataset(), or afterwards using tidy_data().

Usage

read_dataset(id, tidy = FALSE, ...)

## S3 method for class 'eds_dataset'

tidy_data(x, wide = TRUE, ...)

Arguments

id

Data set identifier.

x

A raw API return object to be tidied. Can also be done directly in read_dataset() by setting tidy = TRUE. See tidy below for tidying options.

wide

Should the tidied data be returned in wide (TRUE, the default) or long (FALSE) format?

...

Further optional arguments:

agencyid character. Defaults to ECONDATA. Agency responsible for the metadata creation/maintenance.
version character. Version(s) of the data (different versions will have different metadata), or ‘all’ to return all available versions.
series_key character. A character vector specifying a subset of time series (see the web platform (export function) for details).
release character (optionally with format %Y-%m-%dT%H:%M:%S, to be coerced to a date/time). The release description, which will return the data associated with that release (if the given description matches an existing release); or a date/time which will return the data as it was at the given time; or ‘latest’ which will return the latest release; or ‘unreleased’ which will return any unreleased data (useful for data that is updated more often than it is released, e.g. daily data).
file character. File name for retrieving data sets stored as JSON data from disk (output of read_dataset()).
username character. Web username.
password character. Web password.
tidy logical. Return data and metadata in tidy data.tables (see Value), by passing the result through tidy_data(). If TRUE, read_dataset()/tidy_data() admit the following additional arguments:
wide logical, default: TRUE. If TRUE, returns data in a column-based (wide) format, with "label" and "source_identifier" attributes on columns (when available) and an overall "metadata" attribute on the table; if FALSE, a long format is returned. See Value.
codelabel logical, default: FALSE. If wide = TRUE, setting codelabel = TRUE uses the data key to generate the "label", when available.
combine logical, default: FALSE. If wide = FALSE, setting combine = TRUE will combine all data and metadata into a single long table, whereas the default FALSE will return data and metadata in separate tables, for more efficient storage.
origmeta logical, default: FALSE. If wide = FALSE, setting origmeta = TRUE will combine all metadata fields attached to the series in the dataset as they are. The default is to construct a standardized set of metadata variables, and then drop those not observed. See also allmeta.
allmeta logical, default: FALSE. If wide = FALSE, setting allmeta = TRUE always returns the full set of metadata fields, regardless of whether they are recorded for the given dataset. It is also possible that there are series with zero observations in a dataset. Such series are dropped in tidy output, but if combine = FALSE, allmeta = TRUE retains their metadata in the metadata table.
prettymeta logical, default: TRUE. Attempts to make the returned metadata more human-readable by replacing each code category and enumeration with its name. It is advisable to leave this set to TRUE; where speed is paramount, you may want to set it to FALSE. If multiple datasets are being queried, this option is automatically set to FALSE.
release logical, default: FALSE. TRUE allows you to apply tidy_data() to objects returned by read_release(). All other flags to tidy_data() are ignored.
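
The character form of the release argument accepts a date/time in the format %Y-%m-%dT%H:%M:%S. A small sketch of building such a string in base R follows. The timestamp is made up, and the choice of UTC is our assumption, since the documentation above does not say which time zone the service expects.

```r
# Hypothetical point in time; replace with the moment you want the data "as at".
as_of <- as.POSIXct("2024-03-31 17:00:00", tz = "UTC")

# Format it the way the release argument describes: %Y-%m-%dT%H:%M:%S
release_string <- format(as_of, "%Y-%m-%dT%H:%M:%S")
release_string

# It could then be passed on, e.g.:
# read_dataset(id = "MARKET_RATES", release = release_string)
```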

Details

An EconData account (https://www.econdata.co.za/) is required to use this function. The user must provide their credentials either through the function arguments, or by setting the ECONDATA_CREDENTIALS environment variable using the syntax "username;password", e.g. Sys.setenv(ECONDATA_CREDENTIALS="username;password"). If credentials are not supplied by the aforementioned methods a GUI dialog will prompt the user for credentials.
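
The environment-variable route described above can be sketched as follows. The credentials are placeholders, and the split into two parts only illustrates the "username;password" layout; it is not the package's own parsing code.

```r
# Placeholder credentials; replace with your own EconData login.
Sys.setenv(ECONDATA_CREDENTIALS = "username;password")

# Illustration of the layout: everything before the semicolon is the
# username, everything after it is the password.
parts <- strsplit(Sys.getenv("ECONDATA_CREDENTIALS"), ";", fixed = TRUE)[[1]]
credentials <- list(username = parts[1], password = parts[2])
names(credentials)
```

Setting the variable once per session keeps credentials out of individual read_dataset() calls.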

Value

If tidy = FALSE, a list of data frames is returned, where the names of the list are the EconData series codes, and each data frame has a single column named ‘OBS_VALUE’ containing the data, with corresponding dates attached as rownames. Each data frame further has a "metadata" attribute providing information about the series. The entire list of data frames also has a "metadata" attribute, providing information about the dataset. If multiple datasets (or versions of a dataset if version is specified as ‘all’) are being queried, a list of such lists is returned.
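
To make the list structure described above concrete, here is a toy stand-in built by hand in base R. The series codes, values and metadata field names (LABEL, NAME) are invented for illustration; real output will carry EconData's own codes and fields.

```r
# Toy stand-in for the tidy = FALSE return value (made-up numbers and codes).
make_series <- function(values, dates, meta) {
  df <- data.frame(OBS_VALUE = values, row.names = dates)
  attr(df, "metadata") <- meta
  df
}

dates <- c("2023-01-01", "2023-02-01", "2023-03-01")
toy <- list(
  SERIES_A = make_series(c(1.0, 1.5, 2.0), dates, list(LABEL = "Toy series A")),
  SERIES_B = make_series(c(10, 11, 12), dates, list(LABEL = "Toy series B"))
)
attr(toy, "metadata") <- list(NAME = "Toy dataset")

# Values and dates of one series (dates live in the row names)
toy$SERIES_A$OBS_VALUE
as.Date(rownames(toy$SERIES_A))

# Series-level and dataset-level metadata
attr(toy$SERIES_A, "metadata")$LABEL
attr(toy, "metadata")$NAME

# Collect all series into one wide data frame by hand
wide <- data.frame(date = as.Date(rownames(toy[[1]])),
                   lapply(toy, function(s) s$OBS_VALUE))
```

The last step assumes every series shares the same dates, which need not hold for real data; tidy_data() is the intended way to do this conversion.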

If tidy = TRUE and wide = TRUE (the default), a single data.table is returned where the first column is the date, and the remaining columns are series named by their EconData codes. Each series has two attributes: "label" provides a variable label combining important metadata from the "metadata" attribute in the non-tidy format, and "source.code" gives the series code assigned by the original data provider. The table has the same dataset-level "metadata" attribute as the list of data frames if tidy = FALSE. If multiple datasets (or versions of a dataset if version is specified as ‘all’) are being queried, a list of such data.tables is returned.
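
The column attributes on a wide table can be read with attr(). The sketch below uses a hand-made base data.frame as a stand-in for the returned data.table, with invented values and labels, and shows one way to collect the "label" of every series column.

```r
# Toy wide table (base data.frame standing in for a data.table; made-up values).
wide <- data.frame(
  date = as.Date(c("2023-01-01", "2023-02-01")),
  SERIES_A = c(1.0, 1.5),
  SERIES_B = c(10, 11)
)
attr(wide$SERIES_A, "label") <- "Toy series A"
attr(wide$SERIES_B, "label") <- "Toy series B"
attr(wide, "metadata") <- list(NAME = "Toy dataset")

# Named character vector of column labels, skipping the date column
labels <- vapply(wide[-1], function(col) attr(col, "label"), character(1))
labels
```

On real output, a column without a label would return NULL from attr() and make vapply() fail, so a check for missing labels may be needed.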

If tidy = TRUE and wide = FALSE and combine = FALSE (the default), a named list of two data.tables is returned. The first, "data", has columns ‘code’, ‘date’ and ‘value’ providing the data in a long format. The second, "metadata", provides dataset and series-level metadata, with one row for each series. If combine = TRUE, these two datasets are combined, where all repetitive content is converted to factors for more efficient storage. If multiple datasets (or versions of a dataset if version is specified as ‘all’) are being queried, combine = FALSE gives a nested list, whereas combine = TRUE binds everything together to a single long frame. In general, if wide = FALSE, no attributes are attached to the tables or columns in the tables.
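
The relationship between the relational, combined and wide layouts can be sketched with toy tables in base R. All values and labels are invented. The key column is called data_key here, following the join in the Examples section, while the text above calls it ‘code’, so check names() on your own output before joining.

```r
# Toy long-format output: separate data and metadata tables (made-up content).
long <- list(
  data = data.frame(
    data_key = rep(c("SERIES_A", "SERIES_B"), each = 2),
    date = rep(as.Date(c("2023-01-01", "2023-02-01")), times = 2),
    value = c(1.0, 1.5, 10, 11)
  ),
  metadata = data.frame(
    data_key = c("SERIES_A", "SERIES_B"),
    label = c("Toy series A", "Toy series B")
  )
)

# Relational -> combined: attach the metadata to every observation
combined <- merge(long$data, long$metadata, by = "data_key")

# Long -> wide: one column per series
wide <- reshape(long$data, idvar = "date", timevar = "data_key",
                direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))
```

The combined table repeats each label once per observation, which is why the relational layout is the more compact of the two.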

Examples

# library(econdatar)
# Sys.setenv(ECONDATA_CREDENTIALS="username;password")
# for ids/versions see: https://www.econdata.co.za/

# Electricity Generated
ELECTRICITY <- read_dataset(id = "ELECTRICITY")
ELECTRICITY_WIDE <- tidy_data(ELECTRICITY) # Or: read_dataset("ELECTRICITY", tidy = TRUE)
ELECTRICITY_LONG <- tidy_data(ELECTRICITY, wide = FALSE)
# Same as tidy_data(ELECTRICITY, wide = FALSE, combine = TRUE):
with(ELECTRICITY_LONG, metadata[data, on = "data_key"])

# CPI Analytical Series: Different Revisions
CPI_ANL <- read_dataset(id = "CPI_ANL_SERIES", version = "all")
CPI_ANL_WIDE <- tidy_data(CPI_ANL)
CPI_ANL_LONG <- tidy_data(CPI_ANL, wide = FALSE, combine = TRUE)
CPI_ANL_ALLMETA <- tidy_data(CPI_ANL, wide = FALSE, allmeta = TRUE) # v2.0 has some 0-obs series

# Can query a specific version by adding e.g. version = "2.0.0" to the call

# Returns 5-10 years (daily average bond yields) not yet contained in the latest release
# (particularly useful for daily data that is released monthly)
MARKET_RATES <- read_dataset(id = "MARKET_RATES", series_key = "CMJD003.B.A", release = "unreleased")

[Package econdatar version 3.0.2]