Options when downloading data in our R package

read_dataset() is the function that downloads our data in our R package. You can find the following documentation by running ?read_dataset in R. Please see our blog post on the functionality.

read_dataset {econdatar}    R Documentation

read_dataset

Description

Returns the data for the given data set, identified as ECONDATA:id(version), either as a list or as tidy data.tables. Available data sets can be looked up using read_database() or on the web platform. Tidying can be done directly within read_dataset(), or ex post using tidy_data().

Usage

read_dataset(id, tidy = TRUE, ...)

## S3 method for class 'eds_dataset'

tidy_data(x, wide = TRUE, ...)

Arguments

id

Data set identifier.

x

A raw API return object to be tidied. Tidying can also be done directly in read_dataset() by setting tidy = TRUE. See tidy below for tidying options.

wide

Specifies whether the tidied data should be returned in wide or long format.

...

Further optional arguments:

agencyid character. Defaults to ECONDATA. Agency responsible for the metadata creation/maintenance.
version character. Version(s) of the data (different versions will have different metadata), or ‘all’ to return all available versions.
series_key character. A character vector specifying a subset of time series (see the export function on the web platform for an example). Select several values within a dimension group by joining them with a + character, and separate groups with a . character (leave a group blank to return all of its values). For example, series_key = "MIN001+MIN002..S".
start_date any object that can be coerced with as.Date(). The first observation returned by the query.
end_date any object that can be coerced with as.Date(). The last observation returned by the query.
release character (optionally with format %Y-%m-%dT%H:%M:%S, to be coerced to a date/time). One of: a release description, which returns the data associated with that release (if the description matches an existing release); a date/time, which returns the data as it was at the given time; ‘latest’, which returns the latest release; or ‘unreleased’, which returns any unreleased data (useful for data that is updated more often than it is released, e.g. daily data).
file character. File name for retrieving data sets stored as JSON data from disk (output of read_dataset()).
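
The series_key and release values described above are plain strings, so they can be assembled in base R before the query is made. The following is a minimal sketch: build_series_key() is a hypothetical helper (not part of econdatar), and the timestamp is an arbitrary illustration rather than a known release time.

```r
# Hypothetical helper (not part of econdatar): each argument is one dimension
# group; values within a group are joined with "+", groups with ".".
# An empty group (character(0)) yields a blank, i.e. "return all".
build_series_key <- function(...) {
  groups <- list(...)
  parts <- vapply(groups, function(values) paste(values, collapse = "+"), character(1))
  paste(parts, collapse = ".")
}

series_key <- build_series_key(c("MIN001", "MIN002"), character(0), "S")
print(series_key)
# [1] "MIN001+MIN002..S"

# A date/time for the release argument, written in the documented
# %Y-%m-%dT%H:%M:%S format (the date itself is only an illustration).
release_time <- as.POSIXct("2023-06-30 12:00:00", tz = "UTC")
release_arg <- format(release_time, "%Y-%m-%dT%H:%M:%S")
print(release_arg)
# [1] "2023-06-30T12:00:00"

# The strings could then be passed on (requires an EconData account):
# read_dataset(id = "MINING", series_key = series_key, release = release_arg)
```
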
tidy

logical. Return data and metadata in tidy data.tables (see Value), by passing the result through tidy_data(). If TRUE, read_dataset()/tidy_data() admit the following additional arguments:

wide logical, default: TRUE. If TRUE, returns data in a column-based (wide) format, with "label" and "source_identifier" attributes on the columns (when available) and an overall "metadata" attribute on the table; if FALSE, a long format is returned. See Value.
prettify logical, default: TRUE. Attempts to make the returned metadata more human-readable by replacing each code category and enumeration with its name. It is advisable to leave this set to TRUE, although where speed is paramount you may want to set it to FALSE. If multiple datasets are being queried, this option is automatically set to FALSE.
combine logical, default: FALSE. If wide = FALSE, setting combine = TRUE will combine all data and metadata into a single long table, whereas the default FALSE will return data and metadata in separate tables, for more efficient storage.

Details

An EconData account is required to use this function. The user must provide an API token, which can be found on the Account page of the online portal; a GUI dialog will prompt the user for it. Where client credentials are available, they can also be supplied by setting the ECONDATA_CREDENTIALS environment variable using the syntax "client_id;client_secret", e.g. Sys.setenv(ECONDATA_CREDENTIALS = "client_id;client_secret").
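
As a minimal sketch of the environment-variable route, the snippet below builds the "client_id;client_secret" string from two placeholder values and checks that it has the expected shape. The values are placeholders, not real credentials, and the check is only an illustration; it says nothing about how the package itself validates them.

```r
# Placeholder values: substitute the client ID and secret issued to you.
client_id <- "my_client_id"
client_secret <- "my_client_secret"

# Set the variable for the current R session only.
Sys.setenv(ECONDATA_CREDENTIALS = paste(client_id, client_secret, sep = ";"))

# Sanity check: the value should split into exactly two non-empty parts.
creds <- strsplit(Sys.getenv("ECONDATA_CREDENTIALS"), ";", fixed = TRUE)[[1]]
credentials_ok <- length(creds) == 2 && all(nzchar(creds))
print(credentials_ok)
# [1] TRUE
```

To avoid keeping secrets in scripts, the same variable can be defined in a user-level .Renviron file, which R reads at startup.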

Value

If tidy = FALSE, a list of data frames is returned, where the names of the list are the EconData series codes, and each data frame has a single column named ‘OBS_VALUE’ containing the data, with corresponding dates attached as rownames. Each data frame further has a "metadata" attribute providing information about the series. The entire list of data frames also has a "metadata" attribute, providing information about the dataset. If multiple datasets (or versions of a dataset if version is specified as ‘all’) are being queried, a list of such lists is returned.
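
To illustrate how such a list might be handled, the sketch below builds a small stand-in object with the structure described above and reshapes it into a long data frame. Nothing here calls the API: the values, the metadata field names (LABEL, NAME) and the ISO date format of the rownames are assumptions made for the example.

```r
# Hypothetical stand-in for read_dataset(id = "MINING", tidy = FALSE).
# All values and metadata fields are made up for illustration.
make_series <- function(values, dates, label) {
  df <- data.frame(OBS_VALUE = values, row.names = dates)
  attr(df, "metadata") <- list(LABEL = label)
  df
}
dates <- c("2020-01-01", "2020-02-01")
raw <- list(
  "MIN001.I.S" = make_series(c(100.2, 101.5), dates, "Illustrative series A"),
  "MIN002.I.S" = make_series(c(98.7, 99.1), dates, "Illustrative series B")
)
attr(raw, "metadata") <- list(NAME = "Illustrative dataset")

# Series-level and dataset-level metadata are stored as attributes.
series_meta <- attr(raw[["MIN001.I.S"]], "metadata")
dataset_meta <- attr(raw, "metadata")

# Stack the series into one long data frame, taking dates from the rownames.
long <- do.call(rbind, lapply(names(raw), function(code) {
  s <- raw[[code]]
  data.frame(
    series_key = code,
    time_period = as.Date(rownames(s)),
    obs_value = s$OBS_VALUE,
    stringsAsFactors = FALSE
  )
}))
print(long)
```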

If tidy = TRUE and wide = TRUE (the default), a single data.table is returned where the first column is the date, and the remaining columns are series named by their EconData codes. Each series has two attributes: "label" provides a variable label combining important metadata from the "metadata" attribute in the non-tidy format, and "source_identifier" gives the series code assigned by the original data provider where available. The table has the same dataset-level "metadata" attribute as the list of data frames if tidy = FALSE. If multiple datasets are being queried, a list of such data.tables is returned.
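
The column attributes can be read with attr(). The sketch below uses a plain data.frame as a stand-in for the returned data.table (column attributes are accessed the same way); the date column name time_period, the values, the labels and the source identifier are all assumptions made for the example.

```r
# Hypothetical stand-in for the wide table; no API call is made.
wide <- data.frame(
  time_period = as.Date(c("2020-01-01", "2020-02-01")),
  "MIN001.I.S" = c(100.2, 101.5),
  "MIN002.I.S" = c(98.7, 99.1),
  check.names = FALSE
)
attr(wide[["MIN001.I.S"]], "label") <- "Illustrative label A"
attr(wide[["MIN002.I.S"]], "label") <- "Illustrative label B"
# Only one series is given a provider code, mimicking "where available".
attr(wide[["MIN001.I.S"]], "source_identifier") <- "SRC0001"

# Collect one attribute across all series columns, with NA where it is absent.
get_column_attr <- function(tbl, name) {
  series_cols <- names(tbl)[-1]  # first column is the date
  vapply(series_cols, function(col) {
    value <- attr(tbl[[col]], name)
    if (is.null(value)) NA_character_ else value
  }, character(1))
}

labels <- get_column_attr(wide, "label")
source_ids <- get_column_attr(wide, "source_identifier")
print(data.frame(label = labels, source_identifier = source_ids))
```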

If tidy = TRUE and wide = FALSE and combine = FALSE (the default), a named list of two data.tables is returned. The first, "data", has columns ‘series_key’, ‘time_period’ and ‘obs_value’ providing the data in a long format. The second, "metadata", provides dataset and series-level metadata, with one row for each series. If combine = TRUE, these two tables are combined, where all repetitive content is converted to factors for more efficient storage. If multiple datasets are being queried, combine = FALSE gives a nested list, whereas combine = TRUE binds everything together into a single long frame.
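
When the two tables are kept separate, metadata can be attached to the observations on demand. The sketch below does this with base merge() on plain data frames standing in for the returned data.tables. It assumes the metadata table carries a series_key column to join on and a unit column, neither of which is confirmed above, and it only approximates the idea behind combine = TRUE rather than reproducing the package's implementation.

```r
# Hypothetical stand-in for read_dataset(..., tidy = TRUE, wide = FALSE).
long <- list(
  data = data.frame(
    series_key = c("MIN001.I.S", "MIN001.I.S", "MIN002.I.S"),
    time_period = as.Date(c("2020-01-01", "2020-02-01", "2020-01-01")),
    obs_value = c(100.2, 101.5, 98.7),
    stringsAsFactors = FALSE
  ),
  metadata = data.frame(
    series_key = c("MIN001.I.S", "MIN002.I.S"),  # assumed join column
    unit = c("Index", "Index"),                  # assumed metadata field
    stringsAsFactors = FALSE
  )
)

# Attach the one-row-per-series metadata to every observation.
combined <- merge(long$data, long$metadata, by = "series_key")

# Repetitive text columns are stored more compactly as factors.
combined$series_key <- factor(combined$series_key)
combined$unit <- factor(combined$unit)
str(combined)
```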

Examples

# library(econdatar)

# Mining production and sales
read_dataset(id = "MINING")

# Tidy options
(MINING <- read_dataset(id = "MINING", tidy = FALSE))
# Same as: read_dataset(id = "MINING", tidy = TRUE)
tidy_data(MINING, wide = TRUE)
tidy_data(MINING, wide = FALSE)
tidy_data(MINING, wide = FALSE, combine = TRUE, prettify = FALSE)

# Query all available versions, or a specific version such as "1.0.0"
read_dataset(id = "MINING", version = "all")
read_dataset(id = "MINING", version = "1.0.0")

# Using the series key
read_dataset(id = "MINING", series_key = "MIN001+MIN002..S")
read_dataset(id = "MINING", series_key = c("MIN001+MIN002..S", "MIN009.I.N"))

# Using start and end dates
read_dataset(id = "MINING",
             series_key = "MIN001.I.S",
             start_date = "2010-01-01",
             end_date = Sys.Date()-365)

# Returns daily average 5-10 year bond yields not yet contained in the latest release
# (particularly useful for daily data that is released monthly)
read_dataset(id = "MARKET_RATES",
             series_key = "CMJD003.B.A",
             release = "unreleased")

# library(tibble)
POP <- read_dataset(id = "POPULATION_DATA_REG",
                    series_key = "POP...80",
                    tidy = TRUE,
                    wide = FALSE)
str(POP)
print(names(POP$data))
print(names(POP$metadata))
as_tibble(POP$data)
as_tibble(POP$metadata) |> view()
[Package econdatar version 4.0.0 Index]