Major changes with EconDataR version 4

In prior versions of the EconDataR package, read_dataset() by default returned data from the EconData server as a list of tables (the R equivalent of JSON or dictionary format). Version 4 instead returns a single table by default, in which each series occupies one column.

Arguments for shaping the data as a table have always been available in the function, but they were not the default. Users who are new to a package usually start with the default arguments, since that is quicker than reading the documentation first. We have therefore changed some defaults in the EconDataR package to match what a new R user would most likely want.

These are major changes that can break your existing code when you upgrade the package. Existing users should keep this in mind when upgrading to version 4.0.0 or later.

`read_dataset()` arguments

The read_dataset() function now makes tidy = TRUE the default (still keeping wide = TRUE), returning a single data.table. For code that expects the data to be returned as a list of data.frames, please specify tidy = FALSE to restore the previous behaviour of the function.
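When upgrading, a quick class check can show which of the two shapes a piece of existing code is actually receiving. The sketch below uses toy stand-ins rather than a live download. describe_result() is a hypothetical helper, not part of EconDataR, and the obs_value column in the list-style stand-in is an assumption about the old layout.

```r
# Hypothetical helper (not part of EconDataR): report the shape of a result.
describe_result <- function(x) {
  if (is.data.frame(x)) {
    # data.table objects also inherit from data.frame, so they land here
    "single table"
  } else if (is.list(x) && length(x) > 0 &&
             all(vapply(x, is.data.frame, logical(1)))) {
    "list of tables"
  } else {
    "unknown"
  }
}

# Toy stand-ins for the two shapes (illustrative values, not a real download)
new_style <- data.frame(time_period = as.Date("1980-01-01"), MIN001.I.N = 105.2)
old_style <- list(
  MIN001.I.N = data.frame(time_period = as.Date("1980-01-01"), obs_value = 105.2)
)

describe_result(new_style)  # "single table"
describe_result(old_style)  # "list of tables"
```

The data.frame check comes first because a data.frame is itself a list in R, so testing is.list() first would misclassify the new default output.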

Tidy and wide (default)

Let’s take a look at what happens when we use the default wide = TRUE:

mining_wide <- read_dataset("MINING")

The print command returns

> print(mining_wide)
Key: <time_period>
     time_period MIN001.I.N MIN001.I.S MIN001.S.N MIN001.S.S MIN002.I.N MIN002.I.S MIN002.S.N MIN002.S.S MIN003.I.N MIN003.I.S MIN003.S.N MIN004.I.N MIN004.I.S MIN004.S.N
          <Date>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>
  1:  1980-01-01      105.2      108.2     1397.3     1315.2       53.2       55.1      433.5         NA       43.6       45.4      102.3       38.0         NA       23.9
  2:  1980-02-01      105.6      108.1     1531.0     1507.8       52.4       54.7      403.7         NA       43.3       45.7      115.3       30.7         NA       23.1
  3:  1980-03-01      105.1      106.2     1309.2     1340.6       54.8       55.6      472.3         NA       43.7       43.2      125.2       36.3         NA       34.6
  4:  1980-04-01      105.8      107.0     1157.8     1255.0       56.4       57.1      454.8         NA       43.0       45.1      121.9       39.2         NA       26.9
  5:  1980-05-01      108.7      107.2      947.5     1019.3       56.5       55.7      387.1         NA       45.3       44.5      112.5       40.1         NA       34.1
 ---                                                                                                                                                                      
536:  2024-08-01      101.1       93.5    60252.5    57504.7      102.5       94.8    57984.2    56329.7       95.3       89.8    17817.1       90.7       80.5     8789.5
537:  2024-09-01      100.4       97.3    68332.6    67066.4      101.9       99.3    57470.0    56705.2       93.7       90.9    17421.9       89.7       94.5     7150.8
538:  2024-10-01       97.1       94.6    70580.4    69103.1       97.1       95.8    54816.1    56477.3       97.6       90.8    18168.1       69.2       80.9     5224.8
539:  2024-11-01       99.5       94.6    72705.5    73383.0      101.5       96.8    56925.4    56686.6       97.2       93.0    18296.4       85.3       88.3     7556.8
540:  2024-12-01       87.7       90.9    63793.0    66095.0       89.7       92.5    55268.6    56561.7       81.3       95.7    17104.0       75.2       74.2     7371.2

Dataset-level metadata will be available at attr(mining_wide, "metadata") in list format, and each column has “label” and “source_identifier” attributes attached in R. Following this example, attr(mining_wide$MIN001.I.N, "label") gives “Total, gold included – Volumes – Neither seasonally adjusted nor calendar adjusted data”, and attr(mining_wide$MIN001.I.N, "source_identifier") gives “FMP20000”. So, you might run

mining_wide_renamed <- mining_wide
for (widecol in 2:ncol(mining_wide)) {
    names(mining_wide_renamed)[widecol] <- attr(mining_wide[[widecol]], "label")
}

in order to automatically get the labels from the metadata. Alternatively, setting wide = FALSE and combine = TRUE places all the metadata in additional columns alongside a long obs_value column of values, which is much more amenable to tidyverse-style wrangling.
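The same renaming can also be written without a loop. The sketch below is a base R variant on a toy table, so it runs without access to the server. The table and its placeholder labels are made up for illustration, and the fallback to the column code when a label is missing is our own addition.

```r
# Toy table mimicking the structure described above: a Date column plus
# series columns that each carry a "label" attribute (placeholder labels).
toy <- data.frame(
  time_period = as.Date(c("1980-01-01", "1980-02-01")),
  MIN001.I.N = c(105.2, 105.6),
  MIN001.I.S = c(108.2, 108.1)
)
attr(toy$MIN001.I.N, "label") <- "Series A (placeholder label)"
attr(toy$MIN001.I.S, "label") <- "Series B (placeholder label)"

# Look up each series column's label; fall back to the code if none is set.
series_cols <- setdiff(names(toy), "time_period")
labels <- vapply(series_cols, function(col) {
  lab <- attr(toy[[col]], "label")
  if (is.null(lab)) col else lab
}, character(1))

# Rename a copy so the original column codes stay available.
toy_renamed <- toy
names(toy_renamed)[match(series_cols, names(toy))] <- unname(labels)
names(toy_renamed)
```

Selecting the series columns by name rather than by position (2:ncol) keeps the code working even if time_period is not the first column.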

Generate a code snippet on the app

A useful way of getting R code with parameters that extracts the relevant dataset is by clicking the “Export” button in the EconData web app, after clicking “Submit” (which first views the data).

Leave out the version and release arguments, in order to get the latest data by default (or set those arguments to “latest”). Please cite EconData if you use our service in published research.

Extracting a subset of series keys

We previously showed in our Fast EconDataR Downloads blog post how the series_key argument can be used to extract a subset of series, which can dramatically speed up the download time for large datasets. The syntax originally only accepted a variety of concept codes within each dimension, taking the union of all the possible combinations of the concept codes chosen. This syntax still exists, but now you can also specify a string vector of series keys. For example,

mining_subset <- read_dataset(id = "MINING",
    series_key = c("MIN001.I.N", "MIN002.S.S"))

returns only

> print(mining_subset)
Key: <time_period>
     time_period MIN001.I.N MIN002.S.S
          <Date>      <num>      <num>
  1:  1980-01-01      105.2         NA
  2:  1980-02-01      105.6         NA
  3:  1980-03-01      105.1         NA
  4:  1980-04-01      105.8         NA
  5:  1980-05-01      108.7         NA
 ---                                  
536:  2024-08-01      101.1    56329.7
537:  2024-09-01      100.4    56705.2
538:  2024-10-01       97.1    56477.3
539:  2024-11-01       99.5    56686.6
540:  2024-12-01       87.7    56561.7

even though there are other possible combinations of the codes from those series.
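To see why this matters, it helps to count the combinations. The sketch below uses base R only and does not call EconDataR. It splits the two requested keys into their codes and builds every combination of those codes, which is how we read the "union of all the possible combinations" described above.

```r
# The two series keys requested in the example above
keys <- c("MIN001.I.N", "MIN002.S.S")

# Split each key on "." and collect the distinct codes per position
parts <- strsplit(keys, ".", fixed = TRUE)
dims <- lapply(1:3, function(i) unique(vapply(parts, `[`, character(1), i)))

# Every combination of those codes, pasted back together as series keys
combos <- do.call(expand.grid, c(dims, stringsAsFactors = FALSE))
all_keys <- sort(do.call(paste, c(unname(as.list(combos)), sep = ".")))

length(all_keys)         # 8 combinations of the codes
setdiff(all_keys, keys)  # the 6 keys that the explicit vector leaves out
```

All eight of these combinations appear as columns in the full MINING table printed earlier, so the explicit key vector is what keeps the result to the two requested series.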

`start_date` and `end_date`

The option to restrict the time frame between a start date and an end date has been available in the API, but was not previously built into the R package. Both arguments are optional, and each needs to be in a format that as.Date() parses, ideally YYYY-MM-DD. For example,

mining_short <- read_dataset("MINING",
        start_date = "2000-01-01")

The first 240 months are excluded:

> print(mining_short)
Key: <time_period>
     time_period MIN001.I.N MIN001.I.S MIN001.S.N MIN001.S.S MIN002.I.N MIN002.I.S MIN002.S.N MIN002.S.S MIN003.I.N MIN003.I.S MIN003.S.N MIN004.I.N MIN004.I.S MIN004.S.N
          <Date>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>      <num>
  1:  2000-01-01       94.7      102.7     6244.1     6433.6       70.8       75.8     4476.9         NA       80.8       85.3     1366.5       44.9         NA      196.1
  2:  2000-02-01      100.2      103.0     7115.7     7420.7       72.4       75.2     4963.8         NA       82.8       86.7     1563.5       42.5         NA      216.0
  3:  2000-03-01      100.9      100.2     8958.5     8141.5       73.5       73.5     6688.4         NA       85.9       85.6     1515.3       43.7         NA      245.3
  4:  2000-04-01       95.8      100.5     7521.3     8169.5       70.6       73.9     5790.2         NA       81.1       87.1     1523.5       46.4         NA      286.9
  5:  2000-05-01       96.6       95.7     8222.7     8153.8       72.6       72.1     6166.9         NA       85.9       85.3     1582.2       46.9         NA      218.2
 ---                                                                                                                                                                      
296:  2024-08-01      101.1       93.5    60252.5    57504.7      102.5       94.8    57984.2    56329.7       95.3       89.8    17817.1       90.7       80.5     8789.5
297:  2024-09-01      100.4       97.3    68332.6    67066.4      101.9       99.3    57470.0    56705.2       93.7       90.9    17421.9       89.7       94.5     7150.8
298:  2024-10-01       97.1       94.6    70580.4    69103.1       97.1       95.8    54816.1    56477.3       97.6       90.8    18168.1       69.2       80.9     5224.8
299:  2024-11-01       99.5       94.6    72705.5    73383.0      101.5       96.8    56925.4    56686.6       97.2       93.0    18296.4       85.3       88.3     7556.8
300:  2024-12-01       87.7       90.9    63793.0    66095.0       89.7       92.5    55268.6    56561.7       81.3       95.7    17104.0       75.2       74.2     7371.2
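The row counts above can be checked with a little date arithmetic in base R. This sketch also shows one way to test whether a string parses with as.Date() before passing it on. is_parseable_date() is a hypothetical helper written for this illustration, not part of EconDataR.

```r
# Hypothetical helper: TRUE if as.Date() can parse the string, FALSE otherwise.
# as.Date() raises an error for strings it cannot parse, so the error is
# caught and turned into NA.
is_parseable_date <- function(x) {
  !is.na(tryCatch(as.Date(x), error = function(e) as.Date(NA)))
}

is_parseable_date("2000-01-01")    # TRUE
is_parseable_date("January 2000")  # FALSE

# Months from the first observation up to, but excluding, the start date
first_obs <- as.Date("1980-01-01")
start_date <- as.Date("2000-01-01")
n_excluded <- length(seq(first_obs, start_date, by = "month")) - 1

n_excluded        # 240
540 - n_excluded  # 300 rows remain, matching the output above
```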

`read_release(tidy = TRUE)`

The read_release() function, which produces summary metadata about the vintages that exist for a dataflow, has also shifted from tidy = FALSE to tidy = TRUE by default. If you used read_release() without explicitly specifying the tidy argument, you can restore the previous output of that function by specifying tidy = FALSE.

Installing certain versions of EconDataR

To stay on the previous 3.3.0 version of EconDataR, please run the following code.

library(remotes)
install_github("coderaanalytics/econdatar@3.3.0")

To reinstall the latest version of EconDataR (4.0.1 at the time of writing), please run

library(remotes)
install_github("coderaanalytics/econdatar@4.0.1")

Please see the GitHub repo for details.
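Before or after switching versions, it can be useful to confirm which version is installed. The sketch below uses only base R utilities. The installed package name "econdatar" is an assumption taken from the repository name, and the status messages are our own wording.

```r
# Assumed package name, taken from the GitHub repository name
pkg <- "econdatar"

status <- if (!requireNamespace(pkg, quietly = TRUE)) {
  "not installed"
} else if (utils::packageVersion(pkg) >= "4.0.0") {
  "version 4 or later, so the tidy = TRUE defaults apply"
} else {
  "earlier than version 4, so list output is the default"
}

message(pkg, ": ", status)
```

Running a check like this at the top of a script makes it easier to spot when a shared project is being run against a different major version than it was written for.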

Summary

In summary, EconDataR version 4 introduces key default changes to the read_dataset() function, prioritizing a tidy data format for a more familiar layout, especially for new R users. While these changes may require adjustments to existing code, they ultimately reduce onboarding friction for new users.
