Making use of vintage data

Data vintages are necessary for assessing the real-time performance of data analytics. Whether the goal is to back-test a model or to judge whether a historical decision was the correct one, having the information that was available at the time is essential.
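To make the idea concrete, here is a minimal sketch using made-up numbers (not EconData output). The same quarters are reported in two hypothetical releases, and an "as-of" filter recovers what an analyst would have seen on a given date. The column names mirror those used later in this tutorial; the values and the helper as_of() are illustrative assumptions.

```r
library(dplyr)

# Toy vintage table: invented values, two hypothetical releases of one series
toy_vintages <- tibble(
  release     = as.Date(c("2020-03-01", "2020-03-01",
                          "2020-06-01", "2020-06-01", "2020-06-01")),
  time_period = as.Date(c("2019-07-01", "2019-10-01",
                          "2019-07-01", "2019-10-01", "2020-01-01")),
  obs_value   = c(100, 101, 100.5, 100.8, 99)
)

# Hypothetical helper: keep only the latest release published on or before `date`
as_of <- function(vintages, date) {
  vintages %>%
    filter(release <= date) %>%
    filter(release == max(release))
}

# What was known in mid-April 2020: only the March release had been published
as_of(toy_vintages, as.Date("2020-04-15"))
```

A back-test run against the output of as_of() only sees the values as they were first published, not the later revisions.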

EconData has been designed to make this kind of analysis possible. In this tutorial we cover how to access the various vintages of GDP and how to construct a real-time time series of GDP growth.

As usual, we start by loading the necessary packages. If this is your first time using the EconData R package, you can find the installation instructions here.

library(econdatar)  # EconData API client
library(dplyr)
library(ggplot2)
library(readr)
library(tidyr)
library(stringr)

Next, we retrieve all the vintages/releases of GDP from EconData. More information on the read_release function can be found by typing ?read_release in the R console.

releases <- read_release(id = "NATL_ACC",
                         version = "all")

This object is a list with one element for each version of the StatsSA data. For each version, there is a tidy table (by default) detailing the releases for that version. We will loop over these tables to extract all the releases. On each iteration, we request the data for that release from EconData and append the result to a tibble.
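Before running the full loop, it can help to see the shape of this object. The sketch below builds a small mock list with the layout assumed here: a named list with one table per version, each with release and description columns. The version names, dates other than the first, and descriptions are invented for illustration and are not actual EconData output.

```r
library(dplyr)

# Mock stand-in for the object returned by read_release(); contents are invented
mock_releases <- list(
  "v1.0.0" = tibble(
    release     = as.POSIXct(c("2004-05-25 11:30:00", "2004-08-31 11:30:00"),
                             tz = "UTC"),
    description = c("Release A", "Release B")
  ),
  "v2.0.0" = tibble(
    release     = as.POSIXct("2021-09-07 11:30:00", tz = "UTC"),
    description = "Release C"
  )
)

# Number of releases recorded under each version
sapply(mock_releases, nrow)

# Stack the per-version tables into one index, keeping the version as a column
release_index <- bind_rows(mock_releases, .id = "version")
release_index
```

The loop that follows walks the same structure, but fetches the data for each release rather than just listing it.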

# Accumulate every release of every dataset version into one tibble
gdp_vintages <- tibble()
for (data_i in seq_along(releases)) {
  data_version <- names(releases)[[data_i]]
  cat(paste("\n", data_version, "\n"))
  dataset <- releases[[data_i]]
  # The GDP series key depends on the leading digit of the version
  gdp_key <- if (as.numeric(str_sub(data_version, 2, 2)) >= 2) {
    "PROD.GDP.Q.R.S"
  } else {
    "KBP6006.R.S"
  }
  for (release_i in seq_len(nrow(dataset))) {
    i <- slice(dataset, release_i)
    cat(paste(i$description, "\n"))
    # Request the data as published in this release;
    # the version string is the list name without its first character
    natl_acc <- read_dataset(id = "NATL_ACC",
                             version = str_sub(data_version, 2, -1),
                             series_key = gdp_key,
                             release = i$description,
                             tidy = TRUE,
                             wide = FALSE,
                             combine = TRUE) %>%
      as_tibble() %>%
      # Tag each observation with the release it came from
      mutate(description = i$description,
             release = i$release) %>%
      relocate(release, description, time_period, obs_value,
               series_name, series_key)
    gdp_vintages <- bind_rows(gdp_vintages, natl_acc)
  }
}
# Sort by release date, then by observation period
gdp_vintages <- gdp_vintages %>%
  arrange(release, time_period, series_key)

The resulting table looks like the following.

> gdp_vintages
# A tibble: 7,394 × 23
   release             description  time_period obs_value series_name series_key
   <dttm>              <chr>        <date>          <dbl> <fct>       <fct>     
 1 2004-05-25 11:30:00 2004Q1 (200… 1993-01-01     506080 GDP         KBP6006.R…
 2 2004-05-25 11:30:00 2004Q1 (200… 1993-04-01     511631 GDP         KBP6006.R…
 3 2004-05-25 11:30:00 2004Q1 (200… 1993-07-01     518995 GDP         KBP6006.R…
 4 2004-05-25 11:30:00 2004Q1 (200… 1993-10-01     522840 GDP         KBP6006.R…
 5 2004-05-25 11:30:00 2004Q1 (200… 1994-01-01     522096 GDP         KBP6006.R…
 6 2004-05-25 11:30:00 2004Q1 (200… 1994-04-01     527760 GDP         KBP6006.R…
 7 2004-05-25 11:30:00 2004Q1 (200… 1994-07-01     533695 GDP         KBP6006.R…
 8 2004-05-25 11:30:00 2004Q1 (200… 1994-10-01     542600 GDP         KBP6006.R…
 9 2004-05-25 11:30:00 2004Q1 (200… 1995-01-01     544564 GDP         KBP6006.R…
10 2004-05-25 11:30:00 2004Q1 (200… 1995-04-01     546982 GDP         KBP6006.R…
# ℹ 7,384 more rows
# ℹ 17 more variables: data_set_name <fct>, data_set_ref <fct>,
#   provision_agreement_ref <fct>, data_provider_ref <fct>,
#   unit_multiplier <fct>, frequency <fct>, base_period <fct>,
#   price_concept <fct>, seasonal_adjustment <fct>, unit_of_measure <fct>,
#   label <fct>, release_description <fct>, source_data_set <fct>,
#   data_set_description <fct>, time_transformation <fct>, …
# ℹ Use `print(n = ...)` to see more rows
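With the vintages stacked in one table, a real-time growth series can be built by computing growth within each release and keeping only the most recent quarter of that release. The sketch below shows one way to do this on a toy table with made-up levels rather than on gdp_vintages itself. It assumes quarter-on-quarter growth that is not annualised, and the comparison with the latest vintage is an illustrative addition.

```r
library(dplyr)

# Toy stand-in for gdp_vintages: invented levels, three hypothetical releases
toy <- tibble(
  release = as.Date(c(rep("2020-03-01", 2),
                      rep("2020-06-01", 3),
                      rep("2020-09-01", 4))),
  time_period = as.Date(c("2019-07-01", "2019-10-01",
                          "2019-07-01", "2019-10-01", "2020-01-01",
                          "2019-07-01", "2019-10-01", "2020-01-01", "2020-04-01")),
  obs_value = c(100, 101,
                100, 102, 102.51,
                100, 102, 103, 97.85)
)

# Growth as first published: computed within each release, latest quarter only
real_time_growth <- toy %>%
  group_by(release) %>%
  arrange(time_period, .by_group = TRUE) %>%
  mutate(growth = 100 * (obs_value / lag(obs_value) - 1)) %>%
  slice_tail(n = 1) %>%
  ungroup() %>%
  select(release, time_period, growth)

# Growth for the same quarters according to the most recent release
latest_growth <- toy %>%
  filter(release == max(release)) %>%
  arrange(time_period) %>%
  mutate(latest = 100 * (obs_value / lag(obs_value) - 1)) %>%
  select(time_period, latest)

# Side-by-side view: first estimate versus latest estimate
left_join(real_time_growth, latest_growth, by = "time_period")
```

Computing growth within a release, rather than across releases, keeps each growth rate internally consistent even when levels are rebased or revised between vintages. The same pipeline should carry over to gdp_vintages, provided each release contains a single GDP series.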

As usual, all the code in this blog post can be found on GitHub, along with the code for the previous posts. We hope that you found this slightly more advanced tutorial useful and that you get value from EconData’s vintage data.

The Codera Analytics team

