Data vintages are necessary for assessing the real-time performance of data analytics. Whether the aim is to back-test a model or to judge whether a historical decision was the correct one, having the information that was available at the time is essential.
EconData has been designed to make this kind of analysis possible. In this tutorial, we cover how to access the various vintages of GDP and how to construct a real-time time series of GDP growth.
As usual, we start by loading the necessary packages. If this is your first time using the EconData R package, you can find the installation instructions here.
library(econdatar)
library(dplyr)
library(ggplot2)
library(readr)
library(tidyr)
library(stringr)
Next, we retrieve all the vintages/releases of GDP from EconData. As usual, more information on the read_release function can be found by typing ?read_release in the R console.
releases <- read_release(id = "NATL_ACC",
                         version = "all")
This object is a list with one element per version of the StatsSA data. Each element is a tidy table (by default) detailing the releases for that version. We loop over these tables to extract all the releases: on each iteration, we request the data for that release from EconData and append the result to a tibble.
gdp_vintages <- tibble()
for (data_i in seq_along(releases)) {
  # Name of this data version; its first character is a prefix, the rest is the version number
  data_version <- names(releases)[[data_i]]
  cat(paste("\n", data_version, "\n"))
  # Table of releases for this version
  dataset <- releases[[data_i]]
  for (release_i in seq_len(nrow(dataset))) {
    # A single release (one row of the table)
    i <- slice(dataset, release_i)
    cat(paste(i$description, "\n"))
    natl_acc <- read_dataset(id = "NATL_ACC",
                             # Drop the prefix character from the version name
                             version = str_sub(data_version, 2, -1),
                             # The GDP series key depends on the major version (second character)
                             series_key = ifelse(as.numeric(str_sub(data_version, 2, 2)) >= 2,
                                                 "PROD.GDP.Q.R.S",
                                                 "KBP6006.R.S"),
                             release = i$description,
                             tidy = TRUE,
                             wide = FALSE,
                             combine = TRUE) %>%
      as_tibble() %>%
      # Tag every observation with the release it came from
      mutate(description = i$description,
             release = i$release) %>%
      relocate(release, description, time_period, obs_value,
               series_name, series_key)
    gdp_vintages <- bind_rows(gdp_vintages, natl_acc)
  }
}
# Order by release date, then observation date
gdp_vintages <- gdp_vintages %>%
  arrange(release, time_period, series_key)
The resulting table looks like the following.
> gdp_vintages
# A tibble: 7,394 × 23
   release             description  time_period obs_value series_name series_key
   <dttm>              <chr>        <date>          <dbl> <fct>       <fct>
 1 2004-05-25 11:30:00 2004Q1 (200… 1993-01-01     506080 GDP         KBP6006.R…
 2 2004-05-25 11:30:00 2004Q1 (200… 1993-04-01     511631 GDP         KBP6006.R…
 3 2004-05-25 11:30:00 2004Q1 (200… 1993-07-01     518995 GDP         KBP6006.R…
 4 2004-05-25 11:30:00 2004Q1 (200… 1993-10-01     522840 GDP         KBP6006.R…
 5 2004-05-25 11:30:00 2004Q1 (200… 1994-01-01     522096 GDP         KBP6006.R…
 6 2004-05-25 11:30:00 2004Q1 (200… 1994-04-01     527760 GDP         KBP6006.R…
 7 2004-05-25 11:30:00 2004Q1 (200… 1994-07-01     533695 GDP         KBP6006.R…
 8 2004-05-25 11:30:00 2004Q1 (200… 1994-10-01     542600 GDP         KBP6006.R…
 9 2004-05-25 11:30:00 2004Q1 (200… 1995-01-01     544564 GDP         KBP6006.R…
10 2004-05-25 11:30:00 2004Q1 (200… 1995-04-01     546982 GDP         KBP6006.R…
# ℹ 7,384 more rows
# ℹ 17 more variables: data_set_name <fct>, data_set_ref <fct>,
# provision_agreement_ref <fct>, data_provider_ref <fct>,
# unit_multiplier <fct>, frequency <fct>, base_period <fct>,
# price_concept <fct>, seasonal_adjustment <fct>, unit_of_measure <fct>,
# label <fct>, release_description <fct>, source_data_set <fct>,
# data_set_description <fct>, time_transformation <fct>, …
# ℹ Use `print(n = ...)` to see more rows
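With a table in this shape, one way to build a real-time series of GDP growth is to compute growth within each release and keep only the most recent quarter of that release, i.e. the figure that was "new" at the time. The sketch below is a minimal illustration and not necessarily the approach used in the original analysis: it assumes quarter-on-quarter percentage growth is the measure of interest, and it runs on a small made-up table (toy_vintages, with invented values) that has the same release, time_period and obs_value columns as gdp_vintages. The names toy_vintages and real_time_growth are hypothetical.

```r
library(dplyr)

# Toy stand-in for gdp_vintages: two releases with invented values.
# The second release revises 2003Q4 and 2004Q1 and adds 2004Q2.
toy_vintages <- tibble(
  release = as.POSIXct(rep(c("2004-05-25 11:30:00", "2004-08-31 11:30:00"),
                           times = c(3, 4)), tz = "UTC"),
  time_period = as.Date(c("2003-07-01", "2003-10-01", "2004-01-01",
                          "2003-07-01", "2003-10-01", "2004-01-01", "2004-04-01")),
  obs_value = c(100, 101, 102, 100, 101.5, 102.5, 104)
)

real_time_growth <- toy_vintages %>%
  group_by(release) %>%
  # Sort observations by date within each release
  arrange(time_period, .by_group = TRUE) %>%
  # Quarter-on-quarter growth (per cent), using only data from this release
  mutate(growth = 100 * (obs_value / lag(obs_value) - 1)) %>%
  # Keep the latest quarter available in each release
  slice_max(time_period, n = 1) %>%
  ungroup() %>%
  select(release, time_period, growth)

print(real_time_growth)
```

Applying the same pipeline to gdp_vintages should give one growth figure per release, provided each release contains a single GDP series with one observation per quarter. Annualised or year-on-year growth would only require changing the mutate() step.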
As usual, all the code in this blog post can be found on GitHub, along with the code for the previous posts. We hope that you found this slightly more advanced tutorial useful and that you get value from EconData’s vintage data.
The Codera Analytics team