Working with EconData’s Public Finance Data

Introduction

In this tutorial, we apply the Econdatar R package to demonstrate how to use the Treasury Budget data on EconData. We will create three visualisations: an area chart, a hairy line plot, and a stacked bar chart. The Econdatar package helps you streamline and automate your workflows by directly importing the latest data available on the platform.

Beneath each visualization, you’ll find an “expand to see code” option. Clicking on this will reveal the R scripts that were used to create each plot, offering a hands-on learning experience on how to reproduce these visuals.

Treasury Main Budget Table 10 (Government Debt): Area chart of debt composition

expand to see code

Creating an Area Chart of Composition of Debt

In this example, we will be working with EconData’s Treasury Main Budget tables and how to use them for analysis.

Load necessary packages:

packages <- c( "tidyverse",
               "econdatar",
               "scales"
             )

invisible(lapply(packages, library, c=TRUE))

Learn how to install the econdatar package here.

Create a custom color palette for the plots (Optional):

palette <- color_list <- c("#02acba", "#273b8d", "#be1e2d", "#c0c5cb",
                           "#3F3A34", "#D8C472", "#4A4ACC", "#B3EFB2", 
                           "#AA78A6", "#41E1D2", "#9a8c98", "#E8E288", 
                           "#1D3461", "#F7C1BB", "#7280D9", "#EDD4B2", 
                           "#FD5345", "#BFB6BB", "#306B34", "#D499B9", 
                           "#C7F0BD", "#DDA448", "#F2542D", "#43AA8B", 																									 
                           "#95B8D1" )

Import all series of Table 10 (Government Debt) data from EconData

Assign the imported data to the variable debt_list.

debt_list <- read_dataset( id = "GOVTDEBT", 
                          version = "1.0.0",
                          series_key = "N.D.CND+PGF+CFD+PNL+DDM+PGL+PND+DDN+PGD+FDT+GFE+FDN+DND+PNF+DDC+CTB+CDD+DDG+CGB+TDG+FDC+CBB+DDB+TDN+CMD+DDL+DDT",
                          release = "2024 (2024-02-21)" # This specifies the budget release                                                           of 2024
                          ) 

The econdatar package imports economic and financial data as a list of dataframes, each indexed by date like a time series object. Lists in R are highly versatile, allowing for efficient storage and manipulation of complex data structures.

Each dataframe in the list represents a single data series. To access a specific series, use the syntax list[["series_key"]], replacing "list" and "series_key" with your actual list name and the key for the series you’re interested in.

The following examples demonstrate practical ways to manage and analyze multiple series within this structure, highlighting the convenience and power of working with lists and dataframes in R for time series analysis.

Prepare the Data:

First, filter the list for the subsection of Table 10 we are interested in: line items related to the composition of public debt (all Mnemonic codes starting with ‘N.D.C’).

filtered_list <- debt_list[grep("N.D.C", names(debt_list))]
​
filtered_list <- lapply(filtered_list, function(df) {
  df %>%
    # EconData stores the date as the row index in R
    # The following line stores the row index as a variable called `DATE`
    rownames_to_column(var = "DATE")
})
​
# Store the dataframes in the list in a long format dataframe:
comp_debt <- bind_rows(filtered_list, .id = "MNEMONIC") %>% 
  mutate(DATE = as.Date(DATE)) %>% 
  filter(MNEMONIC != "N.D.CMD" & MNEMONIC != "N.D.CDD")
​
# Optional: Order the series by size
comp_debt$MNEMONIC <- factor(comp_debt$MNEMONIC , levels=c("N.D.CBB", 
                                                           "N.D.CND", 
                                                           "N.D.CTB", 
                                                           "N.D.CFD", 
                                                           "N.D.CGB") )

Plot the data:

area_chart <- comp_debt %>% 
  # create a ggplot object
  ggplot(
    aes(x=DATE, 
        y=OBS_VALUE, 
        fill = MNEMONIC
        )) +
  
  geom_area() + # for an area chart
  
  scale_fill_manual(values = palette) + # use the custom color palette we defined earlier
  
  theme_bw() + # general theme
  
  labs(x = "Date",
       y = "Proportion of Total Debt",
       title = "Composition of Debt in South Africa") 

Treasury Main Budget Table 10 (Government Debt): Hairyline plot of debt over time

expand to see code

Here we will produce a hairy line plot. Hairy lines are a great way to visualise how forecasts differ from realised values over time. Here we will make use of all the available vintages (previous and current budget releases) to draw a hairy line for Gross Government Debt (% of GDP) over time.

To do this, we need to add the data vintage by vintage / release by release.

Define the releases

releases <- c("2024 (2024-02-21)",
              "2023 (2023-02-22)",
              "2022 (2022-02-23)",
              "2021 (2021-02-24)",
              "2020 (2020-02-26)",
              "2019 (2019-02-20)",
              "2018 (2018-02-21)",
              "2017 (2017-02-22)",
              "2016 (2016-02-24)",
              "2015 (2015-02-25)",
              "2014 (2014-02-26)",
              "2013 (2013-02-27)")

These releases are the specific dates on which the budget data were released every year from 2013 to 2024.

Import and transform the data:

# Initialize an empty list to store data from each release
all_data <- list()
​
# Loop through each release
for(i in seq_along(releases)) {
  # Fetch data for the current release
  debt_list <- read_dataset(id = "GOVTDEBT",
                            version = "1.0.0",
                            series_key = "N.D.PGL",
                            release = releases[i])
  
  # Convert row names to a column "DATE" in each dataframe within the list
  debt_list <- lapply(debt_list, function(df) {
    df <- tibble::rownames_to_column(df, var = "DATE")
  })
  
  # Combine the dataframes within the list into one dataframe, adding an ID column
  combined_df <- dplyr::bind_rows(debt_list, .id = "MNEMONIC")
  
  # Add a column for the vintage based on the release
  combined_df$VINTAGE <- releases[i]
  
  # Append the combined data to the all_data list
  all_data[[i]] <- combined_df
}
​
# Combine all vintages into a single dataframe
total_debt <- dplyr::bind_rows(all_data) %>% 
  # Get more descriptive labels
  mutate(LABEL = "Gross loan debt (percent GDP)",
         OBS_STATUS = ifelse(OBS_STATUS == "A", "Realised", "Forecast"),
         DATE = as.Date(DATE)) 

Plot the hairy line:

hairy_line <- total_debt %>% 
  filter(DATE >= as.Date("2000-01-01")) %>% 
  ggplot(aes( x = DATE, 
              y = OBS_VALUE, 
              group = VINTAGE, # So that we have multiple lines, each representing a vintage
              color = OBS_STATUS # To differentiate between observed and forecasted values
              )) +
  
      geom_line(size = 1) +
  
      theme_bw() + # general theme
      scale_color_manual(values = palette) + # here using the custom color palette
      
          labs(title = "Gross Debt Forecasts vs. Actual Values",
               x = "Date",
               y = "Observed Value (% of GDP)",
               color = "Label",
               linetype = "Observation Status") + 
      
          theme(legend.position = "bottom") +
  
  scale_y_continuous(labels = percent_format(), limits = c(0,1)) # Format y-axis as %

Treasury Budget Annexure AW15 (Total Transfers to Provinces): Stacked bar chart of conditional grants and provincial equitable share over time

This image has an empty alt attribute; its file name is stacked_bar-1024x640.png
expand to see code

In this example, we will be working with the Treasury Provincial Annexure Tables and how to conduct data analyses of them. A stacked bar chart may be used to see the evolution of allocations over time.

Import the data from EconData:

For the provincial tables, the series_key is structured as "Section"."Province"."Table+Line Item". Here we use Section “A”, All the provinces, Table “P” and Line Items “E” and “G” (Equitable Share & Conditional Grants to Provinces).

pes <- read_dataset(
  id = "GOVTANNEX_PROV",
  version = "1.0.0",
  series_key = "A.EC+FS+GP+KN+LM+MP+NW+NC+WC.PE+PG",
  release = "2024 (2024-02-21)")
​
# Reformat the list into a long dataframe
pes <- bind_rows(lapply(names(pes), function(name) {
  df <- pes[[name]]
  df <- rownames_to_column(df, var = "DATE") # Change the rownames to a column
  df$MNEMONIC <- name  # Adding an identifier for each dataframe
  return(df)
}))

Let’s get more descriptive labels:

# Province code to name mapping
province_names <- c(EC = "Eastern Cape", FS = "Free State", GP = "Gauteng", 
                    NC = "Northern Cape", NW = "North West", WC = "Western Cape", 
                    LM = "Limpopo", KN = "KwaZulu-Natal", MP = "Mpumalanga")
​
# Directly create LINE_ITEM column
pes <- pes %>%
  mutate(
    DATE = as.Date(DATE),
    PROVINCE_CODE = str_extract(MNEMONIC, "(?<=\\.)([A-Z]{2})(?=\\.)"), # Extracts the                                                                                province code
    PROVINCE = province_names[PROVINCE_CODE],
    ITEM_CODE = str_sub(MNEMONIC, -2), # Extracts the last two characters
    ITEM = case_when(
      ITEM_CODE == "PE" ~ "Equitable Share",
      ITEM_CODE == "PG" ~ "Conditional Grants",
      TRUE ~ NA_character_ # Handles any MNEMONIC codes that do not end in "PE" or "PG"
    ),
    LINE_ITEM = paste0(PROVINCE, ": ", ITEM)
  ) %>%
  select(-PROVINCE_CODE, -ITEM_CODE) # Optionally remove intermediary columns

Plot the stacked bar chart:

stacked_bar <- ggplot(pes, aes(x = DATE, y = OBS_VALUE, group = MNEMONIC, fill = PROVINCE)) +
  
  geom_col(position = "fill") + # Stacks the bars and normalizes their height
  
  facet_wrap(~ITEM, scales = "free_y") + # Facet by ITEM
  
  theme_bw() +
  
  labs(
    title = "National Treasury Allocations to Provinces over Time",
    x = "Year",
    y = "Allocation",
    fill = "Province"
  ) +
  scale_fill_manual(values = palette) +
  
  scale_y_continuous(labels = percent_format()) + # Format y-axis as percentages
  scale_x_date(date_breaks = "2 years", date_labels = "%Y") # Date breaks every 2 years

Conclusion

There are many more dataseries from the Treasury’s budget data. This tutorial should equip you to use these sources effectively. Have you checked out EconData’s User Guide? It provides more detail on how to use EconData as well as some R programming essentials.