Over 12 million data points and counting

An introduction to the BA900 supervisory banking data.

An introduction to the BA900 supervisory banking data

Public availability of data is crucial not only to democratising information, but also for enabling research and analysis to better understand what is happening in the economy. One area where South Africa stands out internationally in this respect is the availability of detailed bank-level balance sheet statistics.

Since 2008, the South African Reserve Bank (SARB) has published monthly supervisory banking data at a bank level as part of its BA900 release. The data is very granular, giving a detailed view into balance sheet items such as deposits and loans. Economists and analysts can, for example, look at the extent of different types of credit extension, such as mortgages or unsecured credit, mapped against type of borrower, by bank, for each month. Such granularity enables ongoing analysis of developments in the banking sector and impact of market conditions and regulation on bank behaviour. Having said that, the data is available only as bank-by-bank monthly snapshots and it takes a considerable amount of time and effort to download a data series of interest. The Econdata platform addresses this for a range of different data sources, allowing users to search, select and download data as a timeseries free of charge.

A further innovation is the custom R package which allows for automated data download and wrangling. This means that users can design reporting templates or models based on specific BA900 data and then automate such analysis and reporting without having to visit the Econdata website again or do any manual repeated work each month.

To make this more tangible, this post provides an example of how to automate a report on the monthly mortgage extension to households of the four biggest commercial banks in South Africa. The code below downloads the data, calculates descriptive statistics of annual growth and relative market share, and saves the results in a Microsoft excel document.

A full version of the script, which includes code for installing the relevant packages, is available below.

Please note that the recent update to the new EconDataR package will affect this code, we will update the code soon.

1 – start by loading the libraries for the econdatar and dplyr packages

library(econdatar, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

2 – assign the download to a list (mortgage_list) and specify the parameters of the download. In this case we want to focus on residential mortgages to households for South African property, which e.g. excludes corporate and agricultural mortgages. This is line item 157 in the BA900, and the first column (“Domestic assets”). The banks for this exercise are ABSA, Nedbank, FNB and Standard Bank. For these parameters, the search query will be:

mortgage_list <- read_econdata(
  agencyid = "ECONDATA",
  id = "BA900",
  version = "1.0",
  key = "ABS+FRB+NED+STB..L157",
  provideragencyid = "ECONDATA",
  providerid = "SARB",
  releasedescription = "Dec 2022 (2023-02-05)")

Note: In order to use the "read_econdata()" function a username and password is required, this form will be active when the function is run but the window may not pop up. Please check your task bar for the window to fill in your credentials.

Notice that the parameter “key” captures the details of the banks, i.e. ABS is ABSA and NED is Nedbank, and so forth, and L157 is the line item we are interested in. Also note that a pop up will appear prompting you to provide your username and password.

3 - convert the list structure to a data frame for ease of wrangling:

df1 <- as.data.frame(do.call(rbind,mortgage_list)) %>%
  dplyr::rename(obs = OBS_VALUE)
info = rownames(df1)
df2 = cbind(df1,info)
rownames(df2) <- c(1:nrow(df2))
df3 <- df2 %>%
  dplyr::mutate(bank = substr(info,1,3),
                date_ymd = as.Date(substr(info,13,22),"%Y-%m-%d"),
                bcd_type = substr(info,5,6),
                l_type = substr(info,8,11)) %>%

This approach is not the only way to convert the list to a data frame, but a personal preference. “df3” is the data frame which is now ready for wrangling.

4 - the extracted data includes all the columns in the BA900, such as foreign and domestic assets. We are less concerned with mortgages to purchase properties in the French riviera, or anything outside of South Africa’s borders for that case. So, we will filter out all the irrelevant columns and only leave “H1”.

df4 <- df3 %>%

5 - the descriptive statistics of relative market share and annual growth can now be calculated. This will require two separate instances of grouping the data first by date and then by bank.

df5 <- df4 %>%
  dplyr::group_by(date_ymd) %>%
  dplyr::mutate(marketshare=obs/sum(obs)*100) %>%
  dplyr::ungroup() %>%
  dplyr::group_by(bank) %>%
  dplyr::mutate(delta=(obs/dplyr::lag(obs,12,order_by = date_ymd)-1)*100) %>%

6 - the final step is to pivot the table into a wide view to enable greater accessibility and then save the new format in an MS excel csv document.

df6_1 <- df5 %>%
  dplyr::select(date_ymd,bank,marketshare) %>%
  tidyr::pivot_wider(names_from = bank, values_from = marketshare)
write.csv(df6_1,"marketshare.csv", row.names = FALSE)
df6_2 <- df5 %>%
  dplyr::select(date_ymd,bank,delta) %>%
  tidyr::pivot_wider(names_from = bank, values_from = delta)
write.csv(df6_2,"annualgrowth.csv", row.names = FALSE) 

Notice that two separate excel csv documents are created, one showing the market share and the other showing the annual growth.

The extracted data is the tip of the iceberg as there are over 70 000 datapoints added to the BA900 form dataset alone each month. In the next blog post, we will expand on the analysis by adding in other data sources and running regression analysis. We will also compare the data accessibility of BA900 with that of other countries.


If you have not updated the EconData R Package, please do so by running the following code snippet before loading the Econdata package:

if(!require(remotes)) install.packages("remotes"); library(remotes)

By Pietman Roos (Associate - Codera Analytics)