Software investment in the banking industry

A tutorial on using the BA900 supervisory banking data

A tutorial on using the BA900 supervisory banking data

The South African banking industry increased its collective investment in software from close to R3 billion in 2008 to over R30 billion towards the end of 2022, which is an annual average growth of 18% over the period. This tenfold increase in software as an asset class is also reflected in the ratio of software to total non-financial assets, which increased from around 15% in 2008 to over 30% from 2014. All of this data is available in the BA900 releases.

Following on the previous blog that introduced the BA900 and gave an example for downloading residential mortgage data for the largest commercial banks, this post will give an overview on how to automate downloading any line item, (in this case 265 and 258 of the BA900), and further tips for data wrangling.

Please note that the recent update to the new EconDataR package will affect this code, we will update the code soon.

1 – start by loading the libraries for the econdatar, dplyr, tidyr, lubridate, and ggplot packages

library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)

2 – assign values to several “shortcuts” that will be used in functions later on

allbanks = paste(c("ABK", "ABL", "ABS", "ATH",
  "BAR", "BCC", "BID", "BNP",
  "BOC", "BOI", "BOT", "CAN",
  "CAP", "CCB", "CRA", "CTI",
  "CWB", "CZB", "DBK", "DIS",
  "FIN", "FRB", "GBS", "GRI",
  "GRO", "GSI", "HAB", "HBZ",
  "HSB", "ICI", "IMP", "INT",
  "JPM", "MCT", "MEE", "NED",
  "RBS", "RGL", "SCG", "SFN",
  "STB", "STC", "UBA", "VBS",
  "ZMB"), sep = "+")
prevmonth <- floor_date(Sys.Date(),unit = "month") %m+% days(-1)
releasedate <- Sys.Date() %m+% months(-3)
descript_paste = paste(month(releasedate,label=TRUE,abbr=TRUE),
                       " ",
                       " ",

“allbanks“ is a way to capture all the banks available BA900 series, and prevmonth, releasedate and descript_paste will be used to create date agnostic inputs to automate downloading data over the course of time.

3 – specify the first function to download data:

fun_downloadecon <- function(ell="L058"){
  keypaste = paste(allbanks,"..",ell,".",sep="")
  download_list <- read_econdata(
    agencyid = "ECONDATA",
    id = "BA900",
    version = "1.0",
    key = keypaste,
    provideragencyid = "ECONDATA",
    providerid = "SARB",
    releasedescription = descript_paste)

4 – specify the second function to convert the downloaded data from a list to a data frame:

fun_list2df <- function(listname){
  df1 <-,listname)) %>%
    dplyr::rename(obs = OBS_VALUE)
  info = rownames(df1)
  df2 = cbind(df1,info)
  rownames(df2) <- c(1:nrow(df2))
  df3 <- df2 %>%
    dplyr::mutate(bank = substr(info,1,3),
                  date_ymd = as.Date(substr(info,13,22),"%Y-%m-%d"),
                  bcd_type = substr(info,5,6),
                  l_type = substr(info,8,11)) %>%

Once these functions have been specified, they can be used to download any line item from the BA900.

5 – combine the two functions into a single line of code, and define the input “ell” (i.e. L or line) to instruct the function to download and convert the specific line item

software_df <- fun_list2df(fun_downloadecon(ell = "L265"))

Note that when executing the above line of script, you will be prompted to provide your username and password.

6 – a further refinement is to download software and total non-financial assets in one line of code, by changing the “ell” to “L258+L265”

nonfin_digi_df <- fun_list2df(fun_downloadecon(ell = "L258+L265"))

7 - since there are five columns in the BA900 that are populated, M1 to M5, we only need to consider M5, which is the total, so will filter for M5

df2 <- nonfin_digi_df %>%

8 – now we can compare, first on an aggregate basis, software investment as a percentage of total non-financial assets (which includes physical premises), by summarising the data for each data and type, and then calculating the fraction of software tot total non-financial assets

sum1_df <- df2 %>%
  dplyr::group_by(date_ymd,l_type) %>%
  dplyr::summarise(value = sum(obs)) %>%
  dplyr::ungroup() %>%
  tidyr::pivot_wider(names_from = l_type, values_from = value) %>%
  dplyr::mutate(ratio = L265/L258*100)

Note that the dataframe “sum1_df” has been widened to make calculating the ratio easier, since the only variables remaining are the aggregates of the two series.

9 – whilst this can be stored in a Microsoft Excel template, there is also a quick way to have a look at what the graphs of the trend would look like

sum1_df %>%
  ggplot() +
  geom_line(mapping = aes(x=date_ymd,y=ratio))

The output should look as per below:

Clearly, there has been a significant shift in investment in software from the 2010 to 2015 in South Africa’s banking sector - it now accounts for more than half the value of non-financial assets. There are several considerations to strip out when analysing this trend, for example, possible valuations changes in real property, or changes in accounting standards for recognising software as an asset.

10 - the aggregate trend line leaves one obvious question: how do individual banks relate to the trend? One way to express this is by sampling banks by their size and then comparing the graphs, as is done below by way of the filter function

big4_df <- df2 %>%
  dplyr::filter(bank == "ABL" | bank =="FRB" | bank == "NED" | bank == "STB") %>%
  dplyr::group_by(date_ymd,bank) %>%
  dplyr::mutate(ratio = obs/(sum(obs)-obs)*100) %>%
  dplyr::ungroup() %>%

Notice that the pivot wider function is not used, since that would result in 12 columns (4 banks with 3 columns each) besides the data, which is unwieldly.  Rather, a workaround is used to compute the ratio per bank per period using the dplyr::group function.

11 – the output can again be graphed by the following

big4_df %>%
  ggplot() +
  geom_line(mapping = aes(x=date_ymd,y=ratio, colour=bank))

The output should look as per below:

These figures are quite surprising and potentially calls into question whether the aggregate numbers are comparable across banks. As mentioned, these figures may be affected by differences in accounting policies on software capitalisation or other changes in the assets of different banks. Unfortunately, the Prudential Authority did not respond to our query about the data.

By Pietman Roos (Associate - Codera Analytics)

Leave a Reply 1

Your email address will not be published. Required fields are marked *

Hey! This is my 1st comment here soo I just wanted to give a quick shout out and say I really enjoy reading through your posts.
Can you suggest any other blogs/websites/forums that deal
with the same subjects? Thank you so much!