How to learn R
R is a programming language that focuses on beauty, creativity and necessity. It helps with graphical output, document compilation, statistics and data engineering, all of which can be essential in an office worker's toolkit. Using R rather than Excel adds integrity to the data analysis process: the code records the whole processing pipeline, from the raw data to the final output, and makes the analysis reproducible. R can also be used to build apps, to scrape information off the web, or to run models and machine learning, though Python is better designed for these tasks.
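As a minimal sketch of what a scripted pipeline can look like, the base R code below starts from a small raw table, cleans it, and summarises it. The data and the column names (`region`, `sales`) are made up for illustration; in practice the raw data would be read from a file or a database.

```r
# Hypothetical raw data: inconsistent capitalisation and one missing value.
raw <- data.frame(
  region = c("north", "North", "south", "South", "south"),
  sales  = c(120, 80, NA, 200, 100),
  stringsAsFactors = FALSE
)

# Step 1: standardise the region labels.
clean <- raw
clean$region <- tolower(clean$region)

# Step 2: drop rows where sales is missing.
clean <- clean[!is.na(clean$sales), ]

# Step 3: total sales per region.
summary_table <- aggregate(sales ~ region, data = clean, FUN = sum)
print(summary_table)
```

Because every step is written down, rerunning the script on updated raw data repeats exactly the same processing.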
The two main factors when deciding how to learn R are cost and personal motivation, and there is often a trade-off between the two, but we have some recommendations for the best choices. Unless you are a savant, we also think that interactivity is important when learning. We describe our recommendations in order, with our highest recommended choice first.
Swirl
The main reason we recommend the `swirl` package in R so highly is that this option is totally free. Once you install R on your computer (and hence immediately become familiar with the real-life programming environment) and install the `swirl` package in an R session, you can start a `swirl` session. When `swirl` is running, you work through open-source, pre-scripted courses that are interactive. The computer waits until you have typed the correct answer in the console before moving on to the next description or question, and you press “Enter” to progress through the content. You can find free `swirl` courses on the internet and download them; instructors write `swirl` courses to offer free guidance to those learning R, and the courses can be distributed widely. We recommend some popular courses to get started with.
Installing R
- Firstly, go to https://cloud.r-project.org/ and install the R language onto your computer.
- Secondly, go to https://posit.co/downloads/ and install RStudio. This integrated development environment (IDE) is commonly used by R programmers and is specifically designed for the R language.
An IDE is a computer program that programmers use to write code. Most IDEs do not restrict a programmer to one specific language, although some are designed around a single language.
- That being said, you can also go to https://code.visualstudio.com/download and install VS Code. VS Code is a free, popular IDE with a lot of features, including extensions, and it can be used for Python and a variety of other languages as well. Check out the documentation for using R in VS Code. It is quite easy to develop code using an advanced IDE such as VS Code. After this step, though, we’ll focus on RStudio, as that is a better way to be introduced to R for those unfamiliar with the language.
- Open RStudio, make sure that you have a fast internet connection, and type the following lines into the console, waiting for the computer to run the commands each time.
    install.packages("tidyverse")
    install.packages("swirl")
    install.packages("remotes")
    library("remotes")
    install_github("coderaanalytics/econdatar") # If you haven't already installed Codera's EconData R package
    library("tidyverse")
    library("swirl")
    install_course("R_Programming")
    install_course("Exploratory_Data_Analysis")
    install_course_github("NickCH-K", "Econometrics")
Note that the free “R_Programming” swirl course comes from https://github.com/swirldev/swirl_courses#swirl-courses .
- Since you have already run the `library("swirl")` command, you have `swirl` loaded into your current R session. Go ahead and type `swirl()` to start a `swirl` session.
- Choose the R_Programming course to start with! Follow the on-screen prompts to get learning.
- Keep returning to your `swirl` learning regularly over the following days and weeks. The second swirl course you should do is `Exploratory_Data_Analysis`, which introduces data visualization and introductory machine learning. After this, the “Econometrics” course takes you through statistical analysis in R.
DataCamp
Our second-place recommendation is www.DataCamp.com, because DataCamp is relatively cheap compared to academies with tutors and scheduled classes (which can cost several thousand rand for only about two months). In comparison, DataCamp costs about R720 per month, making it much more cost-efficient than many online courses.
The tutorials on DataCamp are a great introduction to R (and Python or SQL). They are interactive, go at your pace, and the app has a good user experience. The tutorials are organised into courses, and series of courses make up certificates.
The difficulty of online courses is that one’s motivation is mostly internal. Try to find developer friends who can hold you accountable (you can share your public DataCamp profile with them) and who can encourage you to carry on working hard. Developer friends can also help with questions you get stuck on, while the DataCamp courses provide a clear structure of learning material for beginners.
W3Schools
https://www.w3schools.com/r/default.asp provides a reference.
Statistics Globe
Statistics Globe is an organised website that offers a wealth of free tutorials for R (and Python and other data science and statistics topics), as web pages or YouTube videos. They also offer short courses for more interactivity.
MITx MicroMasters course: Data Analysis for Social Scientists
The outline reads:
This statistics and data analysis course will introduce you to the essential notions of probability and statistics. We will cover techniques in modern data analysis: estimation, regression and econometrics, prediction, experimental design, randomized control trials (and A/B testing), machine learning, and data visualization. We will illustrate these concepts with applications drawn from real world examples and frontier research. Finally, we will provide instruction for how to use the statistical package R and opportunities for students to perform self-directed empirical analyses.
Udemy
We recommend using https://www.udemy.com/ to take an online course on R.
Communities
- Check out the Data Science Learning Community!
- Ask and answer questions at StackOverflow.
Advanced R courses
- Please read the Tidyverse Style Guide.
- Please read Hadley Wickham’s seminal paper that defines “tidy” data.
- For further study, please use Hadley Wickham’s Advanced R book.
- We recommend Kevin Kotze’s Time Series Analysis course.
- Using APIs
Documentation
Package Documentation
Refer to package documentation for syntax: the built-in help files, the PDF reference manual on CRAN, or the help pages on https://www.rdocumentation.org/ (which is hosted by DataCamp).
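The built-in help files can be reached directly from the R console. The sketch below uses only functions that ship with base R; `mean` and the `stats` package are simply convenient examples.

```r
# Open the built-in help page for a function (both forms are equivalent).
?mean
help("mean")

# Show the documentation index for an installed package.
help(package = "stats")

# Search the installed help pages by keyword when the function name is unknown.
help.search("linear model")

# Print a function's arguments, then run the examples from its help page.
args(mean)
example(mean)
```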
Cheat Sheets
There are a number of “cheat sheets” that accompany the various packages. Please see
R Project Manual
The R Project for Statistical Computing provides this comprehensive manual and these FAQs.
Visualization
Here is a nice overview of different chart types.
Other programming languages
Web Development
If you want to learn web development, be sure to use https://developer.mozilla.org/en-US/docs/Learn as a reference.
Codecademy
(e.g. https://www.codecademy.com/learn/introduction-to-javascript )
Explore AI Academy (Python)
In order to learn Python, a fairly expensive option could be to take a course with Explore AI Academy.
Writing
This section covers popular alternatives to Word for writing documents.
Markup languages are defined in Wikipedia as:
a text-encoding system consisting of a set of symbols inserted in a text document to control its structure, formatting, or the relationship between its parts. Markup is often used to control the display of the document or to enrich its content to facilitate automated processing.
MS Word is not strictly necessary for computer-literate people to do their work, as one can instead use:
- .txt documents, for basic notes
- Markdown documents, for most note-taking and documentation
- Google Docs, featuring collaboration tools on a widely-accepted platform
- LibreOffice, an open-source and free suite of software akin to Microsoft Office
- LaTeX, for lengthy or precise documents
- R Markdown, which is especially helpful for automating documents
Excel is not strictly needed for data visualization, as `ggplot2` in R provides a much more robust, repeatable and automatable way of generating graphs. Codera has, however, developed a method for automating the updating of native Excel charts: one prepares the data in R, loads the Excel file into R, and then saves the Excel file again, overwriting the data.
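As a minimal sketch of generating a graph from code, the example below plots the `mtcars` dataset that ships with R and saves the chart to a file. It assumes `ggplot2` is installed (it is part of the tidyverse); the output file name is a placeholder.

```r
library(ggplot2)

# mtcars ships with R, so the example needs no external data.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(
    x = "Weight (1000 lbs)",
    y = "Miles per gallon",
    colour = "Cylinders",
    title = "Fuel efficiency by weight"
  ) +
  theme_minimal()

# Saving from code means the chart can be regenerated whenever the data changes.
# The file name is a placeholder.
ggsave("mpg_by_weight.png", plot = p, width = 6, height = 4)
```

Rerunning the script after the data changes reproduces the chart with the same styling, which is what makes the approach automatable.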
Markdown
You can use VS Code to edit Markdown files, refer to https://www.markdownguide.org/ for documentation, and preview Markdown files using Typora.
Markdown files can be exported to a variety of formats using Typora, including PDF, HTML, Word, LaTeX, Epub, OpenOffice, image, Wiki. Markdown files also render on a lot of websites, including those that host Git repositories, or cloud storage services.
LaTeX
LaTeX is a markup language intended to produce scientific documents, reports, academic articles, or any neat, professional-looking document. Unlike a word processor, it is a typesetting system that gives one precise control over the layout of reports, and it can manage reports that run into hundreds of pages. A LaTeX document can be collaborated on using Git or Overleaf. LaTeX (or even LyX) allows one to automate reports by pulling in images and tables from the directory, which can themselves be updated by software. Overleaf requires no installation but has constraints for large documents, so we recommend using an offline editor such as Texmaker. Offline editors such as Texmaker run faster, but are less forgiving of errors.
Aidan’s online introductory paragraphs on LaTeX.
Compiling a LaTeX file exports the document to PDF format. We recommend SumatraPDF as the PDF viewer, as it lets one recompile the PDF file while it is still open in the viewer. One can also create slideshow presentations using LaTeX (usually with the `beamer` package). When presenting such a PDF, go into full-screen mode in Adobe Acrobat by pressing Ctrl + L.
R Markdown
R Markdown (.Rmd files) can also facilitate automation of reports: RStudio can export the file to HTML, Word, or PDF (using pandoc, and LaTeX for PDF). The PDF route is thus a simplified way of producing a LaTeX document, written in Markdown.
R code chunks can be run in an R Markdown file, running analysis and producing graphs or tables to be exported directly into the document. The disadvantage of this is that it can be difficult to debug errors, as the error information isn’t clear when using R Markdown, but the main advantage is that descriptive statistics can be re-run on an automated basis, once bugs are cleared.
YAML is used in the document heading.
Git
Git is version-control software that tracks changes to programming code and lets developers who work together on a project share those changes through a common repository. The tips below help with getting started.
Git is not intended to be used to store large amounts of data, but rather to help with observing changes to lines within programming scripts. A remote (centralized) Git repository can be on a local server, but usually developers use GitHub, GitLab, or another online server to host the remote repository. Here are the first steps to getting started with Git:
- Install Git (Git is usually already available on Mac or Linux)
- Make sure that you have VS Code installed (or an alternative IDE)
- Install GitHub CLI (If you use a Mac, please install Homebrew if you haven’t already, as it makes installing programs easy.)
- Have a GitHub account
- Set your Git username and email in the terminal:
    git config --global user.name "FirstName LastName"
    git config --global user.email "username@domain.co.za"
Aidan’s introductory Medium article
Always open a terminal (Git Bash on Windows) and run `git fetch; git status` before staging and committing your work. If your branch is not up-to-date with the remote, see if you can fast-forward your local repo by running `git pull` (or `git pull --ff-only`, which refuses anything other than a fast-forward). This helps to keep a linear history, which is neater.