Introduction to R

06 Jun 2018

In a strange twist of fate I have been tasked to introduce my overseas colleagues to R. LOL

I shall sneak some python bits in.


Before you begin, ask questions.

Ask if they have heard of R before. Ask if there is any reason anyone has asked them to learn about R. Ask what tools they are currently using, and what are their pain points.

From here you can establish a starting point, and an end point. What knowledge should they walk away with?


What is R? R is a language and environment for statistical computing and graphics. It is free. It is open source - meaning people can modify and share the source code. Think of it as a public collaboration.

What isn’t R? R is not a general programming language - it has its roots in statistics.

You can do minimal general programming in R such as automating mail sending in R (see below) or build a webapp using Shiny for data visualisation/displaying your Machine Learining output (see https://www.r-bloggers.com/tutorial-build-webapp-in-r-using-shiny/)).


library (RDCOMClient)

OutApp <- COMCreate("Outlook.Application")
outMail = OutApp$CreateItem(0)
outMail[["To"]] = "test@test.com"
outMail[["subject"]] = "Test Subject"
outMail[["body"]] = "Body of email"               
outMail$Send()

While R can make it work, fundamentally other alternatives like Python, having a dedicated webapp framework and ability to interact with web languages like JavaScript allows them to scale much better than R Shiny apps. That’s why developers are able to build entire systems out of Python, while R is more specialised for data use.

The conclusion: if you are already operating in R for all your data use, by all means continue with it to build simple apps for your own personal use - you don’t need to know JS, HTML or CSS!

If you are a developer who does need to build a production level app serving many users, you do need to prototype in Python with Flask (which is a micro web development framework) and combine it with JavaScript, HTML and CSS to create the frontend. (An alternative is Dash! Equivalent to R Shiny. But I digress). To go into production, you will then pass it on to your DevOps team (if you have one) who will use Django or other fully featured frameworks to fully develop it.

See here for more comments: https://www.quora.com/Do-you-prefer-using-Rs-Shiny-or-Pythons-Flask-to-make-data-focused-web-apps

Why not VBA/Excel?

Reproduction of previously conducted analyses on new datasets is not easy. In Excel, to replicate a report you did last month/quarter/year, you copy and paste the new data into the worksheets with formulas. There are a million and one excel sheets and references, and if you forget to drag the formula to cover sufficient rows, or someone accidentally presses a button that edits the formula…

It’s hard to detect such errors.

But fine, there is VBA. However, VBA is only used within Excel/Access. And at times, Excel is slow/unable to handle large datasets.

Also, here we are talking about basic data manipulation.

R can allow you to store a series of data manipulation steps in a script, and re-apply it across similar data. It can create data tables, but also graphs for visualisation. On top of that, hello machine learning algorithms!

Why not (just) R? After processing the data in R, you deliver a nice, clean set of numbers in a report to the end user.

They want to know why the numbers have changed. They want to cut, slice, dice the data, interact with it. Sure, you have data visualisation tools, but 1) where did those numbers come from? 2) how can I filter it by my own requirements?

It can be very convenient to just click on the cell to see how each number is actually derived.

(I am still not the biggest fan of Tableau, but I NEED TO FIND AN ALTERNATIVE FIRST)

Where can I get R? Install it at http://cran.r-project.org/ Also install R Studio from http://www.rstudio.com/ which provides a user friendly environment to work in.

How do I learn more about R?

A quick scroll reveals the basic functions, to the more complex machine capabilities of R. https://www.analyticsvidhya.com/blog/2016/02/complete-tutorial-learn-data-science-scratch/

For a more in depth tutorial to begin learning from scratch, I would suggest the below: http://paldhous.github.io/ucb/2016/dataviz/week7.html https://www.listendata.com/2016/08/dplyr-tutorial.html

Alternatives to R? Python! Highly recommended for readability and ability to extend to general programming ;)

It can do 95% of what R can do (some say in a more clunky manner as R’s syntax is really clean; but Python is more readable!!).