‘R’ is Rad & it CAN make stats fun!
Let’s start by being very upfront about this topic.
There are many people out there that do not enjoy statistics or crunching numbers. Even worse is when you have to learn a new stats program to help you make sense of your data. I am not one of these people. I am all about the numbers and the stats! AND I am a believer that anyone can get a basic handle on this to help them tell the story of their data. This is where ‘R’ can help.
‘R’ is an open source data management, processing and statistical package. You can download it for free and then have it on your computer whenever you want to use it. You can also download numerous packages (add-ons) for ‘R’, written by other users and also made open source, which between them can do nearly anything (logical) that you can think of. All it takes is some time and patience to learn and understand the coding language. This can be daunting at first and it will not come easily to everyone, but if you have the time and a bit of a logical brain – you will come out on top eventually. For those who don’t necessarily have the time or a logic-wired brain, there are other options including ‘R Commander’ (the Rcmdr package), which is a clickable, drop-down menu version of ‘R’ (also available online for free).
One of the challenges facing us more and more these days is the size of the data sets that are rolling in. With new technologies to collect more data more often, we feel that we have numbers tracking in that are going to answer all of science’s big questions. The challenge, however, is that if we don’t have the skills to process that data, visualise what it’s telling us and break it down into an understandable story – then it is relatively useless to us. Insert ‘R’ and some programming skills into this equation and these things start to become manageable, because ‘R’ has some serious data management, graphing and statistical power which, with the right skills, can really break it down.
Before we go any further: yes, I just said “programming”. It’s a daunting word, I know, but really it shouldn’t be. Programming, or “writing code” in ‘R’, is just like learning another practical language:
You’ve had an epiphany overnight: you’ve decided to become a chef. The following day you walk into the kitchen and are handed a recipe. You stare blankly. Words such as “affriander”, “al dente”, and “cassolette” jump from the page and you start to question your rash life choice. Weeks later you’re beginning to learn the terminology and you’ve cooked your first spaghetti bol without making a mistake. Well done, I’m proud of you.
Coding is the same. However, instead of reading the recipes, we write them and ask ‘R’ to do the cooking. Be open to the idea of learning this new language, because what you can do with it will help you in dealing with data, and certainly impress your friends.
I won’t rave on forever about all the different reasons why ‘R’ is just awesome, but I will list a few of the rad things that ‘R’ can do:
(1) Tracking your data movements. When using ‘R’ you use code (saved as a text file, like in Notepad – see example picture below) to list out the steps in your data management, transformations, calculations, stats etc. This means that whenever you are looking at your data or code, you can see what you have done and why you have done it. In Excel you might delete a column or a data row (e.g. a particular animal or animals) or change an obvious outlying data point, and this is easy. However, by the next time you come back to that data set you will have made all these changes days, weeks or months earlier and you can’t remember at all what you did. The good thing in ‘R’ is that you can go back and ‘edit’ this code at any time and re-run it very easily (simple copy, paste and run).
Picture; some example R code
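To give a flavour of what such a script can look like, here is a minimal sketch. Everything in it is made up for illustration: the animal IDs, the weights, the data-entry error and the reason for dropping an animal are all assumptions, not data from a real project.

```r
# Hypothetical example data: weights for six animals (all values invented).
animals <- data.frame(
  id        = c("A1", "A2", "A3", "A4", "A5", "A6"),
  weight_kg = c(410, 395, 4020, 430, 388, 415),
  stringsAsFactors = FALSE
)

# Step 1: fix an obvious data-entry error.
# (Assumption: 4020 kg was meant to be 402 kg.)
animals$weight_kg[animals$id == "A3"] <- 402

# Step 2: drop animal A5.
# (Assumption: it left the trial early, so we exclude it.)
animals <- animals[animals$id != "A5", ]

# Step 3: a simple calculation on the cleaned data.
mean_weight <- mean(animals$weight_kg)

print(animals)
print(mean_weight)
```

Because every change is written down as a commented step, you can come back months later, see exactly what was done and why, tweak a step and re-run the whole thing.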
(2) It can make incredible, sweet as plots. As I have spoken about in an earlier blog – research and data are all about making pretty graphs. ‘R’ has this covered. There are multiple ways and add-ons for making colourful graphs in ‘R’. There is a whole learning space within ‘R’ to make this work for you and many forums to help guide you. So if you’re into graphs and data visualisation, get into ‘R’.
(3) Loops. Loops are when you ask ‘R’ to carry out a certain task, then you tell it to do that task for x number of cases. For example, you might want to run a simple regression analysis on each of 20 variables for your one outcome variable (the thing you’re interested in). Doing this in a clickable program means you have to go and click the appropriate buttons each time for 20 different analyses. In ‘R’, you can code it just once and tell the program to do the same thing for the list of 20 variables. This takes some coding skills, but once you have that covered – it’s life changing. I tried to convince a friend of mine to use ‘R’ a couple of years ago. She was doing the same task (opening an Excel file, adding a column, doing a simple step and then saving it in a new place) for 1800 different Excel files. Painful! So I showed her how we could do this in ‘R’ (I think the coding took about an hour), including having the new files saved in appropriately named folders. Then we clicked run and watched as the program carried out this painstaking task in just a few minutes. It would probably have taken her two or three days to do the same thing.
(4) R can seriously make your life easier. I was at a RAID networking event in Wagga last night and was talking to Thom Williams – who has become a bit of an ‘R’ wizard in the last year or so. He often rings me to tell me about the latest package he has been using and what it can do. Thom was telling me a very practical story about using ‘R’ for a task. He has written an ‘R’ program that:
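As a small taste, here is a sketch of a scatter plot with a fitted line using only the graphics that come with ‘R’. It uses the built-in `cars` data set (car speed against stopping distance); the colours, labels and file name are just my choices for the example, and the plot is saved to a temporary PDF so the code runs anywhere.

```r
# 'cars' ships with R: speed (mph) and stopping distance (ft) for 50 cars.
data(cars)

# Save the plot as a PDF in a temporary folder (file name is arbitrary).
plot_file <- file.path(tempdir(), "stopping_distance.pdf")
pdf(plot_file, width = 6, height = 4)

# Scatter plot of the raw data.
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Stopping distance vs speed",
     pch = 19, col = "steelblue")

# Add the straight line from a simple linear regression.
abline(lm(dist ~ speed, data = cars), col = "firebrick", lwd = 2)

# Close the PDF device so the file is written to disk.
dev.off()
```

If you leave out the `pdf()` and `dev.off()` lines, the plot simply appears on screen. Add-on packages such as ggplot2 offer another popular way of building graphs once you are comfortable with the basics.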
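Here is a minimal sketch of the regression example as a loop. I don’t have a data set with 20 variables to hand, so this assumes the built-in `mtcars` data, with `mpg` as the outcome and the other 10 columns as the variables to test one at a time; the `results` table is just one way of collecting the output.

```r
# 'mtcars' ships with R: 11 measurements on 32 cars.
data(mtcars)

outcome    <- "mpg"                            # the thing we're interested in
predictors <- setdiff(names(mtcars), outcome)  # every other column

# Empty table to collect one row of results per predictor.
results <- data.frame(
  predictor = predictors,
  slope     = NA_real_,
  p_value   = NA_real_,
  stringsAsFactors = FALSE
)

# Run the same simple regression once for each predictor.
for (i in seq_along(predictors)) {
  model_formula <- reformulate(predictors[i], response = outcome)  # e.g. mpg ~ wt
  fit   <- lm(model_formula, data = mtcars)
  coefs <- summary(fit)$coefficients

  # Row 1 is the intercept; row 2 is the predictor itself.
  results$slope[i]   <- coefs[2, "Estimate"]
  results$p_value[i] <- coefs[2, "Pr(>|t|)"]
}

print(results)
```

Going from 10 variables to 20 (or 200) means changing nothing in the loop itself, only the list of predictors it works through.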
- Scans through a website and picks up data specific to someone’s name or title
- Turns that into a plot outlining the individual’s performance
- Puts key stats of the individual’s details and their performance in text below it.
- Prints all this information into a PDF with the details of the individual/ID as a header/title
- Has the information emailed to a list of specific individuals.
That’s amazing. This means that anytime he has to do that task, he just has to open ‘R’ and click run on his code. Seriously…great and saving heaps of time!! We can use this same process for graphing, reports or simple analysis (or complex analysis) of data that is coming in on a regular basis.
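I haven’t seen Thom’s code, so the following is only a rough sketch of the middle steps (plot, key stats, PDF per individual) and not his actual program. The names and scores are invented, `make_report()` is a hypothetical helper, and the website scraping and emailing steps are left out because they would need add-on packages and details I don’t have.

```r
# Hypothetical performance data (invented for this example).
scores <- data.frame(
  name  = rep(c("Alex", "Sam"), each = 4),
  round = rep(1:4, times = 2),
  score = c(12, 15, 14, 18, 20, 19, 23, 22),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a one-page PDF report for one person.
make_report <- function(person, data, out_dir = tempdir()) {
  d <- data[data$name == person, ]
  out_file <- file.path(out_dir, paste0("report_", person, ".pdf"))

  pdf(out_file, width = 6, height = 6)
  on.exit(dev.off())            # always close the PDF, even if an error occurs

  par(mar = c(8, 4, 4, 2))      # extra bottom margin for the stats text
  plot(d$round, d$score, type = "b",
       xlab = "Round", ylab = "Score",
       main = paste("Performance report:", person))  # name as the title

  # Key stats printed as text below the plot.
  stats_text <- sprintf("Mean score: %.1f | Best score: %.0f | Rounds: %d",
                        mean(d$score), max(d$score), nrow(d))
  mtext(stats_text, side = 1, line = 6)

  out_file                      # return the path of the PDF we created
}

# One report per individual, all from a single line.
report_files <- vapply(unique(scores$name), make_report,
                       character(1), data = scores)
print(report_files)
```

Once something like this is written, new data just means pressing run again.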
(5) One of the best things about ‘R’, which I have already mentioned, is that it’s open source. This is pertinent to working in developing countries because it means everyone can use it. In many cases Excel can handle some of the data sets that are being used for our projects and that is OK. However, if you want to get some stats involved or some more interesting graphs – then ‘R’ can help out considerably, as many other stats packages are expensive. So when it comes to training yourself and your teams overseas – it becomes much cheaper and more practical. Furthermore, ‘R Commander’, the clickable version of ‘R’, can be a great simple tool to help get your team up to speed with statistics.
Now…. ‘R’ is something that can help you in these scenarios and more, but it’s not something you can pick up and run with on Day 1. It does take a bit of time and patience. I don’t advocate everyone learning ‘R’ just for the sake of it. However, if you are in a position where you have a lot of data coming in or you have many similar data sets rolling in week after week or month after month – then get into it. Especially if you are in the earlier part of your career.
Lastly, a few tips or tricks for new ‘R’ players:
- Learn from a data set that you know and understand.
- Use examples from the ‘R’ pages or forums like Stack Overflow and R-Cookbook, or just search Google: type ‘How to ….. in R’ and there will be plenty of hits.
- Have an experienced ‘R’ user on hand to help with checking for simple errors in code. You can spend hours figuring out minor coding problems, but someone with a bit of experience can usually pick them up quickly.
- Go to a workshop or course to kick start your ‘R’ life. If that’s not available, try to get your hands on some basic material or a manual to get you going. There are plenty on the internet.
- Want ‘R’ to teach you how to use ‘R’? You’re in luck! Some amazing people wrote a package that does just that. {swirl} is a great first place to get familiar with the ‘R’ environment.
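Getting started with {swirl} looks roughly like this. It assumes you have an internet connection for the one-off install, and the `interactive()` check is just my addition so the lessons only start when you are sitting at the ‘R’ console yourself.

```r
pkg <- "swirl"

# swirl lessons are typed at the console, so only start them interactively.
if (interactive()) {
  # One-off install from CRAN if the package isn't on this computer yet.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }

  library(swirl)  # load the package
  swirl()         # start the lessons and follow the prompts
}
```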
Anyway – that’s enough about ‘R’. Remember…
Acknowledgements: a big thanks to Thom Williams and Emma Hand, who helped with putting some of the ideas in this blog together.