Learning R during your PhD

A year from now you may wish you had started today.
Karen Lamb

It’s been a few years since I completed my PhD, but looking back, I think it would have been both the best and the worst time to learn R (“a free software environment for statistical computing and graphics”).

The best time, because R gives you immediate feedback. Especially at the beginning of a PhD, direct feedback is usually missing. A PhD is a rather complex project, and it usually takes some time until you hear how you are doing. It’s natural to feel stupid (Schwartz, 2008). And when you are doing something that makes you feel stupid, it is nice to have these small moments of success.

On the other hand, it’s probably the worst time too, precisely because R gives you this feedback. It’s a bit like doing project work for grants during your PhD. The grant proposals are usually fleshed out, the tasks are (mostly) clear, and there are strict deadlines. Everything your PhD is not. So it’s easy for the project work to swallow your PhD time. Similarly, you can lose yourself in R, especially when you are determined to get that damn script running and it just does not work. (That’s usually a sign to take a break and try again later, when you have forgotten about your wrong approach or no longer mind discarding the work you invested in it.)

So, yeah, if you are doing empirical work, it really pays to learn R as soon as possible. It’s really useful and might provide you with the dopamine hits you need to go on. Just make sure you don’t use it as an excuse to avoid that rather complex other thing you should be doing.

BTW, I can still recommend the following sources for R:

  • Burk & Anton’s Tadaa-Data (German) (R für Psychos; https://r-intro.tadaa-data.de/book/): Short and rather informal German introduction to R for psychology students. Apparently still a work in progress but useful nonetheless.
  • Grolemund and Wickham’s «R for Data Science» (http://r4ds.had.co.nz/): Extremely well written book that is available for free on the website. Starts with visualizing data and yeah, that’s how you should start learning R. Uses the «tidyverse» collection of packages, which makes it easier to work with data.
  • RStudio’s ggplot2-cheatsheet (https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf): Great cheatsheet for ggplot2, a package for data visualization in R. Brilliant concept: describing plots with a grammar.
  • STHDA ggplot2 (http://www.sthda.com/english/wiki/ggplot2-essentials): Website with focus on different visualizations/plots.
  • Field, Miles, & Field (2012)’s «Discovering Statistics Using R»: Background information about R and statistics.
  • RStudio (https://www.rstudio.com): The software environment you should use for working with R. Makes things much, much easier than the R console. Note: You have to install R separately.
  • datacolada’s Blog Entry 69 (http://datacolada.org/69): A few very good tips on structuring your R script.

So, happy learning.

Sources:

  • Schwartz, M. A. (2008). The importance of stupidity in scientific research. Journal of Cell Science, 121(11), 1771–1771. http://doi.org/10.1242/jcs.033340