R and Docker

One of my frustrations with using the R programming language for data analysis is that I miss the virtual environments I had in Python. Python has great virtual environments built in, and conda comes with a really good system as well.

When I got to OPIA, I started to use R (because, aside from SAS, that is the other software that people use there). It was a bit of a rough transition, but I am getting to the point where I can understand the differences between the two languages.

As an aside, I will say that while there were some switching costs from Python to R, I was able to get up and running with R really quickly. I was doing actual work with R right away. I won't say that either one is better, but I think it is absolutely worth learning both if you:

  • work with people who use the other system or
  • have an interest in ways that data analysis can be expressed in code.

While using Python, I got used to working with virtual environments. So, when using R, just having a big pile of installed packages drove me nuts. It just feels like I'm not worrying about reproducibility enough.

So, this weekend I resolved to set up Docker on my home computer and see if I could get R working in it. I can quite happily report that I got it working. What's more, the image that I was using includes LaTeX, so I could create PDFs with knitr right out of the box. That is a major advantage, since I was having problems getting LaTeX to run on my Windows machine at work.

On Tuesday (Monday is a holiday), I will get Docker running on my new Windows computer at work. (I wanted a Mac, but it didn't seem to be super compatible with the systems I use day to day.) If that works smoothly, then I will have a nice way to run all my knitr reports in HTML and PDF.

So, here's what I discovered.

  • The rocker/verse image includes RStudio, the tidyverse and LaTeX.
  • You run RStudio Server off the docker image -- so you access RStudio in your web browser.
  • You can create a Dockerfile that takes an existing image and adds things to it using the following commands:
    • FROM - names the existing Docker image to build on.
    • RUN - run a command (including an R command) - which can be used to install other packages.
    • ADD - to copy data files into the container.
  • Once you create the Dockerfile, you use docker build to create the image.
  • Once you have the image, use docker run to start it. Then you can open RStudio in your web browser.
  • You can mount volumes in your docker run commands, so the R code on your own machine is available inside the container.
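
As a rough sketch of how those pieces fit together, a Dockerfile might look something like the one below. The extra package (data.table), the data folder, the image tag (my-r-env), the mounted project path and the password are all placeholders made up for illustration. Port 8787 and the "rstudio" login are the rocker defaults as far as I know, so check the rocker documentation for the version you pull.

```dockerfile
# Sketch only: the package, paths, tag, and password below are placeholders.

# FROM: start from an existing image (RStudio Server + tidyverse + LaTeX).
FROM rocker/verse

# RUN: execute a command at build time, here an R command that installs
# one extra package (data.table is just an example).
RUN R -e "install.packages('data.table', repos = 'https://cloud.r-project.org')"

# ADD: copy a local folder of data files into the image.
# (COPY does the same job for plain local files.)
ADD data/ /home/rstudio/data/

# Build the image from the folder containing this Dockerfile:
#   docker build -t my-r-env .
#
# Start a container, publishing RStudio Server's port and mounting the
# current folder so local R code is visible inside the container:
#   docker run --rm -p 8787:8787 -e PASSWORD=changeme \
#     -v "$(pwd)":/home/rstudio/project my-r-env
#
# Then browse to http://localhost:8787 and log in as user "rstudio".
```

The "$(pwd)" part is Unix shell syntax for the current folder; on Windows the path is written differently depending on the shell you use. Because the project folder is mounted rather than copied in, changes you save from RStudio should end up in the folder on your own machine, not inside the image.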