Software

R

Install R. You’ll need R version 3.4.0 or higher [1]. Download and install R for Windows or Mac (on a Mac, download the latest R-3.x.x.pkg file appropriate for your version of Mac OS).
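Once R is installed, the version requirement can be checked from the R console. The following is a minimal sketch; the variable names `required` and `current` are illustrative only, and `getRversion()` is the base R function that reports the running version.

```r
# Minimum R version required by the workshop packages (taken from the text above).
required <- "3.4.0"

# getRversion() returns the running R version as a comparable version object.
current <- getRversion()

if (current >= required) {
  message("R ", as.character(current), " meets the requirement (>= ", required, ").")
} else {
  message("R ", as.character(current), " is too old; please install R ", required, " or newer.")
}
```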

RStudio

Download and install RStudio Desktop version >= 1.0.143.

R and RStudio are separate downloads and installations. R is the underlying statistical computing environment, but using R alone is no fun. RStudio is a graphical integrated development environment that makes using R much easier. You need R installed before you install RStudio.

CRAN packages

Launch RStudio (RStudio, not R itself). Ensure that you have internet access, then copy and paste the following command into the Console panel (the lower-left panel by default) and hit the Enter/Return key.

install.packages("tidyverse")

A few notes:

  • Commands are case-sensitive.
  • You must be connected to the internet.
  • The tidyverse package is kind of a meta-package that automatically installs/loads many core packages that we use throughout the workshops [2].
  • Even if you’ve installed these packages in the past, do re-install the most recent version. Many of these packages are updated often, and we may use new features in the workshop that aren’t available in older versions.
  • If you’re using Windows you might see errors about not having permission to modify the existing libraries – disregard these. You can avoid this by running RStudio as an administrator (right click the RStudio icon, then click “Run as Administrator”).
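To see whether a re-install actually picked up a recent version, you can ask R which version of a package is currently installed. This is a small sketch, not part of the required setup; `installed_version()` is a hypothetical helper name, built on the base functions `system.file()` and `packageVersion()`.

```r
# Return the installed version of a package as a string,
# or NA if the package is not installed in any library.
installed_version <- function(pkg) {
  # system.file() returns "" when the package cannot be found.
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Prints a version string if tidyverse is installed, NA otherwise.
print(installed_version("tidyverse"))
```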

Check that you’ve installed everything correctly by closing and reopening RStudio and entering the following command at the console window (don’t worry about the Conflicts with tidy packages warning):

library(tidyverse)

This may produce some notes or other output, but as long as you don’t get an error message, you’re good to go. If you get a message that says something like: Error in library(somePackageName) : there is no package called 'somePackageName', then the required packages did not install correctly. Please do not hesitate to email me prior to the course if you are still having difficulty. In this email, please copy and paste what you typed in the console, and all of the output that streams by in the console.

Bioconductor

For some classes (e.g., RNA-seq), you’ll need to install a few Bioconductor packages. These packages are installed differently than “regular” R packages from CRAN. Copy and paste these lines of code into your R console one at a time.

source("http://bioconductor.org/biocLite.R")
biocLite()
biocLite("DESeq2")
biocLite("ggtree")

A few notes:

  • We will be using the latest versions of Bioconductor from the 3.5 release. This requires R version 3.4.0 or higher. If you have R 3.4.0 installed, running the commands above will install Bioconductor 3.5. See http://bioconductor.org/news/bioc_3_5_release/.
  • If at any point in the Bioconductor package installations you get prompts in the console asking you to update any existing packages, type n at the prompt and hit enter.
  • If you see a note along the lines of “binary version available but the source version is later”, followed by a question, “Do you want to install from sources the package which needs compilation? y/n”, type n for no, and hit enter.
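If you would rather not answer the “install from sources” question each time, base R has an option that controls it. The sketch below sets `install.packages.compile.from.source` (a documented `install.packages()` option) to "never" for the current session. That the Bioconductor installer respects this setting is an assumption here, based on it calling `install.packages()` internally; answering n at the prompt remains the safe route.

```r
# Save the previous value so it can be restored afterwards.
old_opts <- options(install.packages.compile.from.source = "never")

# Confirm the setting for this session.
print(getOption("install.packages.compile.from.source"))
```

Running `options(old_opts)` afterwards restores the previous behaviour.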

Check that you’ve installed everything correctly by closing and reopening RStudio and entering the following commands at the console window, one at a time:

library(DESeq2)
library(ggtree)

If you get a message that says something like: Error in library(somePackageName) : there is no package called 'somePackageName', then the required packages did not install correctly. Please do not hesitate to email me prior to the course if you are still having difficulty. In this email, please copy and paste what you typed in the console, and all of the output that streams by in the console.
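If you do need to email for help, it can be useful to collect the load results and your session details in one go. The following is a minimal sketch under the assumption that you want to test the two Bioconductor packages above; `try_load()` is a hypothetical helper name, not something provided by the course.

```r
# Try to load a package; return a one-line status instead of stopping on error.
try_load <- function(pkg) {
  tryCatch(
    {
      suppressPackageStartupMessages(library(pkg, character.only = TRUE))
      paste0(pkg, ": loaded OK")
    },
    # On failure, keep the error text so it can be pasted into an email.
    error = function(e) paste0(pkg, ": ", conditionMessage(e))
  )
}

results <- vapply(c("DESeq2", "ggtree"), try_load, character(1))
writeLines(results)

# Session details (R version, platform, loaded packages) for the email.
writeLines(capture.output(sessionInfo()))
```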

RMarkdown

Several additional setup steps are required for the reproducible research with RMarkdown class.

  1. First, install R, RStudio, and the tidyverse package as described above. Also install the knitr and rmarkdown packages.

    install.packages("knitr")
    install.packages("rmarkdown")
  2. Next, launch RStudio (not R). Click File, New File, R Markdown. This may tell you that you need to install additional packages (knitr, yaml, htmltools, caTools, bitops, rmarkdown, and maybe a few others). Click “Yes” to install these.
  3. Optional: If you want to convert to PDF, you will need to install a LaTeX typesetting engine. This differs on Mac and Windows. Note that this part of the installation may take up to several hours, and isn’t strictly required for the class.
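As an alternative to clicking through the menus in step 2, a tiny R Markdown file can be created from the console to test the setup. This is only a sketch: the file name `setup-check.Rmd`, its title, and its contents are made up for illustration, and the file is written to R's temporary directory.

```r
# Three backticks, built programmatically so this example can itself sit
# inside a fenced code block.
fence <- strrep("`", 3)

# A minimal R Markdown document: YAML header, some Markdown, one R chunk.
rmd_lines <- c(
  "---",
  "title: \"Setup check\"",
  "output: html_document",
  "---",
  "",
  "Some **bold** and _italic_ text.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

# Write the document to a temporary location.
rmd_path <- file.path(tempdir(), "setup-check.Rmd")
writeLines(rmd_lines, rmd_path)
print(rmd_path)
```

With the rmarkdown package installed, running `rmarkdown::render(rmd_path)` should then produce an HTML file next to the .Rmd file.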

Get Data

Click the Data link on the navbar at the top. You can download all the data needed by downloading this zip file or by downloading individual data sets as needed at the Data page.

Required Reading

Tidy Data

  1. “For Big-Data Scientists, Janitor Work Is Key Hurdle to Insights.” The New York Times, August 8, 2014. Available at: https://nyti.ms/2jNUfHo.
  2. Sections 1-3 of Wickham, H. “Tidy Data.” Journal of Statistical Software 59:10 (2014).
  3. Click through each section of stephenturner.us/dataorganization/.

RMarkdown Background

Spend a few minutes learning a little about Markdown. All you really need to know is that Markdown is a lightweight markup language that lets you create styled text (bold, italics, links, etc.) using a simple plain-text syntax (like **bold**, _italics_, [links](http://bioconnector.org/markdown), etc.). The resulting text file can be rendered into many downstream formats, like PDF (for printing) or HTML (for websites).

  1. (30 seconds) Read the summary paragraph on the Wikipedia page.
  2. (1 minute) Bookmark and refer to this markdown reference: http://commonmark.org/help/.
  3. (5-10 minutes) Run through this 10-minute in-browser markdown tutorial: http://commonmark.org/help/tutorial/.
  4. (5-10 minutes) Go to http://dillinger.io/, an in-browser Markdown editor, and play around. Write a simple markdown document, and export it to HTML and/or PDF.
  5. (10 minutes) See RStudio’s excellent documentation on RMarkdown at http://rmarkdown.rstudio.com/. Click “Getting Started” and watch the 1-minute video on the Introduction page. Continue reading through each section on the navigation bar to the left (Introduction through Cheatsheets, and optionally download and print out the cheat sheet). Finally, browse through the RMarkdown Gallery.
  6. (0 seconds) No need to look now, but don’t forget that the course help page has some useful resources on Markdown+RMarkdown.

Phylogenetic trees

  1. How to read a phylogeny: epidemic.bio.ed.ac.uk/how_to_read_a_phylogeny.

Footnotes

  1. R version 3.4.0 was released in April 2017. If you have not updated your R installation since then, you need to upgrade to a more recent version, since several of the required packages depend on a version at least this recent. You can check your R version with the sessionInfo() command.

  2. Installing/loading the tidyverse package will install/load the core tidyverse packages that you are likely to use in almost every analysis: ggplot2 (for data visualisation), dplyr (for data manipulation), tidyr (for data tidying), readr (for data import), purrr (for functional programming), and tibble (for tibbles, a modern re-imagining of data frames). It also installs a selection of other tidyverse packages that you’re likely to use frequently, but probably not in every analysis (these are installed, but you’ll have to load them separately with library(packageName)). This includes: hms (for times), stringr (for strings), lubridate (for date/times), forcats (for factors), DBI (for databases), haven (for SPSS, SAS and Stata files), httr (for web APIs), jsonlite (for JSON), readxl (for .xls and .xlsx files), rvest (for web scraping), xml2 (for XML), modelr (for modelling within a pipeline), and broom (for turning models into tidy data). After installing tidyverse with install.packages("tidyverse") and loading it with library(tidyverse), you can use tidyverse_update() to update all the tidyverse packages installed on your system at once.