(Refer back to the Advanced Data Manipulation lesson).
- dplyr verbs
- the pipe
%>%
- the
tbl_df
- variable creation
- multiple conditions
- properties of grouped data
- aggregation
- summary functions
- window functions
We’re going to work with a different dataset for the homework here. It’s a cleaned-up excerpt from the Gapminder data. Download the gapminder.csv data by clicking here or using the link above. Download it, and save it in a data/
subfolder of the project directory where you can access it easily from R.
Load the dplyr and readr packages, and read the gapminder data into R using the read_csv()
function (n.b. read_csv()
is not the same as read.csv()
). Assign the data to an object called gm
.
In your submitted homework assignment, I would prefer you use the read_csv()
function to read the data directly from the web (see below). This way I can run your R code without worrying about whether I have the data/
directory or not.
library(dplyr)
library(readr)
# Preferably: read data from web
gm <- read_csv("http://bioconnector.org/workshops/data/gapminder.csv")
# Alternatively read from file:
# gm <- read_csv("data/gapminder.csv")
# Display the data
gm
Use dplyr functions to address the following questions:
## # A tibble: 5 x 2
## continent n
## <chr> <int>
## 1 Africa 52
## 2 Americas 25
## 3 Asia 33
## 4 Europe 30
## 5 Oceania 2
## # A tibble: 1 x 6
## country continent year lifeExp pop gdpPercap
## <chr> <chr> <int> <dbl> <int> <dbl>
## 1 Albania Europe 1997 73 3428038 3193
## # A tibble: 5 x 2
## continent mean.lifeExp
## <chr> <dbl>
## 1 Africa 52.5
## 2 Americas 67.2
## 3 Asia 63.7
## 4 Europe 73.2
## 5 Oceania 74.8
## # A tibble: 5 x 2
## country Total.GDP
## <chr> <dbl>
## 1 United States 7.68e+13
## 2 Japan 2.54e+13
## 3 China 2.04e+13
## 4 Germany 1.95e+13
## 5 United Kingdom 1.33e+13
## # A tibble: 22 x 3
## country lifeExp year
## <chr> <dbl> <int>
## 1 Australia 80.4 2002
## 2 Australia 81.2 2007
## 3 Canada 80.7 2007
## 4 France 80.7 2007
## 5 Hong Kong, China 80.0 1997
## 6 Hong Kong, China 81.5 2002
## 7 Hong Kong, China 82.2 2007
## 8 Iceland 80.5 2002
## 9 Iceland 81.8 2007
## 10 Israel 80.7 2007
## # ... with 12 more rows
## # A tibble: 10 x 2
## country r
## <chr> <dbl>
## 1 France 0.996
## 2 Austria 0.993
## 3 Belgium 0.993
## 4 Norway 0.992
## 5 Oman 0.991
## 6 United Kingdom 0.990
## 7 Italy 0.990
## 8 Israel 0.988
## 9 Denmark 0.987
## 10 Australia 0.986
ungroup
before arrange
ing. This also behaves differently in more recent versions of dplyr.## # A tibble: 48 x 3
## continent year mean.pop
## <chr> <int> <dbl>
## 1 Americas 2007 35954847
## 2 Americas 2002 33990910
## 3 Americas 1997 31876016
## 4 Americas 1992 29570964
## 5 Americas 1987 27310159
## 6 Americas 1982 25211637
## 7 Americas 1977 23122708
## 8 Americas 1972 21175368
## 9 Europe 2007 19536618
## 10 Europe 2002 19274129
## # ... with 38 more rows
## # A tibble: 3 x 2
## country sd.pop
## <chr> <dbl>
## 1 Sao Tome and Principe 45906
## 2 Iceland 48542
## 3 Montenegro 99738
## [1] "tbl_df" "tbl" "data.frame"
## Source: local data frame [36 x 6]
## Groups: country [22]
##
## # A tibble: 36 x 6
## country continent year lifeExp pop gdpPercap
## <chr> <chr> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1982 39.9 12881816 978
## 2 Bosnia and Herzegovina Europe 1992 72.2 4256013 2547
## 3 Bosnia and Herzegovina Europe 1997 73.2 3607000 4766
## 4 Bulgaria Europe 2002 72.1 7661799 7697
## 5 Bulgaria Europe 2007 73.0 7322858 10681
## 6 Croatia Europe 1997 73.7 4444595 9876
## 7 Czech Republic Europe 1997 74.0 10300707 16049
## 8 Czech Republic Europe 2002 75.5 10256295 17596
## 9 Czech Republic Europe 2007 76.5 10228744 22833
## 10 Equatorial Guinea Africa 1977 42.0 192675 959
## # ... with 26 more rows
Source: https://raw.githubusercontent.com/4va/biodatasci/master/r-dplyr-homework.Rmd