My first stab at a basic R programming curriculum. I think teaching just these topics without overall motivating examples would be extremely boring, but if you're a self-taught R user, this might be useful to help spot your gaps.
Notes:
I've tried to break the material up into separate pieces, but it's not always possible: e.g. knowledge of data structures and subsetting are tightly intertwined.
Level of Bloom's taxonomy listed in square brackets, e.g. http://bit.ly/15gqPEx. Few categories currently assess components higher in the taxonomy.
basic data structures (vector, matrix, list and data frame):
list and describe their differences (dimensionality, homogeneous vs. heterogeneous) [knowledge]
pick the best data structure for a given problem [application]
recall functions to coerce data structures between different forms [knowledge], and recognise which coercions are lossy [comprehension]
match data types and the functions that identify them, and remember common gotchas (is.vector, is.numeric etc.) [comprehension]
`str`:
interpret the output of `str` [comprehension]
use `str` and subsetting to extract desired pieces from an arbitrary object (for example, extract the R-squared value from a linear model) [application]
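As a rough illustration of the last objective, the sketch below fits a model to the built-in `mtcars` data, inspects the summary with `str()`, and pulls out R-squared by name. The data set and the formula are convenient assumptions for the example, not something the curriculum prescribes.

```r
# Fit a simple linear model on the built-in mtcars data set
# (data set and formula are illustrative choices only).
fit <- lm(mpg ~ wt, data = mtcars)
fit_summary <- summary(fit)

# str() shows that the summary is a list; max.level = 1 keeps the output short.
str(fit_summary, max.level = 1)

# One component is named "r.squared", so ordinary list subsetting extracts it.
r_squared <- fit_summary$r.squared
r_squared_alt <- fit_summary[["r.squared"]]
```

Reading the `str()` output first is what tells you the component name to subset by.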
vectors:
recognise which types of data correspond to the four common atomic vectors (character, double, integer, logical) [knowledge]
recognise the use of `L` to create integer vectors [knowledge]
create new vectors with `c()`, and correctly predict vector type when multiple types are mixed (e.g. what is the type of `c(1, 1L, F)`) [application]
create named vectors with `c()`, recognise how named vectors are printed and how to extract values with character subsetting [application]
employ implicit logical to numerical coercion to compute number and proportion of TRUEs in a vector (e.g. what proportion of values are missing?) [application]
predict how missing values propagate [application], and discuss why `is.na()` is necessary [synthesis]
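The vector objectives above can be tried out with a short sketch like the following. The variable names and values are made up for illustration, and `FALSE` is spelled out rather than `F`.

```r
# Mixing types coerces to the most flexible one present
# (logical < integer < double < character), so this is a double vector.
mixed <- c(1, 1L, FALSE)
typeof(mixed)

# The L suffix requests an integer instead of a double.
typeof(1L)
typeof(1)

# A named vector, and character subsetting to extract values by name.
heights <- c(ann = 160, bob = 175, cy = 182)
heights[c("cy", "ann")]

# Logical values act as 0/1 in arithmetic, so sum() counts TRUEs
# and mean() gives their proportion.
y <- c(4, NA, 7, NA)
n_missing <- sum(is.na(y))
prop_missing <- mean(is.na(y))

# Missing values propagate: comparing with NA yields NA, never TRUE,
# which is why is.na() is needed to detect them.
y == NA
sum(y)
sum(y, na.rm = TRUE)
```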
data frames:
use `data.frame()` to create a data frame from multiple vectors, and control the names of the generated columns [application]
describe the situations under which strings are coerced to factors, and recall how to use `I()`, `as.is = TRUE` or `stringsAsFactors = FALSE` to prevent conversion [knowledge]
combine two or more data frames with `cbind()` and `rbind()`, and describe what conditions must be true for the combination to work [knowledge]
use `head()`, `tail()`, `summary()` and `str()` to get an overview of a data frame [application]
describe how 1d and 2d subsetting of a data frame differ, and enumerate the circumstances under which subsetting a data frame will return a column instead of a data frame [comprehension]
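A minimal sketch of the data frame objectives, using invented column names and values. `stringsAsFactors = FALSE` is passed explicitly so the example behaves the same on R versions before 4.0.0, where the default was `TRUE`.

```r
# Build a data frame from vectors, naming the columns explicitly.
ids <- 1:3
labels <- c("a", "b", "c")
df <- data.frame(id = ids, label = labels, stringsAsFactors = FALSE)

# rbind() needs matching column names; cbind() needs matching row counts.
more <- data.frame(id = 4:5, label = c("d", "e"), stringsAsFactors = FALSE)
stacked <- rbind(df, more)
widened <- cbind(df, flag = c(TRUE, FALSE, TRUE))

# 1d subsetting treats the data frame as a list of columns
# and returns a data frame.
one_d <- df["label"]

# 2d subsetting that selects a single column simplifies to a vector ...
two_d <- df[, "label"]

# ... unless drop = FALSE is supplied.
kept <- df[, "label", drop = FALSE]

str(stacked)
```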
matrices
contrast 1d vector operations and 2d matrix operations (e.g. `names()` vs. `colnames()` & `rownames()`, `length()` vs. `nrow()` and `ncol()`) [analysis]
predict the output when a matrix is coerced into a vector (i.e. remember that R matrices are stored column-wise)
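The following sketch shows the 1d/2d contrast and the column-wise storage on a small made-up matrix. It is filled by row on purpose, so that flattening it visibly reorders the values.

```r
# A 2 x 3 matrix filled row by row: rows are (1, 2, 3) and (4, 5, 6).
m <- matrix(1:6, nrow = 2, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

# 1d helpers see the underlying vector; 2d helpers see the dimensions.
length(m)    # total number of elements
nrow(m)
ncol(m)
names(m)     # NULL: a matrix has rownames()/colnames(), not names()
colnames(m)

# Coercion to a vector walks down each column in turn.
flat <- as.vector(m)
flat
```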
lists
create a new list with `list()`, and selectively name components [application]
convert a list into a vector with `unlist()`, and apply implicit coercion rules to predict type of output [application]
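A small sketch of both list objectives, with invented component names: only some components are named, and `unlist()` coerces to the most flexible type present.

```r
# Name only the first two components; the third gets an empty name.
parts <- list(count = 1L, ratio = 2.5, "three")
names(parts)

# unlist() follows the same coercion rules as c().
flat_chr <- unlist(parts)              # character, because of "three"
flat_dbl <- unlist(list(1L, 2.5))      # integer + double -> double
flat_int <- unlist(list(1L, TRUE))     # integer + logical -> integer
```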
NULL
strings vs. factors vs. ordered factors
recall the key differences (cardinality, ordering) between strings, factors and ordered factors [knowledge]
select the most appropriate type for a given variable [analysis]
describe the operation of `drop = TRUE`, when it is needed, and remedies if you are using it frequently [application]
match data types with conversion and testing functions, and list common gotchas (e.g. converting an ordered factor to a factor) [knowledge]
know enough about floating point math to predict the output of `sqrt(2)^2 - 2 == 0` and spot potentially hazardous use of equality comparisons [application]
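The floating point objective can be checked directly. The sketch below also shows one common way to compare with a tolerance; `all.equal()` is a suggestion here, not something the curriculum names.

```r
# sqrt(2) cannot be represented exactly, so squaring it does not give exactly 2.
residual <- sqrt(2)^2 - 2
residual == 0            # FALSE
residual                 # a tiny non-zero number

# Safer: compare within a tolerance instead of testing exact equality.
tolerant <- isTRUE(all.equal(sqrt(2)^2, 2))
near_zero <- abs(residual) < 1e-8
```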
types of subsetting
match the six types of subsetting objects with their results [knowledge]
compare and contrast the use of subsetting, `match` and `%in%` when looking for matching values across two vectors [application]
use integer subsetting to order multidimensional structures [application]
apply De Morgan's rule to simplify a complicated double negation [application]
identify uses of `which()` that are redundant (i.e. you only need `which()` when you want the position of the nth TRUE) [analysis]
use repeated values in numeric indexing to create a "subset" that is larger than the original set [application]
use character subsetting to create a lookup table [application]
understand how 1d subsetting generalises to 2d subsetting [comprehension]
describe the difference between simplifying and preserving subsetting (`[` vs `[[`, when `drop = FALSE` is necessary) [analysis]
understand the difference between `x$y` and `x[["y"]]` and know when to use each form [application]
use subsetting with assignment to change multiple values in a data structure at once [application]
use subsetting with assignment and NULL to remove elements from a list/data frame [application]
identify when subsetting + assignment will fail because the number of values to assign does not match the number of values in the subset [analysis]
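Several of the subsetting objectives above fit into one short sketch. All names and values below are invented for illustration.

```r
# Character subsetting as a lookup table.
codes <- c("m", "f", "u", "f")
lookup <- c(m = "Male", f = "Female", u = NA)
decoded <- unname(lookup[codes])

# Repeated integer indices give a "subset" longer than the original.
x <- c(10, 20, 30)
bigger <- x[c(1, 1, 2, 3, 3)]

# Integer subsetting with order() sorts the rows of a data frame.
df <- data.frame(id = c(3, 1, 2), label = c("c", "a", "b"))
sorted <- df[order(df$id), ]

# %in% answers "is it there?"; match() answers "where is it first found?".
present <- c("f", "x") %in% codes
positions <- match(c("f", "x"), codes)

# Logical subsetting works directly, so which() would be redundant here.
# Combined with assignment it changes several values at once.
x[x > 15] <- 0

# Assigning NULL removes a component from a list (or a column from a data frame).
settings <- list(a = 1, b = 2)
settings$b <- NULL
```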
use R's boolean operators to recreate English expressions (e.g. x is less than 50 and more than 25). Recall the difference between R's or (which is inclusive) and "or" in everyday English. [application]
compare and contrast `&` and `|` with `&&` and `||` [analysis]
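A brief sketch of the two operator families, under the assumption that `x` is a small numeric vector made up for the example. The vectorised forms work element by element; the doubled forms work on single values and stop evaluating as soon as the answer is known.

```r
x <- c(10, 30, 60)

# "x is less than 50 and more than 25", element by element.
in_range <- x > 25 & x < 50

# R's or is inclusive: TRUE when either or both sides are TRUE.
inclusive <- TRUE | TRUE
exclusive <- xor(TRUE, TRUE)

# || short-circuits: the right-hand side is never evaluated here,
# so stop() is not called.
skipped <- TRUE || stop("!")

# | evaluates both sides, so the same expression raises an error.
eager <- tryCatch(TRUE | stop("!"), error = function(e) "error raised")

# A typical guard: the second test only runs when the first one is TRUE.
guarded <- length(x) > 0 && x[1] > 5
```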
identify the correct function to read/write a data frame to/from disk (csv, tab delimited or fixed width file) [application]
use common arguments (`na.strings`, `sep`, `header`) to deal with files that have unusual structure [analysis]
recognise the lack of symmetry between `read.csv()` and `write.csv()`, and describe which options should be used by default [knowledge]
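One asymmetry between the two functions can be seen in a quick round trip through a temporary file. The data frame below is invented for the example, and `row.names = FALSE` is one reasonable default to reach for rather than a rule stated by the curriculum.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")

# write.csv() writes row names by default, so reading the file back
# produces an extra first column.
write.csv(df, path)
with_row_names <- read.csv(path, stringsAsFactors = FALSE)
names(with_row_names)

# Dropping the row names makes the round trip symmetric.
write.csv(df, path, row.names = FALSE)
round_trip <- read.csv(path, stringsAsFactors = FALSE)

unlink(path)
```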
use subset & transform to reduce the amount of typing for common data manipulation operations [knowledge]
use `readRDS`/`saveRDS` to cache binary R objects that were expensive to compute [application]
understand what `save()` and `load()` do, how they differ from `readRDS()` and `saveRDS()` [knowledge] and when to use them instead of the single object variants [evaluation]
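The difference between the two pairs is easiest to see side by side. This sketch uses temporary files and toy objects; loading into a fresh environment is a choice made here so that the example does not overwrite anything in your workspace.

```r
# saveRDS()/readRDS(): one object, and you choose the name on the way back in.
model_cache <- list(coef = c(1.5, -0.2), n = 100L)
rds_path <- tempfile(fileext = ".rds")
saveRDS(model_cache, rds_path)
restored <- readRDS(rds_path)

# save()/load(): several objects, restored under their original names.
a <- 1:3
b <- "text"
rda_path <- tempfile(fileext = ".RData")
save(a, b, file = rda_path)

restored_env <- new.env()
loaded_names <- load(rda_path, envir = restored_env)  # returns the object names

unlink(c(rds_path, rda_path))
```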
convert a simple script into parameterised functions [synthesis]
describe a simple R function in words [synthesis]
describe R's argument matching semantics (position, partial, exact) [knowledge], predict how they apply in a specific situation [application], and evaluate good and less-good use of the three different types [evaluation]
describe the parts of a function using correct terminology: body, formal arguments, return value [comprehension]
use scoping rules to predict how names are mapped to values [application]
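The function objectives above (parts of a function, argument matching, scoping) can be sketched with two or three toy functions. `scale_by`, `add_offset` and `shadow_offset` are hypothetical names invented for this example.

```r
# A function has formal arguments, a body, and a return value
# (the value of the last evaluated expression).
scale_by <- function(x, multiplier = 2) {
  x * multiplier
}
formal_names <- names(formals(scale_by))
body(scale_by)

# Three ways arguments are matched: by position, by exact name,
# and by unique partial name.
by_position <- scale_by(3, 10)
by_exact    <- scale_by(multiplier = 10, x = 3)
by_partial  <- scale_by(3, mult = 10)

# Scoping: a name not defined inside the function is looked up
# in the environment where the function was created.
offset <- 100
add_offset <- function(x) x + offset

# A local variable with the same name takes precedence inside the function
# and leaves the outer one untouched.
shadow_offset <- function(x) {
  offset <- 1
  x + offset
}
```

Partial matching works but is easy to misread, which is one reason exact names are usually the safer habit for anything beyond the first argument or two.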
describe short-circuiting and its impact on expressions like `is.null(x) || all(is.na(x))` or `TRUE || stop("!")`
execute a script of R code with `source()`
describe the structure of an if statement [comprehension]
use a for loop to repeat the same operation on different elements of a data structure [application]
convert a for loop to a while loop [analysis]
illustrate why 1:length(x)
is dangerous and suggest a safer way [application]
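The hazard with `1:length(x)` shows up when `x` is empty. The sketch below demonstrates it, uses `seq_along()` as one safer alternative, and then writes the same loop as a `for` and as a `while`. The vectors are made up for the example.

```r
# With an empty vector, 1:length(x) counts DOWN from 1 to 0
# instead of producing no indices at all.
x <- numeric(0)
bad_index <- 1:length(x)
good_index <- seq_along(x)

# A for loop over the elements of a vector ...
values <- c(2, 4, 6)
total_for <- 0
for (i in seq_along(values)) {
  total_for <- total_for + values[i]
}

# ... and the equivalent while loop, with the counter managed by hand.
total_while <- 0
i <- 1
while (i <= length(values)) {
  total_while <- total_while + values[i]
  i <- i + 1
}
```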
correct the identing and spacing of a piece of poorly formatted source code [application]
describe what vectorisation means, distinguish internal and external vectorisation, and the performance consequence of each functions [knowledge]
use vectorised operations instead of for loops to perform simple mathematical operations (log, addition, subtraction etc.) [application]
use `lapply()`, `sapply()` and `apply()` to vectorise operations that are not already vectorised. [analysis]
convert an `lapply()` call to a for loop [application]
recognise a for-loop that can be rewritten to use `lapply` [knowledge]
match common non-vectorised functions to their vectorised equivalents (e.g. `min()` to `pmin()`, `sum()` to `cumsum()` and `colSums()`) [knowledge]
describe basic recycling rules, and know how to avoid them when necessary [knowledge]
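To make the vectorisation objectives concrete, here is a small sketch with invented data. It pairs each summary function with its vectorised counterpart, writes the same computation as a loop and as `lapply()`, and ends with recycling.

```r
x <- c(1, 5, 3)
y <- c(4, 2, 6)

# min() collapses everything to one value; pmin() works pairwise.
overall_min  <- min(x, y)
pairwise_min <- pmin(x, y)

# sum() gives one total; cumsum() a running total; colSums() one total per column.
running_total <- cumsum(x)
column_totals <- colSums(matrix(1:6, nrow = 2))

# The same computation as a for loop and as lapply().
squares_loop <- vector("list", length(x))
for (i in seq_along(x)) {
  squares_loop[[i]] <- x[i]^2
}
squares_lapply <- lapply(x, function(v) v^2)

# sapply() simplifies the result to a vector; for simple arithmetic,
# the operator itself is already vectorised.
squares_sapply <- sapply(x, function(v) v^2)
squares_vec <- x^2

# Recycling: the shorter vector is repeated to match the longer one.
recycled <- c(1, 2, 3, 4) + c(10, 20)
```

Checking that both operands have the same length before combining them is one simple way to avoid accidental recycling.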
recognise and remedy simple syntax errors (missing quotes, missing parentheses etc.) [comprehension]
use `try()` to recover from an error [application]
interpret the output of `traceback()` to identify where an error occurred [application]
initiate an interactive debugger with `browser()` or `options(error = recover)` [application]
list the commands used to control `browser()`/`recover()` [knowledge]
use `options(warn = 2)` to convert warnings into errors for debugging
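The non-interactive parts of the debugging objectives can be sketched as follows. `traceback()`, `browser()` and `recover` only make sense in an interactive session, so they are left out here. The failing calls are arbitrary examples.

```r
# try() lets the script continue after an error; the result carries
# the class "try-error" so it can be tested for.
result <- try(log("a"), silent = TRUE)
failed <- inherits(result, "try-error")

# as.integer("x") normally only warns and returns NA.
plain <- suppressWarnings(as.integer("x"))

# With warn = 2 the same warning becomes an error, which makes it
# easier to locate. options() returns the old settings so they can be restored.
old <- options(warn = 2)
promoted <- try(as.integer("x"), silent = TRUE)
options(old)
```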
create a minimal reproducible example to get help from others [synthesis]
find help for a function, data set, and package [knowledge]
read and interpret the documentation of a function [analysis]
use Google to identify the name of a function that performs a given task
install a package with `install.packages()` [comprehension]
load a package with `library()` or `require()` [comprehension]
determine which packages are out of date [application]
understand the lifetime of `install.packages`/`library` effects [comprehension]
use `::` to refer to a function in a specific package
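A short sketch of `::` and of how `library()` and `require()` differ when a package is missing. `notARealPackage123` is a made-up name used only to trigger the failure, and nothing is installed by this example.

```r
# `::` calls a function from a named package without attaching the package.
mid <- stats::median(c(1, 5, 9))

# require() returns FALSE (with a warning) when the package is not installed ...
has_pkg <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)

# ... whereas library() raises an error.
lib_result <- tryCatch(
  library("notARealPackage123", character.only = TRUE),
  error = function(e) "library() raised an error"
)
```

Installing a package persists on disk across sessions, whereas attaching it with `library()` lasts only for the current session.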