EduCAT: R
Notes from Advanced R by Hadley Wickham
Data Structures
Vectors
- There are six types of atomic vector: logical, character, integer, double, complex and raw.
- A list can contain different types (heterogeneous) while an atomic vector can only contain one type (homogeneous)
- A list can contain nested vectors whereas an atomic vector flattens nested vectors.
- To make a list of heterogeneous types use list() and not c(), as the latter will coerce your elements to the most flexible type present.
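A minimal sketch of the coercion and flattening behaviour described above; the variable names are illustrative only.

```r
# c() coerces every element to the most flexible type present
mixed_vec <- c(1, "a", TRUE)
typeof(mixed_vec)                      # "character"

# list() keeps each element's own type
mixed_list <- list(1, "a", TRUE)
sapply(mixed_list, typeof)             # "double" "character" "logical"

# c() flattens nested vectors; list() preserves the nesting
flat   <- c(1, c(2, c(3, 4)))
nested <- list(1, list(2, list(3, 4)))
length(flat)                           # 4
length(nested)                         # 2
```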
Attributes
- For storing metadata about an object.
y <- 1:10
attr(y, "my_att") <- "This is a vector"
- By default, most attributes are lost when a vector is modified; Names, Dimensions and Class are the only ones that survive.
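As a rough illustration, subsetting is one modification that drops a custom attribute while keeping names. The attribute name my_att follows the example above; the other names are made up for this sketch.

```r
y <- 1:10
attr(y, "my_att") <- "This is a vector"
names(y) <- letters[1:10]

attr(y, "my_att")     # "This is a vector"

# Subsetting creates a modified vector: the custom attribute is dropped,
# but the names are kept
z <- y[1:3]
attr(z, "my_att")     # NULL
names(z)              # "a" "b" "c"
```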
Factors
- A factor is a vector of predefined values; used to store categorical data.
- A factor has two attributes: Class and Levels.
- Levels: the set of allowed values.
- You can't use values that are not in the levels; they become NA. Recognise this warning?
## Warning in `[<-.factor`(`*tmp*`, 2, value = xx): invalid factor level, NA
## generated
- Setting levels:
state_char <- c("flip", "flip", "flip")
state_factor <- factor(state_char, levels=c("flip", "flop"))
table(state_factor)
## state_factor
## flip flop
##    3    0
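- A numeric column will be read in as a factor if it contains a non-numeric value, such as a special missing-value marker. (Since R 4.0, where stringsAsFactors defaults to FALSE, such a column is read as character instead.)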
- Remedy 1: coerce the vector from a factor to a character and then a double.
- Remedy 2: use a na.strings="..." argument when reading in the data file.
- Tip: Convert a factor to a character if you need it to be string-like.
- Example:
letters <- c("a", "b", "c")                 # Atomic Vector of Characters
f1 <- factor(letters)                       # f1 is a factor with the same values as letters
levels(f1) <- rev(levels(f1))               # Reverse f1's levels
f2 <- rev(factor(letters))                  # f2, factor, same values, reverse order
f3 <- factor(letters, levels=rev(letters))  # f3, factor, same, reversed levels
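The two remedies for a numeric column that was read as a factor might look like the sketch below. The "n/a" marker and the inline CSV text are assumptions made for illustration.

```r
# A numeric column that picked up a non-numeric entry and became a factor
f <- factor(c("10", "20", "n/a", "30"))

# Converting directly returns the integer level codes, not the values
codes <- as.numeric(f)                 # 1 2 4 3

# Remedy 1: go via character first; "n/a" becomes NA (with a warning)
x <- suppressWarnings(as.numeric(as.character(f)))
x                                      # 10 20 NA 30

# Remedy 2: declare the missing-value marker when reading the data
csv <- "value\n10\n20\nn/a\n30"
d <- read.csv(text = csv, na.strings = "n/a")
class(d$value)                         # "integer"
```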
Matrices and Arrays
- Useful commands to play with:
a <- matrix(1:6, nrow=2, ncol=3)
# an array adds an extra dimension
b <- array(1:12, c(2,3,2))
- length()
- colnames()
- rownames()
- cbind()
- rbind()
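A short sketch of how the commands listed above behave on a small matrix. The row and column labels are arbitrary, and dim() is used only to show the resulting shapes.

```r
a <- matrix(1:6, nrow = 2, ncol = 3)
length(a)                      # 6: the total number of elements

rownames(a) <- c("r1", "r2")
colnames(a) <- c("A", "B", "C")
a["r2", "B"]                   # 4 (matrices are filled column by column)

wider  <- cbind(a, D = 7:8)    # add a column
taller <- rbind(a, r3 = 7:9)   # add a row
dim(wider)                     # 2 4
dim(taller)                    # 3 3
```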
Data frames
- List of equal-length vectors. 2d structure.
df <- data.frame(x=1:3, y=c("a", "b", "c"), stringsAsFactors=FALSE)
- Coerce an object with as.data.frame()
- Combine with other data frames using cbind() and rbind(). With cbind() the number of rows must match; with rbind() the column names must match.
- Use plyr::rbind.fill() to work around the column-name rule.
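The sketch below uses base R only to show combining data frames and what happens when the column names differ; the extra columns z and w are invented for the example.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# cbind() adds columns: the number of rows must match
wider <- cbind(df, z = c(TRUE, FALSE, TRUE))
dim(wider)                     # 3 3

# rbind() adds rows: the column names must match
extra  <- data.frame(x = 4L, y = "d", stringsAsFactors = FALSE)
taller <- rbind(df, extra)
dim(taller)                    # 4 2

# Mismatched column names make rbind() fail
res <- try(rbind(df, data.frame(x = 5L, w = "e")), silent = TRUE)
inherits(res, "try-error")     # TRUE
```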
Subsetting
Data types
- 1d subsetting: Select, omit, order
x[c(1,3)]    # select
x[-1]        # omit
x[order(x)]  # order
- 2d subsetting: matrix[rows, columns]
a[1:2, ]
- 2d subsetting: data frames
df <- data.frame(x=1:3, y=1:3, z=letters[1:3])
df[df$x==2,]
df[df$y==2,]
df[df$z=="b",]
df[2,]
- S3 objects consist of atomic vectors, arrays and lists, so the techniques above apply to them. S4 objects need two extra operators: @ (equivalent to $) and slot() (equivalent to [[). See object-oriented systems.
- A function for extracting the diagonal of a matrix:
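mydiag <- function(x) {
  n <- min(nrow(x), ncol(x))  # length of the diagonal, also for non-square matrices
  y <- numeric(n)             # preallocate the result
  for (i in seq_len(n)) {     # seq_len() is safe when n is 0
    y[i] <- x[i, i]
  }
  y
}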
- df[is.na(df)] <- 0 converts NA values to 0.
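A small sketch of the NA replacement above on a made-up data frame: is.na(df) returns a logical matrix, which is then used to index the cells to overwrite.

```r
df <- data.frame(x = c(1, NA, 3), y = c(NA, 5, 6))

mask <- is.na(df)      # logical matrix with the same shape as df
sum(mask)              # 2 missing cells

df[is.na(df)] <- 0     # overwrite only the missing cells
df$x                   # 1 0 3
df$y                   # 0 5 6
```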
Subsetting Operators
- Use [[ to pull values from a list.
- One common mistake with $ is to use it when the name of the column is stored in a variable; use [[ instead.
var <- "cyl"
mtcars$var
## NULL
mtcars[[var]]
## [1] 6 6 4 ...
- Preserve the data frame with empty subsetting
faithful[] <- lapply(faithful, as.integer)  # still a df
faithful <- lapply(faithful, as.integer)    # now a list of 2
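The differences between $, [ and [[ can be checked with the built-in mtcars and faithful data sets. This is only a sketch; f is a local copy so the original data set is left untouched.

```r
var <- "cyl"
is.null(mtcars$var)                     # TRUE: $ looks for a column called "var"
identical(mtcars[[var]], mtcars$cyl)    # TRUE: [[ evaluates the variable

# [ keeps the container, [[ extracts the element
class(mtcars["cyl"])                    # "data.frame"
class(mtcars[["cyl"]])                  # "numeric"

# Empty subsetting assigns into the existing data frame
f <- faithful
f[] <- lapply(f, as.integer)
class(f)                                # "data.frame"
class(lapply(faithful, as.integer))     # "list"
```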
Applications
- Lookup tables
- Matching and merging
- Bootstrapping
df <- data.frame(x=rep(1:3, each=2), y=6:1, z=letters[1:6])
set.seed(10)
df[sample(nrow(df)),]                   # random reordering
df[sample(nrow(df), 6, replace=TRUE),]  # select 6 bootstrap replicates
- Use the vector boolean operators & and | for subsetting.
- && and || are scalar operators and useful for if statements.
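The lookup-table, matching and boolean-operator ideas above could be sketched as follows. All data and names here are invented for illustration.

```r
# Lookup table: a named character vector indexed by name
codes  <- c("m", "f", "u", "f")
lookup <- c(m = "Male", f = "Female", u = NA)
labels <- unname(lookup[codes])        # "Male" "Female" NA "Female"

# Matching: match() finds the row of a reference table for each value
grades <- c(1, 2, 2, 3)
info   <- data.frame(grade = 3:1, desc = c("Excellent", "Good", "Poor"),
                     stringsAsFactors = FALSE)
descs  <- info$desc[match(grades, info$grade)]
descs                                  # "Poor" "Good" "Good" "Excellent"

# & and | are vectorised, so they build logical masks for subsetting
df   <- data.frame(x = 1:6, y = 6:1)
both <- df[df$x > 2 & df$y > 2, ]      # rows 3 and 4

# && and || work on single values, which is what if() needs
status <- if (nrow(df) > 0 && is.numeric(df$x)) "ok" else "not ok"
```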
Vocabulary
Basics
match
assign
get
all.equal
complete.cases
cummax, cummin
rle
missing
on.exit
invisible
setdiff
setequal
which
sweep
data.matrix
rep_len
seq_len, seq_along
split
expand.grid
next, break
switch
sapply, vapply
apply
tapply
replicate
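A few of the functions listed above in action, on a made-up vector. This is only a quick tour, not a full description of any of them.

```r
x <- c(3, 3, 1, 1, 1, 2)

r <- rle(x)                  # run-length encoding
r$lengths                    # 2 3 1
r$values                     # 3 1 2

which(x == 1)                # 3 4 5
cummax(x)                    # 3 3 3 3 3 3
setdiff(1:5, 2:3)            # 1 4 5

# seq_len() and seq_along() are safe for empty input, unlike 1:length(x)
length(seq_len(0))           # 0
seq_along(x)                 # 1 2 3 4 5 6

# split() groups values; vapply() applies a function with a checked return type
counts <- vapply(split(x, x), length, integer(1))
counts                       # named vector: "1" = 3, "2" = 1, "3" = 2
```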
Common Data Structure Vocab
- See stringr
Working with R
ls, exists
library, require # What's the difference?
demo
example
# Debugging
traceback
browser
recover
options(error = )
stop, warning, message
tryCatch, try
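One way to explore the library versus require question in the list above is the sketch below. The package name is hypothetical and assumed not to be installed.

```r
pkg <- "aPackageThatDoesNotExist"   # hypothetical name, assumed not installed

# require() returns FALSE (with a warning) when the package is missing
ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
ok                                  # FALSE

# library() signals an error instead, which tryCatch() can handle
result <- tryCatch(
  {
    library(pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) "not installed"
)
result                              # "not installed"
```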
I/O
- Useful for package and function writing
Style
Tips
- File names should be meaningful and end in .R. Prefix with a number if the files need to be run in sequence.
- Variables and functions: lowercase, with _ between words; nouns for variables and verbs for functions.
- Spaces around operators.
- Indent with two spaces, not with tabs, except for function definitions.
- Use <- and not = for assignment.
- Comments should explain the why, not the what.
Functions
Components
- Three components: body, formals and environment, e.g. body(f)
- The exception is primitive functions, e.g. sum, which are written in C.
- A function to determine whether an object is a function:
is.fn <- function(x) {
  # is.function() also handles objects with more than one class
  if (is.function(x)) {
    y <- "yes"
  } else {
    y <- "no"
  }
  return(y)
}
is.fn(is.fn)
## "yes"
- exists() checks whether a name is defined.
- codetools::findGlobals() gives a function's external dependencies.
- A function can never be completely self-contained: you must always rely on functions defined in base R or other R packages.
- Note that even ( is a function, so you could redefine it!
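The three components can be inspected directly, as in the sketch below. The function f is a made-up example, and sum stands in for primitive functions.

```r
f <- function(x, y = 2) {
  x + y
}

names(formals(f))        # "x" "y": the argument list
body(f)                  # the code inside the function
environment(f)           # where the function was defined

# Primitive functions call C code directly, so all three are NULL
is.primitive(sum)            # TRUE
is.null(formals(sum))        # TRUE
is.null(body(sum))           # TRUE
is.null(environment(sum))    # TRUE

# exists() checks whether a name is defined
exists("f")                  # TRUE
```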
Every Operation is a Function Call
- "Everything that exists is an object; everything that happens is a function call." John Chambers (Creator of the S programming language)
- Equivalent operations (note backticks, not apostrophes)
x + y
`+`(x, y)
- sapply() is a powerful function that accepts either the function itself (`+`) or its name as a string ("+").
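A brief sketch of the equivalence above, using an arbitrary vector: the operator can be called like any other function, and sapply() can be handed either the function object or its name.

```r
x <- 1:5

# The infix form and the function-call form are the same operation
identical(x + 3L, `+`(x, 3L))      # TRUE

# sapply() accepts the function itself or its name as a string
by_object <- sapply(x, `+`, 3)     # 4 5 6 7 8
by_name   <- sapply(x, "+", 3)     # 4 5 6 7 8
identical(by_object, by_name)      # TRUE
```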
Function Arguments
- Call a function - specify arguments by complete or partial names
f <- function(catalog, peril, region){
  list(c = catalog, p = peril, r = region)
}
- To publish code on CRAN you must use complete names.
- Send a list of arguments to a function using do.call()
do.call(mean, args)
- Setting default values
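The argument-matching rules above can be tried out with a variant of f. The default value "EU" and the argument values are assumptions added for this sketch.

```r
f <- function(catalog, peril, region = "EU") {   # "EU" is a made-up default
  list(c = catalog, p = peril, r = region)
}

by_position <- f("A", "wind")                    # region falls back to "EU"
by_name     <- f(peril = "wind", catalog = "A")  # complete names, any order
by_partial  <- f("A", pe = "wind")               # "pe" partially matches peril
identical(by_position, by_name)                  # TRUE
identical(by_position, by_partial)               # TRUE

# do.call() passes a list as the arguments of a call
args <- list(catalog = "A", peril = "wind", region = "US")
do.call(f, args)$r                               # "US"
do.call(mean, list(1:10, na.rm = TRUE))          # 5.5
```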
Special Calls
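- Prefix functions: the function name comes before its arguments, e.g. mean()
- Infix functions: the function name comes between its arguments, e.g. +, -, >
- Replacement functions: act as if they modify their argument in place and have names ending in <-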
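As a sketch, user-defined infix and replacement functions could look like this. The names %+% and second<- are chosen purely for illustration.

```r
# Infix function: user-defined infix names start and end with %
`%+%` <- function(a, b) paste(a, b)
joined <- "new" %+% "string"
joined                         # "new string"

# Replacement function: the name ends in <- and it returns the modified object
`second<-` <- function(x, value) {
  x[2] <- value
  x
}
v <- 1:5
second(v) <- 10L
v                              # 1 10 3 4 5
```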
Return Values
- Hadley suggests reserving explicit return(y) for early returns.
- See also invisible() for returning values without printing them.
- The function on.exit() lets impure functions restore any changes to global state when they exit.
- Hadley's example:
in_dir <- function(dir, code) {
  old <- setwd(dir)
  on.exit(setwd(old))
  force(code)
}
getwd()
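A possible way to see on.exit() at work is to run in_dir() against the session's temporary directory and check that the working directory is restored afterwards. The use of tempdir() and the helper quiet_answer() are assumptions made for this sketch.

```r
in_dir <- function(dir, code) {
  old <- setwd(dir)       # setwd() returns the previous directory
  on.exit(setwd(old))     # restore it however the function exits
  force(code)             # evaluate code while inside dir
}

before <- getwd()
inside <- in_dir(tempdir(), getwd())   # getwd() is evaluated inside tempdir()
after  <- getwd()
identical(before, after)               # TRUE: the change was undone

# invisible() returns a value without printing it at the console
quiet_answer <- function() invisible(42)   # hypothetical helper
val <- quiet_answer()
val                                    # 42
```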
OO Field Guide
Alastair Clarke
11th December, 2018