Contemporary R programming
Compiled on R 4.3.1
Programming in R
Programming in R is quite similar to programming in other languages,
especially Python, Matlab, …
Learning how to program is quite similar too:
- program… a lot
- keep solving your problems
- rewrite!!
Make the computer do the work for you
- create your own algorithms → process input into output
- talk to the computer
- split up a problem into small(er) steps
- for each step, make everything explicit
- gain automation
- gain performance
- tweak & rerun
- weed away errors
- gain reproducibility
- avoid:
  - copy-pasting in your code? Again!!
  - typing in specific values? Again!!
  - making one change that forces many others? Again!!
Get the most out of what you can do with the computer
readable code
- by future you, by peers / reviewers
- from well-documented to self-explanatory
easily extendable and general code
- modularity / encapsulation
- avoid hard coding
→ use variables for flexibility
efficient code (speed)
iterations (do x for every instance of y)
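A minimal sketch of the "avoid hard coding → use variables" advice, using the built-in `mtcars` data. The variable name `mpg_cutoff` is a hypothetical choice: the threshold is defined once, and every later line reuses it, so a change is made in one place only.

```r
# Hard-coded: the value 21 appears twice, so changing it means two edits
n_hard <- sum(mtcars$mpg > 21)
mean_hard <- mean(mtcars$mpg[mtcars$mpg > 21])

# With a variable: define the threshold once, change it once
mpg_cutoff <- 21                       # hypothetical name for the threshold
is_efficient <- mtcars$mpg > mpg_cutoff
n_cars <- sum(is_efficient)
mean_mpg <- mean(mtcars$mpg[is_efficient])
```

Both versions give the same numbers; only the second one stays correct when the cutoff changes.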
The Essence of R Programming
Functions and arguments
Define your own functions
- to avoid repetition (reusable)
- to increase readability
- to reduce errors
- to encapsulate code (scoping)
Use built-in functions whenever you can
- base R
- packages
Arguments → conditional behaviour (same function, different input) → flexibility
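A small sketch of "arguments → conditional behaviour". The function `summarise_values()` is hypothetical: its `type` argument switches what it computes, and `match.arg()` checks the value and falls back to the first option as the default.

```r
# Hypothetical helper: the 'type' argument decides which summary is returned
summarise_values <- function(values, type = c("mean", "median")) {
  type <- match.arg(type)  # validates 'type'; defaults to "mean"
  if (type == "mean") {
    mean(values)
  } else {
    median(values)
  }
}

summarise_values(c(1, 2, 9))                   # mean: 4
summarise_values(c(1, 2, 9), type = "median")  # median: 2
```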
Iterations and Functions
Example: cumulative sum
- assume 4 numbers
my_sum <- c(10,20,30,40)
- manually get the cumulative sum
c(10,10+20,10+20+30,10+20+30+40)
- use the existing function for the cumulative sum
cumsum(my_sum)
- iterate to get the cumulative sum
out <- numeric()
for(it in seq_along(my_sum)){
  # out[it-1] is the previous running total (empty when it = 1)
  out <- c(out,sum(out[it-1],my_sum[it]))
}
out
[1] 10 30 60 100
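Base R also has a general iteration tool that does the same running total. This sketch uses `Reduce()` with `accumulate = TRUE`, which applies `+` left to right and keeps every intermediate result; it is shown as an alternative, not as how `cumsum()` is implemented.

```r
my_sum <- c(10, 20, 30, 40)

# Reduce() combines elements pairwise, left to right;
# accumulate = TRUE returns all intermediate results, not just the last one
running <- Reduce(`+`, my_sum, accumulate = TRUE)
running  # 10 30 60 100, the same values as cumsum(my_sum)
```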
- define a function:
name <- function(arguments){ body }
my_cumsum <- function(values){
  out <- numeric()
  for(it in seq_along(values)){
    out <- c(out,sum(out[it-1],values[it]))
  }
  return(out)
}
- use that function
# call and reuse
my_cumsum(my_sum)
my_cumsum(c(5,4,3))
- use that function multiple times
# for all at once (map() comes from purrr, part of the tidyverse)
library(purrr)
map(
  list(
    a=my_sum,
    b=c(5,4,3)
  ),
  my_cumsum)
Packages and Environments
- Bring in functions defined in packages
- locally: install once on your computer
install.packages('tidyverse')
- in your workspace: load in every session
library(tidyverse)
- Note: because tidyverse includes dplyr, the function select is understood.
mtcars %>% select(mpg,cyl) %>% slice(1:2)
              mpg cyl
Mazda RX4      21   6
Mazda RX4 Wag  21   6
Different package - same function name
library(MASS)
Attaching package: 'MASS'
The following object is masked from 'package:gtExtras':
select
The following object is masked from 'package:dplyr':
select
mtcars %>% select(mpg,cyl) %>% slice(1:2)
Error in select(., mpg, cyl) : unused arguments (mpg, cyl)
- Functions are defined within environments
getAnywhere(select)
3 differing objects matching 'select' were found
in the following places
package:MASS
package:gtExtras
package:dplyr
namespace:dplyr
namespace:MASS
namespace:tidyselect
Use [] to view one of them
Environments and Namespaces
Packages can be made explicit with ::
environment(select)
<environment: namespace:MASS>
environment(dplyr::select)
<environment: namespace:dplyr>
mtcars %>% dplyr::select(mpg,cyl)
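A base-R-only sketch of masking, so it runs without MASS or dplyr installed. The user-defined `sd()` here is purely illustrative: it masks `stats::sd()` in the same way `MASS::select` masks `dplyr::select`, while `::` still reaches the original and `environment()` shows where each version lives.

```r
# A user-defined sd() in the global environment masks stats::sd()
sd <- function(x) "masked!"

sd(c(1, 2, 3))         # "masked!": found first on the search path
stats::sd(c(1, 2, 3))  # 1: the explicit namespace wins with ::

environment(sd)         # <environment: R_GlobalEnv>
environment(stats::sd)  # <environment: namespace:stats>

rm(sd)                  # remove the mask; sd() is stats::sd() again
```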
A package can be made default
Explicit: better but cumbersome
- combine often used libraries on top
- use :: for unique / rare use
- overwrite the function name to be sure
select <- dplyr::select
mtcars %>% select(mpg,cyl) %>% slice(1:2)
              mpg cyl
Mazda RX4      21   6
Mazda RX4 Wag  21   6
environment(select)
<environment: namespace:dplyr>
environment(dplyr::select)
<environment: namespace:dplyr>
mtcars %>% dplyr::select(mpg,cyl)
Modularity and Flexibility
Solve big problems
- by solving many small problems (chain)
- by extending small problems (embed)
Chunks of code (e.g., functions), each with simple input and output
Make code run for the more general case
Link chunks of code automatically
- output is input
- using arguments to functions
Define once so changes are made once
- DRY (Don’t Repeat Yourself)
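A minimal sketch of chaining small chunks of code, where the output of one step is the input of the next. The functions `drop_missing()` and `rescale()` are hypothetical building blocks, and the base pipe `|>` assumes R 4.1 or later (this document is compiled on R 4.3.1).

```r
# Two small, hypothetical building blocks with simple input and output
drop_missing <- function(values) values[!is.na(values)]
rescale <- function(values) (values - min(values)) / (max(values) - min(values))

# Chain: each function's output is the next function's input
readings <- c(4, NA, 8, 6)
readings |> drop_missing() |> rescale()  # 0.0 1.0 0.5
```

Each function can be tested and reused on its own, and a change to one step is made in one place only.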
Readability
Consistent naming of variables and functions
- nouns for variable names
- verbs for functions
- fixed composition order
e.g., lm_dta_sub, glm_dtb_ext
- combine what belongs together
e.g., 1st, 2nd and 3rd element
Short and meaningful naming
Functions instead of code
Isolate the core of the program
# same result in three styles: column positions, column names, dplyr verbs
mtcars[mtcars$mpg > 21 & mtcars$hp < 60, c(1,4,6)]
mtcars[mtcars$mpg > 21 & mtcars$hp < 60, c('mpg','hp','wt')]
mtcars %>% filter(mpg>21,hp<60) %>% select(mpg,hp,wt)
             mpg hp    wt
Honda Civic 30.4 52 1.615
mtcars %>%
  filter(mpg>21,hp<60) %>%
  select(mpg,hp,wt)
R Programming Specifics
R as a tool
- dedicated to statistics: but much more
- open source (almost fully)
- highly modular
- uses vectorisation
Wickham, H. (2019). Advanced R, Second Edition. CRC Press.
R as a language
- functional / kinda object oriented
- use of lexical scoping
- dynamically-typed
- specific choices for memory use
- copy on modify
- modify in place when unique reference
- modify in place for environments
- lists store references, not values
- built on C / Fortran
- can be fast!! but often is not
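A short sketch of two of the memory rules listed above: a vector is copied when it is modified through a second name (copy on modify), while an environment is modified in place, so every name that refers to it sees the change.

```r
# Copy on modify: y first refers to the same vector as x,
# but modifying y makes a copy, so x is left unchanged
x <- c(1, 2, 3)
y <- x
y[1] <- 10
x  # 1 2 3
y  # 10 2 3

# Environments are modified in place: e and f refer to the same environment
e <- new.env()
f <- e
f$value <- 42
e$value  # 42
```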
Speed
It matters how you do things
- Make use of vectorisation.
c(1:5)^2 → 1, 4, 9, 16, 25
really!! make use of it
- Avoid growing objects in loops (copy on modify: each c() call builds a new, longer vector).
# timings in seconds; they vary by machine
nr_iter <- 100000
# vectorisation
system.time(out <- (1:nr_iter)^2)
   user  system elapsed
      0       0       0
# pre-allocating memory
out2 <- numeric(nr_iter)
system.time(for(it in 1:nr_iter) out2[it] <- it^2)
   user  system elapsed
   0.11    0.00    0.08
# growing output
out3 <- numeric()
system.time(for(it in 1:nr_iter) out3 <- c(out3,it^2))
   user  system elapsed
  16.66    2.77   22.20
# using tidyverse (purrr)
system.time(out4 <- map_dbl(1:nr_iter,~.x^2))
   user  system elapsed
   0.22    0.01    0.27
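The same advice applied to the earlier cumulative-sum loop: a hypothetical variant, `my_cumsum_prealloc()` (name made up here), allocates its output once and fills it, instead of growing it with `c()`. No timings are claimed; compare it with `system.time()` on a long input on your own machine.

```r
# Hypothetical variant of my_cumsum() that pre-allocates its output
my_cumsum_prealloc <- function(values) {
  out <- numeric(length(values))  # allocate once, filled with 0
  running <- 0
  for (it in seq_along(values)) {
    running <- running + values[it]
    out[it] <- running            # fill in place, no re-growing
  }
  out
}

my_cumsum_prealloc(c(10, 20, 30, 40))  # 10 30 60 100
```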
R Objects
R workspaces contain R objects
- data structures
- functions are objects too
- new object types can be created
Objects differ in how they are used
- inspect objects
- extract information from objects
R for you: typically data frames
- data frames are lists: heterogeneous (columns can differ in type)
- matrices are more efficient: homogeneous (one type)
- and vectors
- (mostly) double for numeric
- factor for categorical
a fixed set of categories (levels), stored as integer codes
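A quick way to inspect the object types listed above, using the built-in `mtcars` data: `class()` tells how an object is used, `typeof()` how it is stored.

```r
# Inspect the structure behind common objects
class(mtcars)              # "data.frame"
typeof(mtcars)             # "list": a data frame is a list of columns
typeof(as.matrix(mtcars))  # "double": a matrix holds a single type

cyl_f <- factor(mtcars$cyl)
levels(cyl_f)              # "4" "6" "8": the fixed set of categories
typeof(cyl_f)              # "integer": categories stored as integer codes
str(cyl_f)
```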
R Functions
R workspaces contain R functions
- check with
lsf.str() or lsf.str("package:dplyr")
- look at the function, e.g.
lm
- look for information, e.g.
?lm
Go to the help file
- conditional on arguments (input)
- give a return value (output)
- with examples
Most functions are in packages
- load into workspace with library or require
- preferably use heavily used packages
- maybe prioritise tidyverse
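Besides the help file, a function can be inspected directly from the console. This sketch uses `lm()` and `cumsum()` from base R: `args()` and `formals()` show the input a function expects (and its defaults), and `environment()` shows which namespace it is defined in.

```r
# Inspect a function before using it
args(cumsum)          # its arguments
names(formals(lm))    # all argument names of lm()
formals(lm)$method    # "qr": the default value of the method argument
environment(lm)       # <environment: namespace:stats>
```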