my_sum <- c(10,20,30,40)
Contemporary R programming
Compiled on R 4.3.1
Programming in R
Programming in R is quite similar to programming in other languages,
especially Python, Matlab, …
Learning how to program is quite similar too
- program… a lot
- keep solving your problems
- re-write !!
Make the computer do the work for you
- create your own algorithms → to process input to output
- talk to the computer
- split up a problem into small(er) steps
- for each step, make everything explicit
- gain automation
- gain performance
- tweak & rerun
- weed away errors
- gain reproducibility
- avoid
- copy-pasting in your code ? Again !!
- typing in specific values ? Again !!
- make one change, and thus many others ? Again !!
Get the most out of what you can do with the computer
readable code
- by future you, by peers / reviewers
- from well-documented to self-explanatory
easily extendable and general code
- modularity / encapsulation
- avoid hard coding
→ use variables for flexibility
efficient code (speed)
iterations (do x for every instance of y)
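The points on hard coding, variables and iteration can be sketched as follows. This is a minimal illustration on the built-in mtcars data; the names `cutoff`, `column` and `cutoffs` are assumptions made for the example, not part of the slides.

```r
# Hard-coded: the value 21 and the column are typed in at the point of use
n_hard <- sum(mtcars$mpg > 21)

# With variables: change the value once and every use follows
# (`cutoff` and `column` are illustrative names)
cutoff <- 21
column <- "mpg"
n_flex <- sum(mtcars[[column]] > cutoff)

# Iteration: do the same count for every cut-off of interest
cutoffs <- c(15, 21, 30)
n_per_cutoff <- sapply(cutoffs, function(k) sum(mtcars[[column]] > k))
n_per_cutoff
```

Both counts are the same; only the second version can be rerun for another column or cut-off without editing the expression itself.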
The Essence of R Programming
Functions and arguments
Define your own functions
- to avoid repetition (reusable)
- to increase readability
- to reduce errors
- to encapsulate code (scoping)
Use built-in functions whenever you can
- base R + packages
Arguments → conditional implementation ~ flexibility
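One way to read "arguments → conditional implementation" is a function whose behaviour depends on an argument. The sketch below is only an illustration; `summarise_values` and its `method` argument are hypothetical names.

```r
# `summarise_values` is a hypothetical helper: the `method` argument
# decides which summary is computed
summarise_values <- function(values, method = c("mean", "median"), na.rm = TRUE) {
  method <- match.arg(method)   # validates the choice; the default is "mean"
  if (method == "mean") {
    mean(values, na.rm = na.rm)
  } else {
    median(values, na.rm = na.rm)
  }
}

summarise_values(c(10, 20, 30, 100))                      # 40
summarise_values(c(10, 20, 30, 100), method = "median")   # 25
```

The same call site serves both cases, so a change in the summary is a change in one argument rather than in the code.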
Iterations and Functions
example: cumulative sum:
- assume 4 numbers
- manually get cumulative sum
c(10,10+20,10+20+30,10+20+30+40)
- existing function for the cumulative sum
cumsum(my_sum)
- iterate to get the cumulative sum
out <- numeric()
for(it in 1:length(my_sum)){
  out <- c(out,sum(out[it-1],my_sum[it]))
}
out
[1] 10 30 60 100
- define a function:
my_cumsum <- function(values){
  out <- numeric()
  for(it in 1:length(values)){
    out <- c(out,sum(out[it-1],values[it]))
  }
  return(out)
}
- use that function
# call and reuse
my_cumsum(my_sum)
my_cumsum(c(5,4,3))
- use that function multiple times
# for all at once (map() comes from purrr, part of tidyverse)
map(list(a=my_sum, b=c(5,4,3)), my_cumsum)
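The same steps can also be written in base R only. The sketch below is a variant, not the version from the slides: `my_cumsum2` is a hypothetical name, it pre-allocates its output and uses `seq_along()` so that empty input is handled, and `lapply()` stands in for `map()`.

```r
my_sum <- c(10, 20, 30, 40)

# Variant of the loop: the output is allocated once and then filled
my_cumsum2 <- function(values) {
  out <- numeric(length(values))
  running <- 0
  for (it in seq_along(values)) {
    running <- running + values[it]
    out[it] <- running
  }
  out
}

my_cumsum2(my_sum)                        # 10 30 60 100
Reduce(`+`, my_sum, accumulate = TRUE)    # same result without an explicit loop

# Base R counterpart of map(): apply the function to every list element
lapply(list(a = my_sum, b = c(5, 4, 3)), my_cumsum2)
```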
Packages and Environments
- Bring in functions defined in packages
- locally
install.packages('tidyverse')
- in your workspace
library(tidyverse)
- Note: because tidyverse includes dplyr, the function select is understood.
mtcars %>% select(mpg,cyl) %>% slice(1:2)
              mpg cyl
Mazda RX4      21   6
Mazda RX4 Wag  21   6
Different package - same function name
library(MASS)
Attaching package: 'MASS'
The following object is masked from 'package:gtExtras':
select
The following object is masked from 'package:dplyr':
select
mtcars %>% select(mpg,cyl) %>% slice(1:2)
Error in select(., mpg, cyl) : unused arguments (mpg, cyl)
- Functions are defined within environments
getAnywhere(select)
3 differing objects matching 'select' were found
in the following places
package:MASS
package:gtExtras
package:dplyr
namespace:dplyr
namespace:MASS
namespace:tidyselect
Use [] to view one of them
Environments and Namespaces
Packages can be made explicit with ::
environment(select)
<environment: namespace:MASS>
environment(dplyr::select)
<environment: namespace:dplyr>
mtcars %>% dplyr::select(mpg,cyl)
Explicit: better but cumbersome
- combine often used libraries on top
- use :: for unique / rare use
- overwrite function name to be sure
select <- dplyr::select
mtcars %>% select(mpg,cyl) %>% slice(1:2)
              mpg cyl
Mazda RX4      21   6
Mazda RX4 Wag  21   6
A package can be made default
environment(select)
<environment: namespace:dplyr>
environment(dplyr::select)
<environment: namespace:dplyr>
mtcars %>% dplyr::select(mpg,cyl)
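Masking can also be reproduced without installing anything. The sketch below uses only base R and assumes it is run at top level; the replacement `sd` is a deliberately silly, illustrative definition.

```r
# A function defined in the workspace masks the one from the stats package
sd <- function(x) "my own sd"             # illustrative only

masked_result   <- sd(c(1, 2, 3))         # the workspace definition is found first
explicit_result <- stats::sd(c(1, 2, 3))  # explicit namespace: the original, 1

environmentName(environment(stats::sd))   # "stats"

# Remove the mask; sd() resolves to stats::sd again
rm(sd)
sd(c(1, 2, 3))                            # 1
```

This mirrors the select case: the name is looked up along the search path, and `::` bypasses that lookup.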
Modularity and Flexibility
Solve big problems
- by solving many small problems (chain)
- by extending small problems (embed)
Chunks of code (e.g., functions),
each with simple input and output
Make code run for the more general case
Link chunks of code automatically
- output is input
- using arguments to functions
Define once so changes are made once
- DRY (Don’t Repeat Yourself)
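Chaining and embedding can be sketched with a few small functions. All function names below are hypothetical, and the native pipe `|>` assumes R 4.1 or later.

```r
# Three small steps, each with one input and one output
keep_complete <- function(values) values[!is.na(values)]
rescale       <- function(values, to = 100) values / max(values) * to
round_off     <- function(values, digits = 1) round(values, digits)

raw <- c(2, NA, 4, 8)

# Chain: the output of one step is the input of the next
chained <- raw |> keep_complete() |> rescale() |> round_off()
chained                                   # 25 50 100

# Embed: a larger function built from the small ones, defined once
prepare <- function(values, to = 100, digits = 1) {
  round_off(rescale(keep_complete(values), to = to), digits = digits)
}
prepare(raw, to = 1, digits = 2)          # 0.25 0.50 1.00
```

A change to how missing values are handled is then made in `keep_complete` only, and every chain that uses it follows.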
Readability
Consistent naming of variables and functions
- nouns for variable names
- verbs for functions
- fixed composition order
e.g., lm_dta_sub - glm_dtb_ext
- combine what belongs together
e.g., 1st, 2nd and 3rd element
Short and meaningful naming
Functions instead of code
Isolate the core of the program
mtcars[mtcars$mpg > 21 & mtcars$hp < 60, c(1,4,6)]
mtcars[mtcars$mpg > 21 & mtcars$hp < 60, c('mpg','hp','wt')]
mtcars %>% filter(mpg>21,hp<60) %>% select(mpg,hp,wt)
             mpg hp    wt
Honda Civic 30.4 52 1.615
mtcars %>%
  filter(mpg>21,hp<60) %>%
  select(mpg,hp,wt)
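"Functions instead of code" could look like this in base R. It is a sketch under assumptions: `subset_cars` and its argument names are invented for the illustration.

```r
# `subset_cars` is a hypothetical wrapper around the subsetting shown above:
# thresholds and columns become arguments instead of typed-in values
subset_cars <- function(data, min_mpg, max_hp, columns = c("mpg", "hp", "wt")) {
  rows <- data$mpg > min_mpg & data$hp < max_hp
  data[rows, columns, drop = FALSE]
}

efficient <- subset_cars(mtcars, min_mpg = 21, max_hp = 60)
efficient                                  # the Honda Civic row

# Reuse with other values, no edits to the body
subset_cars(mtcars, min_mpg = 25, max_hp = 100, columns = c("mpg", "hp"))
```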
R Programming Specifics
R as a tool
- dedicated to statistics: but much more
- open source (almost fully)
- highly modular
- uses vectorisation
Wickham, H. (2019). Advanced R, Second Edition. CRC Press.
R as a language
- functional / kinda object oriented
- use of lexical scoping
- dynamically-typed
- specific choices for memory use
- copy on modify
- modify in place when unique reference
- modify in place for environments
- lists store references, not values
- built on C / Fortran
- can be fast !! but often is not
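Three of the language properties listed above can be observed directly. The following is a small sketch; `make_scaler` and the other names are hypothetical.

```r
# Copy on modify: changing `b` leaves `a` untouched
a <- c(1, 2, 3)
b <- a
b[1] <- 99
a                         # 1 2 3

# Environments are modified in place: both names refer to the same object
e1 <- new.env()
e1$x <- 1
e2 <- e1
e2$x <- 99
e1$x                      # 99

# Lexical scoping: a function looks up `rate` where it was defined,
# not where it is called
make_scaler <- function(rate) function(values) values * rate
double_it <- make_scaler(2)
rate <- 1000              # ignored by double_it
double_it(c(1, 2, 3))     # 2 4 6
```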
Speed
It matters how you do things
- Make use of vectorisation.
c(1:5)^2
→ 1, 4, 9, 16, 25
really!! make use of it
- Avoid creating objects in loops (immutable objects).
nr_iter <- 100000
# vectorisation
system.time(out <- (1:nr_iter)^2)
   user  system elapsed
      0       0       0
# pre-allocating memory (rep() creates the full-length vector up front)
out2 <- rep(NA_real_, nr_iter)
system.time(for(it in 1:nr_iter) out2[it] <- it^2)
   user  system elapsed
   0.11    0.00    0.08
# growing output
out3 <- numeric()
system.time(for(it in 1:nr_iter) out3 <- c(out3,it^2))
   user  system elapsed
  16.66    2.77   22.20
# using tidyverse
system.time(out4 <- map_dbl(1:nr_iter,~.x^2))
   user  system elapsed
   0.22    0.01    0.27
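Timings depend on the machine, but the results of the approaches should agree. The sketch below checks this on a small, illustrative size and adds `vapply()` as a base R counterpart of `map_dbl()`; the variable names are assumptions.

```r
n <- 1000                                  # small, illustrative size

# Vectorised
squares_vec <- (1:n)^2

# Pre-allocated, then filled in a loop
squares_pre <- numeric(n)
for (it in seq_len(n)) squares_pre[it] <- it^2

# Base R counterpart of map_dbl(): one double per element
squares_app <- vapply(seq_len(n), function(i) i^2, numeric(1))

all.equal(squares_vec, squares_pre)        # TRUE
all.equal(squares_vec, squares_app)        # TRUE
```

Wrapping any of these in `system.time()` gives the comparison for your own machine.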
R Objects
R workspaces contain R objects
- data structures
- functions are objects too
- new object types can be created
Objects differ in how they are used
- inspect objects
- extract information from objects
R for you
- typically use of data frames
- data frames are lists (heterogeneous)
- matrices are more efficient (homogeneous)
- and vectors
- (mostly) double for numeric
- factor for categorical: fixed set of categories (stored numerically)
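These statements about object types can be checked with a few inspection functions. This is a short sketch on built-in data; the factor `f` is an invented example.

```r
# A data frame is a list of columns
is.list(mtcars)                 # TRUE
typeof(mtcars$mpg)              # "double"

# A matrix holds one type only
m <- as.matrix(mtcars[, c("mpg", "hp")])
typeof(m)                       # "double"
dim(m)                          # 32 2

# A factor stores integer codes plus a fixed set of levels
f <- factor(c("low", "high", "low"))
levels(f)                       # "high" "low"
as.integer(f)                   # 2 1 2
typeof(f)                       # "integer"

# Compact overview of any object
str(mtcars[1:3, 1:2])
```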
R Functions
R workspaces contain R functions
- check with
lsf.str()
lsf.str("package:dplyr")
- look at the function, e.g. lm
- look for information, e.g. ?lm
Go to the help file
- conditional on arguments (input)
- give a return value (output)
- with examples
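What the help file describes can also be inspected from the console. The sketch below uses `sd` and `lm` from the stats package; the model `mpg ~ wt` is only an illustrative choice.

```r
# Arguments (input)
args(sd)                          # function (x, na.rm = FALSE)
arg_names <- names(formals(lm))
arg_names[1:2]                    # "formula" "data"

# Return value (output): an object with a class and named components
fit <- lm(mpg ~ wt, data = mtcars)
class(fit)                        # "lm"
names(fit)
coef(fit)

# The examples of a help page can be run with example(), e.g. example(lm)
```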
Most functions are in packages
- load into workspace with library or require
- preferably use heavily used packages
- maybe prioritise tidyverse
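The difference between the two loaders matters when a package may be missing. This is a minimal sketch; the package name `aPackageThatDoesNotExist` is a made-up placeholder.

```r
# library() stops with an error when the package is not installed;
# require() returns TRUE or FALSE, so a script can react to it
has_stats <- require(stats)                        # TRUE, stats ships with R

# Check availability without attaching anything to the workspace
pkg <- "aPackageThatDoesNotExist"                  # hypothetical name
has_pkg <- requireNamespace(pkg, quietly = TRUE)   # FALSE
if (!has_pkg) {
  message("install.packages('", pkg, "') would be needed first")
}
```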