# Speeding up Numerical Simulation in R

In this post I am going to look at how we can run fairly simple

simulations in R with high efficiency. Though many individual-based

simulation models might be simple, they can still take ages to run in R

because of it is an interpreted language. In particular loops are

notoriously slow in R, but loops are often the most logical way to

program a simulation that proceeds in steps through time. So what option

are there to help speed things up? That is, options besides 'rewrite

your simulation in C++', which is simply not feasible for most of us,

and often not worth the effort. Can we get close to the speedup of a C++

rewrite in R? I will explore this question by using one of the simplest

simulation models out there, Conway's Game of Life (GoL).

Conway's GoL is a simple cellular automata, interesting for its ability

to create mesmerizing and complex-looking patterns despite its simple

set of rules. It is a good model to do tests on it, because slightly

more complicated but similar sets of rules can lead to very useful

models in ecology and evolution (e.g. grid-based spatial models). GoL is

setup on a square grid. Each grid cell can take a value of 0 or 1. If we

consider a cell with a 1 as 'alive', and a cell with a 0 as 'dead', then

each round of the simulation each cell is updated according to the

following rules:

- If a cell is a alive:
- the cell dies if:
- only 1 of the 8 surrounding cells are alive, or
- 4 or more of the 8 surrounding cells are alive

- the cell remains alive if:
- 2 or 3 of the 8 surrounding cells are alive

- the cell dies if:
- If a cell is dead:
- the cell becomes alive if:
- exactly 3 or the 8 surrounding cells are alive

- the cell becomes alive if:

We'll start with the most naive way to create this model in R, by

looping through each cell in each iteration of the model. First we

create the lattice, using a simple matrix. We'll start off the

simulation with 40% cells randomly chosen to be alive. We'll make a

fairly large lattice (200 x 200) to highlight how slow R can be even for

such a simple simulation.

```
nr <- 200
nc <- 200
lattice <- matrix(rbinom(nr*nc,1,0.4), nrow = nr, ncol = nc)
```

And here is out loop-based method:

```
GoL_loop <- function(lattice, n_it) {
lattice <- cbind(0, rbind(0, lattice, 0), 0) ## add buffer of zeroes
storage <- array(0, c(nrow(lattice), ncol(lattice), n_it + 1))
storage[ , , 1] <- lattice
for(n in seq_len(n_it)) {
updated_lattice <- lattice
for(i in 2:(nrow(lattice) - 1)){
for(j in 2:(ncol(lattice) - 1)) {
neighbours <- lattice[i + 1, j] +
lattice[i - 1, j] +
lattice[i, j + 1] +
lattice[i, j - 1] +
lattice[i + 1, j + 1] +
lattice[i + 1, j - 1] +
lattice[i - 1, j + 1] +
lattice[i - 1, j - 1]
## apply GoL rules
updated_lattice[i, j] <- ifelse(lattice[i, j] == 1,
ifelse(neighbours < 2 | neighbours > 3, 0, 1),
ifelse(neighbours == 3, 1, 0))
}
}
lattice <- updated_lattice
storage[ , , n + 1] <- lattice
}
storage
}
timed_loop <- system.time(
test <- GoL_loop(lattice, 100)
)
timed_loop
## user system elapsed
## 33.08 0.02 33.14
```

So, that looks pretty cool, but it was ridiculously slow. It took 33.14

seconds to run on my system! That is because every iteration we have to

loop through every single cell and update it (that's 200x200 = 4000). I

would like to run it for more than 100 iterations but I don't want to

wait that long. What can I do?

# Two Ways to Make Simulations faster in R

Typically there are two main ways of making simulations run faster:

- Rewrite using R efficiencies
- Compile code

First, I'll talk about rewriting code to take advantage of efficiencies

in the R language. That is to say, in general, R is inefficient at

loops, however, R is not inefficient at everything. Using this method

tends to take a deeper understanding of R's underlying architecture if

we want to fully take advantage of it. Compiling code usually means

rewriting your code in another language that can be compiled such as C,

or C++. Compiled languages are generally much faster than R. However,

they are usually much more difficult to learn, and don't have all the

nice features of R for manipulating and analyzing data (and simulations

can produce a lot of cool data to manipulate and analyse). Recently,

however, it has become possible to take advantage of some of the

benefits of compiling without having to leave the warm, comforting

embrace of R, and everything Rish. I'll give some examples of that later

in this post.

But first let's look at one way we can rewrite our code so that it runs

faster in R, using features of R which have been highly optimized.

As an example, we can take advantage of one thing that R is pretty

efficient at, and that is doing vectorized operations and matrix math.

If we can convert the logic of our model into a set of operations on

vectors and matrices, we can substantially increase speeds. The

disadvantage of this is that the resulting code is likely to be more

difficult to understand, and it sometimes takes a truly convoluted set

of operations to recreate a relatively simple simulation. Luckily, in

this simple case we can convert our GoL simulation to a set of matrix

additions fairly easily. This method was suggested in a blog

post by Petr Keil in 2012. The idea is

to simply create 8 matrices in each iteration, which are shifted

versions of the original lattice. Then you can simply add all the

matrices together to calculate the number of living neighbours. This at

least eliminates the loops within each time step. Let's try it:

```
GoL_matrix <- function(lattice, n_it) {
nr <- nrow(lattice)
nc <- ncol(lattice)
storage <- array(0, c(nrow(lattice), ncol(lattice), n_it + 1))
storage[ , , 1] <- lattice
for(n in seq_len(n_it)) {
lattice_L <- cbind(lattice[ , -1], 0)
lattice_R <- cbind(0, lattice[ , -nc])
lattice_U <- rbind(lattice[-1, ], 0)
lattice_D <- rbind(0, lattice[-nr, ])
lattice_LU <- rbind(cbind(lattice[ , -1], 0)[-1, ], 0)
lattice_RU <- rbind(cbind(0, lattice[ , -nc])[-1, ], 0)
lattice_LD <- rbind(0, cbind(lattice[ , -1], 0)[-nr, ])
lattice_LR <- rbind(0, cbind(0, lattice[ , -nc])[-nr, ])
neighbours <- lattice_L + lattice_R + lattice_U + lattice_D +
lattice_LU + lattice_RU + lattice_LD + lattice_LR
## apply GoL rules
lattice <- ifelse(lattice == 1,
ifelse(neighbours < 2 | neighbours > 3, 0, 1),
ifelse(neighbours == 3, 1, 0))
storage[ , , n + 1] <- lattice
}
storage
}
timed_matrix <- system.time(
test <- GoL_matrix(lattice, 100)
)
timed_matrix
## user system elapsed
## 0.81 0.06 0.87
```

Okay that was way faster! It only took 0.87 seconds on my system. Okay

let's run it for longer.

```
test <- GoL_matrix(lattice, 200)
```

Okay, but what if we want to run it for a *really* long time? Can we

make this code go even faster? Well, perhaps. Earlier I mentioned than

compiling code could make it run faster. Since R is an interpreted

language, compiling is normally not available. However, more recent

versions of R do include a command that will try and compile R functions

for you. The command is `cmpfun`

in the `compiler`

package (included

with base R). I suspect by compiling we can make our first loop-based

function run much faster, but that it won't help our second matrix-based

function, because that is already as optimized as possible in R. Let's

find out shall we?

```
library(compiler)
GoL_loop_cmp <- cmpfun(GoL_loop)
timed_loop_cmp <- system.time(
test <- GoL_loop_cmp(lattice, 100)
)
timed_loop_cmp
## user system elapsed
## 12.06 0.00 12.06
```

Okay, so that took 12.06 seconds, which is quite a bit faster than the

first time we tried it, nearly 3 times as fast in fact. It's still not

nearly as fast as the matrix method, but you could see how just by using

R's compile feature you could save a lot of time. Now that's see what

happens if we compile the matrix function.

```
GoL_matrix_cmp <- cmpfun(GoL_matrix)
timed_matrix_cmp <- system.time(
test <- GoL_matrix_cmp(lattice, 100)
)
timed_matrix_cmp
## user system elapsed
## 0.70 0.16 0.86
```

Okay, so as expected, we didn't really see any speedup when we compiled

the matrix method. It looks like that is about as fast as it gets! So

when using the compiler in R, we can get advantages if our original code

is relatively inefficient (e.g. uses a lot of looping). Another

limitation of compilation in R is that it will usually only work well if

you mostly only use simple mathematical functions in your function. If

you use a lot of more complicated functions from external packages for

example, R's compiler is unlikely to be able to compile your function

effectively. With trial and error it is possible to get a better feel

for when a function can be improved in this way.

But is there any other option if you want more speed? Should you give in

and learn C++? Or perhaps Julia (which combines

some of the ease of use of R, with the speed being closer to C and

family)? Well, perhaps, but there is another option in R which has

emerged quite recently. This is made possible by a new R package known

as `nimble`

. Nimble is its own programming

language that can be embedded in R, which is mainly meant for

programming sophisticated statistical models (similarly to OpenBUGS,

JAGS, or Stan). However, it provides a feature that could be of use far

beyond statistical models. It allows the translation and compilation of

functions written in R syntax into C++ code, which it calls

`nimbleFunctions`

. Unfortunately, only a small subset of the R language

has been made available in the `nimble`

language, but if your simulation

is simple enough to be programmed using this subset, this could offer a

way to substantially boost speed without having to learn a new language

(except for the elements of `nimble`

itself which differ very slightly

from R in a few cases).

Let's see if we can make our GoL simulation into a `nimbleFunction`

.

```
library(nimble)
##
## Attaching package: 'nimble'
## The following object is masked from 'package:stats':
##
## simulate
GoL_nimble <- nimbleFunction(
setup = function(lattice, n_it, nr, nc) {
lattice <- lattice
n_it <- n_it
nr <- nr
nc <- nc
storage <- array(0, c(nr, nc, n_it + 1))
},
run = function() {
storage[ , , 1] <<- lattice
for(n in 1:n_it) {
updated_lattice <- lattice
for(i in 2:(nr - 1)){
for(j in 2:(nc - 1)) {
neighbours <- lattice[i + 1, j] +
lattice[i - 1, j] +
lattice[i, j + 1] +
lattice[i, j - 1] +
lattice[i + 1, j + 1] +
lattice[i + 1, j - 1] +
lattice[i - 1, j + 1] +
lattice[i - 1, j - 1]
## apply GoL rules
if(lattice[i, j] == 1) {
if(neighbours < 2 | neighbours > 3) {
updated_lattice[i, j] <- 0
}
} else {
if(neighbours == 3) {
updated_lattice[i, j] <- 1
}
}
}
}
lattice <<- updated_lattice
storage[ , , n + 1] <<- lattice
}
}
)
nimble_lattice <- cbind(0, rbind(0, lattice, 0), 0) ## add buffer of zeroes
test_nimble <- GoL_nimble(nimble_lattice, 10, nrow(nimble_lattice), ncol(nimble_lattice))
GoL_nimble_cmp <- compileNimble(test_nimble, showCompilerOutput = TRUE)
## compiling... this may take a minute. On some systems there may be some compiler warnings that can be safely ignored.
## compilation finished.
timed_nimble <- system.time(GoL_nimble_cmp$run())
timed_nimble
## user system elapsed
## 0.02 0.00 0.02
```

Wow! That took only 0.02 seconds! That's so fast it is nearly can't be

timed at 100 iterations. Just to convince ourselves that that worked the

way it should, let's look at a gif of the output, and while we're at it,

time a longer sim, to see if we can get a better idea how fast it is.

Note that I am not including the compilation time here, as that is just a

one-time time cost, so it doesn't have an effect when we want to run a

simulation many times (which we usually do).

```
test_nimble2 <- GoL_nimble(nimble_lattice, 300, nrow(nimble_lattice), ncol(nimble_lattice))
GoL_nimble_cmp2 <- compileNimble(test_nimble2, showCompilerOutput = TRUE)
## compiling... this may take a minute. On some systems there may be some compiler warnings that can be safely ignored.
## compilation finished.
timed_nimble2 <- system.time(GoL_nimble_cmp2$run())
timed_nimble2
## user system elapsed
## 0.09 0.00 0.09
```

To understand how the `nimbleFunction`

works, let's dive a little more

into the details of the function I wrote above. `Nimble`

uses an

object-oriented approach (which makes sense since C++ is also

object-oriented). So a `nimbleFunction`

has two parts: the setup and the

run part. The setup function creates data that is embedded in the

object, and can be accessed from the run function. In the example above,

I created an array to store each iteration of the simulation, and also

embedded the lattice matrix and the other parameters in the object for

easy access later. `Nimble`

is then able to determine the types of

variables automatically (necessary for C++). After compiling the

function, we can access the run function with `nimble_object$run`

, then

we can access the results with `nimble_object$storage`

.

It is also possible to make a `nimbleFunction`

which acts more like an R

function, which only takes arguments and then returns a value. In this

case, we simply omit the setup part. If we do this however, we will have

to declare our variable types, because this is necessary during C++

compilation (normally the setup function automatically determines this).

This is a more typically Rish version:

```
GoL_nimble2 <- nimbleFunction(
run = function(lattice = integer(2), n_it = integer(0), nr = integer(0), nc = integer(0)) {
storage <- array(0, c(nr, nc, n_it + 1), type = "integer")
storage[ , , 1] <- lattice
for(n in 1:n_it) {
updated_lattice <- lattice
for(i in 2:(nr - 1)){
for(j in 2:(nc - 1)) {
neighbours <- lattice[i + 1, j] +
lattice[i - 1, j] +
lattice[i, j + 1] +
lattice[i, j - 1] +
lattice[i + 1, j + 1] +
lattice[i + 1, j - 1] +
lattice[i - 1, j + 1] +
lattice[i - 1, j - 1]
## apply GoL rules
if(lattice[i, j] == 1) {
if(neighbours < 2 | neighbours > 3) {
updated_lattice[i, j] <- 0
}
} else {
if(neighbours == 3) {
updated_lattice[i, j] <- 1
}
}
}
}
lattice <- updated_lattice
storage[ , , n + 1] <- lattice
}
return(storage)
returnType(integer(3))
}
)
## Warning in nf_checkDSLcode(code): Detected possible use of R functions
## in nimbleFunction run code. For this nimbleFunction to compile, these
## functions must defined as nimbleFunctions or nimbleFunction methods: c.
## Warning in nf_checkDSLcode(code): Note that until version 0.6-3 of NIMBLE,
## c() cannot be used as a stand-alone function, but its use to create vector
## arguments to a function may be valid.
nimble_lattice <- cbind(0, rbind(0, lattice, 0), 0) ## add buffer of zeroes
test_nimble3 <- GoL_nimble2(nimble_lattice, 10, nrow(nimble_lattice), ncol(nimble_lattice))
GoL_nimble_cmp3 <- compileNimble(GoL_nimble2, showCompilerOutput = TRUE) # compile directly into function
## compiling... this may take a minute. On some systems there may be some compiler warnings that can be safely ignored.
## compilation finished.
timed_nimble3 <- system.time(test <- GoL_nimble_cmp3(nimble_lattice, 100, nrow(nimble_lattice), ncol(nimble_lattice)))
timed_nimble3
## user system elapsed
## 0.04 0.00 0.03
```

That works, but actually took slightly longer, probably because of

overhead from passing the object between R and the compiled code. In the

previous version, the data was already embedded in the C++ object, which

meant there was no need to pass data between R and C++. This probably

won't make a big difference if we ran the simulation for longer. An

advantage of this method, however, is that we don't have to recompile

every time we decide to change the size of the lattice.

Declaration is done using the type and dimension of the parameters, for

example `lattice`

is a 2 dimensional integer array (a matrix), so it is

declared as `integer(2)`

. `integer(0)`

means a integer scalar.

`returnType(x)`

specifies the type for what the function returns (e.g.

`integer(3)`

for a 3 dimensional integer array). If you were using

non-integers the type would be `double(n)`

.

One of the cool things about `nimble`

is that you can run the created

`nimbleFunction`

directly in R, before compiling. That will be slow (as

slow as native R), but is pretty useful for debugging before compilation

(compiler errors are often going to be fairly cryptic compared with an

error you might get from R).

I think there is a lot of potential for using `nimble`

for simulations

of intermediate complexity, where you want it to run fast, but you don't

want to have to learn C++ and get into the intricacies of using `rcpp`

to interface with R. Eventually I have the feeling I will want to learn

some C++ (or pick it up again, as I actually learned some way back in

high school). But in the meantime this seems like a pretty decent

solution.

I will post some more about my adventures with `nimble`

in the coming

months, as I plan to use it to create a more complicated ecological

simulation.