
R: no nested FOR loops

August 14, 2008

I am an avid user of the R project for statistical computing and use it for much of my computational and statistical analyses. I will be posting tips, tricks and hints for using R on this blog, especially when I can’t find the information elsewhere on the internet.

One of the main benefits of R is that it is vectorized: many operators and functions work on a whole vector at once, which can simplify your code a great deal. For example, you don’t have to write a loop to add 5 to every element of a vector X; you can just write X + 5.
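Here is a minimal sketch of the difference, using a made-up vector X (the values are only for illustration). Both versions give the same result, but the vectorized one is a single expression.

```r
# Example values for X (made up for illustration)
X <- c(3, 8, 1, 10)

# Loop version: add 5 to each element one at a time
X.loop <- numeric(length(X))
for (i in seq_along(X)) {
  X.loop[i] <- X[i] + 5
}

# Vectorized version: + is applied to every element of X at once
X.vec <- X + 5

X.vec  # values 8 13 6 15
```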

The R mantra is to avoid explicit loops wherever possible, because loops written in R are usually much slower than the equivalent vectorized built-in functions and can waste memory when results are grown inside the loop. I have been pretty good about this except in one situation: running a simulation over a parameter space defined by a few vectors. For instance, anyone familiar with C or another non-vectorized language will recognize the following nested loops for running a simulation over a parameter space of 4 variables, with nreps replications of each parameter set:

for (i in seq_along(m)) {
  for (j in seq_along(sde)) {
    for (k in seq_along(dev.coef)) {
      for (l in seq_along(ttd)) {
        for (rep in seq_len(nreps)) {
          # run the simulation once for this parameter set
          one.generation(N, m[i], sde[j], dev.coef[k], ttd[l])
        }
      }
    }
  }
}


This runs the simulation over a parameter space defined by one vector per parameter. This is how I had been writing my R code for years.

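I FINALLY FOUND OUT HOW TO DO THIS WITHOUT NESTED LOOPS! I am not sure why it took me so long, but at long last I can do all of this in one command, using the apply() and expand.grid() functions. apply() calls a function on each row (MARGIN = 1) or each column (MARGIN = 2) of a matrix or array, and expand.grid() takes several vectors or factors and returns a data frame with one row for every combination of their values. Strictly speaking this is not vectorization: apply() still loops over the rows internally, so the gain is mainly shorter, clearer code rather than speed.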
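To see what MARGIN does, here is a small sketch on a made-up 2 x 3 matrix A: MARGIN = 1 calls the function once per row, and MARGIN = 2 calls it once per column.

```r
# A made-up 2 x 3 matrix; matrix() fills column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
A <- matrix(1:6, nrow = 2)

# MARGIN = 1: sum() is called on each row, giving one value per row (9, 12)
row.sums <- apply(A, MARGIN = 1, FUN = sum)

# MARGIN = 2: sum() is called on each column, giving one value per column (3, 7, 11)
col.sums <- apply(A, MARGIN = 2, FUN = sum)
```

For plain sums, rowSums() and colSums() do the same job faster; apply() earns its keep when the function is your own, such as a simulation.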

For example, say my parameter space is defined by:

> m <- c(1,2,3,4)
> n <- c("m","f")
> o <- c(12,45,34)

I can call expand.grid() to get every combination as a row of a data frame:

> expand.grid(m, n, o)
   Var1 Var2 Var3
1     1    m   12
2     2    m   12
3     3    m   12
4     4    m   12
5     1    f   12
6     2    f   12
7     3    f   12
8     4    f   12
9     1    m   45
10    2    m   45
11    3    m   45
12    4    m   45
13    1    f   45
14    2    f   45
15    3    f   45
16    4    f   45
17    1    m   34
18    2    m   34
19    3    m   34
20    4    m   34
21    1    f   34
22    2    f   34
23    3    f   34
24    4    f   34
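Two details are worth knowing, shown in the sketch below with the same m, n and o. First, expand.grid() returns a data frame, and naming the arguments gives the columns names other than Var1, Var2, Var3. Second, apply() converts that data frame to a matrix before looping, and because n is a character column, every row reaches the function as a character vector, so numbers have to be converted back with as.numeric(). The column names and the product calculation are only for illustration.

```r
m <- c(1, 2, 3, 4)
n <- c("m", "f")
o <- c(12, 45, 34)

# Named arguments become column names; stringsAsFactors = FALSE keeps n as character
grid <- expand.grid(m = m, n = n, o = o, stringsAsFactors = FALSE)
nrow(grid)  # 24 rows: 4 * 2 * 3 combinations, with m varying fastest

# apply() calls as.matrix(grid) first; with a character column present,
# the whole matrix (and therefore each row x) is character
row.types <- apply(grid, MARGIN = 1, function(x) class(x))
unique(row.types)  # "character"

# Convert back explicitly when a number is needed (x is indexed by column name)
products <- apply(grid, MARGIN = 1,
                  function(x) as.numeric(x["m"]) * as.numeric(x["o"]))
```

When every column is numeric, as in the simulation parameters used in this post, each row stays a numeric vector and no conversion is needed.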

Then I can run my simulation on each row of this grid by calling apply() with MARGIN = 1, which tells apply() to work row by row.

So in my example, I replaced all of those ugly nested for loops shown above with this single command:

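# each row x holds one (m, sde, dev.coef, ttd, replicate) combination;
# the replicate index x[5] is not passed on, it just makes each set run nreps times
out <- apply(expand.grid(m, sde, dev.coef, ttd, seq_len(nreps)), MARGIN = 1,
             function(x) one.generation(N, x[1], x[2], x[3], x[4]))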
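Below is a self-contained sketch of the whole approach, under loudly stated assumptions: one.generation() here is a HYPOTHETICAL toy stand-in (the real function is the author's own and is not shown), and N, the parameter values and nreps are made up. It builds the grid with named columns so the function can pick parameters out by name, runs it with apply(), and checks that the result matches the nested-loop version draw for draw.

```r
# HYPOTHETICAL stand-in for one.generation(): it just returns the mean of
# N normal draws whose mean and sd depend on the parameters.
one.generation <- function(N, m, sde, dev.coef, ttd) {
  mean(rnorm(N, mean = m * dev.coef + ttd, sd = sde))
}

# Hypothetical parameter vectors and settings
N        <- 100
m        <- c(0.1, 0.5)
sde      <- c(1, 2)
dev.coef <- c(0, 1)
ttd      <- c(5, 10, 20)
nreps    <- 3

# One row per parameter set and replicate: 2 * 2 * 2 * 3 * 3 = 72 rows
grid <- expand.grid(m = m, sde = sde, dev.coef = dev.coef, ttd = ttd,
                    rep = seq_len(nreps))

# apply() version: every column is numeric, so each row x is a named numeric vector
set.seed(1)
out <- apply(grid, MARGIN = 1, function(x)
  one.generation(N, x["m"], x["sde"], x["dev.coef"], x["ttd"]))

# Nested-loop version for comparison. expand.grid() varies its FIRST column
# fastest, so the loops are nested in reverse (replicate outermost, m innermost)
# to make the calls, and therefore the random draws, happen in the same order.
set.seed(1)
out.loop <- numeric(nrow(grid))
idx <- 0
for (r in seq_len(nreps)) {
  for (l in seq_along(ttd)) {
    for (k in seq_along(dev.coef)) {
      for (j in seq_along(sde)) {
        for (i in seq_along(m)) {
          idx <- idx + 1
          out.loop[idx] <- one.generation(N, m[i], sde[j], dev.coef[k], ttd[l])
        }
      }
    }
  }
}

# Keep each result next to the parameters that produced it
results <- cbind(grid, out = out)
```

Because the toy one.generation() returns a single number, out is a plain vector. If the function returned a vector of fixed length instead, apply() would return a matrix with one column per parameter set.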

Fantastic! I love it when I figure out how to do things more succinctly in R. I hope to post R tidbits on this blog regularly, so that some ‘ignorant’ R programmer like myself finds this post the first time he searches the internet for it. It may have taken me years to figure this out, but that doesn’t mean it has to take that long for everyone else! 🙂
