R: no nested FOR loops

I am an avid user of the R project for statistical computing and use it for much of my computational and statistical analyses. I will be posting tips, tricks and hints for using R on this blog, especially when I can’t find the information elsewhere on the internet.

One of the main benefits of R is that it is vectorized, so vectorized functions can simplify your code a great deal. For example, you don’t have to write a loop to add 5 to every element in a vector X. You can just write “X + 5”.
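A minimal sketch of the difference, using a made-up vector X (the values are only an illustration):

```r
# Hypothetical example vector; any numeric vector works.
X <- c(1, 2, 3)

# Explicit loop version: fill a preallocated result one element at a time.
Y_loop <- numeric(length(X))
for (i in seq_along(X)) {
  Y_loop[i] <- X[i] + 5
}

# Vectorized version: the addition is applied to every element at once.
Y_vec <- X + 5

identical(Y_loop, Y_vec)
```

Both versions give the same result; the vectorized one is simply shorter and leaves the looping to R's internals.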

The R mantra is to minimize the use of loops wherever possible, since explicit loops in R tend to be slow, and growing objects inside them can use a great deal of memory. I have been pretty good about this except for one situation: running a simulation over a parameter space defined by a few vectors. For instance, anyone who is familiar with writing code in C or some other non-vectorized language will recognize the following pseudocode for running a simulation over a parameter space of 4 variables, with nreps replications for each parameter set:

for (i in seq_along(m)) {
  for (j in seq_along(sde)) {
    for (k in seq_along(dev.coef)) {
      for (l in seq_along(ttd)) {
        for (rep in 1:nreps) {
          # run the simulation with m[i], sde[j], dev.coef[k], ttd[l]
        }
      }
    }
  }
}

This runs the simulation at every point of a parameter space in which each parameter's values are stored in its own vector. This is how I have been writing my R code for years.

I FINALLY FOUND OUT HOW TO VECTORIZE THIS PROCESS! I am not sure why it took me so long, but at long last I can do all of this in one command, using the apply and expand.grid functions. The apply function calls a function once for each row (or column) of a matrix or array, and expand.grid takes vectors or factors and returns a data frame with one row for every combination of their values. Strictly speaking, apply still loops internally, so the main gain is shorter, clearer code rather than raw speed.

For example, say my parameter space is defined by:

> m <- c(1,2,3,4)
> n <- c("m","f")
> o <- c(12,45,34)

I can call expand.grid to get all of the combinations as rows of a data frame:

> expand.grid(m,n,o)
   Var1 Var2 Var3
1     1    m   12
2     2    m   12
3     3    m   12
4     4    m   12
5     1    f   12
6     2    f   12
7     3    f   12
8     4    f   12
9     1    m   45
10    2    m   45
11    3    m   45
12    4    m   45
13    1    f   45
14    2    f   45
15    3    f   45
16    4    f   45
17    1    m   34
18    2    m   34
19    3    m   34
20    4    m   34
21    1    f   34
22    2    f   34
23    3    f   34
24    4    f   34
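A small sketch of how the result can be inspected, assuming the same three vectors. Naming the arguments (the column names m, n and o here are my choice, not something expand.grid requires) replaces the default Var1, Var2, Var3 labels:

```r
m <- c(1, 2, 3, 4)
n <- c("m", "f")
o <- c(12, 45, 34)

# Named arguments become the column names of the result.
grid <- expand.grid(m = m, n = n, o = o)

# One row per combination: 4 * 2 * 3 = 24
nrow(grid)

# The first argument varies fastest, the last varies slowest.
head(grid, 5)

# The result is a data frame; character inputs become factors by default.
class(grid)
sapply(grid, class)
```

Passing stringsAsFactors = FALSE to expand.grid keeps n as a character column if factors are not wanted.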

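Then I can run my simulation command on each row by calling apply() with MARGIN = 1, which tells apply to work over the rows rather than the columns. Note that apply() first converts the data frame to a matrix, so if any column is non-numeric (like n here), every value in the row is passed to the function as a character string.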
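Here is a minimal sketch of apply() over the rows of a grid. The function applied to each row (a simple product) is only a placeholder for a real simulation, and the column names are assumptions of this example:

```r
m <- c(1, 2, 3, 4)
o <- c(12, 45, 34)

# All-numeric grid, so apply() sees a numeric matrix.
grid <- expand.grid(m = m, o = o)

# MARGIN = 1 means one call per row; x is a named numeric vector.
prod_per_row <- apply(grid, MARGIN = 1, function(x) x["m"] * x["o"])
length(prod_per_row)   # one result per row of the grid

# Caveat: with a non-numeric column, every row reaches the function as character.
mixed <- expand.grid(m = m, n = c("m", "f"))
row_types <- apply(mixed, MARGIN = 1, function(x) class(x))
unique(row_types)
```

If the grid mixes numbers and strings, the function either has to convert values back with as.numeric() or the rows have to be walked some other way, for example with mapply() over the columns.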

So in my example, I have replaced the ugly nest of for loops summarized above with the following single command:

# the last column only repeats each parameter set nreps times
out <- apply(expand.grid(m, sde, dev.coef, ttd, seq_len(nreps)), MARGIN = 1,
             function(x) one.generation(N, x[1], x[2], x[3], x[4]))
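For anyone who wants to try the pattern end to end, here is a self-contained sketch. The body of one.generation() below is a made-up stand-in (the real function is not shown in this post), and all parameter values are invented for illustration. It uses mapply() over the grid columns, which is one alternative to apply() that keeps each column's type intact:

```r
# Hypothetical stand-in for the simulation; the real one.generation() is not shown here.
one.generation <- function(N, m, sde, dev.coef, ttd) {
  mean(rnorm(N, mean = m * dev.coef, sd = sde)) + ttd
}

# Assumed example values for the parameter vectors.
N        <- 100
m        <- c(0.1, 0.5)
sde      <- c(1, 2)
dev.coef <- c(0.5, 1)
ttd      <- c(10, 20)
nreps    <- 3

set.seed(1)

# One row per parameter set and replicate.
grid <- expand.grid(m = m, sde = sde, dev.coef = dev.coef, ttd = ttd,
                    rep = seq_len(nreps))

# mapply() walks the columns in parallel, one simulation call per row.
grid$out <- mapply(function(m, sde, dev.coef, ttd) {
                     one.generation(N, m, sde, dev.coef, ttd)
                   },
                   grid$m, grid$sde, grid$dev.coef, grid$ttd)

# Average over the replicates of each parameter set.
summary_by_set <- aggregate(out ~ m + sde + dev.coef + ttd, data = grid, FUN = mean)

nrow(grid)            # 2 * 2 * 2 * 2 * 3 = 48 simulation runs
nrow(summary_by_set)  # 16 distinct parameter sets
```

Storing the results as a column of the grid keeps every output next to the parameters that produced it, which makes later summaries and plots easier.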

Fantastic! I love it when I figure out how to do things more succinctly in R. I hope to post R tidbits on this blog regularly, so that some ‘ignorant’ R programmer like myself will find the answer the first time he searches the internet for it. It may have taken me years to finally figure it out, but that doesn’t mean it has to take that long for everyone else! 🙂

Categories: opensource, R, statistics
  1. March 19, 2009 at 2:32 pm

    Nice work!! And so good of you!!

  2. Anonymous
    December 16, 2010 at 12:42 am

    Thank you so much for sharing this!!!

  3. Anonymous
    August 5, 2011 at 2:00 pm

    apply for rows and columns is just fine, but how do you vectorize when it comes to diagonals? For example, in a population matrix where rows are age classes and columns are years, age a in year t becomes age a+1 in year t+2. Is there an apply function for diagonals?

  4. Anonymous
    August 25, 2011 at 7:15 pm

    YEAH! Thanks. I have been doing this badly too. This is awesome.

  5. March 16, 2012 at 10:21 am

    This factoid was invaluable. I am a beginner (almost intermediate) R user working on my PhD in genetic epidemiology and have started realizing that loops are the only efficient way to tackle the huge amounts of data. BUT nested loops are not easy to learn. The expand.grid() function will have a lot of functionality in the work I do. Thanks a bunch.

  6. Pradeep
    April 22, 2013 at 7:57 pm

    Thanks for sharing! I was struggling with slow loops in R for a bit and will be sure to check how this fares in comparison.

  7. Ema
    May 12, 2014 at 2:48 pm

    Thanks a lot!
