Hi

I'm writing a function that uses four parameters (scalars) and I need to run it in an iterative process (the parameters vary to find the minimum RSS).

I don't want to use loops and so tried the do.call function. However, it didn't work. My understanding is that do.call simply runs the function once, replacing the scalar arguments by vectors, instead of running the function for each set of scalars in the list, which is what I need.

Can you please tell me if there is another way of doing it without using the for loop?

Thanks

EJ

ps: An example follows (of course the example doesn't make much sense, but it describes the problem).

> fun
function(a,b){
    vec <- rnorm(25)
    res <- a*vec^b
    res
}
> fun(2,3)
 [1]  7.006278e+00  3.515010e-01  7.989718e+00 -3.377766e-02 -1.879471e-02
 [6] -2.920680e-01  1.174834e+00 -1.088638e-03  6.448725e+00  2.591805e+00
[11] -4.313672e-04 -9.171867e-03 -6.793569e+00 -2.480562e+01 -1.514828e+01
[16] -1.259896e-01 -7.504192e-02  6.647855e-02  5.609645e-01  1.093114e-01
[21]  1.802123e+00  7.650033e-03 -3.534951e+00 -2.028473e-03 -2.837360e+01
> do.call("fun",list(a=c(1:6),b=rnorm(6)))
 [1]  1.4766338        NaN  3.0214852  3.8132530  0.2753699        NaN
 [7]        NaN        NaN  2.9998547        NaN        NaN  6.3050385
[13]  0.5970596  0.8722498  2.9931344  4.0664852        NaN        NaN
[19]  2.8121803        NaN  2.9989127        NaN        NaN        NaN
[25] 14.4631627
Warning messages:
1: longer object length
        is not a multiple of shorter object length in: vec^b
2: longer object length
        is not a multiple of shorter object length in: a * vec^b

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Ernesto Jardim <ernesto at ipimar.pt> writes:

> Hi
>
> I'm writing a function that uses four parameters (scalars) and I need to
> run it in an iterative process (the parameters vary to find the minimum
> RSS).
>
> I don't want to use loops and so tried the do.call function. However it
> didn't work. My understanding is that do.call simply runs the
> function replacing the arguments (scalars by vectors), instead of running
> the function for each set of scalars in the list, which is what I need.
>
> Can you please tell me if there is another way of doing it without
> using the for loop?

That's not what do.call does. It is for situations where the argument list of a single call needs to be constructed from simpler components. Your example is equivalent to

    fun(a=c(1:6), b=rnorm(6))

The loop over multiple parallel vectors is only doable via something like

    lapply(1:6, function(i)fun(a[i],b[i]))

However, I recently played with this and got as far as this:

napply <- function(..., FUN) {
    FUN <- match.fun(FUN)
    x <- list(...)
    lens <- sapply(x,length)
    len <- max(lens)
    if (any(lens != len))
        x <- lapply(x, rep, length=len)
    tuples <- lapply(seq(length=len), function(i)lapply(x,"[", i))
    sapply(tuples, function(t)eval(as.call(c(FUN,t))))
}

> napply(a=c(1:6),b=rnorm(6), FUN=fun)
           [,1]     [,2]     [,3]     [,4]     [,5]        [,6]
 [1,] 1.0259135      NaN 3.003882      NaN      NaN   20.299212
 [2,]       NaN 1.977696 3.026111      NaN 3.951746   19.107481
 [3,] 1.1840499 2.024837      NaN 8.289768      NaN    7.479917
 [4,] 0.9756922 2.003576      NaN      NaN 4.236000         NaN
 [5,] 1.0010550 2.006045      NaN      NaN      NaN 1302.330425
 [6,]       NaN      NaN      NaN 2.472650      NaN         NaN
 [7,]       NaN 2.094956      NaN      NaN      NaN    3.685879
 [8,] 0.8646628      NaN 2.993435      NaN 3.369501         NaN
 [9,]       NaN 2.044915 3.006433 6.426090 6.123980   19.235790
[10,] 1.6051736      NaN 3.011986      NaN 3.638641         NaN
....

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
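A minimal sketch of the distinction drawn in this reply, not part of the original message: do.call() builds one call from a list of arguments, whereas running fun once per (a, b) pair needs an element-wise iteration. It assumes the fun from the question and uses mapply() from base R, which the thread does not mention, as an alternative to the index-based lapply().

```r
# fun as defined in the question: 25 random draws, raised to b, scaled by a
fun <- function(a, b) {
    vec <- rnorm(25)
    a * vec^b
}

a <- 1:6
b <- rnorm(6)

# do.call() constructs a single call from the list; with the same seed it
# returns exactly what the direct call returns
set.seed(42)
via_do_call <- do.call("fun", list(a = 2, b = 3))
set.seed(42)
direct <- fun(a = 2, b = 3)
same_call <- identical(via_do_call, direct)

# One call of fun per pair (a[i], b[i]), written with an index ...
by_index <- lapply(seq_along(a), function(i) fun(a[i], b[i]))

# ... or with mapply(), which walks the vectors in parallel and here
# simplifies the six length-25 results into a 25 x 6 matrix
by_pair <- mapply(fun, a, b)
dim(by_pair)
```

Both forms still call fun six times from interpreted code; they differ in notation, not in the amount of work done.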
On 10 Apr 2002, Ernesto Jardim wrote:

> Hi
>
> I'm writing a function that uses four parameters (scalars) and I need to
> run it in an iterative process (the parameters vary to find the minimum
> RSS).
>
> I don't want to use loops and so tried the do.call function. However it
> didn't work. My understanding is that do.call simply runs the
> function replacing the arguments (scalars by vectors), instead of running
> the function for each set of scalars in the list, which is what I need.
>
> Can you please tell me if there is another way of doing it without
> using the for loop?

And why wouldn't you want to use the for() loop? Unless your function is vectorised you're not going to gain anything by getting rid of the for() loop.

	-thomas

Thomas Lumley			Asst. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
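To illustrate the plain for() loop Thomas refers to, here is a sketch of a search for the minimum RSS. The data, the model y = a * x^b, and the names rss_fun and grid are all assumptions made up for this example; the thread never shows the real four-parameter function.

```r
set.seed(1)
x <- seq(1, 5, length.out = 25)
y <- 2 * x^1.5 + rnorm(25, sd = 0.1)   # simulated data, true a = 2, b = 1.5

# hypothetical objective: residual sum of squares for scalar a and b
rss_fun <- function(a, b) sum((y - a * x^b)^2)

# candidate parameter pairs, one row per pair
grid <- expand.grid(a = seq(1, 3, by = 0.5), b = seq(1, 2, by = 0.25))

# preallocate the result, then fill it one parameter set at a time
rss <- numeric(nrow(grid))
for (i in seq_len(nrow(grid))) {
    rss[i] <- rss_fun(grid$a[i], grid$b[i])
}

# the row of grid with the smallest RSS
best <- grid[which.min(rss), ]
best
```

Preallocating rss with numeric() avoids growing the vector inside the loop, which is usually the main avoidable cost in a loop of this kind.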
Ernesto Jardim
2002-Apr-12 16:56 UTC
[R] problem with do.call or how to speed code avoiding for() loops [SUMMARY]
Hi

This is the summary of the discussion about do.call posted on Wed, 2002-04-10 at 13:00, by Ernesto Jardim.

The initial problem was about the use of the do.call function. The purpose was to avoid for() loops and speed up code.

Regarding do.call, Peter Dalgaard pointed out that do.call is for "situations where the argument list of a single call needs to be constructed from simpler components". Peter also said that, to loop over parallel vectors, something like lapply should be used, and presented an example napply function.

Thomas Lumley raised the point that for() loops are only worth avoiding if one is using vectorised functions, and explained what that means (see message below).

Ernesto Jardim asked whether, if the apply family of functions is written entirely in R, these functions are only useful for simplicity in writing code. Several messages confirmed that apply is only R code.

Douglas Bates said that "S Programming" discusses the need to profile the code before implementing changes, if one wants to make it faster.

Thomas referred to parallel processing and the increase in speed it could bring to apply() if it is ever implemented.

Prof. Ripley answered this point, saying that apply() just streamlines a for() loop, but lapply() is an internal function that is sometimes a lot faster, and its use is encouraged. He also stated that apply() is a matter of style.

All relevant messages are pasted below.

If something is wrong in this summary please let me know and I'll correct it.

Regards

EJ

---------------------------------------
Starting message:

On Wed, 2002-04-10 at 13:00, Ernesto Jardim wrote:

Hi

I'm writing a function that uses four parameters (scalars) and I need to run it in an iterative process (the parameters vary to find the minimum RSS).

I don't want to use loops and so tried the do.call function. However, it didn't work.
My understanding is that do.call simply runs the function once, replacing the scalar arguments by vectors, instead of running the function for each set of scalars in the list, which is what I need.

Can you please tell me if there is another way of doing it without using the for loop?

Thanks

EJ

ps: An example follows (of course the example doesn't make much sense, but it describes the problem).

> fun
function(a,b){
    vec <- rnorm(25)
    res <- a*vec^b
    res
}
> fun(2,3)
 [1]  7.006278e+00  3.515010e-01  7.989718e+00 -3.377766e-02 -1.879471e-02
 [6] -2.920680e-01  1.174834e+00 -1.088638e-03  6.448725e+00  2.591805e+00
[11] -4.313672e-04 -9.171867e-03 -6.793569e+00 -2.480562e+01 -1.514828e+01
[16] -1.259896e-01 -7.504192e-02  6.647855e-02  5.609645e-01  1.093114e-01
[21]  1.802123e+00  7.650033e-03 -3.534951e+00 -2.028473e-03 -2.837360e+01
> do.call("fun",list(a=c(1:6),b=rnorm(6)))
 [1]  1.4766338        NaN  3.0214852  3.8132530  0.2753699        NaN
 [7]        NaN        NaN  2.9998547        NaN        NaN  6.3050385
[13]  0.5970596  0.8722498  2.9931344  4.0664852        NaN        NaN
[19]  2.8121803        NaN  2.9989127        NaN        NaN        NaN
[25] 14.4631627
Warning messages:
1: longer object length
        is not a multiple of shorter object length in: vec^b
2: longer object length
        is not a multiple of shorter object length in: a * vec^b

---------------------------------------
Peter Dalgaard:

That's not what do.call does. It is for situations where the argument list of a single call needs to be constructed from simpler components. Your example is equivalent to

    fun(a=c(1:6), b=rnorm(6))

The loop over multiple parallel vectors is only doable via something like

    lapply(1:6, function(i)fun(a[i],b[i]))

However, I recently played with this and got as far as this:

napply <- function(..., FUN) {
    FUN <- match.fun(FUN)
    x <- list(...)
    lens <- sapply(x,length)
    len <- max(lens)
    if (any(lens != len))
        x <- lapply(x, rep, length=len)
    tuples <- lapply(seq(length=len), function(i)lapply(x,"[", i))
    sapply(tuples, function(t)eval(as.call(c(FUN,t))))
}

> napply(a=c(1:6),b=rnorm(6), FUN=fun)
           [,1]     [,2]     [,3]     [,4]     [,5]        [,6]
 [1,] 1.0259135      NaN 3.003882      NaN      NaN   20.299212
 [2,]       NaN 1.977696 3.026111      NaN 3.951746   19.107481
 [3,] 1.1840499 2.024837      NaN 8.289768      NaN    7.479917
 [4,] 0.9756922 2.003576      NaN      NaN 4.236000         NaN
 [5,] 1.0010550 2.006045      NaN      NaN      NaN 1302.330425
 [6,]       NaN      NaN      NaN 2.472650      NaN         NaN
 [7,]       NaN 2.094956      NaN      NaN      NaN    3.685879
 [8,] 0.8646628      NaN 2.993435      NaN 3.369501         NaN
 [9,]       NaN 2.044915 3.006433 6.426090 6.123980   19.235790
[10,] 1.6051736      NaN 3.011986      NaN 3.638641         NaN
....

---------------------------------------
Thomas Lumley:

And why wouldn't you want to use the for() loop? Unless your function is vectorised you're not going to gain anything by getting rid of the for() loop.
Definition of vectorised function by Thomas:

Many R functions can operate on a vector of parameter values, e.g.

    log(10, c(2, exp(1), 10))

gives the log of 10 to base 2, e, and 10.

If your function can do this, you can construct a set of vectors containing all your parameter values (expand.grid() is useful for this) and evaluate your function once. This can be faster than for() loops when much of the iteration is done in compiled code.

If the iteration has to be done in interpreted code then you can't really speed up the for() loops. You can hide the loops with the apply() functions, which may make your code more readable, but it won't typically speed it up.

---------------------------------------
Ernesto Jardim:

This was not my understanding. I thought that if you can use functions like apply and similar instead of for loops, your code will be faster, basically by relying on the code of these functions, which is (or should be) optimized for speed.

If what you're saying is true, then using functions like apply is a matter of simplicity and not of speeding up the code. Is this correct?

---------------------------------------
Douglas Bates:

Yes. If you examine the apply function you will see that the bulk of the work is done in a loop

    if (length(d.call) < 2) {
        if (length(dn.call))
            dimnames(newX) <- c(dn.call, list(NULL))
        for (i in 1:d2) ans[[i]] <- FUN(newX[, i], ...)
    }
    else for (i in 1:d2) ans[[i]] <- FUN(array(newX[, i], d.call, dn.call), ...)

In their book "S Programming" (Springer, 2000) Venables and Ripley discuss general strategies for writing R functions and for making them faster. One general principle is to profile the code before implementing changes. The manual "Writing R Extensions" has a section on "Profiling R code" which is highly recommended.

---------------------------------------
Thomas Lumley:

Yes. As you can easily verify [and always should verify if you're doing optimisation], the apply commands are rarely faster than their for() loop equivalents. They can be slower.
The speed advantage of apply is partly mythical -- there's never been that much advantage -- and partly historical, as in some versions of S-PLUS 3.x apply was often faster for complicated reasons due to memory management.

The real point of the apply() family is to suppress unnecessary loop variables and make your code tidier. If we ever get parallel processing then apply() could really become faster, but that's not going to happen any time soon.

---------------------------------------
Brian Ripley:

I think that is a little pessimistic. It is true for apply() in R, which just streamlines a for() loop, and also does things you may not want. However, lapply is an internal function (written by me) because it is sometimes a lot faster, and in my experiments never slower.

lapply() was a lot faster in S-PLUS 3.4. It was often slower than for() in 5.0, hence a lot of consternation. There *are* a lot of myths about, but not all in one direction.

As others have said, `S Programming' tries to give a balanced view across 3 different S implementations, and profiling can be a great tool in optimizing code (it can be misleading too, but rarely when it matters).

Summary: lapply is encouraged. apply is a matter of style. Test out whatever you do to see if it is really worthwhile.
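Two points from the messages above, evaluating a vectorised function once over an expand.grid() of parameter values and timing the alternatives before choosing one, can be sketched as follows. The function f and the grid are hypothetical and not from the thread; timings depend on the machine and the R version, so none are quoted here.

```r
# hypothetical vectorised function: the arithmetic works element-wise
f <- function(a, b) a * 2^b

# all 200 x 200 = 40000 combinations of the two parameters
grid <- expand.grid(a = 1:200, b = seq(0, 2, length.out = 200))

# 1. one vectorised evaluation over every parameter pair
t_vec <- system.time(r_vec <- f(grid$a, grid$b))

# 2. explicit for() loop with a preallocated result
t_for <- system.time({
    r_for <- numeric(nrow(grid))
    for (i in seq_len(nrow(grid))) r_for[i] <- f(grid$a[i], grid$b[i])
})

# 3. lapply() over the row index
t_lap <- system.time(
    r_lap <- unlist(lapply(seq_len(nrow(grid)),
                           function(i) f(grid$a[i], grid$b[i])))
)

# the three approaches agree on the values; only the elapsed times differ
all.equal(r_vec, r_for)
all.equal(r_vec, r_lap)
c(vectorised = t_vec[["elapsed"]],
  for_loop   = t_for[["elapsed"]],
  lapply     = t_lap[["elapsed"]])
```

For a closer look at where the time goes inside a larger function, Rprof() and summaryRprof() are the tools described in the "Profiling R code" section of "Writing R Extensions".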