** Disconcerting to me, anyway; perhaps not to others**
(Apologies if this has been discussed before. I was a bit nonplussed by
it, but maybe I'm just clueless.) Anyway:

Here are two almost identical versions of the Sieve of Eratosthenes.
The only difference between them is in the call to seq.int(), which is
highlighted:

sieve1 <- function(m){
   if(m < 2) return(NULL)
   a <- floor(sqrt(m))
   pr <- Recall(a)
   ####################
   s <- seq.int(2, to = m)  ## Only difference here
   ######################
   for( i in pr) s <- s[as.logical(s %% i)]
   c(pr,s)
}

sieve2 <- function(m){
   if(m < 2) return(NULL)
   a <- floor(sqrt(m))
   pr <- Recall(a)
   ####################
   s <- seq.int(2, to = m, by =1)  ## Only difference here
   #######################
   for( i in pr) s <- s[as.logical(s %% i)]
   c(pr,s)
}

However, execution time is *quite* different.

> library(microbenchmark)

> microbenchmark(l1 <- sieve1(1e5), times =50)
Unit: milliseconds
                expr      min       lq     mean  median       uq      max neval
 l1 <- sieve1(1e+05) 3.957084 3.997959 4.732045 4.01698 4.184918 7.627751    50

> microbenchmark(l2 <- sieve2(1e5), times =50)
Unit: milliseconds
                expr      min      lq     mean   median       uq      max neval
 l2 <- sieve2(1e+05) 681.6209 682.555 683.8279 682.9368 685.2253 687.9464    50

Now note that:

> identical(l1, l2)
[1] FALSE

## Because:
> str(l1)
 int [1:9592] 2 3 5 7 11 13 17 19 23 29 ...

> str(l2)
 num [1:9592] 2 3 5 7 11 13 17 19 23 29 ...

I therefore assume that seq.int(), an internal generic, is dispatching
to a method that uses integer arithmetic for sieve1 and floating point
for sieve2. Is this correct? If not, what do I fail to understand? And
is this indeed the source of the large difference in execution time?

Further, ?seq.int says:
"The interpretation of the unnamed arguments of seq and seq.int is not
standard, and it is recommended always to name the arguments when
programming."

The above suggests that maybe this advice should be qualified, and/or
that adding some comments to the Help file regarding this behavior might
be useful to naïfs like me.
In case it makes a difference (and it might!):

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] microbenchmark_1.4.9

loaded via a namespace (and not attached):
[1] compiler_4.2.0 tools_4.2.0

Thanks for any enlightenment and again apologies if I am plowing old
ground.

Best to all,

Bert Gunter
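The type difference shown by str() above can be seen directly, without running the whole sieve, by asking typeof() about small sequences. A minimal sketch in base R; the upper bound 10 is an arbitrary illustration value, and the expected types match the str() output in the post.

```r
## Check the storage type seq.int() returns with and without an explicit 'by'.
x1 <- seq.int(2, to = 10)          # 'by' omitted, as in sieve1()
x2 <- seq.int(2, to = 10, by = 1)  # 'by = 1' (a double), as in sieve2()
x3 <- seq.int(from = 2, to = 10)   # every argument named; 'by' still omitted

typeof(x1)  # "integer"
typeof(x2)  # "double"
typeof(x3)  # "integer"

## Same values, different storage type: identical() is FALSE,
## while an element-wise comparison is TRUE.
identical(x1, x2)  # FALSE
all(x1 == x2)      # TRUE
```

Naming 'from' explicitly (x3) does not change the type in this example; what changes it is supplying 'by'.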
A sequence where 'from' and 'to' are both integer-valued (not necessarily
of class integer) will use R_compact_intrange; the return value is an
integer vector and is stored compactly.

In your case, you specified 'from', 'to', and 'by'; if all three are of
class integer, the return value is also of class integer. I think that if
'from' and 'to' are integer-valued and 'by' is of class integer, the
return value is of class integer as well, but you might want to check
that. If so, replacing 'by = 1' with 'by = 1L' should make the two
sequences identical, though it may still take longer than not specifying
'by' at all.

On Mon, May 2, 2022, 21:46 Bert Gunter <bgunter.4567 at gmail.com> wrote:
> ** Disconcerting to me, anyway; perhaps not to others**
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
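The reply's suggestions are explicitly hedged ("might want to check that though"), so here is one way to check them. A sketch in base R only: the first typeof() call is printed rather than asserted, since the reply itself is unsure about the mixed double/integer case. sieve3() is a hypothetical variant, not from the thread, that coerces the bounds to integer so every argument to seq.int() is of class integer.

```r
## Does supplying by = 1L keep the result integer? Print rather than assume.
m <- 1e5
typeof(seq.int(2, to = m, by = 1L))               # 'from' and 'to' are doubles
typeof(seq.int(2L, to = as.integer(m), by = 1L))  # every argument is integer

## Hypothetical variant of the thread's sieve (name and coercions are ours):
## all seq.int() arguments are integer, so 's' and the result stay integer.
sieve3 <- function(m) {
  m <- as.integer(m)
  if (m < 2L) return(NULL)
  a <- as.integer(floor(sqrt(m)))
  pr <- Recall(a)
  s <- seq.int(2L, to = m, by = 1L)
  for (i in pr) s <- s[as.logical(s %% i)]
  c(pr, s)
}

l3 <- sieve3(1e5)
str(l3)
```

Comparing typeof(l3) with the str() output for sieve1() and sieve2() shows which of the two the explicit-integer version resembles.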
luke-tierney at uiowa.edu
2022-May-03 03:52 UTC
[R] [External] Somewhat disconcerting behavior of seq.int()
Something is very different about your system. On my Linux system I get

> microbenchmark(l1 <- sieve1(1e5), times =50)
Unit: milliseconds
                expr     min       lq     mean   median       uq     max neval
 l1 <- sieve1(1e+05) 5.04615 5.350576 6.967507 5.787626 7.323502 28.3085    50
> microbenchmark(l2 <- sieve2(1e5), times =50)
Unit: milliseconds
                expr      min       lq     mean   median      uq      max neval
 l2 <- sieve2(1e+05) 14.58763 15.79368 17.00738 16.29299 17.0723 30.57338    50

Similar on an Intel Mac.

Best,

luke

On Tue, 3 May 2022, Bert Gunter wrote:
> ** Disconcerting to me, anyway; perhaps not to others**
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
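Luke's medians (about 5.8 ms vs 16.3 ms) show a far smaller gap than Bert's (about 4 ms vs 683 ms), which fits his remark that something is different about Bert's system. One rough way to compare machines is to time the arithmetic inside the loop, s %% i, which runs on an integer vector in sieve1() and on a double vector in sieve2(). A sketch using base R's system.time(); the vector length, the divisor 7 and the repetition count are arbitrary choices, and timings will vary by system.

```r
## Time the loop body's `%%` on integer vs double vectors (illustrative only).
xi <- seq.int(2L, to = 100000L)  # integer vector, as 's' is in sieve1()
xd <- as.double(xi)              # same values stored as double, as in sieve2()
reps <- 200L                     # arbitrary repetition count

ti <- system.time(for (k in seq_len(reps)) ri <- xi %% 7L)
td <- system.time(for (k in seq_len(reps)) rd <- xd %% 7)

## Elapsed seconds for each variant; compare these across machines.
c(integer = ti[["elapsed"]], double = td[["elapsed"]])

## The remainders agree; only their storage type differs.
all(ri == rd)
```

This does not identify the cause by itself; it only checks whether double-precision %% is unusually slow on a given machine relative to the integer version.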