R help - [R] [External] Somewhat disconcerting behavior of seq.int() [May 2022]

Bert Gunter

2022-May-03 01:45 UTC

[R] Somewhat disconcerting behavior of seq.int()

** Disconcerting to me, anyway; perhaps not to others**
(Apologies if this has been discussed before. I was a bit nonplussed by
it, but maybe I'm just clueless.) Anyway:

Here are two almost identical versions of the Sieve of Eratosthenes.
The difference between them is only in the call to seq.int() that is
highlighted:

sieve1 <- function(m){
   if(m < 2) return(NULL)
   a <- floor(sqrt(m))
   pr <- Recall(a)
####################
   s <- seq.int(2, to = m) ## Only difference here
######################
   for( i in pr) s <- s[as.logical(s %% i)]
   c(pr,s)
}

sieve2 <- function(m){
   if(m < 2) return(NULL)
   a <- floor(sqrt(m))
   pr <- Recall(a)
####################
   s <- seq.int(2, to = m, by =1) ## Only difference here
#######################
   for( i in pr) s <- s[as.logical(s %% i)]
   c(pr,s)
}

However, execution time is *quite* different.

library(microbenchmark)

> microbenchmark(l1 <- sieve1(1e5), times = 50)
Unit: milliseconds
                expr      min       lq     mean  median       uq      max neval
 l1 <- sieve1(1e+05) 3.957084 3.997959 4.732045 4.01698 4.184918 7.627751    50

> microbenchmark(l2 <- sieve2(1e5), times = 50)
Unit: milliseconds
                expr      min      lq     mean   median       uq      max neval
 l2 <- sieve2(1e+05) 681.6209 682.555 683.8279 682.9368 685.2253 687.9464    50

Now note that:
> identical(l1, l2)
[1] FALSE

## Because:
> str(l1)
 int [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
> str(l2)
 num [1:9592] 2 3 5 7 11 13 17 19 23 29 ...
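A minimal sketch, not from the thread: `sieve3` is a hypothetical variant of `sieve2` that keeps `by = 1` but coerces the sequence back to integer storage with `as.integer()`. It illustrates why `identical()` reports FALSE above: `identical()` checks storage type as well as values, while `all.equal()` compares values and ignores the integer/double distinction.

```r
# Hypothetical 'sieve3' (not from the original post): like sieve2, but the
# sequence is coerced back to integer storage before sieving.
sieve3 <- function(m) {
  if (m < 2) return(NULL)
  a <- floor(sqrt(m))
  pr <- Recall(a)                               # primes up to sqrt(m)
  s <- as.integer(seq.int(2, to = m, by = 1))   # force integer storage
  for (i in pr) s <- s[as.logical(s %% i)]      # drop multiples of each prime
  c(pr, s)
}

p <- sieve3(30)
str(p)

# identical() checks storage type as well as values; all.equal() does not.
identical(p, c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29))             # FALSE
identical(p, c(2L, 3L, 5L, 7L, 11L, 13L, 17L, 19L, 23L, 29L))   # TRUE
isTRUE(all.equal(p, c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)))     # TRUE
```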

I therefore assume that seq.int(), an internal generic, is dispatching
to a method that uses integer arithmetic for sieve1 and floating point
for sieve2. Is this correct? If not, what do I fail to understand? And
is this indeed the source of the large difference in execution time?
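One way to probe the last question is to time only the filtering step, `s[as.logical(s %% i)]`, on the same values stored once as integer and once as double. This sketch uses base R's `system.time()` so it needs no extra packages; the vector length, the divisor 7 and the repetition count are arbitrary assumptions, and the ratio you see will depend on the platform and R build.

```r
# Sketch: time only the "drop multiples" step on identical values
# stored as integer vs double (base R only; no extra packages).
s_int <- seq.int(2L, to = 100000L)  # integer storage
s_dbl <- as.double(s_int)           # same values, double storage
reps  <- 50L                        # arbitrary repetition count

# Elapsed seconds for 'reps' passes of one sieve filtering step.
time_filter <- function(s, d, reps) {
  system.time(
    for (k in seq_len(reps)) s[as.logical(s %% d)]
  )[["elapsed"]]
}

# integer %% integer uses integer arithmetic; double %% double uses
# floating point, mirroring sieve1 and sieve2 respectively.
t_int <- time_filter(s_int, 7L, reps)
t_dbl <- time_filter(s_dbl, 7,  reps)
cat("integer:", t_int, "s   double:", t_dbl, "s\n")

# Both storage types keep exactly the same elements.
stopifnot(identical(s_int[as.logical(s_int %% 7L)],
                    as.integer(s_dbl[as.logical(s_dbl %% 7)])))
```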

Further, ?seq.int says:
"The interpretation of the unnamed arguments of seq and seq.int is not
standard, and it is recommended always to name the arguments when
programming."

The above suggests that maybe this advice should be qualified, and/or that
some comments on this behavior in the Help file might be useful to naïfs
like me.
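A quick check, assuming the behaviour of R 4.2 (the Value section of `?seq` warns that the result type may change in future versions): naming `from` and `to` does not change the result, whereas supplying a double `by` does.

```r
# Positional vs named 'from'/'to': same values, same (integer) type.
a <- seq.int(2, 10)
b <- seq.int(from = 2, to = 10)
identical(a, b)   # TRUE
typeof(a)         # "integer"

# Supplying a double 'by' is what changes the storage type.
typeof(seq.int(from = 2, to = 10, by = 1))   # "double"
```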

In case it makes a difference (and it might!):

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] microbenchmark_1.4.9

loaded via a namespace (and not attached):
[1] compiler_4.2.0 tools_4.2.0


Thanks for any enlightenment and again apologies if I am plowing old ground.

Best to all,

Bert Gunter

Andrew Simmons

2022-May-03 02:00 UTC

[R] Somewhat disconcerting behavior of seq.int()

A sequence where 'from' and 'to' are both integer valued (not necessarily
class integer) will use R_compact_intrange; the return value is an integer
vector and is stored with minimal space.
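To see the representation difference directly, this sketch uses `.Internal(inspect())`, an undocumented internal debugging aid whose output format is not guaranteed. For an ALTREP compact sequence it typically marks the vector as "(compact)" in its header line; the exact output will vary between R versions.

```r
x <- seq.int(2, to = 1e5)          # 'from' and 'to' only
y <- seq.int(2, to = 1e5, by = 1)  # explicit double 'by'

# .Internal(inspect()) is an undocumented debugging aid; its output format
# is not stable. Compare the type and header line printed for each vector.
.Internal(inspect(x))
.Internal(inspect(y))
```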

In your case, you specified a 'from', 'to', and 'by'; if all are integer
class, then the return value is also integer class. I think if 'from' and
'to' are integer valued and 'by' is integer class, the return value is
integer class, but you might want to check that. In your case, I think
replacing 'by = 1' with 'by = 1L' will make the sequences identical,
though it may still take longer than not specifying 'by' at all.
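Andrew's suggestion can be checked directly. This sketch prints the storage type for a few `by` variants rather than assuming it, since he flagged it as worth checking and the behaviour may differ between R versions; the values agree in every case.

```r
# Print the storage type for several 'by' variants instead of assuming it.
variants <- list(
  "from, to"              = seq.int(2, to = 10),
  "from, to, by = 1L"     = seq.int(2, to = 10, by = 1L),
  "2L, to = 10L, by = 1L" = seq.int(2L, to = 10L, by = 1L)
)
print(vapply(variants, typeof, character(1)))

# The values agree in every case; only the storage type may differ.
print(vapply(variants, function(v) isTRUE(all.equal(v, 2:10)), logical(1)))
```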

Luke Tierney

2022-May-03 03:52 UTC

[R] [External] Somewhat disconcerting behavior of seq.int()

Something is very different about your system. On my Linux system I get:

> microbenchmark(l1 <- sieve1(1e5), times = 50)
Unit: milliseconds
                expr     min       lq     mean   median       uq     max neval
 l1 <- sieve1(1e+05) 5.04615 5.350576 6.967507 5.787626 7.323502 28.3085    50

> microbenchmark(l2 <- sieve2(1e5), times = 50)
Unit: milliseconds
                expr      min       lq     mean   median      uq      max neval
 l2 <- sieve2(1e+05) 14.58763 15.79368 17.00738 16.29299 17.0723 30.57338    50

Similar on an Intel Mac.

Best,

luke

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
