--optimize-level 3 slowing things down #892

Open
mml opened this issue Dec 13, 2024 · 3 comments
@mml

mml commented Dec 13, 2024

Yesterday I filed a bug prematurely, but I knew something was not quite right. The program below (div-and-mod.ss) takes more than 30% longer when compiled with --optimize-level 3 than when compiled at the default optimize level.

% scheme --script div-and-mod.ss
(time (for-each (lambda (...) ...) ...))
    no collections
    6.400004545s elapsed cpu time
    6.400545484s elapsed real time
    0 bytes allocated
Total 45,149,700,000,000
% echo '(compile-file "div-and-mod.ss")' | scheme -q
compiling div-and-mod.ss with output to div-and-mod.so
% scheme --script div-and-mod.so
(time (for-each (lambda (...) ...) ...))
    no collections
    6.117455272s elapsed cpu time
    6.117724356s elapsed real time
    0 bytes allocated
Total 45,149,700,000,000
% echo '(compile-file "div-and-mod.ss")' | scheme -q --optimize-level 3
compiling div-and-mod.ss with output to div-and-mod.so
% scheme --script div-and-mod.so
(time (for-each (lambda (...) ...) ...))
    no collections
    8.324555132s elapsed cpu time
    8.329026569s elapsed real time
    0 bytes allocated
Total 45,149,700,000,000

(set! total 0)
(let ([l (iota 300000000)])
  (time
    (for-each
      (lambda (x)
        (let-values ([(quo rem) (fxdiv-and-mod x 1000)])
          (set! total (+ total quo rem))))
      l)))
(printf "Total ~:d~n" total)

% scheme --version
10.1.0
% uname -a
Linux welwitschia 6.1.0-27-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.115-1 (2024-11-01) x86_64 GNU/Linux

@jltaylor-us
Contributor

This works as expected (i.e., optimize level 3 is faster) on tarm64osx. Someone else will have to check the x64 linux builds.

@mbakhterev

I get approximately the same timings as @mml: optimize level 3 is slower.

$ cat test.scm 
(set! total 0)

(let ([l (iota 300000000)])
  (time
    (for-each
      (lambda (x)
        (let-values ([(quo rem) (fxdiv-and-mod x 1000)])
          (set! total (+ total quo rem))))
      l)))

(printf "Total ~:d~n" total)
$ echo '(compile-file "test.scm")' | chez -q
compiling test.scm with output to test.so
$ chez --script test.so
(time (for-each (lambda (...) ...) ...))
    no collections
    6.137851039s elapsed cpu time
    6.142206869s elapsed real time
    0 bytes allocated
Total 45,149,700,000,000
$ echo '(compile-file "test.scm")' | chez -q --optimize-level 3
compiling test.scm with output to test.so
$ chez --optimize-level 3 --script test.so
(time (for-each (lambda (...) ...) ...))
    no collections
    8.453322405s elapsed cpu time
    8.458856063s elapsed real time
    0 bytes allocated
Total 45,149,700,000,000
$ uname -a
Linux rocket 6.12.4-arch1-1 #1 SMP PREEMPT_DYNAMIC Mon, 09 Dec 2024 14:31:57 +0000 x86_64 GNU/Linux
$ cat /proc/cpuinfo | grep 'model name' | uniq
model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
$ chez --version
10.1.0

@owaddell
Contributor

I see the slowdown on arm64osx and a6le.

It looks like the issue is that the cp0 inline handler for fxdiv-and-mod does not (yet) anticipate how np-expand-primitives will handle fxdiv and fxmod. Since 1000 is not a power of two, we end up turning one out-of-line call into two such calls. If the test program had used 1024 instead of 1000, then the optimize-level 3 version would likely run faster.
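
To illustrate why a power-of-two divisor is the cheap case, here is a minimal sketch showing that division and modulus by 1024 reduce to a shift and a mask. This is only an illustration of the arithmetic, not a description of the code the compiler actually emits; the names div-and-mod-1024 and agrees-with-fxdiv-and-mod? are hypothetical helpers introduced for this example.

```scheme
;; Illustration only: for the power-of-two divisor 1024 = 2^10,
;; fxdiv and fxmod can be expressed without a general division.
(define (div-and-mod-1024 x)
  (values (fxsra x 10)          ; arithmetic shift right, equals (fxdiv x 1024)
          (fxlogand x 1023)))   ; mask of the low 10 bits, equals (fxmod x 1024)

;; Check the shift/mask version against the primitive for one fixnum.
(define (agrees-with-fxdiv-and-mod? x)
  (let-values ([(q1 r1) (fxdiv-and-mod x 1024)]
               [(q2 r2) (div-and-mod-1024 x)])
    (and (fx= q1 q2) (fx= r1 r2))))
```

No such rewrite exists for 1000, which is consistent with the two out-of-line calls described above.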

We can cut a lot of (unmeasured) overhead from the test by replacing the for-each over the output of iota with a do loop:

(let ()
  (define (add-quo-rem total x)
    (let-values ([(quo rem) (fxdiv-and-mod x 1000)])
      (+ total quo rem)))
  (define total
    (time
     (do ([x 0 (fx+ x 1)]
          [total 0 (add-quo-rem total x)])
         ((fx= x 300000000) total))))
  (printf "Total ~:d~n" total))
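
To compare the 1000 and 1024 cases with the same loop, one option is to wrap the do loop in a macro so the divisor still appears as a literal at each use site. This is a sketch under that assumption; sum-quo-rem is a hypothetical name, and whether the literal is what the inline handler keys on has not been verified here.

```scheme
;; Hypothetical helper: expands into the do loop above with the
;; divisor substituted as a literal constant.
(define-syntax sum-quo-rem
  (syntax-rules ()
    [(_ divisor n)
     (do ([x 0 (fx+ x 1)]
          [total 0 (let-values ([(quo rem) (fxdiv-and-mod x divisor)])
                     (+ total quo rem))])
         ((fx= x n) total))]))
```

Timing (time (sum-quo-rem 1000 300000000)) against (time (sum-quo-rem 1024 300000000)), each compiled at the default optimize level and at optimize level 3, should show whether the slowdown is specific to the non-power-of-two divisor.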
