Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
test: eltype matching tests run outside of error mode
Showing 1 changed file with 16 additions and 15 deletions.
0b51676
@JuliaRegistrator register
0b51676
Registration pull request created: JuliaRegistries/General/115081
Tip: Release Notes
Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.
To add them here just re-invoke and the PR will be updated.
Tagging
After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.
This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:
0b51676
Lux Benchmarks
| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 411125 | 411750 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 322750 | 322250 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 244083 | 244354.5 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 740229 | 739959 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 43576 | 44622 | 0.98 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 1361688 | 1314687.5 | 1.04 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 2448167 | 2415854 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 16505500 | 16411375 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2198042 | 2250459 | 0.98 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 207361 | 210429 | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 1419479 | 1387333 | 1.02 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 931729 | 913146 | 1.02 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 1582917 | 1549208 | 1.02 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2213229 | 2241750 | 0.99 |

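The CPU forward timings for Dense(512 => 512, identity) above show how runtime changes with thread count. As a rough sketch (not part of the benchmark tooling), the speedup over the single-threaded run can be computed from the first timing of each entry. The helper name `speedup_vs_single_thread` is hypothetical, and treating the first timing as the current run is an assumption.

```python
# First-listed forward timings (ns) for Dense(512 => 512, identity)(512 x 128)
# on CPU, copied from the entries above. Keys are thread counts.
# Assumption: the first timing in each entry is the current run.
forward_ns = {1: 740229, 2: 411125, 4: 322750, 8: 244083}


def speedup_vs_single_thread(timings_ns):
    """Return {threads: single-thread time / time at that thread count}."""
    base = timings_ns[1]
    return {threads: base / t for threads, t in sorted(timings_ns.items())}


speedups = speedup_vs_single_thread(forward_ns)
for threads, s in speedups.items():
    # Parallel efficiency is the speedup divided by the thread count.
    print(f"{threads} thread(s): {s:.2f}x speedup, {s / threads:.0%} efficiency")
```

On these numbers, 8 threads gives roughly a 3x speedup over 1 thread, so scaling for this layer is well short of linear.
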
| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1768708 | 1764229.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1072541.5 | 1093291 | 0.98 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1542417 | 1517187.5 | 1.02 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3010167 | 2995125 | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 208923 | 211213.5 | 0.99 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12164458 | 12135312.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 8831167 | 8821250.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9231125 | 9211042 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18575542 | 18575959 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1506706 | 1486214 | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17297875 | 17305854 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 13966709 | 13958125 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14490229 | 14521020.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21825958 | 21821271.5 | 1.00 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250077771 | 250357604 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148351292 | 148471959 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116742208 | 116711333.5 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 446235042 | 447366750 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5474148 | 5485324 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1226735000 | 1224804208 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 933099541 | 931517375 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 833488083 | 829351334 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1628798917 | 1631699042 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 31247743 | 31517422 | 0.99 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1139513458 | 1033852125 | 1.10 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1004012958 | 985852708.5 | 1.02 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1343460771 | 1297620895.5 | 1.04 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1729098333 | 1729286083.5 | 1.00 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 1084187.5 | 1104458 | 0.98 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 1632875 | 1636000 | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 3807833 | 3608917 | 1.06 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 781500 | 779208.5 | 1.00 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 269181 | 263937.5 | 1.02 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2973917 | 3005562.5 | 0.99 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 4123458 | 4102687.5 | 1.01 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 11391021 | 9959375 | 1.14 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3140229.5 | 3171291.5 | 0.99 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1147789 | 1093100.5 | 1.05 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2327458.5 | 2309084 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1427875 | 1395125 | 1.02 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1552208 | 1532166.5 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 4203041 | 4206854.5 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 209123 | 207803.5 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 19423562 | 19411834 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 16279416 | 16074729.5 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 17361812 | 17204167 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 25815125 | 25846729 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1606839 | 1588884 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 34524104 | 34056708 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 31057875 | 30790312.5 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 31105416 | 31003625 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 36883875 | 37038625 | 1.00 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4526208.5 | 4527562.5 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2777083.5 | 2780667 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2685312.5 | 2672396 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 8381562.5 | 8380583 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 373639 | 420119 | 0.89 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 38887521 | 39090938 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 32509584 | 32065354 | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 32333229 | 32270791 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 51833125 | 51859750 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2633953 | 2623535 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 88607687.5 | 89004313 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 113743125 | 114465416 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 227726583 | 219243333 | 1.04 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 74951083 | 74793562.5 | 1.00 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 267716166 | 268192958 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 159256375 | 159139541 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 123708895.5 | 123304667 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 485091625 | 484886208 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 7022924 | 7013600 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1478680979 | 1472254854 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 1179547083 | 1174209000 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 1066054563 | 1058770187.5 | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 2001889209 | 2000167187.5 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 34822377.5 | 34540951 | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1724298291 | 1715889292 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1565497271 | 1527816438 | 1.02 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1925114250 | 1882392833 | 1.02 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2239111625 | 2226899333 | 1.01 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 2028500 | 2068395.5 | 0.98 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 2967646 | 2993084 | 0.99 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 8104667 | 8374792 | 0.97 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2308041.5 | 2453875.5 | 0.94 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 272667 | 266526.5 | 1.02 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 9619395.5 | 9618542 | 1.00 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 12015166 | 12066625 | 1.00 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 26324292 | 23824500 | 1.10 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11677541 | 11760125.5 | 0.99 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1188628.5 | 1164719.5 | 1.02 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 383215354.5 | 382306709 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 284366604.5 | 285915229.5 | 0.99 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 261725395.5 | 259469458 | 1.01 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 453056042 | 452429396 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 5009701 | 4851990 | 1.03 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 1160384584 | 1152636625 | 1.01 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 912166042 | 942909208 | 0.97 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 984922208 | 988346750 | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1396092167 | 1394608042 | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 18111984 | 17883204 | 1.01 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1053833 | 1047084 | 1.01 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 1605958 | 2051062.5 | 0.78 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 5411083 | 5536708 | 0.98 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1296875 | 1365833.5 | 0.95 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 265721 | 273727 | 0.97 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6510958 | 6487104 | 1.00 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 13082584 | 12416624.5 | 1.05 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 21760833.5 | 18396062.5 | 1.18 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 5984375 | 6074542 | 0.99 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1208949 | 1242828 | 0.97 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70494333 | 70480271 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43641125 | 43555583 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39690584 | 39728521 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 133468354 | 132459104 | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1945255.5 | 1879688 | 1.03 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 356723479.5 | 356722500 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 271306709 | 270518833 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 254269771 | 253991500 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 536238459 | 534459625 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 12301288 | 12289288 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 395599834 | 395296292 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 377440167 | 405206479 | 0.93 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 697289229.5 | 702801292 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 708495833 | 709895792 | 1.00 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 1188885083 | 1186905458 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 692916625 | 688634396 | 1.01 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 642915416.5 | 641177604 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1776695937.5 | 1774744187 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12306515 | 12312145 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 3668882667 | 3681320875 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 2834396125 | 2815834792 | 1.01 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 2699395792 | 2699549167 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 5050853166 | 5054825084 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49852240.5 | 49638979 | 1.00 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3422958 | 3415021 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2075583 | 2058416.5 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2513666 | 2523458 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6018396 | 6016791 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 317455.5 | 345305 | 0.92 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 26048666 | 26262750 | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 19094062.5 | 18935500 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 19316000 | 19377771 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 39190562.5 | 39256000 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2466381 | 2462287 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 55369583 | 55393875 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 82210395.5 | 81461166 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 173994812.5 | 173473458 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45354333 | 45537167 | 1.00 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1779187.5 | 1775625 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1097834 | 1108166 | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1568791 | 1574125 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3021312 | 3027041 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 210623 | 213889 | 0.98 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12543916 | 12554020.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9277708.5 | 9212687 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9594229.5 | 9634625 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18987604.5 | 18974791 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1527868.5 | 1535990.5 | 0.99 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17650708 | 17660667 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14335458 | 14318666.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14544250 | 14527083 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22174250 | 22176166 | 1.00 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70431125 | 70469833.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43537125 | 43612917 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39620583 | 39834375 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132531916.5 | 132581875 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1888879 | 1939391 | 0.97 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 360439083.5 | 362447520.5 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 347132666.5 | 345850729 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 304637542 | 304601834 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 722631792 | 723285166 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13304668 | 13373827 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 419234750 | 418608542 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 421465729 | 424576167 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 724319500 | 708932270.5 | 1.02 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 714217917 | 719137166 | 0.99 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1705416 | 1646791.5 | 1.04 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1350333.5 | 1350958 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1170667 | 1155333 | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2385333.5 | 2429416.5 | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 580442.5 | 590640 | 0.98 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 8948271 | 8673937.5 | 1.03 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 12980437.5 | 12961417 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 32353312.5 | 32282958 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 9804417 | 9836041 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1427987.5 | 1466324 | 0.97 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 17962354 | 17283334 | 1.04 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 17440000 | 17102416.5 | 1.02 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 29738291 | 29614750 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 14431937.5 | 14366916 | 1.00 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 669833.5 | 668167 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 529250 | 576791 | 0.92 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 1065708.5 | 1066666.5 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 725395.5 | 725292 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 47647 | 48292 | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 1549104 | 1517874.5 | 1.02 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 1038917 | 1004646 | 1.03 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 1517584 | 1520604 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2269896 | 2250708.5 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 233022 | 239031.5 | 0.97 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 1582916 | 1572667 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 1087854.5 | 1074146 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 1464166 | 1411458 | 1.04 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2190854 | 2225791.5 | 0.98 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3413625 | 3403354 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2047083 | 2053083 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2507333.5 | 2486145.5 | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6011813 | 5997125 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 284231.5 | 289032.5 | 0.98 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 24149000 | 24071812.5 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 17330312.5 | 17199083 | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 17059271 | 17076604 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 37480499.5 | 37510583.5 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2394265 | 2401628 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 53573937.5 | 53560041.5 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 83649500 | 81012667 | 1.03 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 172928458 | 171727292 | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44425187.5 | 44535666.5 | 1.00 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 249999250 | 250063500 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148223583 | 148044042 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 116384896 | 116121687.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447335937.5 | 446980916.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5449146 | 5449734 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1105347792 | 1103639292 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 857822708.5 | 857470145.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 830398396 | 823519999.5 | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1762030583 | 1754095625 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 28862807 | 29250301 | 0.99 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1020245354 | 1017950791.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 966178875 | 975600583 | 0.99 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1293466208 | 1309755625 | 0.99 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1724193375.5 | 1718285542 | 1.00 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1306896.5 | 1300041 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 984292 | 946500 | 1.04 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 778437.5 | 781542 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 1958750 | 1942625 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 566426 | 559913 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 6042375 | 5979812 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 6715125 | 6230958.5 | 1.08 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 26872708 | 25879708.5 | 1.04 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 6973417 | 7087208 | 0.98 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1365853 | 1361665 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 11215770.5 | 11063062 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 10033208 | 10077750 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 17672208 | 17523854 | 1.01 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 8568500 | 8754104 | 0.98 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 399500 | 358709 | 1.11 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 399291.5 | 439875 | 0.91 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 3544167 | 3375791.5 | 1.05 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 88459 | 88834 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 27618 | 27879 | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 397459 | 388437.5 | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 445041.5 | 426292 | 1.04 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 4819375 | 4297000 | 1.12 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 259833 | 258500 | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 219889.5 | 218884.5 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 428313 | 419000 | 1.02 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 475541 | 456354 | 1.04 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 4960437.5 | 4722375 | 1.05 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 271333 | 271062.5 | 1.00 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 343709 | 306375 | 1.12 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 333937.5 | 375708.5 | 0.89 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 769833 | 776917 | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 53125 | 53833 | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 28016 | 27939 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 362209 | 351666 | 1.03 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 342792 | 314041 | 1.09 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 897833 | 420792 | 2.13 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 152583 | 151583 | 1.01 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 205326.5 | 204717.5 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 378500 | 366416 | 1.03 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 358042 | 329334 | 1.09 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 728708 | 423542 | 1.72 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 150833.5 | 150833 | 1.00 |

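Each benchmark entry gives two timings in nanoseconds followed by a ratio. The ratios are consistent with dividing the first timing by the second (for example 897833 / 420792 ≈ 2.13), so values above 1 mean the first run was slower. The following is a minimal sketch of that arithmetic, assuming the first timing belongs to the current commit and the second to the previous one. `ALERT_THRESHOLD` is a hypothetical cutoff chosen for illustration, not a value taken from the benchmark configuration.

```python
# (benchmark name, first timing in ns, second timing in ns), copied from the
# Dense(128 => 128, relu) entries above.
# Assumption: first timing = current commit, second timing = previous commit.
rows = [
    ("Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s)", 342792, 314041),
    ("Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s)", 897833, 420792),
    ("Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s)", 728708, 423542),
    ("Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s)", 150833.5, 150833),
]

ALERT_THRESHOLD = 1.3  # hypothetical cutoff for calling a result a slowdown


def ratio(current_ns, previous_ns):
    """Ratio above 1 means the current run took longer than the previous one."""
    return current_ns / previous_ns


# Keep only the entries whose ratio exceeds the cutoff, rounded as in the report.
flagged = [
    (name, round(ratio(cur, prev), 2))
    for name, cur, prev in rows
    if ratio(cur, prev) > ALERT_THRESHOLD
]

for name, r in flagged:
    print(f"{r:.2f}  {name}")
```

With this cutoff only the two 8-thread entries are flagged. Both are sub-millisecond runs, where timing noise is relatively large, so a single comparison like this is a prompt to re-run rather than proof of a regression.
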
| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 603479208 | 603125958 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 429058104 | 424709854 | 1.01 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 385950542 | 379453834 | 1.02 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 872372584 | 872147584 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7023071 | 7026308.5 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 2010730958 | 2006276000.5 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1608264687.5 | 1611544791.5 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 1653085833 | 1550847520.5 | 1.07 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 2638084625 | 2621300375 | 1.01 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 25932761 | 25894358 | 1.00 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 535250 | 524666 | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 433291.5 | 431646 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 3023791.5 | 2828083 | 1.07 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 880791 | 865708.5 | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 46986 | 47753 | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1881604 | 1892208 | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 2798729 | 2773459 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 16356750 | 16216042 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2759229 | 2764145.5 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 246659.5 | 250438 | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 1962958.5 | 1946958.5 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 5070604 | 5023625 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 16396875 | 16786084 | 0.98 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 2785625.5 | 2779000 | 1.00 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1614125 | 1564584 | 1.03 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 1235583 | 1208666.5 | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1027208 | 946958 | 1.08 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2300875 | 2330542 | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 587018.5 | 588876.5 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 5921542 | 5931146 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 5089688 | 4679292 | 1.09 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 26372271 | 25938874.5 | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7288250 | 7312458.5 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1379747.5 | 1358666 | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 13324958 | 13317979 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 12237645.5 | 11993375 | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 21281499.5 | 20776000 | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 10668750 | 10716854.5 | 1.00 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 4417 | 2625 | 1.68 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2583.5 | 2292 | 1.13 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 2750 | 3542 | 0.78 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 2500 | 2333.5 | 1.07 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 24754 | 24837.5 | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 7459 | 7083 | 1.05 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 7250 | 7084 | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7333 | 7167 | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 7083 | 7125 | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 213008 | 210657.5 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 8375 | 8208 | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 8583 | 8167 | 1.05 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 8459 | 8292 | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 5834 | 6000 | 0.97 |

| Benchmark | Current (ns) | Previous (ns) | Ratio |
|---|---|---|---|
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 10625 | 10312.5 | 1.03 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 13708 | 13625 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 12042 | 10667 | 1.13 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 7500 | 6917 | 1.08 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 25091.5 | 25243 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 20250 | 19959 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 19959 | 19792 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 20083 | 20084 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 19875 | 19959 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 231793 | 230204.5 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 23625 | 23583 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 23667 | 23458 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 23666 | 23583 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 21084 | 21375 | 0.99 |

| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 28708 ns | 28875 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 29292 ns | 28542 ns | 1.03 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 28375 ns | 28542 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46584 ns | 46084 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 26247 ns | 26158 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 222250 ns | 223771 ns | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 279729.5 ns | 274229.5 ns | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 4335396.5 ns | 4189916 ns | 1.03 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 145208 ns | 144958 ns | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 203061 ns | 206708.5 ns | 0.98 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s) | 333124.5 ns | 331333 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s) | 322500 ns | 311771 ns | 1.03 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s) | 861333 ns | 855937.5 ns | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s) | 160750 ns | 160334 ns | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s) | 1875 ns | 1875 ns | 1 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s) | 1958 ns | 2000 ns | 0.98 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s) | 2416 ns | 2750 ns | 0.88 |
| Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s) | 1792 ns | 3833.5 ns | 0.47 |
| Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA | 23061 ns | 23305 ns | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s) | 5458 ns | 5459 ns | 1.00 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s) | 5500 ns | 5292 ns | 1.04 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s) | 5375 ns | 5291 ns | 1.02 |
| Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s) | 5375 ns | 5209 ns | 1.03 |
| Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA | 243257 ns | 255218.5 ns | 0.95 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s) | 11333.5 ns | 11500 ns | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s) | 11208 ns | 11375 ns | 0.99 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s) | 11667 ns | 11417 ns | 1.02 |
| Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s) | 6833 ns | 6791 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 79834791 ns | 79822041 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 49125291 ns | 49051354.5 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 43259375 ns | 43286875 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 151428917 ns | 151651459 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 2726005 ns | 2720855 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 498680292 ns | 667046042 ns | 0.75 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 414152083 ns | 413223250 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 396991709 ns | 397303625 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 689086500 ns | 681225125 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 14585553 ns | 14587521 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 712438146 ns | 715487875 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 683887166 ns | 677171917 ns | 1.01 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 1013847083 ns | 1012616958 ns | 1.00 |
| Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 999589459 ns | 1001064708 ns | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.