fix: pretty printing of MaxPool Layer (#891)
Showing 3 changed files with 13 additions and 5 deletions.
043bae1
@JuliaRegistrator register
Registration pull request created: JuliaRegistries/General/114849
Tip: Release Notes
Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.
To add them here just re-invoke and the PR will be updated.
Tagging
After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.
This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:
Lux Benchmarks
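Each benchmark entry in this listing gives two timings in nanoseconds followed by a ratio, and the ratio matches the first timing divided by the second. A minimal sketch of that arithmetic follows, assuming the first timing belongs to this commit and the second to the previous run (the usual layout for benchmark-action comments, but not stated in the listing). The helper names and the 1.30 threshold are hypothetical, chosen only for illustration; the three sample rows are copied from the listing.

```python
# Sketch only: `ratio`, `flag_slowdowns`, and the 1.30 threshold are
# hypothetical and are not taken from the Lux benchmark job.

def ratio(first_ns, second_ns):
    """Ratio as it appears in the listing: first timing / second timing."""
    return first_ns / second_ns


def flag_slowdowns(rows, threshold=1.30):
    """Return (name, ratio rounded to 2 places) for rows above the threshold."""
    flagged = []
    for name, first_ns, second_ns in rows:
        r = ratio(first_ns, second_ns)
        if r > threshold:
            flagged.append((name, round(r, 2)))
    return flagged


# Three entries copied from the listing: (name, first timing, second timing).
ROWS = [
    ("Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s)", 319458, 244250),
    ("Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s)", 10875709, 1628500),
    ("Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA", 43751, 43656),
]

if __name__ == "__main__":
    for name, r in flag_slowdowns(ROWS):
        print(f"{r:.2f}  {name}")
```

Under that assumption, a ratio above 1 means the newer run was slower. Most of the large ratios in the listing occur in the 8-thread CPU entries, so they may reflect run-to-run noise rather than a real change; the listing alone cannot tell the two apart.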
| Benchmark | Current (ns) | Previous (ns) | Ratio |
| --- | --- | --- | --- |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/2 thread(s) | 412666 | 412458.5 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/4 thread(s) | 322354.5 | 322188 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/8 thread(s) | 319458 | 244250 | 1.31 |
| Dense(512 => 512, identity)(512 x 128)/forward/CPU/1 thread(s) | 740416.5 | 739584 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/forward/GPU/CUDA | 43751 | 43656 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/2 thread(s) | 1313875 | 1361709 | 0.96 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/4 thread(s) | 2420917 | 2428521 | 1.00 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/8 thread(s) | 19423583 | 16099583.5 | 1.21 |
| Dense(512 => 512, identity)(512 x 128)/zygote/CPU/1 thread(s) | 2274292 | 2260562 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/zygote/GPU/CUDA | 205395 | 206975.5 | 0.99 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/2 thread(s) | 1390500 | 1428812 | 0.97 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/4 thread(s) | 912917 | 906708 | 1.01 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/8 thread(s) | 10875709 | 1628500 | 6.68 |
| Dense(512 => 512, identity)(512 x 128)/enzyme/CPU/1 thread(s) | 2207896 | 2244917 | 0.98 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1773146 | 1660916.5 | 1.07 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1085124.5 | 1079750 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1432500 | 1530375 | 0.94 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 2989333 | 3002125 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 207449.5 | 207801.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12143291 | 12150625 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 8836062.5 | 8835458 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9291520.5 | 9224604 | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18553083.5 | 18587667 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1505031.5 | 1487468.5 | 1.01 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17307791 | 17307333.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 13990459 | 13941250 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14509125 | 14519104.5 | 1.00 |
| Conv((3, 3), 2 => 2, identity)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 21766396 | 21817021.5 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 250314354.5 | 249997583.5 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 148416125 | 148163667 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 121652791 | 116524583.5 | 1.04 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 445949875 | 454091250 | 0.98 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5456985 | 5461865 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1227899292 | 1221856542 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 932161917 | 931447709 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 846826937.5 | 834664604 | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1625356625 | 1654541041 | 0.98 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 31290090 | 31167157.5 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1150326667 | 1137975750 | 1.01 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 997528083 | 995311958 | 1.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 3958813520.5 | 1319930125 | 3.00 |
| Conv((3, 3), 64 => 64, relu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1725291604 | 1748556646.5 | 0.99 |
| lenet(28, 28, 1, 32)/forward/CPU/2 thread(s) | 1123250 | 1095125 | 1.03 |
| lenet(28, 28, 1, 32)/forward/CPU/4 thread(s) | 1617437.5 | 1622583 | 1.00 |
| lenet(28, 28, 1, 32)/forward/CPU/8 thread(s) | 10483333 | 3546312.5 | 2.96 |
| lenet(28, 28, 1, 32)/forward/CPU/1 thread(s) | 782875 | 789458 | 0.99 |
| lenet(28, 28, 1, 32)/forward/GPU/CUDA | 261765 | 263083 | 0.99 |
| lenet(28, 28, 1, 32)/zygote/CPU/2 thread(s) | 2988166 | 2978417 | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/4 thread(s) | 4099750 | 4117958 | 1.00 |
| lenet(28, 28, 1, 32)/zygote/CPU/8 thread(s) | 18729584 | 12025354 | 1.56 |
| lenet(28, 28, 1, 32)/zygote/CPU/1 thread(s) | 3268292 | 3159375 | 1.03 |
| lenet(28, 28, 1, 32)/zygote/GPU/CUDA | 1093399.5 | 1134663 | 0.96 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 2322000 | 2330792 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1376646 | 1433104 | 0.96 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1622229.5 | 1541125 | 1.05 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 4218083 | 4209645.5 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 208585 | 208448.5 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 19421125 | 19415042 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 16162729.5 | 16071271 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 16659708.5 | 17172750 | 0.97 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 25778791 | 25792250 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1607423 | 1593280 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 34077666.5 | 34130083.5 | 1.00 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 30945250 | 30775292 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 31583520.5 | 31125042 | 1.01 |
| Conv((3, 3), 2 => 2, gelu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 36716875 | 36818959 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 4526541.5 | 4527292 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2770209 | 2773208.5 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2921229 | 2656208.5 | 1.10 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 8366833 | 8382771 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 423782 | 428034 | 0.99 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 38926229.5 | 38903708 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 32245667 | 32088354 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 32910458 | 32248250 | 1.02 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 51760416.5 | 51944791 | 1.00 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2630256 | 2616927.5 | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 89355187.5 | 88376541 | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 114003500 | 113229459 | 1.01 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 1405927875 | 228466396 | 6.15 |
| Conv((3, 3), 4 => 4, gelu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 74231458.5 | 74323208 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 268393792 | 267494667 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 159475583 | 159172958 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 132975208 | 123508104 | 1.08 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 484020063 | 484768333 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 7020070 | 6999230.5 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1473032083.5 | 1465729583.5 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 1177961375 | 1176260541 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 1082903374.5 | 1073046687.5 | 1.01 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1999116063 | 2008112021 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 34775365.5 | 34700607 | 1.00 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1720636250 | 1675838166 | 1.03 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 1555603750 | 1494284562.5 | 1.04 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 4526585645.5 | 1751750208 | 2.58 |
| Conv((3, 3), 64 => 64, gelu)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 2207239541.5 | 2233116333 | 0.99 |
| lenet(28, 28, 1, 128)/forward/CPU/2 thread(s) | 2110458 | 1651542 | 1.28 |
| lenet(28, 28, 1, 128)/forward/CPU/4 thread(s) | 2946104 | 2558521 | 1.15 |
| lenet(28, 28, 1, 128)/forward/CPU/8 thread(s) | 14693125.5 | 6003209 | 2.45 |
| lenet(28, 28, 1, 128)/forward/CPU/1 thread(s) | 2312646 | 2474729.5 | 0.93 |
| lenet(28, 28, 1, 128)/forward/GPU/CUDA | 266343.5 | 266892 | 1.00 |
| lenet(28, 28, 1, 128)/zygote/CPU/2 thread(s) | 9579333 | 8851208.5 | 1.08 |
| lenet(28, 28, 1, 128)/zygote/CPU/4 thread(s) | 12048375 | 11418334 | 1.06 |
| lenet(28, 28, 1, 128)/zygote/CPU/8 thread(s) | 37492562.5 | 23064666.5 | 1.63 |
| lenet(28, 28, 1, 128)/zygote/CPU/1 thread(s) | 11316041.5 | 11739000 | 0.96 |
| lenet(28, 28, 1, 128)/zygote/GPU/CUDA | 1164743 | 1168625 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/CPU/2 thread(s) | 381670125 | 379338084 | 1.01 |
| vgg16(32, 32, 3, 32)/forward/CPU/4 thread(s) | 285686541 | 283711083 | 1.01 |
| vgg16(32, 32, 3, 32)/forward/CPU/8 thread(s) | 238643375.5 | 273375708.5 | 0.87 |
| vgg16(32, 32, 3, 32)/forward/CPU/1 thread(s) | 452689979 | 453126979.5 | 1.00 |
| vgg16(32, 32, 3, 32)/forward/GPU/CUDA | 4852504 | 4863813 | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/2 thread(s) | 1156829333 | 1152978750 | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/CPU/4 thread(s) | 936606625 | 927580833 | 1.01 |
| vgg16(32, 32, 3, 32)/zygote/CPU/8 thread(s) | 1034937042 | 926883375 | 1.12 |
| vgg16(32, 32, 3, 32)/zygote/CPU/1 thread(s) | 1394580541 | 1395415875 | 1.00 |
| vgg16(32, 32, 3, 32)/zygote/GPU/CUDA | 18385711 | 17771785 | 1.03 |
| lenet(28, 28, 1, 64)/forward/CPU/2 thread(s) | 1051604 | 1046833 | 1.00 |
| lenet(28, 28, 1, 64)/forward/CPU/4 thread(s) | 2032750 | 1903958.5 | 1.07 |
| lenet(28, 28, 1, 64)/forward/CPU/8 thread(s) | 6421479 | 4710334 | 1.36 |
| lenet(28, 28, 1, 64)/forward/CPU/1 thread(s) | 1392792 | 1288833.5 | 1.08 |
| lenet(28, 28, 1, 64)/forward/GPU/CUDA | 269338 | 272890.5 | 0.99 |
| lenet(28, 28, 1, 64)/zygote/CPU/2 thread(s) | 6488271 | 6488521 | 1.00 |
| lenet(28, 28, 1, 64)/zygote/CPU/4 thread(s) | 12419333 | 13795375 | 0.90 |
| lenet(28, 28, 1, 64)/zygote/CPU/8 thread(s) | 20712583 | 18166417 | 1.14 |
| lenet(28, 28, 1, 64)/zygote/CPU/1 thread(s) | 6069104 | 6069042 | 1.00 |
| lenet(28, 28, 1, 64)/zygote/GPU/CUDA | 1230429.5 | 1252551 | 0.98 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70482479 | 70461917 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43549791.5 | 43567958 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39533021 | 39697500 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132479958.5 | 134157667 | 0.99 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1936490 | 1944258.5 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 356251291.5 | 355807438 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 270847875 | 270773667 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 254784208 | 252761792 | 1.01 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 534252895.5 | 534329166.5 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 12272241.5 | 12278757.5 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 394477666 | 394634458 | 1.00 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 402690375 | 389791583 | 1.03 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 720082750 | 673445792 | 1.07 |
| Conv((3, 3), 32 => 32, identity)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 709744667 | 709693958 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/2 thread(s) | 1187004958 | 1186364083 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/4 thread(s) | 691832145.5 | 691957562.5 | 1.00 |
| vgg16(32, 32, 3, 128)/forward/CPU/8 thread(s) | 632123250 | 638110667 | 0.99 |
| vgg16(32, 32, 3, 128)/forward/CPU/1 thread(s) | 1770715312 | 1783555042 | 0.99 |
| vgg16(32, 32, 3, 128)/forward/GPU/CUDA | 12547245 | 12301928 | 1.02 |
| vgg16(32, 32, 3, 128)/zygote/CPU/2 thread(s) | 3637291458 | 3711157812 | 0.98 |
| vgg16(32, 32, 3, 128)/zygote/CPU/4 thread(s) | 2824273000 | 2879197208 | 0.98 |
| vgg16(32, 32, 3, 128)/zygote/CPU/8 thread(s) | 2730881708 | 2773307208 | 0.98 |
| vgg16(32, 32, 3, 128)/zygote/CPU/1 thread(s) | 5049712083 | 5035792375 | 1.00 |
| vgg16(32, 32, 3, 128)/zygote/GPU/CUDA | 49393707 | 49664392.5 | 0.99 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3418083.5 | 3407458 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2075979 | 2075563 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2539271 | 2527020.5 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 6026459 | 6024750 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 346441.5 | 343963.5 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 25952979.5 | 25968187.5 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 19050375 | 19062125 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 19201375 | 19252125 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 39193125 | 39301000 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2459244 | 2461837 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 55322083 | 55558542 | 1.00 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 81164292 | 80387500 | 1.01 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 170963771 | 175413270.5 | 0.97 |
| Conv((3, 3), 4 => 4, relu)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 45568000 | 45602750 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/2 thread(s) | 1782541.5 | 1783854 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/4 thread(s) | 1100417 | 1100146 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/8 thread(s) | 1551125 | 1552500 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/CPU/1 thread(s) | 3037000 | 3029125 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/forward/GPU/CUDA | 214556 | 212167 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/2 thread(s) | 12536291 | 12521479 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/4 thread(s) | 9216000 | 9200041.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/8 thread(s) | 9685375 | 9609625 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/CPU/1 thread(s) | 18982854 | 18969562.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/zygote/GPU/CUDA | 1538065 | 1536334.5 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/2 thread(s) | 17658500 | 17631000 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/4 thread(s) | 14322812.5 | 14331187 | 1.00 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/8 thread(s) | 14698833 | 14538062.5 | 1.01 |
| Conv((3, 3), 2 => 2, relu)(64 x 64 x 2 x 128)/enzyme/CPU/1 thread(s) | 22165937.5 | 22175666.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s) | 70506187.5 | 70452395.5 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s) | 43620250 | 43579917 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s) | 39487104.5 | 39814895.5 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s) | 132516562.5 | 133530375 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/forward/GPU/CUDA | 1945320.5 | 1873599.5 | 1.04 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s) | 361020458 | 359456959 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s) | 347398479.5 | 345462791 | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s) | 304273333 | 304957917 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s) | 723138416 | 730536500 | 0.99 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA | 13360775 | 13387305 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s) | 417284416.5 | 418668750 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s) | 421339792 | 422783667 | 1.00 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s) | 702984958 | 694707583.5 | 1.01 |
| Conv((3, 3), 32 => 32, relu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s) | 714157000 | 714852708 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/2 thread(s) | 1690770.5 | 1688208.5 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/4 thread(s) | 1342250 | 1348958.5 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/8 thread(s) | 1267146 | 1141500 | 1.11 |
| mlp7layer_bn(gelu)(32 x 256)/forward/CPU/1 thread(s) | 2453041 | 2410041 | 1.02 |
| mlp7layer_bn(gelu)(32 x 256)/forward/GPU/CUDA | 590494.5 | 583102.5 | 1.01 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/2 thread(s) | 8943958 | 8957396 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/4 thread(s) | 12866833 | 12832625 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/8 thread(s) | 30425166 | 31672062.5 | 0.96 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/CPU/1 thread(s) | 9850895.5 | 9824709 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/zygote/GPU/CUDA | 1477491 | 1427623.5 | 1.03 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/2 thread(s) | 17137625.5 | 17909083 | 0.96 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/4 thread(s) | 17335937 | 17252334 | 1.00 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/8 thread(s) | 29414500 | 30244979.5 | 0.97 |
| mlp7layer_bn(gelu)(32 x 256)/enzyme/CPU/1 thread(s) | 14221083.5 | 14301854 | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/2 thread(s) | 670250 | 671500 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/4 thread(s) | 587208 | 582646 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/8 thread(s) | 1034167 | 1059875 | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/forward/CPU/1 thread(s) | 723250.5 | 738291.5 | 0.98 |
| Dense(512 => 512, relu)(512 x 128)/forward/GPU/CUDA | 48445 | 47313 | 1.02 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/2 thread(s) | 1557292 | 1553208 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/4 thread(s) | 1015708 | 1029250 | 0.99 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/8 thread(s) | 1383084 | 1568708 | 0.88 |
| Dense(512 => 512, relu)(512 x 128)/zygote/CPU/1 thread(s) | 2231354 | 2237833 | 1.00 |
| Dense(512 => 512, relu)(512 x 128)/zygote/GPU/CUDA | 240891.5 | 237738.5 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/2 thread(s) | 1521354.5 | 1562083 | 0.97 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/4 thread(s) | 1075542 | 1066750 | 1.01 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/8 thread(s) | 1487792 | 1541083.5 | 0.97 |
| Dense(512 => 512, relu)(512 x 128)/enzyme/CPU/1 thread(s) | 2254458 | 2207833 | 1.02 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/2 thread(s) | 3396375 | 3389167 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/4 thread(s) | 2057334 | 2034875 | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/8 thread(s) | 2498687.5 | 2486708 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/CPU/1 thread(s) | 5992229 | 5993958 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/forward/GPU/CUDA | 287342 | 283368 | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/2 thread(s) | 24078166.5 | 24043104 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/4 thread(s) | 17297542 | 17216854.5 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/8 thread(s) | 17230333 | 17073750 | 1.01 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/CPU/1 thread(s) | 37494917 | 37471000 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/zygote/GPU/CUDA | 2405577 | 2400544 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/2 thread(s) | 53543396 | 53725750 | 1.00 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/4 thread(s) | 83822833 | 80060417 | 1.05 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/8 thread(s) | 166927250 | 173749250 | 0.96 |
| Conv((3, 3), 4 => 4, identity)(64 x 64 x 4 x 128)/enzyme/CPU/1 thread(s) | 44407750 | 44536333.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/2 thread(s) | 249509499.5 | 250047041.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/4 thread(s) | 147934792 | 148068333.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/8 thread(s) | 115819083.5 | 116114541.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/CPU/1 thread(s) | 447669812.5 | 447302270.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/forward/GPU/CUDA | 5459471 | 5442086 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/2 thread(s) | 1102880459 | 1103374375 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/4 thread(s) | 857952270.5 | 858772104.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/8 thread(s) | 827187146 | 827438312.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/CPU/1 thread(s) | 1749946917 | 1767052584 | 0.99 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/zygote/GPU/CUDA | 28884820 | 29017689.5 | 1.00 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/2 thread(s) | 1012294625 | 1004373333 | 1.01 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/4 thread(s) | 967548417 | 930750709 | 1.04 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/8 thread(s) | 1291437750 | 1242654000 | 1.04 |
| Conv((3, 3), 64 => 64, identity)(64 x 64 x 64 x 128)/enzyme/CPU/1 thread(s) | 1719471271 | 1746910729 | 0.98 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/2 thread(s) | 1287333 | 1306417 | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/4 thread(s) | 983417 | 925687 | 1.06 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/8 thread(s) | 901750 | 706792 | 1.28 |
| mlp7layer_bn(relu)(32 x 256)/forward/CPU/1 thread(s) | 2103875 | 2044583 | 1.03 |
| mlp7layer_bn(relu)(32 x 256)/forward/GPU/CUDA | 563889 | 568045 | 0.99 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/2 thread(s) | 5980875 | 5802000 | 1.03 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/4 thread(s) | 6572395.5 | 6862750 | 0.96 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/8 thread(s) | 23913083.5 | 25619396 | 0.93 |
| mlp7layer_bn(relu)(32 x 256)/zygote/CPU/1 thread(s) | 7082125 | 6379750 | 1.11 |
| mlp7layer_bn(relu)(32 x 256)/zygote/GPU/CUDA | 1362470.5 | 1369217 | 1.00 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/2 thread(s) | 11293750 | 10914875 | 1.03 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/4 thread(s) | 10483750 | 9317208.5 | 1.13 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/8 thread(s) | 17869125 | 17171708.5 | 1.04 |
| mlp7layer_bn(relu)(32 x 256)/enzyme/CPU/1 thread(s) | 8484417 | 8667334 | 0.98 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/2 thread(s) | 355500 | 344250 | 1.03 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/4 thread(s) | 375708 | 388208 | 0.97 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/8 thread(s) | 1905667 | 2652375 | 0.72 |
| Dense(128 => 128, gelu)(128 x 128)/forward/CPU/1 thread(s) | 88916 | 88729.5 | 1.00 |
| Dense(128 => 128, gelu)(128 x 128)/forward/GPU/CUDA | 27382 | 27556 | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/2 thread(s) | 388541 | 361250 | 1.08 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/4 thread(s) | 441375 | 399709 | 1.10 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/8 thread(s) | 4554812.5 | 4501000 | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/CPU/1 thread(s) | 258625 | 262354 | 0.99 |
| Dense(128 => 128, gelu)(128 x 128)/zygote/GPU/CUDA | 219767 | 223132 | 0.98 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/2 thread(s) | 418500 | 391375 | 1.07 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/4 thread(s) | 472541 | 431437.5 | 1.10 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/8 thread(s) | 4780416 | 4729375 | 1.01 |
| Dense(128 => 128, gelu)(128 x 128)/enzyme/CPU/1 thread(s) | 282333 | 282666.5 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/2 thread(s) | 302875 | 290750 | 1.04 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/4 thread(s) | 310292 | 327708 | 0.95 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/8 thread(s) | 746917 | 675958 | 1.10 |
| Dense(128 => 128, relu)(128 x 128)/forward/CPU/1 thread(s) | 52958 | 54270.5 | 0.98 |
| Dense(128 => 128, relu)(128 x 128)/forward/GPU/CUDA | 27770 | 27926 | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/2 thread(s) | 353062.5 | 310292 | 1.14 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/4 thread(s) | 337625 | 273625 | 1.23 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/8 thread(s) | 632166.5 | 375709 | 1.68 |
| Dense(128 => 128, relu)(128 x 128)/zygote/CPU/1 thread(s) | 151958.5 | 152354 | 1.00 |
| Dense(128 => 128, relu)(128 x 128)/zygote/GPU/CUDA | 205653 | 207917 | 0.99 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/2 thread(s) | 368187.5 | 325125 | 1.13 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/4 thread(s) | 352042 | 289084 | 1.22 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/8 thread(s) | 899958 | 403542 | 2.23 |
| Dense(128 => 128, relu)(128 x 128)/enzyme/CPU/1 thread(s) | 151416 | 151625 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/2 thread(s) | 602224792 | 603325417 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/CPU/4 thread(s) | 426937479 | 421621416.5 | 1.01 |
| vgg16(32, 32, 3, 64)/forward/CPU/8 thread(s) | 377069646 | 380597249.5 | 0.99 |
| vgg16(32, 32, 3, 64)/forward/CPU/1 thread(s) | 871474708 | 874303208 | 1.00 |
| vgg16(32, 32, 3, 64)/forward/GPU/CUDA | 7023361 | 7027347 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/2 thread(s) | 2004892916.5 | 2005331375 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/CPU/4 thread(s) | 1605784063 | 1619588271 | 0.99 |
| vgg16(32, 32, 3, 64)/zygote/CPU/8 thread(s) | 1566567437.5 | 1613872687.5 | 0.97 |
| vgg16(32, 32, 3, 64)/zygote/CPU/1 thread(s) | 2619326292 | 2628480958 | 1.00 |
| vgg16(32, 32, 3, 64)/zygote/GPU/CUDA | 26073016 | 26003745 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/2 thread(s) | 537145.5 | 527792 | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/4 thread(s) | 436187.5 | 426833 | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/8 thread(s) | 1750750.5 | 2562271 | 0.68 |
| Dense(512 => 512, gelu)(512 x 128)/forward/CPU/1 thread(s) | 879542 | 866208 | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/forward/GPU/CUDA | 47197 | 47205 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/2 thread(s) | 1906583.5 | 1876021 | 1.02 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/4 thread(s) | 2789708 | 2780812 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/8 thread(s) | 14534625 | 16664416 | 0.87 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/CPU/1 thread(s) | 2719437.5 | 2745542 | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/zygote/GPU/CUDA | 247168 | 250772.5 | 0.99 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/2 thread(s) | 1975604.5 | 1973979.5 | 1.00 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/4 thread(s) | 5033938 | 4994854 | 1.01 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/8 thread(s) | 14790687.5 | 16607792 | 0.89 |
| Dense(512 => 512, gelu)(512 x 128)/enzyme/CPU/1 thread(s) | 2768750 | 2721770.5 | 1.02 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/2 thread(s) | 1615208 | 1608270.5 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/4 thread(s) | 1258604 | 1262750 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/8 thread(s) | 1213228.5 | 929208 | 1.31 |
| mlp7layer_bn(tanh)(32 x 256)/forward/CPU/1 thread(s) | 2324937.5 | 2314292 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/forward/GPU/CUDA | 579864 | 587834.5 | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/2 thread(s) | 5939542 | 5921500 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/4 thread(s) | 6479000 | 6925166 | 0.94 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/8 thread(s) | 24227875 | 25706812 | 0.94 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/CPU/1 thread(s) | 7320042 | 7304292 | 1.00 |
| mlp7layer_bn(tanh)(32 x 256)/zygote/GPU/CUDA | 1342640 | 1354974 | 0.99 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/2 thread(s) | 13346958 | 11973583 | 1.11 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/4 thread(s) | 11570125 | 12125584 | 0.95 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/8 thread(s) | 21132000 | 21506521 | 0.98 |
| mlp7layer_bn(tanh)(32 x 256)/enzyme/CPU/1 thread(s) | 10783208 | 10667104.5 | 1.01 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/2 thread(s) | 2458.5 | 2333 | 1.05 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/4 thread(s) | 2333 | 2542 | 0.92 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/8 thread(s) | 3292 | 3333 | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/forward/CPU/1 thread(s) | 2792 | 2459 | 1.14 |
| Dense(16 => 16, relu)(16 x 128)/forward/GPU/CUDA | 24276 | 24615 | 0.99 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/2 thread(s) | 6958 | 7250 | 0.96 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/4 thread(s) | 7250 | 7125 | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/8 thread(s) | 7292 | 7291 | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/zygote/CPU/1 thread(s) | 7041 | 7334 | 0.96 |
| Dense(16 => 16, relu)(16 x 128)/zygote/GPU/CUDA | 208234.5 | 211753 | 0.98 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/2 thread(s) | 8187.5 | 8208 | 1.00 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/4 thread(s) | 8375 | 8250 | 1.02 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/8 thread(s) | 8542 | 8292 | 1.03 |
| Dense(16 => 16, relu)(16 x 128)/enzyme/CPU/1 thread(s) | 6042 | 6125 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/2 thread(s) | 11562.5 | 10625 | 1.09 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/4 thread(s) | 13583 | 13750 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/8 thread(s) | 10500 | 10520.5 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/forward/CPU/1 thread(s) | 7375 | 7500 | 0.98 |
| Dense(16 => 16, gelu)(16 x 128)/forward/GPU/CUDA | 24269 | 24896 | 0.97 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/2 thread(s) | 19750 | 19895.5 | 0.99 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/4 thread(s) | 20042 | 19875 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/8 thread(s) | 20291.5 | 20125 | 1.01 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/CPU/1 thread(s) | 19750 | 20375 | 0.97 |
| Dense(16 => 16, gelu)(16 x 128)/zygote/GPU/CUDA | 227670.5 | 232312.5 | 0.98 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/2 thread(s) | 23292 | 23375 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/4 thread(s) | 23625 | 23583 | 1.00 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/8 thread(s) | 24000 | 23625 | 1.02 |
| Dense(16 => 16, gelu)(16 x 128)/enzyme/CPU/1 thread(s) | 21375 | 21375 | 1 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/2 thread(s) | 28625 | 28458 | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/4 thread(s) | 28854.5 | 28750 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/8 thread(s) | 28709 | 28209 | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/forward/CPU/1 thread(s) | 46167 | 46333 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/forward/GPU/CUDA | 25604 | 25917 | 0.99 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/2 thread(s) | 222104.5 | 220041 | 1.01 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/4 thread(s) | 278083 | 272792 | 1.02 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/8 thread(s) | 4126167 | 4142125 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/CPU/1 thread(s) | 145792 | 146229 | 1.00 |
| Dense(128 => 128, identity)(128 x 128)/zygote/GPU/CUDA | 206165 | 211737.5 | 0.97 |
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/2 thread(s)
336792
ns329896
ns1.02
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/4 thread(s)
320666
ns317541
ns1.01
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/8 thread(s)
569042
ns641000
ns0.89
Dense(128 => 128, identity)(128 x 128)/enzyme/CPU/1 thread(s)
161000
ns161708
ns1.00
Dense(16 => 16, identity)(16 x 128)/forward/CPU/2 thread(s)
1625
ns1542
ns1.05
Dense(16 => 16, identity)(16 x 128)/forward/CPU/4 thread(s)
1750
ns1792
ns0.98
Dense(16 => 16, identity)(16 x 128)/forward/CPU/8 thread(s)
2250
ns2334
ns0.96
Dense(16 => 16, identity)(16 x 128)/forward/CPU/1 thread(s)
1917
ns2042
ns0.94
Dense(16 => 16, identity)(16 x 128)/forward/GPU/CUDA
22470
ns23108
ns0.97
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/2 thread(s)
5292
ns5229.5
ns1.01
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/4 thread(s)
5291
ns5208
ns1.02
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/8 thread(s)
5250
ns5333
ns0.98
Dense(16 => 16, identity)(16 x 128)/zygote/CPU/1 thread(s)
5167
ns5209
ns0.99
Dense(16 => 16, identity)(16 x 128)/zygote/GPU/CUDA
241214
ns243209
ns0.99
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/2 thread(s)
11250
ns11291
ns1.00
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/4 thread(s)
11583
ns11250
ns1.03
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/8 thread(s)
11459
ns11541
ns0.99
Dense(16 => 16, identity)(16 x 128)/enzyme/CPU/1 thread(s)
6833
ns6833
ns1
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/2 thread(s)
79831958
ns79916708
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/4 thread(s)
49067917
ns48976292
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/8 thread(s)
44974625
ns43178166
ns1.04
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/CPU/1 thread(s)
151479959
ns151466292
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/forward/GPU/CUDA
2716485.5
ns2667705.5
ns1.02
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/2 thread(s)
497629333
ns660651833
ns0.75
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/4 thread(s)
413644333
ns411637166
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/8 thread(s)
399276520.5
ns396465125
ns1.01
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/CPU/1 thread(s)
684016541
ns687244250
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/zygote/GPU/CUDA
14627968
ns14722784
ns0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/2 thread(s)
713376625
ns711004292
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/4 thread(s)
669209042
ns670184709
ns1.00
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/8 thread(s)
995614375
ns1007186709
ns0.99
Conv((3, 3), 32 => 32, gelu)(64 x 64 x 32 x 128)/enzyme/CPU/1 thread(s)
998601250
ns997936792
ns1.00
This comment was automatically generated by workflow using github-action-benchmark.