Fix scan / sm90 perf regression #3236
🟩 CI finished in 1h 33m: Pass: 100%/96 | Total: 2d 15h | Avg: 39m 30s | Max: 1h 13m | Hits: 72%/12404
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 96)
# | Runner |
---|---|
71 | linux-amd64-cpu16 |
11 | linux-amd64-gpu-v100-latest-1 |
9 | windows-amd64-cpu16 |
4 | linux-arm64-cpu16 |
1 | linux-amd64-gpu-h100-latest-1-testing |
static constexpr BlockLoadAlgorithm load_algorithm =
  (sizeof(AccumT) > 128) ? BLOCK_LOAD_WARP_TRANSPOSE_TIMESLICED : BLOCK_LOAD_WARP_TRANSPOSE;
static constexpr BlockStoreAlgorithm store_algorithm =
  (sizeof(AccumT) > 128) ? BLOCK_STORE_WARP_TRANSPOSE_TIMESLICED : BLOCK_STORE_WARP_TRANSPOSE;
While fixing the issue at hand, this duplicates the logic from several lines below: https://github.com/NVIDIA/cccl/pull/3236/files#diff-d0a57aa3bf737e06d3f9f37bc80ea090ddf53e25f882ed3b99858ce26e785617R235-R238. I will file a refactoring.
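One way the duplicated size check could be factored out is sketched below. This is only an illustration under assumptions: `transpose_algorithms`, `BlockLoad`, `BlockStore`, and `big_accum` are hypothetical stand-ins invented for this example and are not CUB names, and the actual refactoring may look quite different.

```cpp
// Hypothetical stand-ins for CUB's BlockLoadAlgorithm / BlockStoreAlgorithm enums.
enum class BlockLoad { WarpTranspose, WarpTransposeTimesliced };
enum class BlockStore { WarpTranspose, WarpTransposeTimesliced };

// Hypothetical helper: the sizeof(AccumT) > 128 check lives in one place and
// every tuning specialization can read both algorithms from it.
template <class AccumT>
struct transpose_algorithms
{
  // Large accumulators use the time-sliced variants, mirroring the diff above.
  static constexpr bool timesliced = sizeof(AccumT) > 128;

  static constexpr BlockLoad load_algorithm =
    timesliced ? BlockLoad::WarpTransposeTimesliced : BlockLoad::WarpTranspose;
  static constexpr BlockStore store_algorithm =
    timesliced ? BlockStore::WarpTransposeTimesliced : BlockStore::WarpTranspose;
};

// Example accumulator type larger than 128 bytes (illustrative only).
struct big_accum
{
  char bytes[256];
};

static_assert(transpose_algorithms<int>::load_algorithm == BlockLoad::WarpTranspose,
              "small accumulators keep the plain warp-transpose load");
static_assert(transpose_algorithms<big_accum>::load_algorithm == BlockLoad::WarpTransposeTimesliced,
              "large accumulators switch to the time-sliced load");
```

A specialization would then define `load_algorithm` and `store_algorithm` by forwarding to the helper instead of repeating the ternary.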
This PR fixes NVBug 5022428.
Description
Fixes regression introduced in #3138
We accidentally dropped the `load_algorithm` and `store_algorithm` member variables from the sm90 tuning. That made SFINAE always choose the default tuning for Hopper. Shijie Chen embedded the missing fields in every specialization, so the proper Hopper tunings are no longer SFINAEd out.
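The failure mode described above can be reproduced with a generic member-detection idiom. The sketch below is not CUB's actual dispatch code: `select_tuning`, `default_tuning`, `incomplete_sm90_tuning`, `complete_sm90_tuning`, and the `LoadAlgorithm` enum are all hypothetical names chosen for illustration. It only shows how a missing static member can silently route selection to a fallback without any compile error.

```cpp
// Hypothetical stand-in for a load-algorithm enum.
enum class LoadAlgorithm { Direct, WarpTranspose };

// Fallback tuning, used whenever the candidate tuning is not usable.
struct default_tuning
{
  static constexpr int threads = 128;
  static constexpr LoadAlgorithm load_algorithm = LoadAlgorithm::Direct;
};

// Candidate tuning that is missing load_algorithm (the regression case).
struct incomplete_sm90_tuning
{
  static constexpr int threads = 256;
};

// The same tuning with the member restored (the fixed case).
struct complete_sm90_tuning
{
  static constexpr int threads = 256;
  static constexpr LoadAlgorithm load_algorithm = LoadAlgorithm::WarpTranspose;
};

// C++11-compatible void_t.
template <class...>
struct make_void
{
  using type = void;
};
template <class... Ts>
using void_t = typename make_void<Ts...>::type;

// Primary template: any substitution failure in the specialization lands here.
template <class Tuning, class = void>
struct select_tuning
{
  using type = default_tuning;
};

// Chosen only if every required member exists on Tuning.
template <class Tuning>
struct select_tuning<Tuning, void_t<decltype(Tuning::threads), decltype(Tuning::load_algorithm)>>
{
  using type = Tuning;
};

// The incomplete tuning compiles fine but quietly falls back to the default.
static_assert(select_tuning<incomplete_sm90_tuning>::type::threads == 128,
              "missing member: default tuning is selected");
// With the member restored, the intended tuning is picked up.
static_assert(select_tuning<complete_sm90_tuning>::type::threads == 256,
              "complete tuning: architecture-specific tuning is selected");
```

Because the fallback is valid code, nothing fails at build time when a member goes missing, which is consistent with the problem surfacing only as a performance regression.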