PTX shfl_sync
#3241
🟩 CI finished in 1h 49m: Pass: 100%/170 | Total: 3d 02h | Avg: 26m 12s | Max: 1h 08m | Hits: 76%/22526
| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?
| | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 170)
# | Runner |
---|---|
125 | linux-amd64-cpu16 |
19 | linux-amd64-gpu-v100-latest-1 |
15 | windows-amd64-cpu16 |
10 | linux-arm64-cpu16 |
1 | linux-amd64-gpu-h100-latest-1-testing |
I gave this a quick review. I would love to have @ahendriksen's opinion, since it touches his work on the PTX exposure. Also, he has a way better PTX understanding than me.
shfl.sync
^^^^^^^^^
Q: Why do we need a dedicated file in a `manual` directory? We currently just document PTX instructions inline (with generated parts from the `generated` directory). I see no strong reason to deviate here.
// under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
// SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES.
- // SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES.
+ // SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES.
Applies to other places as well.
//===----------------------------------------------------------------------===//
// UNSUPPORTED: libcpp-has-no-threads
// UNSUPPORTED: clang && !nvcc
// UNSUPPORTED: c++98, c++03, c++11, c++14
It feels a bit unsatisfactory that we provide the new feature for C++11, but can only compile and run tests in C++17 mode or higher. @miscco how do you feel about that?
I have sent some comments in private as well. The `data` parameter should be a template parameter to allow shuffling any 32-bit value.
[[nodiscard]] __device__ static inline
shfl_return_values shfl_sync(shfl_mode_t shfl_mode,
uint32_t data,
This should be a template parameter that takes any 32-bit value.
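To make the suggestion concrete, here is a minimal host-side sketch of how a templated `data` parameter could be constrained to 32-bit types and converted to and from the raw `uint32_t` that the instruction operates on. The names `to_bits`, `from_bits`, and `roundtrip_through_u32` are hypothetical and do not appear in the PR or in libcu++; the actual shuffle is replaced by a plain round trip so the snippet compiles without a GPU. It uses only C++11 features.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: reinterpret any trivially copyable 32-bit value
// as a raw uint32_t without violating strict aliasing.
template <typename T>
std::uint32_t to_bits(const T& value) {
  static_assert(sizeof(T) == 4, "only 32-bit values can be shuffled");
  static_assert(std::is_trivially_copyable<T>::value,
                "value must be trivially copyable");
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Hypothetical helper: the inverse of to_bits.
template <typename T>
T from_bits(std::uint32_t bits) {
  static_assert(sizeof(T) == 4, "only 32-bit values can be shuffled");
  static_assert(std::is_trivially_copyable<T>::value,
                "value must be trivially copyable");
  T value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}

// Stand-in for a templated shuffle wrapper. A device implementation would
// hand `raw` to the shfl.sync instruction at the marked point; here the
// value is simply converted back, which only exercises the type handling.
template <typename T>
T roundtrip_through_u32(const T& data) {
  const std::uint32_t raw = to_bits(data);
  // (device code would exchange `raw` between lanes here)
  return from_bits<T>(raw);
}
```

With this shape, a call such as `roundtrip_through_u32(1.5f)` or `roundtrip_through_u32(std::int32_t{-7})` compiles, while passing a `double` or a `char` is rejected at compile time by the `static_assert`.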
Related to #2976
Description
Provide a C++ implementation of PTX `shfl_sync`. In addition to what the CUDA intrinsics offer, the function provides the following features: