Fix correctness in cuda_mapreduce #2106
One option is to use binary-op appropriate initialization. For example,
with
|
This would require defining the init value for every function, which doesn't seem optimal. Is there any issue with the fix in this PR?
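To make the trade-off in this exchange concrete, here is a minimal Julia sketch of what operation-appropriate initialization could look like. The example from the original comment is not reproduced here. `neutral_element` and `pad_with_neutral` are hypothetical names, not part of this package's API. The sketch also shows the cost raised in the reply: each supported operator needs its own method.

```julia
# Hypothetical helper: the neutral (identity) element of a binary operator
# for element type T. One method is needed per supported operator.
neutral_element(::typeof(+), ::Type{T}) where {T} = zero(T)
neutral_element(::typeof(*), ::Type{T}) where {T} = one(T)
neutral_element(::typeof(min), ::Type{T}) where {T} = typemax(T)
neutral_element(::typeof(max), ::Type{T}) where {T} = typemin(T)

# Hypothetical helper: pad `xs` up to length `n` with the neutral element of
# `op`, so that the padding cannot change the result of the reduction.
function pad_with_neutral(op, xs::Vector{T}, n::Int) where {T}
    return vcat(xs, fill(neutral_element(op, T), n - length(xs)))
end

xs = [3.0, 5.0, 2.0]
min_padded = reduce(min, pad_with_neutral(min, xs, 8))
prod_padded = reduce(*, pad_with_neutral(*, xs, 8))
```

With this padding, `min_padded` equals `minimum(xs)` and `prod_padded` equals `prod(xs)`, whereas padding with zeros would change both results.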
ext/cuda/data_layouts_mapreduce.jl
@@ -31,6 +31,34 @@ function mapreduce_cuda(
weighted_jacobian = OnesArray(parent(data)),
opargs...,
)
# This function implements the following parallel reduction algorithm:
#
# Blocks processes multiple data points at the same time (n_ops_on_load)
Each thread loads multiple data points in shmem!
`cuda_mapreduce` was not working correctly with certain spaces. Why was this happening? I added a comment describing the algorithm in the commit. In a nutshell, the algorithm did not account for the final block not being completely filled with points to process. As a result, the reduction included elements that did not correspond to real points and instead held the value 0.
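The failure mode described above can be illustrated with a small CPU-only sketch. This is not the CUDA kernel from this PR. `blockwise_reduce`, `block_size`, and `mask_tail` are hypothetical names, and the zero-filled `shmem` buffer only emulates shared memory under the assumption that unused slots hold 0.

```julia
# CPU emulation of a block-wise reduction (illustrative only).
# With mask_tail = false, every slot of every block is reduced, including
# the unused zero-filled slots of a partially filled final block.
function blockwise_reduce(op, data::Vector{T}, block_size::Int; mask_tail::Bool) where {T}
    n = length(data)
    n_blocks = cld(n, block_size)
    partials = T[]
    for b in 1:n_blocks
        first_idx = (b - 1) * block_size + 1
        # Emulated shared memory, assumed to start out filled with zeros.
        shmem = zeros(T, block_size)
        # Number of real data points that fall into this block.
        n_valid = min(block_size, n - first_idx + 1)
        shmem[1:n_valid] .= data[first_idx:(first_idx + n_valid - 1)]
        # Reduce only the valid slots when masking is enabled.
        n_reduce = mask_tail ? n_valid : block_size
        push!(partials, reduce(op, view(shmem, 1:n_reduce)))
    end
    return reduce(op, partials)
end

# Five points with a block size of four: the last block holds a single point.
data = [3.0, 5.0, 2.0, 7.0, 4.0]
buggy = blockwise_reduce(min, data, 4; mask_tail = false)
fixed = blockwise_reduce(min, data, 4; mask_tail = true)
```

In this sketch the unmasked variant folds the padding zeros into the `min` reduction, while the masked variant agrees with `minimum(data)`. A `+` reduction would hide the problem, since 0 is its neutral element, which may be why only some configurations were affected.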
Closes #2097