Description
closes #617
closes #590
This PR adds FP8 (`__nv_fp8_e4m3` and `__nv_fp8_e5m2`) support for CUB radix sort.

Some of the tests do not work correctly with the lowest / max FP range. These are mostly tests that accumulate values: when a significant number of large values (1e38) is accumulated, the sums overflow to `inf`, which complicates comparison. For now, I've reduced the value range for these tests to get them passing. We should think about how to handle FP tests going forward.

The PR doesn't update the CUB docs to guarantee FP8 support. I suggest the following order:
We should also document `__nv_bfloat16`. For some reason, only `__half` is documented right now.
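For context on the FP8 radix sort support this PR adds: a radix sort can order IEEE-style floating-point keys by first mapping each bit pattern to an unsigned integer whose order matches the numeric order (flip only the sign bit of non-negative values, flip every bit of negative values), then sorting those integers digit by digit. The sketch below applies that standard transform to raw 8-bit E4M3 bit patterns (1 sign, 4 exponent bits with bias 7, 3 mantissa bits) on the host. It is an illustration, not CUB's implementation; the helper names (`twiddle_in`, `twiddle_out`, `decode_e4m3`, `radix_sort_e4m3`) are hypothetical, NaN patterns are not handled, and `-0` sorts before `+0`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: map a raw E4M3 bit pattern to an unsigned key whose
// unsigned order matches the numeric order of the encoded value.
std::uint8_t twiddle_in(std::uint8_t bits) {
  const std::uint8_t sign = 0x80;
  // Negative: flip all bits so larger magnitudes come first.
  // Non-negative: flip only the sign bit so they come after all negatives.
  return (bits & sign) ? static_cast<std::uint8_t>(~bits)
                       : static_cast<std::uint8_t>(bits ^ sign);
}

// Hypothetical inverse of twiddle_in.
std::uint8_t twiddle_out(std::uint8_t key) {
  const std::uint8_t sign = 0x80;
  return (key & sign) ? static_cast<std::uint8_t>(key ^ sign)
                      : static_cast<std::uint8_t>(~key);
}

// Hypothetical decoder, used only to check the ordering; NaN is excluded.
float decode_e4m3(std::uint8_t bits) {
  assert((bits & 0x7F) != 0x7F && "NaN patterns are not handled in this sketch");
  int s = bits >> 7, e = (bits >> 3) & 0xF, m = bits & 0x7;
  float mag = (e == 0) ? std::ldexp(m / 8.0f, -6)              // subnormal
                       : std::ldexp(1.0f + m / 8.0f, e - 7);   // normal, bias 7
  return s ? -mag : mag;
}

// Sort raw E4M3 patterns by value with one stable counting-sort pass:
// the whole 8-bit key fits in a single 8-bit radix digit.
std::vector<std::uint8_t> radix_sort_e4m3(const std::vector<std::uint8_t>& in) {
  std::array<std::size_t, 256> count{};
  for (std::uint8_t b : in) ++count[twiddle_in(b)];
  // Exclusive prefix sum turns per-key counts into output offsets.
  std::size_t offset = 0;
  for (auto& c : count) {
    std::size_t n = c;
    c = offset;
    offset += n;
  }
  std::vector<std::uint8_t> out(in.size());
  for (std::uint8_t b : in) out[count[twiddle_in(b)]++] = b;  // stable scatter
  return out;
}
```

For example, sorting the patterns `{0x38, 0xC0, 0x00, 0x40, 0xB8, 0x30}` (1, -2, 0, 2, -1, 0.5) yields `{0xC0, 0xB8, 0x00, 0x30, 0x38, 0x40}` (-2, -1, 0, 0.5, 1, 2). The transform depends only on the sign bit and the sign-magnitude layout, so the same idea carries over to E5M2 bit patterns; only the decoder would differ.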
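To make the accumulation issue in the tests concrete: `FLT_MAX` is roughly 3.4e38, so a single-precision running sum of only four values of 1e38 already overflows to `inf`. Once `inf` appears, the result can also depend on summation order, so a sequential reference and a differently ordered reduction may disagree. The sketch below is plain host C++ with a hypothetical `sequential_sum` helper (not the actual test code), assuming IEEE-754 single precision, round-to-nearest, and no fast-math.

```cpp
#include <cassert>  // assert, for checking results
#include <cmath>    // std::isinf, for detecting overflow
#include <vector>

// Hypothetical helper: sum values left to right in single precision,
// the way a simple sequential reference implementation might.
float sequential_sum(const std::vector<float>& values) {
  float sum = 0.0f;
  for (float v : values) {
    sum += v;  // rounds to +/-inf once the exact sum is too large for float
  }
  return sum;
}
```

With these inputs, `sequential_sum({3e38f, 3e38f, -3e38f})` returns `inf` (the first addition overflows and `inf - 3e38` stays `inf`), while the same values in the order `{3e38f, -3e38f, 3e38f}` sum to `3e38f`.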