[AMD] Flush denorms to zero in math.rsqrt #5438

Merged
merged 3 commits into from
Jan 3, 2025

@knwng knwng commented Dec 16, 2024

This commit changes how math.rsqrt handles denormal inputs on AMD.

When ftz (flush-to-zero) is enabled, it calls llvm.amdgcn.rsq.f32 directly, which flushes denormalized inputs to zero. Otherwise, it calls __ocml_rsqrt_f32, which checks the backend dynamically to decide whether to flush.

| Arch | non-ftz | ftz |
| --- | --- | --- |
| CUDA | __nv_rsqrtf => rsqrt.approx.f32 | __nv_rsqrtf => rsqrt.approx.ftz.f32 |
| AMD | __ocml_rsqrt_f32 | llvm.amdgcn.rsq.f32 |
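
To make the difference between the ftz and non-ftz columns concrete, here is a minimal host-side sketch of the intended semantics. It is not the lowering from this PR: `rsqrt_emulated` is a hypothetical helper that only emulates in software what flushing a denormal input to zero does to the result of a reciprocal square root.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper (not part of Triton, OCML, or LLVM): emulates on the
// host how rsqrt treats a denormal input with and without flush-to-zero.
float rsqrt_emulated(float x, bool ftz) {
  if (ftz && std::fpclassify(x) == FP_SUBNORMAL) {
    // Flush the denormal input to a zero of the same sign.
    x = std::copysign(0.0f, x);
  }
  // Compute in double so a denormal float input keeps its exact value,
  // then round the result back to float.
  return static_cast<float>(1.0 / std::sqrt(static_cast<double>(x)));
}
```

With `std::numeric_limits<float>::denorm_min()` as input, the ftz path returns +inf because the input has become zero, while the non-ftz path returns a large finite value. Normal inputs such as 4.0f give the same result on both paths.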

New contributor declaration

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these
    rules.

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • Select one of the following.

    • I have added tests.
      • /test for lit tests
      • /unittest for C++ tests
      • /python/test for end-to-end tests
    • This PR does not need a test because FILL THIS IN.
  • Select one of the following.

    • I have not added any lit tests.
    • The lit tests I have added follow these best practices,
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)

@@ -656,7 +656,7 @@ bool CTAPlanner::isElementwiseOp(Operation *op) const {
  math::CtPopOp, math::ErfOp, math::ExpOp, math::Exp2Op,
  math::FloorOp, math::ExpM1Op, math::FmaOp, math::LogOp,
  math::Log10Op, math::Log1pOp, math::Log2Op, math::PowFOp,
- math::RsqrtOp, math::SqrtOp, math::RsqrtOp, math::TanhOp>(op))
+ math::SqrtOp, math::RsqrtOp, math::TanhOp>(op))
@knwng knwng Dec 16, 2024

Looks like it's duplicated. Please let me know if it's actually deliberate.

@antiagainst antiagainst marked this pull request as ready for review January 3, 2025 01:36
@antiagainst antiagainst merged commit 781ae0b into triton-lang:main Jan 3, 2025
7 checks passed