[AMD] Handling denorms in lowering math.sqrt and math.sqrt_rn #5422

knwng · 2024-12-13T11:16:56Z

In this commit, we handle the denorm-flushing behavior of math.sqrt and math.sqrt_rn. Both lowerings read HIP_FTZ to decide whether denorms are preserved or flushed to zero.

| Backend | sqrt (non-FTZ) | sqrt (FTZ) | sqrt_rn (non-FTZ) | sqrt_rn (FTZ) |
|---|---|---|---|---|
| CUDA | sqrt.approx.f32 | sqrt.approx.ftz.f32 | sqrt.rn.f32 | sqrt.rn.ftz.f32 |
| AMD | scaling + llvm.amdgcn.sqrt.f32 | llvm.amdgcn.sqrt.f32 | llvm.sqrt.f32 | llvm.amdgcn.rsq.f32 + mul + refinement |
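The AMD row of the table amounts to a four-way choice keyed on the op and the FTZ setting. The Python sketch records that choice as data; `select_amd_lowering` and `AMD_SQRT_LOWERING` are hypothetical names, and the strings are simply the intrinsic sequences from the table, not the actual C++ in ElementwiseOpToLLVM.cpp. How HIP_FTZ is read is not modeled here; it is represented as a plain boolean.

```python
# Hypothetical lookup mirroring the AMD row of the table; the real lowering
# is C++ (ElementwiseOpToLLVM.cpp) and emits LLVM ops rather than strings.
AMD_SQRT_LOWERING = {
    ("math.sqrt", False): "scaling + llvm.amdgcn.sqrt.f32",
    ("math.sqrt", True): "llvm.amdgcn.sqrt.f32",
    ("math.sqrt_rn", False): "llvm.sqrt.f32",
    ("math.sqrt_rn", True): "llvm.amdgcn.rsq.f32 + mul + refinement",
}


def select_amd_lowering(op: str, ftz: bool) -> str:
    """Return the intrinsic sequence used for `op` under the given FTZ setting."""
    try:
        return AMD_SQRT_LOWERING[(op, ftz)]
    except KeyError:
        raise ValueError(f"unsupported op: {op}") from None


print(select_amd_lowering("math.sqrt_rn", True))
```

Keeping the mapping as a table rather than nested conditionals makes it easy to compare against the CUDA row when auditing the lowering.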

math.sqrt provides an approximate square root. In the AMD backend, we lower it to llvm.amdgcn.sqrt.f32, which maps directly to the v_sqrt_f32 instruction and is accurate to 1 ULP.
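The "scaling" entry for non-FTZ math.sqrt can be illustrated with a small Python model. If the hardware square root flushed denormal inputs, it would return 0 for them; scaling tiny inputs up by an even power of two first, then scaling the root back down, keeps them in the normal range. This is a sketch under stated assumptions: `ftz_sqrt` and `denorm_safe_sqrt` are hypothetical names, v_sqrt_f32 is modeled as flushing denormal inputs, Python doubles stand in for float32, and the threshold 2^-96 with factors 2^32 and 2^-16 are assumed constants modeled on our understanding of LLVM's AMDGPU f32 sqrt lowering. Check the actual values in ElementwiseOpToLLVM.cpp.

```python
import math

FLT_MIN = 2.0 ** -126          # smallest normal float32
SCALE_THRESHOLD = 2.0 ** -96   # assumption: inputs below this are scaled
SCALE_UP = 2.0 ** 32           # even power of two, so its square root is exactly 2**16
SCALE_DOWN = 2.0 ** -16        # undoes the 2**16 factor in the result


def ftz_sqrt(x: float) -> float:
    """Model of a hardware sqrt that flushes denormal float32 inputs to zero.

    Python floats are doubles, so the flush is modeled explicitly.
    Assumes x >= 0.
    """
    if x < FLT_MIN:
        x = 0.0
    return math.sqrt(x)


def denorm_safe_sqrt(x: float) -> float:
    """Scale small inputs into the normal range before calling ftz_sqrt."""
    if x < SCALE_THRESHOLD:
        # sqrt(x * 2**32) == sqrt(x) * 2**16, and power-of-two scaling is exact.
        return ftz_sqrt(x * SCALE_UP) * SCALE_DOWN
    return ftz_sqrt(x)


tiny = 1e-40  # a float32 denormal (below FLT_MIN)
print(ftz_sqrt(tiny))                              # 0.0: the input was flushed
print(denorm_safe_sqrt(tiny) == math.sqrt(tiny))   # True
```

Because multiplying by a power of two only shifts the exponent, the scaled path returns the same value as an unflushed square root, at the cost of one compare and two extra multiplies.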

math.sqrt_rn provides the IEEE-compliant, correctly rounded (round-to-nearest, ties-to-even) square root. Following the implementation in LLVM, we add an extra refinement step to obtain the correctly rounded result.
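The "rsq + mul + refinement" sequence for FTZ math.sqrt_rn can be sketched as one Newton-Raphson correction of s = x * rsq(x). This is a simplified model, not the exact LLVM sequence (which, as we understand it, chains several fused multiply-adds): `approx_rsq` is a hypothetical stand-in for llvm.amdgcn.rsq.f32, Python doubles play the role of FMA-accurate intermediates, and the final `f32()` rounding stands in for the float32 result. The correction roughly squares the relative error of the estimate, so the refined value usually lands close enough to the exact root to round correctly, but this sketch makes no guarantee for every input.

```python
import math
import struct


def f32(v: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def approx_rsq(x: float) -> float:
    """Hypothetical stand-in for llvm.amdgcn.rsq.f32: 1/sqrt(x) at float32 precision."""
    return f32(1.0 / math.sqrt(x))


def sqrt_rn_refined(x: float) -> float:
    """sqrt via rsq + mul + one Newton-Raphson correction, rounded to float32.

    Assumes x is positive and normal. Intermediates stay in double
    precision, standing in for the FMAs a GPU sequence would use.
    """
    y = approx_rsq(x)    # y ~= 1/sqrt(x)
    s = x * y            # s ~= sqrt(x)  (the "mul")
    h = 0.5 * y          # h ~= 1/(2*sqrt(x))
    r = x - s * s        # residual: how far s*s is from x
    s = s + r * h        # Newton step: s + (x - s^2) / (2*sqrt(x))
    return f32(s)


print(sqrt_rn_refined(2.0) == f32(math.sqrt(2.0)))  # True
```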

New contributor declaration

I am not making a trivial change, such as fixing a typo in a comment.
I have written a PR description following these rules.
I have run pre-commit run --from-ref origin/main --to-ref HEAD.
Select one of the following.
- I have added tests.
  - ✅ /test for lit tests
  - /unittest for C++ tests
  - /python/test for end-to-end tests
- This PR does not need a test because FILL THIS IN.
Select one of the following.
- I have not added any lit tests.
- The lit tests I have added follow these best practices,
  including the "tests should be minimal" section. (Usually running Python code
  and using the instructions it generates is not minimal.)


test/Conversion/amd/math-denorm-handling.mlir

third_party/amd/lib/TritonAMDGPUToLLVM/ElementwiseOpToLLVM.cpp

zhanglx13

LGTM. good to go after the comment is fixed

knwng force-pushed the ftz_sqrt branch from 35a5733 to 281dabe on December 14, 2024 00:07

[AMD] Handling denorms in lowering math.sqrt and math.sqrt_rn

ff4f744


knwng force-pushed the ftz_sqrt branch from 281dabe to ff4f744 on January 3, 2025 06:59

antiagainst requested changes Jan 4, 2025


resolve comments

3bc94f6

knwng requested a review from antiagainst January 6, 2025 12:09

zhanglx13 reviewed Jan 8, 2025


third_party/amd/lib/TritonAMDGPUToLLVM/ElementwiseOpToLLVM.cpp (outdated, resolved)

zhanglx13 approved these changes Jan 8, 2025


fix comments

2733d56

antiagainst approved these changes Jan 9, 2025


antiagainst marked this pull request as ready for review January 9, 2025 16:46

antiagainst requested a review from ptillet as a code owner January 9, 2025 16:46
