[Nvidia] Support fp8 to bf16 casting on RTX 4000 series #5544

mbrookhart · 2025-01-07T03:58:06Z

I noticed that some of the tests were failing when I ran them on a workstation with a consumer RTX card. It turns out that sm_89 supports fp8 but doesn't support cvt.bf16.f16.

From the PTX spec:

cvt.bf16.{u8/s8/u16/s16/u32/s32/u64/s64/f16/f64/bf16}, cvt.{u8/s8/u16/s16/u32/s32/u64/s64/f16/f64}.bf16, and cvt.tf32.f32.{relu}.{rn/rz} require sm_90 or higher.

This adds a path that first converts to fp32 and then to bf16 if the compute capability is < 90.
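To illustrate why going through fp32 is safe, here is a minimal pure-Python emulation of the fallback for the E5M2 flavor of fp8. It is only a sketch, not the code in this PR (the real change lives in the NVIDIA backend lowering): the function names and the `compute_capability` parameter are hypothetical, and the bit-twiddling stands in for the hardware conversions. Every finite E5M2 value fits in bf16 (2 mantissa bits vs. 7, 5 exponent bits vs. 8), so the extra fp32 hop should not change the result.

```python
import math
import struct


def fp8_e5m2_to_float(byte: int) -> float:
    """Decode an fp8 E5M2 bit pattern (it is the high byte of an IEEE fp16)."""
    return struct.unpack("<e", bytes([0, byte & 0xFF]))[0]


def float_to_bf16_bits(x: float) -> int:
    """Round an fp32 value to bf16 bits using round-to-nearest-even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7FFFFFFF) > 0x7F800000:
        # NaN: truncate and force a quiet NaN instead of rounding.
        return (bits >> 16) | 0x0040
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(bits: int) -> float:
    """Reinterpret bf16 bits as the fp32 value with the same upper 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def conversion_path(compute_capability: int) -> str:
    """Hypothetical dispatch mirroring the PR description: direct on sm_90+."""
    return "direct" if compute_capability >= 90 else "via_fp32"


def fp8_e5m2_to_bf16_bits(byte: int) -> int:
    """Emulate the fallback: fp8 -> fp32 -> bf16."""
    return float_to_bf16_bits(fp8_e5m2_to_float(byte))
```

For example, `fp8_e5m2_to_bf16_bits(0x3C)` returns `0x3F80` (1.0 in both formats), and `conversion_path(89)` returns `"via_fp32"`.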

This path is already exercised by the existing tests (several test cases in test core, in particular many variations on dot_scaled).

Support fp8 to bf16 casting on RTX 4000 series

c154cec

mbrookhart requested a review from ptillet as a code owner January 7, 2025 03:58

Merge branch 'main' into tests_on_4090

b5548fa

ThomasRaoux approved these changes Jan 7, 2025


ThomasRaoux merged commit 4947a95 into triton-lang:main Jan 7, 2025
7 checks passed

peterbell10 mentioned this pull request Jan 7, 2025

Casting to bf16 from fp8 breaks on SM89 #5491

Closed
