This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Add more FMA intriniscs/builtins
ClosedPublic

Authored by jchlanda on Feb 4 2022, 2:00 AM.

Details

Summary

This patch adds builtins/intrinsics for the following variants of FMA:

  • f16
    • rn
    • rn_ftz
    • rn_sat
    • rn_ftz_sat
    • rn_relu
    • rn_ftz_relu
  • f16x2
    • rn
    • rn_ftz
    • rn_sat
    • rn_ftz_sat
    • rn_relu
    • rn_ftz_relu
  • bf16
    • rn
    • rn_relu
  • bf16x2
    • rn
    • rn_relu

They all require PTX 7.0, SM_80.

ptxas (Cuda compilation tools, release 11.0, V11.0.194) is happy with the generated assembly.

Depends on D<117887>

Diff Detail

Event Timeline

jchlanda created this revision.Feb 4 2022, 2:00 AM
jchlanda requested review of this revision.Feb 4 2022, 2:00 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptFeb 4 2022, 2:00 AM
tra added a comment.Feb 4 2022, 10:25 AM

They all require PTX 7.0, SM_80.

According to https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-fma only fma.relu and bf16* variants require ptx70/sm80:

PTX ISA Notes
Introduced in PTX ISA version 4.2.

fma.relu.{f16, f16x2} and fma{.relu}.{bf16, bf16x2} introduced in PTX ISA version 7.0.

Target ISA Notes
Requires sm_53 or higher.

fma.relu.{f16, f16x2} and fma{.relu}.{bf16, bf16x2} require sm_80 or higher.
jchlanda updated this revision to Diff 406322.Feb 6 2022, 10:22 PM

Set correct SM and PTX version.

They all require PTX 7.0, SM_80.

According to https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-fma only fma.relu and bf16* variants require ptx70/sm80:

PTX ISA Notes
Introduced in PTX ISA version 4.2.

fma.relu.{f16, f16x2} and fma{.relu}.{bf16, bf16x2} introduced in PTX ISA version 7.0.

Target ISA Notes
Requires sm_53 or higher.

fma.relu.{f16, f16x2} and fma{.relu}.{bf16, bf16x2} require sm_80 or higher.

My bad, sorry. Fixed now.

tra added a comment.Feb 7 2022, 11:56 AM

Target ISA Notes
Requires sm_53 or higher.

I think we do need this constraint applied to the new builtins, too. Right now nothing stops using them on a GPU where they do not exist and that will likely crash the compiler when we fail to find a matching intrinsic.

jchlanda updated this revision to Diff 406851.Feb 8 2022, 8:38 AM

Add sm/ptx version guard to f16{x2} builtins.

Target ISA Notes
Requires sm_53 or higher.

I think we do need this constraint applied to the new builtins, too. Right now nothing stops using them on a GPU where they do not exist and that will likely crash the compiler when we fail to find a matching intrinsic.

You are right, it would lead to unexpected crashes, I wrongly assumed that since the lowest version guarded against is PTX 6.0/SM_60, anything lower than that should be OK. Fixed now.

tra added inline comments.Feb 8 2022, 1:36 PM
llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
937

I think the default should be the most useful/common and the least surprising value.
I'd argue that in this case it would be []. This would give reader a reasonable idea about what's going on even without looking at FMA_TUPLE implementation.

jchlanda updated this revision to Diff 407078.Feb 9 2022, 1:12 AM

Tidy up FMA_TUPLE class.

jchlanda added inline comments.Feb 9 2022, 1:14 AM
llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
937

Agreed, I wrote that class before folding in f32 and f64, hasPTX70, hasSM80 made more sense then. Changed now.

tra accepted this revision.Feb 9 2022, 11:09 AM
This revision is now accepted and ready to land.Feb 9 2022, 11:09 AM
jchlanda updated this revision to Diff 407904.Feb 11 2022, 9:07 AM

PTX/sm version tidy up.

jchlanda marked an inline comment as done.Feb 11 2022, 9:09 AM

@tra I've fixed the test failure (math-intrins.ll) the rest seems to be unrelated timeouts, would you be able to merge those patches in, as I don't have the commit access please? The same goes for https://reviews.llvm.org/D117887 and https://reviews.llvm.org/D119157 from Nicolas. Thanks.

tra added a comment.Feb 17 2022, 10:13 AM

@tra I've fixed the test failure (math-intrins.ll) the rest seems to be unrelated timeouts,

Thank you.

would you be able to merge those patches in, as I don't have the commit access please? The same goes for https://reviews.llvm.org/D117887 and https://reviews.llvm.org/D119157 from Nicolas. Thanks.

I'll be out of office till Tuesday. I will land the patches when I'm back.

This revision was landed with ongoing or failed builds.Feb 23 2022, 1:57 PM
Closed by commit rGbe672934ff88: [NVPTX] Add more FMA intriniscs/builtins (authored by jchlanda, committed by tra). · Explain Why
This revision was automatically updated to reflect the committed changes.