This is an archive of the discontinued LLVM Phabricator instance.

[CUDA,NVPTX] Implement __bf16 support for NVPTX.
ClosedPublic

Authored by tra on Oct 19 2022, 6:53 PM.

Details

Summary

Recent Clang changes expose __bf16 types for SSE2-enabled host compilations, which
makes those types visible during GPU-side compilation, where it currently
fails with Sema complaining that __bf16 is not supported.

Considering that __bf16 is a storage-only type, enabling it for NVPTX should pose
no correctness issues, as we can always load/store such values as untyped 16-bit values.
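
To make the "storage-only" point concrete, here is a minimal C++ sketch (not code from this patch) of a __bf16-like value that is only ever moved around as an opaque 16-bit word. The `BF16` struct, `copy_bf16`, and `bf16_to_float` are hypothetical names; the widening step relies only on bf16 being the upper 16 bits of an IEEE-754 binary32.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical storage-only bf16: only its 16 raw bits are kept.
struct BF16 {
    std::uint16_t bits;
};

// Moving bf16 data needs nothing but 16-bit untyped loads and stores;
// the bits are never interpreted as a floating-point value here.
void copy_bf16(const BF16* src, BF16* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        std::uint16_t w = src[i].bits;  // 16-bit load
        dst[i].bits = w;                // 16-bit store
    }
}

// Widening to float is exact: a bf16 value is the upper half of an
// IEEE-754 binary32 whose low 16 mantissa bits are zero.
float bf16_to_float(BF16 v) {
    std::uint32_t u = static_cast<std::uint32_t>(v.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

For example, `bf16_to_float(BF16{0x3F80})` yields 1.0f and `BF16{0x4049}` widens to 3.140625f, while `copy_bf16` moves both without ever knowing what they mean.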

Recent NVIDIA GPUs have introduced bf16 support, so we'll likely grow better
support for __bf16 on NVPTX going forward anyway.

Event Timeline

tra created this revision.Oct 19 2022, 6:53 PM
Herald added a project: Restricted Project.Oct 19 2022, 6:53 PM
tra updated this revision to Diff 469453.Oct 20 2022, 7:28 PM

Added bf16 load/store support.

tra updated this revision to Diff 469460.Oct 20 2022, 8:30 PM

More bf16 lowering fixes.

tra updated this revision to Diff 469657.Oct 21 2022, 9:27 AM

More lowering fixes & tests.

tra published this revision for review.Oct 21 2022, 9:33 AM
tra added reviewers: jchlanda, yaxunl.
Herald added projects: Restricted Project, Restricted Project.Oct 21 2022, 9:33 AM
tra updated this revision to Diff 469663.Oct 21 2022, 9:39 AM

Make __bf16 available regardless of its availability on the host.

tra retitled this revision from [CUDA] Propagate __bf16 type info from the host compilation. to [CUDA,NVPTX] Implement __bf16 support for NVPTX..Oct 21 2022, 10:13 AM
tra edited the summary of this revision.
tra updated this revision to Diff 469676.Oct 21 2022, 10:14 AM

Cosmetic refactoring.

tra updated this revision to Diff 469683.Oct 21 2022, 10:45 AM

Added LLVM test for bfloat load/stores. Fixed asm output for bf16 constants.
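
The review doesn't show what the fixed constant output looks like. As a hedged illustration of the underlying encoding step only, the C++ sketch below narrows a float constant to its bf16 bit pattern with round-to-nearest-even and formats it as a 16-bit hex value. `float_to_bf16_bits` and `bf16_hex` are hypothetical helpers, and the `0x%04X` spelling is an assumption, not necessarily the exact syntax NVPTX emits.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Narrow a float to bf16 bits using round-to-nearest, ties-to-even.
// NaNs are forced to stay quiet NaNs so truncation cannot turn them into Inf.
std::uint16_t float_to_bf16_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    if (std::isnan(f))
        return static_cast<std::uint16_t>((u >> 16) | 0x0040);  // set bf16 quiet bit
    std::uint32_t lsb = (u >> 16) & 1u;  // lowest bit that survives narrowing
    u += 0x7FFFu + lsb;                  // carries into bit 16 only when rounding up
    return static_cast<std::uint16_t>(u >> 16);
}

// Format the 16-bit pattern as a hex immediate (assumed spelling).
std::string bf16_hex(float f) {
    char buf[8];
    std::snprintf(buf, sizeof buf, "0x%04X",
                  static_cast<unsigned>(float_to_bf16_bits(f)));
    return buf;
}
```

With this sketch, 1.0f maps to 0x3F80, and 1.00390625f (exactly halfway between two bf16 values) rounds to the even pattern 0x3F80.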

tra added a comment.Oct 24 2022, 3:46 PM

@jchlanda PTAL. You probably have the most context for NVPTX and bf16 instructions there.

We need this change to unbreak CUDA compilation after D132329 exposed __bf16 to GPU-side compilation. https://godbolt.org/z/Kz8PYfPj5

tra added a comment.Oct 24 2022, 3:56 PM

@yaxunl It appears that AMDGPU also does not support __bf16, but for some reason it does not error out in clang headers: https://godbolt.org/z/GrTGMn49f
Any ideas why that may be the case?

jchlanda accepted this revision.Oct 24 2022, 11:01 PM

Looks good.

llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
838–852

New line here.

844

Nice, this fixes the reg type for v2f16 from Float to Untyped; it looks like no test caught that before.

This revision is now accepted and ready to land.Oct 24 2022, 11:01 PM
yaxunl accepted this revision.Oct 25 2022, 8:32 AM

LGTM. Thanks.

Do you plan to support arithmetic operators for bf16 or implement the FMA instruction support?

Allen added a subscriber: Allen.Oct 25 2022, 9:05 AM
Allen added inline comments.
llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
186

Sorry for a basic question: what's the difference between bf16 and f16?

tra added a comment.Oct 25 2022, 9:55 AM

> LGTM. Thanks.
>
> Do you plan to support arithmetic operators for bf16 or implement the FMA instruction support?

Yes. sm_90 has introduced a handful of new bf16 operations that will be eventually implemented.
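
Until native bf16 instructions are used, a common fallback for a storage-only type is to promote operands to float, compute there, and round the result back. The C++ sketch below shows that pattern for add and multiply; all names are hypothetical, and this is not how the NVPTX backend lowers these operations. An FMA would need extra care, since rounding the float result a second time can differ from a single correctly rounded bf16 fma.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical software fallback: bf16 values are kept as raw 16-bit patterns.
using bf16_bits = std::uint16_t;

// Exact widening: bf16 bits become the upper half of a float.
static float widen(bf16_bits b) {
    std::uint32_t u = static_cast<std::uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Narrowing with round-to-nearest, ties-to-even; NaNs stay quiet NaNs.
static bf16_bits narrow(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    if (std::isnan(f))
        return static_cast<bf16_bits>((u >> 16) | 0x0040);
    u += 0x7FFFu + ((u >> 16) & 1u);
    return static_cast<bf16_bits>(u >> 16);
}

// Arithmetic happens in float; only the result is rounded back to bf16.
bf16_bits bf16_add(bf16_bits a, bf16_bits b) { return narrow(widen(a) + widen(b)); }
bf16_bits bf16_mul(bf16_bits a, bf16_bits b) { return narrow(widen(a) * widen(b)); }
```

For example, `bf16_add(0x3F80, 0x3B80)` adds 1.0 and 2^-8; the sum lies exactly halfway between two bf16 values, so it rounds to the even pattern 0x3F80.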

tra added inline comments.Oct 25 2022, 10:01 AM
llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
186

If your question is why both bf16 and f16 use Float16Regs, then the answer is that both use 'untyped' 16-bit *integer* registers.
The difference from Int16Regs is that those are signed. PTX has some awkward restrictions on matching instructions and register kinds, even though under the hood it all boils down to everything using 32-bit registers.
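
The register answer above leaves the formats themselves implicit: both are 16 bits wide, but bf16 splits them 1/8/7 (sign/exponent/mantissa, i.e. the top half of a float), while IEEE f16 splits them 1/5/10. The C++ sketch below, with hypothetical names, decodes the same 16-bit pattern under each layout. It illustrates why sharing untyped 16-bit registers is harmless: the bits only acquire a meaning when an instruction interprets them.

```cpp
#include <cmath>
#include <cstdint>

// Field widths (sign / exponent / mantissa):
//   bf16: 1 / 8 / 7   (same exponent range as float, less precision)
//   f16:  1 / 5 / 10  (IEEE-754 binary16: more precision, smaller range)
struct Format {
    int exp_bits;
    int man_bits;
};

constexpr Format kBF16{8, 7};
constexpr Format kF16{5, 10};

// Decode a 16-bit pattern under the given layout.
// Only finite values are handled; Inf/NaN encodings are not special-cased.
double decode(std::uint16_t bits, Format fmt) {
    const int bias = (1 << (fmt.exp_bits - 1)) - 1;
    const int sign = bits >> 15;
    const int exp = (bits >> fmt.man_bits) & ((1 << fmt.exp_bits) - 1);
    const int man = bits & ((1 << fmt.man_bits) - 1);
    const double frac = static_cast<double>(man) / (1 << fmt.man_bits);
    const double mag = (exp == 0)
        ? std::ldexp(frac, 1 - bias)            // subnormal
        : std::ldexp(1.0 + frac, exp - bias);   // normal
    return sign ? -mag : mag;
}
```

For instance, 0x3C00 decodes to 1.0 as f16 but to 0.0078125 (2^-7) as bf16.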

tra updated this revision to Diff 470544.Oct 25 2022, 10:13 AM

whitespace fix

This revision was landed with ongoing or failed builds.Oct 25 2022, 11:10 AM
This revision was automatically updated to reflect the committed changes.
Allen added inline comments.Oct 25 2022, 6:07 PM
llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
186

Thanks for your explanation.