This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Propagate __float128 support from the host.
Accepted · Public

Authored by tra on Aug 24 2023, 2:02 PM.

Details

Reviewers
jlebar
yaxunl
Summary

GPUs do not have actual FP128 support, but we do need to be able to compile
host-side headers which use __float128. On the GPU side we'll downgrade __float128
to double, similarly to how we handle long double. Both types will have
different in-memory representations than their host counterparts and are
not expected to be interchangeable across the host/device boundary.
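
To make the hazard concrete, here is a minimal CUDA sketch (hypothetical type and kernel names, not part of the patch) of why __float128 values must not cross the host/device boundary once the device side demotes the type:

    // Hypothetical illustration: on the host, __float128 is IEEE binary128;
    // on the device (after this patch) the same type uses double's format,
    // so the two sides disagree about its bit layout.
    struct Payload {
      __float128 q; // binary128 on the host, double format on the device
    };

    __global__ void consume(Payload *p) {
      // Reading p->q here interprets bytes the host wrote as binary128
      // using binary64 semantics -- garbage, not a rounded copy.
      double d = static_cast<double>(p->q);
      (void)d;
    }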

Also see https://reviews.llvm.org/D78513 which applied equivalent change to
HIP/AMDGPU.

Diff Detail

Event Timeline

tra created this revision. Aug 24 2023, 2:02 PM
Herald added a project: Restricted Project. Aug 24 2023, 2:02 PM
tra edited the summary of this revision. Aug 24 2023, 2:03 PM
tra published this revision for review. Aug 24 2023, 2:06 PM
tra added reviewers: jlebar, yaxunl.

For some context on why this is needed, see https://github.com/compiler-explorer/compiler-explorer/pull/5373#issuecomment-1687127788
The short version is that CUDA compilation with clang is currently broken with an unpatched libstdc++. Ubuntu and Debian patch libstdc++ to avoid the problem, but this should really be handled by clang.
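
The breakage comes from libstdc++ declaring __float128 overloads whenever the host supports the type. Roughly (paraphrased from libstdc++'s bits/std_abs.h; exact guards vary by version and by distro patches), clang in CUDA mode used to reject declarations like this because the device target had no __float128:

    #if defined(_GLIBCXX_USE_FLOAT128)
      inline _GLIBCXX_CONSTEXPR
      __float128
      abs(__float128 __x)
      { return __x < 0 ? -__x : __x; }
    #endif

As I understand it, the Debian/Ubuntu patch adds a !defined(__CUDACC__) condition to such guards; this change instead makes clang accept the type on the device side.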

Herald added a project: Restricted Project. Aug 24 2023, 2:06 PM
yaxunl accepted this revision. Aug 24 2023, 2:37 PM

LGTM. Thanks

This revision is now accepted and ready to land. Aug 24 2023, 2:37 PM
tra added a subscriber: ABataev. Aug 28 2023, 3:56 PM

@ABataev

This patch breaks two tests:

  • github.com/llvm/llvm-project/blob/main/clang/test/OpenMP/nvptx_unsupported_type_codegen.cpp
  • github.com/llvm/llvm-project/blob/main/clang/test/OpenMP/nvptx_unsupported_type_messages.cpp

It's not clear what exactly these tests are testing for, and I can't tell whether I should just remove the __float128-related checks or whether something else needs to be done on the OpenMP side.

AFAICT, OpenMP will pick up the double format for __float128 after my patch. That would leave long double as the only unsupported type on GPU-supporting targets, which suggests I should just remove the __float128 checks from those tests.

Am I missing something? Is there anything else that may need to be done on the OpenMP side?

Just removing the checks should be fine.

tra added a subscriber: jhuber6. Aug 29 2023, 11:09 AM

Just removing the checks should be fine.

Looks like OpenMP handles long double and __float128 differently than CUDA does -- it always insists on using the host's FP format for both:
https://github.com/llvm/llvm-project/blob/d037445f3a2c6dc1842b5bfc1d5d81988c2f223d/clang/lib/AST/ASTContext.cpp#L1674

This creates a divergence between what clang thinks and what LLVM can handle.
I'm not quite sure how it's supposed to work with NVPTX or AMDGPU, where we demote those types to double and can't generate code for the actual types.
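
Condensed paraphrase of the logic at that link (not verbatim, and names are approximate for this revision): when compiling the OpenMP device side, clang returns the host (aux target) float semantics for these types whenever the two targets disagree:

    // clang/lib/AST/ASTContext.cpp, getFloatTypeSemantics() -- paraphrase,
    // not the verbatim checked-in code.
    case BuiltinType::LongDouble:
      if (getLangOpts().OpenMP && getLangOpts().OpenMPIsTargetDevice &&
          AuxTarget != nullptr &&
          &Target->getLongDoubleFormat() != &AuxTarget->getLongDoubleFormat())
        return AuxTarget->getLongDoubleFormat();
      return Target->getLongDoubleFormat();
    case BuiltinType::Float128:
      // ... same pattern, using getFloat128Format() ...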

@jhuber6 what does OpenMP expect to happen for those types on the GPU side?

That's a good question; I'm not entirely sure what the expectation would be. We obviously need to keep things coherent across D2H and H2D memcpys, so we want the types to be the same size on both sides. I'm pretty sure our handling of this is just wrong right now. A simple example, https://godbolt.org/z/Y3E58PKMz, shows that for NVPTX we error out (as I would expect), but for AMDGPU we emit an x86 80-bit long double. My guess is that we should make this more explicit, considering that both vendors explicitly state that quad precision is not available on the GPU, unless we want to implement software floats.
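
To make the coherence requirement concrete, here is a hypothetical OpenMP sketch (not the godbolt example above): the mapped buffer is copied byte-for-byte, so if the host compiles this with 16-byte long doubles while the device demotes the type to an 8-byte double, the two sides disagree about both the element layout and the array's extent:

    #include <stdio.h>

    int main(void) {
      long double x[4] = {1.0L, 2.0L, 3.0L, 4.0L};
    #pragma omp target map(tofrom : x[0:4])
      {
        x[0] += 1.0L; // device arithmetic in whatever format the device has
      }
      printf("%Lf\n", x[0]); // host reads back bytes the device laid out
      return 0;
    }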

tra added a comment. Aug 29 2023, 1:28 PM

A simple example, https://godbolt.org/z/Y3E58PKMz, shows that for NVPTX we error out (as I would expect), but for AMDGPU we emit an x86 80-bit long double.

With this patch, NVPTX will behave the same as AMDGPU, and we'll no longer error out.

I think I may need to add an explicit diagnostic for the case where the host's idea of long double and __float128 does not match the target's. It would have to be specific to OpenMP, as CUDA expects this discrepancy for historic reasons.
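
For reference, the historic CUDA behavior mentioned above looks roughly like this (a sketch based on my understanding; the kernel and variable names are mine): in device code clang silently treats long double as double, so host and device intentionally disagree, which is why the proposed diagnostic would need to fire for OpenMP only:

    __global__ void k(double *out) {
      long double x = 1.0L;          // effectively binary64 on the device
      *out = static_cast<double>(x); // lossless: x already has double's format
    }
    // On an x86-64 Linux host, sizeof(long double) is 16 (x87 80-bit, padded),
    // so long double values must never be exchanged between host and device
    // in CUDA either.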