This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Propagate __float128 support from the host.
Accepted · Public

Authored by tra on Aug 24 2023, 2:02 PM.

Details

Reviewers
jlebar
yaxunl
Summary

GPUs do not have actual FP128 support, but we do need to be able to compile
host-side headers which use __float128. On the GPU side we'll downgrade __float128
to double, similarly to how we handle long double. Both types will have
different in-memory representations than their host counterparts and are
not expected to be interchangeable across the host/device boundary.
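
To make the hazard concrete, here is a minimal CUDA sketch (hypothetical type and kernel names, not part of the patch) of why __float128 values must not cross the host/device boundary once the device side demotes the type:

    // Hypothetical illustration: on the host, __float128 is IEEE binary128;
    // on the device (after this patch) the same type uses double's format,
    // so the two sides disagree about its bit layout.
    struct Payload {
      __float128 q; // binary128 on the host, double format on the device
    };

    __global__ void consume(Payload *p) {
      // Reading p->q here interprets bytes the host wrote as binary128
      // using binary64 semantics -- garbage, not a rounded copy.
      double d = static_cast<double>(p->q);
      (void)d;
    }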

Also see https://reviews.llvm.org/D78513 which applied equivalent change to
HIP/AMDGPU.

Diff Detail

Event Timeline

tra created this revision. Aug 24 2023, 2:02 PM
Herald added a project: Restricted Project. Aug 24 2023, 2:02 PM
tra edited the summary of this revision. Aug 24 2023, 2:03 PM
tra published this revision for review. Aug 24 2023, 2:06 PM
tra added reviewers: jlebar, yaxunl.

For some context on why this is needed, see https://github.com/compiler-explorer/compiler-explorer/pull/5373#issuecomment-1687127788
The short version is that CUDA compilation with clang is currently broken with an unpatched libstdc++. Ubuntu and Debian patch libstdc++ to avoid the problem, but this should really be handled by clang.
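
The breakage comes from libstdc++ declaring __float128 overloads whenever the host supports the type. Roughly (paraphrased from libstdc++'s bits/std_abs.h; exact guards vary by version and by distro patches), clang in CUDA mode used to reject declarations like this because the device target had no __float128:

    #if defined(_GLIBCXX_USE_FLOAT128)
      inline _GLIBCXX_CONSTEXPR
      __float128
      abs(__float128 __x)
      { return __x < 0 ? -__x : __x; }
    #endif

As I understand it, the Debian/Ubuntu patch adds a !defined(__CUDACC__) condition to such guards; this change instead makes clang accept the type on the device side.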

Herald added a project: Restricted Project. Aug 24 2023, 2:06 PM
yaxunl accepted this revision. Aug 24 2023, 2:37 PM

LGTM. Thanks

This revision is now accepted and ready to land. Aug 24 2023, 2:37 PM
tra added a subscriber: ABataev. Aug 28 2023, 3:56 PM

@ABataev

This patch breaks two tests:

  • github.com/llvm/llvm-project/blob/main/clang/test/OpenMP/nvptx_unsupported_type_codegen.cpp
  • github.com/llvm/llvm-project/blob/main/clang/test/OpenMP/nvptx_unsupported_type_messages.cpp

It's not clear what exactly these tests are testing for, and I can't tell whether I should just remove the __float128-related checks or whether something else needs to be done on the OpenMP side.

AFAICT, OpenMP will pick up the double format for __float128 after my patch. That would leave long double as the only unsupported type on GPU-supporting targets, which suggests I should just remove the __float128 checks from those tests.

Am I missing something? Is there anything else that may need to be done on the OpenMP side?

Just removing the checks should be fine.

tra added a subscriber: jhuber6. Aug 29 2023, 11:09 AM

Just removing the checks should be fine.

Looks like OpenMP handles long double and __float128 differently than CUDA does -- it always insists on using the host's FP format for both:
https://github.com/llvm/llvm-project/blob/d037445f3a2c6dc1842b5bfc1d5d81988c2f223d/clang/lib/AST/ASTContext.cpp#L1674

This creates a divergence between what clang thinks and what LLVM can handle.
I'm not quite sure how it's supposed to work with NVPTX or AMDGPU, where we demote those types to double and can't generate code for the actual types.
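
Condensed paraphrase of the logic at that link (not verbatim, and names are approximate for this revision): when compiling the OpenMP device side, clang returns the host (aux target) float semantics for these types whenever the two targets disagree:

    // clang/lib/AST/ASTContext.cpp, getFloatTypeSemantics() -- paraphrase,
    // not the verbatim checked-in code.
    case BuiltinType::LongDouble:
      if (getLangOpts().OpenMP && getLangOpts().OpenMPIsTargetDevice &&
          AuxTarget != nullptr &&
          &Target->getLongDoubleFormat() != &AuxTarget->getLongDoubleFormat())
        return AuxTarget->getLongDoubleFormat();
      return Target->getLongDoubleFormat();
    case BuiltinType::Float128:
      // ... same pattern, using getFloat128Format() ...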

@jhuber6 what does OpenMP expect to happen for those types on the GPU side?

That's a good question; I'm not entirely sure what the expectation would be. We obviously need to keep things coherent across D2H and H2D memcpys, so we want the types to be the same size on both sides. I'm pretty sure our handling of this is just wrong right now. A simple example, https://godbolt.org/z/Y3E58PKMz, shows that for NVPTX we error out (as I would expect), but for AMDGPU we emit an x86 80-bit long double. My guess is that we should make this more explicit, considering that both vendors explicitly state that quad precision is not available on the GPU, unless we want to implement software floats.
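
To make the coherence requirement concrete, here is a hypothetical OpenMP sketch (not the godbolt example above): the mapped buffer is copied byte-for-byte, so if the host compiles this with 16-byte long doubles while the device demotes the type to an 8-byte double, the two sides disagree about both the element layout and the array's extent:

    #include <stdio.h>

    int main(void) {
      long double x[4] = {1.0L, 2.0L, 3.0L, 4.0L};
    #pragma omp target map(tofrom : x[0:4])
      {
        x[0] += 1.0L; // device arithmetic in whatever format the device has
      }
      printf("%Lf\n", x[0]); // host reads back bytes the device laid out
      return 0;
    }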

tra added a comment. Aug 29 2023, 1:28 PM

A simple example, https://godbolt.org/z/Y3E58PKMz, shows that for NVPTX we error out (as I would expect), but for AMDGPU we emit an x86 80-bit long double.

With this patch, NVPTX will behave the same as AMDGPU, and we'll no longer error out.

I think I may need to add an explicit diagnostic for the case where the host's idea of long double and __float128 does not match the target's. It would have to be specific to OpenMP, as CUDA expects this discrepancy for historic reasons.
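
For reference, the historic CUDA behavior mentioned above looks roughly like this (a sketch based on my understanding; the kernel and variable names are mine): in device code clang silently treats long double as double, so host and device intentionally disagree, which is why the proposed diagnostic would need to fire for OpenMP only:

    __global__ void k(double *out) {
      long double x = 1.0L;          // effectively binary64 on the device
      *out = static_cast<double>(x); // lossless: x already has double's format
    }
    // On an x86-64 Linux host, sizeof(long double) is 16 (x87 80-bit, padded),
    // so long double values must never be exchanged between host and device
    // in CUDA either.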