This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on host/device
Closed · Public

Authored by shangwuyao on Feb 14 2023, 2:13 PM.

Details

Reviewers

jlebar
tra
yaxunl

Commits

rG8bd13ad6c537: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on…

Summary

This change matches the CUDA/SPIRV behavior with CUDA/NVPTX, and makes some builtin types
and __GCC_ATOMIC_XXX_LOCK_FREE macros the same between the host and device. This is only
done when host triple is provided and known, otherwise the behavior is unchanged.
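To see why matching builtin type widths matters, the following is a minimal sketch, not code from the patch. It computes the layout of a hypothetical `struct S { int i; long l; };` under two sets of type sizes. The values are illustrative assumptions: a 64-bit Windows host with a 4-byte `long`, and the SPIR(-V) default of `LongWidth = LongAlign = 64` bits. The helper names (`TypeSize`, `alignUp`, `layoutOfIntThenLong`) are made up for the example.

```cpp
#include <cstddef>

// Size and alignment of a builtin type on some target, in bytes.
struct TypeSize {
  std::size_t size;
  std::size_t align;
};

// Round `offset` up to the next multiple of `align`.
constexpr std::size_t alignUp(std::size_t offset, std::size_t align) {
  return (offset + align - 1) / align * align;
}

// Result of laying out `struct S { int i; long l; };`.
struct Layout {
  std::size_t offsetOfLong;
  std::size_t size;
};

// Simplified layout rule: each field goes at the next offset that satisfies
// its alignment, and the struct size is padded to the largest alignment.
constexpr Layout layoutOfIntThenLong(TypeSize intTy, TypeSize longTy) {
  std::size_t offsetOfLong = alignUp(intTy.size, longTy.align);
  std::size_t structAlign =
      intTy.align > longTy.align ? intTy.align : longTy.align;
  std::size_t size = alignUp(offsetOfLong + longTy.size, structAlign);
  return Layout{offsetOfLong, size};
}

// Illustrative values (assumptions for this sketch, in bytes).
constexpr TypeSize kInt{4, 4};
constexpr TypeSize kHostLong{4, 4};   // e.g. a 64-bit Windows host
constexpr TypeSize kDeviceLong{8, 8}; // SPIR(-V) default without this change

constexpr Layout kHostLayout = layoutOfIntThenLong(kInt, kHostLong);
constexpr Layout kDeviceLayout = layoutOfIntThenLong(kInt, kDeviceLong);

// The same struct definition would have different sizes on the two sides.
static_assert(kHostLayout.size != kDeviceLayout.size,
              "host and device layouts disagree in this scenario");
```

Under these assumptions the host sees an 8-byte struct and the device a 16-byte one, so a buffer of such structs written by the host would be misread by the device. Copying the host's widths and alignments into the device target removes that mismatch.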

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

shangwuyao created this revision. Feb 14 2023, 2:13 PM

Herald added a project: Restricted Project. Feb 14 2023, 2:13 PM

Herald added subscribers: mattd, carlosgalvezp, ThomasRaoux.

shangwuyao requested review of this revision. Feb 14 2023, 2:13 PM

Herald added a project: Restricted Project. Feb 14 2023, 2:13 PM

Herald added a subscriber: cfe-commits.

Harbormaster completed remote builds in B213736: Diff 497439. Feb 14 2023, 3:30 PM

Making the builtin types consistent is necessary to keep struct layout consistent across host and device, but why do we need to make __GCC_ATOMIC_XXX_LOCK_FREE macros the same between the host and device? Is there any concrete issue if they are not the same?

In D144047#4129154, @yaxunl wrote:

Making the builtin types consistent is necessary to keep struct layout consistent across host and device, but why do we need to make __GCC_ATOMIC_XXX_LOCK_FREE macros the same between the host and device? Is there any concrete issue if they are not the same?

The reason is the same as for NVPTX; see https://github.com/llvm/llvm-project/blob/22882c39df71397cc6f9774d18e87d06e016c55f/clang/lib/Basic/Targets/NVPTX.cpp#L137-L141. Without it, we won't be able to use libraries that statically check __atomic_always_lock_free. I could add comments in the code if that makes things clearer.
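As a rough illustration of the kind of static check mentioned above, here is a minimal sketch. It assumes a GCC- or Clang-compatible compiler, which provides the `__atomic_always_lock_free` builtin and the `__GCC_ATOMIC_INT_LOCK_FREE` macro. The names `alwaysLockFree` and `SharedCounter` are hypothetical and do not come from the patch or from any particular library.

```cpp
#include <atomic>

// Hypothetical helper: true when objects of type T are always lock-free on
// the target currently being compiled for. The answer is fixed at compile
// time, so it can differ between the host and device compilation passes.
template <typename T>
constexpr bool alwaysLockFree() {
  return __atomic_always_lock_free(sizeof(T), 0);
}

// Hypothetical library type that refuses to compile unless T is lock-free.
// If the device pass reported a different answer than the host pass, a header
// containing this type could build for one side and fail for the other.
template <typename T>
struct SharedCounter {
  static_assert(alwaysLockFree<T>(),
                "SharedCounter requires a lock-free T on this target");
  std::atomic<T> value{0};
};

// The predefined macro expresses the same property for int: 2 means
// "always lock-free".
constexpr bool kIntMacroSaysLockFree = (__GCC_ATOMIC_INT_LOCK_FREE == 2);
```

A header that uses `SharedCounter<long long>` in code shared by host and device only compiles in both passes if the two targets agree on lock-freedom, which is what copying `MaxAtomicInlineWidth` from the host is meant to ensure.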

RSenApps added a subscriber: RSenApps. Feb 15 2023, 7:58 AM

In D144047#4129182, @shangwuyao wrote:

The reason is the same as for NVPTX; see https://github.com/llvm/llvm-project/blob/22882c39df71397cc6f9774d18e87d06e016c55f/clang/lib/Basic/Targets/NVPTX.cpp#L137-L141. Without it, we won't be able to use libraries that statically check __atomic_always_lock_free. I could add comments in the code if that makes things clearer.

I see. Better add some comments about that.

This also means the backend needs to handle atomic operations that the hardware does not support.

Amend with comments

In D144047#4129247, @yaxunl wrote:

I see. Better add some comments about that.

This also means the backend needs to handle atomic operations that the hardware does not support.

Yeah. It is probably the application developer's responsibility to not request atomics that are not supported by the hardware?

Harbormaster completed remote builds in B213916: Diff 497700. Feb 15 2023, 10:56 AM

Run clang-format.

Harbormaster completed remote builds in B214255: Diff 498145. Feb 16 2023, 3:27 PM

Friendly ping :-)

LGTM. Thanks

This revision is now accepted and ready to land. Feb 21 2023, 8:06 AM

Closed by commit rG8bd13ad6c537: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on… (authored by shangwuyao). Feb 22 2023, 7:46 AM

This revision was automatically updated to reflect the committed changes.

shangwuyao added a commit: rG8bd13ad6c537: [CUDA][SPIRV] Match builtin types and __GCC_ATOMIC_XXX_LOCK_FREE macros on….

Revision Contents

Path                                          Size
clang/lib/Basic/Targets/SPIR.h                51 lines
clang/test/CodeGenCUDASPIRV/cuda-types.cu     56 lines

Diff 499508

clang/lib/Basic/Targets/SPIR.h

 //===--- SPIR.h - Declare SPIR and SPIR-V target feature support - C++ --===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This file declares SPIR and SPIR-V TargetInfo objects.
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_CLANG_LIB_BASIC_TARGETS_SPIR_H
 #define LLVM_CLANG_LIB_BASIC_TARGETS_SPIR_H

+#include "Targets.h"
 #include "clang/Basic/TargetInfo.h"
 #include "clang/Basic/TargetOptions.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/TargetParser/Triple.h"
 #include <optional>

 namespace clang {
 namespace targets {
@@ 50 unchanged lines hidden @@ static const unsigned SPIRDefIsGenMap[] = {
     0, // ptr32_sptr
     0, // ptr32_uptr
     0, // ptr64
     0, // hlsl_groupshared
 };

 // Base class for SPIR and SPIR-V target info.
 class LLVM_LIBRARY_VISIBILITY BaseSPIRTargetInfo : public TargetInfo {
+  std::unique_ptr<TargetInfo> HostTarget;
+
 protected:
-  BaseSPIRTargetInfo(const llvm::Triple &Triple, const TargetOptions &)
+  BaseSPIRTargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts)
       : TargetInfo(Triple) {
     assert((Triple.isSPIR() || Triple.isSPIRV()) &&
            "Invalid architecture for SPIR or SPIR-V.");
     assert(getTriple().getOS() == llvm::Triple::UnknownOS &&
            "SPIR(-V) target must use unknown OS");
     assert(getTriple().getEnvironment() == llvm::Triple::UnknownEnvironment &&
            "SPIR(-V) target must use unknown environment type");
     TLSSupported = false;
     VLASupported = false;
     LongWidth = LongAlign = 64;
     AddrSpaceMap = &SPIRDefIsPrivMap;
     UseAddrSpaceMapMangling = true;
     HasLegalHalfType = true;
     HasFloat16 = true;
     // Define available target features
     // These must be defined in sorted order!
     NoAsmVariants = true;
+
+    llvm::Triple HostTriple(Opts.HostTriple);
+    if (!HostTriple.isSPIR() && !HostTriple.isSPIRV() &&
+        HostTriple.getArch() != llvm::Triple::UnknownArch) {
+      HostTarget.reset(AllocateTarget(llvm::Triple(Opts.HostTriple), Opts));
+
+      // Copy properties from host target.
+      BoolWidth = HostTarget->getBoolWidth();
+      BoolAlign = HostTarget->getBoolAlign();
+      IntWidth = HostTarget->getIntWidth();
+      IntAlign = HostTarget->getIntAlign();
+      HalfWidth = HostTarget->getHalfWidth();
+      HalfAlign = HostTarget->getHalfAlign();
+      FloatWidth = HostTarget->getFloatWidth();
+      FloatAlign = HostTarget->getFloatAlign();
+      DoubleWidth = HostTarget->getDoubleWidth();
+      DoubleAlign = HostTarget->getDoubleAlign();
+      LongWidth = HostTarget->getLongWidth();
+      LongAlign = HostTarget->getLongAlign();
+      LongLongWidth = HostTarget->getLongLongWidth();
+      LongLongAlign = HostTarget->getLongLongAlign();
+      MinGlobalAlign = HostTarget->getMinGlobalAlign(/* TypeSize = */ 0);
+      NewAlign = HostTarget->getNewAlign();
+      DefaultAlignForAttributeAligned =
+          HostTarget->getDefaultAlignForAttributeAligned();
+      IntMaxType = HostTarget->getIntMaxType();
+      WCharType = HostTarget->getWCharType();
+      WIntType = HostTarget->getWIntType();
+      Char16Type = HostTarget->getChar16Type();
+      Char32Type = HostTarget->getChar32Type();
+      Int64Type = HostTarget->getInt64Type();
+      SigAtomicType = HostTarget->getSigAtomicType();
+      ProcessIDType = HostTarget->getProcessIDType();
+
+      UseBitFieldTypeAlignment = HostTarget->useBitFieldTypeAlignment();
+      UseZeroLengthBitfieldAlignment =
+          HostTarget->useZeroLengthBitfieldAlignment();
+      UseExplicitBitFieldAlignment = HostTarget->useExplicitBitFieldAlignment();
+      ZeroLengthBitfieldBoundary = HostTarget->getZeroLengthBitfieldBoundary();
+
+      // This is a bit of a lie, but it controls __GCC_ATOMIC_XXX_LOCK_FREE, and
+      // we need those macros to be identical on host and device, because (among
+      // other things) they affect which standard library classes are defined,
+      // and we need all classes to be defined on both the host and device.
+      MaxAtomicInlineWidth = HostTarget->getMaxAtomicInlineWidth();
+    }
   }

 public:
   // SPIR supports the half type and the only llvm intrinsic allowed in SPIR is
   // memcpy as per section 3 of the SPIR spec.
   bool useFP16ConversionIntrinsics() const override { return false; }

   ArrayRef<Builtin::Info> getTargetBuiltins() const override {
@@ 179 unchanged lines hidden @@

clang/test/CodeGenCUDASPIRV/cuda-types.cu

This file was added.

// Check that types, widths, __CLANG_ATOMIC* macros, etc. match on the host and
// device sides of CUDA compilations. Note that we filter out long double and
// maxwidth of _BitInt(), as this is intentionally different on host and device.
//
// Also ignore __CLANG_ATOMIC_LLONG_LOCK_FREE on i386. The default host CPU for
// an i386 triple is typically at least an i586, which has cmpxchg8b (Clang
// feature, "cx8"). Therefore, __CLANG_ATOMIC_LLONG_LOCK_FREE is 2 on the host,
// but the value should be 1 for the device.
//
// Unlike CUDA, the width of SPIR-V POINTER type could differ between host and
// device, because SPIR-V explicitly sets POINTER type width. So it is the
// user's responsibility to choose the offload with the right POINTER size,
// otherwise the values for __CLANG_ATOMIC_POINTER_LOCK_FREE could be different.

// RUN: mkdir -p %t

// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv32 -target i386-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
// RUN:   | grep -E '__CLANG_ATOMIC' \
// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-host-defines-filtered
// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv32 -target i386-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
// RUN:   | grep -E '__CLANG_ATOMIC' \
// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-device-defines-filtered
// RUN: diff %t/i386-host-defines-filtered %t/i386-device-defines-filtered

// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv32 -target i386-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
// RUN:   | grep -E '__CLANG_ATOMIC' \
// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-msvc-host-defines-filtered
// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv32 -target i386-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
// RUN:   | grep -E '__CLANG_ATOMIC' \
// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/i386-msvc-device-defines-filtered
// RUN: diff %t/i386-msvc-host-defines-filtered %t/i386-msvc-device-defines-filtered

// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
// RUN:   | grep -E '__CLANG_ATOMIC' \
// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-host-defines-filtered
// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
// RUN:   | grep -E '__CLANG_ATOMIC' \
// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-device-defines-filtered
// RUN: diff %t/x86_64-host-defines-filtered %t/x86_64-device-defines-filtered

// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv64 -target powerpc64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
// RUN:   | grep -E '__CLANG_ATOMIC' \
// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/powerpc64-host-defines-filtered
// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv64 -target powerpc64-unknown-linux-gnu -x cuda -emit-llvm -E -dM -o - /dev/null \
// RUN:   | grep -E '__CLANG_ATOMIC' \
// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/powerpc64-device-defines-filtered
// RUN: diff %t/powerpc64-host-defines-filtered %t/powerpc64-device-defines-filtered

// RUN: %clang --cuda-host-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
// RUN:   | grep -E '__CLANG_ATOMIC' \
// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-msvc-host-defines-filtered
// RUN: %clang --cuda-device-only -nocudainc -nocudalib --offload=spirv64 -target x86_64-windows-msvc -x cuda -emit-llvm -E -dM -o - /dev/null \
// RUN:   | grep -E '__CLANG_ATOMIC' \
// RUN:   | grep -Ev '_ATOMIC_LLONG_LOCK_FREE' > %t/x86_64-msvc-device-defines-filtered
// RUN: diff %t/x86_64-msvc-host-defines-filtered %t/x86_64-msvc-device-defines-filtered