This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][HIP] Fix device variables used by host
Closed, Public

Authored by yaxunl on May 19 2021, 12:55 PM.

Details

Summary

Variables that are emitted on both the host and device sides as separate entities with different addresses should not have their device-side counterparts force-emitted when they are ODR-used by a host function.

This fixes the regression caused by https://reviews.llvm.org/D102237
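
For illustration only (this snippet is not from the patch's test suite; the names are made up), a minimal sketch of the situation described in the summary:

// A default constexpr variable with no explicit device attribute is emitted
// on both the host and the device side, as separate entities with different
// addresses.
constexpr int Answer = 42;

__global__ void kern(int *Out) {
  *Out = Answer;            // the device side uses its own copy
}

int hostUse() {
  const int *P = &Answer;   // ODR-use by a host function; this should not
  return *P;                // force the device-side copy to be emitted
}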

Diff Detail

Event Timeline

yaxunl created this revision.May 19 2021, 12:55 PM
yaxunl requested review of this revision.May 19 2021, 12:55 PM
tra added a reviewer: rsmith.May 19 2021, 1:59 PM
tra added a subscriber: rsmith.

Tentative LGTM as we need it to fix the regression soon.

Summoning @rsmith for the 'big picture' opinion.
While the patch may fix this particular regression, I wonder if there's a better way to deal with this. We're growing a bit too many nuances that would be hard to explain and may cause more corner cases to appear.

clang/lib/CodeGen/CodeGenModule.cpp
2391

IIUIC, the idea here is that we do not want to emit constexpr int foo; on the device, even if we happen to ODR-use it there. And the way we detect this is by checking for the implicit __constant__ that we happen to add to constexpr variables.

I think this may be relying on the implementation details too much. It also makes the compiler's behavior somewhat surprising -- we would potentially emit other variables that do not get any device attribute, but would not emit the variables with an implicit __constant__, which is a device attribute.

I'm not sure if we have any good options here. This may be an acceptable compromise, but I wonder if there's a better way to deal with this.

That said, this patch is OK to fix the regression we have now, but we may need to revisit this.

clang/test/CodeGenCUDA/host-used-device-var.cu
185–213

This definitely needs some comments. Otherwise this is nearly incomprehensible and it's impossible to tell what's going on.

tra added a comment.May 19 2021, 4:32 PM

This patch does not appear to fix the second regression introduced by D102237.

Trying to compile the following code triggers an assertion in CGExpr.cpp:

class a {
public:
  a(char *);
};
void b() {
  [](char *c) {
    static a d(c);
    d;
  };
}

With assertions disabled it eventually leads to a different error:
Module has a nontrivial global ctor, which NVPTX does not support.
https://godbolt.org/z/sYE1dKr1W

yaxunl updated this revision to Diff 346796.May 20 2021, 10:46 AM
yaxunl retitled this revision from [CUDA][HIP] Fix implicit constant variable to [CUDA][HIP] Fix device variables used by host.
yaxunl edited the summary of this revision. (Show Details)

Fix the other regression

yaxunl marked 2 inline comments as done.May 20 2021, 10:57 AM

This patch does not appear to fix the second regression introduced by D102237.

Trying to compile the following code triggers an assertion in CGExpr.cpp:

class a {
public:
  a(char *);
};
void b() {
  [](char *c) {
    static a d(c);
    d;
  };
}

With assertions disabled it eventually leads to a different error:
Module has a nontrivial global ctor, which NVPTX does not support.
https://godbolt.org/z/sYE1dKr1W

The root cause is similar to that of the last regression: when a variable is emitted on both sides but as different entities, we should not treat it as a device variable on the host side. I have updated the patch to fix both regressions.

clang/lib/CodeGen/CodeGenModule.cpp
2391

We need to differentiate constexpr int a and __constant__ constexpr int a, since the former is emitted on both sides and the latter is emitted only on the device side. It seems the only way to differentiate them is to check whether the constant attribute is explicit or not.
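
For illustration (not from the patch; the names are adjusted so both declarations can coexist), the contrast described above:

constexpr int a = 1;               // no explicit attribute: emitted on both
                                   // sides as separate entities
__constant__ constexpr int b = 2;  // explicit device attribute: emitted on the
                                   // device side only (the host gets a shadow)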

clang/test/CodeGenCUDA/host-used-device-var.cu
185–213

done

yaxunl marked 2 inline comments as done.May 20 2021, 11:04 AM

Tentative LGTM as we need it to fix the regression soon.

Summoning @rsmith for the 'big picture' opinion.
While the patch may fix this particular regression, I wonder if there's a better way to deal with this. We're growing a bit too many nuances that would be hard to explain and may cause more corner cases to appear.

In the updated patch I have a simpler solution which is easier to explain to users. Basically we classify variables by how they are emitted: device side only, host side only, both sides as different entities (e.g. a default constexpr var), and both sides as a unified entity (e.g. a managed var). For variables emitted on both sides as separate entities, we have limited knowledge and we limit what we can do for them. I think users should understand the compiler's limitation in such cases, and they can easily work around it by making the variable an explicit device variable.
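
A rough sketch of the four categories, with made-up variable names (__managed__ assumes HIP or CUDA with managed-variable support):

__device__ int DevOnly;            // device side only
int HostOnly;                      // host side only
constexpr int BothSeparate = 1;    // both sides, as different entities
__managed__ int BothUnified;       // both sides, as a unified entity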

tra added a comment.May 20 2021, 11:25 AM

In the updated patch I have a simpler solution which is easier to explain to users. Basically we classify variables by how they are emitted: device side only, host side only, both sides as different entities (e.g. a default constexpr var), and both sides as a unified entity (e.g. a managed var). For variables emitted on both sides as separate entities, we have limited knowledge and we limit what we can do for them. I think users should understand the compiler's limitation in such cases, and they can easily work around it by making the variable an explicit device variable.

This is really nice.

Let me test it internally and see if anything breaks.

clang/include/clang/Sema/Sema.h
12066

Wasn't there another kind, where the variable is emitted on the host with device-side shadow? I vaguely recall it had something to do with textures.

12067

I think we should mention the host-side shadows, too.

clang/lib/Sema/SemaCUDA.cpp
148–149

I'm still not a fan of relying on an implicit constant.
Can we change it to a more direct is-a-constexpr && !has-explicit-device-side-attr check?
We may eventually consider relaxing this to can-be-const-evaluated and allow const vars with known values.
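
A rough sketch (not the actual patch) of what the suggested check might look like, assuming Clang's AST APIs VarDecl::isConstexpr, Decl::getAttr, and Attr::isImplicit:

#include "clang/AST/Attr.h"
#include "clang/AST/Decl.h"

// True for a constexpr variable with no explicit device-side attribute,
// i.e. one that is emitted on both sides as separate entities.
static bool isConstexprWithoutExplicitDeviceAttr(const clang::VarDecl *VD) {
  if (!VD->isConstexpr())
    return false;
  // An implicit __constant__ added by the compiler does not make this a
  // device variable; only an attribute written by the user does.
  if (const auto *CA = VD->getAttr<clang::CUDAConstantAttr>())
    if (!CA->isImplicit())
      return false;
  if (VD->hasAttr<clang::CUDADeviceAttr>())
    return false;
  return true;
}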

yaxunl marked 3 inline comments as done.May 20 2021, 12:54 PM
yaxunl added inline comments.
clang/include/clang/Sema/Sema.h
12066

That was the first implementation, which was similar to a managed var but used pinned host memory as common memory shared by the device and the host.

However, that implementation was later replaced by one similar to nvcc's, in which textures and surfaces behave like usual device variables. So far I do not see a need to differentiate them from usual device variables.

12067

will do

clang/lib/Sema/SemaCUDA.cpp
148–149

Will do. I agree we should relax this for const vars in the future.

yaxunl updated this revision to Diff 346831.May 20 2021, 12:55 PM
yaxunl marked 3 inline comments as done.

Revised per Artem's comments

tra accepted this revision.May 20 2021, 1:31 PM

LGTM.

I've verified that Tensorflow still builds with this patch and that the patch does fix the regressions we've seen.
If you could land this patch soon, that would be appreciated.

This revision is now accepted and ready to land.May 20 2021, 1:31 PM
This revision was landed with ongoing or failed builds.May 20 2021, 2:04 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project.May 20 2021, 2:05 PM