This is an archive of the discontinued LLVM Phabricator instance.

Differential D85223

[CUDA][HIP] Support accessing static device variable in host code for -fgpu-rdc
Closed · Public

Authored by yaxunl on Aug 4 2020, 9:56 AM.

Details

Reviewers

tra
rjmccall
JonChesterfield
hliao
jdoerfert
hfinkel
ronlieb

Commits

rG47acdec1dd5d: [CUDA][HIP] Support accessing static device variable in host code for -fgpu-rdc

Summary

This is separated from https://reviews.llvm.org/D80858

For -fgpu-rdc mode, static device vars in different TUs may have the same name.
To support accessing file-scope static device variables in host code, we need to give them
a distinct name and external linkage. This can be done by postfixing each static device variable with
a distinct CUID (Compilation Unit ID) hash.

Since the static device variables have different names across compilation units, we now let
them have external linkage so that they can be looked up by the runtime.
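
As a rough illustration of the renaming described in the summary, the sketch below appends a per-TU postfix to a variable name. The helper `externalizedName` and the sample hash strings are hypothetical; only the `.static.` separator is taken from the patch itself, and the real hash is derived from the CUID by the compiler.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the patch): appends a per-TU postfix to the
// name of a static device variable so that it can be given external linkage
// without clashing with a same-named variable from another TU.
// `cuidHash` stands in for the hash the compiler derives from the CUID.
std::string externalizedName(const std::string &name,
                             const std::string &cuidHash) {
  assert(!cuidHash.empty() && "uniquifying requires a non-empty CUID hash");
  return name + ".static." + cuidHash;
}

// Usage: the same source-level name in two TUs yields two distinct symbols.
void demoExternalizedName() {
  assert(externalizedName("counter", "1a2b") == "counter.static.1a2b");
  assert(externalizedName("counter", "1a2b") !=
         externalizedName("counter", "3c4d"));
}
```

Because the host and device compilations of one TU receive the same CUID, both sides would arrive at the same symbol name.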

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

yaxunl created this revision. Aug 4 2020, 9:56 AM

Herald added a subscriber: dang. Aug 4 2020, 9:56 AM

yaxunl requested review of this revision. Aug 4 2020, 9:56 AM

yaxunl added a parent revision: D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.

I concede that making the variables external, and trying to give them unique names, does work around static variables not working. I believe static variables are subjected to more aggressive optimisation than external ones but the effect might not be significant.

This "works" in cuda today because the loader ignores the local annotation when accessing the variable. There is some probably unintended behaviour when multiple static variables have the same name in that the first one wins.

The corresponding change to the hsa loader is trivial. Why is making the symbols external, with the associated complexity in picking non-conflicting names, considered better than changing the loader?

Herald added a subscriber: dexonsmith. Dec 14 2020, 8:55 AM

In reply to @JonChesterfield (D85223#2452363):

Three reasons:

1. The loader would like to look up dynsym only, which conforms better to standard dynamic linker behavior and is more efficient than looking up all symbols.

2. There could be symbols with the same name from different compilation units, and they end up as local symbols with the same name in the binary. How does the loader know which is which?

3. If a device symbol is static but is actually accessed by the host code in the same compilation unit, the device symbol has de facto external linkage, since it is truly accessed by someone outside of the device object (this is due to the unfortunate fact that a single source file ends up with a host object and a device object even though they are supposed to be the same compilation unit). Keeping the device symbol with internal linkage will cause the compiler to over-optimize the device code.
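
The ambiguity raised in the second reason (and the "first one wins" behaviour mentioned earlier in the thread) can be sketched with a toy symbol table. Everything here (`Symbol`, `lookupByName`, the file names and hash strings) is hypothetical and only models a name-based lookup, not any real loader.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of a device symbol table; the fields are illustrative only.
struct Symbol {
  std::string name;
  std::string definingTU;
};

// Name-only lookup: returns the first match, or nullptr if there is none.
// A second symbol with the same name can never be reached this way.
const Symbol *lookupByName(const std::vector<Symbol> &table,
                           const std::string &name) {
  for (const Symbol &s : table)
    if (s.name == name)
      return &s;
  return nullptr;
}

void demoFirstOneWins() {
  // Two TUs each contribute a local symbol named "x": b.cu's copy is shadowed.
  std::vector<Symbol> locals = {{"x", "a.cu"}, {"x", "b.cu"}};
  assert(lookupByName(locals, "x")->definingTU == "a.cu");

  // With a per-TU postfix, each symbol is reachable by its own name.
  std::vector<Symbol> unique = {{"x.static.aaaa", "a.cu"},
                                {"x.static.bbbb", "b.cu"}};
  assert(lookupByName(unique, "x.static.bbbb")->definingTU == "b.cu");
  assert(lookupByName(unique, "x") == nullptr);
}
```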

yaxunl added a reviewer: ronlieb. Jan 18 2021, 8:05 AM

I'd propose splitting the patch into two: one that adds CUID and another that changes the way we handle static vars.
CUID is useful on its own and is relatively uncontroversial.

Externalizing static vars is a more interesting issue and I'm not sure yet what the best way to handle it is. On one hand, it is necessary for visibility across host/device; on the other, externalizing all static vars will almost always have a negative effect, as very few of the static vars actually need this. As already pointed out in the #if 0 section of the patch, ideally we should externalize only the vars that need it. Generally speaking, I do not think we will be able to do that, because with -fgpu-rdc a var may be used from the host code in some other TU.

We may need to explicitly annotate the static variables that need to be visible on both sides and only apply externalization to the variables annotated this way. E.g. require them to be __host__ __device__.

WDYT?

In reply to @tra (D85223#2507518):

Agree that CUID may be useful for other situations. Will separate it to another review.

yaxunl mentioned this in D95007: [CUDA][HIP] Add -fuse-cuid. Jan 19 2021, 3:20 PM

yaxunl added a parent revision: D95007: [CUDA][HIP] Add -fuse-cuid.

Separate CUID patch.

tra added inline comments. Jan 20 2021, 11:42 AM

clang/lib/AST/ASTContext.cpp
11446–11447	`!(getLangOpts().GPURelocatableDeviceCode && getLangOpts().CUID.empty())`. Maybe this should be broken down into something easier to read.
	  // Applies only to -fgpu-rdc or when we were given a CUID
	  if (!getLangOpts().GPURelocatableDeviceCode || !getLangOpts().CUID.empty())
	    return false;
	  // .. only file-scope static vars...
	  auto *VD = dyn_cast<VarDecl>(D);
	  if (!(VD && VD->isFileVarDecl() && VD->getStorageClass() == SC_Static))
	    return false;
	  // .. with explicit __device__ or __constant__ attributes.
	  return ((D->hasAttr<CUDADeviceAttr>() && !D->getAttr<CUDADeviceAttr>()->isImplicit()) ||
	          (D->hasAttr<CUDAConstantAttr>() && !D->getAttr<CUDAConstantAttr>()->isImplicit()));
11446–11447	BTW, does this mean that we'll externalize & uniquify the vars even w/o `-fgpu-rdc` if CUID is given? IMO `-fgpu-rdc` should remain the flag to control whether externalization is needed. CUID controls the value of a unique suffix, if we need it, but should not automatically enable externalization.
clang/lib/CodeGen/CodeGenModule.cpp
2865–2866	Is this code needed?

yaxunl marked 3 inline comments as done. Feb 7 2021, 8:57 PM

yaxunl added inline comments.

clang/lib/AST/ASTContext.cpp
11446–11447	done
11446–11447	A true return from mayExternalizeStaticVar does not mean the static var must be externalized; it only indicates that the static var may be externalized. It is used to enable checking whether this var is used by host code. For -fno-gpu-rdc, we only externalize a static variable if it is referenced by host code. In that case -fno-gpu-rdc changes its linkage to external but does not need to make the symbol unique, because each TU ends up as a different device binary.
clang/lib/CodeGen/CodeGenModule.cpp
2865–2866	this code is not needed. removed.
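
The distinction drawn in the reply above between "may externalize" and "should externalize", plus the extra condition for the unique postfix, can be modelled with plain booleans. This is a simplified sketch, not clang's API: `VarInfo` and the three free functions are hypothetical stand-ins for the attribute queries and language options used in the patch.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for the properties the patch queries on a declaration.
struct VarInfo {
  bool isStatic = false;
  bool explicitDeviceOrConstant = false; // explicit __device__ / __constant__
  bool managed = false;                  // __managed__
  bool referencedByHost = false;         // used by host code in the same TU
};

// Candidate check: the variable *may* need externalizing.
bool mayExternalize(const VarInfo &v) {
  return v.isStatic && (v.managed || v.explicitDeviceOrConstant);
}

// Final decision: managed variables, or candidates that host code references.
bool shouldExternalize(const VarInfo &v) {
  return mayExternalize(v) && (v.managed || v.referencedByHost);
}

// The unique postfix is only needed under -fgpu-rdc with a non-empty CUID.
bool needsUniquePostfix(const VarInfo &v, bool gpuRDC,
                        const std::string &cuid) {
  return shouldExternalize(v) && gpuRDC && !cuid.empty();
}

void demoExternalizeDecision() {
  VarInfo v;
  v.isStatic = true;
  v.explicitDeviceOrConstant = true;
  assert(mayExternalize(v) && !shouldExternalize(v)); // not used by host yet
  v.referencedByHost = true;
  assert(shouldExternalize(v));
  assert(!needsUniquePostfix(v, /*gpuRDC=*/false, "abc")); // -fno-gpu-rdc
  assert(needsUniquePostfix(v, /*gpuRDC=*/true, "abc"));   // -fgpu-rdc
}
```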

Revised per Artem's comments. Use CUID hash as postfix for static variable name.

yaxunl added a parent revision: D96195: [HIP] Fix managed variable linkage. Feb 7 2021, 9:00 PM

LGTM with new test nits.

@JonChesterfield -- are you OK with the patch?

clang/test/CodeGenCUDA/device-var-linkage.cu
40	It should probably be a regex after `HASH:`, not the hash value itself.
clang/test/CodeGenCUDA/managed-var.cu
54	Same here.
clang/test/CodeGenCUDA/static-device-var-rdc.cu
35	ditto.
clang/test/SemaCUDA/static-device-var.cu
11	A comment explaining what we're testing would be helpful. `no-diagnostics` gives no clue about what it is we're looking for here.
15–23	So, this verifies that we're allowed to use static local vars in device code. A comment would be useful.
24–38	And this verifies that global static vars can be referenced from both host and device. I'd also add a negative test with `static int host_only;` and would verify that we still don't allow accessing it from the device.

This revision is now accepted and ready to land. Feb 9 2021, 10:31 AM

This works around the limitations of the binary format nvptx and amdgpu are using in the compiler. It's the wrong place in the stack to fix it - we could introduce another symbol table in the binary to capture the per-tu-between-arch scoping.

However, if we later reach consensus on what to do in the elf instead, we can still do that. In particular, embedding an elf for one arch in a named section of an elf for a host arch is crude. This workaround seems acceptable in the meantime.

What breaks existing abstractions is that we produce N ELF objects from a single TU, and the meaning of static becomes fuzzy. On one hand, we don't want that static symbol to be visible across objects on the same target; at the same time, we do want it to be visible across host/device objects compiled from the same TU. ELF does not have a way to express this. Making such symbols visible with a unique suffix seems to be a reasonable tradeoff. We probably have more options available for AMDGPU. E.g. as you've suggested, give the runtime extra clues about referencing these variables across the host/device boundary without resorting to making them externally visible. However, we don't have that flexibility for NVPTX.

In reply to @JonChesterfield (D85223#2551894):

Yes we should revisit this if there is a better solution.

clang/test/CodeGenCUDA/device-var-linkage.cu
40	will do
clang/test/CodeGenCUDA/managed-var.cu
54	will do
clang/test/CodeGenCUDA/static-device-var-rdc.cu
35	will do
clang/test/SemaCUDA/static-device-var.cu
11	will do
15–23	will do
24–38	will do

Closed by commit rG47acdec1dd5d: [CUDA][HIP] Support accessing static device variable in host code for -fgpu-rdc (authored by yaxunl). Feb 24 2021, 3:41 PM

This revision was automatically updated to reflect the committed changes.

yaxunl marked 6 inline comments as done.

yaxunl added a commit: rG47acdec1dd5d: [CUDA][HIP] Support accessing static device variable in host code for -fgpu-rdc.

Herald added a project: Restricted Project. Feb 24 2021, 3:41 PM

Hahnfeld added a subscriber: Hahnfeld. Aug 20 2021, 4:33 AM

Hahnfeld added inline comments.

clang/lib/CodeGen/CodeGenModule.cpp
6265–6268	I've tried to use this with CUDA, but it errors out because `.` is not allowed in identifiers. Could you check if https://reviews.llvm.org/D108456 also works for HIP?

yaxunl added inline comments. Aug 20 2021, 11:06 AM

clang/lib/CodeGen/CodeGenModule.cpp
6265–6268	I will try it with our CI and get back to you.
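
To make the problem reported above concrete, here is a small checker for PTX-style identifiers. It assumes the identifier grammar as I understand it from the PTX ISA documentation (a letter followed by letters, digits, `_` or `$`; or one of `_`, `$`, `%` followed by at least one such character). The function names are hypothetical, and the dot-free spelling in the usage is only an illustration, not necessarily what D108456 adopts.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Characters allowed after the first one: letters, digits, '_' and '$'.
static bool isFollowChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '$';
}

// Returns true if `name` matches the assumed PTX identifier grammar.
bool isValidPtxIdentifier(const std::string &name) {
  if (name.empty())
    return false;
  const char first = name[0];
  const bool letterStart = std::isalpha(static_cast<unsigned char>(first)) != 0;
  const bool specialStart = first == '_' || first == '$' || first == '%';
  if (!letterStart && !specialStart)
    return false;
  // A special first character must be followed by at least one more character.
  if (specialStart && name.size() < 2)
    return false;
  for (std::string::size_type i = 1; i < name.size(); ++i)
    if (!isFollowChar(name[i]))
      return false;
  return true;
}

void demoPtxIdentifier() {
  // The postfix printed by this patch contains '.', which the grammar rejects.
  assert(!isValidPtxIdentifier("x.static.abc"));
  // A hypothetical dot-free spelling passes the same check.
  assert(isValidPtxIdentifier("x__static__abc"));
}
```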

Hahnfeld mentioned this in D108456: [CUDA] Fix static device variables with -fgpu-rdc. Aug 21 2021, 8:26 AM

Revision Contents

Path	Size
clang/include/clang/AST/ASTContext.h	6 lines
clang/lib/AST/ASTContext.cpp	19 lines
clang/lib/CodeGen/CGCUDANV.cpp	11 lines
clang/lib/CodeGen/CodeGenModule.h	4 lines
clang/lib/CodeGen/CodeGenModule.cpp	23 lines
clang/test/CodeGenCUDA/device-var-linkage.cu	11 lines
clang/test/CodeGenCUDA/managed-var.cu	28 lines
clang/test/CodeGenCUDA/static-device-var-rdc.cu	97 lines
clang/test/SemaCUDA/static-device-var.cu	50 lines

Diff 326223

clang/include/clang/AST/ASTContext.h

@@ class ASTContext : public RefCountedBase<ASTContext> {
   mutable llvm::FoldingSet<TemplateParamObjectDecl> TemplateParamObjectDecls;

   /// A cache mapping a string value to a StringLiteral object with the same
   /// value.
   ///
   /// This is lazily created. This is intentionally not serialized.
   mutable llvm::StringMap<StringLiteral *> StringLiteralCache;

+  /// MD5 hash of CUID. It is calculated when first used and cached by this
+  /// data member.
+  mutable std::string CUIDHash;
+
   /// Representation of a "canonical" template template parameter that
   /// is used in canonical template names.
   class CanonicalTemplateTemplateParm : public llvm::FoldingSetNode {
     TemplateTemplateParmDecl *Parm;

   public:
     CanonicalTemplateTemplateParm(TemplateTemplateParmDecl *Parm)
         : Parm(Parm) {}
@@ public:
   OMPTraitInfo &getNewOMPTraitInfo();

   /// Whether a C++ static variable may be externalized.
   bool mayExternalizeStaticVar(const Decl *D) const;

   /// Whether a C++ static variable should be externalized.
   bool shouldExternalizeStaticVar(const Decl *D) const;

+  StringRef getCUIDHash() const;
+
 private:
   /// All OMPTraitInfo objects live in this collection, one per
   /// `pragma omp [begin] declare variant` directive.
   SmallVector<std::unique_ptr<OMPTraitInfo>, 4> OMPTraitInfoVector;
 };

 /// Insertion operator for diagnostics.
 const StreamingDiagnostic &operator<<(const StreamingDiagnostic &DB,

clang/lib/AST/ASTContext.cpp

@@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Triple.h"
 #include "llvm/Support/Capacity.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/MD5.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include <algorithm>
 #include <cassert>
 #include <cstddef>
 #include <cstdint>
 #include <cstdlib>
 #include <map>
@@ static GVALinkage adjustGVALinkageForAttributes(const ASTContext &Context,
   } else if (Context.getLangOpts().CUDA && Context.getLangOpts().CUDAIsDevice) {
     // Device-side functions with __global__ attribute must always be
     // visible externally so they can be launched from host.
     if (D->hasAttr<CUDAGlobalAttr>() &&
         (L == GVA_DiscardableODR || L == GVA_Internal))
       return GVA_StrongODR;
     // Single source offloading languages like CUDA/HIP need to be able to
     // access static device variables from host code of the same compilation
-    // unit. This is done by externalizing the static variable.
+    // unit. This is done by externalizing the static variable with a shared
+    // name between the host and device compilation which is the same for the
+    // same compilation unit whereas different among different compilation
+    // units.
     if (Context.shouldExternalizeStaticVar(D))
       return GVA_StrongExternal;
   }
   return L;
 }

 /// Adjust the GVALinkage for a declaration based on what an external AST source
 /// knows about whether there can be other definitions of this declaration.
@@ case BuiltinType::SatULongAccum:
     return SatLongAccumTy;
   case BuiltinType::UShortFract:
     return ShortFractTy;
   case BuiltinType::UFract:
     return FractTy;
   case BuiltinType::ULongFract:
     return LongFractTy;
   case BuiltinType::SatUShortFract:
     return SatShortFractTy;
   case BuiltinType::SatUFract:
     return SatFractTy;
   case BuiltinType::SatULongFract:
     return SatLongFractTy;
   default:
     llvm_unreachable("Unexpected unsigned fixed point type");
   }
 }

@@ bool ASTContext::mayExternalizeStaticVar(const Decl *D) const {
   bool IsStaticVar =
       isa<VarDecl>(D) && cast<VarDecl>(D)->getStorageClass() == SC_Static;
   bool IsExplicitDeviceVar = (D->hasAttr<CUDADeviceAttr>() &&
                               !D->getAttr<CUDADeviceAttr>()->isImplicit()) ||
                              (D->hasAttr<CUDAConstantAttr>() &&
                               !D->getAttr<CUDAConstantAttr>()->isImplicit());
   // CUDA/HIP: static managed variables need to be externalized since it is
   // a declaration in IR, therefore cannot have internal linkage.
-  // ToDo: externalize static variables for -fgpu-rdc.
   return IsStaticVar &&
-         (D->hasAttr<HIPManagedAttr>() ||
-          (!getLangOpts().GPURelocatableDeviceCode && IsExplicitDeviceVar));
+         (D->hasAttr<HIPManagedAttr>() || IsExplicitDeviceVar);
 }

 bool ASTContext::shouldExternalizeStaticVar(const Decl *D) const {
   return mayExternalizeStaticVar(D) &&
          (D->hasAttr<HIPManagedAttr>() ||
           CUDAStaticDeviceVarReferencedByHost.count(cast<VarDecl>(D)));
 }
+
+StringRef ASTContext::getCUIDHash() const {
+  if (!CUIDHash.empty())
+    return CUIDHash;
+  if (LangOpts.CUID.empty())
+    return StringRef();
+  CUIDHash = llvm::utohexstr(llvm::MD5Hash(LangOpts.CUID), /*LowerCase=*/true);
+  return CUIDHash;
+}
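
The compute-on-first-use caching in getCUIDHash can be sketched outside of clang as follows. This is a minimal standalone model: the class `CompilationUnit` is hypothetical, and `std::hash` is only a stand-in for the MD5 hash used by the patch.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <utility>

class CompilationUnit {
public:
  explicit CompilationUnit(std::string cuid) : cuid_(std::move(cuid)) {}

  // Returns a lowercase hex hash of the CUID, computed on first use and then
  // cached. An empty CUID yields an empty hash.
  // Assumption: std::hash replaces MD5 purely for illustration.
  const std::string &cuidHash() const {
    if (cachedHash_.empty() && !cuid_.empty()) {
      std::ostringstream os;
      os << std::hex << std::hash<std::string>{}(cuid_);
      cachedHash_ = os.str();
    }
    return cachedHash_;
  }

private:
  std::string cuid_;
  mutable std::string cachedHash_; // mutable so the const getter can fill it
};

void demoLazyHash() {
  CompilationUnit none("");
  assert(none.cuidHash().empty());

  CompilationUnit tu("abc");
  const std::string first = tu.cuidHash();
  assert(!first.empty());
  assert(tu.cuidHash() == first); // second call returns the cached value
}
```

Note that `std::hash` is not guaranteed to produce the same value across implementations, so it would not be suitable where separate host and device compilations must agree on the name; a fixed algorithm such as MD5 avoids that problem.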

clang/lib/CodeGen/CGCUDANV.cpp

@@ std::string CGNVCUDARuntime::getDeviceSideName(const NamedDecl *ND) {
   std::string DeviceSideName;
   if (DeviceMC->shouldMangleDeclName(ND)) {
     SmallString<256> Buffer;
     llvm::raw_svector_ostream Out(Buffer);
     DeviceMC->mangleName(GD, Out);
     DeviceSideName = std::string(Out.str());
   } else
     DeviceSideName = std::string(ND->getIdentifier()->getName());
+
+  // Make unique name for device side static file-scope variable for HIP.
+  if (CGM.getContext().shouldExternalizeStaticVar(ND) &&
+      CGM.getLangOpts().GPURelocatableDeviceCode &&
+      !CGM.getLangOpts().CUID.empty()) {
+    SmallString<256> Buffer;
+    llvm::raw_svector_ostream Out(Buffer);
+    Out << DeviceSideName;
+    CGM.printPostfixForExternalizedStaticVar(Out);
+    DeviceSideName = std::string(Out.str());
+  }
   return DeviceSideName;
 }

 void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF,
                                      FunctionArgList &Args) {
   EmittedKernels.push_back({CGF.CurFn, CGF.CurFuncDecl});
   if (CudaFeatureEnabled(CGM.getTarget().getSDKVersion(),
                          CudaFeature::CUDA_USES_NEW_LAUNCH) ||

clang/lib/CodeGen/CodeGenModule.h

@@ CharUnits getNaturalTypeAlignment(QualType T,
                                     LValueBaseInfo *BaseInfo = nullptr,
                                     TBAAAccessInfo *TBAAInfo = nullptr,
                                     bool forPointeeType = false);
   CharUnits getNaturalPointeeTypeAlignment(QualType T,
                                            LValueBaseInfo *BaseInfo = nullptr,
                                            TBAAAccessInfo *TBAAInfo = nullptr);
   bool stopAutoInit();

+  /// Print the postfix for externalized static variable for single source
+  /// offloading languages CUDA and HIP.
+  void printPostfixForExternalizedStaticVar(llvm::raw_ostream &OS) const;
+
 private:
   llvm::Constant *GetOrCreateLLVMFunction(
       StringRef MangledName, llvm::Type *Ty, GlobalDecl D, bool ForVTable,
       bool DontDefer = false, bool IsThunk = false,
       llvm::AttributeList ExtraAttrs = llvm::AttributeList(),
       ForDefinition_t IsForDefinition = NotForDefinition);

   llvm::Constant *GetOrCreateMultiVersionResolver(GlobalDecl GD,

clang/lib/CodeGen/CodeGenModule.cpp

@@ if (FD->isMultiVersion() && !OmitMultiVersionMangling) {
       case MultiVersionKind::Target:
         AppendTargetMangling(CGM, FD->getAttr<TargetAttr>(), Out);
         break;
       case MultiVersionKind::None:
         llvm_unreachable("None multiversion type isn't valid here");
       }
     }

+  // Make unique name for device side static file-scope variable for HIP.
+  if (CGM.getContext().shouldExternalizeStaticVar(ND) &&
+      CGM.getLangOpts().GPURelocatableDeviceCode &&
+      CGM.getLangOpts().CUDAIsDevice && !CGM.getLangOpts().CUID.empty())
+    CGM.printPostfixForExternalizedStaticVar(Out);
   return std::string(Out.str());
 }

 void CodeGenModule::UpdateMultiVersionNames(GlobalDecl GD,
                                             const FunctionDecl *FD) {
   if (!FD->isMultiVersion())
     return;

@@ if (const auto *CD = dyn_cast<CXXConstructorDecl>(CanonicalGD.getDecl())) {
     if (!getTarget().getCXXABI().hasConstructorVariants()) {
       CXXCtorType OrigCtorType = GD.getCtorType();
       assert(OrigCtorType == Ctor_Base || OrigCtorType == Ctor_Complete);
       if (OrigCtorType == Ctor_Base)
         CanonicalGD = GlobalDecl(CD, Ctor_Complete);
     }
   }

+  // In CUDA/HIP device compilation with -fgpu-rdc, the mangled name of a
+  // static device variable depends on whether the variable is referenced by
+  // a host or device host function. Therefore the mangled name cannot be
+  // cached.
+  if (!LangOpts.CUDAIsDevice ||
+      !getContext().mayExternalizeStaticVar(GD.getDecl())) {
     auto FoundName = MangledDeclNames.find(CanonicalGD);
     if (FoundName != MangledDeclNames.end())
       return FoundName->second;
+  }

   // Keep the first result in the case of a mangling collision.
   const auto *ND = cast<NamedDecl>(GD.getDecl());
   std::string MangledName = getMangledNameImpl(*this, GD, ND);

   // Ensure either we have different ABIs between host and device compilations,
   // says host compilation following MSVC ABI but device compilation follows
   // Itanium C++ ABI or, if they follow the same ABI, kernel names after
@@ void CodeGenModule::EmitGlobal(GlobalDecl GD) {
   // If we're deferring emission of a C++ variable with an
   // initializer, remember the order in which it appeared in the file.
   if (getLangOpts().CPlusPlus && isa<VarDecl>(Global) &&
       cast<VarDecl>(Global)->hasInit()) {
     DelayedCXXInitPosition[Global] = CXXGlobalInits.size();
     CXXGlobalInits.push_back(nullptr);
   }

   StringRef MangledName = getMangledName(GD);
   if (GetGlobalValue(MangledName) != nullptr) {
     // The value has already been used and should therefore be emitted.
     addDeferredDeclToEmit(GD);
   } else if (MustBeEmitted(Global)) {
     // The value must be emitted, but cannot be emitted eagerly.
     assert(!MayBeEmittedEagerly(Global));
     addDeferredDeclToEmit(GD);
   } else {
     // Otherwise, remember that we saw a deferred decl with this name. The
▲ Show 20 Lines • Show All 3,381 Lines • ▼ Show 20 Lines	if (!NumAutoVarInit) {
LangOptions::TrivialAutoVarInitKind::Zero		LangOptions::TrivialAutoVarInitKind::Zero
? "zero"		? "zero"
: "pattern");		: "pattern");
}		}
++NumAutoVarInit;		++NumAutoVarInit;
}		}
return false;		return false;
}		}

		void CodeGenModule::printPostfixForExternalizedStaticVar(
		llvm::raw_ostream &OS) const {
		OS << ".static." << getContext().getCUIDHash();
		}
		Hahnfeld: I've tried to use this with CUDA, but it errors out because `.` is not allowed in identifiers. Could you check if https://reviews.llvm.org/D108456 also works for HIP?
		yaxunl (author): I will try it with our CI and get back to you.
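
The helper above only prints the postfix. The test expectations in this diff (`@_ZL1x.static.[[HASH]]`, `@_ZL2sx.static.[[HASH]].managed`) suggest how it composes with the mangled name. A minimal standalone sketch of that composition follows. It assumes the CUID hash is an opaque string passed in as a parameter; `staticVarPostfix`, `externalizedName`, and `managedShadowName` are hypothetical names for illustration and are not Clang APIs. The real code streams into an `llvm::raw_ostream` and takes the hash from the `ASTContext`.

```cpp
#include <string>

// Hypothetical stand-in for the postfix printed by
// CodeGenModule::printPostfixForExternalizedStaticVar: ".static." + hash.
// Assumption: the hash is an opaque string; here the caller supplies it.
std::string staticVarPostfix(const std::string &cuidHash) {
  return ".static." + cuidHash;
}

// Name of an externalized file-scope static device variable:
// the mangled name followed by the per-compilation-unit postfix.
std::string externalizedName(const std::string &mangledName,
                             const std::string &cuidHash) {
  return mangledName + staticVarPostfix(cuidHash);
}

// For __managed__ variables the tests show an additional ".managed"
// symbol that carries the same postfix before the ".managed" suffix.
std::string managedShadowName(const std::string &mangledName,
                              const std::string &cuidHash) {
  return externalizedName(mangledName, cuidHash) + ".managed";
}
```

With a made-up hash `h1`, `externalizedName("_ZL1x", "h1")` yields `_ZL1x.static.h1`. Two translation units that each define `static __device__ int x` would receive different hashes and therefore distinct symbols, while the host and device passes of one translation unit must agree on the hash so that the name registered on the host matches the device symbol. The `POSTFIX` checks in the tests below verify that agreement.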

clang/test/CodeGenCUDA/device-var-linkage.cu

	// RUN: %clang_cc1 -triple nvptx -fcuda-is-device \			// RUN: %clang_cc1 -triple nvptx -fcuda-is-device \
	// RUN: -emit-llvm -o - -x hip %s \			// RUN: -emit-llvm -o - -x hip %s \
	// RUN: | FileCheck -check-prefixes=DEV,NORDC %s			// RUN: | FileCheck -check-prefixes=DEV,NORDC %s
	// RUN: %clang_cc1 -triple nvptx -fcuda-is-device \			// RUN: %clang_cc1 -triple nvptx -fcuda-is-device \
	// RUN: -fgpu-rdc -emit-llvm -o - -x hip %s \			// RUN: -fgpu-rdc -cuid=abc -emit-llvm -o - -x hip %s \
	// RUN: | FileCheck -check-prefixes=DEV,RDC %s			// RUN: | FileCheck -check-prefixes=DEV,RDC %s
	// RUN: %clang_cc1 -triple nvptx \			// RUN: %clang_cc1 -triple nvptx \
	// RUN: -emit-llvm -o - -x hip %s \			// RUN: -emit-llvm -o - -x hip %s \
	// RUN: | FileCheck -check-prefixes=HOST,NORDC-H %s			// RUN: | FileCheck -check-prefixes=HOST,NORDC-H %s
	// RUN: %clang_cc1 -triple nvptx \			// RUN: %clang_cc1 -triple nvptx \
	// RUN: -fgpu-rdc -emit-llvm -o - -x hip %s \			// RUN: -fgpu-rdc -cuid=abc -emit-llvm -o - -x hip %s \
	// RUN: | FileCheck -check-prefixes=HOST,RDC-H %s			// RUN: | FileCheck -check-prefixes=HOST,RDC-H %s

	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

	// DEV-DAG: @v1 = dso_local addrspace(1) externally_initialized global i32 0			// DEV-DAG: @v1 = dso_local addrspace(1) externally_initialized global i32 0
	// NORDC-H-DAG: @v1 = internal global i32 undef			// NORDC-H-DAG: @v1 = internal global i32 undef
	// RDC-H-DAG: @v1 = dso_local global i32 undef			// RDC-H-DAG: @v1 = dso_local global i32 undef
	__device__ int v1;			__device__ int v1;
	Show All 12 Lines
	// DEV-DAG: @ev2 = external addrspace(4) global i32			// DEV-DAG: @ev2 = external addrspace(4) global i32
	// HOST-DAG: @ev2 = external global i32			// HOST-DAG: @ev2 = external global i32
	extern __constant__ int ev2;			extern __constant__ int ev2;
	// DEV-DAG: @ev3 = external addrspace(1) externally_initialized global i32 addrspace(1)*			// DEV-DAG: @ev3 = external addrspace(1) externally_initialized global i32 addrspace(1)*
	// HOST-DAG: @ev3 = external externally_initialized global i32*			// HOST-DAG: @ev3 = external externally_initialized global i32*
	extern __managed__ int ev3;			extern __managed__ int ev3;

	// NORDC-DAG: @_ZL3sv1 = dso_local addrspace(1) externally_initialized global i32 0			// NORDC-DAG: @_ZL3sv1 = dso_local addrspace(1) externally_initialized global i32 0
	// RDC-DAG: @_ZL3sv1 = internal addrspace(1) global i32 0			// RDC-DAG: @_ZL3sv1.static.[[HASH:.*]] = dso_local addrspace(1) externally_initialized global i32 0
				tra: It should probably be a regex after `HASH:`, not the hash value itself.
				yaxunl (author): will do
	// HOST-DAG: @_ZL3sv1 = internal global i32 undef			// HOST-DAG: @_ZL3sv1 = internal global i32 undef
	static __device__ int sv1;			static __device__ int sv1;
	// NORDC-DAG: @_ZL3sv2 = dso_local addrspace(4) externally_initialized global i32 0			// NORDC-DAG: @_ZL3sv2 = dso_local addrspace(4) externally_initialized global i32 0
	// RDC-DAG: @_ZL3sv2 = internal addrspace(4) global i32 0			// RDC-DAG: @_ZL3sv2.static.[[HASH]] = dso_local addrspace(4) externally_initialized global i32 0
	// HOST-DAG: @_ZL3sv2 = internal global i32 undef			// HOST-DAG: @_ZL3sv2 = internal global i32 undef
	static __constant__ int sv2;			static __constant__ int sv2;
	// DEV-DAG: @_ZL3sv3 = dso_local addrspace(1) externally_initialized global i32 addrspace(1)* null			// NORDC-DAG: @_ZL3sv3 = dso_local addrspace(1) externally_initialized global i32 addrspace(1)* null
				// RDC-DAG: @_ZL3sv3.static.[[HASH]] = dso_local addrspace(1) externally_initialized global i32 addrspace(1)* null
	// HOST-DAG: @_ZL3sv3 = internal externally_initialized global i32* null			// HOST-DAG: @_ZL3sv3 = internal externally_initialized global i32* null
	static __managed__ int sv3;			static __managed__ int sv3;

	__device__ __host__ int work(int *x);			__device__ __host__ int work(int *x);

	__device__ __host__ int fun1() {			__device__ __host__ int fun1() {
	return work(&ev1) + work(&ev2) + work(&ev3) + work(&sv1) + work(&sv2) + work(&sv3);			return work(&ev1) + work(&ev2) + work(&ev3) + work(&sv1) + work(&sv2) + work(&sv3);
	}			}
	Show All 10 Lines

clang/test/CodeGenCUDA/managed-var.cu

	// REQUIRES: x86-registered-target, amdgpu-registered-target			// REQUIRES: x86-registered-target, amdgpu-registered-target

	// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -std=c++11 \			// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -std=c++11 \
	// RUN: -emit-llvm -o - -x hip %s | FileCheck \			// RUN: -emit-llvm -o - -x hip %s | FileCheck \
	// RUN: -check-prefixes=COMMON,DEV %s			// RUN: -check-prefixes=COMMON,DEV,NORDC-D %s

	// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -std=c++11 \			// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -std=c++11 \
	// RUN: -emit-llvm -fgpu-rdc -o - -x hip %s | FileCheck \			// RUN: -emit-llvm -fgpu-rdc -cuid=abc -o - -x hip %s > %t.dev
	// RUN: -check-prefixes=COMMON,DEV %s			// RUN: cat %t.dev | FileCheck -check-prefixes=COMMON,DEV,RDC-D %s

	// RUN: %clang_cc1 -triple x86_64-gnu-linux -std=c++11 \			// RUN: %clang_cc1 -triple x86_64-gnu-linux -std=c++11 \
	// RUN: -emit-llvm -o - -x hip %s | FileCheck \			// RUN: -emit-llvm -o - -x hip %s | FileCheck \
	// RUN: -check-prefixes=COMMON,HOST,NORDC %s			// RUN: -check-prefixes=COMMON,HOST,NORDC %s

	// RUN: %clang_cc1 -triple x86_64-gnu-linux -std=c++11 \			// RUN: %clang_cc1 -triple x86_64-gnu-linux -std=c++11 \
	// RUN: -emit-llvm -fgpu-rdc -o - -x hip %s | FileCheck \			// RUN: -emit-llvm -fgpu-rdc -cuid=abc -o - -x hip %s > %t.host
	// RUN: -check-prefixes=COMMON,HOST,RDC %s			// RUN: cat %t.host | FileCheck -check-prefixes=COMMON,HOST,RDC %s

				// Check device and host compilation use the same postfix for static
				// variable name.

				// RUN: cat %t.dev %t.host | FileCheck -check-prefix=POSTFIX %s

	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

	struct vec {			struct vec {
	float x,y,z;			float x,y,z;
	};			};

	// DEV-DAG: @x.managed = dso_local addrspace(1) externally_initialized global i32 1, align 4			// DEV-DAG: @x.managed = dso_local addrspace(1) externally_initialized global i32 1, align 4
	Show All 14 Lines
	__managed__ vec v2[100] = {{1, 1, 1}};			__managed__ vec v2[100] = {{1, 1, 1}};

	// DEV-DAG: @ex.managed = external addrspace(1) global i32, align 4			// DEV-DAG: @ex.managed = external addrspace(1) global i32, align 4
	// DEV-DAG: @ex = external addrspace(1) externally_initialized global i32 addrspace(1)*			// DEV-DAG: @ex = external addrspace(1) externally_initialized global i32 addrspace(1)*
	// HOST-DAG: @ex.managed = external global i32			// HOST-DAG: @ex.managed = external global i32
	// HOST-DAG: @ex = external externally_initialized global i32*			// HOST-DAG: @ex = external externally_initialized global i32*
	extern __managed__ int ex;			extern __managed__ int ex;

	// DEV-DAG: @_ZL2sx.managed = dso_local addrspace(1) externally_initialized global i32 1, align 4			// NORDC-D-DAG: @_ZL2sx.managed = dso_local addrspace(1) externally_initialized global i32 1, align 4
	// DEV-DAG: @_ZL2sx = dso_local addrspace(1) externally_initialized global i32 addrspace(1)* null			// NORDC-D-DAG: @_ZL2sx = dso_local addrspace(1) externally_initialized global i32 addrspace(1)* null
				tra: Same here.
				yaxunl (author): will do
				// RDC-D-DAG: @_ZL2sx.static.[[HASH:.*]].managed = dso_local addrspace(1) externally_initialized global i32 1, align 4
				// RDC-D-DAG: @_ZL2sx.static.[[HASH]] = dso_local addrspace(1) externally_initialized global i32 addrspace(1)* null
	// HOST-DAG: @_ZL2sx.managed = internal global i32 1			// HOST-DAG: @_ZL2sx.managed = internal global i32 1
	// HOST-DAG: @_ZL2sx = internal externally_initialized global i32* null			// HOST-DAG: @_ZL2sx = internal externally_initialized global i32* null
				// NORDC-DAG: @[[DEVNAMESX:[0-9]+]] = {{.*}}c"_ZL2sx\00"
				// RDC-DAG: @[[DEVNAMESX:[0-9]+]] = {{.*}}c"_ZL2sx.static.[[HASH:.*]]\00"

				// POSTFIX: @_ZL2sx.static.[[HASH:.*]] = dso_local addrspace(1) externally_initialized global i32 addrspace(1)* null
				// POSTFIX: @[[DEVNAMESX:[0-9]+]] = {{.*}}c"_ZL2sx.static.[[HASH]]\00"
	static __managed__ int sx = 1;			static __managed__ int sx = 1;

	// DEV-DAG: @llvm.compiler.used			// DEV-DAG: @llvm.compiler.used
	// DEV-SAME-DAG: @x.managed			// DEV-SAME-DAG: @x.managed
	// DEV-SAME-DAG: @x			// DEV-SAME-DAG: @x
	// DEV-SAME-DAG: @v.managed			// DEV-SAME-DAG: @v.managed
	// DEV-SAME-DAG: @v			// DEV-SAME-DAG: @v
	// DEV-SAME-DAG: @_ZL2sx.managed			// DEV-SAME-DAG: @_ZL2sx.managed
	▲ Show 20 Lines • Show All 89 Lines • ▼ Show 20 Lines
	// HOST: %ld.managed = load i32, i32* @ex, align 4			// HOST: %ld.managed = load i32, i32* @ex, align 4
	// HOST: %0 = load i32, i32* %ld.managed, align 4			// HOST: %0 = load i32, i32* %ld.managed, align 4
	// HOST: ret i32 %0			// HOST: ret i32 %0
	__device__ __host__ int load4() {			__device__ __host__ int load4() {
	return ex;			return ex;
	}			}

	// HOST-DAG: __hipRegisterManagedVar({{.*}}@x {{.*}}@x.managed {{.*}}@[[DEVNAMEX]]{{.*}}, i64 4, i32 4)			// HOST-DAG: __hipRegisterManagedVar({{.*}}@x {{.*}}@x.managed {{.*}}@[[DEVNAMEX]]{{.*}}, i64 4, i32 4)
	// HOST-DAG: __hipRegisterManagedVar({{.*}}@_ZL2sx {{.*}}@_ZL2sx.managed			// HOST-DAG: __hipRegisterManagedVar({{.*}}@_ZL2sx {{.*}}@_ZL2sx.managed {{.*}}@[[DEVNAMESX]]
	// HOST-NOT: __hipRegisterManagedVar({{.*}}@ex {{.*}}@ex.managed			// HOST-NOT: __hipRegisterManagedVar({{.*}}@ex {{.*}}@ex.managed
	// HOST-DAG: declare void @__hipRegisterManagedVar(i8**, i8*, i8*, i8*, i64, i32)			// HOST-DAG: declare void @__hipRegisterManagedVar(i8**, i8*, i8*, i8*, i64, i32)

clang/test/CodeGenCUDA/static-device-var-rdc.cu

This file was added.

				// REQUIRES: x86-registered-target
				// REQUIRES: amdgpu-registered-target

				// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device \
				// RUN: -fgpu-rdc -emit-llvm -o - -x hip %s | FileCheck \
				// RUN: -check-prefixes=DEV,INT-DEV %s

				// RUN: %clang_cc1 -triple x86_64-gnu-linux \
				// RUN: -fgpu-rdc -emit-llvm -o - -x hip %s | FileCheck \
				// RUN: -check-prefixes=HOST,INT-HOST %s

				// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -cuid=abc \
				// RUN: -fgpu-rdc -emit-llvm -o - -x hip %s > %t.dev
				// RUN: cat %t.dev | FileCheck -check-prefixes=DEV,EXT-DEV %s

				// RUN: %clang_cc1 -triple x86_64-gnu-linux -cuid=abc \
				// RUN: -fgpu-rdc -emit-llvm -o - -x hip %s > %t.host
				// RUN: cat %t.host | FileCheck -check-prefixes=HOST,EXT-HOST %s

				// Check host and device compilations use the same postfixes for static
				// variable names.

				// RUN: cat %t.dev %t.host | FileCheck -check-prefix=POSTFIX %s

				#include "Inputs/cuda.h"

				// Test function scope static device variable, which should not be externalized.
				// DEV-DAG: @_ZZ6kernelPiPPKiE1w = internal addrspace(4) constant i32 1


				// HOST-DAG: @_ZL1x = internal global i32 undef
				// HOST-DAG: @_ZL1y = internal global i32 undef

				// Test normal static device variables
				// INT-DEV-DAG: @_ZL1x = dso_local addrspace(1) externally_initialized global i32 0
				tra: ditto.
				yaxunl (author): will do
				// INT-HOST-DAG: @[[DEVNAMEX:[0-9]+]] = {{.*}}c"_ZL1x\00"

				// Test externalized static device variables
				// EXT-DEV-DAG: @_ZL1x.static.[[HASH:.*]] = dso_local addrspace(1) externally_initialized global i32 0
				// EXT-HOST-DAG: @[[DEVNAMEX:[0-9]+]] = {{.*}}c"_ZL1x.static.[[HASH:.*]]\00"

				// POSTFIX: @_ZL1x.static.[[HASH:.*]] = dso_local addrspace(1) externally_initialized global i32 0
				// POSTFIX: @[[DEVNAMEX:[0-9]+]] = {{.*}}c"_ZL1x.static.[[HASH]]\00"

				static __device__ int x;

				// Test static device variables not used by host code should not be externalized
				// DEV-DAG: @_ZL2x2 = internal addrspace(1) global i32 0

				static __device__ int x2;

				// Test normal static device variables
				// INT-DEV-DAG: @_ZL1y = dso_local addrspace(4) externally_initialized global i32 0
				// INT-HOST-DAG: @[[DEVNAMEY:[0-9]+]] = {{.*}}c"_ZL1y\00"

				// Test externalized static device variables
				// EXT-DEV-DAG: @_ZL1y.static.[[HASH]] = dso_local addrspace(4) externally_initialized global i32 0
				// EXT-HOST-DAG: @[[DEVNAMEY:[0-9]+]] = {{.*}}c"_ZL1y.static.[[HASH]]\00"

				static __constant__ int y;

				// Test static host variable, which should not be externalized nor registered.
				// HOST-DAG: @_ZL1z = internal global i32 0
				// DEV-NOT: @_ZL1z
				static int z;

				// Test static device variable in inline function, which should not be
				// externalized nor registered.
				// DEV-DAG: @_ZZ6devfunPPKiE1p = linkonce_odr addrspace(4) constant i32 2, comdat

				inline __device__ void devfun(const int ** b) {
				const static int p = 2;
				b[0] = &p;
				}

				__global__ void kernel(int *a, const int **b) {
				const static int w = 1;
				a[0] = x;
				a[1] = y;
				b[0] = &w;
				b[1] = &x2;
				devfun(b);
				}

				int* getDeviceSymbol(int *x);

				void foo() {
				getDeviceSymbol(&x);
				getDeviceSymbol(&y);
				z = 123;
				}

				// HOST: __hipRegisterVar({{.*}}@_ZL1x {{.*}}@[[DEVNAMEX]]
				// HOST: __hipRegisterVar({{.*}}@_ZL1y {{.*}}@[[DEVNAMEY]]
				// HOST-NOT: __hipRegisterVar({{.*}}@_ZL2x2
				// HOST-NOT: __hipRegisterVar({{.*}}@_ZZ6kernelPiPPKiE1w
				// HOST-NOT: __hipRegisterVar({{.*}}@_ZZ6devfunPPKiE1p

clang/test/SemaCUDA/static-device-var.cu

This file was added.

				// REQUIRES: x86-registered-target
				// REQUIRES: amdgpu-registered-target

				// RUN: %clang_cc1 -triple nvptx -fcuda-is-device \
				// RUN: -emit-llvm -o - %s -fsyntax-only -verify=dev

				// RUN: %clang_cc1 -triple x86_64-gnu-linux \
				// RUN: -emit-llvm -o - %s -fsyntax-only -verify=host

				// Checks allowed usage of file-scope and function-scope static variables.

				tra: A comment explaining what we're testing would be helpful. `no-diagnostics` gives no clues about what it is we're looking for here.
				yaxunl (author): will do
				// host-no-diagnostics

				#include "Inputs/cuda.h"

				// Checks static variables are allowed in device functions.

				__device__ void f1() {
				const static int b = 123;
				static int a;
				}

				// Checks static variables are allowed in global functions.
				tra: So, this verifies that we're allowed to use static local vars in device code. A comment would be useful.
				yaxunl (author): will do

				__global__ void k1() {
				const static int b = 123;
				static int a;
				}

				// Checks static device and constant variables are allowed in device and
				// host functions, and static host variables are not allowed in device
				// functions.

				static __device__ int x;
				static __constant__ int y;
				static int z;

				__global__ void kernel(int *a) {
				tra: And this verifies that global static vars can be referenced from both host and device. I'd also add a negative test with `static int host_only;` and would verify that we still don't allow accessing it from the device.
				yaxunl (author): will do
				a[0] = x;
				a[1] = y;
				a[2] = z;
				// dev-error@-1 {{reference to __host__ variable 'z' in __global__ function}}
				}

				int* getDeviceSymbol(int *x);

				void foo() {
				getDeviceSymbol(&x);
				getDeviceSymbol(&y);
				}