This is an archive of the discontinued LLVM Phabricator instance.

Differential D95560

[CUDA][HIP] Fix function scope static variable
Needs Review, Public

Authored by yaxunl on Jan 27 2021, 1:58 PM.

Details

Reviewers

tra

Summary

Currently static variables are allowed in device, global, and host device
functions.

A static variable in device and global functions is supposed to have
implicit device attribute. Currently it does not. This causes incorrect
diagnostics about host variables accessed by device functions.

Another issue is that static device variables are allowed in host functions,
since host functions could pass them to kernels for useful computations.
Currently they are not emitted in device compilation, which should be
fixed.

This patch also handles static variables in host device functions
and function scope static managed variables, as well as externalization
of such variables for the -fno-gpu-rdc case.
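
For orientation, here is a minimal sketch of the three source patterns the summary refers to. It is an illustration, not part of the patch: the function and variable names are hypothetical, and the attribute macros are stubbed out as empty (an assumption made only so the snippet builds with an ordinary host C++ compiler). In a real CUDA/HIP build the attributes come from the toolchain.

```cpp
#include <cassert>

// Stub the attribute macros so the sketch builds with a plain C++ compiler.
// Assumption: a real CUDA/HIP build gets these from the toolchain headers.
#if !defined(__CUDACC__) && !defined(__CUDA__) && !defined(__HIP__)
#define __device__
#define __host__
#endif

// Static variable in a device function: per the summary it is supposed to
// carry an implicit device attribute.
__device__ int nextTicket() {
  static int ticket = 0;
  return ++ticket;
}

// Static variable in a host device function.
__host__ __device__ int addOffset(int x) {
  static int offset = 1;
  return offset + x;
}

// Explicit static device variable in a host function. The host may hand its
// address to a kernel, so it has to be emitted in device compilation too.
int *scratchSlot() {
  static __device__ int slot;
  return &slot;
}

// Host-only check of the stubbed build (hypothetical helper).
inline void hostSideDemo() {
  assert(addOffset(2) == 3);
  assert(scratchSlot() == scratchSlot());
}
```

With the stubs in place all three functions are ordinary host functions, so this only shows where the static variables are declared, not how they behave on a device.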

Diff Detail

Event Timeline

yaxunl requested review of this revision. Jan 27 2021, 1:58 PM

yaxunl created this revision.

yaxunl added a child revision: D95558: [NFC][CUDA] Refactor registering device variable. Jan 27 2021, 2:07 PM

> A static variable in device and global functions is supposed to have
> implicit device attribute. Currently it does not. This causes incorrect
> diagnostics about host variables accessed by device functions.

Correct diagnostics for device-side local static vars is a valid concern.
Could you elaborate on why static variables in device functions are supposed to be __device__? I'm not quite sure that it's been established. At least not as a full __device__ variable, with runtime registration and the host-side shadow.

Judging by the tests and the comments, it may be better to rephrase the purpose of this patch along the lines that it allows treating a subset of the static variables, those for which the host may need to know the device-side address, as __device__, with all the overhead that entails. Static vars that can't be created in host code remain purely static on device. When I read the patch description for the first time, it sounded more invasive than it actually is.

clang/lib/CodeGen/CodeGenModule.cpp
101	Nit. "This class does that" could be dropped. I'd generally follow a `"<this thing> does <that> for <this reason>"` structure. E.g something along these lines: Helper class for emitting device-side static variables created in host-side functions. While we do not emit host-side functions on device, we still need to emit the static variables the host code will expect to see on the device.
clang/lib/Sema/SemaCUDA.cpp
533–540 ↗	(On Diff #319658)	This does not seem to be directly relevant for this patch. Perhaps move it into a separate patch?
clang/lib/Sema/SemaDecl.cpp
7247–7250	This is somewhat confusing. I guess the issue is that we're conflating all the functionality implied by the `__device__` attribute and the `accessible on device` which is a subset of it. For the static vars in D functions you only need for it to be accessible on device, IMO. For HD functions, you do need the full `__device__` functionality, with host shadow and runtime registration. While adding implicit `__device__` works for statics in the device-only functions, it's a bit of an overkill. It also gives us a somewhat different AST between host/device compilations. Perhaps we can handle statics in device-only functions w/o adding implicit `__device__`. Can we check the parent of the variable instead when we check whether we're allowed to reference the variable?
clang/test/CodeGenCUDA/func-scope-static-var.cu
55	What's the reason for externalizing the variables for no-rdc only? If we do not externalize them, then we'll potentially have a problem with the host code attempting to get the variable's device-side address and failing at runtime, because it's not visible on device. I think the right thing to do here is to always externalize them, but add a unique suffix for RDC.
88	Nit: `static variables w/o attributes are implicitly __device__`. Or `By default, static variables are implicitly __device__`. It's also not clear what you mean by `which are independent`. It may be better to add more details in a separate sentence.
127–128	We could use an explanation why we're not externalizing or shadowing them.
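
One possible reading of the SemaDecl.cpp suggestion above (check the variable's parent function instead of adding an implicit `__device__`) is sketched here as a small standalone model. The types and function names are hypothetical and are not clang's. Treating a host device parent as device-accessible is an assumption of this sketch, not something the thread settles.

```cpp
// Hypothetical, simplified model; these are not clang types.
enum class FuncTarget { Host, Device, Global, HostDevice };

struct LocalStaticModel {
  bool hasExplicitDeviceAttr; // declared as `static __device__ ...`
  FuncTarget parent;          // target of the enclosing function
};

// True if the enclosing function is compiled for the device at all.
constexpr bool parentRunsOnDevice(FuncTarget T) {
  return T != FuncTarget::Host;
}

// Decide from the parent function, instead of from an implicit attribute on
// the variable, whether device code may reference the variable.
constexpr bool deviceMayReference(const LocalStaticModel &V) {
  return V.hasExplicitDeviceAttr || parentRunsOnDevice(V.parent);
}

// A plain static in a device function is reachable; one in a host function
// is not unless it is explicitly __device__.
static_assert(deviceMayReference({false, FuncTarget::Device}), "");
static_assert(!deviceMayReference({false, FuncTarget::Host}), "");
```

The point of such a check is that the variable's own attributes stay identical in host and device compilation, which is the AST divergence concern raised in the comment.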

yaxunl marked 6 inline comments as done. Feb 1 2021, 5:22 PM

yaxunl added inline comments.

clang/lib/CodeGen/CodeGenModule.cpp
101	done
clang/lib/Sema/SemaCUDA.cpp
533–540 ↗	(On Diff #319658)	separated to another patch
clang/lib/Sema/SemaDecl.cpp
7247–7250	Before we consider a function scope static variable without explicit device attribute, let's consider the difference between a static variable with explicit device attribute and a global device variable. They are both emitted in device compilation and have shadow variables in host compilation. The only difference is the linkage. A global device variable is supposed to be visible to other compilation units, whereas a static device variable is supposed to be visible to the same compilation unit only.
A function scope static variable with device attribute has similar traits: It needs to be emitted in device compilation, and it needs a shadow variable in host compilation in case it needs to be accessed in host code. The only difference is that it is only visible inside the function.
Now let's consider a static var without device attribute in a device function. From sema and codegen point of view, it should have difference from a function scope static var with device attribute. Adding an implicit device attribute would simplify its handling.
Now let's consider a static var without device attribute in a host device function. The following code is valid for both nvcc and cuda-clang:
int __device__ __host__ func(int x) {
  static int a = 1;
  return a + x;
}
This requires the static variable is directly accessible in both device and host compilation. This requires that in device compilation, the static var behaves like a static var with explicit device attribute, whereas in host compilation, the static var behaves like a normal host static var.
By adding implicit device attribute, we can clearly distinguish these situations and reuse the sema and codegen logic of device attribute.
clang/test/CodeGenCUDA/func-scope-static-var.cu
55	Yes this will be fixed by the patch for externalizing static var for -fgpu-rdc
88	revised
127–128	added explanation

yaxunl removed a child revision: D95558: [NFC][CUDA] Refactor registering device variable. Feb 1 2021, 5:26 PM

yaxunl added a parent revision: D95558: [NFC][CUDA] Refactor registering device variable.

yaxunl mentioned this in D95840: [CUDA][HIP] Fix checking dependent initalizer. Feb 1 2021, 5:32 PM

yaxunl added a parent revision: D95840: [CUDA][HIP] Fix checking dependent initalizer.

Revised per Artem's comments

yaxunl mentioned this in D95558: [NFC][CUDA] Refactor registering device variable. Feb 2 2021, 10:39 AM

tra added inline comments. Feb 2 2021, 11:08 AM

clang/lib/Sema/SemaDecl.cpp
7247–7250	> A function scope static variable with device attribute has similar traits: It needs to be emitted in device compilation, and it needs a shadow variable in host compilation in case it needs to be accessed in host code.
This is the part I don't agree with. Can you give me an example of how a local variable in a `__device__` function can be accessed from the host code? One can't refer to local static vars from outside of the function, and even if the function returns the address, it will make no sense on the host side, because there's no reverse `device-address to host shadow` registration. I do not think we need host shadow or registration for device-side local statics. What am I missing?
> Now let's consider a static var without device attribute in a device function. From sema and codegen point of view, it should have difference from a function scope static var with device attribute. Adding an implicit device attribute would simplify its handling.
I agree that it makes things simpler. What I'm saying is that the simple solution comes with an overhead that's not needed.
> int __device__ __host__ func(int x) { static int a = 1; return a + x; }
> This requires the static variable is directly accessible in both device and host compilation. This requires that in device compilation, the static var behaves like a static var with explicit device attribute, whereas in host compilation, the static var behaves like a normal host static var.
I'm not sure I follow your reasoning. `directly accessible in both device and host compilation.` would need an equivalent of the `__managed__` attribute. Regular `__device__` variables only allow the variable to have an address on the host side which can then be translated into a device-side address by the runtime. The variable is only directly accessible from device.
> By adding implicit device attribute, we can clearly distinguish these situations and reuse the sema and codegen logic of device attribute.
While this approach does remove that shadow+registration overhead, it does not give both host and device access to the same variable and it creates more divergence between host and device AST, which I'd prefer to avoid, if possible.
To summarize, we appear to agree on what we want in the end -- a variable accessible on its respective side only w/o the overhead of the shadow and registration. What we disagree on is how to implement it. Your approach is to add the `__device__` attribute during device-side compilation only, which allows using parts of the functionality that happen to do the right thing in the individual compilation at the price of AST divergence. I think that AST divergence should be avoided and that we should have a uniform way of handling local static vars on both sides.
Also, we'll need to figure out and document how static vars are expected to work in HD functions. Should they be implicitly `__managed__`? That would be the most intuitively sensible thing, but it's not going to work with CUDA as we don't support `__managed__` yet. We could explicitly say that both host and device have their own instance of the local static variable. It's sort of how it works in practice now, but it deviates from what a user would expect from a static var. It's probably a more natural fit for the CUDA/HIP programming model in general. E.g. consider that we may be running on more than one GPU. In order for a static var to work for all GPUs and the host, it should live on the host and then be memory-mapped on each device. I'm not sure if `__managed__` can handle that in principle for CUDA. The each-carries-their-own approach is more consistent -- that's how we treat global variables anyways.

Revision Contents

Path	Size
clang/lib/AST/ASTContext.cpp	6 lines
clang/lib/CodeGen/CGDecl.cpp	31 lines
clang/lib/CodeGen/CodeGenModule.cpp	46 lines
clang/lib/Sema/SemaDecl.cpp	19 lines
clang/test/AST/ast-dump-func-scope-static-var.cu	53 lines
clang/test/CodeGenCUDA/func-scope-static-var.cu	168 lines
clang/test/CodeGenCUDA/static-device-var-no-rdc.cu	2 lines
clang/test/SemaCUDA/func-scope-static-var.cu	115 lines

Diff 320644

clang/lib/AST/ASTContext.cpp

[... 9,991 lines not shown ...]
   return DB << "a prior #pragma section";
 }

 bool ASTContext::mayExternalizeStaticVar(const Decl *D) const {
   return !getLangOpts().GPURelocatableDeviceCode &&
          ((D->hasAttr<CUDADeviceAttr>() &&
            !D->getAttr<CUDADeviceAttr>()->isImplicit()) ||
           (D->hasAttr<CUDAConstantAttr>() &&
-           !D->getAttr<CUDAConstantAttr>()->isImplicit())) &&
-         isa<VarDecl>(D) && cast<VarDecl>(D)->isFileVarDecl() &&
-         cast<VarDecl>(D)->getStorageClass() == SC_Static;
+           !D->getAttr<CUDAConstantAttr>()->isImplicit()) ||
+          D->hasAttr<HIPManagedAttr>()) &&
+         isa<VarDecl>(D) && cast<VarDecl>(D)->getStorageClass() == SC_Static;
 }

 bool ASTContext::shouldExternalizeStaticVar(const Decl *D) const {
   return mayExternalizeStaticVar(D) &&
          CUDAStaticDeviceVarReferencedByHost.count(cast<VarDecl>(D));
 }
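
To see what the change to `mayExternalizeStaticVar` amounts to, the predicate can be modeled on plain booleans. This is a hedged sketch: `VarModel` and both function names are hypothetical stand-ins for the clang queries in the hunk above, and the model assumes the declaration is a `VarDecl`.

```cpp
// Hypothetical flattened view of the queries made in the hunk above.
struct VarModel {
  bool explicitDevice;   // CUDADeviceAttr present and not implicit
  bool explicitConstant; // CUDAConstantAttr present and not implicit
  bool managed;          // HIPManagedAttr present
  bool isStatic;         // storage class is SC_Static
  bool isFileVar;        // result of isFileVarDecl()
};

// Predicate as it reads on the old side of the diff.
constexpr bool mayExternalizeBefore(bool rdc, const VarModel &V) {
  return !rdc && (V.explicitDevice || V.explicitConstant) && V.isFileVar &&
         V.isStatic;
}

// Predicate as it reads on the new side: managed variables qualify and the
// file-scope requirement is gone.
constexpr bool mayExternalizeAfter(bool rdc, const VarModel &V) {
  return !rdc && (V.explicitDevice || V.explicitConstant || V.managed) &&
         V.isStatic;
}

// A function-scope `static __device__` variable without -fgpu-rdc.
static_assert(!mayExternalizeBefore(false, {true, false, false, true, false}), "");
static_assert(mayExternalizeAfter(false, {true, false, false, true, false}), "");
```

In this model the two visible differences are that managed variables now qualify and that function-scope statics are no longer excluded by the file-scope check.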

clang/lib/CodeGen/CGDecl.cpp

 //===--- CGDecl.cpp - Emit LLVM Code for declarations ---------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This contains code to emit Decl nodes as LLVM code.
 //
 //===----------------------------------------------------------------------===//

 #include "CGBlocks.h"
+#include "CGCUDARuntime.h"
 #include "CGCXXABI.h"
 #include "CGCleanup.h"
 #include "CGDebugInfo.h"
 #include "CGOpenCLRuntime.h"
 #include "CGOpenMPRuntime.h"
 #include "CodeGenFunction.h"
 #include "CodeGenModule.h"
 #include "ConstantEmitter.h"
[... 387 lines not shown; context: if (D.getType()->isVariablyModifiedType()) ...]
     EmitVariablyModifiedType(D.getType());

   // Save the type in case adding the initializer forces a type change.
   llvm::Type *expectedType = addr->getType();

   llvm::GlobalVariable *var =
     cast<llvm::GlobalVariable>(addr->stripPointerCasts());

+  // CUDA/HIP: need to register static device variable declared in host
+  // or host device functions.
+  if (getLangOpts().CUDA && !getLangOpts().CUDAIsDevice && CurFuncDecl) {
+    if (auto *FD = dyn_cast<FunctionDecl>(CurFuncDecl)) {
+      if (!FD->hasAttr<CUDAGlobalAttr>() &&
+          (!FD->hasAttr<CUDADeviceAttr>() || FD->hasAttr<CUDAHostAttr>()))
+        CGM.getCUDARuntime().handleVarRegistration(&D, *var);
+    }
+  }

   // CUDA's local and local static __shared__ variables should not
   // have any non-empty initializers. This is ensured by Sema.
   // Whatever initializer such variable may have when it gets here is
   // a no-op and should not be emitted.
   bool isCudaSharedVar = getLangOpts().CUDA && getLangOpts().CUDAIsDevice &&
                          D.hasAttr<CUDASharedAttr>();
+  // HIP static managed variables need to be emitted as declarations in device
+  // compilation in host or host device functions.
+  bool isUndefManagedVar = false;
+  if (getLangOpts().CUDAIsDevice && D.hasAttr<HIPManagedAttr>() &&
+      CurFuncDecl) {
+    if (auto *FD = dyn_cast<FunctionDecl>(CurFuncDecl)) {
+      if (!FD->hasAttr<CUDAGlobalAttr>() &&
+          (!FD->hasAttr<CUDADeviceAttr>() || FD->hasAttr<CUDAHostAttr>())) {
+        isUndefManagedVar = true;
+      }
+    }
+  }
+  if (isUndefManagedVar) {
+    var->setInitializer(nullptr);
+    var->setLinkage(llvm::GlobalValue::ExternalLinkage);
+  } else if (D.getInit() && !isCudaSharedVar) {
   // If this value has an initializer, emit it.
-  if (D.getInit() && !isCudaSharedVar)
     var = AddInitializerToStaticVarDecl(D, var);
+  }

   var->setAlignment(alignment.getAsAlign());

   if (D.hasAttr<AnnotateAttr>())
     CGM.AddGlobalAnnotations(&D, var);

   if (auto *SA = D.getAttr<PragmaClangBSSSectionAttr>())
     var->addAttribute("bss-section", SA->getName());
[... 2,174 lines not shown ...]
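
Both new blocks in CGDecl.cpp use the same attribute test to pick out host and host device functions. A small model of that test, with hypothetical names and plain booleans in place of the clang attribute queries, makes the truth table easier to check. It assumes `CurFuncDecl` is a `FunctionDecl`, which the real code verifies with `dyn_cast`.

```cpp
// Hypothetical flattened view of the enclosing function's attributes.
struct FuncAttrs {
  bool global; // CUDAGlobalAttr
  bool device; // CUDADeviceAttr
  bool host;   // CUDAHostAttr
};

// Mirrors: !Global && (!Device || Host).
constexpr bool isHostOrHostDevice(const FuncAttrs &F) {
  return !F.global && (!F.device || F.host);
}

// Host compilation: the static variable is passed to the registration hook.
constexpr bool reachesVarRegistration(bool cuda, bool cudaIsDevice,
                                      const FuncAttrs &F) {
  return cuda && !cudaIsDevice && isHostOrHostDevice(F);
}

// Device compilation: a managed static becomes an external declaration
// without an initializer.
constexpr bool emittedAsManagedDeclaration(bool cudaIsDevice, bool managed,
                                           const FuncAttrs &F) {
  return cudaIsDevice && managed && isHostOrHostDevice(F);
}

static_assert(isHostOrHostDevice({false, false, false}), ""); // plain host
static_assert(!isHostOrHostDevice({false, true, false}), ""); // device only
```

An unattributed function counts as a host function here, which is why the test is written as "not device, or also host" instead of checking for the host attribute alone.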

clang/lib/CodeGen/CodeGenModule.cpp

[... 88 lines not shown; context: case TargetCXXABI::XL: ...]
     return CreateItaniumCXXABI(CGM);
   case TargetCXXABI::Microsoft:
     return CreateMicrosoftCXXABI(CGM);
   }

   llvm_unreachable("invalid C++ ABI kind");
 }

+// Helper class for emitting device-side static variables created in host-side
+// functions for CUDA/HIP. While we do not emit host-side functions on device,
+// we still need to emit the static variables the host code will expect to see
+// on the device.
+class CUDAStaticDeviceVarEmitter
+    : public StmtVisitor<CUDAStaticDeviceVarEmitter> {
+public:
+  CodeGenFunction CGF;
+  CUDAStaticDeviceVarEmitter(CodeGenModule &CGM) : CGF(CGM) {}
+  void Visit(Stmt *S) {
+    if (!S)
+      return;
+    if (auto *DS = dyn_cast<DeclStmt>(S)) {
+      for (auto &&D : DS->decls()) {
+        if (auto *VD = dyn_cast<VarDecl>(D)) {
+          if (VD->hasAttr<CUDADeviceAttr>() ||
+              VD->hasAttr<CUDAConstantAttr>()) {
+            llvm::GlobalValue::LinkageTypes Linkage =
+                CGF.CGM.getLLVMLinkageVarDefinition(VD, /*IsConstant=*/false);
+            return CGF.EmitStaticVarDecl(*VD, Linkage);
+          }
+        }
+      }
+    }
+    for (auto &&SS : S->children())
+      Visit(SS);
+  }
+  void runOn(const FunctionDecl *FD) {
+    assert(CGF.getLangOpts().CUDAIsDevice);
+    assert(!FD->hasAttr<CUDADeviceAttr>() && !FD->hasAttr<CUDAGlobalAttr>());
+    assert(FD->hasBody());
+    CGF.CurFuncDecl = FD;
+    Visit(FD->getBody());
+  }
+};
+
 CodeGenModule::CodeGenModule(ASTContext &C, const HeaderSearchOptions &HSO,
                              const PreprocessorOptions &PPO,
                              const CodeGenOptions &CGO, llvm::Module &M,
                              DiagnosticsEngine &diags,
                              CoverageSourceInfo *CoverageInfo)
     : Context(C), LangOpts(C.getLangOpts()), HeaderSearchOpts(HSO),
       PreprocessorOpts(PPO), CodeGenOpts(CGO), TheModule(M), Diags(diags),
       Target(C.getTargetInfo()), ABI(createCXXABI(*this)),
[... 2,638 lines not shown; context: void CodeGenModule::EmitGlobal(GlobalDecl GD) { ...]
   // If this is CUDA, be selective about which declarations we emit.
   if (LangOpts.CUDA) {
     if (LangOpts.CUDAIsDevice) {
       if (!Global->hasAttr<CUDADeviceAttr>() &&
           !Global->hasAttr<CUDAGlobalAttr>() &&
           !Global->hasAttr<CUDAConstantAttr>() &&
           !Global->hasAttr<CUDASharedAttr>() &&
           !Global->getType()->isCUDADeviceBuiltinSurfaceType() &&
-          !Global->getType()->isCUDADeviceBuiltinTextureType())
+          !Global->getType()->isCUDADeviceBuiltinTextureType()) {
+        if (auto *FD = dyn_cast<FunctionDecl>(Global)) {
+          if (FD->hasBody()) {
+            // Emit static device or constant variables for host functions.
+            CUDAStaticDeviceVarEmitter E(*this);
+            E.runOn(FD);
+          }
+        }
         return;
+      }
     } else {
       // We need to emit host-side 'shadows' for all global
       // device-side variables because the CUDA runtime needs their
       // size and host-side address in order to provide access to
       // their device-side incarnations.

       // So device-only functions are the only things we skip.
       if (isa<FunctionDecl>(Global) && !Global->hasAttr<CUDAHostAttr>() &&
[... 3,483 lines not shown ...]
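
The idea behind `CUDAStaticDeviceVarEmitter` is a recursive walk over a host function body that picks out device-side statics even though the body itself is not emitted on the device. The following is a toy model of that walk only. `Node`, `collectDeviceStatics`, and `demo` are hypothetical, the tree is not clang's `Stmt` hierarchy, and the sketch collects every matching declaration, which is a simplification and not a statement about the patch's exact control flow.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature statement tree; not clang's Stmt hierarchy.
struct Node {
  // Names of static __device__/__constant__ variables declared at this node.
  std::vector<std::string> deviceStatics;
  // Nested statements.
  std::vector<Node> children;
};

// Walk a function body depth-first and collect the device-side statics it
// declares, in declaration order.
inline void collectDeviceStatics(const Node &N, std::vector<std::string> &Out) {
  for (const std::string &Name : N.deviceStatics)
    Out.push_back(Name);
  for (const Node &Child : N.children)
    collectDeviceStatics(Child, Out);
}

// Usage: a body that declares `a` directly and `b` inside a nested scope.
inline std::vector<std::string> demo() {
  Node Inner;
  Inner.deviceStatics.push_back("b");
  Node Body;
  Body.deviceStatics.push_back("a");
  Body.children.push_back(Inner);

  std::vector<std::string> Out;
  collectDeviceStatics(Body, Out);
  assert(Out.size() == 2);
  return Out;
}
```

In the patch the per-variable action is `EmitStaticVarDecl`. The model replaces it with appending a name so the traversal can be checked in isolation.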

clang/lib/Sema/SemaDecl.cpp

[... 7,238 lines not shown; context: if (getLangOpts().OpenCL) { ...]
     deduceOpenCLAddressSpace(NewVD);

     diagnoseOpenCLTypes(S, *this, D, DC, NewVD->getType());
   }

   // Handle attributes prior to checking for duplicates in MergeVarDecl
   ProcessDeclAttributes(S, NewVD, D);

+  // CUDA/HIP: Function-scope static variables in device or global functions
+  // have implicit device or constant attribute. Function-scope static variables
+  // in host device functions have implicit device or constant attribute in
+  // device compilation only.
+  if (getLangOpts().CUDA && SC == SC_Static) {
+    FunctionDecl *CurFD = getCurFunctionDecl();
+    if (CurFD &&
+        (CurFD->hasAttr<CUDADeviceAttr>() ||
+         CurFD->hasAttr<CUDAGlobalAttr>()) &&
+        (getLangOpts().CUDAIsDevice || !CurFD->hasAttr<CUDAHostAttr>()) &&
+        !NewVD->hasAttr<CUDASharedAttr>() &&
+        !NewVD->hasAttr<CUDAConstantAttr>()) {
+      if (NewVD->isConstexpr() || NewVD->getType().getQualifiers().hasConst())
+        NewVD->addAttr(CUDAConstantAttr::CreateImplicit(getASTContext()));
+      else if (!NewVD->hasAttr<CUDADeviceAttr>())
+        NewVD->addAttr(CUDADeviceAttr::CreateImplicit(getASTContext()));
+    }
+  }
+
   if (getLangOpts().CUDA || getLangOpts().OpenMPIsDevice ||
       getLangOpts().SYCLIsDevice) {
     if (EmitTLSUnsupportedError &&
         ((getLangOpts().CUDA && DeclAttrsMatchCUDAMode(getLangOpts(), NewVD)) ||
          (getLangOpts().OpenMPIsDevice &&
           OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(NewVD))))
       Diag(D.getDeclSpec().getThreadStorageClassSpecLoc(),
            diag::err_thread_unsupported);
[... 9,991 lines not shown ...]
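
The rule added to SemaDecl.cpp can be restated as a pure function, which makes it easy to compare against the expectations in the AST dump test. This is a sketch under stated assumptions: `FuncAttrs`, `VarModel`, `ImplicitAttr`, and the function names are hypothetical, and the booleans stand in for the clang queries in the hunk above.

```cpp
enum class ImplicitAttr { None, Device, Constant };

// Hypothetical flattened view of the enclosing function's attributes.
struct FuncAttrs {
  bool global; // CUDAGlobalAttr
  bool device; // CUDADeviceAttr
  bool host;   // CUDAHostAttr
};

// Hypothetical flattened view of the variable being declared.
struct VarModel {
  bool isStatic;           // SC == SC_Static
  bool hasShared;          // CUDASharedAttr already present
  bool hasConstant;        // CUDAConstantAttr already present
  bool hasDevice;          // CUDADeviceAttr already present
  bool isConstOrConstexpr; // constexpr or const-qualified type
};

// Outer condition of the new block.
constexpr bool qualifies(bool cuda, bool cudaIsDevice, const FuncAttrs &F,
                         const VarModel &V) {
  return cuda && V.isStatic && (F.device || F.global) &&
         (cudaIsDevice || !F.host) && !V.hasShared && !V.hasConstant;
}

// Inner choice between implicit constant and implicit device.
constexpr ImplicitAttr pickAttr(const VarModel &V) {
  return V.isConstOrConstexpr ? ImplicitAttr::Constant
         : !V.hasDevice       ? ImplicitAttr::Device
                              : ImplicitAttr::None;
}

constexpr ImplicitAttr implicitAttrFor(bool cuda, bool cudaIsDevice,
                                       const FuncAttrs &F, const VarModel &V) {
  return qualifies(cuda, cudaIsDevice, F, V) ? pickAttr(V) : ImplicitAttr::None;
}

// `static int a;` in a __device__ function gets an implicit device attribute.
static_assert(implicitAttrFor(true, false, {false, true, false},
                              {true, false, false, false, false}) ==
                  ImplicitAttr::Device,
              "");
```

For a host device function the result depends on the compilation side: the same variable gets no implicit attribute in host compilation and an implicit one in device compilation, which is the AST divergence discussed in the review thread.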

clang/test/AST/ast-dump-func-scope-static-var.cu

This file was added.

				// RUN: %clang_cc1 -std=c++11 -ast-dump -x hip %s \| FileCheck %s
				// RUN: %clang_cc1 -std=c++11 -ast-dump -fcuda-is-device -x hip %s \| FileCheck %s

				#include "Inputs/cuda.h"

				// CHECK-LABEL: FunctionDecl {{.*}} fun1
				// CHECK: VarDecl {{.*}} a 'int' static
				// CHECK-NOT: CUDADeviceAttr
				// CHECK: VarDecl {{.*}} b 'int' static
				// CHECK-NEXT: CUDADeviceAttr {{.*}}cuda.h
				// CHECK: VarDecl {{.*}} c 'const int' static cinit
				// CHECK-NOT: CUDADeviceAttr
				// CHECK: VarDecl {{.*}} d 'const int' static constexpr cinit
				// CHECK-NOT: CUDADeviceAttr
				// CHECK: VarDecl {{.*}} e 'int' static cinit
				// CHECK: CUDAConstantAttr {{.*}}cuda.h
				// CHECK: VarDecl {{.*}} f 'int' static cinit
				// CHECK: HIPManagedAttr {{.*}}cuda.h
				// CHECK: CUDADeviceAttr {{.*}}Implicit
				// CHECK-NOT: CUDADeviceAttr
				void fun1() {
				static int a;
				static __device__ int b;
				static const int c = 1;
				static constexpr int d = 1;
				static __constant__ int e = 1;
				static __managed__ int f = 1;
				}

				// CHECK-LABEL: FunctionDecl {{.*}} fun2
				// CHECK: VarDecl {{.*}} a 'int' static
				// CHECK-NEXT: CUDADeviceAttr {{.*}}Implicit
				// CHECK: VarDecl {{.*}} b 'int' static
				// CHECK-NEXT: CUDADeviceAttr {{.*}}cuda.h
				// CHECK: VarDecl {{.*}} c 'const int' static cinit
				// CHECK: CUDAConstantAttr {{.*}}Implicit
				// CHECK-NOT: CUDADeviceAttr
				// CHECK: VarDecl {{.*}} d 'const int' static constexpr cinit
				// CHECK: CUDAConstantAttr {{.*}}Implicit
				// CHECK-NOT: CUDADeviceAttr
				// CHECK: VarDecl {{.*}} e 'int' static cinit
				// CHECK: CUDAConstantAttr {{.*}}cuda.h
				// CHECK: VarDecl {{.*}} f 'int' static cinit
				// CHECK: HIPManagedAttr {{.*}}cuda.h
				// CHECK: CUDADeviceAttr {{.*}}Implicit
				__device__ void fun2() {
				static int a;
				static __device__ int b;
				static const int c = 1;
				static constexpr int d = 1;
				static __constant__ int e = 1;
				static __managed__ int f = 1;
				}
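
The CHECK lines in this test can be read as a small decision table. The following is only a sketch of that table in plain C++, not the code in SemaDecl.cpp: `FuncTarget`, `StaticVar`, `ImplicitAttr`, and `implicitAttrFor` are hypothetical names introduced for illustration. It covers only the two function kinds exercised here (`fun1`, a host function, and `fun2`, a `__device__` function), and it assumes that an explicitly written `__device__` or `__constant__` attribute needs no implicit attribute, which the CHECK lines do not fully pin down for variable `e`.

```cpp
#include <cassert>

// Hypothetical model types; none of these names exist in clang.
enum class FuncTarget { Host, Device };
enum class ImplicitAttr { None, Device, Constant };

struct StaticVar {
  bool hasDeviceAttr = false;       // written as `static __device__ ...`
  bool hasConstantAttr = false;     // written as `static __constant__ ...`
  bool hasManagedAttr = false;      // written as `static __managed__ ...`
  bool isConstOrConstexpr = false;  // `static const ...` or `static constexpr ...`
};

// Returns the implicit attribute the CHECK lines above appear to expect for a
// function-scope static variable `v` declared in a function of kind `fn`.
ImplicitAttr implicitAttrFor(FuncTarget fn, const StaticVar &v) {
  // Assumption: an explicit device-side attribute needs nothing further.
  if (v.hasDeviceAttr || v.hasConstantAttr)
    return ImplicitAttr::None;
  // `__managed__` variables carry an implicit device attribute in both
  // fun1 and fun2 (variable `f`).
  if (v.hasManagedAttr)
    return ImplicitAttr::Device;
  // Plain statics in a host function stay host variables (fun1: a, c, d).
  if (fn == FuncTarget::Host)
    return ImplicitAttr::None;
  // In a __device__ function, const/constexpr statics become implicitly
  // __constant__ (fun2: c, d) and the rest implicitly __device__ (fun2: a).
  return v.isConstOrConstexpr ? ImplicitAttr::Constant : ImplicitAttr::Device;
}
```

For example, `static int a;` maps to `ImplicitAttr::None` in `fun1` and to `ImplicitAttr::Device` in `fun2`, matching the `CHECK-NOT` and `CHECK-NEXT` lines for `a`.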

clang/test/CodeGenCUDA/func-scope-static-var.cu

This file was added.

				// REQUIRES: x86-registered-target, amdgpu-registered-target

				// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -std=c++11 \
				// RUN: -emit-llvm -o - -x hip %s | FileCheck \
				// RUN: -check-prefixes=DEV,NORDC %s

				// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -std=c++11 \
				// RUN: -emit-llvm -fgpu-rdc -o - -x hip %s | FileCheck \
				// RUN: -check-prefixes=DEV,RDC %s

				// RUN: %clang_cc1 -triple x86_64-gnu-linux -std=c++11 \
				// RUN: -emit-llvm -o - -x hip %s | FileCheck \
				// RUN: -check-prefixes=HOST %s

				// RUN: %clang_cc1 -triple x86_64-gnu-linux -std=c++11 \
				// RUN: -emit-llvm -fgpu-rdc -o - -x hip %s | FileCheck \
				// RUN: -check-prefixes=HOST %s

				#include "Inputs/cuda.h"

				// In device functions, static device variables are not externalized nor shadowed.
				// Static managed variable behaves like a normal static device variable.

				// DEV: @_ZZ4fun1vE1a = internal addrspace(1) global i32 1
				// HOST-NOT: @_ZZ4fun1vE1a
				// DEV: @_ZZ4fun1vE1b = internal addrspace(1) global i32 2
				// HOST-NOT: @_ZZ4fun1vE1b
				// DEV: @_ZZ4fun1vE1c = internal addrspace(4) constant i32 3
				// HOST-NOT: @_ZZ4fun1vE1c
				// DEV: @_ZZ4fun1vE1d = internal addrspace(4) constant i32 4
				// HOST-NOT: @_ZZ4fun1vE1d
				// DEV: @_ZZ4fun1vE1e = internal addrspace(4) global i32 5
				// HOST-NOT: @_ZZ4fun1vE1e
				// DEV: @_ZZ4fun1vE1f = internal addrspace(1) global i32 6
				// HOST-NOT: @_ZZ4fun1vE1f
				__device__ int fun1() {
				static int a = 1;
				static __device__ int b = 2;
				static const int c = 3;
				static constexpr int d = 4;
				static __constant__ int e = 5;
				static __managed__ int f = 6;
				return a + b + c + d + e + f;
				}

				// Assuming this function accepts a device pointer and does some work.
				__host__ __device__ int work(int *x);

				// In host function, static device variables are externalized if used and shadowed.

				// DEV-NOT: @_ZZ4fun2vE1a
				// HOST: @_ZZ4fun2vE1a = internal global i32 1
				// NORDC: @_ZZ4fun2vE1b = dso_local addrspace(1) global i32 2
				// RDC: @_ZZ4fun2vE1b = internal addrspace(1) global i32 2
				// HOST: @_ZZ4fun2vE1b = internal global i32 2
				tra: What's the reason for externalizing the variables for no-rdc only? If we do not externalize them, then we'll potentially have a problem with the host code attempting to get variable's device-side address and fail at runtime, because it's not visible on device. I think the right thing to do here is to always externalize them, but add unique suffix for RDC.
				yaxunl: Yes this will be fixed by the patch for externalizing static var for -fgpu-rdc
				// DEV-NOT: @_ZZ4fun2vE1c
				// HOST: @_ZZ4fun2vE1c = internal constant i32 3
				// DEV-NOT: @_ZZ4fun2vE1d
				// HOST: @_ZZ4fun2vE1d = internal constant i32 4
				// NORDC: @_ZZ4fun2vE1e = dso_local addrspace(4) global i32 5
				// RDC: @_ZZ4fun2vE1e = internal addrspace(4) global i32 5
				// HOST: @_ZZ4fun2vE1e = internal global i32 5
				// DEV: @_ZZ4fun2vE1f = internal addrspace(1) global i32* addrspacecast (i32 addrspace(1)* @_ZZ4fun2vE1b to i32*)
				// HOST: @_ZZ4fun2vE1f = internal global i32* @_ZZ4fun2vE1b
				// NORDC: @_ZZ4fun2vE1b_0 = dso_local addrspace(1) global i32 6
				// RDC: @_ZZ4fun2vE1b_0 = internal addrspace(1) global i32 6
				// HOST: @_ZZ4fun2vE1b_0 = internal global i32 6
				// NORDC: @_ZZ4fun2vE1g = dso_local addrspace(1) externally_initialized global i32 undef
				// RDC: @_ZZ4fun2vE1g = external dso_local addrspace(1) global i32
				// HOST: @_ZZ4fun2vE1g = internal global i32 7
				int fun2() {
				static int a = 1;
				static __device__ int b = 2;
				static const int c = 3;
				static constexpr int d = 4;
				static __constant__ int e = 5;
				static __device__ int *f = &b;
				for (int i = 0; i < 10; i++) {
				static __device__ int b = 6;
				work(&b);
				}
				static __managed__ int g = 7;
				return a + c + d + work(&e) + g;
				}

				// In host device function, explicit static device variables are externalized
				// if used and registered. Static variables w/o attributes are implicit device
				// variables in device compilation and host variables in host compilation.
				tra: Nit: `static variables w/o attributes are implicitly __device__`. Or `By default, static variables are implicitly __device__`. It's also not clear what you mean by `which are independent`. It may be better to add more details in a separate sentence.
				yaxunl: revised
				// The variable emitted in host compilation is not the shadow variable of the
				// variable emitted in device compilation.

				// DEV: @_ZZ4fun3vE1a = internal addrspace(1) global i32 1
				// HOST: @_ZZ4fun3vE1a = internal global i32 1
				// NORDC: @_ZZ4fun3vE1b = dso_local addrspace(1) global i32 2
				// RDC: @_ZZ4fun3vE1b = internal addrspace(1) global i32 2
				// HOST: @_ZZ4fun3vE1b = internal global i32 2
				// DEV: @_ZZ4fun3vE1c = internal addrspace(4) constant i32 3
				// HOST: @_ZZ4fun3vE1c = internal constant i32 3
				// DEV: @_ZZ4fun3vE1d = internal addrspace(4) constant i32 4
				// HOST: @_ZZ4fun3vE1d = internal constant i32 4
				// NORDC: @_ZZ4fun3vE1e = dso_local addrspace(4) global i32 5
				// RDC: @_ZZ4fun3vE1e = internal addrspace(4) global i32 5
				// HOST: @_ZZ4fun3vE1e = internal global i32 5
				// DEV: @_ZZ4fun3vE1f = internal addrspace(1) global i32* addrspacecast (i32 addrspace(1)* @_ZZ4fun3vE1b to i32*)
				// HOST: @_ZZ4fun3vE1f = internal global i32* @_ZZ4fun3vE1b
				// NORDC: @_ZZ4fun3vE1b_0 = dso_local addrspace(1) global i32 6
				// RDC: @_ZZ4fun3vE1b_0 = internal addrspace(1) global i32 6
				// HOST: @_ZZ4fun3vE1b_0 = internal global i32 6
				// NORDC: @_ZZ4fun3vE1g = dso_local addrspace(1) externally_initialized global i32 undef
				// RDC: @_ZZ4fun3vE1g = external dso_local addrspace(1) global i32
				// HOST: @_ZZ4fun3vE1g = internal global i32 7
				__host__ __device__ int fun3() {
				static int a = 1;
				static __device__ int b = 2;
				static const int c = 3;
				static constexpr int d = 4;
				static __constant__ int e = 5;
				static __device__ int *f = &b;
				for (int i = 0; i < 10; i++) {
				static __device__ int b = 6;
				work(&b);
				}
				static __managed__ int g = 7;
				return a + c + d + work(&e) + g;
				}

				// In kernels, static device variables are not externalized nor shadowed
				// since they cannot be accessed by host code. Static managed variable behaves
				tra: We could use an explanation why we're not externalizing or shadowing them.
				yaxunl: added explanation
				// like a normal static device variable.

				// DEV: @_ZZ4fun4vE1a = internal addrspace(1) global i32 1
				// HOST-NOT: @_ZZ4fun4vE1a
				// DEV: @_ZZ4fun4vE1b = internal addrspace(1) global i32 2
				// HOST-NOT: @_ZZ4fun4vE1b
				// DEV: @_ZZ4fun4vE1c = internal addrspace(4) constant i32 3
				// HOST-NOT: @_ZZ4fun4vE1c
				// DEV: @_ZZ4fun4vE1d = internal addrspace(4) constant i32 4
				// HOST-NOT: @_ZZ4fun4vE1d
				// DEV: @_ZZ4fun4vE1e = internal addrspace(4) global i32 5
				// HOST-NOT: @_ZZ4fun4vE1e
				// DEV: @_ZZ4fun4vE1f = internal addrspace(1) global i32 6
				// HOST-NOT: @_ZZ4fun4vE1f
				__global__ void fun4() {
				static int a = 1;
				static __device__ int b = 2;
				static const int c = 3;
				static constexpr int d = 4;
				static __constant__ int e = 5;
				static __managed__ int f = 6;
				}

				// HOST-NOT: call void @__hipRegisterVar({{.*}}@_ZZ4fun1vE1f
				// HOST-NOT: call void @__hipRegisterManagedVar({{.*}}@_ZZ4fun1vE1f
				// HOST: call void @__hipRegisterVar({{.*}}@_ZZ4fun2vE1b
				// HOST: call void @__hipRegisterVar({{.*}}@_ZZ4fun2vE1e
				// HOST: call void @__hipRegisterVar({{.*}}@_ZZ4fun2vE1f
				// HOST: call void @__hipRegisterVar({{.*}}@_ZZ4fun2vE1b_0
				// HOST: call void @__hipRegisterManagedVar({{.*}}@_ZZ4fun2vE1g
				// HOST-NOT: call void @__hipRegisterVar({{.*}}@_ZZ4fun3vE1a
				// HOST: call void @__hipRegisterVar({{.*}}@_ZZ4fun3vE1b
				// HOST-NOT: call void @__hipRegisterVar({{.*}}@_ZZ4fun3vE1c
				// HOST-NOT: call void @__hipRegisterVar({{.*}}@_ZZ4fun3vE1d
				// HOST: call void @__hipRegisterVar({{.*}}@_ZZ4fun3vE1e
				// HOST: call void @__hipRegisterVar({{.*}}@_ZZ4fun3vE1f
				// HOST: call void @__hipRegisterVar({{.*}}@_ZZ4fun3vE1b_0
				// HOST: call void @__hipRegisterManagedVar({{.*}}@_ZZ4fun3vE1g
				// HOST-NOT: call void @__hipRegisterVar({{.*}}@_ZZ4fun4vE1f
				// HOST-NOT: call void @__hipRegisterManagedVar({{.*}}@_ZZ4fun4vE1f
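
The symbol names in the CHECK lines above follow the Itanium C++ ABI encoding for local entities: `_ZZ`, the enclosing function's encoding, `E`, the variable's length-prefixed name, and a discriminator such as `_0` for the second variable with the same name in that function. As an aid to reading names like `_ZZ4fun2vE1b_0`, here is a minimal sketch that rebuilds them. `mangleLocalStatic` is a hypothetical helper, not a clang API, and it assumes a non-member function with a simple identifier and no parameters (hence the `v`).

```cpp
#include <cassert>
#include <string>

// Builds the Itanium C++ ABI mangled name for a static local variable `var`
// declared inside a parameterless, non-member function `fn`, e.g. `int fun2()`.
// `occurrence` is 0 for the first variable with that name in the function,
// 1 for the second, and so on.
std::string mangleLocalStatic(const std::string &fn, const std::string &var,
                              unsigned occurrence) {
  // <source-name> is the decimal length followed by the identifier.
  auto sourceName = [](const std::string &id) {
    return std::to_string(id.size()) + id;
  };
  // _Z <local-name>, where <local-name> is Z <function encoding> E <entity name>.
  // The function encoding here is <source-name> plus `v` for an empty
  // parameter list.
  std::string out = "_ZZ" + sourceName(fn) + "v" + "E" + sourceName(var);
  // The first occurrence has no discriminator; later ones are numbered from 0.
  if (occurrence > 0) {
    unsigned n = occurrence - 1;
    out += (n < 10) ? "_" + std::to_string(n)
                    : "__" + std::to_string(n) + "_";
  }
  return out;
}
```

With this helper, the outer `b` in `fun2` corresponds to `mangleLocalStatic("fun2", "b", 0)` and the `b` declared inside the `for` loop to `mangleLocalStatic("fun2", "b", 1)`, which is why the test distinguishes `@_ZZ4fun2vE1b` from `@_ZZ4fun2vE1b_0`.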

clang/test/CodeGenCUDA/static-device-var-no-rdc.cu

	// REQUIRES: x86-registered-target
	// REQUIRES: amdgpu-registered-target

	// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -std=c++11 \
	// RUN: -emit-llvm -o - -x hip %s | FileCheck \
	// RUN: -check-prefixes=DEV %s

	// RUN: %clang_cc1 -triple x86_64-gnu-linux -std=c++11 \
	// RUN: -emit-llvm -o - -x hip %s | FileCheck \
	// RUN: -check-prefixes=HOST %s

	#include "Inputs/cuda.h"

	// Test function scope static device variable, which should not be externalized.
	// DEV-DAG: @_ZZ6kernelPiPPKiE1w = internal addrspace(4) constant i32 1
	// DEV-DAG: @_ZZ6kernelPiPPKiE21local_static_constant = internal addrspace(4) constant i32 42
	// DEV-DAG: @_ZZ6kernelPiPPKiE19local_static_device = internal addrspace(1) constant i32 43			// DEV-DAG: @_ZZ6kernelPiPPKiE19local_static_device = internal addrspace(4) constant i32 43

	// Check a static device variable referenced by host function is externalized.
	// DEV-DAG: @_ZL1x ={{.*}} addrspace(1) externally_initialized global i32 0
	// HOST-DAG: @_ZL1x = internal global i32 undef
	// HOST-DAG: @[[DEVNAMEX:[0-9]+]] = {{.*}}c"_ZL1x\00"

	static __device__ int x;


clang/test/SemaCUDA/func-scope-static-var.cu

This file was added.

				// RUN: %clang_cc1 -std=c++14 -fsyntax-only -verify=host,com -x hip %s
				// RUN: %clang_cc1 -std=c++14 -fsyntax-only -fcuda-is-device -verify=dev,com -x hip %s
				// RUN: %clang_cc1 -std=c++14 -fsyntax-only -fgpu-rdc -verify=host,com -x hip %s
				// RUN: %clang_cc1 -std=c++14 -fsyntax-only -fgpu-rdc -fcuda-is-device -verify=dev,com -x hip %s

				#include "Inputs/cuda.h"

				struct A {
				static int a;
				static __device__ int fun();
				};

				int A::a;
				__device__ int A::fun() {
				return a;
				// dev-error@-1 {{reference to __host__ variable 'a' in __device__ function}}
				}

				// Assuming this function accepts a pointer to a device variable and calculates some result.
				__device__ __host__ int work(const int *x);

				int fun1(int x) {
				static __device__ int a = sizeof(a);
				static __device__ int b = x;
				// com-error@-1 {{dynamic initialization is not supported for __device__, __constant__, __shared__, and __managed__ variables}}
				static const __device__ int c = sizeof(a);
				static constexpr __device__ int d = sizeof(a);
				static __constant__ __device__ int e = sizeof(a);
				static __managed__ __device__ int f = sizeof(a);
				static int a2 = sizeof(a);
				static int b2 = x;
				static const int c2 = sizeof(a);
				static constexpr int d2 = sizeof(a);
				static __constant__ int e2 = sizeof(a);
				static __managed__ int f2 = sizeof(a);
				return work(&a) + work(&b) + work(&c) + work(&d) + work(&e) + f + a2 + b2 + c2 + d2 + work(&e2) + f2;
				}

				__device__ int fun2(int x) {
				static __device__ int a = sizeof(a);
				static __device__ int b = x;
				// com-error@-1 {{dynamic initialization is not supported for __device__, __constant__, __shared__, and __managed__ variables}}
				static const __device__ int c = sizeof(a);
				static constexpr __device__ int d = sizeof(a);
				static __constant__ __device__ int e = sizeof(a);
				static __managed__ __device__ int f = sizeof(a);
				static int a2 = sizeof(a);
				static int b2 = x;
				// com-error@-1 {{dynamic initialization is not supported for __device__, __constant__, __shared__, and __managed__ variables}}
				static const int c2 = sizeof(a);
				static constexpr int d2 = sizeof(a);
				static __constant__ int e2 = sizeof(a);
				static __managed__ int f2 = sizeof(a);
				return a + b + c + d + e + f + a2 + b2 + c2 + d2 + e2 + f2;
				}

				__device__ __host__ int fun3(int x) {
				static __device__ int a = sizeof(a);
				static __device__ int b = x;
				// com-error@-1 {{dynamic initialization is not supported for __device__, __constant__, __shared__, and __managed__ variables}}
				static const __device__ int c = sizeof(a);
				static constexpr __device__ int d = sizeof(a);
				static __constant__ __device__ int e = sizeof(a);
				static __managed__ __device__ int f = sizeof(a);
				static int a2 = sizeof(a);
				static int b2 = x;
				// dev-error@-1 {{dynamic initialization is not supported for __device__, __constant__, __shared__, and __managed__ variables}}
				static const int c2 = sizeof(a);
				static constexpr int d2 = sizeof(a);
				static __constant__ int e2 = sizeof(a);
				static __managed__ int f2 = sizeof(a);
				return work(&a) + work(&b) + work(&c) + work(&d) + work(&e) + f + a2 + b2 + c2 + d2 + work(&e2) + f2;
				}

				template<typename T>
				__device__ __host__ int fun4(T x) {
				static __device__ int a = sizeof(x);
				static __device__ int b = x;
				// com-error@-1 {{dynamic initialization is not supported for __device__, __constant__, __shared__, and __managed__ variables}}
				static const __device__ int c = sizeof(x);
				static constexpr __device__ int d = sizeof(x);
				static __constant__ __device__ int e = sizeof(a);
				static __managed__ __device__ int f = sizeof(a);
				static int a2 = sizeof(x);
				static int b2 = x;
				// dev-error@-1 {{dynamic initialization is not supported for __device__, __constant__, __shared__, and __managed__ variables}}
				static const int c2 = sizeof(x);
				static constexpr int d2 = sizeof(x);
				static __constant__ int e2 = sizeof(a);
				static __managed__ int f2 = sizeof(a);
				return work(&a) + work(&b) + work(&c) + work(&d) + work(&e) + f + a2 + b2 + c2 + d2 + work(&e2) + f2;
				}

				__device__ __host__ int fun4_caller() {
				return fun4(1);
				// com-note@-1 {{in instantiation of function template specialization 'fun4<int>' requested here}}
				}

				__global__ void fun5(int x, int *y) {
				static __device__ int a = sizeof(a);
				static __device__ int b = x;
				// com-error@-1 {{dynamic initialization is not supported for __device__, __constant__, __shared__, and __managed__ variables}}
				static const __device__ int c = sizeof(a);
				static constexpr __device__ int d = sizeof(a);
				static __constant__ __device__ int e = sizeof(a);
				static __managed__ __device__ int f = sizeof(a);
				static int a2 = sizeof(a);
				static int b2 = x;
				// com-error@-1 {{dynamic initialization is not supported for __device__, __constant__, __shared__, and __managed__ variables}}
				static const int c2 = sizeof(a);
				static constexpr int d2 = sizeof(a);
				static __constant__ int e2 = sizeof(a);
				static __managed__ int f2 = sizeof(a);
				*y = a + b + c + d + e + f + a2 + b2 + c2 + d2 + e2 + f2;
				}
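
The `com-error` and `dev-error` annotations in this test follow a pattern that can be summarized in a few lines. The sketch below models only what the expected diagnostics show for a function-scope static variable whose initializer is not a constant expression (the `= x` cases). It is not the logic in Sema, and `FuncTarget` and `expectsDynamicInitError` are hypothetical names used for illustration.

```cpp
#include <cassert>

// Hypothetical model of the function kinds used in the test.
enum class FuncTarget { Host, Device, HostDevice, Global };

// Returns true when the test expects the "dynamic initialization is not
// supported" error for a function-scope static variable with a non-constant
// initializer.
//   hasExplicitDeviceSideAttr: the variable is written with __device__
//     (as `b` is in every function of the test).
//   compilingForDevice: the run uses -fcuda-is-device.
bool expectsDynamicInitError(FuncTarget fn, bool hasExplicitDeviceSideAttr,
                             bool compilingForDevice) {
  // `static __device__ int b = x;` is a com-error in every function kind.
  if (hasExplicitDeviceSideAttr)
    return true;
  switch (fn) {
  case FuncTarget::Host:
    return false;               // fun1: `static int b2 = x;` is accepted
  case FuncTarget::Device:
  case FuncTarget::Global:
    return true;                // fun2, fun5: com-error on `b2`
  case FuncTarget::HostDevice:
    return compilingForDevice;  // fun3, fun4: dev-error on `b2`
  }
  return false;
}
```

Read this way, an unattributed static with a dynamic initializer is rejected exactly where the variable ends up on the device: always in `__device__` and `__global__` functions, and in `__host__ __device__` functions only during device compilation.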