This is an archive of the discontinued LLVM Phabricator instance.

[HIP] Diagnose compiling kernel without offload arch
Needs Review · Public

Authored by yaxunl on Apr 15 2021, 5:11 AM.

Details

Reviewers
tra
aaron.ballman
Summary

AMDGPU does not have a common processor (GPU arch). A HIP kernel
must be compiled for a specific processor in order to be launched
on that processor.

However, we cannot simply diagnose a missing --offload-arch in the
clang driver, since a valid HIP program may contain no kernels; such
a program can be compiled without specifying an offload arch and
executed on machines without an AMD GPU.

Therefore, only HIP programs containing kernels should be diagnosed
when compiled without an offload arch.

This patch changes the clang driver so that when no offload arch is
specified for HIP, no target CPU is passed to clang -cc1. If the HIP
program contains a kernel, the frontend diagnoses it as a fatal error,
so that the diagnostic is emitted only once. This way, HIP programs
without kernels may still be compiled without an offload arch, whereas
HIP programs with kernels are rejected when no offload arch is given.
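For illustration, a minimal TU like the following (a hypothetical sketch, not taken from the patch; the kernel name and the gfx906 arch are arbitrary examples) would be rejected under this patch unless an explicit --offload-arch is given, because it defines a kernel:

```cuda
// kernel.hip -- defines a __global__ kernel, so under this patch,
// compiling it without an explicit offload arch
// (e.g. --offload-arch=gfx906) is diagnosed as a fatal error.
#include <hip/hip_runtime.h>

__global__ void vector_set(int *out, int value) {
  out[blockIdx.x * blockDim.x + threadIdx.x] = value;
}
```

A TU containing only host code, by contrast, would still compile without any offload arch.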

Diff Detail

Event Timeline

yaxunl created this revision. Apr 15 2021, 5:11 AM
yaxunl requested review of this revision. Apr 15 2021, 5:11 AM

Drive-by comment on the diagnostic wording.

clang/include/clang/Basic/DiagnosticSemaKinds.td
8260
yaxunl updated this revision to Diff 337883. Apr 15 2021, 1:22 PM
yaxunl marked an inline comment as done.

Revised the error message per Aaron's comments.

tra added a comment. Apr 15 2021, 2:37 PM

Enforcing explicit GPU target makes sense.

However, I think that singling out __global__ functions as the trigger is not sufficient for the intended purpose.

If we can't generate a usable GPU-side binary, then we should produce an error whenever we need to generate *anything* during GPU-side compilation.
Using __global__ as a proxy would not catch some use cases and may produce false positives in others.

E.g. what if I have a TU which only has a __device__ int var = 42; combined with host-side code to memcpy to/from it? It would still be valid, if not very useful, code, but it would still suffer from the runtime being unable to load it on a GPU unless that variable is in a GPU binary compiled for a valid target.
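That scenario might look like the following (a hypothetical sketch; the function name is made up, and hipMemcpyFromSymbol/HIP_SYMBOL are standard HIP runtime APIs used here by way of example):

```cuda
#include <hip/hip_runtime.h>

// Device-side variable only -- no __global__ kernel anywhere in this TU.
__device__ int var = 42;

int read_device_var() {
  int host_val = 0;
  // Host-side copy from the device variable. Loading this TU's GPU
  // binary still fails at runtime if it was not compiled for a valid
  // target, even though the TU defines no kernel.
  hipMemcpyFromSymbol(&host_val, HIP_SYMBOL(var), sizeof(int));
  return host_val;
}
```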

__device__ functions in TUs compiled with -fgpu-rdc would have a similar problem. They would eventually be linked into a GPU binary, which will be useless if it's not compiled for the correct GPU. Granted, __device__ functions eventually need to be called from a kernel, so we would error out on a __global__ *somewhere*, but that misses the problem when such a TU never reaches the linking stage (e.g. maybe the user wants to link it at runtime).
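The -fgpu-rdc case could be sketched like this (hypothetical file name; the compile command shown in the comment is an illustrative invocation, not from the patch):

```cuda
// util.hip -- contains only a __device__ function, no kernels.
// Compiled separately with relocatable device code, e.g.:
//   clang++ -x hip -fgpu-rdc -c util.hip
// The resulting object is only usable if it was built for the GPU it
// will eventually be linked against and loaded on -- yet with a
// __global__-only check, this TU would compile without complaint.
__device__ int square(int x) { return x * x; }
```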

clang/include/clang/Basic/DiagnosticSemaKinds.td
8260

How about compiling a file with __device__ functions with -fgpu-rdc? If a kernel with no-arch is an error, then this should be an error, too.

clang/lib/Sema/SemaDeclAttr.cpp
4431

Will this fire if we have an uninstantiated kernel template?

clang/test/SemaCUDA/kernel-no-gpu.cu
7

We'll need a few more test cases.

E.g. these should be fine.

template <typename T> __global__ void kernel(T arg) {}
template <typename T> __global__ void kernel(T arg);