This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Introduce attribute to mark kernels without a language mode
ClosedPublic

Authored by jhuber6 on Dec 16 2022, 9:59 AM.

Details

Summary

We may want to be able to mark certain regions as kernels even without
being in an accepted CUDA or OpenCL language mode. This patch introduces
a new attribute, limited to NVPTX targets, called nvptx_kernel, which
performs the same metadata action as the existing CUDA ones. This
closely mimics the behaviour of the amdgpu_kernel attribute, and
allows for making executable NVPTX device images without using an
existing offloading language model.

I was unsure how best to do this; I could potentially re-use all the CUDA
attributes and just replace the CUDA language requirement with an
NVPTX architecture requirement. I also don't know whether I should add
more than just this attribute.

Diff Detail

Event Timeline

jhuber6 created this revision.Dec 16 2022, 9:59 AM
jhuber6 requested review of this revision.Dec 16 2022, 9:59 AM
Herald added a project: Restricted Project.Dec 16 2022, 9:59 AM
Herald added a subscriber: cfe-commits.
tra added inline comments.Dec 16 2022, 10:26 AM
clang/lib/CodeGen/TargetInfo.cpp
7359

How does AMDGPU track kernels? It may be a good opportunity to stop using metadata for this if we can use a better suited mechanism. E.g. a function attribute or a calling convention.

clang/lib/Sema/SemaDeclAttr.cpp
4872 ↗(On Diff #483572)

I'm tempted to addAttr(CUDAGlobal) here, effectively making nvptx_kernel a target-specific alias for it, so we're guaranteed that they both are handled exactly the same everywhere.
On the other hand, it all may be moot -- without CUDA compilation mode, CUDAGlobal handling will be different in this compilation mode.

Can CUDAGlobal itself be allowed to be used as a target-specific attribute for NVPTX in C++ mode?

I think, if possible, we should ideally have only one attribute doing the job, even if it may have somewhat different use cases in CUDA vs C++ compilation modes.

jhuber6 added inline comments.Dec 16 2022, 10:31 AM
clang/lib/CodeGen/TargetInfo.cpp
7359

AMDGPU uses a calling convention, which is probably a better option. I don't know how this gets lowered in the back end, though.

clang/lib/Sema/SemaDeclAttr.cpp
4872 ↗(On Diff #483572)

Yeah, that's what I was thinking. Right now we only parse and check the CUDA attributes in the CUDA language mode. I could change it to allow them whenever we're compiling for the NVPTX architecture instead. I don't think it would have any significant effect for the vast majority of cases.

tra added inline comments.Dec 16 2022, 10:36 AM
clang/lib/CodeGen/TargetInfo.cpp
7359

OK. Switching from metadata to a new calling convention would be nice, but it is likely a bit more complicated and can be handled separately if/when we decide to do it. It's not needed for your purposes.

clang/lib/Sema/SemaDeclAttr.cpp
4872 ↗(On Diff #483572)

Let's give it a try.

jhuber6 updated this revision to Diff 483640.Dec 16 2022, 12:25 PM

Changing to use the same CUDA global attributes. This requires a few extra checks for whether or not we were in CUDA mode, since previously it just assumed that any time we saw one of these globals we were in that mode. I also added a different spelling, just for consistency.

tra added a comment.Dec 16 2022, 12:54 PM

LGTM.

General question -- what happens now that the global and launch_bounds are target-specific as opposed to language-specific, if they happen to be used in a C++ compilation targeting x86? I assume they will still be ignored, right?

clang/include/clang/Basic/Attr.td
1198

Nice.

This reminded me that we have a project compiling CUDA, but targeting SPIR-V instead of NVPTX. It looks like this will likely break them. The project is out-of-tree, but I'd still need to figure out how to keep them working. I guess it would be easy enough to expand TargetNVPTX to TargetNVPTXOrSpirV. I'm mostly concerned about logistics of making it happen without disruption.

jhuber6 added inline comments.Dec 16 2022, 12:57 PM
clang/include/clang/Basic/Attr.td
1198

After looking into it, this might've broken more things; I forgot that AMDGPU still uses the same CUDA attributes, and the host portion of CUDA also checks these. It would be nice if there were a way to say "CUDA" or "NVPTX"; I'm wondering if that's possible in the tablegen here.

I wonder whether we could not factorize some code/attribute/logic with AMDGPU or SYCL.
Is the use case, for example, to have CUDA+HIP+SYCL in the same TU, and thus a need for different attributes?

> I wonder whether we could not factorize some code/attribute/logic with AMDGPU or SYCL.
> Is the use case, for example, to have CUDA+HIP+SYCL in the same TU, and thus a need for different attributes?

It would probably be good to have the high level concept of a "kernel" be factored out since this is common between all the offloading languages. The actual implementation it gets lowered to would still need to be distinct since this usually gets turned into some magic bits stashed in the executable for the runtime to read. The use-case for this patch is simply to allow people to compile pure C/C++ code to the NVPTX architecture, but still be able to mark the necessary metadata for kernels and globals.

I've recently wondered if we could just apply the same logic used for shared objects to GPU images: globals without hidden visibility would be considered __global__ and ones with hidden visibility would be considered __device__ in CUDA terms. I think the only thing preventing us from treating a kernel call as a dynamic symbol load is probably the launch parameters. But this is purely theoretical; I don't think we need to worry about moving away from offloading languages or anything.

There are already SYCL specific attributes: https://reviews.llvm.org/D60455

> There are already SYCL specific attributes: https://reviews.llvm.org/D60455

We could potentially merge these all into some generic attribute since they all do the same thing on a conceptual level. The unique thing about the existing amdgpu_kernel and corresponding nvptx_kernel is that they don't rely on the language options like SYCL or CUDA. Though, semantically those are definitely involved because the kernel itself is only meaningful to whatever runtime is going to load it (e.g. CUDA or HSA) but we can probably consider that separately to the compilation itself and just think of these as calling conventions.

tschuett added a comment.EditedDec 18 2022, 10:17 AM

> > There are already SYCL specific attributes: https://reviews.llvm.org/D60455

> We could potentially merge these all into some generic attribute since they all do the same thing on a conceptual level. The unique thing about the existing amdgpu_kernel and corresponding nvptx_kernel is that they don't rely on the language options like SYCL or CUDA. Though, semantically those are definitely involved because the kernel itself is only meaningful to whatever runtime is going to load it (e.g. CUDA or HSA) but we can probably consider that separately to the compilation itself and just think of these as calling conventions.

But then you are maybe mixing two concepts: a kernel is a source-code or AST feature, while nvptx and AMDGPU are command-line flags.

Are CUDA, SYCL, nvptx, and AMDGPU modes or calling conventions?

> But then you are maybe mixing two concepts: a kernel is a source-code or AST feature, while nvptx and AMDGPU are command-line flags.
> 
> Are CUDA, SYCL, nvptx, and AMDGPU modes or calling conventions?

The way I understand it, the architecture determines the actual ISA for the code and the kernel metadata operates like a calling convention for whatever "OS" will be executing it. For example, for the triple amdgcn-amd-amdhsa we generate code for the amdgcn architecture and emit kernels such that the hsa runtime can call them. Similarly, for nvptx64-nvidia-cuda we emit code for nvptx64 and our kernels use the calling convention such that the cuda runtime can call them. I think the main question of this patch is if we can separate the cuda runtime from the CUDA language. That is, we don't need to be using the CUDA language to emit functions that the cuda runtime can call. So this is more or less thinking of these kernel calls as a calling convention for a runtime or operating system rather than as a language feature.

Calling convention is the right model here. Kernels are functions with a different calling convention from the 'normal' functions in a very literal sense. The calling-convention modelling in clang is different from attribute handling, and changing NVPTX over to it is probably invasive, though it seems to me it could be done incrementally.

I wouldn't suggest adding a nvptx_kernel calling convention to clang though, rather we could repurpose the amdgpu one to be gpu_kernel. Possibly spelled nvptx_kernel for the user but represented within clang as gpu_kernel.

Related, I think there's a spirv or opencl kernel representation in llvm for amdgpu, I would be interested in collapsing those and the openmp or hip annotation to a single thing if possible.

That's all medium term cleanup ideas, current patch looks good to me.

shangwuyao added inline comments.
clang/include/clang/Basic/Attr.td
1198

What's the plan here for keeping the SPIR-V and AMDGPU working? Would it work if we simply get rid of the TargetSpecificAttr<TargetNVPTX>?

jhuber6 added inline comments.Dec 19 2022, 8:54 AM
clang/include/clang/Basic/Attr.td
1198

Yeah, it would. I'll need to update the patch. The best solution would be if there were a way to say "TargetNVPTX or LangOpts.CUDA"; I'm not sure if that's possible in TableGen. The previous diff I had worked fine, but we should definitely try to avoid rework.

Precommit CI found failures that look relevant to the patch.

> We may want to be able to mark certain regions as kernels even without being in an accepted CUDA or OpenCL language mode.

Can you explain this a bit more? Under what circumstances would you want to do this?

> Precommit CI found failures that look relevant to the patch.
> 
> > We may want to be able to mark certain regions as kernels even without being in an accepted CUDA or OpenCL language mode.
> 
> Can you explain this a bit more? Under what circumstances would you want to do this?

Yeah, I need to work on this some more. A previous version worked fine, but it duplicated some logic; I'm not sure if there's a good way to re-use the existing kernel logic without breaking some of the assumptions. The desire was to be able to emit a kernel that can be called externally via cross-compilation, e.g. clang foo.c --target=nvptx64-nvidia-cuda. The intended use-case is testing experimental libc implementations using integration tests.

@tra would it be possible to go to the earlier version that simply duplicated a slight amount of logic to introduce the new and separate attribute nvptx_kernel? Overloading CUDA's device attribute is problematic because it's used and checked in several different contexts. I'd like to be able to simplify this code https://github.com/llvm/llvm-project/blob/main/libc/startup/gpu/nvptx/start.cpp.

jhuber6 updated this revision to Diff 508170.Mar 24 2023, 11:18 AM

Updating to simply add an entirely new attribute again. The existing
CUDAGlobal attribute does what we want, but it's also highly coupled with the
CUDA language. This made it pretty much impossible to find a way to re-use it
without breaking existing functionality. The amount of code duplicated is
minimal and this is required to be able to emit a callable kernel targeting
NVPTX directly. I'd like to use this for my ongoing GPU libc project so I'd
appreciate someone looking at this again.

tra accepted this revision.Mar 24 2023, 11:27 AM
This revision is now accepted and ready to land.Mar 24 2023, 11:27 AM
This revision was landed with ongoing or failed builds.Mar 24 2023, 12:42 PM
This revision was automatically updated to reflect the committed changes.