This is an archive of the discontinued LLVM Phabricator instance.


Differential D150675

[AMDGPU] Rewrite device ctor / dtor handling to use .init / .fini sections
Closed · Public

Authored by jhuber6 on May 16 2023, 7:56 AM.


Details

Reviewers

JonChesterfield
yaxunl
arsenm
rampitec
b-sumner
MaskRay

Commits

rG8b132747cd3b: [AMDGPU] Rewrite device ctor / dtor handling to use .init / .fini sections

Summary

Currently, AMDGPU has special handling for constructors and destructors.
We manually emit a kernel that calls the functions listed in the global
constructor / destructor list. This has two main problems. The first is
that we do not respect the priority and simply call the functions in any
order. The second is that we redefine the symbol unconditionally, and each
definition may differ, meaning we cannot merge any code containing a
constructor after codegen. This patch changes the handling to instead use
the standard support for traversing the .init_array and .fini_array
sections the compiler creates. This allows us to emit a single kernel with
ODR semantics, so even if we emit it multiple times, the copies will be
merged into a single kernel.
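The ordering fix rests on a division of labor: the linker places each callback in a per-priority section and concatenates those sections in priority order, and the device kernel then simply walks the resulting array from start to end. The host-side C++ below is a minimal sketch of that model, not code from the patch. `CtorEntry`, `layoutInitArray`, `callInitArray`, and the example callbacks are hypothetical names, and the assumption that equal priorities keep their input order is a simplification of the linker's behavior.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

using Callback = void (*)();

// Hypothetical model of one llvm.global_ctors entry: { priority, callback }.
struct CtorEntry {
  std::uint32_t Priority;
  Callback Fn;
};

// Models the linker's role (assumption): order the per-priority sections by
// ascending priority, keep input order for ties, and concatenate them into
// one contiguous array bounded by __init_array_start / __init_array_end.
std::vector<Callback> layoutInitArray(std::vector<CtorEntry> Entries) {
  std::stable_sort(Entries.begin(), Entries.end(),
                   [](const CtorEntry &A, const CtorEntry &B) {
                     return A.Priority < B.Priority;
                   });
  std::vector<Callback> Array;
  for (const CtorEntry &E : Entries)
    Array.push_back(E.Fn);
  return Array;
}

// Models the device kernel: call every pointer in [Start, End) in order.
void callInitArray(const Callback *Start, const Callback *End) {
  for (const Callback *It = Start; It != End; ++It)
    (*It)();
}

// Illustrative callbacks that record the order in which they ran.
std::string CallOrder;
void ctorPriority100() { CallOrder += "100;"; }
void ctorPriority200() { CallOrder += "200;"; }
```

For example, laying out `{{200, ctorPriority200}, {100, ctorPriority100}}` and then calling `callInitArray(Arr.data(), Arr.data() + Arr.size())` runs the priority-100 callback first, regardless of the order in which the entries were listed. The kernel itself never needs to know about priorities.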

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jhuber6 created this revision. May 16 2023, 7:56 AM

Herald added a project: Restricted Project. May 16 2023, 7:56 AM

Herald added subscribers: foad, kerbowa, hiraditya and 4 others.

jhuber6 requested review of this revision. May 16 2023, 7:56 AM

Herald added a project: Restricted Project. May 16 2023, 7:56 AM

Herald added subscribers: llvm-commits, wdng.

arsenm added inline comments. May 16 2023, 8:13 AM

llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp
80	Use the enum or query from datalayout
85	Use the enum or query from datalayout
89	Ditto
91	Ditto
96	Ditto
106	I don't like getUnqual, it's a bad name that implies they're 0="unqualified" which is not really how things work. Query the function's address space
110	Should just save the ptr type to a variable and reuse throughout

Addressing comments

I've tested this working with a standalone constructor, but it may be good to double check with the asan case.

yaxunl added inline comments. May 16 2023, 8:43 AM

llvm/test/CodeGen/AMDGPU/lower-ctor-dtor-constexpr-alias.ll
1	Can we add a check that __init_array_start and __init_array_end are created by lld as expected?

jhuber6 added inline comments. May 16 2023, 8:46 AM

llvm/test/CodeGen/AMDGPU/lower-ctor-dtor-constexpr-alias.ll
1	That might be a little tough since `lld` is a separate project that might not be built. There are tests for it in there though, but it would be best to know for sure. Might need to be a runtime test? For now you can just take my word for it since it's what we use in the `libc` tests which have a buildbot https://lab.llvm.org/staging/#/builders/247.

yaxunl added inline comments. May 16 2023, 8:52 AM

llvm/test/CodeGen/AMDGPU/lower-ctor-dtor-constexpr-alias.ll
1	OK. LGTM

Harbormaster completed remote builds in B232333: Diff 522639. May 16 2023, 9:49 AM

arsenm added inline comments. May 16 2023, 12:25 PM

llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp
30–36	Should just use the existing AMDGPUAS::* instead of inventing a new copy

Move to AMDGPUAS.

Harbormaster completed remote builds in B232398: Diff 522740. May 16 2023, 2:26 PM

In D150675#4346449, @jhuber6 wrote:

I've tested this working with a standalone constructor, but it may be good to double check with the asan case.

I tested the above patch downstream with my ASan toolchain setup on r8 systems and found no regressions when launching global ctors/dtors in AMDGPU device code with either the HIP or the OpenMP runtime.

I also tested the patch with ASan test cases that stress global-buffer-overflow conditions on the GPU side, for both HIP and OpenMP, and found that the redzones are poisoned as expected.

I can share the test cases if anyone has a working ASan toolchain setup with the current patch set applied.

Thanks
Amit

Wondering if it's worth maintaining a way to keep the old non-indirect codegen for LTO

llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp
111–112	CreateConstInBoundsGEP1_64
llvm/test/CodeGen/AMDGPU/lower-ctor-dtor-constexpr-alias.ll
20	It's not really wrong but maybe we should use the default program address space instead

This revision is now accepted and ready to land. May 19 2023, 11:33 AM

In D150675#4357229, @arsenm wrote:

Wondering if it's worth maintaining a way to keep the old non-indirect codegen for LTO

That's true: these are indirect function calls, so they can't be inlined. But I don't think keeping the old path is worthwhile. We execute this kernel with a single thread anyway, so the overhead of a few indirect calls probably won't add much latency overall.

llvm/test/CodeGen/AMDGPU/lower-ctor-dtor-constexpr-alias.ll
20	So I put it in `addrspace(1)` because I was getting weird failures. I think the LLVM ctor / dtor globals were either in addrspace(1) or in the default address space, and the latter caused problems.

Closed by commit rG8b132747cd3b: [AMDGPU] Rewrite device ctor / dtor handling to use .init / .fini sections (authored by jhuber6). May 19 2023, 2:22 PM

This revision was automatically updated to reflect the committed changes.

jhuber6 added a commit: rG8b132747cd3b: [AMDGPU] Rewrite device ctor / dtor handling to use .init / .fini sections.

Revision Contents

Path                                                           Size
llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp              102 lines
llvm/test/CodeGen/AMDGPU/lower-ctor-dtor-constexpr-alias.ll    72 lines
llvm/test/CodeGen/AMDGPU/lower-ctor-dtor-existing.ll           6 lines
llvm/test/CodeGen/AMDGPU/lower-ctor-dtor.ll                    44 lines
llvm/test/CodeGen/AMDGPU/lower-multiple-ctor-dtor.ll           50 lines

Diff 522639

llvm/lib/Target/AMDGPU/AMDGPUCtorDtorLowering.cpp

	... (21 unchanged lines not shown)
	 #include "llvm/Transforms/Utils/ModuleUtils.h"
	 
	 using namespace llvm;
	 
	 #define DEBUG_TYPE "amdgpu-lower-ctor-dtor"
	 
	 namespace {
	 
	+enum class AddressSpace : unsigned {
	+  Generic = 0,
	+  Global = 1,
	+  Shared = 3,
	+  Constant = 4,
	+  Private = 5,
	+};
	+
	 static Function *createInitOrFiniKernelFunction(Module &M, bool IsCtor) {
	   StringRef InitOrFiniKernelName = "amdgcn.device.init";
	   if (!IsCtor)
	     InitOrFiniKernelName = "amdgcn.device.fini";
	-  if (Function *F = M.getFunction(InitOrFiniKernelName))
	-    return F;
	+  if (M.getFunction(InitOrFiniKernelName))
	+    return nullptr;
	 
	   Function *InitOrFiniKernel = Function::createWithDefaultAttr(
	       FunctionType::get(Type::getVoidTy(M.getContext()), false),
	-      GlobalValue::ExternalLinkage, 0, InitOrFiniKernelName, &M);
	-  BasicBlock *InitOrFiniKernelBB =
	-      BasicBlock::Create(M.getContext(), "", InitOrFiniKernel);
	-  ReturnInst::Create(M.getContext(), InitOrFiniKernelBB);
	-
	+      GlobalValue::WeakODRLinkage, 0, InitOrFiniKernelName, &M);
	   InitOrFiniKernel->setCallingConv(CallingConv::AMDGPU_KERNEL);
	   if (IsCtor)
	     InitOrFiniKernel->addFnAttr("device-init");
	   else
	     InitOrFiniKernel->addFnAttr("device-fini");
	   return InitOrFiniKernel;
	 }
	 
	+// The linker will provide the associated symbols to allow us to traverse the
	+// global constructors in priority order. We create the IR required to call each
	+// callback in this section. This is equivalent to the following code.
	+//
	+// extern "C" void * __init_array_start[];
	+// extern "C" void * __init_array_end[];
	+//
	+// using InitCallback = void();
	+//
	+// void call_init_array_callbacks() {
	+//   for (auto start = __init_array_start; start != __init_array_end; ++start)
	+//     reinterpret_cast<InitCallback *>(*start)();
	+static void createInitOrFiniCalls(Function &F, bool IsCtor) {
	+  Module &M = *F.getParent();
	+  LLVMContext &C = M.getContext();
	+
	+  IRBuilder<> IRB(BasicBlock::Create(C, "entry", &F));
	+  auto *LoopBB = BasicBlock::Create(C, "while.entry", &F);
	+  auto *ExitBB = BasicBlock::Create(C, "while.end", &F);
	+  Type *PtrTy = IRB.getPtrTy(static_cast<unsigned>(AddressSpace::Global));
	+
	+  auto *Begin = M.getOrInsertGlobal(
	+      IsCtor ? "__init_array_start" : "__fini_array_start",
	+      ArrayType::get(PtrTy, 0), [&]() {
	+        return new GlobalVariable(
	+            M, ArrayType::get(PtrTy, 0),
	+            /*isConstant=*/true, GlobalValue::ExternalLinkage,
	+            /*Initializer=*/nullptr,
	+            IsCtor ? "__init_array_start" : "__fini_array_start",
	+            /*InsertBefore=*/nullptr, GlobalVariable::NotThreadLocal,
	+            /*AddressSpace=*/1);
	+      });
	+  auto *End = M.getOrInsertGlobal(
	+      IsCtor ? "__init_array_end" : "__fini_array_end",
	+      ArrayType::get(PtrTy, 0), [&]() {
	+        return new GlobalVariable(
	+            M, ArrayType::get(PtrTy, 0),
	+            /*isConstant=*/true, GlobalValue::ExternalLinkage,
	+            /*Initializer=*/nullptr,
	+            IsCtor ? "__init_array_end" : "__fini_array_end",
	+            /*InsertBefore=*/nullptr, GlobalVariable::NotThreadLocal,
	+            /*AddressSpace=*/1);
	+      });
	+
	+  // The constructor type is supposed to allow using the argument vectors, but
	+  // for now we just call them with no arguments.
	+  auto *CallBackTy = FunctionType::get(IRB.getVoidTy(), {});
	+
	+  IRB.CreateCondBr(IRB.CreateICmpNE(Begin, End), LoopBB, ExitBB);
	+  IRB.SetInsertPoint(LoopBB);
	+  auto *CallBackPHI = IRB.CreatePHI(PtrTy, 2, "ptr");
	+  auto *CallBack = IRB.CreateLoad(CallBackTy->getPointerTo(F.getAddressSpace()),
	+                                  CallBackPHI, "callback");
	+  IRB.CreateCall(CallBackTy, CallBack);
	+  auto *NewCallBack = IRB.CreateInBoundsGEP(
	+      PtrTy, CallBackPHI, ConstantInt::get(IRB.getInt64Ty(), 1), "next");
	+  auto *EndCmp = IRB.CreateICmpEQ(NewCallBack, End, "end");
	+  CallBackPHI->addIncoming(Begin, &F.getEntryBlock());
	+  CallBackPHI->addIncoming(NewCallBack, LoopBB);
	+  IRB.CreateCondBr(EndCmp, ExitBB, LoopBB);
	+  IRB.SetInsertPoint(ExitBB);
	+  IRB.CreateRetVoid();
	+}
	+
	 static bool createInitOrFiniKernel(Module &M, StringRef GlobalName,
	                                    bool IsCtor) {
	   GlobalVariable *GV = M.getGlobalVariable(GlobalName);
	   if (!GV || !GV->hasInitializer())
	     return false;
	   ConstantArray *GA = dyn_cast<ConstantArray>(GV->getInitializer());
	   if (!GA || GA->getNumOperands() == 0)
	     return false;
	 
	   Function *InitOrFiniKernel = createInitOrFiniKernelFunction(M, IsCtor);
	-  IRBuilder<> IRB(InitOrFiniKernel->getEntryBlock().getTerminator());
	-
	-  FunctionType *ConstructorTy = InitOrFiniKernel->getFunctionType();
	-
	-  for (Value *V : GA->operands()) {
	-    auto *CS = cast<ConstantStruct>(V);
	-    bool AlreadyRegistered =
	-        llvm::any_of(CS->getOperand(1)->uses(), [=](Use &U) {
	-          if (auto *CB = dyn_cast<CallBase>(U.getUser()))
	-            if (CB->getCaller() == InitOrFiniKernel)
	-              return true;
	-          return false;
	-        });
	-    if (!AlreadyRegistered)
	-      IRB.CreateCall(ConstructorTy, CS->getOperand(1));
	-  }
	+  if (!InitOrFiniKernel)
	+    return false;
	+
	+  createInitOrFiniCalls(*InitOrFiniKernel, IsCtor);
	 
	   appendToUsed(M, {InitOrFiniKernel});
	   return true;
	 }
	 
	 static bool lowerCtorsAndDtors(Module &M) {
	   bool Modified = false;
	   Modified |= createInitOrFiniKernel(M, "llvm.global_ctors", /*IsCtor =*/true);
	... (28 unchanged lines not shown)
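The IR that `createInitOrFiniCalls` builds above is a guarded do-while loop: the `entry` block branches straight to `while.end` when the start and end symbols are equal, and `while.entry` calls through the `ptr` PHI until the advanced pointer reaches the end symbol. The hand-written C++ below is an illustrative equivalent of that CFG, not code from the patch; `runCallbacks`, `CallCount`, and `countCall` are hypothetical names.

```cpp
using Callback = void (*)();

// Hand-written equivalent of the emitted CFG (illustrative only).
void runCallbacks(const Callback *Begin, const Callback *End) {
  // entry: br i1 icmp ne (Begin, End), label %while.entry, label %while.end
  // The body below runs before its exit test, so an empty array must be
  // filtered out here to avoid loading past the end.
  if (Begin == End)
    return; // while.end: ret void
  // while.entry: %ptr = phi [Begin, %entry], [%next, %while.entry]
  const Callback *Ptr = Begin;
  do {
    Callback CB = *Ptr; // %callback = load ptr, ptr addrspace(1) %ptr
    CB();               // call void %callback()
    ++Ptr;              // %next = getelementptr inbounds ..., i64 1
  } while (Ptr != End); // %end = icmp eq %next, End; exit or loop again
}

// Illustrative callback that counts how many times it ran.
int CallCount = 0;
void countCall() { ++CallCount; }
```

Placing the comparison at the bottom of the loop means each iteration needs only one compare and branch, at the cost of the separate empty-array guard in `entry`.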

llvm/test/CodeGen/AMDGPU/lower-ctor-dtor-constexpr-alias.ll

	 ; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-lower-ctor-dtor %s | FileCheck %s
	-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck -check-prefix=GCN %s
	 
	 ; Make sure we emit code for constructor entries that aren't direct
	 ; function calls.
	 
	 ; Check a constructor that's an alias, and an integer literal.
	 @llvm.global_ctors = appending addrspace(1) global [2 x { i32, ptr, ptr }] [
	   { i32, ptr, ptr } { i32 1, ptr @foo.alias, i8* null },
	   { i32, ptr, ptr } { i32 1, ptr inttoptr (i64 4096 to ptr), i8* null }
	 ]
	 
	 ; Check a constantexpr addrspacecast
	 @llvm.global_dtors = appending addrspace(1) global [1 x { i32, ptr, ptr }] [
	   { i32, ptr, ptr } { i32 1, ptr addrspacecast (ptr addrspace(1) @bar to ptr), i8* null }
	 ]
	 
	 @foo.alias = hidden alias void (), ptr @foo
	 
	 ;.
	+; CHECK: @__init_array_start = external addrspace(1) constant [0 x ptr addrspace(1)]
	+; CHECK: @__init_array_end = external addrspace(1) constant [0 x ptr addrspace(1)]
	+; CHECK: @__fini_array_start = external addrspace(1) constant [0 x ptr addrspace(1)]
	+; CHECK: @__fini_array_end = external addrspace(1) constant [0 x ptr addrspace(1)]
	 ; CHECK: @llvm.used = appending global [2 x ptr] [ptr @amdgcn.device.init, ptr @amdgcn.device.fini], section "llvm.metadata"
	 ; CHECK: @foo.alias = hidden alias void (), ptr @foo
	 ;.
	 define void @foo() {
	 ; CHECK-LABEL: @foo(
	 ; CHECK-NEXT: ret void
	 ;
	   ret void
	 }
	 
	 define void @bar() addrspace(1) {
	 ; CHECK-LABEL: @bar(
	 ; CHECK-NEXT: ret void
	 ;
	   ret void
	 }
	 
	-; CHECK: define amdgpu_kernel void @amdgcn.device.init() #[[ATTR0:[0-9]+]] {
	-; CHECK-NEXT: call void @foo.alias()
	-; CHECK-NEXT: call void inttoptr (i64 4096 to ptr)()
	+; CHECK-LABEL: define weak_odr amdgpu_kernel void @amdgcn.device.init()
	+; CHECK-NEXT: entry:
	+; CHECK-NEXT: br i1 icmp ne (ptr addrspace(1) @__init_array_start, ptr addrspace(1) @__init_array_end), label [[WHILE_ENTRY:%.*]], label [[WHILE_END:%.*]]
	+; CHECK: while.entry:
	+; CHECK-NEXT: [[PTR:%.*]] = phi ptr addrspace(1) [ @__init_array_start, [[ENTRY:%.*]] ], [ [[NEXT:%.*]], [[WHILE_ENTRY]] ]
	+; CHECK-NEXT: [[CALLBACK:%.*]] = load ptr, ptr addrspace(1) [[PTR]], align 8
	+; CHECK-NEXT: call void [[CALLBACK]]()
	+; CHECK-NEXT: [[NEXT]] = getelementptr inbounds ptr addrspace(1), ptr addrspace(1) [[PTR]], i64 1
	+; CHECK-NEXT: [[END:%.*]] = icmp eq ptr addrspace(1) [[NEXT]], @__init_array_end
	+; CHECK-NEXT: br i1 [[END]], label [[WHILE_END]], label [[WHILE_ENTRY]]
	+; CHECK: while.end:
	 ; CHECK-NEXT: ret void
	-; CHECK-NEXT: }
	 
	-; CHECK: define amdgpu_kernel void @amdgcn.device.fini() #[[ATTR1:[0-9]+]] {
	-; CHECK-NEXT: call void addrspacecast (ptr addrspace(1) @bar to ptr)()
	+; CHECK-LABEL: define weak_odr amdgpu_kernel void @amdgcn.device.fini()
	+; CHECK-NEXT: entry:
	+; CHECK-NEXT: br i1 icmp ne (ptr addrspace(1) @__fini_array_start, ptr addrspace(1) @__fini_array_end), label [[WHILE_ENTRY:%.*]], label [[WHILE_END:%.*]]
	+; CHECK: while.entry:
	+; CHECK-NEXT: [[PTR:%.*]] = phi ptr addrspace(1) [ @__fini_array_start, [[ENTRY:%.*]] ], [ [[NEXT:%.*]], [[WHILE_ENTRY]] ]
	+; CHECK-NEXT: [[CALLBACK:%.*]] = load ptr, ptr addrspace(1) [[PTR]], align 8
	+; CHECK-NEXT: call void [[CALLBACK]]()
	+; CHECK-NEXT: [[NEXT]] = getelementptr inbounds ptr addrspace(1), ptr addrspace(1) [[PTR]], i64 1
	+; CHECK-NEXT: [[END:%.*]] = icmp eq ptr addrspace(1) [[NEXT]], @__fini_array_end
	+; CHECK-NEXT: br i1 [[END]], label [[WHILE_END]], label [[WHILE_ENTRY]]
	+; CHECK: while.end:
	 ; CHECK-NEXT: ret void
	-; CHECK-NEXT: }
	 
	-;.
	-; CHECK: attributes #[[ATTR0]] = { "device-init" }
	-; CHECK: attributes #[[ATTR1]] = { "device-fini" }
	-
	-
	-; GCN-LABEL: foo:
	-; GCN: ; %bb.0:
	-; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	-; GCN-NEXT: s_setpc_b64 s[30:31]
	-;
	-; GCN-LABEL: bar:
	-; GCN: ; %bb.0:
	-; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	-; GCN-NEXT: s_setpc_b64 s[30:31]
	-;
	-; GCN-LABEL: amdgcn.device.init:
	-; GCN: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
	-; GCN-NEXT: s_add_u32 s[[PC_LO]], s[[PC_LO]], foo.alias@rel32@lo+4
	-; GCN-NEXT: s_addc_u32 s[[PC_HI]], s[[PC_HI]], foo.alias@rel32@hi+12
	-; GCN-NEXT: s_swappc_b64 s[30:31], s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}
	-
	-; GCN: s_mov_b64 [[LIT_ADDR:s\[[0-9]+:[0-9]+\]]], 0x1000
	-; GCN: s_swappc_b64 s[30:31], [[LIT_ADDR]]
	-; GCN-NEXT: s_endpgm
	-;
	-; GCN-LABEL: amdgcn.device.fini:
	-; GCN: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
	-; GCN-NEXT: s_add_u32 s[[PC_LO]], s[[PC_LO]], bar@gotpcrel32@lo+4
	-; GCN-NEXT: s_addc_u32 s[[PC_HI]], s[[PC_HI]], bar@gotpcrel32@hi+12
	-; GCN-NEXT: s_load_dwordx2 s{{\[}}[[GOT_LO:[0-9]+]]:[[GOT_HI:[0-9]+]]{{\]}}, s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}, 0x0
	-; GCN: s_swappc_b64 s[30:31], s{{\[}}[[GOT_LO]]:[[GOT_HI]]{{\]}}
	-; GCN-NEXT: s_endpgm
	+; CHECK: attributes #[[ATTR0:[0-9]+]] = { "device-init" }
	+; CHECK: attributes #[[ATTR1:[0-9]+]] = { "device-fini" }

llvm/test/CodeGen/AMDGPU/lower-ctor-dtor-existing.ll

	 ; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-ctor-dtor < %s | FileCheck %s
	 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -filetype=obj -o - < %s | llvm-readelf -s - 2>&1 | FileCheck %s -check-prefix=CHECK-VIS
	 
	-; Make sure there's no crash or error if amdgcn.device.init or
	-; amdgcn.device.fini already exist.
	+; Make sure that we don't modify the functions if amdgcn.device.init or
	+; amdgcn.device.fini already exist.
	 
	 @llvm.global_ctors = appending addrspace(1) global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @foo, ptr null }]
	 @llvm.global_dtors = appending addrspace(1) global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @bar, ptr null }]
	 
	 ; CHECK-LABEL: amdgpu_kernel void @amdgcn.device.init() #0 {
	 ; CHECK-NEXT: store volatile i32 1, ptr addrspace(1) null
	-; CHECK-NEXT: call void @foo()
	 ; CHECK-NEXT: ret void
	 ; CHECK-NEXT: }
	 
	 ; CHECK-LABEL: define amdgpu_kernel void @amdgcn.device.fini() #1 {
	 ; CHECK-NEXT: store volatile i32 0, ptr addrspace(1) null
	-; CHECK-NEXT: call void @bar()
	 ; CHECK-NEXT: ret void
	 ; CHECK-NEXT: }
	 
	 ; CHECK-NOT: amdgcn.device.
	 
	 ; CHECK-VIS: FUNC GLOBAL PROTECTED {{.*}} amdgcn.device.init{{$}}
	 ; CHECK-VIS: OBJECT GLOBAL DEFAULT {{.*}} amdgcn.device.init.kd{{$}}
	 ; CHECK-VIS: FUNC GLOBAL PROTECTED {{.*}} amdgcn.device.fini{{$}}
	... (24 unchanged lines not shown)
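The updated expectation in this test follows from the new early exit in `createInitOrFiniKernelFunction`: when a function with the reserved name already exists, the pass returns `nullptr` and leaves it untouched instead of appending calls to it. The same early exit is why the double-run RUN line in lower-ctor-dtor.ll produces identical output. The sketch below models only that contract; `ToyModule` and `createKernelIfAbsent` are hypothetical stand-ins, not LLVM APIs.

```cpp
#include <set>
#include <string>

// Toy stand-in for the set of function names in an llvm::Module (hypothetical).
struct ToyModule {
  std::set<std::string> Functions;
};

// Mirrors the early exit: if the reserved kernel name is already taken,
// create nothing and report failure (the real code returns nullptr, and
// createInitOrFiniKernel then returns false without touching llvm.used).
bool createKernelIfAbsent(ToyModule &M, bool IsCtor) {
  const std::string Name =
      IsCtor ? "amdgcn.device.init" : "amdgcn.device.fini";
  if (M.Functions.count(Name) != 0)
    return false; // leave the existing definition exactly as it is
  M.Functions.insert(Name);
  return true;
}
```

Running it twice with the same `IsCtor` creates the kernel once and then reports that nothing changed, which is the idempotence the tests rely on.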

llvm/test/CodeGen/AMDGPU/lower-ctor-dtor.ll

	 ; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-ctor-dtor < %s | FileCheck %s
	 ; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-ctor-dtor < %s | FileCheck %s
	 
	 ; Make sure we get the same result if we run multiple times
	 ; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-ctor-dtor,amdgpu-lower-ctor-dtor < %s | FileCheck %s
	 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -filetype=obj -o - < %s | llvm-readelf -s - 2>&1 | FileCheck %s -check-prefix=VISIBILITY
	 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -filetype=obj -o - < %s | llvm-readelf -S - 2>&1 | FileCheck %s -check-prefix=SECTION
	 
	 @llvm.global_ctors = appending addrspace(1) global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @foo, ptr null }]
	 @llvm.global_dtors = appending addrspace(1) global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @bar, ptr null }]
	 
	+; CHECK: @__init_array_start = external addrspace(1) constant [0 x ptr addrspace(1)]
	+; CHECK: @__init_array_end = external addrspace(1) constant [0 x ptr addrspace(1)]
	+; CHECK: @__fini_array_start = external addrspace(1) constant [0 x ptr addrspace(1)]
	+; CHECK: @__fini_array_end = external addrspace(1) constant [0 x ptr addrspace(1)]
	 ; CHECK: @llvm.used = appending global [2 x ptr] [ptr @amdgcn.device.init, ptr @amdgcn.device.fini]
	 
	-; CHECK-LABEL: amdgpu_kernel void @amdgcn.device.init() #0
	-; CHECK-NEXT: call void @foo
	+; CHECK-LABEL: define weak_odr amdgpu_kernel void @amdgcn.device.init() #0
	+; CHECK-NEXT: entry:
	+; CHECK-NEXT: br i1 icmp ne (ptr addrspace(1) @__init_array_start, ptr addrspace(1) @__init_array_end), label [[WHILE_ENTRY:%.*]], label [[WHILE_END:%.*]]
	+; CHECK: while.entry:
	+; CHECK-NEXT: [[PTR:%.*]] = phi ptr addrspace(1) [ @__init_array_start, [[ENTRY:%.*]] ], [ [[NEXT:%.*]], [[WHILE_ENTRY]] ]
	+; CHECK-NEXT: [[CALLBACK:%.*]] = load ptr, ptr addrspace(1) [[PTR]], align 8
	+; CHECK-NEXT: call void [[CALLBACK]]()
	+; CHECK-NEXT: [[NEXT]] = getelementptr inbounds ptr addrspace(1), ptr addrspace(1) [[PTR]], i64 1
	+; CHECK-NEXT: [[END:%.*]] = icmp eq ptr addrspace(1) [[NEXT]], @__init_array_end
	+; CHECK-NEXT: br i1 [[END]], label [[WHILE_END]], label [[WHILE_ENTRY]]
	+; CHECK: while.end:
	 ; CHECK-NEXT: ret void
	 
	-; CHECK-LABEL: amdgpu_kernel void @amdgcn.device.fini() #1
	-; CHECK-NEXT: call void @bar
	+; CHECK-LABEL: define weak_odr amdgpu_kernel void @amdgcn.device.fini() #1
	+; CHECK-NEXT: entry:
	+; CHECK-NEXT: br i1 icmp ne (ptr addrspace(1) @__fini_array_start, ptr addrspace(1) @__fini_array_end), label [[WHILE_ENTRY:%.*]], label [[WHILE_END:%.*]]
	+; CHECK: while.entry:
	+; CHECK-NEXT: [[PTR:%.*]] = phi ptr addrspace(1) [ @__fini_array_start, [[ENTRY:%.*]] ], [ [[NEXT:%.*]], [[WHILE_ENTRY]] ]
	+; CHECK-NEXT: [[CALLBACK:%.*]] = load ptr, ptr addrspace(1) [[PTR]], align 8
	+; CHECK-NEXT: call void [[CALLBACK]]()
	+; CHECK-NEXT: [[NEXT]] = getelementptr inbounds ptr addrspace(1), ptr addrspace(1) [[PTR]], i64 1
	+; CHECK-NEXT: [[END:%.*]] = icmp eq ptr addrspace(1) [[NEXT]], @__fini_array_end
	+; CHECK-NEXT: br i1 [[END]], label [[WHILE_END]], label [[WHILE_ENTRY]]
	+; CHECK: while.end:
	 ; CHECK-NEXT: ret void
	 
	 ; CHECK-NOT: amdgcn.device.
	 
	-; VISIBILITY: FUNC GLOBAL PROTECTED {{.*}} amdgcn.device.init
	-; VISIBILITY: OBJECT GLOBAL DEFAULT {{.*}} amdgcn.device.init.kd
	-; VISIBILITY: FUNC GLOBAL PROTECTED {{.*}} amdgcn.device.fini
	-; VISIBILITY: OBJECT GLOBAL DEFAULT {{.*}} amdgcn.device.fini.kd
	+; VISIBILITY: FUNC WEAK PROTECTED {{.*}} amdgcn.device.init
	+; VISIBILITY: OBJECT WEAK DEFAULT {{.*}} amdgcn.device.init.kd
	+; VISIBILITY: FUNC WEAK PROTECTED {{.*}} amdgcn.device.fini
	+; VISIBILITY: OBJECT WEAK DEFAULT {{.*}} amdgcn.device.fini.kd
	 ; SECTION: .init_array.1 INIT_ARRAY {{.*}} {{.*}} 000008 00 WA 0 0 8
	 ; SECTION: .fini_array.1 FINI_ARRAY {{.*}} {{.*}} 000008 00 WA 0 0 8
	 
	 define internal void @foo() {
	   ret void
	 }
	 
	 define internal void @bar() {
	   ret void
	 }
	 
	 ; CHECK: attributes #0 = { "device-init" }
	 ; CHECK: attributes #1 = { "device-fini" }
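The SECTION checks in this test expect the priority-1 constructor and destructor to land in `.init_array.1` and `.fini_array.1`, each 8 bytes long, i.e. one 64-bit pointer. The sketch below reproduces that naming and size arithmetic. It assumes the common ELF convention that priority N is placed in a section suffixed with `.N` while the default priority 65535 uses the unsuffixed name, and that the linker later concatenates these sections in ascending priority order; `structorSectionName` and `structorSectionSize` are hypothetical helpers, not LLVM functions.

```cpp
#include <cstddef>
#include <string>

// Assumed ELF naming convention for constructor / destructor sections: the
// default priority (65535) uses the plain name, others get a ".<N>" suffix.
std::string structorSectionName(bool IsCtor, unsigned Priority) {
  std::string Name = IsCtor ? ".init_array" : ".fini_array";
  if (Priority != 65535)
    Name += "." + std::to_string(Priority);
  return Name;
}

// Each entry is one function pointer; on amdgcn-amd-amdhsa that is 8 bytes.
std::size_t structorSectionSize(std::size_t NumEntries) {
  constexpr std::size_t PointerSize = 8;
  return NumEntries * PointerSize;
}
```

With one entry at priority 1, this yields `.init_array.1` with size `0x8`, matching the `000008` size column checked above.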

llvm/test/CodeGen/AMDGPU/lower-multiple-ctor-dtor.ll

	 ; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-ctor-dtor < %s | FileCheck %s
	 ; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx700 -filetype=obj -o - < %s | llvm-readelf -s - 2>&1 | FileCheck %s -check-prefix=CHECK-VIS
	 
	 @llvm.global_ctors = appending addrspace(1) global [2 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @foo, ptr null }, { i32, ptr, ptr } { i32 1, ptr @foo.5, ptr null }]
	 @llvm.global_dtors = appending addrspace(1) global [2 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @bar, ptr null }, { i32, ptr, ptr } { i32 1, ptr @bar.5, ptr null }]
	 
	-; CHECK-LABEL: amdgpu_kernel void @amdgcn.device.init() #0
	-; CHECK-NEXT: call void @foo
	-; CHECK-NEXT: call void @foo.5
	-
	-; CHECK-LABEL: amdgpu_kernel void @amdgcn.device.fini() #1
	-; CHECK-NEXT: call void @bar
	-; CHECK-NEXT: call void @bar.5
	-
	-; CHECK-VIS: FUNC GLOBAL PROTECTED {{.*}} amdgcn.device.init
	-; CHECK-VIS: OBJECT GLOBAL DEFAULT {{.*}} amdgcn.device.init.kd
	-; CHECK-VIS: FUNC GLOBAL PROTECTED {{.*}} amdgcn.device.fini
	-; CHECK-VIS: OBJECT GLOBAL DEFAULT {{.*}} amdgcn.device.fini.kd
	+; CHECK: @__init_array_start = external addrspace(1) constant [0 x ptr addrspace(1)]
	+; CHECK: @__init_array_end = external addrspace(1) constant [0 x ptr addrspace(1)]
	+; CHECK: @__fini_array_start = external addrspace(1) constant [0 x ptr addrspace(1)]
	+; CHECK: @__fini_array_end = external addrspace(1) constant [0 x ptr addrspace(1)]
	+; CHECK: @llvm.used = appending global [2 x ptr] [ptr @amdgcn.device.init, ptr @amdgcn.device.fini]
	+
	+; CHECK-LABEL: define weak_odr amdgpu_kernel void @amdgcn.device.init() #0
	+; CHECK-NEXT: entry:
	+; CHECK-NEXT: br i1 icmp ne (ptr addrspace(1) @__init_array_start, ptr addrspace(1) @__init_array_end), label [[WHILE_ENTRY:%.*]], label [[WHILE_END:%.*]]
	+; CHECK: while.entry:
	+; CHECK-NEXT: [[PTR:%.*]] = phi ptr addrspace(1) [ @__init_array_start, [[ENTRY:%.*]] ], [ [[NEXT:%.*]], [[WHILE_ENTRY]] ]
	+; CHECK-NEXT: [[CALLBACK:%.*]] = load ptr, ptr addrspace(1) [[PTR]], align 8
	+; CHECK-NEXT: call void [[CALLBACK]]()
	+; CHECK-NEXT: [[NEXT]] = getelementptr inbounds ptr addrspace(1), ptr addrspace(1) [[PTR]], i64 1
	+; CHECK-NEXT: [[END:%.*]] = icmp eq ptr addrspace(1) [[NEXT]], @__init_array_end
	+; CHECK-NEXT: br i1 [[END]], label [[WHILE_END]], label [[WHILE_ENTRY]]
	+; CHECK: while.end:
	+; CHECK-NEXT: ret void
	+
	+; CHECK-LABEL: define weak_odr amdgpu_kernel void @amdgcn.device.fini() #1
	+; CHECK-NEXT: entry:
	+; CHECK-NEXT: br i1 icmp ne (ptr addrspace(1) @__fini_array_start, ptr addrspace(1) @__fini_array_end), label [[WHILE_ENTRY:%.*]], label [[WHILE_END:%.*]]
	+; CHECK: while.entry:
	+; CHECK-NEXT: [[PTR:%.*]] = phi ptr addrspace(1) [ @__fini_array_start, [[ENTRY:%.*]] ], [ [[NEXT:%.*]], [[WHILE_ENTRY]] ]
	+; CHECK-NEXT: [[CALLBACK:%.*]] = load ptr, ptr addrspace(1) [[PTR]], align 8
	+; CHECK-NEXT: call void [[CALLBACK]]()
	+; CHECK-NEXT: [[NEXT]] = getelementptr inbounds ptr addrspace(1), ptr addrspace(1) [[PTR]], i64 1
	+; CHECK-NEXT: [[END:%.*]] = icmp eq ptr addrspace(1) [[NEXT]], @__fini_array_end
	+; CHECK-NEXT: br i1 [[END]], label [[WHILE_END]], label [[WHILE_ENTRY]]
	+; CHECK: while.end:
	+; CHECK-NEXT: ret void
	+
	+; CHECK-VIS: FUNC WEAK PROTECTED {{.*}} amdgcn.device.init
	+; CHECK-VIS: OBJECT WEAK DEFAULT {{.*}} amdgcn.device.init.kd
	+; CHECK-VIS: FUNC WEAK PROTECTED {{.*}} amdgcn.device.fini
	+; CHECK-VIS: OBJECT WEAK DEFAULT {{.*}} amdgcn.device.fini.kd
	 
	 define internal void @foo() {
	   ret void
	 }
	 
	 define internal void @bar() {
	   ret void
	 }
	 
	 define internal void @foo.5() {
	   ret void
	 }
	 
	 define internal void @bar.5() {
	   ret void
	 }
	 
	 ; CHECK: attributes #0 = { "device-init" }
	 ; CHECK: attributes #1 = { "device-fini" }
	No newline at end of file
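The switch from `GLOBAL` to `WEAK` in the VISIBILITY and CHECK-VIS lines is what the Summary means by ODR semantics: when several linked objects each carry an `amdgcn.device.init`, the linker keeps one copy instead of reporting a duplicate definition, which is what the old externally linked kernel would have caused. The toy symbol table below illustrates the simplified ELF resolution rules involved. It is a model written for this page, not lld's implementation, and `ToySymbolTable`, `Binding`, and `Definition` are hypothetical names.

```cpp
#include <map>
#include <stdexcept>
#include <string>

enum class Binding { Global, Weak };

struct Definition {
  Binding Bind;
  std::string Object; // input object that supplied this definition
};

// Simplified ELF rules: the first weak definition wins among weak ones, a
// global definition overrides a weak one, and two globals are an error.
class ToySymbolTable {
public:
  void add(const std::string &Name, const Definition &D) {
    auto It = Syms.find(Name);
    if (It == Syms.end()) {
      Syms.emplace(Name, D); // first definition seen for this name
      return;
    }
    Definition &Cur = It->second;
    if (D.Bind == Binding::Weak)
      return; // an existing definition always beats a later weak one
    if (Cur.Bind == Binding::Global)
      throw std::runtime_error("duplicate symbol: " + Name);
    Cur = D; // a global definition replaces an earlier weak one
  }

  const Definition &lookup(const std::string &Name) const {
    return Syms.at(Name);
  }

private:
  std::map<std::string, Definition> Syms;
};
```

Because every copy of the kernel is generated identically (it only walks the linker-provided array bounds), keeping whichever weak copy comes first is safe, which is the property the `weak_odr` linkage promises.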