This is an archive of the discontinued LLVM Phabricator instance.

[amdgpu][lds] Use amdgpu-lds-size instead of llvm.donothing
Needs Review · Public

Authored by JonChesterfield on Jul 13 2023, 3:37 PM.


Details

Reviewers

arsenm
jmmartinez

Summary

LDS objects allocated in kernels by LowerModuleLDS were marked with llvm.donothing
to ensure they had a use from the kernel, as opposed to only from called functions.

The backend does not need that explicit use after D155190. PromoteAlloca still relies on
that use to estimate how much LDS is already allocated to other variables. Changing
PromoteAlloca to read the attribute directly, when it is present, is quicker and more
accurate. It also removes the last requirement for the llvm.donothing calls on static LDS.

llvm.donothing is still used for dynamic LDS alignment after this patch.
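A minimal, self-contained C++ sketch of the estimate this patch gives PromoteAlloca, assuming the kernel's attribute and its LDS globals have already been reduced to plain values: a non-zero `amdgpu-lds-size` value is used as-is, otherwise the existing sort-and-pad estimate runs. `estimateLocalMemUsage` and `alignUp` are illustrative names, not LLVM APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Round Value up to a multiple of Align, like llvm::alignTo.
static uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "alignment must be non-zero");
  return (Value + Align - 1) / Align * Align;
}

// Estimate the LDS already in use by a kernel.
//   LdsSizeAttr: parsed "amdgpu-lds-size" value, if the attribute exists.
//   Allocated:   (size, alignment) of each LDS global found by the search.
uint64_t estimateLocalMemUsage(
    std::optional<uint64_t> LdsSizeAttr,
    std::vector<std::pair<uint64_t, uint64_t>> Allocated) {
  // Prefer the attribute; a missing attribute is treated as 0, mirroring
  // F.getFnAttributeAsParsedInteger("amdgpu-lds-size", 0) in the patch.
  uint64_t Usage = LdsSizeAttr.value_or(0);
  if (Usage != 0)
    return Usage;

  // Fallback: sort by ascending alignment, then sum sizes with padding.
  std::sort(Allocated.begin(), Allocated.end(),
            [](const auto &A, const auto &B) { return A.second < B.second; });
  for (const auto &[Size, Align] : Allocated) {
    Usage = alignUp(Usage, Align);
    Usage += Size;
  }
  return Usage;
}
```

For example, a 1-byte object with alignment 1 and a 16-byte object with alignment 16 are estimated at 32 bytes when the attribute is absent; if the attribute were set to a hypothetical 17, that value would be returned unchanged.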

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

JonChesterfield created this revision. Jul 13 2023, 3:37 PM

Herald added a project: Restricted Project. Jul 13 2023, 3:37 PM

Herald added subscribers: foad, kerbowa, mgrang and 6 others.

JonChesterfield requested review of this revision. Jul 13 2023, 3:37 PM

Herald added a project: Restricted Project. Jul 13 2023, 3:37 PM

Herald added subscribers: llvm-commits, wdng.

JonChesterfield edited the summary of this revision. Jul 13 2023, 3:45 PM

The "amdgpu-lds-size" attributes in tests will mostly disappear on rebase after landing D155190.

llvm/test/CodeGen/AMDGPU/lower-module-lds-indirect-extern-uses-max-reachable-alignment.ll
Line 183: ^ this is curious; it looks like llvm.donothing affects whether a function is considered speculatable.

JonChesterfield added inline comments. Jul 13 2023, 3:51 PM

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
Line 904: Most of this iteration and search for LDS could be dropped for kernels that are transformed by LowerModuleLDSPass if that pass added an attribute for dynamic LDS alignment alongside amdgpu-lds-size.
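The comment above suggests most of the LDS search could be skipped if LowerModuleLDSPass recorded the dynamic LDS alignment as an attribute next to `amdgpu-lds-size`. A minimal sketch of that decision, assuming such an attribute existed (it is hypothetical here, as are `KernelAttrs`, `LDSQuery` and `classifyKernel`) and assuming the pass would set it on every kernel that can reach dynamic LDS:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical per-kernel attributes:
//   LdsSize     models "amdgpu-lds-size" (set by LowerModuleLDSPass).
//   DynLdsAlign models a dynamic-LDS alignment attribute that the pass
//               does NOT emit today; it is assumed here for illustration.
struct KernelAttrs {
  std::optional<uint64_t> LdsSize;
  std::optional<uint64_t> DynLdsAlign;
};

enum class LDSQuery {
  NeedGlobalSearch, // not lowered by the pass: crawl globals as before
  SkipPromotion,    // kernel can reach dynamic LDS: do not append more LDS
  UseStaticSize,    // static LDS size is known from the attribute
};

// Assumes the pass would set DynLdsAlign on every kernel that can reach
// dynamic LDS, so its absence on a lowered kernel means "none reachable".
LDSQuery classifyKernel(const KernelAttrs &K) {
  if (!K.LdsSize)
    return LDSQuery::NeedGlobalSearch;
  if (K.DynLdsAlign) {
    assert(*K.DynLdsAlign != 0 && "alignment must be non-zero");
    return LDSQuery::SkipPromotion;
  }
  return LDSQuery::UseStaticSize;
}
```

Only kernels that the lowering pass never touched would still need the global crawl under this assumption.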

arsenm added inline comments. Jul 13 2023, 3:57 PM

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
Line 1060: I thought it was weird only having a size attribute. But also, how does this help if the size is always rounded by the maximum dynamic alignment?
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
Line 955: Is there now a path to introducing LDS outside of a kernel? I'm slightly nervous about assuming amdgpu-lds-size is a source of truth in case something else introduces new LDS globals, such as this pass, which does not update the value.

arsenm added inline comments. Jul 13 2023, 3:59 PM

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
Line 955: If it already pre-packed the globals into a single big one, won't the search find that single global and work fine without this?

JonChesterfield added inline comments. Jul 13 2023, 4:13 PM

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
Line 1060: Two things. The PromoteAlloca pass skips kernels that use dynamic LDS, which it detects by crawling globals. This use ensures it notices when a kernel uses dynamic LDS only in some called function. An alignment attribute would give the same information without the search. Second, if something wants to append to the static frame repeatedly, tracking the dynamic alignment separately (i.e. the alignment of the end of the frame) seems sensible. I haven't fully thought this one through yet; it might suffice to have a boolean do-not-append-more-LDS flag per kernel.
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
Line 955: PromoteAlloca sometimes adds a global accessed by a kernel, and it could be extended to work on non-entry functions with some effort. It is currently the only thing downstream of LowerModuleLDS that might add a global. amdgpu-frame-size is a reasonable contender for a source of truth: we can append to it and mark the new global with an absolute address to get something the backend handles correctly. The awkward case is dynamic LDS, which currently has to block appending. I think the lowering pass could be reworked to compose cleanly based on that attribute and an append-only model. I'm thinking of moving graphics to the same lowering as compute, and then making any LDS variable without an absolute address assigned before codegen a fatal error.
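A minimal sketch of the append-only model described above, under stated assumptions: `KernelLDSFrame` is a hypothetical type, its `FrameSize` stands in for the per-kernel size attribute used as the source of truth, and dynamic LDS is assumed to start at the static frame size rounded up to the largest dynamic alignment. It also illustrates why the dynamic alignment is worth tracking separately from the size: the size alone does not say where dynamic LDS will begin.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

static uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && "alignment must be non-zero");
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical append-only model of one kernel's LDS frame. FrameSize plays
// the role of the per-kernel size attribute treated as the source of truth.
struct KernelLDSFrame {
  uint64_t FrameSize = 0;
  bool HasDynamicLDS = false;

  // Append a static LDS object and return the absolute address it should be
  // pinned to, or std::nullopt when dynamic LDS blocks appending (dynamic
  // LDS is assumed to start right after the static frame).
  std::optional<uint64_t> append(uint64_t Size, uint64_t Align) {
    if (HasDynamicLDS)
      return std::nullopt;
    uint64_t Address = alignUp(FrameSize, Align);
    FrameSize = Address + Size; // keep the "attribute" in sync
    return Address;
  }

  // Assumed start of dynamic LDS: the static frame size rounded up to the
  // largest alignment required by any reachable dynamic LDS variable.
  uint64_t dynamicLDSBase(uint64_t MaxDynamicAlign) const {
    return alignUp(FrameSize, MaxDynamicAlign);
  }
};
```

For example, appending an 8-byte, 8-aligned object to a 4-byte frame places it at address 8 and grows the frame to 16; with a maximum dynamic alignment of 32, dynamic LDS would then start at 32.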

JonChesterfield added inline comments. Jul 13 2023, 4:15 PM

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
Line 955: It misses the big struct if the kernel doesn't use it directly. This pass crawls the globals but not the call graph, hence the llvm.donothing hack.
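The gap described above, where a globals-only crawl misses LDS reached only through calls, can be illustrated with a toy call graph. This is a hypothetical model (`ToyFunction`, `directLDS` and `reachableLDS` are illustrative, not LLVM APIs): a kernel that reaches `llvm.amdgcn.module.lds` only through a callee has no direct use, so a direct-use check misses it unless something like the `llvm.donothing` call adds one.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a module: per function, the LDS variables it references
// directly and the functions it calls. Purely illustrative.
struct ToyFunction {
  std::set<std::string> DirectLDS;
  std::vector<std::string> Callees;
};
using ToyModule = std::map<std::string, ToyFunction>;

// Roughly what a globals-only crawl can attribute to a kernel:
// LDS referenced from the kernel's own body.
std::set<std::string> directLDS(const ToyModule &M, const std::string &Kernel) {
  return M.at(Kernel).DirectLDS;
}

// LDS the kernel can actually reach, following calls transitively.
std::set<std::string> reachableLDS(const ToyModule &M,
                                   const std::string &Kernel) {
  assert(M.count(Kernel) && "unknown kernel");
  std::set<std::string> Result, Visited;
  std::vector<std::string> Worklist{Kernel};
  while (!Worklist.empty()) {
    std::string F = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(F).second)
      continue; // already processed (handles recursion)
    const ToyFunction &Fn = M.at(F);
    Result.insert(Fn.DirectLDS.begin(), Fn.DirectLDS.end());
    for (const std::string &Callee : Fn.Callees)
      Worklist.push_back(Callee);
  }
  return Result;
}
```

With a kernel that only calls `f`, and `f` referencing the module LDS struct, `directLDS` returns an empty set while `reachableLDS` returns a set containing only `llvm.amdgcn.module.lds`.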

rebase

reduce diff

reduce diff

Harbormaster completed remote builds in B245257: Diff 540232. Jul 13 2023, 8:05 PM

Approach in D155384 is better for PromoteAlloca.

Revision Contents

Path (Size)

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp (11 lines)

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp (32 lines)

llvm/test/CodeGen/AMDGPU/lower-module-lds-all-indirect-accesses.ll (2 lines)

llvm/test/CodeGen/AMDGPU/lower-module-lds-indirect-extern-uses-max-reachable-alignment.ll (12 lines)

llvm/test/CodeGen/AMDGPU/lower-module-lds-offsets.ll (2 lines)

llvm/test/CodeGen/AMDGPU/lower-module-lds-single-var-ambiguous.ll (4 lines)

llvm/test/CodeGen/AMDGPU/lower-module-lds-single-var-unambiguous.ll (22 lines)

llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll (29 lines)

llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll (11 lines)

llvm/test/CodeGen/AMDGPU/lower-module-lds.ll (4 lines)

Diff 540232

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp

@@ ... @@ for (Function &Func : M.functions()) {
 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
 Instruction *I = dyn_cast<Instruction>(U.getUser());
 if (!I) {
 return false;
 }
 Function *F = I->getFunction();
 return F == &Func;
 });

-markUsedByKernel(Builder, &Func, ModuleScopeReplacement.SGV);
 }
 }

 return ModuleScopeReplacement.SGV;
 }

 static DenseMap<Function *, LDSVariableReplacement>
 lowerKernelScopeStructVariables(
@@ ... @@ for (Function &Func : M.functions()) {
 }

 std::string VarName =
 (Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();

 auto Replacement =
 createLDSVariableReplacement(M, VarName, KernelUsedVariables);

-// If any indirect uses, create a direct use to ensure allocation
-// TODO: Simpler to unconditionally mark used but that regresses
-// codegen in test/CodeGen/AMDGPU/noclobber-barrier.ll
-auto Accesses = LDSUsesInfo.indirect_access.find(&Func);
-if ((Accesses != LDSUsesInfo.indirect_access.end()) &&
-!Accesses->second.empty())
-markUsedByKernel(Builder, &Func, Replacement.SGV);

 // remove preserves existing codegen
 removeLocalVarsFromUsedLists(M, KernelUsedVariables);
 KernelToReplacement[&Func] = Replacement;

 // Rewrite uses within kernel to the new struct
 replaceLDSVariablesWithStruct(
 M, KernelUsedVariables, Replacement, [&Func](Use &U) {
 Instruction *I = dyn_cast<Instruction>(U.getUser());
@@ ... @@ if (!KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
 report_fatal_error("Anonymous kernels cannot use LDS variables");
 }

 GlobalVariable *N =
 buildRepresentativeDynamicLDSInstance(M, LDSUsesInfo, func);

 KernelToCreatedDynamicLDS[func] = N;

+// Could replace this with a dynamic LDS alignment attribute
 markUsedByKernel(Builder, func, N);

 auto emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
 auto GEP = ConstantExpr::getGetElementPtr(
 emptyCharArray, N, ConstantInt::get(I32, 0), true);
 newDynamicLDS.push_back(ConstantExpr::getPtrToInt(GEP, I32));
 } else {
 newDynamicLDS.push_back(PoisonValue::get(I32));

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp

@@ ... @@ for (const User *U : Val->users()) {
 if (VisitedConstants.insert(C).second)
 Stack.push_back(C);
 }
 }

 return false;
 };

 for (GlobalVariable &GV : Mod->globals()) {
 if (GV.getAddressSpace() != AMDGPUAS::LOCAL_ADDRESS)
 continue;

 if (visitUsers(&GV, &GV)) {
 UsedLDS.insert(&GV);
 Stack.clear();
 continue;
 }
@@ ... @@ if (GV->hasExternalLinkage() && AllocSize == 0) {
 "local memory. Promoting to local memory "
 "disabled.\n");
 return false;
 }

 AllocatedSizes.emplace_back(AllocSize, Alignment);
 }

+// Check how much local memory is being used by global objects
+CurrentLocalMemUsage = 0;
+
+// If the kernel has an amdgpu-lds-size attribute, use that value instead of
+// estimating.
+CurrentLocalMemUsage = F.getFnAttributeAsParsedInteger("amdgpu-lds-size", 0);
+
+if (CurrentLocalMemUsage == 0) {
 // Sort to try to estimate the worst case alignment padding
 //
 // FIXME: We should really do something to fix the addresses to a more optimal
 // value instead
 llvm::sort(AllocatedSizes, llvm::less_second());

-// Check how much local memory is being used by global objects
-CurrentLocalMemUsage = 0;

 // FIXME: Try to account for padding here. The real padding and address is
 // currently determined from the inverse order of uses in the function when
 // legalizing, which could also potentially change. We try to estimate the
 // worst case here, but we probably should fix the addresses earlier.
 for (auto Alloc : AllocatedSizes) {
 CurrentLocalMemUsage = alignTo(CurrentLocalMemUsage, Alloc.second);
 CurrentLocalMemUsage += Alloc.first;
 }
+}

 unsigned MaxOccupancy =
 ST.getOccupancyWithLocalMemSize(CurrentLocalMemUsage, F);

 // Restrict local memory usage so that we don't drastically reduce occupancy,
 // unless it is already significantly reduced.

 // TODO: Have some sort of hint or other heuristics to guess occupancy based

llvm/test/CodeGen/AMDGPU/lower-module-lds-all-indirect-accesses.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
 ; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=hybrid < %s | FileCheck %s

 ;; Reduced from a larger test case. Checks that functions and kernels that use only dynamic lds
 ;; are lowered successfully. Previously they only worked if the kernel happened to also use static lds
 ;; variables. Artefact of implementing dynamic variables by adapting existing code for static.

 @A = external addrspace(3) global [8 x ptr]
 @B = external addrspace(3) global [0 x i32]

 define amdgpu_kernel void @kernel_0() {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_0() #0 !llvm.amdgcn.lds.kernel.id !1 {
-; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.kernel_0.lds) ]
 ; CHECK-NEXT: call void @call_store_A()
 ; CHECK-NEXT: ret void
 ;
 call void @call_store_A()
 ret void
 }

 define amdgpu_kernel void @kernel_1() {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_1() !llvm.amdgcn.lds.kernel.id !2 {
 ; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel_1.dynlds) ]
 ; CHECK-NEXT: [[PTR:%.*]] = call ptr @get_B_ptr()
 ; CHECK-NEXT: ret void
 ;
 %ptr = call ptr @get_B_ptr()
 ret void
 }

 define amdgpu_kernel void @kernel_2() {
 ; CHECK-LABEL: define amdgpu_kernel void @kernel_2() #0 !llvm.amdgcn.lds.kernel.id !3 {
-; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.kernel_2.lds) ]
 ; CHECK-NEXT: call void @store_A()
 ; CHECK-NEXT: ret void
 ;
 call void @store_A()
 ret void
 }

 define amdgpu_kernel void @kernel_3() {

llvm/test/CodeGen/AMDGPU/lower-module-lds-indirect-extern-uses-max-reachable-alignment.ll

@@ ... @@
 ; CHECK-NEXT: ret void
 ;
 %arrayidx = getelementptr inbounds [0 x double], ptr addrspace(3) @dynamic_kernel_only, i32 0, i32 0
 store double 3.140000e+00, ptr addrspace(3) %arrayidx
 ret void
 }

 ; The accesses from functions are rewritten to go through the llvm.amdgcn.dynlds.offset.table
-define void @use_shared1() {
-; CHECK-LABEL: @use_shared1() {
+define void @use_shared1() #0 {
+; CHECK-LABEL: @use_shared1() #0 {
 ; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.lds.kernel.id()
 ; CHECK-NEXT: [[DYNAMIC_SHARED1:%.*]] = getelementptr inbounds [5 x i32], ptr addrspace(4) @llvm.amdgcn.dynlds.offset.table, i32 0, i32 [[TMP1]]
 ; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr addrspace(4) [[DYNAMIC_SHARED1]], align 4
 ; CHECK-NEXT: [[DYNAMIC_SHARED11:%.*]] = inttoptr i32 [[TMP2]] to ptr addrspace(3)
 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i8], ptr addrspace(3) [[DYNAMIC_SHARED11]], i32 0, i32 1
 ; CHECK-NEXT: store i8 0, ptr addrspace(3) [[ARRAYIDX]], align 1
 ; CHECK-NEXT: ret void
 ;
@@ ... @@
 ; CHECK-NEXT: call void @use_shared2()
 ; CHECK-NEXT: ret void
 ;
 call void @use_shared2()
 ret void
 }

 define amdgpu_kernel void @expect_align4() {
-; CHECK-LABEL: @expect_align4() #1 !llvm.amdgcn.lds.kernel.id !4 {
+; CHECK-LABEL: @expect_align4() #1 !llvm.amdgcn.lds.kernel.id !4
 ; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.expect_align4.dynlds) ]
-; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
 ; CHECK-NEXT: call void @use_shared4()
 ; CHECK-NEXT: ret void
 ;
 call void @use_shared4()
 ret void
 }

 ; Use dynamic_shared directly too.
 define amdgpu_kernel void @expect_align8() {
-; CHECK-LABEL: @expect_align8() !llvm.amdgcn.lds.kernel.id !5 {
+; CHECK-LABEL: @expect_align8() !llvm.amdgcn.lds.kernel.id !5
 ; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.expect_align8.dynlds) ]
 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [0 x i64], ptr addrspace(3) @dynamic_shared8, i32 0, i32 9
 ; CHECK-NEXT: store i64 3, ptr addrspace(3) [[ARRAYIDX]], align 4
 ; CHECK-NEXT: call void @use_shared8()
 ; CHECK-NEXT: ret void
 ;
 %arrayidx = getelementptr inbounds [0 x i64], ptr addrspace(3) @dynamic_shared8, i32 0, i32 9
 store i64 3, ptr addrspace(3) %arrayidx
 call void @use_shared8()
 ret void
 }

 ; Note: use_shared4 uses module.lds so this will allocate at offset 4
 define amdgpu_kernel void @expect_max_of_2_and_4() {
-; CHECK-LABEL: @expect_max_of_2_and_4() #1 !llvm.amdgcn.lds.kernel.id !6 {
+; CHECK-LABEL: @expect_max_of_2_and_4() #1 !llvm.amdgcn.lds.kernel.id !6
 ; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.expect_max_of_2_and_4.dynlds) ]
-; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
 ; CHECK-NEXT: call void @use_shared2()
 ; CHECK-NEXT: call void @use_shared4()
 ; CHECK-NEXT: ret void
 ;
 call void @use_shared2()
 call void @use_shared4()
 ret void
 }


 attributes #0 = { noinline }

 ; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
 ; CHECK: declare void @llvm.donothing() #2

 ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
 ; CHECK: declare i32 @llvm.amdgcn.lds.kernel.id() #3

 ; CHECK: attributes #0 = { noinline }
 ; CHECK: attributes #1 = { "amdgpu-lds-size"="4" }
 ; CHECK: attributes #2 = { nocallback nofree nosync nounwind willreturn memory(none) }
 ; CHECK: attributes #3 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }

 ; CHECK: !0 = !{i64 0, i64 1}
 ; CHECK: !1 = !{i64 4, i64 5}
 ; CHECK: !2 = !{i32 0}
 ; CHECK: !3 = !{i32 1}
 ; CHECK: !4 = !{i32 2}
 ; CHECK: !5 = !{i32 3}
 ; CHECK: !6 = !{i32 4}

llvm/test/CodeGen/AMDGPU/lower-module-lds-offsets.ll

@@ ... @@
 ; GCN-LABEL: {{^}}k0:
 ; GCN-DAG: v_mov_b32_e32 [[NULL:v[0-9]+]], 0
 ; GCN-DAG: v_mov_b32_e32 [[ONE:v[0-9]+]], 1
 ; GCN: ds_write_b8 [[NULL]], [[ONE]]
 ; GCN: v_mov_b32_e32 [[TWO:v[0-9]+]], 2
 ; GCN: ds_write_b8 [[NULL]], [[TWO]] offset:16
 define amdgpu_kernel void @k0() {
 ; OPT-LABEL: @k0(
-; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds) ]
-; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
 ; OPT-NEXT: store i8 1, ptr addrspace(3) @llvm.amdgcn.module.lds, align 1
 ; OPT-NEXT: store i8 2, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, align 16
 ; OPT-NEXT: call void @f0()
 ; OPT-NEXT: ret void
 ;
 store i8 1, ptr addrspace(3) @lds.size.1.align.1, align 1
 store i8 2, ptr addrspace(3) @lds.size.16.align.16, align 16
 call void @f0()

llvm/test/CodeGen/AMDGPU/lower-module-lds-single-var-ambiguous.ll

@@ ... @@
 %mul = mul i16 %ld, 4
 store i16 %mul, ptr addrspace(3) @function.lds
 ret void
 }


 define amdgpu_kernel void @k0_f0() {
 ; M_OR_HY-LABEL: @k0_f0(
-; M_OR_HY-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
 ; M_OR_HY-NEXT: call void @f0()
 ; M_OR_HY-NEXT: ret void
 ;
 ; TABLE-LABEL: @k0_f0(
-; TABLE-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k0_f0.lds) ]
 ; TABLE-NEXT: call void @f0()
 ; TABLE-NEXT: ret void
 ;
 call void @f0()
 ret void
 }

 define amdgpu_kernel void @k1_f0() {
 ; M_OR_HY-LABEL: @k1_f0(
-; M_OR_HY-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
 ; M_OR_HY-NEXT: call void @f0()
 ; M_OR_HY-NEXT: ret void
 ;
 ; TABLE-LABEL: @k1_f0(
-; TABLE-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k1_f0.lds) ]
 ; TABLE-NEXT: call void @f0()
 ; TABLE-NEXT: ret void
 ;
 call void @f0()
 ret void
 }

llvm/test/CodeGen/AMDGPU/lower-module-lds-single-var-unambiguous.ll

Show First 20 Lines • Show All 53 Lines • ▼ Show 20 Lines
;		;
%ld = load i16, ptr addrspace(3) @f0.lds		%ld = load i16, ptr addrspace(3) @f0.lds
%mul = mul i16 %ld, 3		%mul = mul i16 %ld, 3
store i16 %mul, ptr addrspace(3) @f0.lds		store i16 %mul, ptr addrspace(3) @f0.lds
ret void		ret void
}		}

define amdgpu_kernel void @k_f0() {		define amdgpu_kernel void @k_f0() {
; MODULE-LABEL: @k_f0(		; CHECK-LABEL: @k_f0(
; MODULE-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ], !alias.scope [[META5:![0-9]+]], !noalias [[META1]]		; CHECK-NEXT: call void @f0()
; MODULE-NEXT: call void @f0()		; CHECK-NEXT: ret void
; MODULE-NEXT: ret void
;
; TABLE-LABEL: @k_f0(
; TABLE-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k_f0.lds) ]
; TABLE-NEXT: call void @f0()
; TABLE-NEXT: ret void
;
; K_OR_HY-LABEL: @k_f0(
; K_OR_HY-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k_f0.lds) ]
; K_OR_HY-NEXT: call void @f0()
; K_OR_HY-NEXT: ret void
;		;
call void @f0()		call void @f0()
ret void		ret void
}		}

;; As above, but with the kernel also uing the variable.		;; As above, but with the kernel also uing the variable.

@both.lds = addrspace(3) global i32 undef		@both.lds = addrspace(3) global i32 undef
define void @f_both() {		define void @f_both() {
; MODULE-LABEL: @f_both(		; MODULE-LABEL: @f_both(
; MODULE-NEXT: [[LD:%.*]] = load i32, ptr addrspace(3) @llvm.amdgcn.module.lds, align 4, !alias.scope [[META5]], !noalias [[META4]]		; MODULE-NEXT: [[LD:%.*]] = load i32, ptr addrspace(3) @llvm.amdgcn.module.lds, align 4, !alias.scope [[META5:![0-9]+]], !noalias [[META4]]
; MODULE-NEXT: [[MUL:%.*]] = mul i32 [[LD]], 4		; MODULE-NEXT: [[MUL:%.*]] = mul i32 [[LD]], 4
; MODULE-NEXT: store i32 [[MUL]], ptr addrspace(3) @llvm.amdgcn.module.lds, align 4, !alias.scope [[META5]], !noalias [[META4]]		; MODULE-NEXT: store i32 [[MUL]], ptr addrspace(3) @llvm.amdgcn.module.lds, align 4, !alias.scope [[META5]], !noalias [[META4]]
; MODULE-NEXT: ret void		; MODULE-NEXT: ret void
;		;
; TABLE-LABEL: @f_both(		; TABLE-LABEL: @f_both(
; TABLE-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.lds.kernel.id()		; TABLE-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.lds.kernel.id()
; TABLE-NEXT: [[BOTH_LDS2:%.*]] = getelementptr inbounds [2 x [2 x i32]], ptr addrspace(4) @llvm.amdgcn.lds.offset.table, i32 0, i32 [[TMP1]], i32 0		; TABLE-NEXT: [[BOTH_LDS2:%.*]] = getelementptr inbounds [2 x [2 x i32]], ptr addrspace(4) @llvm.amdgcn.lds.offset.table, i32 0, i32 [[TMP1]], i32 0
; TABLE-NEXT: [[TMP2:%.*]] = load i32, ptr addrspace(4) [[BOTH_LDS2]], align 4		; TABLE-NEXT: [[TMP2:%.*]] = load i32, ptr addrspace(4) [[BOTH_LDS2]], align 4
%ld = load i32, ptr addrspace(3) @both.lds		%ld = load i32, ptr addrspace(3) @both.lds
%mul = mul i32 %ld, 4		%mul = mul i32 %ld, 4
store i32 %mul, ptr addrspace(3) @both.lds		store i32 %mul, ptr addrspace(3) @both.lds
ret void		ret void
}		}

define amdgpu_kernel void @k0_both() {		define amdgpu_kernel void @k0_both() {
; MODULE-LABEL: @k0_both(		; MODULE-LABEL: @k0_both(
; MODULE-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
; MODULE-NEXT: [[LD:%.*]] = load i32, ptr addrspace(3) @llvm.amdgcn.module.lds, align 4, !alias.scope [[META5]], !noalias [[META1]]		; MODULE-NEXT: [[LD:%.*]] = load i32, ptr addrspace(3) @llvm.amdgcn.module.lds, align 4, !alias.scope [[META5]], !noalias [[META1]]
; MODULE-NEXT: [[MUL:%.*]] = mul i32 [[LD]], 5		; MODULE-NEXT: [[MUL:%.*]] = mul i32 [[LD]], 5
; MODULE-NEXT: store i32 [[MUL]], ptr addrspace(3) @llvm.amdgcn.module.lds, align 4, !alias.scope [[META5]], !noalias [[META1]]		; MODULE-NEXT: store i32 [[MUL]], ptr addrspace(3) @llvm.amdgcn.module.lds, align 4, !alias.scope [[META5]], !noalias [[META1]]
; MODULE-NEXT: call void @f_both()		; MODULE-NEXT: call void @f_both()
; MODULE-NEXT: ret void		; MODULE-NEXT: ret void
;		;
; TABLE-LABEL: @k0_both(		; TABLE-LABEL: @k0_both(
; TABLE-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k0_both.lds) ]
; TABLE-NEXT: [[LD:%.*]] = load i32, ptr addrspace(3) @llvm.amdgcn.kernel.k0_both.lds, align 4		; TABLE-NEXT: [[LD:%.*]] = load i32, ptr addrspace(3) @llvm.amdgcn.kernel.k0_both.lds, align 4
; TABLE-NEXT: [[MUL:%.*]] = mul i32 [[LD]], 5		; TABLE-NEXT: [[MUL:%.*]] = mul i32 [[LD]], 5
; TABLE-NEXT: store i32 [[MUL]], ptr addrspace(3) @llvm.amdgcn.kernel.k0_both.lds, align 4		; TABLE-NEXT: store i32 [[MUL]], ptr addrspace(3) @llvm.amdgcn.kernel.k0_both.lds, align 4
; TABLE-NEXT: call void @f_both()		; TABLE-NEXT: call void @f_both()
; TABLE-NEXT: ret void		; TABLE-NEXT: ret void
;		;
; K_OR_HY-LABEL: @k0_both(		; K_OR_HY-LABEL: @k0_both(
; K_OR_HY-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k0_both.lds) ]
; K_OR_HY-NEXT: [[LD:%.*]] = load i32, ptr addrspace(3) @llvm.amdgcn.kernel.k0_both.lds, align 4		; K_OR_HY-NEXT: [[LD:%.*]] = load i32, ptr addrspace(3) @llvm.amdgcn.kernel.k0_both.lds, align 4
; K_OR_HY-NEXT: [[MUL:%.*]] = mul i32 [[LD]], 5		; K_OR_HY-NEXT: [[MUL:%.*]] = mul i32 [[LD]], 5
; K_OR_HY-NEXT: store i32 [[MUL]], ptr addrspace(3) @llvm.amdgcn.kernel.k0_both.lds, align 4		; K_OR_HY-NEXT: store i32 [[MUL]], ptr addrspace(3) @llvm.amdgcn.kernel.k0_both.lds, align 4
; K_OR_HY-NEXT: call void @f_both()		; K_OR_HY-NEXT: call void @f_both()
; K_OR_HY-NEXT: ret void		; K_OR_HY-NEXT: ret void
;		;
%ld = load i32, ptr addrspace(3) @both.lds		%ld = load i32, ptr addrspace(3) @both.lds
%mul = mul i32 %ld, 5		%mul = mul i32 %ld, 5
store i32 %mul, ptr addrspace(3) @both.lds		store i32 %mul, ptr addrspace(3) @both.lds
call void @f_both()		call void @f_both()
ret void		ret void
}		}
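In the TABLE checks for `@f_both` above, a non-kernel function finds its LDS variable at run time. It calls `@llvm.amdgcn.lds.kernel.id()`, indexes `@llvm.amdgcn.lds.offset.table` (typed `[2 x [2 x i32]]` in this test) with indices `0, kernel id, variable slot`, and loads the i32 offset. The C++ below is only a host-side model of that lookup. `OffsetTable`, `ldsOffsetFor`, and any sample offsets are hypothetical illustrations, not LLVM API or values from this test.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a [NumKernels x [NumVars x i32]] LDS offset table.
template <std::size_t NumKernels, std::size_t NumVars>
using OffsetTable = std::array<std::array<std::uint32_t, NumVars>, NumKernels>;

// Mirrors: getelementptr inbounds [K x [V x i32]], ptr @table, i32 0, i32 kernelId, i32 varIndex
// followed by a load of the i32 at that address.
template <std::size_t NumKernels, std::size_t NumVars>
std::uint32_t ldsOffsetFor(const OffsetTable<NumKernels, NumVars>& table,
                           std::size_t kernelId, std::size_t varIndex) {
  // The IR uses an inbounds GEP, so out-of-range indices would be undefined there;
  // this model checks them explicitly instead.
  assert(kernelId < NumKernels && varIndex < NumVars);
  return table[kernelId][varIndex];
}
```

Each row belongs to one kernel, so the same function body resolves to a different LDS address depending on which kernel it was called from.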

llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll

%mul = mul i64 %ld, 8		%mul = mul i64 %ld, 8
store i64 %mul, ptr addrspace(3) @v2		store i64 %mul, ptr addrspace(3) @v2
ret void		ret void
}		}

; Access two variables, will allocate those two		; Access two variables, will allocate those two
define amdgpu_kernel void @k01() {		define amdgpu_kernel void @k01() {
; OPT-LABEL: @k01(		; OPT-LABEL: @k01(
; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k01.lds) ]
; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
; OPT-NEXT: call void @f0()		; OPT-NEXT: call void @f0()
; OPT-NEXT: call void @f1()		; OPT-NEXT: call void @f1()
; OPT-NEXT: ret void		; OPT-NEXT: ret void
;		;
; GCN-LABEL: k01:		; GCN-LABEL: k01:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_mov_b32 s32, 0		; GCN-NEXT: s_mov_b32 s32, 0
; GCN-NEXT: s_mov_b32 flat_scratch_lo, s7		; GCN-NEXT: s_mov_b32 flat_scratch_lo, s7
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
call void @f0()		call void @f0()
call void @f1()		call void @f1()
ret void		ret void
}		}

define amdgpu_kernel void @k23() {		define amdgpu_kernel void @k23() {
; OPT-LABEL: @k23(		; OPT-LABEL: @k23(
; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k23.lds) ], !alias.scope [[META4:![0-9]+]], !noalias [[META7:![0-9]+]]
; OPT-NEXT: call void @f2()		; OPT-NEXT: call void @f2()
; OPT-NEXT: call void @f3()		; OPT-NEXT: call void @f3()
; OPT-NEXT: ret void		; OPT-NEXT: ret void
;		;
; GCN-LABEL: k23:		; GCN-LABEL: k23:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_mov_b32 s32, 0		; GCN-NEXT: s_mov_b32 s32, 0
; GCN-NEXT: s_mov_b32 flat_scratch_lo, s7		; GCN-NEXT: s_mov_b32 flat_scratch_lo, s7
call void @f2()		call void @f2()
call void @f3()		call void @f3()
ret void		ret void
}		}

; Access and allocate three variables		; Access and allocate three variables
define amdgpu_kernel void @k123() {		define amdgpu_kernel void @k123() {
; OPT-LABEL: @k123(		; OPT-LABEL: @k123(
; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds) ], !alias.scope [[META10:![0-9]+]], !noalias [[META13:![0-9]+]]
; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
; OPT-NEXT: call void @f1()		; OPT-NEXT: call void @f1()
; OPT-NEXT: [[LD:%.*]] = load i8, ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_KERNEL_K123_LDS_T:%.*]], ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 8, !alias.scope [[META13]], !noalias [[META10]]		; OPT-NEXT: [[LD:%.*]] = load i8, ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_KERNEL_K123_LDS_T:%.*]], ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 8, !alias.scope [[META5:![0-9]+]], !noalias [[META8:![0-9]+]]
; OPT-NEXT: [[MUL:%.*]] = mul i8 [[LD]], 8		; OPT-NEXT: [[MUL:%.*]] = mul i8 [[LD]], 8
; OPT-NEXT: store i8 [[MUL]], ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_KERNEL_K123_LDS_T]], ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 8, !alias.scope [[META13]], !noalias [[META10]]		; OPT-NEXT: store i8 [[MUL]], ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_KERNEL_K123_LDS_T]], ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 8, !alias.scope [[META5]], !noalias [[META8]]
; OPT-NEXT: call void @f2()		; OPT-NEXT: call void @f2()
; OPT-NEXT: ret void		; OPT-NEXT: ret void
;		;
; GCN-LABEL: k123:		; GCN-LABEL: k123:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_mov_b32 s32, 0		; GCN-NEXT: s_mov_b32 s32, 0
; GCN-NEXT: s_mov_b32 flat_scratch_lo, s7		; GCN-NEXT: s_mov_b32 flat_scratch_lo, s7
; GCN-NEXT: s_add_i32 s6, s6, s9		; GCN-NEXT: s_add_i32 s6, s6, s9
!0 = !{i32 0}		!0 = !{i32 0}
!1 = !{i32 2}		!1 = !{i32 2}
!2 = !{i32 1}		!2 = !{i32 1}


; OPT: attributes #0 = { "amdgpu-lds-size"="8" }		; OPT: attributes #0 = { "amdgpu-lds-size"="8" }
; OPT: attributes #1 = { "amdgpu-lds-size"="12" }		; OPT: attributes #1 = { "amdgpu-lds-size"="12" }
; OPT: attributes #2 = { "amdgpu-lds-size"="20" }		; OPT: attributes #2 = { "amdgpu-lds-size"="20" }
; OPT: attributes #3 = { nocallback nofree nosync nounwind willreturn memory(none) }		; OPT: attributes #3 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
; OPT: attributes #4 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }

; OPT: !0 = !{i64 0, i64 1}		; OPT: !0 = !{i64 0, i64 1}
; OPT: !1 = !{i64 4, i64 5}		; OPT: !1 = !{i64 4, i64 5}
; OPT: !2 = !{i64 8, i64 9}		; OPT: !2 = !{i64 8, i64 9}
; OPT: !3 = !{i32 1}		; OPT: !3 = !{i32 1}
; OPT: !4 = !{!5}		; OPT: !4 = !{i32 0}
; OPT: !5 = distinct !{!5, !6}		; OPT: !5 = !{!6}
; OPT: !6 = distinct !{!6}		; OPT: !6 = distinct !{!6, !7}
; OPT: !7 = !{!8}		; OPT: !7 = distinct !{!7}
; OPT: !8 = distinct !{!8, !6}		; OPT: !8 = !{!9}
; OPT: !9 = !{i32 0}		; OPT: !9 = distinct !{!9, !7}
; OPT: !10 = !{!11}
; OPT: !11 = distinct !{!11, !12}
; OPT: !12 = distinct !{!12}
; OPT: !13 = !{!14}
; OPT: !14 = distinct !{!14, !12}

; Table size length number-kernels * number-variables * sizeof(uint16_t)		; Table size length number-kernels * number-variables * sizeof(uint16_t)
; GCN: .type llvm.amdgcn.lds.offset.table,@object		; GCN: .type llvm.amdgcn.lds.offset.table,@object
; GCN-NEXT: .section .data.rel.ro,#alloc,#write		; GCN-NEXT: .section .data.rel.ro,#alloc,#write
; GCN-NEXT: .p2align 2, 0x0		; GCN-NEXT: .p2align 2, 0x0
; GCN-NEXT: llvm.amdgcn.lds.offset.table:		; GCN-NEXT: llvm.amdgcn.lds.offset.table:
; GCN-NEXT: .long 8		; GCN-NEXT: .long 8
; GCN-NEXT: .long 0		; GCN-NEXT: .long 0
; GCN-NEXT: .size llvm.amdgcn.lds.offset.table, 8		; GCN-NEXT: .size llvm.amdgcn.lds.offset.table, 8
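The comment above gives the table length as number-kernels * number-variables * sizeof(uint16_t). The checked assembly, however, emits each entry as a `.long` (4 bytes) and reports `.size llvm.amdgcn.lds.offset.table, 8` for its two entries. That matches 4-byte entries, as does the i32 element type of the `[2 x [2 x i32]]` table indexed in `@f_both` above. The sketch below shows the arithmetic only, with the entry size left as a parameter. `offsetTableBytes` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: byte size of a kernels x variables LDS offset table.
constexpr std::size_t offsetTableBytes(std::size_t numKernels,
                                       std::size_t numVariables,
                                       std::size_t entryBytes) {
  return numKernels * numVariables * entryBytes;
}

// Two 4-byte (.long / i32) entries give the 8 bytes checked above;
// 1 x 2 and 2 x 1 layouts both produce this product.
static_assert(offsetTableBytes(1, 2, sizeof(std::uint32_t)) == 8,
              "two i32 entries occupy 8 bytes");
```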

llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll

%mul = mul i64 %ld, 8		%mul = mul i64 %ld, 8
store i64 %mul, ptr addrspace(3) @v2		store i64 %mul, ptr addrspace(3) @v2
ret void		ret void
}		}

; Access two variables, will allocate those two		; Access two variables, will allocate those two
define amdgpu_kernel void @k01() {		define amdgpu_kernel void @k01() {
; OPT-LABEL: @k01() #0 !llvm.amdgcn.lds.kernel.id !1 {		; OPT-LABEL: @k01() #0 !llvm.amdgcn.lds.kernel.id !1 {
; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k01.lds) ]
; OPT-NEXT: call void @f0()		; OPT-NEXT: call void @f0()
; OPT-NEXT: call void @f1()		; OPT-NEXT: call void @f1()
; OPT-NEXT: ret void		; OPT-NEXT: ret void
;		;
; GCN-LABEL: k01:		; GCN-LABEL: k01:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_mov_b32 s32, 0		; GCN-NEXT: s_mov_b32 s32, 0
; GCN-NEXT: s_mov_b32 flat_scratch_lo, s7		; GCN-NEXT: s_mov_b32 flat_scratch_lo, s7
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
; GCN: .amdhsa_group_segment_fixed_size 8		; GCN: .amdhsa_group_segment_fixed_size 8
call void @f0()		call void @f0()
call void @f1()		call void @f1()
ret void		ret void
}		}

define amdgpu_kernel void @k23() {		define amdgpu_kernel void @k23() {
; OPT-LABEL: @k23() #1 !llvm.amdgcn.lds.kernel.id !7 {		; OPT-LABEL: @k23() #1 !llvm.amdgcn.lds.kernel.id !2 {
; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k23.lds) ]
; OPT-NEXT: call void @f2()		; OPT-NEXT: call void @f2()
; OPT-NEXT: call void @f3()		; OPT-NEXT: call void @f3()
; OPT-NEXT: ret void		; OPT-NEXT: ret void
;		;
; GCN-LABEL: k23:		; GCN-LABEL: k23:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_mov_b32 s32, 0		; GCN-NEXT: s_mov_b32 s32, 0
; GCN-NEXT: s_mov_b32 flat_scratch_lo, s7		; GCN-NEXT: s_mov_b32 flat_scratch_lo, s7
; GCN: .amdhsa_group_segment_fixed_size 16		; GCN: .amdhsa_group_segment_fixed_size 16
call void @f2()		call void @f2()
call void @f3()		call void @f3()
ret void		ret void
}		}

; Access and allocate three variables		; Access and allocate three variables
define amdgpu_kernel void @k123() {		define amdgpu_kernel void @k123() {
; OPT-LABEL: @k123() #2 !llvm.amdgcn.lds.kernel.id !13 {		; OPT-LABEL: @k123() #2 !llvm.amdgcn.lds.kernel.id !3 {
; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds) ]
; OPT-NEXT: call void @f1()		; OPT-NEXT: call void @f1()
; OPT-NEXT: [[LD:%.*]] = load i8, ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_KERNEL_K123_LDS_T:%.*]], ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 2, !alias.scope !20, !noalias !21		; OPT-NEXT: [[LD:%.*]] = load i8, ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_KERNEL_K123_LDS_T:%.*]], ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 2, !alias.scope [[META4:![0-9]+]], !noalias [[META7:![0-9]+]]
; OPT-NEXT: [[MUL:%.*]] = mul i8 [[LD]], 8		; OPT-NEXT: [[MUL:%.*]] = mul i8 [[LD]], 8
; OPT-NEXT: store i8 [[MUL]], ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_KERNEL_K123_LDS_T]], ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 2, !alias.scope !20, !noalias !21		; OPT-NEXT: store i8 [[MUL]], ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_KERNEL_K123_LDS_T]], ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 2, !alias.scope [[META4]], !noalias [[META7]]
; OPT-NEXT: call void @f2()		; OPT-NEXT: call void @f2()
; OPT-NEXT: ret void		; OPT-NEXT: ret void
;		;
; GCN-LABEL: k123:		; GCN-LABEL: k123:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_mov_b32 s32, 0		; GCN-NEXT: s_mov_b32 s32, 0
; GCN-NEXT: s_mov_b32 flat_scratch_lo, s7		; GCN-NEXT: s_mov_b32 flat_scratch_lo, s7
; GCN-NEXT: s_add_i32 s6, s6, s9		; GCN-NEXT: s_add_i32 s6, s6, s9
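In the `@k123` checks of this file, the old lines pinned the scope metadata to the literals `!20` and `!21`. The new lines capture them with FileCheck pattern variables instead: `[[META4:![0-9]+]]` on the load defines `META4`, and `[[META4]]` on the store must then match the same text. Checks written this way keep passing when metadata is renumbered, as happens here to the kernel id metadata on `@k23` (`!7` becomes `!2`). The C++ below only models that matching rule with `std::regex`. `scopesMatch` is a hypothetical name and is not part of FileCheck.

```cpp
#include <regex>
#include <string>

// Hypothetical model of the two @k123 check lines: any metadata numbers are
// accepted, but the store must reuse exactly the numbers the load used.
bool scopesMatch(const std::string& loadLine, const std::string& storeLine) {
  static const std::regex scopes(R"(!alias\.scope (![0-9]+), !noalias (![0-9]+))");
  std::smatch ld, st;
  if (!std::regex_search(loadLine, ld, scopes) ||
      !std::regex_search(storeLine, st, scopes))
    return false;  // one of the lines has no scope metadata at all
  // Like [[META4]] / [[META7]]: later uses must equal the captured text.
  return ld[1] == st[1] && ld[2] == st[2];
}
```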

llvm/test/CodeGen/AMDGPU/lower-module-lds.ll

define void @func() {
%val1 = add i32 %val0, 4		%val1 = add i32 %val0, 4
store i32 %val1, ptr addrspace(3) @var1, align 4		store i32 %val1, ptr addrspace(3) @var1, align 4
%unused0 = atomicrmw add ptr addrspace(3) @with_init, i64 1 monotonic		%unused0 = atomicrmw add ptr addrspace(3) @with_init, i64 1 monotonic
ret void		ret void
}		}

; This kernel calls a function that uses LDS so needs the block		; This kernel calls a function that uses LDS so needs the block
; CHECK-LABEL: @kern_call() #0		; CHECK-LABEL: @kern_call() #0
; CHECK: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
; CHECK: call void @func()		; CHECK: call void @func()
; CHECK: %dec = atomicrmw fsub ptr addrspace(3) @llvm.amdgcn.module.lds, float 2.000000e+00 monotonic, align 8		; CHECK: %dec = atomicrmw fsub ptr addrspace(3) @llvm.amdgcn.module.lds, float 2.000000e+00 monotonic, align 8
define amdgpu_kernel void @kern_call() {		define amdgpu_kernel void @kern_call() {
call void @func()		call void @func()
%dec = atomicrmw fsub ptr addrspace(3) @var0, float 2.0 monotonic		%dec = atomicrmw fsub ptr addrspace(3) @var0, float 2.0 monotonic
ret void		ret void
}		}

; This kernel does not allocate the LDS block as it makes no calls		; This kernel does not allocate the LDS block as it makes no calls
; CHECK-LABEL: @kern_empty()		; CHECK-LABEL: @kern_empty() {
; CHECK-NOT: call void @llvm.donothing()
define spir_kernel void @kern_empty() {		define spir_kernel void @kern_empty() {
ret void		ret void
}		}

; Make sure we don't crash trying to insert code into a kernel		; Make sure we don't crash trying to insert code into a kernel
; declaration.		; declaration.
declare amdgpu_kernel void @kernel_declaration()		declare amdgpu_kernel void @kernel_declaration()

; CHECK: attributes #0 = { "amdgpu-lds-size"="12" }		; CHECK: attributes #0 = { "amdgpu-lds-size"="12" }
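Kernels lowered by this pass record their LDS total in a string function attribute, for example `"amdgpu-lds-size"="12"` on `@kern_call` (attribute `#0`). This revision has PromoteAlloca read that value when it is present, instead of inferring LDS usage from `llvm.donothing` uses. The standalone C++ below is a minimal sketch of that decision only. It assumes the attribute's string value has already been extracted. `ldsBytesInUse` and the `scanGlobals` fallback are hypothetical stand-ins, and the real pass works on `llvm::Function` attributes, which this sketch does not model.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical model: bytes of LDS a kernel already uses.
// `attrValue` stands in for the value of "amdgpu-lds-size" (nullopt if absent);
// `scanGlobals` stands in for the slower walk over the LDS globals a kernel reaches.
template <typename Fallback>
std::uint32_t ldsBytesInUse(std::optional<std::string_view> attrValue,
                            Fallback scanGlobals) {
  if (attrValue) {
    std::uint32_t bytes = 0;
    const char* first = attrValue->data();
    const char* last = first + attrValue->size();
    auto [ptr, ec] = std::from_chars(first, last, bytes);
    if (ec == std::errc() && ptr == last)
      return bytes;  // attribute present and fully numeric: use it directly
  }
  return scanGlobals();  // attribute missing or not a plain integer
}
```

Treating a malformed value as absent is a choice made for this sketch; the point it illustrates is that the attribute, when available, replaces a search over globals.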