Diff 540002

llvm/docs/AMDGPUUsage.rst


[unchanged lines elided; context: .. table:: AMDGPU LLVM IR Attributes]
"amdgpu-no-default-queue" Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit		"amdgpu-no-default-queue" Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
kernel argument that holds the default queue pointer. If this		kernel argument that holds the default queue pointer. If this
attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.		attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.

"amdgpu-no-completion-action" Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit		"amdgpu-no-completion-action" Similar to amdgpu-no-implicitarg-ptr, except specific to the implicit
kernel argument that holds the completion action pointer. If this		kernel argument that holds the completion action pointer. If this
attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.		attribute is absent, then the amdgpu-no-implicitarg-ptr is also removed.

		"amdgpu-lds-size" The number of bytes that will be allocated in the Local Data Store at
		address zero. Variables are allocated within this frame using absolute
		symbol metadata, primarily by the AMDGPULowerModuleLDS pass.

======================================= ==========================================================		======================================= ==========================================================

Calling Conventions		Calling Conventions
-------------------		-------------------

The AMDGPU backend supports the following calling conventions:		The AMDGPU backend supports the following calling conventions:

.. table:: AMDGPU Calling Conventions		.. table:: AMDGPU Calling Conventions
[unchanged lines elided]
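Read literally, the new "amdgpu-lds-size" entry describes the end offset of a frame laid out upward from address zero. The C++ sketch below assumes the variables are already in allocation order (decreasing alignment, as in the k0 kernel of lower-kernel-and-module-lds.ll further down). LDSVar, alignUp and layoutFrame are hypothetical helpers, not LLVM APIs; alignUp only stands in for llvm::alignTo with power-of-two alignments.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one LDS variable: size and alignment in bytes.
struct LDSVar {
  uint64_t Size;
  uint64_t Align; // assumed to be a power of two
};

// Round Value up to a multiple of Align (stand-in for llvm::alignTo).
uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "Align must be a power of two");
  return (Value + Align - 1) & ~(Align - 1);
}

// Place each variable at the next suitably aligned offset, starting from
// address zero, and return the end of the last one. That end offset is the
// frame size the attribute records. Offsets receives each variable's address.
uint64_t layoutFrame(const std::vector<LDSVar> &Vars,
                     std::vector<uint64_t> &Offsets) {
  Offsets.clear();
  uint64_t Offset = 0;
  for (const LDSVar &V : Vars) {
    Offset = alignUp(Offset, V.Align);
    Offsets.push_back(Offset);
    Offset += V.Size;
  }
  return Offset;
}
```

For k0's four variables (size/alignment 16/16, 4/4, 2/2 and 1/1 bytes), layoutFrame returns offsets 0, 16, 20 and 22 and a frame size of 23, the value in that test's "amdgpu-lds-size"="23" CHECK line. Note that the value is the end of the last variable, not a size rounded up to the 16-byte alignment of @llvm.amdgcn.kernel.k0.lds.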

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp

[unchanged lines elided; context: bool runOnModule(Module &M) override {]
DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS =		DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS =
lowerDynamicLDSVariables(M, LDSUsesInfo,		lowerDynamicLDSVariables(M, LDSUsesInfo,
KernelsThatIndirectlyAllocateDynamicLDS,		KernelsThatIndirectlyAllocateDynamicLDS,
DynamicVariables, OrderedKernels);		DynamicVariables, OrderedKernels);

// All kernel frames have been allocated. Calculate and record the		// All kernel frames have been allocated. Calculate and record the
// addresses.		// addresses.

{		{
		[Inline comment by JonChesterfield (author)] Currently this frame layout is passed in an ad hoc fashion, based on kernel structs whose names are derived from the corresponding kernel. However, the backend doesn't actually need the exact frame layout, because the globals also carry absolute address (!absolute_symbol) metadata. Introduce an attribute (named to resemble amdgpu-gds-size) that records the frame size.
const DataLayout &DL = M.getDataLayout();		const DataLayout &DL = M.getDataLayout();

for (Function &Func : M.functions()) {		for (Function &Func : M.functions()) {
if (Func.isDeclaration() || !isKernelLDS(&Func))		if (Func.isDeclaration() || !isKernelLDS(&Func))
continue;		continue;

// All three of these are optional. The first variable is allocated at		// All three of these are optional. The first variable is allocated at
// zero. They are allocated by allocateKnownAddressLDSGlobal in the		// zero. They are allocated by allocateKnownAddressLDSGlobal in the
[unchanged lines elided]

if (AllocateDynamicVariable) {		if (AllocateDynamicVariable) {
GlobalVariable *DynamicVariable = KernelToCreatedDynamicLDS[&Func];		GlobalVariable *DynamicVariable = KernelToCreatedDynamicLDS[&Func];

Offset = alignTo(Offset, AMDGPU::getAlign(DL, DynamicVariable));		Offset = alignTo(Offset, AMDGPU::getAlign(DL, DynamicVariable));

recordLDSAbsoluteAddress(&M, DynamicVariable, Offset);		recordLDSAbsoluteAddress(&M, DynamicVariable, Offset);
}		}

		if (Offset != 0)
		Func.addFnAttr("amdgpu-lds-size", std::to_string(Offset));
}		}
}		}

for (auto &GV : make_early_inc_range(M.globals()))		for (auto &GV : make_early_inc_range(M.globals()))
if (AMDGPU::isLDSVariableToLower(GV)) {		if (AMDGPU::isLDSVariableToLower(GV)) {
// probably want to remove from used lists		// probably want to remove from used lists
GV.removeDeadConstantUsers();		GV.removeDeadConstantUsers();
if (GV.use_empty())		if (GV.use_empty())
[unchanged lines elided]
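The pass hunk above places the dynamic LDS variable after the static frame and only then emits the attribute. As a result, the recorded size includes the dynamic variable's alignment padding, and a kernel whose frame is empty gets no attribute at all. A minimal C++ model of that last step, assuming the end of the static frame is already known; KernelFrame and finishFrame are hypothetical names, not part of the pass:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Round Value up to a multiple of Align (stand-in for llvm::alignTo).
uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "Align must be a power of two");
  return (Value + Align - 1) & ~(Align - 1);
}

// Hypothetical result of finishing one kernel's LDS frame.
struct KernelFrame {
  std::optional<uint64_t> DynamicAddress; // address of the dynamic LDS variable
  std::optional<std::string> LDSSizeAttr; // "amdgpu-lds-size" value, if added
};

// StaticEnd: end of the static frame (module and kernel structs).
// DynAlign: alignment of the kernel's dynamic LDS variable, if it has one.
KernelFrame finishFrame(uint64_t StaticEnd, std::optional<uint64_t> DynAlign) {
  KernelFrame Frame;
  uint64_t Offset = StaticEnd;
  if (DynAlign) {
    // The zero-sized dynamic variable starts at the next suitably aligned
    // address after the static frame; the padding counts toward the size.
    Offset = alignUp(Offset, *DynAlign);
    Frame.DynamicAddress = Offset;
  }
  // As in the patch, the attribute is only added for a non-empty frame.
  if (Offset != 0)
    Frame.LDSSizeAttr = std::to_string(Offset);
  return Frame;
}
```

finishFrame(64, std::nullopt) yields "64", the value checked for kernel_0 and kernel_2 in lower-module-lds-all-indirect-accesses.ll. finishFrame(0, A) places the dynamic variable at address 0 for any alignment A and adds no attribute, consistent with kernel_1 in that test, which only uses dynamic LDS and gets no attribute group. finishFrame(9, 8) puts the dynamic variable at 16 and records "16".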

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp

[unchanged lines elided; context: AMDGPUMachineFunction::AMDGPUMachineFunction(const Function &F, ...]
// global sizes?		// global sizes?
StringRef S = F.getFnAttribute("amdgpu-gds-size").getValueAsString();		StringRef S = F.getFnAttribute("amdgpu-gds-size").getValueAsString();
if (!S.empty())		if (!S.empty())
S.consumeInteger(0, GDSSize);		S.consumeInteger(0, GDSSize);

// Assume the attribute allocates before any known GDS globals.		// Assume the attribute allocates before any known GDS globals.
StaticGDSSize = GDSSize;		StaticGDSSize = GDSSize;

		// Keeping LDSSize and StaticLDSSize as two separate variables is only
		// profitable when the LDS module lowering pass is disabled. If graphics
		// does not use dynamic LDS, it is never profitable. Cleanup is left for a
		// later change.
		LDSSize = F.getFnAttributeAsParsedInteger("amdgpu-lds-size", 0);
		StaticLDSSize = LDSSize;

CallingConv::ID CC = F.getCallingConv();		CallingConv::ID CC = F.getCallingConv();
if (CC == CallingConv::AMDGPU_KERNEL || CC == CallingConv::SPIR_KERNEL)		if (CC == CallingConv::AMDGPU_KERNEL || CC == CallingConv::SPIR_KERNEL)
ExplicitKernArgSize = ST.getExplicitKernArgSize(F, MaxKernArgAlign);		ExplicitKernArgSize = ST.getExplicitKernArgSize(F, MaxKernArgAlign);

// FIXME: Shouldn't be target specific		// FIXME: Shouldn't be target specific
Attribute NSZAttr = F.getFnAttribute("no-signed-zeros-fp-math");		Attribute NSZAttr = F.getFnAttribute("no-signed-zeros-fp-math");
NoSignedZerosFPMath =		NoSignedZerosFPMath =
NSZAttr.isStringAttribute() && NSZAttr.getValueAsString() == "true";		NSZAttr.isStringAttribute() && NSZAttr.getValueAsString() == "true";
}		}
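With this change the constructor seeds both LDSSize and StaticLDSSize from the attribute, defaulting to 0 when it is absent, so variables allocated later start after the frame reserved by the lowering pass. A small standard-C++ model, assuming a plain string map for attributes. FnAttrs, parsedIntAttr and LDSSizes are hypothetical, and parsedIntAttr only approximates Function::getFnAttributeAsParsedInteger (base 10 only, with an assertion in place of LLVM's own error reporting):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a function's string attributes.
using FnAttrs = std::map<std::string, std::string>;

// Return the attribute parsed as a base-10 integer, or Default if absent.
uint64_t parsedIntAttr(const FnAttrs &Attrs, const std::string &Name,
                       uint64_t Default) {
  auto It = Attrs.find(Name);
  if (It == Attrs.end())
    return Default;
  std::size_t Consumed = 0;
  uint64_t Value = std::stoull(It->second, &Consumed, 10);
  assert(Consumed == It->second.size() && "trailing characters in attribute");
  return Value;
}

// Both counters start at the frame size recorded by the lowering pass.
struct LDSSizes {
  uint64_t LDSSize;
  uint64_t StaticLDSSize;
  explicit LDSSizes(const FnAttrs &Attrs)
      : LDSSize(parsedIntAttr(Attrs, "amdgpu-lds-size", 0)),
        StaticLDSSize(LDSSize) {}
};
```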

unsigned AMDGPUMachineFunction::allocateLDSGlobal(const DataLayout &DL,		unsigned AMDGPUMachineFunction::allocateLDSGlobal(const DataLayout &DL,
const GlobalVariable &GV,		const GlobalVariable &GV,
Align Trailing) {		Align Trailing) {
auto Entry = LocalMemoryObjects.insert(std::pair(&GV, 0));		auto Entry = LocalMemoryObjects.insert(std::pair(&GV, 0));
if (!Entry.second)		if (!Entry.second)
return Entry.first->second;		return Entry.first->second;

Align Alignment =		Align Alignment =
DL.getValueOrABITypeAlignment(GV.getAlign(), GV.getValueType());		DL.getValueOrABITypeAlignment(GV.getAlign(), GV.getValueType());

unsigned Offset;		unsigned Offset;
if (GV.getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS) {		if (GV.getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS) {

		std::optional<uint32_t> MaybeAbs = getLDSAbsoluteAddress(GV);
		if (MaybeAbs) {
		// Absolute address LDS variables that exist prior to the LDS lowering
		// pass raise a fatal error in that pass. These failure modes are only
		// reachable if that lowering pass is disabled or broken. If/when adding
		// support for absolute addresses on user-specified variables, the
		// alignment check moves to the lowering pass and the frame calculation
		// needs to take the user variables into consideration.

		uint32_t ObjectStart = *MaybeAbs;

		if (ObjectStart != alignTo(ObjectStart, Alignment)) {
		report_fatal_error("Absolute address LDS variable inconsistent with "
		"variable alignment");
		}

		if (isModuleEntryFunction()) {
		// If this is a module entry function, we can also sanity check against
		// the static frame. Strictly it would be better to check against the
		// attribute, i.e. that the variable is within the always-allocated
		// section, and not within some other non-absolute-address object
		// allocated here, but the extra error detection is minimal and we would
		// have to pass the Function around or cache the attribute value.
		uint32_t ObjectEnd =
		ObjectStart + DL.getTypeAllocSize(GV.getValueType());
		if (ObjectEnd > StaticLDSSize) {
		report_fatal_error(
		"Absolute address LDS variable outside of static frame");
		}
		}

		Entry.first->second = ObjectStart;
		return ObjectStart;
		}

/// TODO: We should sort these to minimize wasted space due to alignment		/// TODO: We should sort these to minimize wasted space due to alignment
/// padding. Currently the padding is decided by the first encountered use		/// padding. Currently the padding is decided by the first encountered use
/// during lowering.		/// during lowering.
Offset = StaticLDSSize = alignTo(StaticLDSSize, Alignment);		Offset = StaticLDSSize = alignTo(StaticLDSSize, Alignment);

StaticLDSSize += DL.getTypeAllocSize(GV.getValueType());		StaticLDSSize += DL.getTypeAllocSize(GV.getValueType());

// Align LDS size to trailing, e.g. for aligning dynamic shared memory		// Align LDS size to trailing, e.g. for aligning dynamic shared memory
[unchanged lines elided]
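allocateLDSGlobal now has two paths: a variable with an absolute address is validated and returned as-is, and any other variable is bump-allocated from StaticLDSSize. The sketch below models one call under simplifying assumptions: LDSAllocator is hypothetical, exceptions stand in for report_fatal_error, sizes are plain byte counts rather than DataLayout queries, and the Trailing alignment parameter is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Round Value up to a multiple of Align (stand-in for llvm::alignTo).
uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "Align must be a power of two");
  return (Value + Align - 1) & ~(Align - 1);
}

// Hypothetical, simplified model of allocateLDSGlobal for one variable.
struct LDSAllocator {
  uint64_t StaticLDSSize; // seeded from "amdgpu-lds-size"
  bool IsModuleEntry;     // true for kernels

  uint64_t allocate(uint64_t Size, uint64_t Align,
                    std::optional<uint64_t> AbsoluteAddress) {
    if (AbsoluteAddress) {
      uint64_t Start = *AbsoluteAddress;
      if (Start != alignUp(Start, Align))
        throw std::runtime_error("Absolute address LDS variable inconsistent "
                                 "with variable alignment");
      // Kernels can also check that the variable lies inside the frame.
      if (IsModuleEntry && Start + Size > StaticLDSSize)
        throw std::runtime_error(
            "Absolute address LDS variable outside of static frame");
      return Start; // already inside the reserved frame; size unchanged
    }
    // Other variables are bump-allocated after everything placed so far.
    uint64_t Offset = alignUp(StaticLDSSize, Align);
    StaticLDSSize = Offset + Size;
    return Offset;
  }
};
```

Starting from a 23-byte frame, a 4-byte variable at absolute address 16 is accepted without growing the frame, and a non-absolute 8-byte, 8-aligned variable lands at 24. Absolute address 18 is rejected as misaligned, and 20 is rejected because the variable would end at 24, past the frame.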
void AMDGPUMachineFunction::allocateKnownAddressLDSGlobal(const Function &F) {		void AMDGPUMachineFunction::allocateKnownAddressLDSGlobal(const Function &F) {
const Module *M = F.getParent();		const Module *M = F.getParent();
// This function is called before allocating any other LDS so that it can		// This function is called before allocating any other LDS so that it can
// reliably put values at known addresses. Consequently, dynamic LDS, if		// reliably put values at known addresses. Consequently, dynamic LDS, if
// present, will not yet have been allocated		// present, will not yet have been allocated

assert(getDynLDSAlign() == Align() && "dynamic LDS not yet allocated");		assert(getDynLDSAlign() == Align() && "dynamic LDS not yet allocated");

if (isModuleEntryFunction()) {		if (isModuleEntryFunction()) {
		[Inline comment by JonChesterfield (author)] This frame logic will be deleted in a subsequent patch, leaving amdgpu-lds-size as the only lookup. At that point it can probably move to the function constructor, and this allocate-known-address function can be deleted. For now, add a fatal error in case the two accounting methods disagree. Landing this patch (and watching it pass internal CI) provides supporting evidence that the two data representations are equivalent.

// Pointer values start from zero, memory allocated per-kernel-launch		// Pointer values start from zero, memory allocated per-kernel-launch
// Variables can be grouped into a module level struct and a struct per		// Variables can be grouped into a module level struct and a struct per
// kernel function by AMDGPULowerModuleLDSPass. If that is done, they		// kernel function by AMDGPULowerModuleLDSPass. If that is done, they
// are allocated at statically computable addresses here.		// are allocated at statically computable addresses here.
//		//
// Address 0		// Address 0
// {		// {
// llvm.amdgcn.module.lds		// llvm.amdgcn.module.lds
// }		// }
// alignment padding		// alignment padding
// {		// {
// llvm.amdgcn.kernel.some-name.lds		// llvm.amdgcn.kernel.some-name.lds
// }		// }
// other variables, e.g. dynamic lds, allocated after this call		// other variables, e.g. dynamic lds, allocated after this call

const GlobalVariable *GV = M->getNamedGlobal(ModuleLDSName);		const GlobalVariable *GV = M->getNamedGlobal(ModuleLDSName);
const GlobalVariable *KV = getKernelLDSGlobalFromFunction(F);		const GlobalVariable *KV = getKernelLDSGlobalFromFunction(F);

		// Note: When allocateKnownAddressLDSGlobal is removed, the call to
		// setDynLDSAlign will be lost. If the kernel does not use the dynamic LDS
		// variable, nothing will set that alignment. The comments about dynamic
		// variables in the lowering pass need updating accordingly.
const GlobalVariable *Dyn = getKernelDynLDSGlobalFromFunction(F);		const GlobalVariable *Dyn = getKernelDynLDSGlobalFromFunction(F);

if (GV && !canElideModuleLDS(F)) {		if (GV && !canElideModuleLDS(F)) {
unsigned Offset = allocateLDSGlobal(M->getDataLayout(), *GV, Align());		unsigned Offset = allocateLDSGlobal(M->getDataLayout(), *GV, Align());
std::optional<uint32_t> Expect = getLDSAbsoluteAddress(*GV);		std::optional<uint32_t> Expect = getLDSAbsoluteAddress(*GV);
if (!Expect || (Offset != *Expect)) {		if (!Expect || (Offset != *Expect)) {
report_fatal_error("Inconsistent metadata on module LDS variable");		report_fatal_error("Inconsistent metadata on module LDS variable");
}		}
[unchanged lines elided; context: if (Dyn) {]
// at the same address.		// at the same address.
setDynLDSAlign(F, *Dyn);		setDynLDSAlign(F, *Dyn);
unsigned Offset = LDSSize;		unsigned Offset = LDSSize;
std::optional<uint32_t> Expect = getLDSAbsoluteAddress(*Dyn);		std::optional<uint32_t> Expect = getLDSAbsoluteAddress(*Dyn);
if (!Expect || (Offset != *Expect)) {		if (!Expect || (Offset != *Expect)) {
report_fatal_error("Inconsistent metadata on dynamic LDS variable");		report_fatal_error("Inconsistent metadata on dynamic LDS variable");
}		}
}		}

		uint32_t AttrSize = F.getFnAttributeAsParsedInteger("amdgpu-lds-size", 0);
		if (AttrSize != LDSSize) {
		[Inline comment by arsenm] Use getFnAttributeAsParsedInteger.
		report_fatal_error("Inconsistent size metadata on LDS variable");
		}
}		}
}		}

std::optional<uint32_t>		std::optional<uint32_t>
AMDGPUMachineFunction::getLDSKernelIdMetadata(const Function &F) {		AMDGPUMachineFunction::getLDSKernelIdMetadata(const Function &F) {
// TODO: Would be more consistent with the abs symbols to use a range		// TODO: Would be more consistent with the abs symbols to use a range
MDNode *MD = F.getMetadata("llvm.amdgcn.lds.kernel.id");		MDNode *MD = F.getMetadata("llvm.amdgcn.lds.kernel.id");
if (MD && MD->getNumOperands() == 1) {		if (MD && MD->getNumOperands() == 1) {
[unchanged lines elided]
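allocateKnownAddressLDSGlobal now cross-checks two accountings of the same frame: the layout derived in the backend (module struct, kernel struct, then the dynamic variable) against the addresses recorded as absolute metadata and against the "amdgpu-lds-size" value. The sketch below models the intent of those checks rather than the backend's exact control flow. FramePiece and checkFrame are hypothetical, exceptions replace report_fatal_error, and the kernel-struct message is modeled on the module and dynamic ones.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Round Value up to a multiple of Align (stand-in for llvm::alignTo).
uint64_t alignUp(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "Align must be a power of two");
  return (Value + Align - 1) & ~(Align - 1);
}

// Hypothetical summary of one piece of the frame: size, alignment, and the
// address the lowering pass recorded for it (if any).
struct FramePiece {
  uint64_t Size;
  uint64_t Align;
  std::optional<uint64_t> Recorded;
};

// Recompute the frame in the fixed order module struct, kernel struct,
// dynamic variable, and compare with the recorded addresses and attribute.
// Pass std::nullopt for a piece the kernel does not have, e.g. the module
// struct when it can be elided.
void checkFrame(std::optional<FramePiece> Module,
                std::optional<FramePiece> Kernel,
                std::optional<FramePiece> Dynamic, uint64_t AttrSize) {
  uint64_t LDSSize = 0;
  auto Place = [&LDSSize](const FramePiece &P, const std::string &What) {
    uint64_t Offset = alignUp(LDSSize, P.Align);
    if (!P.Recorded || *P.Recorded != Offset)
      throw std::runtime_error("Inconsistent metadata on " + What);
    LDSSize = Offset + P.Size;
  };
  if (Module)
    Place(*Module, "module LDS variable");
  if (Kernel)
    Place(*Kernel, "kernel LDS variable");
  if (Dynamic)
    Place(*Dynamic, "dynamic LDS variable"); // zero-sized: only alignment
  if (AttrSize != LDSSize)
    throw std::runtime_error("Inconsistent size metadata on LDS variable");
}
```

For example, k0's 23-byte kernel struct at address 0 passes with an attribute of 23 and fails with 22, and a 9-byte module struct followed by an 8-aligned dynamic variable recorded at 16 passes with 16.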

llvm/test/CodeGen/AMDGPU/lower-kernel-and-module-lds.ll

[unchanged lines elided]
	; CHECK: @llvm.amdgcn.module.lds = internal addrspace(3) global %llvm.amdgcn.module.lds.t undef, align 8, !absolute_symbol !0			; CHECK: @llvm.amdgcn.module.lds = internal addrspace(3) global %llvm.amdgcn.module.lds.t undef, align 8, !absolute_symbol !0
	; CHECK: @llvm.compiler.used = appending global [1 x ptr] [ptr addrspacecast (ptr addrspace(3) @llvm.amdgcn.module.lds to ptr)], section "llvm.metadata"			; CHECK: @llvm.compiler.used = appending global [1 x ptr] [ptr addrspacecast (ptr addrspace(3) @llvm.amdgcn.module.lds to ptr)], section "llvm.metadata"
	; CHECK: @llvm.amdgcn.kernel.k0.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k0.lds.t undef, align 16, !absolute_symbol !0			; CHECK: @llvm.amdgcn.kernel.k0.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k0.lds.t undef, align 16, !absolute_symbol !0
	; CHECK: @llvm.amdgcn.kernel.k1.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k1.lds.t undef, align 16, !absolute_symbol !0			; CHECK: @llvm.amdgcn.kernel.k1.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k1.lds.t undef, align 16, !absolute_symbol !0
	; CHECK: @llvm.amdgcn.kernel.k2.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k2.lds.t undef, align 2, !absolute_symbol !0			; CHECK: @llvm.amdgcn.kernel.k2.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k2.lds.t undef, align 2, !absolute_symbol !0
	; CHECK: @llvm.amdgcn.kernel.k3.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k3.lds.t undef, align 4, !absolute_symbol !0			; CHECK: @llvm.amdgcn.kernel.k3.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k3.lds.t undef, align 4, !absolute_symbol !0
	;.			;.
	define amdgpu_kernel void @k0() #0 {			define amdgpu_kernel void @k0() #0 {
	; CHECK-LABEL: @k0(			; CHECK-LABEL: @k0() #0
	; CHECK-NEXT: store i8 1, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 3), align 2, !alias.scope !1, !noalias !4			; CHECK-NEXT: store i8 1, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 3), align 2, !alias.scope !1, !noalias !4
	; CHECK-NEXT: store i8 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 2), align 4, !alias.scope !8, !noalias !9			; CHECK-NEXT: store i8 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 2), align 4, !alias.scope !8, !noalias !9
	; CHECK-NEXT: store i8 4, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 1), align 16, !alias.scope !10, !noalias !11			; CHECK-NEXT: store i8 4, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 1), align 16, !alias.scope !10, !noalias !11
	; CHECK-NEXT: store i8 16, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, align 16, !alias.scope !12, !noalias !13			; CHECK-NEXT: store i8 16, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, align 16, !alias.scope !12, !noalias !13
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	store i8 1, ptr addrspace(3) @lds.size.1.align.1, align 1			store i8 1, ptr addrspace(3) @lds.size.1.align.1, align 1

	store i8 2, ptr addrspace(3) @lds.size.2.align.2, align 2			store i8 2, ptr addrspace(3) @lds.size.2.align.2, align 2

	store i8 4, ptr addrspace(3) @lds.size.4.align.4, align 4			store i8 4, ptr addrspace(3) @lds.size.4.align.4, align 4

	store i8 16, ptr addrspace(3) @lds.size.16.align.16, align 16			store i8 16, ptr addrspace(3) @lds.size.16.align.16, align 16

	ret void			ret void
	}			}

	define amdgpu_kernel void @k1() #0 {			define amdgpu_kernel void @k1() #0 {
	; CHECK-LABEL: @k1(			; CHECK-LABEL: @k1() #1
	; CHECK-NEXT: store i8 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, i32 0, i32 2), align 4, !alias.scope !14, !noalias !17			; CHECK-NEXT: store i8 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, i32 0, i32 2), align 4, !alias.scope !14, !noalias !17
	; CHECK-NEXT: store i8 4, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, i32 0, i32 1), align 16, !alias.scope !20, !noalias !21			; CHECK-NEXT: store i8 4, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, i32 0, i32 1), align 16, !alias.scope !20, !noalias !21
	; CHECK-NEXT: store i8 16, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, align 16, !alias.scope !22, !noalias !23			; CHECK-NEXT: store i8 16, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, align 16, !alias.scope !22, !noalias !23
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	store i8 2, ptr addrspace(3) @lds.size.2.align.2, align 2			store i8 2, ptr addrspace(3) @lds.size.2.align.2, align 2

	store i8 4, ptr addrspace(3) @lds.size.4.align.4, align 4			store i8 4, ptr addrspace(3) @lds.size.4.align.4, align 4

	store i8 16, ptr addrspace(3) @lds.size.16.align.16, align 16			store i8 16, ptr addrspace(3) @lds.size.16.align.16, align 16

	ret void			ret void
	}			}

	define amdgpu_kernel void @k2() #0 {			define amdgpu_kernel void @k2() #0 {
	; CHECK-LABEL: @k2(			; CHECK-LABEL: @k2() #2
	; CHECK-NEXT: store i8 2, ptr addrspace(3) @llvm.amdgcn.kernel.k2.lds, align 2			; CHECK-NEXT: store i8 2, ptr addrspace(3) @llvm.amdgcn.kernel.k2.lds, align 2
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	store i8 2, ptr addrspace(3) @lds.size.2.align.2, align 2			store i8 2, ptr addrspace(3) @lds.size.2.align.2, align 2

	ret void			ret void
	}			}

	define amdgpu_kernel void @k3() #0 {			define amdgpu_kernel void @k3() #0 {
	; CHECK-LABEL: @k3(			; CHECK-LABEL: @k3() #3
	; CHECK-NEXT: store i8 4, ptr addrspace(3) @llvm.amdgcn.kernel.k3.lds, align 4			; CHECK-NEXT: store i8 4, ptr addrspace(3) @llvm.amdgcn.kernel.k3.lds, align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	store i8 4, ptr addrspace(3) @lds.size.4.align.4, align 4			store i8 4, ptr addrspace(3) @lds.size.4.align.4, align 4

	ret void			ret void
	}			}

				; CHECK-LABEL: @calls_f0() #4
	define amdgpu_kernel void @calls_f0() {			define amdgpu_kernel void @calls_f0() {
	call void @f0()			call void @f0()
	ret void			ret void
	}			}

	define void @f0() {			define void @f0() {
	; CHECK-LABEL: define void @f0(			; CHECK-LABEL: define void @f0()
	; CHECK-NEXT: store i8 1, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.module.lds.t, ptr addrspace(3) @llvm.amdgcn.module.lds, i32 0, i32 1), align 8, !noalias !24			; CHECK-NEXT: store i8 1, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.module.lds.t, ptr addrspace(3) @llvm.amdgcn.module.lds, i32 0, i32 1), align 8, !noalias !24
	; CHECK-NEXT: store i8 8, ptr addrspace(3) @llvm.amdgcn.module.lds, align 8, !noalias !24			; CHECK-NEXT: store i8 8, ptr addrspace(3) @llvm.amdgcn.module.lds, align 8, !noalias !24
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	store i8 1, ptr addrspace(3) @lds.size.1.align.1, align 1			store i8 1, ptr addrspace(3) @lds.size.1.align.1, align 1

	store i8 8, ptr addrspace(3) @lds.size.8.align.8, align 4			store i8 8, ptr addrspace(3) @lds.size.8.align.8, align 4

	ret void			ret void
	}			}

	attributes #0 = { "amdgpu-elide-module-lds" }			; CHECK: attributes #0 = { "amdgpu-elide-module-lds" "amdgpu-lds-size"="23" }
	; CHECK: attributes #0 = { "amdgpu-elide-module-lds" }			; CHECK: attributes #1 = { "amdgpu-elide-module-lds" "amdgpu-lds-size"="22" }
				; CHECK: attributes #2 = { "amdgpu-elide-module-lds" "amdgpu-lds-size"="2" }
				; CHECK: attributes #3 = { "amdgpu-elide-module-lds" "amdgpu-lds-size"="4" }
				; CHECK: attributes #4 = { "amdgpu-lds-size"="9" }

	; CHECK: !0 = !{i64 0, i64 1}			; CHECK: !0 = !{i64 0, i64 1}
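The new attribute values in the CHECK lines above follow from placing each variable at the previous end rounded up to its alignment. calls_f0 has no "amdgpu-elide-module-lds", so its frame is the module struct it reaches through f0. Each interval below is the byte range [start, end) of one variable, subscripted with its size/alignment in bytes:

```latex
\begin{aligned}
\texttt{k1}:\quad & [0,16)_{16/16}\;[16,20)_{4/4}\;[20,22)_{2/2} && \Rightarrow 22\\
\texttt{k2}:\quad & [0,2)_{2/2} && \Rightarrow 2\\
\texttt{k3}:\quad & [0,4)_{4/4} && \Rightarrow 4\\
\texttt{calls\_f0}:\quad & [0,8)_{8/8}\;[8,9)_{1/1} && \Rightarrow 9
\end{aligned}
```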

llvm/test/CodeGen/AMDGPU/lower-module-lds-all-indirect-accesses.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
	; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=hybrid < %s | FileCheck %s			; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=hybrid < %s | FileCheck %s

	;; Reduced from a larger test case. Checks that functions and kernels that use only dynamic lds			;; Reduced from a larger test case. Checks that functions and kernels that use only dynamic lds
	;; are lowered successfully. Previously they only worked if the kernel happened to also use static lds			;; are lowered successfully. Previously they only worked if the kernel happened to also use static lds
	;; variables. Artefact of implementing dynamic variables by adapting existing code for static.			;; variables. Artefact of implementing dynamic variables by adapting existing code for static.

	@A = external addrspace(3) global [8 x ptr]			@A = external addrspace(3) global [8 x ptr]
	@B = external addrspace(3) global [0 x i32]			@B = external addrspace(3) global [0 x i32]

	define amdgpu_kernel void @kernel_0() {			define amdgpu_kernel void @kernel_0() {
	; CHECK-LABEL: define amdgpu_kernel void @kernel_0() !llvm.amdgcn.lds.kernel.id !1 {			; CHECK-LABEL: define amdgpu_kernel void @kernel_0() #0 !llvm.amdgcn.lds.kernel.id !1 {
	; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.kernel_0.lds) ]			; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.kernel_0.lds) ]
	; CHECK-NEXT: call void @call_store_A()			; CHECK-NEXT: call void @call_store_A()
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	call void @call_store_A()			call void @call_store_A()
	ret void			ret void
	}			}

	define amdgpu_kernel void @kernel_1() {			define amdgpu_kernel void @kernel_1() {
	; CHECK-LABEL: define amdgpu_kernel void @kernel_1() !llvm.amdgcn.lds.kernel.id !2 {			; CHECK-LABEL: define amdgpu_kernel void @kernel_1() !llvm.amdgcn.lds.kernel.id !2 {
	; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel_1.dynlds) ]			; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel_1.dynlds) ]
	; CHECK-NEXT: [[PTR:%.*]] = call ptr @get_B_ptr()			; CHECK-NEXT: [[PTR:%.*]] = call ptr @get_B_ptr()
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%ptr = call ptr @get_B_ptr()			%ptr = call ptr @get_B_ptr()
	ret void			ret void
	}			}

	define amdgpu_kernel void @kernel_2() {			define amdgpu_kernel void @kernel_2() {
	; CHECK-LABEL: define amdgpu_kernel void @kernel_2() !llvm.amdgcn.lds.kernel.id !3 {			; CHECK-LABEL: define amdgpu_kernel void @kernel_2() #0 !llvm.amdgcn.lds.kernel.id !3 {
	; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.kernel_2.lds) ]			; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.kernel_2.lds) ]
	; CHECK-NEXT: call void @store_A()			; CHECK-NEXT: call void @store_A()
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	call void @store_A()			call void @store_A()
	ret void			ret void
	}			}

[unchanged lines elided]
	; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds [4 x i32], ptr addrspace(4) @llvm.amdgcn.dynlds.offset.table, i32 0, i32 [[TMP1]]			; CHECK-NEXT: [[B:%.*]] = getelementptr inbounds [4 x i32], ptr addrspace(4) @llvm.amdgcn.dynlds.offset.table, i32 0, i32 [[TMP1]]
	; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr addrspace(4) [[B]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr addrspace(4) [[B]], align 4
	; CHECK-NEXT: [[B1:%.*]] = inttoptr i32 [[TMP2]] to ptr addrspace(3)			; CHECK-NEXT: [[B1:%.*]] = inttoptr i32 [[TMP2]] to ptr addrspace(3)
	; CHECK-NEXT: [[TMP3:%.*]] = addrspacecast ptr addrspace(3) [[B1]] to ptr			; CHECK-NEXT: [[TMP3:%.*]] = addrspacecast ptr addrspace(3) [[B1]] to ptr
	; CHECK-NEXT: ret ptr [[TMP3]]			; CHECK-NEXT: ret ptr [[TMP3]]
	;			;
	ret ptr addrspacecast (ptr addrspace(3) @B to ptr)			ret ptr addrspacecast (ptr addrspace(3) @B to ptr)
	}			}

				; CHECK: attributes #0 = { "amdgpu-lds-size"="64" }

llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll

[unchanged lines elided]
	; CHECK: store i32 %x, ptr %5, align 4			; CHECK: store i32 %x, ptr %5, align 4
	; CHECK: ret void			; CHECK: ret void
	define void @set_func(i32 %x) local_unnamed_addr #1 {			define void @set_func(i32 %x) local_unnamed_addr #1 {
	entry:			entry:
	store i32 %x, ptr inttoptr (i64 add (i64 ptrtoint (ptr addrspacecast (ptr addrspace(3) @b_both to ptr) to i64), i64 ptrtoint (ptr addrspacecast (ptr addrspace(3) @b_both to ptr) to i64)) to ptr), align 4			store i32 %x, ptr inttoptr (i64 add (i64 ptrtoint (ptr addrspacecast (ptr addrspace(3) @b_both to ptr) to i64), i64 ptrtoint (ptr addrspacecast (ptr addrspace(3) @b_both to ptr) to i64)) to ptr), align 4
	ret void			ret void
	}			}

	; CHECK-LABEL: @timestwo() #0			; CHECK-LABEL: @timestwo() #1
	; CHECK-NOT: call void @llvm.donothing()			; CHECK-NOT: call void @llvm.donothing()

	; CHECK: %1 = addrspacecast ptr addrspace(3) @llvm.amdgcn.kernel.timestwo.lds to ptr			; CHECK: %1 = addrspacecast ptr addrspace(3) @llvm.amdgcn.kernel.timestwo.lds to ptr
	; CHECK: %2 = ptrtoint ptr %1 to i64			; CHECK: %2 = ptrtoint ptr %1 to i64
	; CHECK: %3 = addrspacecast ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.timestwo.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.timestwo.lds, i32 0, i32 1) to ptr			; CHECK: %3 = addrspacecast ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.timestwo.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.timestwo.lds, i32 0, i32 1) to ptr
	; CHECK: %4 = ptrtoint ptr %3 to i64			; CHECK: %4 = ptrtoint ptr %3 to i64
	; CHECK: %5 = add i64 %2, %4			; CHECK: %5 = add i64 %2, %4
	; CHECK: %6 = inttoptr i64 %5 to ptr			; CHECK: %6 = inttoptr i64 %5 to ptr
	; CHECK: %ld = load i32, ptr %6, align 4			; CHECK: %ld = load i32, ptr %6, align 4
	; CHECK: %mul = mul i32 %ld, 2			; CHECK: %mul = mul i32 %ld, 2
	; CHECK: %7 = addrspacecast ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.timestwo.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.timestwo.lds, i32 0, i32 1) to ptr			; CHECK: %7 = addrspacecast ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.timestwo.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.timestwo.lds, i32 0, i32 1) to ptr
	; CHECK: %8 = ptrtoint ptr %7 to i64			; CHECK: %8 = ptrtoint ptr %7 to i64
	; CHECK: %9 = addrspacecast ptr addrspace(3) @llvm.amdgcn.kernel.timestwo.lds to ptr			; CHECK: %9 = addrspacecast ptr addrspace(3) @llvm.amdgcn.kernel.timestwo.lds to ptr
	; CHECK: %10 = ptrtoint ptr %9 to i64			; CHECK: %10 = ptrtoint ptr %9 to i64
	; CHECK: %11 = add i64 %8, %10			; CHECK: %11 = add i64 %8, %10
	; CHECK: %12 = inttoptr i64 %11 to ptr			; CHECK: %12 = inttoptr i64 %11 to ptr
	; CHECK: store i32 %mul, ptr %12, align 4			; CHECK: store i32 %mul, ptr %12, align 4
	; CHECK: ret void			; CHECK: ret void
	define amdgpu_kernel void @timestwo() {			define amdgpu_kernel void @timestwo() #1 {
	%ld = load i32, ptr inttoptr (i64 add (i64 ptrtoint (ptr addrspacecast (ptr addrspace(3) @b_both to ptr) to i64), i64 ptrtoint (ptr addrspacecast (ptr addrspace(3) @kern to ptr) to i64)) to ptr), align 4			%ld = load i32, ptr inttoptr (i64 add (i64 ptrtoint (ptr addrspacecast (ptr addrspace(3) @b_both to ptr) to i64), i64 ptrtoint (ptr addrspacecast (ptr addrspace(3) @kern to ptr) to i64)) to ptr), align 4
	%mul = mul i32 %ld, 2			%mul = mul i32 %ld, 2
	store i32 %mul, ptr inttoptr (i64 add (i64 ptrtoint (ptr addrspacecast (ptr addrspace(3) @kern to ptr) to i64), i64 ptrtoint (ptr addrspacecast (ptr addrspace(3) @b_both to ptr) to i64)) to ptr), align 4			store i32 %mul, ptr inttoptr (i64 add (i64 ptrtoint (ptr addrspacecast (ptr addrspace(3) @kern to ptr) to i64), i64 ptrtoint (ptr addrspacecast (ptr addrspace(3) @b_both to ptr) to i64)) to ptr), align 4
	ret void			ret void
	}			}

	; CHECK-LABEL: @through_functions()			; CHECK-LABEL: @through_functions() #2
	define amdgpu_kernel void @through_functions() {			define amdgpu_kernel void @through_functions() {
	%ld = call i32 @get_func()			%ld = call i32 @get_func()
	%mul = mul i32 %ld, 4			%mul = mul i32 %ld, 4
	call void @set_func(i32 %mul)			call void @set_func(i32 %mul)
	ret void			ret void
	}			}

	attributes #0 = { "amdgpu-elide-module-lds" }			attributes #0 = { "amdgpu-elide-module-lds" }
	; CHECK: attributes #0 = { "amdgpu-elide-module-lds" }			; CHECK: attributes #0 = { "amdgpu-elide-module-lds" }
				; CHECK: attributes #1 = { "amdgpu-elide-module-lds" "amdgpu-lds-size"="8" }
				; CHECK: attributes #2 = { "amdgpu-lds-size"="8" }

llvm/test/CodeGen/AMDGPU/lower-module-lds-indirect-extern-uses-max-reachable-alignment.ll

[unchanged lines elided]
; CHECK-NEXT: call void @use_shared2()		; CHECK-NEXT: call void @use_shared2()
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
call void @use_shared2()		call void @use_shared2()
ret void		ret void
}		}

define amdgpu_kernel void @expect_align4() {		define amdgpu_kernel void @expect_align4() {
; CHECK-LABEL: @expect_align4() !llvm.amdgcn.lds.kernel.id !4 {		; CHECK-LABEL: @expect_align4() #2 !llvm.amdgcn.lds.kernel.id !4 {
; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.expect_align4.dynlds) ]		; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.expect_align4.dynlds) ]
; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]		; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
; CHECK-NEXT: call void @use_shared4()		; CHECK-NEXT: call void @use_shared4()
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
call void @use_shared4()		call void @use_shared4()
ret void		ret void
}		}
%arrayidx = getelementptr inbounds [0 x i64], ptr addrspace(3) @dynamic_shared8, i32 0, i32 9		%arrayidx = getelementptr inbounds [0 x i64], ptr addrspace(3) @dynamic_shared8, i32 0, i32 9
store i64 3, ptr addrspace(3) %arrayidx		store i64 3, ptr addrspace(3) %arrayidx
call void @use_shared8()		call void @use_shared8()
ret void		ret void
}		}

; Note: use_shared4 uses module.lds so this will allocate at offset 4		; Note: use_shared4 uses module.lds so this will allocate at offset 4
define amdgpu_kernel void @expect_max_of_2_and_4() {		define amdgpu_kernel void @expect_max_of_2_and_4() {
; CHECK-LABEL: @expect_max_of_2_and_4() !llvm.amdgcn.lds.kernel.id !6 {		; CHECK-LABEL: @expect_max_of_2_and_4() #2 !llvm.amdgcn.lds.kernel.id !6 {
; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.expect_max_of_2_and_4.dynlds) ]		; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.expect_max_of_2_and_4.dynlds) ]
; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]		; CHECK-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
; CHECK-NEXT: call void @use_shared2()		; CHECK-NEXT: call void @use_shared2()
; CHECK-NEXT: call void @use_shared4()		; CHECK-NEXT: call void @use_shared4()
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
call void @use_shared2()		call void @use_shared2()
call void @use_shared4()		call void @use_shared4()
ret void		ret void
}		}


attributes #0 = { noinline }		attributes #0 = { noinline }

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)		; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
; CHECK: declare void @llvm.donothing() #2		; CHECK: declare void @llvm.donothing() #3

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)		; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
; CHECK: declare i32 @llvm.amdgcn.lds.kernel.id() #3		; CHECK: declare i32 @llvm.amdgcn.lds.kernel.id() #4

; CHECK: attributes #0 = { "amdgpu-elide-module-lds" }		; CHECK: attributes #0 = { "amdgpu-elide-module-lds" }
; CHECK: attributes #1 = { noinline }		; CHECK: attributes #1 = { noinline }
; CHECK: attributes #2 = { nocallback nofree nosync nounwind willreturn memory(none) }		; CHECK: attributes #2 = { "amdgpu-lds-size"="4" }
; CHECK: attributes #3 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }		; CHECK: attributes #3 = { nocallback nofree nosync nounwind willreturn memory(none) }
		; CHECK: attributes #4 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }

; CHECK: !0 = !{i64 0, i64 1}		; CHECK: !0 = !{i64 0, i64 1}
; CHECK: !1 = !{i64 4, i64 5}		; CHECK: !1 = !{i64 4, i64 5}
; CHECK: !2 = !{i32 0}		; CHECK: !2 = !{i32 0}
; CHECK: !3 = !{i32 1}		; CHECK: !3 = !{i32 1}
; CHECK: !4 = !{i32 2}		; CHECK: !4 = !{i32 2}
; CHECK: !5 = !{i32 3}		; CHECK: !5 = !{i32 3}
; CHECK: !6 = !{i32 4}		; CHECK: !6 = !{i32 4}

llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll


	; OPT: declare i32 @llvm.amdgcn.lds.kernel.id()			; OPT: declare i32 @llvm.amdgcn.lds.kernel.id()

	!0 = !{i32 0}			!0 = !{i32 0}
	!1 = !{i32 2}			!1 = !{i32 2}
	!2 = !{i32 1}			!2 = !{i32 1}


	; OPT: attributes #0 = { "amdgpu-elide-module-lds" }			; OPT: attributes #0 = { "amdgpu-elide-module-lds" "amdgpu-lds-size"="8" }
	; OPT: attributes #1 = { nocallback nofree nosync nounwind willreturn memory(none) }			; OPT: attributes #1 = { "amdgpu-lds-size"="8" }
	; OPT: attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }			; OPT: attributes #2 = { "amdgpu-elide-module-lds" "amdgpu-lds-size"="12" }
				; OPT: attributes #3 = { "amdgpu-lds-size"="20" }
				; OPT: attributes #4 = { nocallback nofree nosync nounwind willreturn memory(none) }
				; OPT: attributes #5 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }

	; OPT: !0 = !{i64 0, i64 1}			; OPT: !0 = !{i64 0, i64 1}
	; OPT: !1 = !{i64 4, i64 5}			; OPT: !1 = !{i64 4, i64 5}
	; OPT: !2 = !{i64 8, i64 9}			; OPT: !2 = !{i64 8, i64 9}
	; OPT: !3 = !{i32 1}			; OPT: !3 = !{i32 1}
	; OPT: !4 = !{!5}			; OPT: !4 = !{!5}
	; OPT: !5 = distinct !{!5, !6}			; OPT: !5 = distinct !{!5, !6}
	; OPT: !6 = distinct !{!6}			; OPT: !6 = distinct !{!6}

llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll

; GCN-NEXT: s_setpc_b64 s[30:31]
%ld = load i8, ptr addrspace(3) @v3		%ld = load i8, ptr addrspace(3) @v3
%mul = mul i8 %ld, 5		%mul = mul i8 %ld, 5
store i8 %mul, ptr addrspace(3) @v3		store i8 %mul, ptr addrspace(3) @v3
ret void		ret void
}		}

; Doesn't access any via a function, won't be in the lookup table		; Doesn't access any via a function, won't be in the lookup table
define amdgpu_kernel void @kernel_no_table() {		define amdgpu_kernel void @kernel_no_table() {
; OPT-LABEL: @kernel_no_table() {		; OPT-LABEL: @kernel_no_table() #0 {
; OPT-NEXT: [[LD:%.*]] = load i64, ptr addrspace(3) @llvm.amdgcn.kernel.kernel_no_table.lds, align 8		; OPT-NEXT: [[LD:%.*]] = load i64, ptr addrspace(3) @llvm.amdgcn.kernel.kernel_no_table.lds, align 8
; OPT-NEXT: [[MUL:%.*]] = mul i64 [[LD]], 8		; OPT-NEXT: [[MUL:%.*]] = mul i64 [[LD]], 8
; OPT-NEXT: store i64 [[MUL]], ptr addrspace(3) @llvm.amdgcn.kernel.kernel_no_table.lds, align 8		; OPT-NEXT: store i64 [[MUL]], ptr addrspace(3) @llvm.amdgcn.kernel.kernel_no_table.lds, align 8
; OPT-NEXT: ret void		; OPT-NEXT: ret void
;		;
; GCN-LABEL: kernel_no_table:		; GCN-LABEL: kernel_no_table:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: v_mov_b32_e32 v2, 0		; GCN-NEXT: v_mov_b32_e32 v2, 0
; GCN-NEXT: s_mov_b32 m0, -1		; GCN-NEXT: s_mov_b32 m0, -1
; GCN-NEXT: ds_read_b64 v[0:1], v2		; GCN-NEXT: ds_read_b64 v[0:1], v2
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: v_lshl_b64 v[0:1], v[0:1], 3		; GCN-NEXT: v_lshl_b64 v[0:1], v[0:1], 3
; GCN-NEXT: ds_write_b64 v2, v[0:1]		; GCN-NEXT: ds_write_b64 v2, v[0:1]
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
%ld = load i64, ptr addrspace(3) @v2		%ld = load i64, ptr addrspace(3) @v2
%mul = mul i64 %ld, 8		%mul = mul i64 %ld, 8
store i64 %mul, ptr addrspace(3) @v2		store i64 %mul, ptr addrspace(3) @v2
ret void		ret void
}		}

; Access two variables, will allocate those two		; Access two variables, will allocate those two
define amdgpu_kernel void @k01() {		define amdgpu_kernel void @k01() {
; OPT-LABEL: @k01() !llvm.amdgcn.lds.kernel.id !1 {		; OPT-LABEL: @k01() #0 !llvm.amdgcn.lds.kernel.id !1 {
; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k01.lds) ]		; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k01.lds) ]
; OPT-NEXT: call void @f0()		; OPT-NEXT: call void @f0()
; OPT-NEXT: call void @f1()		; OPT-NEXT: call void @f1()
; OPT-NEXT: ret void		; OPT-NEXT: ret void
;		;
; GCN-LABEL: k01:		; GCN-LABEL: k01:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_mov_b32 s32, 0		; GCN-NEXT: s_mov_b32 s32, 0
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
; GCN: .amdhsa_group_segment_fixed_size 8		; GCN: .amdhsa_group_segment_fixed_size 8
call void @f0()		call void @f0()
call void @f1()		call void @f1()
ret void		ret void
}		}

define amdgpu_kernel void @k23() {		define amdgpu_kernel void @k23() {
; OPT-LABEL: @k23() !llvm.amdgcn.lds.kernel.id !7 {		; OPT-LABEL: @k23() #1 !llvm.amdgcn.lds.kernel.id !7 {
; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k23.lds) ]		; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k23.lds) ]
; OPT-NEXT: call void @f2()		; OPT-NEXT: call void @f2()
; OPT-NEXT: call void @f3()		; OPT-NEXT: call void @f3()
; OPT-NEXT: ret void		; OPT-NEXT: ret void
;		;
; GCN-LABEL: k23:		; GCN-LABEL: k23:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_mov_b32 s32, 0		; GCN-NEXT: s_mov_b32 s32, 0
; GCN: .amdhsa_group_segment_fixed_size 16		; GCN: .amdhsa_group_segment_fixed_size 16
call void @f2()		call void @f2()
call void @f3()		call void @f3()
ret void		ret void
}		}

; Access and allocate three variables		; Access and allocate three variables
define amdgpu_kernel void @k123() {		define amdgpu_kernel void @k123() {
; OPT-LABEL: @k123() !llvm.amdgcn.lds.kernel.id !13 {		; OPT-LABEL: @k123() #2 !llvm.amdgcn.lds.kernel.id !13 {
; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds) ]		; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds) ]
; OPT-NEXT: call void @f1()		; OPT-NEXT: call void @f1()
; OPT-NEXT: [[LD:%.*]] = load i8, ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_KERNEL_K123_LDS_T:%.*]], ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 2, !alias.scope !20, !noalias !21		; OPT-NEXT: [[LD:%.*]] = load i8, ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_KERNEL_K123_LDS_T:%.*]], ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 2, !alias.scope !20, !noalias !21
; OPT-NEXT: [[MUL:%.*]] = mul i8 [[LD]], 8		; OPT-NEXT: [[MUL:%.*]] = mul i8 [[LD]], 8
; OPT-NEXT: store i8 [[MUL]], ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_KERNEL_K123_LDS_T]], ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 2, !alias.scope !20, !noalias !21		; OPT-NEXT: store i8 [[MUL]], ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_KERNEL_K123_LDS_T]], ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 2, !alias.scope !20, !noalias !21
; OPT-NEXT: call void @f2()		; OPT-NEXT: call void @f2()
; OPT-NEXT: ret void		; OPT-NEXT: ret void
;		;
; GCN: .amdhsa_group_segment_fixed_size 16
store i8 %mul, ptr addrspace(3) @v3		store i8 %mul, ptr addrspace(3) @v3
call void @f2()		call void @f2()
ret void		ret void
}		}


; OPT: declare i32 @llvm.amdgcn.lds.kernel.id()		; OPT: declare i32 @llvm.amdgcn.lds.kernel.id()

		; OPT: attributes #0 = { "amdgpu-lds-size"="8" }
		; OPT: attributes #1 = { "amdgpu-lds-size"="12" }
		; OPT: attributes #2 = { "amdgpu-lds-size"="16" }

!0 = !{i64 0, i64 1}		!0 = !{i64 0, i64 1}
!1 = !{i32 0}		!1 = !{i32 0}
!2 = !{i32 2}		!2 = !{i32 2}
!3 = !{i32 1}		!3 = !{i32 1}


; Table size length number-kernels * number-variables * sizeof(uint16_t)		; Table size length number-kernels * number-variables * sizeof(uint16_t)
; GCN: .type llvm.amdgcn.lds.offset.table,@object		; GCN: .type llvm.amdgcn.lds.offset.table,@object
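The test comment above states the size of `llvm.amdgcn.lds.offset.table` as number-kernels * number-variables * sizeof(uint16_t), i.e. one 16-bit offset per (kernel, variable) pair. A minimal Python sketch of that arithmetic, assuming exactly that layout; the helper name is hypothetical and the counts in the usage line are illustrative, not taken from the test:

```python
# Hypothetical helper illustrating the table-size formula from the test comment:
#   number-kernels * number-variables * sizeof(uint16_t)
UINT16_SIZE = 2  # sizeof(uint16_t) in bytes


def lds_offset_table_bytes(num_kernels: int, num_variables: int) -> int:
    """Bytes needed for one uint16_t LDS offset per (kernel, variable) pair."""
    if num_kernels < 0 or num_variables < 0:
        raise ValueError("counts must be non-negative")
    return num_kernels * num_variables * UINT16_SIZE


# Illustrative counts: 3 kernels in the table, 4 variables
print(lds_offset_table_bytes(3, 4))  # 24
```

Kernels that reach no LDS variable through a called function (such as `kernel_no_table` in this test) are not given a row, so only kernels with a `!llvm.amdgcn.lds.kernel.id` would count toward `num_kernels` under this reading.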

llvm/test/CodeGen/AMDGPU/lower-module-lds.ll

define void @func() {
%val0 = load i32, ptr addrspace(3) @var1, align 4		%val0 = load i32, ptr addrspace(3) @var1, align 4
%val1 = add i32 %val0, 4		%val1 = add i32 %val0, 4
store i32 %val1, ptr addrspace(3) @var1, align 4		store i32 %val1, ptr addrspace(3) @var1, align 4
%unused0 = atomicrmw add ptr addrspace(3) @with_init, i64 1 monotonic		%unused0 = atomicrmw add ptr addrspace(3) @with_init, i64 1 monotonic
ret void		ret void
}		}

; This kernel calls a function that uses LDS so needs the block		; This kernel calls a function that uses LDS so needs the block
; CHECK-LABEL: @kern_call()		; CHECK-LABEL: @kern_call() #0
; CHECK: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]		; CHECK: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
; CHECK: call void @func()		; CHECK: call void @func()
; CHECK: %dec = atomicrmw fsub ptr addrspace(3) @llvm.amdgcn.module.lds, float 2.000000e+00 monotonic, align 8		; CHECK: %dec = atomicrmw fsub ptr addrspace(3) @llvm.amdgcn.module.lds, float 2.000000e+00 monotonic, align 8
define amdgpu_kernel void @kern_call() {		define amdgpu_kernel void @kern_call() {
call void @func()		call void @func()
%dec = atomicrmw fsub ptr addrspace(3) @var0, float 2.0 monotonic		%dec = atomicrmw fsub ptr addrspace(3) @var0, float 2.0 monotonic
ret void		ret void
}		}

; This kernel does alloc the LDS block as it makes no calls		; This kernel does alloc the LDS block as it makes no calls
; CHECK-LABEL: @kern_empty()		; CHECK-LABEL: @kern_empty() #1
; CHECK-NOT: call void @llvm.donothing()		; CHECK-NOT: call void @llvm.donothing()
define spir_kernel void @kern_empty() #0{		define spir_kernel void @kern_empty() #0{
ret void		ret void
}		}

; Make sure we don't crash trying to insert code into a kernel		; Make sure we don't crash trying to insert code into a kernel
; declaration.		; declaration.
declare amdgpu_kernel void @kernel_declaration()		declare amdgpu_kernel void @kernel_declaration()

attributes #0 = { "amdgpu-elide-module-lds" }		attributes #0 = { "amdgpu-elide-module-lds" }
; CHECK: attributes #0 = { "amdgpu-elide-module-lds" }
		; CHECK: attributes #0 = { "amdgpu-lds-size"="12" }
		; CHECK: attributes #1 = { "amdgpu-elide-module-lds" }
		arsenm: is this one going away?
		JonChesterfield (Author): Yep.

This is an archive of the discontinued LLVM Phabricator instance.

[amdgpu][lds] Introduce LDS frame size function attribute
Abandoned (Public)

Diff 540002

llvm/docs/AMDGPUUsage.rst

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp

llvm/test/CodeGen/AMDGPU/lower-kernel-and-module-lds.ll

llvm/test/CodeGen/AMDGPU/lower-module-lds-all-indirect-accesses.ll

llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll

llvm/test/CodeGen/AMDGPU/lower-module-lds-indirect-extern-uses-max-reachable-alignment.ll

llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll

llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll

llvm/test/CodeGen/AMDGPU/lower-module-lds.ll
