This is an archive of the discontinued LLVM Phabricator instance.

[amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD
ClosedPublic

Authored by JonChesterfield on Feb 16 2023, 2:46 PM.

Details

Reviewers

arsenm
rampitec
foad

Commits

rGd3dda422bfd1: [amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD

Summary

Post ISel, LDS variables are absolute values. Representing them as
such is simpler than the frame recalculation currently used to build assembler
tables from their addresses.

This is a precursor to lowering dynamic/external LDS accesses from non-kernel
functions.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

JonChesterfield created this revision. Feb 16 2023, 2:46 PM

Herald added a project: Restricted Project. Feb 16 2023, 2:46 PM

Herald added subscribers: kosarev, StephenFan, kerbowa and 6 others.

JonChesterfield requested review of this revision. Feb 16 2023, 2:46 PM

Herald added a project: Restricted Project. Feb 16 2023, 2:46 PM

Herald added subscribers: llvm-commits, wdng.

arsenm added inline comments.Feb 16 2023, 3:03 PM

llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
515 ↗	(On Diff #498146)	F.getParent
599 ↗	(On Diff #498146)	Ditto
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
112	Changing the IR in a codegen pass is bad. Can you set this earlier?

JonChesterfield added inline comments.Feb 16 2023, 3:58 PM

llvm/lib/Target/AMDGPU/AMDGPUCallLowering.cpp
515 ↗	(On Diff #498146)	Nope, F.getParent will yield a const Module, need a non-const one in order to get a non-const GlobalVariable in order to call setMetadata on it. Otherwise would just call F.getParent within allocateKnownAddressLDSGlobal.
516 ↗	(On Diff #498146)	Making the function argument non-const would be simpler for the allocateKnownAddress call but makes a mess of some of the call sites
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
112	Setting metadata on the IR during codegen is not pretty but it's also inconsequential. The frame calculation this deletes is also quite ugly though. For the existing cases we can set the metadata in the IR lowering (carefully, but accurately enough). It'll make changes to promote alloca fragile. That won't work for external/dynamic LDS as we don't know that address until the end of ISel. So yes... if we want a hard no on setting metadata to record information learned during codegen, we can avoid it. Basically by lifting the remaining LDS lowering currently done by codegen wholly out of codegen. That'll change how dynamic LDS is handled from kernels which I was hoping to avoid. If we're up for tagging variables with metadata at this point we can write the address chosen for dynamic LDS variables the same way and the lowerConstant path added here will do the right thing for them.

JonChesterfield mentioned this in D144233: [amdgpu] Implement dynamic LDS accesses from non-kernel functions.Feb 16 2023, 3:59 PM

Created D144233 (which was supposed to be a diff relative to this patch but actually has all the line noise in it anyway).
That is not ready to go but shows how function scope external LDS can be layered on the same metadata scheme proposed here. Main difference is that we really don't know that address until codegen time, at least as presently implemented.

Harbormaster completed remote builds in B214256: Diff 498146.Feb 16 2023, 4:54 PM

JonChesterfield added inline comments.Feb 16 2023, 5:00 PM

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
120	Note to self - this shouldn't be dead after this patch - find out what happened to the lowering path for the currently unused "direct" scheme

JonChesterfield added inline comments.Feb 16 2023, 5:42 PM

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
120	I think a block was lost in a git merge - AMDGPUISelLowering should have a block in it like:
	  if (G->getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS) {
	    if (!MFI->isModuleEntryFunction()) {
	      if (const GlobalVariable *GVar = dyn_cast<GlobalVariable>(GV)) {
	        if (AMDGPUMachineFunction::isKnownAddressLDSGlobal(*GVar)) {
	          unsigned Offset =
	              AMDGPUMachineFunction::calculateKnownAddressOfLDSGlobal(*GVar);
	          return DAG.getConstant(Offset, SDLoc(Op), Op.getValueType());
	        }
	      }
	    }
	  }
	that allows direct access to kernel allocated variables from functions where the access is unambiguous. This highlights missing test coverage for the "kernel" lowering strategy (from IR to ISA). Bug is latent but will complicate getting rid of the calculateKnownAddressOfLDSGlobal function.

arsenm added inline comments.Feb 20 2023, 12:59 PM

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
112	So you're implying the promote alloca introduced LDS still has possibly unpredictable offsets assigned for it?

arsenm added inline comments.Feb 20 2023, 1:02 PM

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
120	So if you set the absolute_symbol in the IR pass, and LowerGlobalAddress respected it, how would that be less reliable?

JonChesterfield planned changes to this revision.Feb 23 2023, 6:25 AM

JonChesterfield added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
112	Yep. It's a phase ordering problem. promoteAlloca needs to know how much LDS is already allocated to estimate how much more it can use. Some of that might be allocated for non-kernel functions, so LowerModuleLDS runs before promoteAlloca to make those allocations visible to it. LowerModuleLDS thus doesn't know how much (if any) extra will be introduced by promoteAlloca. At least, it doesn't at present. Then the lowered frame looks like:
	  {
	    managed by lowermodulelds
	    introduced by promotealloca (and maybe other things? none I know of)
	    maybe alignment
	    dynamic lds goes here
	  }
	Also if promotealloca introduces more than one lds variable (haven't checked, seems reasonable that it might) and introduces them as independent values with different alignment, then we're back to allocate-in-dag-traversal order, in which case the address of dynamic lds is unstable.
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
120	Would work identically for the module/kernel specific structures as they are deterministic. Actually somewhat prettier to set those via metadata as it avoids the problem of codegen a function before the corresponding kernel. The above ^ would change to look at metadata on the variables. That will unwind a little if something after lowermodulelds decides to change those structs, but that's manageable. Even if we can't mutate the IR during codegen for dynamic lds, I think that's an objective improvement to the current lowering scheme involving repeated frame calculations.
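
Editorial illustration of the phase-ordering concern in the comment on AMDGPUMachineFunction.cpp line 112 above. This is a minimal standalone sketch, not code from the patch: `LdsVar`, `alignTo` (a local reimplementation) and `dynamicLdsStart` are hypothetical names, and the sizes are invented. It only shows why the start of dynamic LDS can move when independently aligned variables are allocated in a different order after the region LowerModuleLDS laid out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one LDS variable: size and alignment in bytes.
struct LdsVar {
  uint32_t Size;
  uint32_t Align; // must be a power of two
};

// Round Offset up to the next multiple of Align (a power of two).
inline uint32_t alignTo(uint32_t Offset, uint32_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  return (Offset + Align - 1) & ~(Align - 1);
}

// Allocate the extra variables in the order given, starting right after the
// region of ManagedSize bytes that was laid out ahead of time, and return
// the offset at which dynamic LDS would begin.
inline uint32_t dynamicLdsStart(uint32_t ManagedSize,
                                const std::vector<LdsVar> &Extra,
                                uint32_t DynAlign) {
  uint32_t Offset = ManagedSize;
  for (const LdsVar &V : Extra)
    Offset = alignTo(Offset, V.Align) + V.Size;
  return alignTo(Offset, DynAlign);
}
```

With a 16-byte managed region, allocating a 4-byte/align-4 variable before an 8-byte/align-16 variable puts dynamic LDS (align 8) at 40, while the reverse order puts it at 32. The same set of variables therefore yields a different dynamic LDS address depending on traversal order.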

JonChesterfield mentioned this in D141852: [amdgpu] Change LDS lowering default to hybrid.Feb 24 2023, 7:27 AM

rebase

Harbormaster completed remote builds in B215761: Diff 500201.Feb 24 2023, 1:05 PM

Set the metadata in lowermodulelds - test noise update still todo

JonChesterfield edited the summary of this revision. Feb 25 2023, 2:03 AM

factor out the md parse

drop one function, move a second out of the header

This is still a NFC in terms of machine code but introducing the metadata during LDS lowering introduces a lot of churn into the tests, leaving updating those for a later patch.

Strategy of emitting metadata in the IR pass and then checking allocation matches it removes the frame calculation functions from AMDGPUMachineFunction and reduces (but doesn't quite eliminate) the function/struct symbol name string effort. It's probably possible to tag functions with the struct directly (using more metadata) to remove that entirely.
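
Editorial illustration of the record-then-check strategy described above. This is a rough standalone sketch under stated assumptions, not the patch's implementation: a `std::map` stands in for metadata attached to globals, and `AddressTable`, `kernelStructAddress`, `lookupAddress` and `allocationMatches` are hypothetical names. The address rule follows the frame layout given in the diff (module.lds at zero, then alignment padding, then the kernel struct).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Stand-in for per-global metadata: variable name -> recorded address.
using AddressTable = std::map<std::string, uint32_t>;

// Round Offset up to the next multiple of Align (a power of two).
inline uint32_t alignTo(uint32_t Offset, uint32_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two expected");
  return (Offset + Align - 1) & ~(Align - 1);
}

// "IR pass" side: where the kernel struct goes. If the kernel allocates no
// module.lds (ModuleLdsSize is empty), the kernel struct starts at zero;
// otherwise it follows module.lds, padded to the kernel struct's alignment.
inline uint32_t kernelStructAddress(std::optional<uint32_t> ModuleLdsSize,
                                    uint32_t KernelAlign) {
  if (!ModuleLdsSize)
    return 0;
  return alignTo(*ModuleLdsSize, KernelAlign);
}

// "Codegen" side: fetch the recorded address, if any.
inline std::optional<uint32_t> lookupAddress(const AddressTable &Table,
                                             const std::string &Name) {
  auto It = Table.find(Name);
  if (It == Table.end())
    return std::nullopt;
  return It->second;
}

// "Codegen" side: confirm the allocator placed the variable where the
// earlier pass said it would be.
inline bool allocationMatches(const AddressTable &Table,
                              const std::string &Name, uint32_t Allocated) {
  std::optional<uint32_t> Recorded = lookupAddress(Table, Name);
  return Recorded && *Recorded == Allocated;
}
```

For example, a 20-byte module.lds followed by a kernel struct aligned to 16 gives a recorded address of 32; an allocator that later reports any other offset for that struct would fail the check rather than silently disagree with the recorded value.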

The primary drawback is that this uses metadata to convey information where we will miscompile if that metadata is removed, directly at odds with the design intent of metadata. However I claim that ship has sailed, and we already assume that some metadata makes it to the backend safely. That the lowering pass runs as part of codegen, not as part of per-TU opt, reduces the risk of another pass invalidating it.

Aside from the metadata-might-be-invalidated hazard, I consider this approach better than the frame calculation in every respect, and would have done it this way around originally if I'd noticed the absolute symbol construct before writing the frame calculation approach.

This is ready for code review - hopefully nothing too surprising in the implementation - but needs a large mechanical update to eight tests to introduce the metadata and change some integers.

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
101	Function has four call sites across three files, think it's probably worth factoring out.

JonChesterfield added a subscriber: ronlieb.Feb 25 2023, 2:36 AM

JonChesterfield added inline comments.Feb 25 2023, 2:38 AM

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h
108–109	Might be better to specialise this to take a global, check for absolute_symbol and rename the function to match. optional<uint32> getAbsoluteSymbolMetadata(const GlobalVariable *) perhaps.

Harbormaster completed remote builds in B215915: Diff 500400.Feb 25 2023, 5:03 AM

In D144221#4152246, @JonChesterfield wrote:

The primary drawback is that this uses metadata to convey information where we will miscompile if that metadata is removed, directly at odds with the design intent of metadata.

absolute_symbol is one of the few cases for global metadata which you are specifically not allowed to drop as per the LangRef.

arsenm added inline comments.Mar 3 2023, 1:01 PM

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
616	Don't like variables shadowing type names (also hate the LLVM capitalized naming convention). auto *IntTy?
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
104	Early return (or just move the null check to the caller). The verifier should really be enforcing the operand count so every user doesn't need to check that it's 1. Should add the verifier check and drop the check

arsenm added inline comments.Mar 3 2023, 1:40 PM

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
618	Going based on the langref, this is incorrect usage of absolute_symbol. It seems to expect 2 operands, indicating an absolute range. I assume you can specify specific address, specific address + 1 for a fixed address. Again demonstrates that the verifier really should check this
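
Editorial illustration of the two-operand encoding suggested above, where a fixed address is written as the half-open range [Address, Address + 1). This is a standalone sketch with hypothetical names (`AbsoluteRange`, `encodeFixedAddress`, `decodeFixedAddress`); it does not use LLVM's metadata or ConstantRange classes and does not model wrapped ranges.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Half-open range [Lower, Upper), standing in for the two metadata operands.
struct AbsoluteRange {
  uint64_t Lower;
  uint64_t Upper;
};

// Encode a single fixed address as a range containing exactly one value.
inline AbsoluteRange encodeFixedAddress(uint32_t Address) {
  return {Address, uint64_t{Address} + 1};
}

// Recover the address only if the range holds exactly one value that fits
// in 32 bits; anything else is treated as "no known fixed address".
inline std::optional<uint32_t> decodeFixedAddress(const AbsoluteRange &R) {
  if (R.Upper <= R.Lower)
    return std::nullopt; // empty or malformed range
  if (R.Upper - R.Lower != 1)
    return std::nullopt; // more than one candidate address
  if (R.Lower > std::numeric_limits<uint32_t>::max())
    return std::nullopt; // does not fit a 32-bit LDS address
  return static_cast<uint32_t>(R.Lower);
}
```

Decoding rejects every range that is not a single element, which is the property a reader of the metadata would rely on before treating the lower bound as the variable's address.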

arsenm added inline comments.Mar 3 2023, 1:43 PM

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
104	Also there is already GlobalValue::getAbsoluteSymbolRange()

review comments, update tests

I think that's all comments fixed except for teaching the verifier about range metadata which should be done separately to this.

JonChesterfield edited the summary of this revision. Mar 9 2023, 6:41 PM

Harbormaster completed remote builds in B218573: Diff 504003.Mar 9 2023, 7:40 PM

JonChesterfield added inline comments.Mar 10 2023, 10:14 AM

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
826	This is right but moderately annoying to extend for D144233 - would rather land this as is and then amend

arsenm accepted this revision.Mar 10 2023, 4:29 PM

arsenm added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp
158	This should probably be an attribute since it doesn't have the special case must-not-drop property but I guess it already was metadata

This revision is now accepted and ready to land.Mar 10 2023, 4:29 PM

This revision was landed with ongoing or failed builds.Mar 12 2023, 6:48 AM

Closed by commit rGd3dda422bfd1: [amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD (authored by JonChesterfield).

This revision was automatically updated to reflect the committed changes.

JonChesterfield added a commit: rGd3dda422bfd1: [amdgpu][nfc] Replace ad hoc LDS frame recalculation with absolute_symbol MD.

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp	13 lines
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp	46 lines
llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp	12 lines
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h	14 lines
llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp	101 lines
llvm/test/CodeGen/AMDGPU/lower-kernel-and-module-lds.ll	30 lines
llvm/test/CodeGen/AMDGPU/lower-kernel-lds.ll	20 lines
llvm/test/CodeGen/AMDGPU/lower-lds-struct-aa-memcpy.ll	29 lines
llvm/test/CodeGen/AMDGPU/lower-lds-struct-aa-merge.ll	21 lines
llvm/test/CodeGen/AMDGPU/lower-lds-struct-aa.ll	51 lines
llvm/test/CodeGen/AMDGPU/lower-module-lds-single-var-unambiguous.ll	14 lines
llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll	29 lines
llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll	25 lines

Diff 504432

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

	Show First 20 Lines • Show All 1,323 Lines • ▼ Show 20 Lines
	SDValue AMDGPUTargetLowering::LowerGlobalAddress(AMDGPUMachineFunction* MFI,			SDValue AMDGPUTargetLowering::LowerGlobalAddress(AMDGPUMachineFunction* MFI,
	SDValue Op,			SDValue Op,
	SelectionDAG &DAG) const {			SelectionDAG &DAG) const {

	const DataLayout &DL = DAG.getDataLayout();			const DataLayout &DL = DAG.getDataLayout();
	GlobalAddressSDNode *G = cast<GlobalAddressSDNode>(Op);			GlobalAddressSDNode *G = cast<GlobalAddressSDNode>(Op);
	const GlobalValue *GV = G->getGlobal();			const GlobalValue *GV = G->getGlobal();

	if (G->getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS) {
	if (!MFI->isModuleEntryFunction()) {			if (!MFI->isModuleEntryFunction()) {
	if (const GlobalVariable *GVar = dyn_cast<GlobalVariable>(GV)) {			if (std::optional<uint32_t> Address =
	if (AMDGPUMachineFunction::isKnownAddressLDSGlobal(*GVar)) {			AMDGPUMachineFunction::getLDSAbsoluteAddress(*GV)) {
	unsigned Offset =			return DAG.getConstant(*Address, SDLoc(Op), Op.getValueType());
	AMDGPUMachineFunction::calculateKnownAddressOfLDSGlobal(*GVar);
	return DAG.getConstant(Offset, SDLoc(Op), Op.getValueType());
	}
	}
	}			}
	}			}

	if (G->getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS \|\|			if (G->getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS \|\|
	G->getAddressSpace() == AMDGPUAS::REGION_ADDRESS) {			G->getAddressSpace() == AMDGPUAS::REGION_ADDRESS) {
	if (!MFI->isModuleEntryFunction() &&			if (!MFI->isModuleEntryFunction() &&
	!GV->getName().equals("llvm.amdgcn.module.lds")) {			!GV->getName().equals("llvm.amdgcn.module.lds")) {
	SDLoc DL(Op);			SDLoc DL(Op);
	▲ Show 20 Lines • Show All 3,737 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp

Show First 20 Lines • Show All 602 Lines • ▼ Show 20 Lines	for (auto &K : LDSVars) {
DL.getTypeAllocSize(GV->getValueType()).getFixedValue());		DL.getTypeAllocSize(GV->getValueType()).getFixedValue());
if (MostUsed < Candidate)		if (MostUsed < Candidate)
MostUsed = Candidate;		MostUsed = Candidate;
}		}

return MostUsed.GV;		return MostUsed.GV;
}		}

		static void recordLDSAbsoluteAddress(Module *M, GlobalVariable *GV,
		uint32_t Address) {
		// Write the specified address into metadata where it can be retrieved by
		// the assembler. Format is a half open range, [Address Address+1)
		LLVMContext &Ctx = M->getContext();
		auto *IntTy =
		M->getDataLayout().getIntPtrType(Ctx, AMDGPUAS::LOCAL_ADDRESS);
		auto *MinC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address));
		auto *MaxC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address + 1));
		GV->setMetadata(LLVMContext::MD_absolute_symbol,
		MDNode::get(Ctx, {MinC, MaxC}));
		}

bool runOnModule(Module &M) override {		bool runOnModule(Module &M) override {
LLVMContext &Ctx = M.getContext();		LLVMContext &Ctx = M.getContext();
CallGraph CG = CallGraph(M);		CallGraph CG = CallGraph(M);
bool Changed = superAlignLDSGlobals(M);		bool Changed = superAlignLDSGlobals(M);

Changed \|= eliminateConstantExprUsesOfLDSFromAllInstructions(M);		Changed \|= eliminateConstantExprUsesOfLDSFromAllInstructions(M);

Changed = true; // todo: narrow this down		Changed = true; // todo: narrow this down
▲ Show 20 Lines • Show All 84 Lines • ▼ Show 20 Lines	bool runOnModule(Module &M) override {
// module instance		// module instance
DenseSet<Function *> KernelsThatAllocateModuleLDS =		DenseSet<Function *> KernelsThatAllocateModuleLDS =
kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,		kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
ModuleScopeVariables);		ModuleScopeVariables);
DenseSet<Function *> KernelsThatAllocateTableLDS =		DenseSet<Function *> KernelsThatAllocateTableLDS =
kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,		kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
TableLookupVariables);		TableLookupVariables);

		GlobalVariable *MaybeModuleScopeStruct = nullptr;
if (!ModuleScopeVariables.empty()) {		if (!ModuleScopeVariables.empty()) {
LDSVariableReplacement ModuleScopeReplacement =		LDSVariableReplacement ModuleScopeReplacement =
createLDSVariableReplacement(M, "llvm.amdgcn.module.lds",		createLDSVariableReplacement(M, "llvm.amdgcn.module.lds",
ModuleScopeVariables);		ModuleScopeVariables);
		MaybeModuleScopeStruct = ModuleScopeReplacement.SGV;
appendToCompilerUsed(M,		appendToCompilerUsed(M,
{static_cast<GlobalValue *>(		{static_cast<GlobalValue *>(
ConstantExpr::getPointerBitCastOrAddrSpaceCast(		ConstantExpr::getPointerBitCastOrAddrSpaceCast(
cast<Constant>(ModuleScopeReplacement.SGV),		cast<Constant>(ModuleScopeReplacement.SGV),
Type::getInt8PtrTy(Ctx)))});		Type::getInt8PtrTy(Ctx)))});

		// module.lds will be allocated at zero in any kernel that allocates it
		recordLDSAbsoluteAddress(&M, ModuleScopeReplacement.SGV, 0);

// historic		// historic
removeLocalVarsFromUsedLists(M, ModuleScopeVariables);		removeLocalVarsFromUsedLists(M, ModuleScopeVariables);

// Replace all uses of module scope variable from non-kernel functions		// Replace all uses of module scope variable from non-kernel functions
replaceLDSVariablesWithStruct(		replaceLDSVariablesWithStruct(
M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {		M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
Instruction *I = dyn_cast<Instruction>(U.getUser());		Instruction *I = dyn_cast<Instruction>(U.getUser());
if (!I) {		if (!I) {
▲ Show 20 Lines • Show All 71 Lines • ▼ Show 20 Lines	for (Function &Func : M.functions()) {
}		}

std::string VarName =		std::string VarName =
(Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();		(Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();

auto Replacement =		auto Replacement =
createLDSVariableReplacement(M, VarName, KernelUsedVariables);		createLDSVariableReplacement(M, VarName, KernelUsedVariables);

		// This struct is allocated at a predictable address that can be
		// calculated now, recorded in metadata then used to lower references to
		// it during codegen.
		{
		// frame layout, starting from 0
		//{
		// module.lds
		// alignment padding
		// kernel instance
		//}

		if (!MaybeModuleScopeStruct \|\|
		Func.hasFnAttribute("amdgpu-elide-module-lds")) {
		// There's no module.lds for this kernel so this replacement struct
		// goes first
		recordLDSAbsoluteAddress(&M, Replacement.SGV, 0);
		} else {
		const DataLayout &DL = M.getDataLayout();
		TypeSize ModuleSize =
		DL.getTypeAllocSize(MaybeModuleScopeStruct->getValueType());
		GlobalVariable *KernelStruct = Replacement.SGV;
		Align KernelAlign = AMDGPU::getAlign(DL, KernelStruct);
		recordLDSAbsoluteAddress(&M, Replacement.SGV,
		alignTo(ModuleSize, KernelAlign));
		}
		}

// remove preserves existing codegen		// remove preserves existing codegen
removeLocalVarsFromUsedLists(M, KernelUsedVariables);		removeLocalVarsFromUsedLists(M, KernelUsedVariables);
KernelToReplacement[&Func] = Replacement;		KernelToReplacement[&Func] = Replacement;

// Rewrite uses within kernel to the new struct		// Rewrite uses within kernel to the new struct
replaceLDSVariablesWithStruct(		replaceLDSVariablesWithStruct(
M, KernelUsedVariables, Replacement, [&Func](Use &U) {		M, KernelUsedVariables, Replacement, [&Func](Use &U) {
Instruction *I = dyn_cast<Instruction>(U.getUser());		Instruction *I = dyn_cast<Instruction>(U.getUser());
▲ Show 20 Lines • Show All 375 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp

//===- AMDGPUMCInstLower.cpp - Lower AMDGPU MachineInstr to an MCInst -----===//		//===- AMDGPUMCInstLower.cpp - Lower AMDGPU MachineInstr to an MCInst -----===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
/// \file		/// \file
/// Code to lower AMDGPU MachineInstrs to their corresponding MCInst.		/// Code to lower AMDGPU MachineInstrs to their corresponding MCInst.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//

#include "AMDGPUMCInstLower.h"		#include "AMDGPUMCInstLower.h"
		#include "AMDGPU.h"
#include "AMDGPUAsmPrinter.h"		#include "AMDGPUAsmPrinter.h"
#include "AMDGPUMachineFunction.h"		#include "AMDGPUMachineFunction.h"
#include "AMDGPUTargetMachine.h"		#include "AMDGPUTargetMachine.h"
#include "MCTargetDesc/AMDGPUInstPrinter.h"		#include "MCTargetDesc/AMDGPUInstPrinter.h"
#include "MCTargetDesc/AMDGPUMCTargetDesc.h"		#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
▲ Show 20 Lines • Show All 139 Lines • ▼ Show 20 Lines	bool AMDGPUAsmPrinter::lowerOperand(const MachineOperand &MO,
const GCNSubtarget &STI = MF->getSubtarget<GCNSubtarget>();		const GCNSubtarget &STI = MF->getSubtarget<GCNSubtarget>();
AMDGPUMCInstLower MCInstLowering(OutContext, STI, *this);		AMDGPUMCInstLower MCInstLowering(OutContext, STI, *this);
return MCInstLowering.lowerOperand(MO, MCOp);		return MCInstLowering.lowerOperand(MO, MCOp);
}		}

const MCExpr *AMDGPUAsmPrinter::lowerConstant(const Constant *CV) {		const MCExpr *AMDGPUAsmPrinter::lowerConstant(const Constant *CV) {

// Intercept LDS variables with known addresses		// Intercept LDS variables with known addresses
if (const GlobalVariable *GV = dyn_cast<GlobalVariable>(CV)) {		if (const GlobalVariable *GV = dyn_cast<const GlobalVariable>(CV)) {
if (AMDGPUMachineFunction::isKnownAddressLDSGlobal(*GV)) {		if (std::optional<uint32_t> Address =
unsigned offset =		AMDGPUMachineFunction::getLDSAbsoluteAddress(*GV)) {
AMDGPUMachineFunction::calculateKnownAddressOfLDSGlobal(*GV);		auto *IntTy = Type::getInt32Ty(CV->getContext());
Constant *C = ConstantInt::get(CV->getContext(), APInt(32, offset));		return AsmPrinter::lowerConstant(ConstantInt::get(IntTy, *Address));
return AsmPrinter::lowerConstant(C);
}		}
}		}

if (const MCExpr *E = lowerAddrSpaceCast(TM, CV, OutContext))		if (const MCExpr *E = lowerAddrSpaceCast(TM, CV, OutContext))
return E;		return E;
return AsmPrinter::lowerConstant(CV);		return AsmPrinter::lowerConstant(CV);
}		}

▲ Show 20 Lines • Show All 144 Lines • Show Last 20 Lines

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.h

Show First 20 Lines • Show All 99 Lines • ▼ Show 20 Lines	public:
unsigned allocateLDSGlobal(const DataLayout &DL, const GlobalVariable &GV) {		unsigned allocateLDSGlobal(const DataLayout &DL, const GlobalVariable &GV) {
return allocateLDSGlobal(DL, GV, DynLDSAlign);		return allocateLDSGlobal(DL, GV, DynLDSAlign);
}		}

unsigned allocateLDSGlobal(const DataLayout &DL, const GlobalVariable &GV,		unsigned allocateLDSGlobal(const DataLayout &DL, const GlobalVariable &GV,
Align Trailing);		Align Trailing);

void allocateKnownAddressLDSGlobal(const Function &F);		void allocateKnownAddressLDSGlobal(const Function &F);

// A kernel function may have an associated LDS allocation, and a kernel-scope
// LDS allocation must have an associated kernel function

// LDS allocation should have an associated kernel function
static const Function *
getKernelLDSFunctionFromGlobal(const GlobalVariable &GV);
static const GlobalVariable *
getKernelLDSGlobalFromFunction(const Function &F);

// Module or kernel scope LDS variable
static bool isKnownAddressLDSGlobal(const GlobalVariable &GV);
static unsigned calculateKnownAddressOfLDSGlobal(const GlobalVariable &GV);

static std::optional<uint32_t> getLDSKernelIdMetadata(const Function &F);		static std::optional<uint32_t> getLDSKernelIdMetadata(const Function &F);
		static std::optional<uint32_t> getLDSAbsoluteAddress(const GlobalValue &GV);

Align getDynLDSAlign() const { return DynLDSAlign; }		Align getDynLDSAlign() const { return DynLDSAlign; }

void setDynLDSAlign(const DataLayout &DL, const GlobalVariable &GV);		void setDynLDSAlign(const DataLayout &DL, const GlobalVariable &GV);
};		};

}		}
#endif		#endif

llvm/lib/Target/AMDGPU/AMDGPUMachineFunction.cpp

//===-- AMDGPUMachineFunctionInfo.cpp ---------------------------------------=//		//===-- AMDGPUMachineFunctionInfo.cpp ---------------------------------------=//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPUMachineFunction.h"		#include "AMDGPUMachineFunction.h"
#include "AMDGPU.h"		#include "AMDGPU.h"
#include "AMDGPUPerfHintAnalysis.h"		#include "AMDGPUPerfHintAnalysis.h"
#include "AMDGPUSubtarget.h"		#include "AMDGPUSubtarget.h"
#include "llvm/CodeGen/MachineModuleInfo.h"		#include "llvm/CodeGen/MachineModuleInfo.h"
		#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
		#include "llvm/IR/Metadata.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"

using namespace llvm;		using namespace llvm;

AMDGPUMachineFunction::AMDGPUMachineFunction(const Function &F,		AMDGPUMachineFunction::AMDGPUMachineFunction(const Function &F,
const AMDGPUSubtarget &ST)		const AMDGPUSubtarget &ST)
: IsEntryFunction(AMDGPU::isEntryFunctionCC(F.getCallingConv())),		: IsEntryFunction(AMDGPU::isEntryFunctionCC(F.getCallingConv())),
IsModuleEntryFunction(		IsModuleEntryFunction(
▲ Show 20 Lines • Show All 61 Lines • ▼ Show 20 Lines	unsigned AMDGPUMachineFunction::allocateLDSGlobal(const DataLayout &DL,
}		}

Entry.first->second = Offset;		Entry.first->second = Offset;
return Offset;		return Offset;
}		}

static constexpr StringLiteral ModuleLDSName = "llvm.amdgcn.module.lds";		static constexpr StringLiteral ModuleLDSName = "llvm.amdgcn.module.lds";

bool AMDGPUMachineFunction::isKnownAddressLDSGlobal(const GlobalVariable &GV) {		static const GlobalVariable *getKernelLDSGlobalFromFunction(const Function &F) {
auto name = GV.getName();
return (name == ModuleLDSName) \|\|
(name.startswith("llvm.amdgcn.kernel.") && name.endswith(".lds"));
}

const Function *AMDGPUMachineFunction::getKernelLDSFunctionFromGlobal(
const GlobalVariable &GV) {
const Module &M = *GV.getParent();
StringRef N(GV.getName());
if (N.consume_front("llvm.amdgcn.kernel.") && N.consume_back(".lds")) {
return M.getFunction(N);
}
return nullptr;
}

const GlobalVariable *
AMDGPUMachineFunction::getKernelLDSGlobalFromFunction(const Function &F) {
const Module *M = F.getParent();		const Module *M = F.getParent();
std::string KernelLDSName = "llvm.amdgcn.kernel.";		std::string KernelLDSName = "llvm.amdgcn.kernel.";
KernelLDSName += F.getName();		KernelLDSName += F.getName();
KernelLDSName += ".lds";		KernelLDSName += ".lds";
return M->getNamedGlobal(KernelLDSName);		return M->getNamedGlobal(KernelLDSName);
}		}

// This kernel calls no functions that require the module lds struct		// This kernel calls no functions that require the module lds struct
static bool canElideModuleLDS(const Function &F) {		static bool canElideModuleLDS(const Function &F) {
return F.hasFnAttribute("amdgpu-elide-module-lds");		return F.hasFnAttribute("amdgpu-elide-module-lds");
		arsenm (Done): Early return (or just move the null check to the caller). The verifier should really be enforcing the operand count so every user doesn't need to check that it's 1. Should add the verifier check and drop the check.
		arsenm (Done): Also there is already GlobalValue::getAbsoluteSymbolRange().
}		}

unsigned AMDGPUMachineFunction::calculateKnownAddressOfLDSGlobal(
const GlobalVariable &GV) {
// module.lds, then alignment padding, then kernel.lds, then other variables
// if any

assert(isKnownAddressLDSGlobal(GV));
unsigned Offset = 0;

if (GV.getName() == ModuleLDSName) {
return 0;
}

const Module *M = GV.getParent();
const DataLayout &DL = M->getDataLayout();

const GlobalVariable *GVM = M->getNamedGlobal(ModuleLDSName);
const Function *f = getKernelLDSFunctionFromGlobal(GV);

// Account for module.lds if allocated for this function
if (GVM && f && !canElideModuleLDS(*f)) {
// allocator aligns this to var align, but it's zero to begin with
Offset += DL.getTypeAllocSize(GVM->getValueType());
}

// No dynamic LDS alignment done by allocateModuleLDSGlobal
Offset = alignTo(
Offset, DL.getValueOrABITypeAlignment(GV.getAlign(), GV.getValueType()));

return Offset;
}

void AMDGPUMachineFunction::allocateKnownAddressLDSGlobal(const Function &F) {		void AMDGPUMachineFunction::allocateKnownAddressLDSGlobal(const Function &F) {
const Module *M = F.getParent();		const Module *M = F.getParent();

// This function is called before allocating any other LDS so that it can		// This function is called before allocating any other LDS so that it can
// reliably put values at known addresses. Consequently, dynamic LDS, if		// reliably put values at known addresses. Consequently, dynamic LDS, if
// present, will not yet have been allocated		// present, will not yet have been allocated

		arsenm (Not Done): Changing the IR in a codegen pass is bad. Can you set this earlier?
		JonChesterfield (Author, Done): Setting metadata on the IR during codegen is not pretty but it's also inconsequential. The frame calculation this deletes is also quite ugly though. For the existing cases we can set the metadata in the IR lowering (carefully, but accurately enough). It'll make changes to promote alloca fragile. That won't work for external/dynamic LDS as we don't know that address until the end of ISel. So yes... if we want a hard no on setting metadata to record information learned during codegen, we can avoid it, basically by lifting the remaining LDS lowering currently done by codegen wholly out of codegen. That'll change how dynamic LDS is handled from kernels, which I was hoping to avoid. If we're up for tagging variables with metadata at this point, we can write the address chosen for dynamic LDS variables the same way and the lowerConstant path added here will do the right thing for them.
		arsenm (Not Done): So you're implying the promote alloca introduced LDS still has possibly unpredictable offsets assigned for it?
		JonChesterfield (Author, Done): Yep. It's a phase ordering problem. promoteAlloca needs to know how much LDS is already allocated to estimate how much more it can use. Some of that might be allocated for non-kernel functions, so LowerModuleLDS runs before promoteAlloca to make those allocations visible to it. LowerModuleLDS thus doesn't know how much (if any) extra will be introduced by promoteAlloca. At least, it doesn't at present. Then the lowered frame looks like:
		{
		managed by lowermodulelds
		introduced by promotealloca (and maybe other things? none I know of)
		maybe alignment
		dynamic lds goes here
		}
		Also, if promotealloca introduces more than one lds variable (haven't checked, seems reasonable that it might) and introduces them as independent values with different alignment, then we're back to allocate-in-dag-traversal order, in which case the address of dynamic lds is unstable.
assert(getDynLDSAlign() == Align() && "dynamic LDS not yet allocated");		assert(getDynLDSAlign() == Align() && "dynamic LDS not yet allocated");

if (isModuleEntryFunction()) {		if (isModuleEntryFunction()) {

// Pointer values start from zero, memory allocated per-kernel-launch		// Pointer values start from zero, memory allocated per-kernel-launch
// Variables can be grouped into a module level struct and a struct per		// Variables can be grouped into a module level struct and a struct per
// kernel function by AMDGPULowerModuleLDSPass. If that is done, they		// kernel function by AMDGPULowerModuleLDSPass. If that is done, they
// are allocated at statically computable addresses here.		// are allocated at statically computable addresses here.
//		//
// Address 0		// Address 0
// {		// {
// llvm.amdgcn.module.lds		// llvm.amdgcn.module.lds
// }		// }
// alignment padding		// alignment padding
// {		// {
// llvm.amdgcn.kernel.some-name.lds		// llvm.amdgcn.kernel.some-name.lds
// }		// }
// other variables, e.g. dynamic lds, allocated after this call		// other variables, e.g. dynamic lds, allocated after this call

const GlobalVariable *GV = M->getNamedGlobal(ModuleLDSName);		const GlobalVariable *GV = M->getNamedGlobal(ModuleLDSName);
const GlobalVariable *KV = getKernelLDSGlobalFromFunction(F);		const GlobalVariable *KV = getKernelLDSGlobalFromFunction(F);

if (GV && !canElideModuleLDS(F)) {		if (GV && !canElideModuleLDS(F)) {
assert(isKnownAddressLDSGlobal(*GV));
unsigned Offset = allocateLDSGlobal(M->getDataLayout(), *GV, Align());		unsigned Offset = allocateLDSGlobal(M->getDataLayout(), *GV, Align());
(void)Offset;		std::optional<uint32_t> Expect = getLDSAbsoluteAddress(*GV);
assert(Offset == calculateKnownAddressOfLDSGlobal(*GV) &&		if (!Expect || (Offset != Expect)) {
"Module LDS expected to be allocated before other LDS");		report_fatal_error("Inconsistent metadata on module LDS variable");
		}
}		}

if (KV) {		if (KV) {
// The per-kernel offset is deterministic because it is allocated		// The per-kernel offset is deterministic because it is allocated
// before any other non-module LDS variables.		// before any other non-module LDS variables.
assert(isKnownAddressLDSGlobal(*KV));
unsigned Offset = allocateLDSGlobal(M->getDataLayout(), *KV, Align());		unsigned Offset = allocateLDSGlobal(M->getDataLayout(), *KV, Align());
(void)Offset;		std::optional<uint32_t> Expect = getLDSAbsoluteAddress(*KV);
assert(Offset == calculateKnownAddressOfLDSGlobal(*KV) &&		if (!Expect || (Offset != Expect)) {
"Kernel LDS expected to be immediately after module LDS");		report_fatal_error("Inconsistent metadata on kernel LDS variable");
		}
}		}
}		}
}		}
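
The phase-ordering concern in the inline comments above (independent variables with different alignments allocated in traversal order) can be illustrated with a small standalone sketch. This is not LLVM code: `LDSVar`, `alignUp`, and `dynamicLDSStart` are hypothetical names, and the sizes and alignments are made up for illustration. It assumes a simple bump allocator in which each variable is padded to its own alignment and dynamic LDS begins at the next suitably aligned address.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an LDS variable: only size and alignment matter.
struct LDSVar {
  uint32_t Size;
  uint32_t Align; // must be a power of two
};

// Round Offset up to the next multiple of Align.
inline uint32_t alignUp(uint32_t Offset, uint32_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "alignment must be a power of two");
  return (Offset + Align - 1) & ~(Align - 1);
}

// Bump-allocate Vars in the given order starting at Start, then return the
// address at which dynamic LDS (aligned to DynAlign) would begin.
inline uint32_t dynamicLDSStart(uint32_t Start,
                                const std::vector<LDSVar> &Vars,
                                uint32_t DynAlign) {
  uint32_t Offset = Start;
  for (const LDSVar &V : Vars) {
    Offset = alignUp(Offset, V.Align); // padding before this variable
    Offset += V.Size;
  }
  return alignUp(Offset, DynAlign);
}
```

With a 2-byte, 2-aligned variable and an 8-byte, 8-aligned variable, allocating the small one first puts 4-aligned dynamic LDS at 16, while allocating the large one first puts it at 12. The same set of variables yields different dynamic LDS addresses depending on visit order, which is the instability described in the comment.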

std::optional<uint32_t>		std::optional<uint32_t>
AMDGPUMachineFunction::getLDSKernelIdMetadata(const Function &F) {		AMDGPUMachineFunction::getLDSKernelIdMetadata(const Function &F) {
auto MD = F.getMetadata("llvm.amdgcn.lds.kernel.id");		// TODO: Would be more consistent with the abs symbols to use a range
		MDNode *MD = F.getMetadata("llvm.amdgcn.lds.kernel.id");
		arsenmUnsubmitted Not Done Reply Inline Actions This should probably be an attribute since it doesn't have the special case must-not-drop property but I guess it already was metadata arsenm: This should probably be an attribute since it doesn't have the special case must-not-drop…
if (MD && MD->getNumOperands() == 1) {		if (MD && MD->getNumOperands() == 1) {
ConstantInt *KnownSize = mdconst::extract<ConstantInt>(MD->getOperand(0));		if (ConstantInt *KnownSize =
if (KnownSize) {		mdconst::extract<ConstantInt>(MD->getOperand(0))) {
uint64_t V = KnownSize->getZExtValue();		uint64_t ZExt = KnownSize->getZExtValue();
if (V <= UINT32_MAX) {		if (ZExt <= UINT32_MAX) {
return V;		return ZExt;
		}
		}
		}
		return {};
}		}

		std::optional<uint32_t>
		AMDGPUMachineFunction::getLDSAbsoluteAddress(const GlobalValue &GV) {
		if (GV.getAddressSpace() != AMDGPUAS::LOCAL_ADDRESS)
		return {};

		std::optional<ConstantRange> AbsSymRange = GV.getAbsoluteSymbolRange();
		if (!AbsSymRange)
		return {};

		if (const APInt *V = AbsSymRange->getSingleElement()) {
		std::optional<uint64_t> ZExt = V->tryZExtValue();
		if (ZExt && (*ZExt <= UINT32_MAX)) {
		return *ZExt;
}		}
}		}

return {};		return {};
}		}

void AMDGPUMachineFunction::setDynLDSAlign(const DataLayout &DL,		void AMDGPUMachineFunction::setDynLDSAlign(const DataLayout &DL,
const GlobalVariable &GV) {		const GlobalVariable &GV) {
assert(DL.getTypeAllocSize(GV.getValueType()).isZero());		assert(DL.getTypeAllocSize(GV.getValueType()).isZero());

Align Alignment =		Align Alignment =
DL.getValueOrABITypeAlignment(GV.getAlign(), GV.getValueType());		DL.getValueOrABITypeAlignment(GV.getAlign(), GV.getValueType());
if (Alignment <= DynLDSAlign)		if (Alignment <= DynLDSAlign)
return;		return;

LDSSize = alignTo(StaticLDSSize, Alignment);		LDSSize = alignTo(StaticLDSSize, Alignment);
DynLDSAlign = Alignment;		DynLDSAlign = Alignment;
}		}
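
The layout comment in allocateKnownAddressLDSGlobal (module struct at address 0, alignment padding, then the per-kernel struct) implies a short offset calculation, which is roughly what the removed calculateKnownAddressOfLDSGlobal did. The following is a minimal standalone sketch of that arithmetic, not the LLVM implementation: `LDSStruct`, `alignUp`, and `kernelLDSAddress` are hypothetical names, and the caller is assumed to already know whether the module struct is allocated for the kernel (that is, it exists and is not elided via "amdgpu-elide-module-lds").

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a lowered LDS struct variable.
struct LDSStruct {
  uint32_t AllocSize; // allocation size of the struct type in bytes
  uint32_t Align;     // alignment of the variable, a power of two
};

// Round Offset up to the next multiple of Align.
inline uint32_t alignUp(uint32_t Offset, uint32_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "alignment must be a power of two");
  return (Offset + Align - 1) & ~(Align - 1);
}

// Address of the per-kernel struct. The module struct, when allocated for
// this kernel, sits at address 0; the kernel struct follows it after any
// padding needed to satisfy the kernel struct's own alignment.
inline uint32_t kernelLDSAddress(bool ModuleLDSAllocated,
                                 const LDSStruct &Module,
                                 const LDSStruct &Kernel) {
  uint32_t Offset = 0;
  if (ModuleLDSAllocated)
    Offset += Module.AllocSize;
  return alignUp(Offset, Kernel.Align);
}
```

For example, with an illustrative 9-byte module struct and a 16-aligned kernel struct, the kernel struct lands at 16 when the module struct is allocated and at 0 when it is elided. Recording the chosen address as metadata, as this patch does, replaces recomputing it this way at each use.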

llvm/test/CodeGen/AMDGPU/lower-kernel-and-module-lds.ll

	; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s			; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s
	; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s			; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s

	@lds.size.1.align.1 = internal unnamed_addr addrspace(3) global [1 x i8] undef, align 1			@lds.size.1.align.1 = internal unnamed_addr addrspace(3) global [1 x i8] undef, align 1
	@lds.size.2.align.2 = internal unnamed_addr addrspace(3) global [2 x i8] undef, align 2			@lds.size.2.align.2 = internal unnamed_addr addrspace(3) global [2 x i8] undef, align 2
	@lds.size.4.align.4 = internal unnamed_addr addrspace(3) global [4 x i8] undef, align 4			@lds.size.4.align.4 = internal unnamed_addr addrspace(3) global [4 x i8] undef, align 4
	@lds.size.8.align.8 = internal unnamed_addr addrspace(3) global [8 x i8] undef, align 8			@lds.size.8.align.8 = internal unnamed_addr addrspace(3) global [8 x i8] undef, align 8
	@lds.size.16.align.16 = internal unnamed_addr addrspace(3) global [16 x i8] undef, align 16			@lds.size.16.align.16 = internal unnamed_addr addrspace(3) global [16 x i8] undef, align 16

	; CHECK: %llvm.amdgcn.module.lds.t = type { [8 x i8], [1 x i8] }			; CHECK: %llvm.amdgcn.module.lds.t = type { [8 x i8], [1 x i8] }
	; CHECK: %llvm.amdgcn.kernel.k0.lds.t = type { [16 x i8], [4 x i8], [2 x i8], [1 x i8] }			; CHECK: %llvm.amdgcn.kernel.k0.lds.t = type { [16 x i8], [4 x i8], [2 x i8], [1 x i8] }
	; CHECK: %llvm.amdgcn.kernel.k1.lds.t = type { [16 x i8], [4 x i8], [2 x i8] }			; CHECK: %llvm.amdgcn.kernel.k1.lds.t = type { [16 x i8], [4 x i8], [2 x i8] }
	; CHECK: %llvm.amdgcn.kernel.k2.lds.t = type { [2 x i8] }			; CHECK: %llvm.amdgcn.kernel.k2.lds.t = type { [2 x i8] }
	; CHECK: %llvm.amdgcn.kernel.k3.lds.t = type { [4 x i8] }			; CHECK: %llvm.amdgcn.kernel.k3.lds.t = type { [4 x i8] }

	;.			;.
	; CHECK: @llvm.amdgcn.module.lds = internal addrspace(3) global %llvm.amdgcn.module.lds.t undef, align 8			; CHECK: @llvm.amdgcn.module.lds = internal addrspace(3) global %llvm.amdgcn.module.lds.t undef, align 8, !absolute_symbol !0
	; CHECK: @llvm.compiler.used = appending global [1 x ptr] [ptr addrspacecast (ptr addrspace(3) @llvm.amdgcn.module.lds to ptr)], section "llvm.metadata"			; CHECK: @llvm.compiler.used = appending global [1 x ptr] [ptr addrspacecast (ptr addrspace(3) @llvm.amdgcn.module.lds to ptr)], section "llvm.metadata"
	; CHECK: @llvm.amdgcn.kernel.k0.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k0.lds.t undef, align 16			; CHECK: @llvm.amdgcn.kernel.k0.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k0.lds.t undef, align 16, !absolute_symbol !0
	; CHECK: @llvm.amdgcn.kernel.k1.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k1.lds.t undef, align 16			; CHECK: @llvm.amdgcn.kernel.k1.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k1.lds.t undef, align 16, !absolute_symbol !0
	; CHECK: @llvm.amdgcn.kernel.k2.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k2.lds.t undef, align 2			; CHECK: @llvm.amdgcn.kernel.k2.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k2.lds.t undef, align 2, !absolute_symbol !0
	; CHECK: @llvm.amdgcn.kernel.k3.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k3.lds.t undef, align 4			; CHECK: @llvm.amdgcn.kernel.k3.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k3.lds.t undef, align 4, !absolute_symbol !0
	;.			;.
	define amdgpu_kernel void @k0() #0 {			define amdgpu_kernel void @k0() #0 {
	; CHECK-LABEL: @k0(			; CHECK-LABEL: @k0(
	; CHECK-NEXT: store i8 1, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 3), align 2, !alias.scope !0, !noalias !3			; CHECK-NEXT: store i8 1, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 3), align 2, !alias.scope !1, !noalias !4
	; CHECK-NEXT: store i8 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 2), align 4, !alias.scope !7, !noalias !8			; CHECK-NEXT: store i8 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 2), align 4, !alias.scope !8, !noalias !9
	; CHECK-NEXT: store i8 4, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 1), align 16, !alias.scope !9, !noalias !10			; CHECK-NEXT: store i8 4, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 1), align 16, !alias.scope !10, !noalias !11
	; CHECK-NEXT: store i8 16, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, align 16, !alias.scope !11, !noalias !12			; CHECK-NEXT: store i8 16, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, align 16, !alias.scope !12, !noalias !13
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	store i8 1, ptr addrspace(3) @lds.size.1.align.1, align 1			store i8 1, ptr addrspace(3) @lds.size.1.align.1, align 1

	store i8 2, ptr addrspace(3) @lds.size.2.align.2, align 2			store i8 2, ptr addrspace(3) @lds.size.2.align.2, align 2

	store i8 4, ptr addrspace(3) @lds.size.4.align.4, align 4			store i8 4, ptr addrspace(3) @lds.size.4.align.4, align 4

	store i8 16, ptr addrspace(3) @lds.size.16.align.16, align 16			store i8 16, ptr addrspace(3) @lds.size.16.align.16, align 16

	ret void			ret void
	}			}

	define amdgpu_kernel void @k1() #0 {			define amdgpu_kernel void @k1() #0 {
	; CHECK-LABEL: @k1(			; CHECK-LABEL: @k1(
	; CHECK-NEXT: store i8 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, i32 0, i32 2), align 4, !alias.scope !13, !noalias !16			; CHECK-NEXT: store i8 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, i32 0, i32 2), align 4, !alias.scope !14, !noalias !17
	; CHECK-NEXT: store i8 4, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, i32 0, i32 1), align 16, !alias.scope !19, !noalias !20			; CHECK-NEXT: store i8 4, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, i32 0, i32 1), align 16, !alias.scope !20, !noalias !21
	; CHECK-NEXT: store i8 16, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, align 16, !alias.scope !21, !noalias !22			; CHECK-NEXT: store i8 16, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, align 16, !alias.scope !22, !noalias !23
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	store i8 2, ptr addrspace(3) @lds.size.2.align.2, align 2			store i8 2, ptr addrspace(3) @lds.size.2.align.2, align 2

	store i8 4, ptr addrspace(3) @lds.size.4.align.4, align 4			store i8 4, ptr addrspace(3) @lds.size.4.align.4, align 4

	store i8 16, ptr addrspace(3) @lds.size.16.align.16, align 16			store i8 16, ptr addrspace(3) @lds.size.16.align.16, align 16

	[23 lines not shown]

	define amdgpu_kernel void @calls_f0() {			define amdgpu_kernel void @calls_f0() {
	call void @f0()			call void @f0()
	ret void			ret void
	}			}

	define void @f0() {			define void @f0() {
	; CHECK-LABEL: define void @f0(			; CHECK-LABEL: define void @f0(
	; CHECK-NEXT: store i8 1, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.module.lds.t, ptr addrspace(3) @llvm.amdgcn.module.lds, i32 0, i32 1), align 8, !noalias !23			; CHECK-NEXT: store i8 1, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.module.lds.t, ptr addrspace(3) @llvm.amdgcn.module.lds, i32 0, i32 1), align 8, !noalias !24
	; CHECK-NEXT: store i8 8, ptr addrspace(3) @llvm.amdgcn.module.lds, align 8, !noalias !23			; CHECK-NEXT: store i8 8, ptr addrspace(3) @llvm.amdgcn.module.lds, align 8, !noalias !24
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	store i8 1, ptr addrspace(3) @lds.size.1.align.1, align 1			store i8 1, ptr addrspace(3) @lds.size.1.align.1, align 1

	store i8 8, ptr addrspace(3) @lds.size.8.align.8, align 4			store i8 8, ptr addrspace(3) @lds.size.8.align.8, align 4

	ret void			ret void
	}			}

	attributes #0 = { "amdgpu-elide-module-lds" }			attributes #0 = { "amdgpu-elide-module-lds" }
	; CHECK: attributes #0 = { "amdgpu-elide-module-lds" }			; CHECK: attributes #0 = { "amdgpu-elide-module-lds" }

				; CHECK: !0 = !{i64 0, i64 1}
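
The `!0 = !{i64 0, i64 1}` node checked above is a half-open range [0, 1), which contains the single address 0. getLDSAbsoluteAddress accepts only such single-element ranges that also fit in 32 bits. The following is a minimal standalone sketch of that decoding, not the LLVM implementation: `AbsRange` and `singleAddress` are hypothetical names, and the address-space check is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of !absolute_symbol !{i64 Lo, i64 Hi}: the half-open
// range [Lo, Hi).
struct AbsRange {
  uint64_t Lo;
  uint64_t Hi;
};

// Return the address if the range is present, holds exactly one value, and
// that value fits in 32 bits; otherwise return nothing.
inline std::optional<uint32_t> singleAddress(std::optional<AbsRange> R) {
  if (!R)
    return std::nullopt; // no metadata attached
  if (R->Hi - R->Lo != 1)
    return std::nullopt; // more than one possible address
  if (R->Lo > UINT32_MAX)
    return std::nullopt; // does not fit a 32-bit LDS address
  return static_cast<uint32_t>(R->Lo);
}
```

Under these assumptions `{0, 1}` decodes to address 0, while a wider range such as `{0, 16}` or a missing range yields no address. In the patch, an absent or mismatching address on a module or kernel LDS variable leads to report_fatal_error.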

llvm/test/CodeGen/AMDGPU/lower-kernel-lds.ll

	; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s			; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s
	; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s			; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s

	@lds.size.1.align.1 = internal unnamed_addr addrspace(3) global [1 x i8] undef, align 1			@lds.size.1.align.1 = internal unnamed_addr addrspace(3) global [1 x i8] undef, align 1
	@lds.size.2.align.2 = internal unnamed_addr addrspace(3) global [2 x i8] undef, align 2			@lds.size.2.align.2 = internal unnamed_addr addrspace(3) global [2 x i8] undef, align 2
	@lds.size.4.align.4 = internal unnamed_addr addrspace(3) global [4 x i8] undef, align 4			@lds.size.4.align.4 = internal unnamed_addr addrspace(3) global [4 x i8] undef, align 4
	@lds.size.16.align.16 = internal unnamed_addr addrspace(3) global [16 x i8] undef, align 16			@lds.size.16.align.16 = internal unnamed_addr addrspace(3) global [16 x i8] undef, align 16

	; CHECK: %llvm.amdgcn.kernel.k0.lds.t = type { [16 x i8], [4 x i8], [2 x i8], [1 x i8] }			; CHECK: %llvm.amdgcn.kernel.k0.lds.t = type { [16 x i8], [4 x i8], [2 x i8], [1 x i8] }
	; CHECK: %llvm.amdgcn.kernel.k1.lds.t = type { [16 x i8], [4 x i8], [2 x i8] }			; CHECK: %llvm.amdgcn.kernel.k1.lds.t = type { [16 x i8], [4 x i8], [2 x i8] }

	;.			;.
	; CHECK: @lds.k2 = addrspace(3) global [1 x i8] undef, align 1			; CHECK: @lds.k2 = addrspace(3) global [1 x i8] undef, align 1
	; CHECK: @llvm.amdgcn.kernel.k0.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k0.lds.t undef, align 16			; CHECK: @llvm.amdgcn.kernel.k0.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k0.lds.t undef, align 16, !absolute_symbol !0
	; CHECK: @llvm.amdgcn.kernel.k1.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k1.lds.t undef, align 16			; CHECK: @llvm.amdgcn.kernel.k1.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k1.lds.t undef, align 16, !absolute_symbol !0
	;.			;.
	define amdgpu_kernel void @k0() {			define amdgpu_kernel void @k0() {
	; CHECK-LABEL: @k0(			; CHECK-LABEL: @k0(
	; CHECK-NEXT: store i8 1, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 3), align 2, !alias.scope !0, !noalias !3			; CHECK-NEXT: store i8 1, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 3), align 2, !alias.scope !1, !noalias !4
	; CHECK-NEXT: store i8 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 2), align 4, !alias.scope !7, !noalias !8			; CHECK-NEXT: store i8 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 2), align 4, !alias.scope !8, !noalias !9
	; CHECK-NEXT: store i8 4, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 1), align 16, !alias.scope !9, !noalias !10			; CHECK-NEXT: store i8 4, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k0.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, i32 0, i32 1), align 16, !alias.scope !10, !noalias !11
	; CHECK-NEXT: store i8 16, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, align 16, !alias.scope !11, !noalias !12			; CHECK-NEXT: store i8 16, ptr addrspace(3) @llvm.amdgcn.kernel.k0.lds, align 16, !alias.scope !12, !noalias !13
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	store i8 1, ptr addrspace(3) @lds.size.1.align.1, align 1			store i8 1, ptr addrspace(3) @lds.size.1.align.1, align 1

	store i8 2, ptr addrspace(3) @lds.size.2.align.2, align 2			store i8 2, ptr addrspace(3) @lds.size.2.align.2, align 2

	store i8 4, ptr addrspace(3) @lds.size.4.align.4, align 4			store i8 4, ptr addrspace(3) @lds.size.4.align.4, align 4

	store i8 16, ptr addrspace(3) @lds.size.16.align.16, align 16			store i8 16, ptr addrspace(3) @lds.size.16.align.16, align 16

	ret void			ret void
	}			}

	define amdgpu_kernel void @k1() {			define amdgpu_kernel void @k1() {
	; CHECK-LABEL: @k1(			; CHECK-LABEL: @k1(
	; CHECK-NEXT: store i8 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, i32 0, i32 2), align 4, !alias.scope !13, !noalias !16			; CHECK-NEXT: store i8 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, i32 0, i32 2), align 4, !alias.scope !14, !noalias !17
	; CHECK-NEXT: store i8 4, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, i32 0, i32 1), align 16, !alias.scope !19, !noalias !20			; CHECK-NEXT: store i8 4, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k1.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, i32 0, i32 1), align 16, !alias.scope !20, !noalias !21
	; CHECK-NEXT: store i8 16, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, align 16, !alias.scope !21, !noalias !22			; CHECK-NEXT: store i8 16, ptr addrspace(3) @llvm.amdgcn.kernel.k1.lds, align 16, !alias.scope !22, !noalias !23
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	store i8 2, ptr addrspace(3) @lds.size.2.align.2, align 2			store i8 2, ptr addrspace(3) @lds.size.2.align.2, align 2

	store i8 4, ptr addrspace(3) @lds.size.4.align.4, align 4			store i8 4, ptr addrspace(3) @lds.size.4.align.4, align 4

	store i8 16, ptr addrspace(3) @lds.size.16.align.16, align 16			store i8 16, ptr addrspace(3) @lds.size.16.align.16, align 16

	ret void			ret void
	}			}

	; Do not lower LDS for graphics shaders.			; Do not lower LDS for graphics shaders.

	@lds.k2 = addrspace(3) global [1 x i8] undef, align 1			@lds.k2 = addrspace(3) global [1 x i8] undef, align 1

	define amdgpu_ps void @k2() {			define amdgpu_ps void @k2() {
	; CHECK-LABEL: @k2(			; CHECK-LABEL: @k2(
	; CHECK-NEXT: store i8 1, ptr addrspace(3) @lds.k2, align 1			; CHECK-NEXT: store i8 1, ptr addrspace(3) @lds.k2, align 1
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	store i8 1, ptr addrspace(3) @lds.k2, align 1			store i8 1, ptr addrspace(3) @lds.k2, align 1

	ret void			ret void
	}			}

				; CHECK: !0 = !{i64 0, i64 1}

llvm/test/CodeGen/AMDGPU/lower-lds-struct-aa-memcpy.ll

; RUN: llc -march=amdgcn -mcpu=gfx900 -O3 --amdgpu-lower-module-lds-strategy=module < %s | FileCheck -check-prefix=GCN %s		; RUN: llc -march=amdgcn -mcpu=gfx900 -O3 --amdgpu-lower-module-lds-strategy=module < %s | FileCheck -check-prefix=GCN %s
; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s		; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s
; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s		; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s | FileCheck %s

%vec_type = type { %vec_base }		%vec_type = type { %vec_base }
%vec_base = type { %union.anon }		%vec_base = type { %union.anon }
%union.anon = type { %"vec_base<char, 3>::n_vec_" }		%union.anon = type { %"vec_base<char, 3>::n_vec_" }
%"vec_base<char, 3>::n_vec_" = type { [3 x i8] }		%"vec_base<char, 3>::n_vec_" = type { [3 x i8] }

$_f1 = comdat any		$_f1 = comdat any
$_f2 = comdat any		$_f2 = comdat any
@_f1 = linkonce_odr hidden local_unnamed_addr addrspace(3) global %vec_type undef, comdat, align 1		@_f1 = linkonce_odr hidden local_unnamed_addr addrspace(3) global %vec_type undef, comdat, align 1
@_f2 = linkonce_odr hidden local_unnamed_addr addrspace(3) global %vec_type undef, comdat, align 1		@_f2 = linkonce_odr hidden local_unnamed_addr addrspace(3) global %vec_type undef, comdat, align 1

;.		;.
; CHECK: @[[LLVM_AMDGCN_KERNEL_TEST_LDS:[a-zA-Z0-9_$"\\.-]+]] = internal addrspace(3) global [[LLVM_AMDGCN_KERNEL_TEST_LDS_T:%.*]] undef, align 4		; CHECK: @[[LLVM_AMDGCN_KERNEL_TEST_LDS:[a-zA-Z0-9_$"\\.-]+]] = internal addrspace(3) global [[LLVM_AMDGCN_KERNEL_TEST_LDS_T:%.*]] undef, align 4, !absolute_symbol !0
;.		;.
define protected amdgpu_kernel void @test(ptr addrspace(1) nocapture %ptr.coerce) local_unnamed_addr #0 {		define protected amdgpu_kernel void @test(ptr addrspace(1) nocapture %ptr.coerce) local_unnamed_addr #0 {
; GCN-LABEL: test:		; GCN-LABEL: test:
; GCN: ; %bb.0: ; %entry		; GCN: ; %bb.0: ; %entry
; GCN-NEXT: v_mov_b32_e32 v0, 0		; GCN-NEXT: v_mov_b32_e32 v0, 0
; GCN-NEXT: v_mov_b32_e32 v1, 2		; GCN-NEXT: v_mov_b32_e32 v1, 2
; GCN-NEXT: ds_write_b8 v0, v1		; GCN-NEXT: ds_write_b8 v0, v1
; GCN-NEXT: ds_read_u8 v2, v0 offset:2		; GCN-NEXT: ds_read_u8 v2, v0 offset:2
; GCN-NEXT: ds_read_u16 v3, v0		; GCN-NEXT: ds_read_u16 v3, v0
; GCN-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24		; GCN-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
; GCN-NEXT: s_waitcnt lgkmcnt(0)		; GCN-NEXT: s_waitcnt lgkmcnt(0)
; GCN-NEXT: ds_write_b8 v0, v2 offset:6		; GCN-NEXT: ds_write_b8 v0, v2 offset:6
; GCN-NEXT: ds_write_b16 v0, v3 offset:4		; GCN-NEXT: ds_write_b16 v0, v3 offset:4
; GCN-NEXT: v_cmp_eq_u16_sdwa s[2:3], v3, v1 src0_sel:BYTE_0 src1_sel:DWORD		; GCN-NEXT: v_cmp_eq_u16_sdwa s[2:3], v3, v1 src0_sel:BYTE_0 src1_sel:DWORD
; GCN-NEXT: v_cndmask_b32_e64 v1, 0, 1, s[2:3]		; GCN-NEXT: v_cndmask_b32_e64 v1, 0, 1, s[2:3]
; GCN-NEXT: global_store_byte v0, v1, s[0:1]		; GCN-NEXT: global_store_byte v0, v1, s[0:1]
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
; CHECK-LABEL: @test(		; CHECK-LABEL: @test(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: store i8 3, ptr addrspace(3) @llvm.amdgcn.kernel.test.lds, align 4, !alias.scope !0, !noalias !3		; CHECK-NEXT: store i8 3, ptr addrspace(3) @llvm.amdgcn.kernel.test.lds, align 4, !alias.scope !1, !noalias !4
; CHECK-NEXT: tail call void @llvm.memcpy.p3.p3.i64(ptr addrspace(3) noundef align 1 dereferenceable(3) getelementptr inbounds (%llvm.amdgcn.kernel.test.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.test.lds, i32 0, i32 2), ptr addrspace(3) noundef align 1 dereferenceable(3) @llvm.amdgcn.kernel.test.lds, i64 3, i1 false), !alias.scope !5, !noalias !6		; CHECK-NEXT: tail call void @llvm.memcpy.p3.p3.i64(ptr addrspace(3) noundef align 1 dereferenceable(3) getelementptr inbounds (%llvm.amdgcn.kernel.test.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.test.lds, i32 0, i32 2), ptr addrspace(3) noundef align 1 dereferenceable(3) @llvm.amdgcn.kernel.test.lds, i64 3, i1 false), !alias.scope !6, !noalias !7
; CHECK-NEXT: [[TMP4:%.*]] = load i8, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.test.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.test.lds, i32 0, i32 2), align 4, !alias.scope !3, !noalias !0		; CHECK-NEXT: [[TMP4:%.*]] = load i8, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.test.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.test.lds, i32 0, i32 2), align 4, !alias.scope !4, !noalias !1
; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i8 [[TMP4]], 3		; CHECK-NEXT: [[CMP_I_I:%.*]] = icmp eq i8 [[TMP4]], 3
; CHECK-NEXT: store i8 2, ptr addrspace(3) @llvm.amdgcn.kernel.test.lds, align 4, !alias.scope !0, !noalias !3		; CHECK-NEXT: store i8 2, ptr addrspace(3) @llvm.amdgcn.kernel.test.lds, align 4, !alias.scope !1, !noalias !4
; CHECK-NEXT: tail call void @llvm.memcpy.p3.p3.i64(ptr addrspace(3) noundef align 1 dereferenceable(3) getelementptr inbounds (%llvm.amdgcn.kernel.test.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.test.lds, i32 0, i32 2), ptr addrspace(3) noundef align 1 dereferenceable(3) @llvm.amdgcn.kernel.test.lds, i64 3, i1 false), !alias.scope !5, !noalias !6		; CHECK-NEXT: tail call void @llvm.memcpy.p3.p3.i64(ptr addrspace(3) noundef align 1 dereferenceable(3) getelementptr inbounds (%llvm.amdgcn.kernel.test.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.test.lds, i32 0, i32 2), ptr addrspace(3) noundef align 1 dereferenceable(3) @llvm.amdgcn.kernel.test.lds, i64 3, i1 false), !alias.scope !6, !noalias !7
; CHECK-NEXT: [[TMP9:%.*]] = load i8, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.test.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.test.lds, i32 0, i32 2), align 4, !alias.scope !3, !noalias !0		; CHECK-NEXT: [[TMP9:%.*]] = load i8, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.test.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.test.lds, i32 0, i32 2), align 4, !alias.scope !4, !noalias !1
; CHECK-NEXT: [[CMP_I_I19:%.*]] = icmp eq i8 [[TMP9]], 2		; CHECK-NEXT: [[CMP_I_I19:%.*]] = icmp eq i8 [[TMP9]], 2
; CHECK-NEXT: [[TMP10:%.*]] = and i1 [[CMP_I_I19]], [[CMP_I_I]]		; CHECK-NEXT: [[TMP10:%.*]] = and i1 [[CMP_I_I19]], [[CMP_I_I]]
; CHECK-NEXT: [[FROMBOOL8:%.*]] = zext i1 [[TMP10]] to i8		; CHECK-NEXT: [[FROMBOOL8:%.*]] = zext i1 [[TMP10]] to i8
; CHECK-NEXT: store i8 [[FROMBOOL8]], ptr addrspace(1) [[PTR_COERCE:%.*]], align 1		; CHECK-NEXT: store i8 [[FROMBOOL8]], ptr addrspace(1) [[PTR_COERCE:%.*]], align 1
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
store i8 3, ptr addrspace(3) @_f1, align 1		store i8 3, ptr addrspace(3) @_f1, align 1
[10 lines not shown]	entry:
ret void		ret void
}		}

declare void @llvm.memcpy.p3.p3.i64(ptr addrspace(3) noalias nocapture writeonly, ptr addrspace(3) noalias nocapture readonly, i64, i1 immarg) #1		declare void @llvm.memcpy.p3.p3.i64(ptr addrspace(3) noalias nocapture writeonly, ptr addrspace(3) noalias nocapture readonly, i64, i1 immarg) #1

;.		;.
; CHECK: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }		; CHECK: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
;.		;.
; CHECK: [[META0:![0-9]+]] = !{!1}		; CHECK: [[META0:![0-9]+]] = !{i64 0, i64 1}
; CHECK: [[META1:![0-9]+]] = distinct !{!1, !2}		; CHECK: [[META1:![0-9]+]] = !{!2}
; CHECK: [[META2:![0-9]+]] = distinct !{!2}		; CHECK: [[META2:![0-9]+]] = distinct !{!2, !3}
; CHECK: [[META3:![0-9]+]] = !{!4}		; CHECK: [[META3:![0-9]+]] = distinct !{!3}
; CHECK: [[META4:![0-9]+]] = distinct !{!4, !2}		; CHECK: [[META4:![0-9]+]] = !{!5}
; CHECK: [[META5:![0-9]+]] = !{!4, !1}		; CHECK: [[META5:![0-9]+]] = distinct !{!5, !3}
; CHECK: [[META6:![0-9]+]] = !{}		; CHECK: [[META6:![0-9]+]] = !{!5, !2}
		; CHECK: [[META7:![0-9]+]] = !{}
;.		;.

llvm/test/CodeGen/AMDGPU/lower-lds-struct-aa-merge.ll

	; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s \| FileCheck %s			; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s \| FileCheck %s
	; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s \| FileCheck %s			; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds --amdgpu-lower-module-lds-strategy=module < %s \| FileCheck %s

	@a = internal unnamed_addr addrspace(3) global [64 x i32] undef, align 4			@a = internal unnamed_addr addrspace(3) global [64 x i32] undef, align 4
	@b = internal unnamed_addr addrspace(3) global [64 x i32] undef, align 4			@b = internal unnamed_addr addrspace(3) global [64 x i32] undef, align 4

	; CHECK-LABEL: @no_clobber_ds_load_stores_x2_preexisting_aa			; CHECK-LABEL: @no_clobber_ds_load_stores_x2_preexisting_aa
	; CHECK: store i32 1, ptr addrspace(3) @llvm.amdgcn.kernel.no_clobber_ds_load_stores_x2_preexisting_aa.lds, align 16, !tbaa !0, !noalias !5			; CHECK: store i32 1, ptr addrspace(3) @llvm.amdgcn.kernel.no_clobber_ds_load_stores_x2_preexisting_aa.lds, align 16, !tbaa !1, !noalias !6
	; CHECK: %val.a = load i32, ptr addrspace(3) %gep.a, align 4, !tbaa !0, !noalias !5			; CHECK: %val.a = load i32, ptr addrspace(3) %gep.a, align 4, !tbaa !1, !noalias !6
	; CHECK: store i32 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.no_clobber_ds_load_stores_x2_preexisting_aa.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.no_clobber_ds_load_stores_x2_preexisting_aa.lds, i32 0, i32 1), align 16, !tbaa !0, !noalias !5			; CHECK: store i32 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.no_clobber_ds_load_stores_x2_preexisting_aa.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.no_clobber_ds_load_stores_x2_preexisting_aa.lds, i32 0, i32 1), align 16, !tbaa !1, !noalias !6
	; CHECK: %val.b = load i32, ptr addrspace(3) %gep.b, align 4, !tbaa !0, !noalias !5			; CHECK: %val.b = load i32, ptr addrspace(3) %gep.b, align 4, !tbaa !1, !noalias !6

	define amdgpu_kernel void @no_clobber_ds_load_stores_x2_preexisting_aa(ptr addrspace(1) %arg, i32 %i) {			define amdgpu_kernel void @no_clobber_ds_load_stores_x2_preexisting_aa(ptr addrspace(1) %arg, i32 %i) {
	bb:			bb:
	store i32 1, ptr addrspace(3) @a, align 4, !alias.scope !0, !noalias !3, !tbaa !5			store i32 1, ptr addrspace(3) @a, align 4, !alias.scope !0, !noalias !3, !tbaa !5
	%gep.a = getelementptr inbounds [64 x i32], ptr addrspace(3) @a, i32 0, i32 %i			%gep.a = getelementptr inbounds [64 x i32], ptr addrspace(3) @a, i32 0, i32 %i
	%val.a = load i32, ptr addrspace(3) %gep.a, align 4, !alias.scope !0, !noalias !3, !tbaa !5			%val.a = load i32, ptr addrspace(3) %gep.a, align 4, !alias.scope !0, !noalias !3, !tbaa !5
	store i32 2, ptr addrspace(3) @b, align 4, !alias.scope !3, !noalias !0, !tbaa !5			store i32 2, ptr addrspace(3) @b, align 4, !alias.scope !3, !noalias !0, !tbaa !5
	%gep.b = getelementptr inbounds [64 x i32], ptr addrspace(3) @b, i32 0, i32 %i			%gep.b = getelementptr inbounds [64 x i32], ptr addrspace(3) @b, i32 0, i32 %i
	[9 lines not shown]
	!3 = !{!4}			!3 = !{!4}
	!4 = distinct !{!4, !2}			!4 = distinct !{!4, !2}
	!5 = !{!6, !7, i64 0}			!5 = !{!6, !7, i64 0}
	!6 = !{!"no_clobber_ds_load_stores_x2_preexisting_aa", !7, i64 0}			!6 = !{!"no_clobber_ds_load_stores_x2_preexisting_aa", !7, i64 0}
	!7 = !{!"int", !8, i64 0}			!7 = !{!"int", !8, i64 0}
	!8 = !{!"omnipotent char", !9, i64 0}			!8 = !{!"omnipotent char", !9, i64 0}
	!9 = !{!"Simple C++ TBAA"}			!9 = !{!"Simple C++ TBAA"}

	; CHECK:!0 = !{!1, !2, i64 0}			; CHECK:!0 = !{i64 0, i64 1}
	; CHECK:!1 = !{!"no_clobber_ds_load_stores_x2_preexisting_aa", !2, i64 0}			; CHECK:!1 = !{!2, !3, i64 0}
	; CHECK:!2 = !{!"int", !3, i64 0}			; CHECK:!2 = !{!"no_clobber_ds_load_stores_x2_preexisting_aa", !3, i64 0}
	; CHECK:!3 = !{!"omnipotent char", !4, i64 0}			; CHECK:!3 = !{!"int", !4, i64 0}
	; CHECK:!4 = !{!"Simple C++ TBAA"}			; CHECK:!4 = !{!"omnipotent char", !5, i64 0}
	; CHECK:!5 = !{}			; CHECK:!5 = !{!"Simple C++ TBAA"}
				; CHECK:!6 = !{}

llvm/test/CodeGen/AMDGPU/lower-lds-struct-aa.ll

	[11 lines not shown]

	; GCN-LABEL: {{^}}no_clobber_ds_load_stores_x2:			; GCN-LABEL: {{^}}no_clobber_ds_load_stores_x2:
	; GCN: ds_write_b32			; GCN: ds_write_b32
	; GCN: ds_write_b32			; GCN: ds_write_b32
	; GCN: ds_read_b32			; GCN: ds_read_b32
	; GCN: ds_read_b32			; GCN: ds_read_b32

	; CHECK-LABEL: @no_clobber_ds_load_stores_x2			; CHECK-LABEL: @no_clobber_ds_load_stores_x2
	; CHECK: store i32 1, ptr addrspace(3) @llvm.amdgcn.kernel.no_clobber_ds_load_stores_x2.lds, align 16, !alias.scope !0, !noalias !3			; CHECK: store i32 1, ptr addrspace(3) @llvm.amdgcn.kernel.no_clobber_ds_load_stores_x2.lds, align 16, !alias.scope !1, !noalias !4
	; CHECK: %val.a = load i32, ptr addrspace(3) %gep.a, align 4, !alias.scope !0, !noalias !3			; CHECK: %val.a = load i32, ptr addrspace(3) %gep.a, align 4, !alias.scope !1, !noalias !4
	; CHECK: store i32 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.no_clobber_ds_load_stores_x2.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.no_clobber_ds_load_stores_x2.lds, i32 0, i32 1), align 16, !alias.scope !3, !noalias !0			; CHECK: store i32 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.no_clobber_ds_load_stores_x2.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.no_clobber_ds_load_stores_x2.lds, i32 0, i32 1), align 16, !alias.scope !4, !noalias !1
	; CHECK: %val.b = load i32, ptr addrspace(3) %gep.b, align 4, !alias.scope !3, !noalias !0			; CHECK: %val.b = load i32, ptr addrspace(3) %gep.b, align 4, !alias.scope !4, !noalias !1

	define amdgpu_kernel void @no_clobber_ds_load_stores_x2(ptr addrspace(1) %arg, i32 %i) {			define amdgpu_kernel void @no_clobber_ds_load_stores_x2(ptr addrspace(1) %arg, i32 %i) {
	bb:			bb:
	store i32 1, ptr addrspace(3) @a, align 4			store i32 1, ptr addrspace(3) @a, align 4
	%gep.a = getelementptr inbounds [64 x i32], ptr addrspace(3) @a, i32 0, i32 %i			%gep.a = getelementptr inbounds [64 x i32], ptr addrspace(3) @a, i32 0, i32 %i
	%val.a = load i32, ptr addrspace(3) %gep.a, align 4			%val.a = load i32, ptr addrspace(3) %gep.a, align 4
	store i32 2, ptr addrspace(3) @b, align 4			store i32 2, ptr addrspace(3) @b, align 4
	%gep.b = getelementptr inbounds [64 x i32], ptr addrspace(3) @b, i32 0, i32 %i			%gep.b = getelementptr inbounds [64 x i32], ptr addrspace(3) @b, i32 0, i32 %i
	%val.b = load i32, ptr addrspace(3) %gep.b, align 4			%val.b = load i32, ptr addrspace(3) %gep.b, align 4
	%val = add i32 %val.a, %val.b			%val = add i32 %val.a, %val.b
	store i32 %val, ptr addrspace(1) %arg, align 4			store i32 %val, ptr addrspace(1) %arg, align 4
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}no_clobber_ds_load_stores_x3:			; GCN-LABEL: {{^}}no_clobber_ds_load_stores_x3:
	; GCN-DAG: ds_write_b32			; GCN-DAG: ds_write_b32
	; GCN-DAG: ds_write_b32			; GCN-DAG: ds_write_b32
	; GCN-DAG: ds_write_b32			; GCN-DAG: ds_write_b32
	; GCN-DAG: ds_read_b32			; GCN-DAG: ds_read_b32
	; GCN-DAG: ds_read_b32			; GCN-DAG: ds_read_b32
	; GCN-DAG: ds_read_b32			; GCN-DAG: ds_read_b32

	; CHECK-LABEL: @no_clobber_ds_load_stores_x3			; CHECK-LABEL: @no_clobber_ds_load_stores_x3
	; CHECK: store i32 1, ptr addrspace(3) @llvm.amdgcn.kernel.no_clobber_ds_load_stores_x3.lds, align 16, !alias.scope !5, !noalias !8			; CHECK: store i32 1, ptr addrspace(3) @llvm.amdgcn.kernel.no_clobber_ds_load_stores_x3.lds, align 16, !alias.scope !6, !noalias !9
	; CHECK: %gep.a = getelementptr inbounds [64 x i32], ptr addrspace(3) @llvm.amdgcn.kernel.no_clobber_ds_load_stores_x3.lds, i32 0, i32 %i			; CHECK: %gep.a = getelementptr inbounds [64 x i32], ptr addrspace(3) @llvm.amdgcn.kernel.no_clobber_ds_load_stores_x3.lds, i32 0, i32 %i
	; CHECK: %val.a = load i32, ptr addrspace(3) %gep.a, align 4, !alias.scope !5, !noalias !8			; CHECK: %val.a = load i32, ptr addrspace(3) %gep.a, align 4, !alias.scope !6, !noalias !9
	; CHECK: store i32 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.no_clobber_ds_load_stores_x3.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.no_clobber_ds_load_stores_x3.lds, i32 0, i32 1), align 16, !alias.scope !11, !noalias !12			; CHECK: store i32 2, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.no_clobber_ds_load_stores_x3.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.no_clobber_ds_load_stores_x3.lds, i32 0, i32 1), align 16, !alias.scope !12, !noalias !13
	; CHECK: %val.b = load i32, ptr addrspace(3) %gep.b, align 4, !alias.scope !11, !noalias !12			; CHECK: %val.b = load i32, ptr addrspace(3) %gep.b, align 4, !alias.scope !12, !noalias !13
	; CHECK: store i32 3, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.no_clobber_ds_load_stores_x3.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.no_clobber_ds_load_stores_x3.lds, i32 0, i32 2), align 16, !alias.scope !13, !noalias !14			; CHECK: store i32 3, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.no_clobber_ds_load_stores_x3.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.no_clobber_ds_load_stores_x3.lds, i32 0, i32 2), align 16, !alias.scope !14, !noalias !15
	; CHECK: %val.c = load i32, ptr addrspace(3) %gep.c, align 4, !alias.scope !13, !noalias !14			; CHECK: %val.c = load i32, ptr addrspace(3) %gep.c, align 4, !alias.scope !14, !noalias !15

	define amdgpu_kernel void @no_clobber_ds_load_stores_x3(ptr addrspace(1) %arg, i32 %i) {			define amdgpu_kernel void @no_clobber_ds_load_stores_x3(ptr addrspace(1) %arg, i32 %i) {
	bb:			bb:
	store i32 1, ptr addrspace(3) @a, align 4			store i32 1, ptr addrspace(3) @a, align 4
	%gep.a = getelementptr inbounds [64 x i32], ptr addrspace(3) @a, i32 0, i32 %i			%gep.a = getelementptr inbounds [64 x i32], ptr addrspace(3) @a, i32 0, i32 %i
	%val.a = load i32, ptr addrspace(3) %gep.a, align 4			%val.a = load i32, ptr addrspace(3) %gep.a, align 4
	store i32 2, ptr addrspace(3) @b, align 4			store i32 2, ptr addrspace(3) @b, align 4
	%gep.b = getelementptr inbounds [64 x i32], ptr addrspace(3) @b, i32 0, i32 %i			%gep.b = getelementptr inbounds [64 x i32], ptr addrspace(3) @b, i32 0, i32 %i
	%val.b = load i32, ptr addrspace(3) %gep.b, align 4			%val.b = load i32, ptr addrspace(3) %gep.b, align 4
	store i32 3, ptr addrspace(3) @c, align 4			store i32 3, ptr addrspace(3) @c, align 4
	%gep.c = getelementptr inbounds [64 x i32], ptr addrspace(3) @c, i32 0, i32 %i			%gep.c = getelementptr inbounds [64 x i32], ptr addrspace(3) @c, i32 0, i32 %i
	%val.c = load i32, ptr addrspace(3) %gep.c, align 4			%val.c = load i32, ptr addrspace(3) %gep.c, align 4
	%val.1 = add i32 %val.a, %val.b			%val.1 = add i32 %val.a, %val.b
	%val = add i32 %val.1, %val.c			%val = add i32 %val.1, %val.c
	store i32 %val, ptr addrspace(1) %arg, align 4			store i32 %val, ptr addrspace(1) %arg, align 4
	ret void			ret void
	}			}

	; CHECK: !0 = !{!1}			; CHECK: !0 = !{i64 0, i64 1}
	; CHECK: !1 = distinct !{!1, !2}			; CHECK: !1 = !{!2}
	; CHECK: !2 = distinct !{!2}			; CHECK: !2 = distinct !{!2, !3}
	; CHECK: !3 = !{!4}			; CHECK: !3 = distinct !{!3}
	; CHECK: !4 = distinct !{!4, !2}			; CHECK: !4 = !{!5}
	; CHECK: !5 = !{!6}			; CHECK: !5 = distinct !{!5, !3}
	; CHECK: !6 = distinct !{!6, !7}			; CHECK: !6 = !{!7}
	; CHECK: !7 = distinct !{!7}			; CHECK: !7 = distinct !{!7, !8}
	; CHECK: !8 = !{!9, !10}			; CHECK: !8 = distinct !{!8}
	; CHECK: !9 = distinct !{!9, !7}			; CHECK: !9 = !{!10, !11}
	; CHECK: !10 = distinct !{!10, !7}			; CHECK: !10 = distinct !{!10, !8}
	; CHECK: !11 = !{!9}			; CHECK: !11 = distinct !{!11, !8}
	; CHECK: !12 = !{!6, !10}			; CHECK: !12 = !{!10}
	; CHECK: !13 = !{!10}			; CHECK: !13 = !{!7, !11}
	; CHECK: !14 = !{!6, !9}			; CHECK: !14 = !{!11}
				; CHECK: !15 = !{!7, !10}
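The renumbering in the checks above looks like a uniform shift: the new `!0 = !{i64 0, i64 1}` node is emitted first, so every pre-existing metadata id moves up by one (for example, the old `!12 = !{!6, !10}` becomes `!13 = !{!7, !11}`). A minimal sketch of that bookkeeping, assuming the shift is uniform as it appears to be here; `renumber` and `insertedBefore` are hypothetical names for illustration, not part of the patch or of any LLVM API:

```cpp
#include <vector>

// Shift every metadata id by the number of nodes newly emitted ahead of it.
// This models only the renumbering seen in the CHECK lines, not how the
// LLVM asm writer actually assigns slots.
std::vector<unsigned> renumber(const std::vector<unsigned>& oldIds,
                               unsigned insertedBefore) {
  std::vector<unsigned> newIds;
  newIds.reserve(oldIds.size());
  for (unsigned id : oldIds)
    newIds.push_back(id + insertedBefore);
  return newIds;
}
```

Calling `renumber({6, 10}, 1)` reproduces the operands of the new `!13`. In a test where several distinct `!absolute_symbol` nodes are emitted first, the shift would be correspondingly larger.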

llvm/test/CodeGen/AMDGPU/lower-module-lds-single-var-unambiguous.ll

[22 lines not shown]	;
ret void		ret void
}		}

;; Function is reachable from one kernel. Variable goes in module lds or the kernel struct, but never both.		;; Function is reachable from one kernel. Variable goes in module lds or the kernel struct, but never both.

@f0.lds = addrspace(3) global i16 undef		@f0.lds = addrspace(3) global i16 undef
define void @f0() {		define void @f0() {
; MODULE-LABEL: @f0(		; MODULE-LABEL: @f0(
; MODULE-NEXT: [[LD:%.*]] = load i16, ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_MODULE_LDS_T:%.*]], ptr addrspace(3) @llvm.amdgcn.module.lds, i32 0, i32 1), align 4, !alias.scope !0, !noalias !3		; MODULE-NEXT: [[LD:%.*]] = load i16, ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_MODULE_LDS_T:%.*]], ptr addrspace(3) @llvm.amdgcn.module.lds, i32 0, i32 1), align 4, !alias.scope !1, !noalias !4
; MODULE-NEXT: [[MUL:%.*]] = mul i16 [[LD]], 3		; MODULE-NEXT: [[MUL:%.*]] = mul i16 [[LD]], 3
; MODULE-NEXT: store i16 [[MUL]], ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_MODULE_LDS_T]], ptr addrspace(3) @llvm.amdgcn.module.lds, i32 0, i32 1), align 4, !alias.scope !0, !noalias !3		; MODULE-NEXT: store i16 [[MUL]], ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_MODULE_LDS_T]], ptr addrspace(3) @llvm.amdgcn.module.lds, i32 0, i32 1), align 4, !alias.scope !1, !noalias !4
; MODULE-NEXT: ret void		; MODULE-NEXT: ret void
;		;
; TABLE-LABEL: @f0(		; TABLE-LABEL: @f0(
; TABLE-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.lds.kernel.id()		; TABLE-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.lds.kernel.id()
; TABLE-NEXT: [[F0_LDS2:%.*]] = getelementptr inbounds [2 x [2 x i32]], ptr addrspace(4) @llvm.amdgcn.lds.offset.table, i32 0, i32 [[TMP1]], i32 1		; TABLE-NEXT: [[F0_LDS2:%.*]] = getelementptr inbounds [2 x [2 x i32]], ptr addrspace(4) @llvm.amdgcn.lds.offset.table, i32 0, i32 [[TMP1]], i32 1
; TABLE-NEXT: [[TMP2:%.*]] = load i32, ptr addrspace(4) [[F0_LDS2]], align 4		; TABLE-NEXT: [[TMP2:%.*]] = load i32, ptr addrspace(4) [[F0_LDS2]], align 4
; TABLE-NEXT: [[F0_LDS3:%.*]] = inttoptr i32 [[TMP2]] to ptr addrspace(3)		; TABLE-NEXT: [[F0_LDS3:%.*]] = inttoptr i32 [[TMP2]] to ptr addrspace(3)
; TABLE-NEXT: [[LD:%.*]] = load i16, ptr addrspace(3) [[F0_LDS3]], align 2		; TABLE-NEXT: [[LD:%.*]] = load i16, ptr addrspace(3) [[F0_LDS3]], align 2
[13 lines not shown]	;
%ld = load i16, ptr addrspace(3) @f0.lds		%ld = load i16, ptr addrspace(3) @f0.lds
%mul = mul i16 %ld, 3		%mul = mul i16 %ld, 3
store i16 %mul, ptr addrspace(3) @f0.lds		store i16 %mul, ptr addrspace(3) @f0.lds
ret void		ret void
}		}

define amdgpu_kernel void @k_f0() {		define amdgpu_kernel void @k_f0() {
; MODULE-LABEL: @k_f0(		; MODULE-LABEL: @k_f0(
; MODULE-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]		; MODULE-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ], !alias.scope !5, !noalias !1
; MODULE-NEXT: call void @f0()		; MODULE-NEXT: call void @f0()
; MODULE-NEXT: ret void		; MODULE-NEXT: ret void
;		;
; TABLE-LABEL: @k_f0(		; TABLE-LABEL: @k_f0(
; TABLE-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k_f0.lds) ]		; TABLE-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k_f0.lds) ]
; TABLE-NEXT: call void @f0()		; TABLE-NEXT: call void @f0()
; TABLE-NEXT: ret void		; TABLE-NEXT: ret void
;		;
; K_OR_HY-LABEL: @k_f0(		; K_OR_HY-LABEL: @k_f0(
; K_OR_HY-NEXT: call void @f0()		; K_OR_HY-NEXT: call void @f0()
; K_OR_HY-NEXT: ret void		; K_OR_HY-NEXT: ret void
;		;
call void @f0()		call void @f0()
ret void		ret void
}		}

;; As above, but with the kernel also using the variable.		;; As above, but with the kernel also using the variable.

@both.lds = addrspace(3) global i32 undef		@both.lds = addrspace(3) global i32 undef
define void @f_both() {		define void @f_both() {
; MODULE-LABEL: @f_both(		; MODULE-LABEL: @f_both(
; MODULE-NEXT: [[LD:%.*]] = load i32, ptr addrspace(3) @llvm.amdgcn.module.lds, align 4, !alias.scope !4, !noalias !3		; MODULE-NEXT: [[LD:%.*]] = load i32, ptr addrspace(3) @llvm.amdgcn.module.lds, align 4, !alias.scope !5, !noalias !4
; MODULE-NEXT: [[MUL:%.*]] = mul i32 [[LD]], 4		; MODULE-NEXT: [[MUL:%.*]] = mul i32 [[LD]], 4
; MODULE-NEXT: store i32 [[MUL]], ptr addrspace(3) @llvm.amdgcn.module.lds, align 4, !alias.scope !4, !noalias !3		; MODULE-NEXT: store i32 [[MUL]], ptr addrspace(3) @llvm.amdgcn.module.lds, align 4, !alias.scope !5, !noalias !4
; MODULE-NEXT: ret void		; MODULE-NEXT: ret void
;		;
; TABLE-LABEL: @f_both(		; TABLE-LABEL: @f_both(
; TABLE-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.lds.kernel.id()		; TABLE-NEXT: [[TMP1:%.*]] = call i32 @llvm.amdgcn.lds.kernel.id()
; TABLE-NEXT: [[BOTH_LDS2:%.*]] = getelementptr inbounds [2 x [2 x i32]], ptr addrspace(4) @llvm.amdgcn.lds.offset.table, i32 0, i32 [[TMP1]], i32 0		; TABLE-NEXT: [[BOTH_LDS2:%.*]] = getelementptr inbounds [2 x [2 x i32]], ptr addrspace(4) @llvm.amdgcn.lds.offset.table, i32 0, i32 [[TMP1]], i32 0
; TABLE-NEXT: [[TMP2:%.*]] = load i32, ptr addrspace(4) [[BOTH_LDS2]], align 4		; TABLE-NEXT: [[TMP2:%.*]] = load i32, ptr addrspace(4) [[BOTH_LDS2]], align 4
; TABLE-NEXT: [[BOTH_LDS3:%.*]] = inttoptr i32 [[TMP2]] to ptr addrspace(3)		; TABLE-NEXT: [[BOTH_LDS3:%.*]] = inttoptr i32 [[TMP2]] to ptr addrspace(3)
; TABLE-NEXT: [[LD:%.*]] = load i32, ptr addrspace(3) [[BOTH_LDS3]], align 4		; TABLE-NEXT: [[LD:%.*]] = load i32, ptr addrspace(3) [[BOTH_LDS3]], align 4
[14 lines not shown]	;
%mul = mul i32 %ld, 4		%mul = mul i32 %ld, 4
store i32 %mul, ptr addrspace(3) @both.lds		store i32 %mul, ptr addrspace(3) @both.lds
ret void		ret void
}		}

define amdgpu_kernel void @k0_both() {		define amdgpu_kernel void @k0_both() {
; MODULE-LABEL: @k0_both(		; MODULE-LABEL: @k0_both(
; MODULE-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]		; MODULE-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
; MODULE-NEXT: [[LD:%.*]] = load i32, ptr addrspace(3) @llvm.amdgcn.module.lds, align 4, !alias.scope !4, !noalias !0		; MODULE-NEXT: [[LD:%.*]] = load i32, ptr addrspace(3) @llvm.amdgcn.module.lds, align 4, !alias.scope !5, !noalias !1
; MODULE-NEXT: [[MUL:%.*]] = mul i32 [[LD]], 5		; MODULE-NEXT: [[MUL:%.*]] = mul i32 [[LD]], 5
; MODULE-NEXT: store i32 [[MUL]], ptr addrspace(3) @llvm.amdgcn.module.lds, align 4, !alias.scope !4, !noalias !0		; MODULE-NEXT: store i32 [[MUL]], ptr addrspace(3) @llvm.amdgcn.module.lds, align 4, !alias.scope !5, !noalias !1
; MODULE-NEXT: call void @f_both()		; MODULE-NEXT: call void @f_both()
; MODULE-NEXT: ret void		; MODULE-NEXT: ret void
;		;
; TABLE-LABEL: @k0_both(		; TABLE-LABEL: @k0_both(
; TABLE-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k0_both.lds) ]		; TABLE-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k0_both.lds) ]
; TABLE-NEXT: [[LD:%.*]] = load i32, ptr addrspace(3) @llvm.amdgcn.kernel.k0_both.lds, align 4		; TABLE-NEXT: [[LD:%.*]] = load i32, ptr addrspace(3) @llvm.amdgcn.kernel.k0_both.lds, align 4
; TABLE-NEXT: [[MUL:%.*]] = mul i32 [[LD]], 5		; TABLE-NEXT: [[MUL:%.*]] = mul i32 [[LD]], 5
; TABLE-NEXT: store i32 [[MUL]], ptr addrspace(3) @llvm.amdgcn.kernel.k0_both.lds, align 4		; TABLE-NEXT: store i32 [[MUL]], ptr addrspace(3) @llvm.amdgcn.kernel.k0_both.lds, align 4
[16 lines not shown]

llvm/test/CodeGen/AMDGPU/lower-module-lds-via-hybrid.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: opt -S -mtriple=amdgcn--amdhsa -passes=amdgpu-lower-module-lds < %s --amdgpu-lower-module-lds-strategy=hybrid \| FileCheck -check-prefix=OPT %s			; RUN: opt -S -mtriple=amdgcn--amdhsa -passes=amdgpu-lower-module-lds < %s --amdgpu-lower-module-lds-strategy=hybrid \| FileCheck -check-prefix=OPT %s
	; RUN: llc -mtriple=amdgcn--amdhsa -verify-machineinstrs < %s --amdgpu-lower-module-lds-strategy=hybrid \| FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn--amdhsa -verify-machineinstrs < %s --amdgpu-lower-module-lds-strategy=hybrid \| FileCheck -check-prefix=GCN %s

	; Opt checks from utils/update_test_checks.py, llc checks from utils/update_llc_test_checks.py			; Opt checks from utils/update_test_checks.py, llc checks from utils/update_llc_test_checks.py

	; Define four variables and four non-kernel functions which access exactly one variable each			; Define four variables and four non-kernel functions which access exactly one variable each
	@v0 = addrspace(3) global float undef			@v0 = addrspace(3) global float undef
	@v1 = addrspace(3) global i16 undef, align 16			@v1 = addrspace(3) global i16 undef, align 16
	@v2 = addrspace(3) global i64 undef			@v2 = addrspace(3) global i64 undef
	@v3 = addrspace(3) global i8 undef			@v3 = addrspace(3) global i8 undef
	@unused = addrspace(3) global i16 undef			@unused = addrspace(3) global i16 undef

	; OPT: @llvm.amdgcn.module.lds = internal addrspace(3) global %llvm.amdgcn.module.lds.t undef, align 16			; OPT: @llvm.amdgcn.module.lds = internal addrspace(3) global %llvm.amdgcn.module.lds.t undef, align 16, !absolute_symbol !0
	; OPT: @llvm.compiler.used = appending global [1 x ptr] [ptr addrspacecast (ptr addrspace(3) @llvm.amdgcn.module.lds to ptr)], section "llvm.metadata"			; OPT: @llvm.compiler.used = appending global [1 x ptr] [ptr addrspacecast (ptr addrspace(3) @llvm.amdgcn.module.lds to ptr)], section "llvm.metadata"
	; OPT: @llvm.amdgcn.kernel.kernel_no_table.lds = internal addrspace(3) global %llvm.amdgcn.kernel.kernel_no_table.lds.t undef, align 8			; OPT: @llvm.amdgcn.kernel.kernel_no_table.lds = internal addrspace(3) global %llvm.amdgcn.kernel.kernel_no_table.lds.t undef, align 8, !absolute_symbol !0
	; OPT: @llvm.amdgcn.kernel.k01.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k01.lds.t undef, align 4			; OPT: @llvm.amdgcn.kernel.k01.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k01.lds.t undef, align 4, !absolute_symbol !1
	; OPT: @llvm.amdgcn.kernel.k23.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k23.lds.t undef, align 8			; OPT: @llvm.amdgcn.kernel.k23.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k23.lds.t undef, align 8, !absolute_symbol !0
	; OPT: @llvm.amdgcn.kernel.k123.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k123.lds.t undef, align 8			; OPT: @llvm.amdgcn.kernel.k123.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k123.lds.t undef, align 8, !absolute_symbol !2
	; OPT{LITERAL}: @llvm.amdgcn.lds.offset.table = internal addrspace(4) constant [2 x [1 x i32]] [[1 x i32] [i32 ptrtoint (ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds to i32)], [1 x i32] [i32 ptrtoint (ptr addrspace(3) @llvm.amdgcn.kernel.k23.lds to i32)]]			; OPT{LITERAL}: @llvm.amdgcn.lds.offset.table = internal addrspace(4) constant [2 x [1 x i32]] [[1 x i32] [i32 ptrtoint (ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds to i32)], [1 x i32] [i32 ptrtoint (ptr addrspace(3) @llvm.amdgcn.kernel.k23.lds to i32)]]

	;.			;.
	define void @f0() {			define void @f0() {
	; OPT-LABEL: @f0(			; OPT-LABEL: @f0(
	; OPT-NEXT: %ld = load float, ptr addrspace(3) @llvm.amdgcn.kernel.k01.lds, align 4			; OPT-NEXT: %ld = load float, ptr addrspace(3) @llvm.amdgcn.kernel.k01.lds, align 4
	; OPT-NEXT: %mul = fmul float %ld, 2.000000e+00			; OPT-NEXT: %mul = fmul float %ld, 2.000000e+00
	; OPT-NEXT: store float %mul, ptr addrspace(3) @llvm.amdgcn.kernel.k01.lds, align 4			; OPT-NEXT: store float %mul, ptr addrspace(3) @llvm.amdgcn.kernel.k01.lds, align 4
	[201 lines not shown]
	}			}

	; Access and allocate three variables			; Access and allocate three variables
	define amdgpu_kernel void @k123() {			define amdgpu_kernel void @k123() {
	; OPT-LABEL: @k123(			; OPT-LABEL: @k123(
	; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds) ]			; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds) ]
	; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]			; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.module.lds) ]
	; OPT-NEXT: call void @f1()			; OPT-NEXT: call void @f1()
	; OPT-NEXT: %ld = load i8, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k123.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 8, !alias.scope !2, !noalias !5			; OPT-NEXT: %ld = load i8, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k123.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 8, !alias.scope !5, !noalias !8
	; OPT-NEXT: %mul = mul i8 %ld, 8			; OPT-NEXT: %mul = mul i8 %ld, 8
	; OPT-NEXT: store i8 %mul, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k123.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 8, !alias.scope !2, !noalias !5			; OPT-NEXT: store i8 %mul, ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k123.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 8, !alias.scope !5, !noalias !8
	; OPT-NEXT: call void @f2()			; OPT-NEXT: call void @f2()
	; OPT-NEXT: ret void			; OPT-NEXT: ret void
	;			;
	; GCN-LABEL: k123:			; GCN-LABEL: k123:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_mov_b32 s32, 0			; GCN-NEXT: s_mov_b32 s32, 0
	; GCN-NEXT: s_mov_b32 flat_scratch_lo, s7			; GCN-NEXT: s_mov_b32 flat_scratch_lo, s7
	; GCN-NEXT: s_add_i32 s6, s6, s9			; GCN-NEXT: s_add_i32 s6, s6, s9
	[36 lines not shown]
	!2 = !{i32 1}			!2 = !{i32 1}


	;.			;.
	; OPT: attributes #0 = { "amdgpu-elide-module-lds" }			; OPT: attributes #0 = { "amdgpu-elide-module-lds" }
	; OPT: attributes #1 = { nocallback nofree nosync nounwind willreturn memory(none) }			; OPT: attributes #1 = { nocallback nofree nosync nounwind willreturn memory(none) }
	; OPT: attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }			; OPT: attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
	;.			;.
	; OPT: !0 = !{i32 1}			; OPT: !0 = !{i64 0, i64 1}
	; OPT: !1 = !{i32 0}			; OPT: !1 = !{i64 4, i64 5}
	; OPT: !2 = !{!3}			; OPT: !2 = !{i64 8, i64 9}
	; OPT: !3 = distinct !{!3, !4}			; OPT: !3 = !{i32 1}
	; OPT: !4 = distinct !{!4}			; OPT: !4 = !{i32 0}
	; OPT: !5 = !{!6}			; OPT: !5 = !{!6}
	; OPT: !6 = distinct !{!6, !4}			; OPT: !6 = distinct !{!6, !7}
				; OPT: !7 = distinct !{!7}
				; OPT: !8 = !{!9}
				; OPT: !9 = distinct !{!9, !7}
	;.			;.

	; Table size length number-kernels * number-variables * sizeof(uint16_t)			; Table size length number-kernels * number-variables * sizeof(uint16_t)
	; GCN: .type llvm.amdgcn.lds.offset.table,@object			; GCN: .type llvm.amdgcn.lds.offset.table,@object
	; GCN-NEXT: .section .data.rel.ro,#alloc,#write			; GCN-NEXT: .section .data.rel.ro,#alloc,#write
	; GCN-NEXT: .p2align 2, 0x0			; GCN-NEXT: .p2align 2, 0x0
	; GCN-NEXT: llvm.amdgcn.lds.offset.table:			; GCN-NEXT: llvm.amdgcn.lds.offset.table:
	; GCN-NEXT: .long 8			; GCN-NEXT: .long 8
	; GCN-NEXT: .long 0			; GCN-NEXT: .long 0
	; GCN-NEXT: .size llvm.amdgcn.lds.offset.table, 8			; GCN-NEXT: .size llvm.amdgcn.lds.offset.table, 8
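The updated checks in this file can be read together. `@llvm.amdgcn.kernel.k123.lds` carries `!absolute_symbol !2` with `!2 = !{i64 8, i64 9}`, `@llvm.amdgcn.kernel.k23.lds` carries `!absolute_symbol !0` with `!0 = !{i64 0, i64 1}`, and the emitted table is `.long 8` followed by `.long 0`. The sketch below models that relationship, assuming the metadata is a half-open `[lo, hi)` range in which a width of one pins the symbol to a single address. The names `AbsoluteSymbolRange`, `fixedAddress` and `buildOffsetTable` are hypothetical and exist only for illustration; they are not the functions this patch adds.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Half-open [lo, hi) range, as written in the !{i64 lo, i64 hi} nodes above.
struct AbsoluteSymbolRange {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Returns the address when the range admits exactly one value, otherwise
// nothing. Wider ranges only bound the address and are not handled here.
std::optional<std::uint64_t> fixedAddress(AbsoluteSymbolRange r) {
  if (r.hi > r.lo && r.hi - r.lo == 1)
    return r.lo;
  return std::nullopt;
}

// Hypothetical model of the [2 x [1 x i32]] table in this test:
// row = kernel id, column = variable index, entry = LDS address.
std::array<std::array<std::uint32_t, 1>, 2> buildOffsetTable() {
  const AbsoluteSymbolRange k123{8, 9}; // !2 in the updated checks
  const AbsoluteSymbolRange k23{0, 1};  // !0 in the updated checks

  std::array<std::array<std::uint32_t, 1>, 2> table{};
  table[0][0] = static_cast<std::uint32_t>(fixedAddress(k123).value());
  table[1][0] = static_cast<std::uint32_t>(fixedAddress(k23).value());
  return table;
}
```

With these inputs the two rows hold 8 and 0, matching the `.long` directives checked above. The point of the sketch is that the table contents follow directly from the per-variable metadata, with no separate frame recalculation needed to recover the addresses.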

llvm/test/CodeGen/AMDGPU/lower-module-lds-via-table.ll

[9 lines not shown]
@v3 = addrspace(3) global i8 undef		@v3 = addrspace(3) global i8 undef
@unused = addrspace(3) global i16 undef		@unused = addrspace(3) global i16 undef

; OPT: %llvm.amdgcn.kernel.kernel_no_table.lds.t = type { i64 }		; OPT: %llvm.amdgcn.kernel.kernel_no_table.lds.t = type { i64 }
; OPT: %llvm.amdgcn.kernel.k01.lds.t = type { i16, [2 x i8], float }		; OPT: %llvm.amdgcn.kernel.k01.lds.t = type { i16, [2 x i8], float }
; OPT: %llvm.amdgcn.kernel.k23.lds.t = type { i64, i8 }		; OPT: %llvm.amdgcn.kernel.k23.lds.t = type { i64, i8 }
; OPT: %llvm.amdgcn.kernel.k123.lds.t = type { i16, i8, [5 x i8], i64 }		; OPT: %llvm.amdgcn.kernel.k123.lds.t = type { i16, i8, [5 x i8], i64 }

; OPT: @llvm.amdgcn.kernel.kernel_no_table.lds = internal addrspace(3) global %llvm.amdgcn.kernel.kernel_no_table.lds.t undef, align 8		; OPT: @llvm.amdgcn.kernel.kernel_no_table.lds = internal addrspace(3) global %llvm.amdgcn.kernel.kernel_no_table.lds.t undef, align 8, !absolute_symbol !0
; OPT: @llvm.amdgcn.kernel.k01.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k01.lds.t undef, align 16		; OPT: @llvm.amdgcn.kernel.k01.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k01.lds.t undef, align 16, !absolute_symbol !0
; OPT: @llvm.amdgcn.kernel.k23.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k23.lds.t undef, align 8		; OPT: @llvm.amdgcn.kernel.k23.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k23.lds.t undef, align 8, !absolute_symbol !0
; OPT: @llvm.amdgcn.kernel.k123.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k123.lds.t undef, align 16		; OPT: @llvm.amdgcn.kernel.k123.lds = internal addrspace(3) global %llvm.amdgcn.kernel.k123.lds.t undef, align 16, !absolute_symbol !0

; Salient parts of the IR lookup table check:		; Salient parts of the IR lookup table check:
; It has (top level) size 3 as there are 3 kernels that call functions which use lds		; It has (top level) size 3 as there are 3 kernels that call functions which use lds
; The next level down has type [4 x i16] as there are 4 variables accessed by functions which use lds		; The next level down has type [4 x i16] as there are 4 variables accessed by functions which use lds
; The kernel naming pattern and the structs being named after the functions helps verify placement of undef		; The kernel naming pattern and the structs being named after the functions helps verify placement of undef
; The remainder are constant expressions into the variable instances checked above		; The remainder are constant expressions into the variable instances checked above

; OPT{LITERAL}: @llvm.amdgcn.lds.offset.table = internal addrspace(4) constant [3 x [4 x i32]] [[4 x i32] [i32 ptrtoint (ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k01.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k01.lds, i32 0, i32 2) to i32), i32 ptrtoint (ptr addrspace(3) @llvm.amdgcn.kernel.k01.lds to i32), i32 poison, i32 poison], [4 x i32] [i32 poison, i32 ptrtoint (ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds to i32), i32 ptrtoint (ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k123.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 3) to i32), i32 ptrtoint (ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k123.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1) to i32)], [4 x i32] [i32 poison, i32 poison, i32 ptrtoint (ptr addrspace(3) @llvm.amdgcn.kernel.k23.lds to i32), i32 ptrtoint (ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k23.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k23.lds, i32 0, i32 1) to i32)]]		; OPT{LITERAL}: @llvm.amdgcn.lds.offset.table = internal addrspace(4) constant [3 x [4 x i32]] [[4 x i32] [i32 ptrtoint (ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k01.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k01.lds, i32 0, i32 2) to i32), i32 ptrtoint (ptr addrspace(3) @llvm.amdgcn.kernel.k01.lds to i32), i32 poison, i32 poison], [4 x i32] [i32 poison, i32 ptrtoint (ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds to i32), i32 ptrtoint (ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k123.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 3) to i32), i32 ptrtoint (ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k123.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1) to i32)], [4 x i32] [i32 poison, i32 poison, i32 ptrtoint (ptr addrspace(3) @llvm.amdgcn.kernel.k23.lds to i32), i32 ptrtoint (ptr addrspace(3) getelementptr inbounds (%llvm.amdgcn.kernel.k23.lds.t, ptr addrspace(3) @llvm.amdgcn.kernel.k23.lds, i32 0, i32 1) to i32)]]
(183 lines not shown)
; GCN-NEXT: s_endpgm
%ld = load i64, ptr addrspace(3) @v2		%ld = load i64, ptr addrspace(3) @v2
%mul = mul i64 %ld, 8		%mul = mul i64 %ld, 8
store i64 %mul, ptr addrspace(3) @v2		store i64 %mul, ptr addrspace(3) @v2
ret void		ret void
}		}

; Access two variables, will allocate those two		; Access two variables, will allocate those two
define amdgpu_kernel void @k01() {		define amdgpu_kernel void @k01() {
; OPT-LABEL: @k01() !llvm.amdgcn.lds.kernel.id !0 {		; OPT-LABEL: @k01() !llvm.amdgcn.lds.kernel.id !1 {
; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k01.lds) ]		; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k01.lds) ]
; OPT-NEXT: call void @f0()		; OPT-NEXT: call void @f0()
; OPT-NEXT: call void @f1()		; OPT-NEXT: call void @f1()
; OPT-NEXT: ret void		; OPT-NEXT: ret void
;		;
; GCN-LABEL: k01:		; GCN-LABEL: k01:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_mov_b32 s32, 0		; GCN-NEXT: s_mov_b32 s32, 0
(21 lines not shown)
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
; GCN: .amdhsa_group_segment_fixed_size 8		; GCN: .amdhsa_group_segment_fixed_size 8
call void @f0()		call void @f0()
call void @f1()		call void @f1()
ret void		ret void
}		}

define amdgpu_kernel void @k23() {		define amdgpu_kernel void @k23() {
; OPT-LABEL: @k23() !llvm.amdgcn.lds.kernel.id !1 {		; OPT-LABEL: @k23() !llvm.amdgcn.lds.kernel.id !2 {
; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k23.lds) ]		; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k23.lds) ]
; OPT-NEXT: call void @f2()		; OPT-NEXT: call void @f2()
; OPT-NEXT: call void @f3()		; OPT-NEXT: call void @f3()
; OPT-NEXT: ret void		; OPT-NEXT: ret void
;		;
; GCN-LABEL: k23:		; GCN-LABEL: k23:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_mov_b32 s32, 0		; GCN-NEXT: s_mov_b32 s32, 0
(22 lines not shown)
; GCN: .amdhsa_group_segment_fixed_size 16		; GCN: .amdhsa_group_segment_fixed_size 16
call void @f2()		call void @f2()
call void @f3()		call void @f3()
ret void		ret void
}		}

; Access and allocate three variables		; Access and allocate three variables
define amdgpu_kernel void @k123() {		define amdgpu_kernel void @k123() {
; OPT-LABEL: @k123() !llvm.amdgcn.lds.kernel.id !2 {		; OPT-LABEL: @k123() !llvm.amdgcn.lds.kernel.id !3 {
; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds) ]		; OPT-NEXT: call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds) ]
; OPT-NEXT: call void @f1()		; OPT-NEXT: call void @f1()
; OPT-NEXT: [[LD:%.*]] = load i8, ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_KERNEL_K123_LDS_T:%.*]], ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 2, !alias.scope !3, !noalias !6		; OPT-NEXT: [[LD:%.*]] = load i8, ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_KERNEL_K123_LDS_T:%.*]], ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 2, !alias.scope !4, !noalias !7
; OPT-NEXT: [[MUL:%.*]] = mul i8 [[LD]], 8		; OPT-NEXT: [[MUL:%.*]] = mul i8 [[LD]], 8
; OPT-NEXT: store i8 [[MUL]], ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_KERNEL_K123_LDS_T]], ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 2, !alias.scope !3, !noalias !6		; OPT-NEXT: store i8 [[MUL]], ptr addrspace(3) getelementptr inbounds ([[LLVM_AMDGCN_KERNEL_K123_LDS_T]], ptr addrspace(3) @llvm.amdgcn.kernel.k123.lds, i32 0, i32 1), align 2, !alias.scope !4, !noalias !7
; OPT-NEXT: call void @f2()		; OPT-NEXT: call void @f2()
; OPT-NEXT: ret void		; OPT-NEXT: ret void
;		;
; GCN-LABEL: k123:		; GCN-LABEL: k123:
; GCN: ; %bb.0:		; GCN: ; %bb.0:
; GCN-NEXT: s_mov_b32 s32, 0		; GCN-NEXT: s_mov_b32 s32, 0
; GCN-NEXT: s_mov_b32 flat_scratch_lo, s7		; GCN-NEXT: s_mov_b32 flat_scratch_lo, s7
; GCN-NEXT: s_add_i32 s6, s6, s9		; GCN-NEXT: s_add_i32 s6, s6, s9
(29 lines not shown)
; GCN: .amdhsa_group_segment_fixed_size 16
store i8 %mul, ptr addrspace(3) @v3		store i8 %mul, ptr addrspace(3) @v3
call void @f2()		call void @f2()
ret void		ret void
}		}


; OPT: declare i32 @llvm.amdgcn.lds.kernel.id()		; OPT: declare i32 @llvm.amdgcn.lds.kernel.id()

!0 = !{i32 0}		!0 = !{i64 0, i64 1}
!1 = !{i32 2}		!1 = !{i32 0}
!2 = !{i32 1}		!2 = !{i32 2}
		!3 = !{i32 1}


; Table size length number-kernels * number-variables * sizeof(uint16_t)		; Table size length number-kernels * number-variables * sizeof(uint16_t)
; GCN: .type llvm.amdgcn.lds.offset.table,@object		; GCN: .type llvm.amdgcn.lds.offset.table,@object
; GCN-NEXT: .section .data.rel.ro,#alloc,#write		; GCN-NEXT: .section .data.rel.ro,#alloc,#write
; GCN-NEXT: .p2align 4, 0x0		; GCN-NEXT: .p2align 4, 0x0
; GCN-NEXT: llvm.amdgcn.lds.offset.table:		; GCN-NEXT: llvm.amdgcn.lds.offset.table:
; GCN-NEXT: .long 0+4		; GCN-NEXT: .long 0+4
(12 lines not shown)