This is an archive of the discontinued LLVM Phabricator instance.


Differential D95071

[AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses.
Closed (Public)

Authored by cdevadas on Jan 20 2021, 10:33 AM.


Details

Reviewers

arsenm
scott.linder

Commits

rGff8a1cae1814: [AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses.

Summary

During instruction selection, there is an inconsistency in choosing
the initial soffset value. Certain early passes then modify this
value, which required additional fixups in eliminateFrameIndex to
make all cases work. This whole transformation looks trivial and can
be handled better.

This patch clearly defines the initial value for soffset and keeps it
unchanged until eliminateFrameIndex. The initial value must be zero
for a MUBUF with a frame index. The non-frame-index MUBUF forms that
use a raw offset from SP will have the stack register as soffset.
During frame elimination, soffset remains zero for entry functions
with no dynamic allocas and no call sites; otherwise it is updated to
the appropriate frame/stack register.
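A minimal sketch of the soffset rule described above, under a deliberately simplified model: SOffsetOperand, FrameFacts, chooseFrameReg and the register names are hypothetical stand-ins, not LLVM APIs. Selection always emits 0, and frame elimination only swaps in a register when one is needed, with std::nullopt playing the role of AMDGPU::NoRegister.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of a MUBUF soffset operand: a register name when Reg is
// set, otherwise the immediate Imm.
struct SOffsetOperand {
  std::optional<std::string> Reg;
  long Imm = 0;
  bool isImmZero() const { return !Reg && Imm == 0; }
};

// Hypothetical summary of what frame lowering knows about the function.
struct FrameFacts {
  bool IsEntryFunction;
  bool HasDynamicAllocas;
  bool HasCalls;
  std::string FrameOrStackReg; // illustrative register name, e.g. "$sgpr32"
};

// Instruction selection: a frame-index MUBUF always starts with soffset = 0.
SOffsetOperand selectFrameIndexSOffset() { return SOffsetOperand{}; }

// Per the summary: no frame register is needed only for an entry function
// with no dynamic allocas and no call sites. std::nullopt stands in for
// AMDGPU::NoRegister.
std::optional<std::string> chooseFrameReg(const FrameFacts &F) {
  if (F.IsEntryFunction && !F.HasDynamicAllocas && !F.HasCalls)
    return std::nullopt;
  return F.FrameOrStackReg;
}

// Frame elimination: soffset must still be 0; it becomes a register only if
// a frame register was chosen.
void eliminateFrameIndexSOffset(SOffsetOperand &Op, const FrameFacts &F) {
  assert(Op.isImmZero() && "soffset changed before frame elimination");
  if (std::optional<std::string> FrameReg = chooseFrameReg(F))
    Op.Reg = *FrameReg;
}
```

For example, an entry function with no calls and no dynamic allocas keeps soffset = 0, while a callee ends up with whatever register is supplied as FrameOrStackReg.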

Also cleaned up some code and made all asserts around soffset
stricter to match.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

cdevadas created this revision. Jan 20 2021, 10:33 AM

Herald added subscribers: kerbowa, arphaman, hiraditya and 7 others. Jan 20 2021, 10:33 AM

cdevadas requested review of this revision. Jan 20 2021, 10:33 AM

Herald added a project: Restricted Project. Jan 20 2021, 10:33 AM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B85929: Diff 317931. Jan 20 2021, 12:21 PM

Thank you for the patch, I definitely think this is a step in the right direction! However, I think we are still broken in common cases. The first can be seen by simply removing amdgpu_kernel from llvm/test/CodeGen/AMDGPU/stack-pointer-offset-relative-frameindex.ll and noticing that the soffset operands of the buffer_load_dword instructions in %bb.1 do not correctly refer to the frame pointer; they are constant 0, just like in the kernel case:

$ sed 's/amdgpu_kernel//' llvm/test/CodeGen/AMDGPU/stack-pointer-offset-relative-frameindex.ll | release/bin/llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs | grep '%bb' -A3
; %bb.0:                                ; %entry
        s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
        s_waitcnt_vscnt null, 0x0
        s_or_saveexec_b32 s4, -1
--
; %bb.1:                                ; %if.then4.i
        s_clause 0x1
        buffer_load_dword v0, v40, s[0:3], 0 offen
        buffer_load_dword v1, v40, s[0:3], 0 offen offset:4

This was true before this patch, and also true with a slightly different result back when the FIXME was first added:

$ git switch --detach 60b1967c3933c && ninja -C release/ llc && sed 's/amdgpu_kernel//' llvm/test/CodeGen/AMDGPU/stack-pointer-offset-relative-frameindex.ll | release/bin/llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs | grep '%bb' -A3
HEAD is now at 60b1967c3933 [AMDGPU] Add Scratch Wave Offset to Scratch Buffer Descriptor in entry functions
ninja: Entering directory `release/'
ninja: no work to do.
; %bb.0:                                ; %entry
        s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
        s_waitcnt_vscnt null, 0x0
        s_or_saveexec_b32 s4, -1
--
; %bb.1:                                ; %if.then4.i
        buffer_load_dword v0, v32, s[0:3], s32 offen
        buffer_load_dword v1, v32, s[0:3], s32 offen offset:4
        s_waitcnt vmcnt(0)

The fundamental issue in all of these cases is that we still rely on eliminateFrameIndex to correct the soffset of a MUBUF instruction, even though it is not guaranteed that the frame index will survive to that point in its original place (i.e. as an operand of the MUBUF we care about updating). From the comment, LocalStackSlotAllocation seems to be one pass that can move the frame index. One mitigating factor is that another pass also tries to fold the frame index back into the MUBUF when possible, but since we can't do this in all cases, we can't rely on it to generate correct code.

I think using a pseudo for these MUBUF cases early, and lowering it around the same time as eliminateFrameIndex (i.e. at some point after we know how to populate the soffset field of the corresponding "real" MUBUF instruction), should work in all cases. IIRC this was the approach Matt liked back when the FIXME was added.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
495–502	Not exactly related to this patch, but I feel like I must be reading this wrong. Why is there a different assert and an early return for DEBUG builds? If isLegalFLATOffset implies isLegalMUBUFImmOffset then this is just an additional, stricter check to what is below, but it still seems odd to have a `return` here then. Why not just fall through in the DEBUG case instead of copy-pasting the code?
llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll
19	I don't see the connection to the patch at hand, is this just unrelated cleanup or does this patch affect the dead code here?

In D95071#2511070, @scott.linder wrote:

Thank you for the patch, I definitely think this is a step in the right direction! However, I think we are still broken in common cases. The first can be seen by simply removing amdgpu_kernel from llvm/test/CodeGen/AMDGPU/stack-pointer-offset-relative-frameindex.ll and noticing that the soffset operands of the buffer_load_dword instructions in %bb.1 do not correctly refer to the frame pointer; they are constant 0, just like in the kernel case:

The frame index being separated from the memory operation is fine. 0 is always an acceptable soffset, kernel or not: the vaddr will be interpreted as the absolute address. If the frame index is materialized by a move, it will be expanded such that the address placed in the MUBUF vaddr should work.
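To see why 0 is a safe soffset even outside entry functions, here is a simplified arithmetic model of a scratch MUBUF address (resource base + soffset + vaddr + immediate offset). It ignores per-lane swizzling and the wave-size scaling applied to SP/FP, and all helper names are hypothetical; it only illustrates that adding the frame register through soffset, or folding it into the vaddr value, gives the same address.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical model of a scratch MUBUF address: resource base +
// soffset + vaddr (only with offen) + immediate offset. Real scratch
// addressing also swizzles per lane and scales SP/FP by the wave size; both
// are ignored here.
uint64_t mubufScratchAddress(uint64_t RsrcBase, uint32_t SOffset,
                             uint32_t VAddr, uint32_t ImmOffset) {
  return RsrcBase + SOffset + VAddr + ImmOffset;
}

// Frame index still on the MUBUF: frame elimination puts the frame register
// in soffset and adds the object's offset to the immediate (no vaddr base).
uint64_t foldedForm(uint64_t Base, uint32_t FrameReg, uint32_t ObjOffset,
                    uint32_t Imm) {
  return mubufScratchAddress(Base, FrameReg, /*VAddr=*/0, Imm + ObjOffset);
}

// Frame index materialized separately (e.g. by a move): vaddr already holds
// frame register + object offset, so soffset stays 0.
uint64_t materializedForm(uint64_t Base, uint32_t FrameReg,
                          uint32_t ObjOffset, uint32_t Imm) {
  uint32_t AbsoluteVAddr = FrameReg + ObjOffset;
  return mubufScratchAddress(Base, /*SOffset=*/0, AbsoluteVAddr, Imm);
}
```

With Base = 0x1000, a frame register value of 0x200, an object offset of 16 and an immediate of 4, both forms compute 4628 (0x1214).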

In D95071#2511105, @arsenm wrote:

The frame index being separated from the memory operation is fine. 0 is always an acceptable soffset, kernel or not: the vaddr will be interpreted as the absolute address. If the frame index is materialized by a move, it will be expanded such that the address placed in the MUBUF vaddr should work.

OK, that makes sense, sorry for the noise!

I'm not sure why I didn't consider the code right above the MUBUF handling in eliminateFrameIndex before assuming this wasn't correct. I'm also not sure why I didn't just read the ISA for the function case more carefully.

I'm satisfied, then, that starting with 0 for the frame-index case and requiring it to still be 0 in eliminateFrameIndex is a sound approach.

cdevadas added inline comments. Jan 21 2021, 3:02 AM

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
495–502	The flat scratch code was an add-on to the existing code that earlier handled MUBUF instructions alone. Looks like the early return was meant to skip the soffset validation.
llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll
19	This is a completely unrelated cleanup. This test was added as part of D87472; I noticed the dead argument and thought I could remove it. Let me know if this should go in as a separate NFC commit.

I'd recommend committing the NFC changes before this; there's no need for a separate review.

Otherwise this LGTM, thank you!

This revision is now accepted and ready to land. Jan 21 2021, 9:01 AM

Closed by commit rGff8a1cae1814: [AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses. (authored by cdevadas). Jan 22 2021, 12:54 AM

This revision was automatically updated to reflect the committed changes.

cdevadas added a commit: rGff8a1cae1814: [AMDGPU] Fix the inconsistency in soffset for MUBUF stack accesses.

Revision Contents

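Path (size):
llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp (4 lines)
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp (22 lines)
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (8 lines)
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (21 lines)
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-private.mir (8 lines)
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-store-private.mir (4 lines)
llvm/test/CodeGen/AMDGPU/amdpal-callable.ll (4 lines)
llvm/test/CodeGen/AMDGPU/fold-fi-mubuf.mir (32 lines)
llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll (4 lines)
llvm/test/CodeGen/AMDGPU/stack-pointer-offset-relative-frameindex.ll (14 lines)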

Diff 317931

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

@@ ... @@
 std::pair<SDValue, SDValue> AMDGPUDAGToDAGISel::foldFrameIndex(SDValue N) const {
   SDLoc DL(N);

   auto *FI = dyn_cast<FrameIndexSDNode>(N);
   SDValue TFI =
       FI ? CurDAG->getTargetFrameIndex(FI->getIndex(), FI->getValueType(0)) : N;

   // We rebase the base address into an absolute stack address and hence
-  // use constant 0 for soffset.
+  // use constant 0 for soffset. This value must be retained until
+  // frame elimination and eliminateFrameIndex will choose the appropriate
+  // frame register if need be.
   return std::make_pair(TFI, CurDAG->getTargetConstant(0, DL, MVT::i32));
 }

 bool AMDGPUDAGToDAGISel::SelectMUBUFScratchOffen(SDNode *Parent,
                                                  SDValue Addr, SDValue &Rsrc,
                                                  SDValue &VAddr, SDValue &SOffset,
                                                  SDValue &ImmOffset) const {

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

@@ ... @@ if (mi_match(Root.getReg(), *MRI, m_ICst(Offset)) &&
     return {{[=](MachineInstrBuilder &MIB) { // rsrc
                MIB.addReg(Info->getScratchRSrcReg());
              },
              [=](MachineInstrBuilder &MIB) { // vaddr
                MIB.addReg(HighBits);
              },
              [=](MachineInstrBuilder &MIB) { // soffset
-               const MachineMemOperand *MMO = *MI->memoperands_begin();
-               const MachinePointerInfo &PtrInfo = MMO->getPointerInfo();
-
-               if (isStackPtrRelative(PtrInfo))
-                 MIB.addReg(Info->getStackPtrOffsetReg());
-               else
-                 MIB.addImm(0);
+               // Use constant zero for soffset and rely on eliminateFrameIndex
+               // to choose the appropriate frame register if need be.
+               MIB.addImm(0);
              },
              [=](MachineInstrBuilder &MIB) { // offset
                MIB.addImm(Offset & 4095);
              }}};
   }

   assert(Offset == 0 || Offset == -1);

@@ ... @@ return {{[=](MachineInstrBuilder &MIB) { // rsrc
           },
           [=](MachineInstrBuilder &MIB) { // vaddr
             if (FI.hasValue())
               MIB.addFrameIndex(FI.getValue());
             else
               MIB.addReg(VAddr);
           },
           [=](MachineInstrBuilder &MIB) { // soffset
-            // If we don't know this private access is a local stack object, it
-            // needs to be relative to the entry point's scratch wave offset.
-            // TODO: Should split large offsets that don't fit like above.
-            // TODO: Don't use scratch wave offset just because the offset
-            // didn't fit.
-            if (!Info->isEntryFunction() && FI.hasValue())
-              MIB.addReg(Info->getStackPtrOffsetReg());
-            else
-              MIB.addImm(0);
+            // Use constant zero for soffset and rely on eliminateFrameIndex
+            // to choose the appropriate frame register if need be.
+            MIB.addImm(0);
           },
           [=](MachineInstrBuilder &MIB) { // offset
             MIB.addImm(Offset);
           }}};
 }

 bool AMDGPUInstructionSelector::isDSOffsetLegal(Register Base,
                                                 int64_t Offset) const {

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

@@ ... @@ if (TII->isMUBUF(*UseMI)) {
       if (TII->getNamedOperand(*UseMI, AMDGPU::OpName::srsrc)->getReg() !=
           MFI->getScratchRSrcReg())
         return;

       // Ensure this is either relative to the current frame or the current
       // wave.
       MachineOperand &SOff =
           *TII->getNamedOperand(*UseMI, AMDGPU::OpName::soffset);
-      if ((!SOff.isReg() || SOff.getReg() != MFI->getStackPtrOffsetReg()) &&
-          (!SOff.isImm() || SOff.getImm() != 0))
+      if (!SOff.isImm() || SOff.getImm() != 0)
         return;
-
-      // If this is relative to the current wave, update it to be relative to
-      // the current frame.
-      if (SOff.isImm())
-        SOff.ChangeToRegister(MFI->getStackPtrOffsetReg(), false);
     }

     // A frame index will resolve to a positive constant, so it should always be
     // safe to fold the addressing mode, even pre-GFX9.
     UseMI->getOperand(UseOpIdx).ChangeToFrameIndex(OpToFold.getIndex());

     if (TII->isFLATScratch(*UseMI) &&
         AMDGPU::getNamedOperandIdx(UseMI->getOpcode(),
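The SIFoldOperands change narrows when a frame index may be folded into a MUBUF. A hypothetical sketch of the old and new checks, modelling the soffset operand as either a register name or an immediate (this is not the real MachineOperand API, and the register name is illustrative):

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the soffset operand seen by SIFoldOperands.
struct SOffsetOperand {
  std::optional<std::string> Reg; // register name, if the operand is a register
  long Imm = 0;                   // immediate value otherwise
};

// Rule before the patch: accept the stack pointer register or immediate 0,
// and rewrite an immediate 0 into the stack pointer register.
bool mayFoldFrameIndexOld(SOffsetOperand &Op, const std::string &StackPtrReg) {
  bool IsStackPtr = Op.Reg && *Op.Reg == StackPtrReg;
  bool IsZero = !Op.Reg && Op.Imm == 0;
  if (!IsStackPtr && !IsZero)
    return false;
  if (!Op.Reg)
    Op.Reg = StackPtrReg; // "relative to the wave" -> "relative to the frame"
  return true;
}

// Rule after the patch: accept only immediate 0 and leave it unchanged, so
// eliminateFrameIndex still sees 0.
bool mayFoldFrameIndexNew(const SOffsetOperand &Op) {
  return !Op.Reg && Op.Imm == 0;
}
```

In the new scheme an immediate-0 soffset passes and stays 0, while an soffset that is already the stack pointer register is no longer folded.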

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

@@ ... @@ #endif
   MachineOperand *FIOp =
       TII->getNamedOperand(MI, IsFlat ? AMDGPU::OpName::saddr
                                       : AMDGPU::OpName::vaddr);

   MachineOperand *OffsetOp = TII->getNamedOperand(MI, AMDGPU::OpName::offset);
   int64_t NewOffset = OffsetOp->getImm() + Offset;

 #ifndef NDEBUG
-  MachineBasicBlock *MBB = MI.getParent();
-  MachineFunction *MF = MBB->getParent();
   assert(FIOp && FIOp->isFI() && "frame index must be address operand");
   assert(TII->isMUBUF(MI) || TII->isFLATScratch(MI));

   if (IsFlat) {
     assert(TII->isLegalFLATOffset(NewOffset, AMDGPUAS::PRIVATE_ADDRESS, true) &&
            "offset should be legal");
     FIOp->ChangeToRegister(BaseReg, false);
     OffsetOp->setImm(NewOffset);
     return;
   }

   MachineOperand *SOffset = TII->getNamedOperand(MI, AMDGPU::OpName::soffset);
-  assert((SOffset->isReg() &&
-          SOffset->getReg() ==
-              MF->getInfo<SIMachineFunctionInfo>()->getStackPtrOffsetReg()) ||
-         (SOffset->isImm() && SOffset->getImm() == 0));
+  assert(SOffset->isImm() && SOffset->getImm() == 0);
 #endif

   assert(SIInstrInfo::isLegalMUBUFImmOffset(NewOffset) &&
          "offset should be legal");

   FIOp->ChangeToRegister(BaseReg, false);
   OffsetOp->setImm(NewOffset);
 }
@@ ... @@ default: {
       if (IsMUBUF) {
         // Disable offen so we don't need a 0 vgpr base.
         assert(static_cast<int>(FIOperandNum) ==
                AMDGPU::getNamedOperandIdx(MI->getOpcode(),
                                           AMDGPU::OpName::vaddr));

         auto &SOffset = *TII->getNamedOperand(*MI, AMDGPU::OpName::soffset);
-        assert((SOffset.isReg() &&
-                SOffset.getReg() == MFI->getStackPtrOffsetReg()) ||
-               (SOffset.isImm() && SOffset.getImm() == 0));
-        if (SOffset.isReg()) {
-          if (FrameReg == AMDGPU::NoRegister) {
-            SOffset.ChangeToImmediate(0);
-          } else {
-            SOffset.setReg(FrameReg);
-          }
-        } else if (SOffset.isImm() && FrameReg != AMDGPU::NoRegister) {
+        assert((SOffset.isImm() && SOffset.getImm() == 0));
+        if (FrameReg != AMDGPU::NoRegister)
           SOffset.ChangeToRegister(FrameReg, false);
-        }

         int64_t Offset = FrameInfo.getObjectOffset(Index);
         int64_t OldImm
           = TII->getNamedOperand(*MI, AMDGPU::OpName::offset)->getImm();
         int64_t NewOffset = OldImm + Offset;

         if (SIInstrInfo::isLegalMUBUFImmOffset(NewOffset) &&
             buildMUBUFOffsetLoadStore(ST, FrameInfo, MI, Index, NewOffset)) {
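The eliminateFrameIndex hunk replaces a four-way rewrite of soffset with a single rule. A small, hypothetical model (SOffsetOp, FrameRegT and the two functions are illustrative, not LLVM code) shows that, for the only input the new assert allows (immediate 0), the old and new logic produce the same operand:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the soffset MachineOperand: a register name when
// Reg is set, otherwise the immediate Imm.
struct SOffsetOp {
  std::optional<std::string> Reg;
  long Imm = 0;
  bool operator==(const SOffsetOp &O) const {
    return Reg == O.Reg && Imm == O.Imm;
  }
};

// std::nullopt plays the role of AMDGPU::NoRegister.
using FrameRegT = std::optional<std::string>;

// Old logic: soffset may arrive as the SP register or as immediate 0.
void rewriteOld(SOffsetOp &Op, const FrameRegT &FrameReg) {
  if (Op.Reg) {
    if (!FrameReg) {
      Op.Reg.reset(); // ChangeToImmediate(0)
      Op.Imm = 0;
    } else {
      Op.Reg = FrameReg; // setReg(FrameReg)
    }
  } else if (Op.Imm == 0 && FrameReg) {
    Op.Reg = FrameReg; // ChangeToRegister(FrameReg)
  }
}

// New logic: soffset must arrive as immediate 0.
void rewriteNew(SOffsetOp &Op, const FrameRegT &FrameReg) {
  assert(!Op.Reg && Op.Imm == 0 && "soffset must still be 0");
  if (FrameReg)
    Op.Reg = FrameReg;
}
```

Passing the stack pointer register as the incoming soffset, which the old code accepted, now trips the assert; that is the stricter checking the summary mentions.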

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-private.mir

@@ ... @@ machineFunctionInfo:
   stackPtrOffsetReg: $sgpr32
 stack:
   - { id: 0, size: 4, alignment: 4 }

 body: |
   bb.0:

     ; GFX6-LABEL: name: load_private_s32_from_fi
-    ; GFX6: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
+    ; GFX6: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
     ; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
     ; GFX9-LABEL: name: load_private_s32_from_fi
-    ; GFX9: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
+    ; GFX9: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (load 4, addrspace 5)
     ; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
     %0:vgpr(p5) = G_FRAME_INDEX %stack.0
     %1:vgpr(s32) = G_LOAD %0 :: (load 4, align 4, addrspace 5)
     $vgpr0 = COPY %1

 ...

 ---
 name: load_private_s32_from_1_fi_offset_4095
 legalized: true
 regBankSelected: true
 tracksRegLiveness: true
 machineFunctionInfo:
   scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
   stackPtrOffsetReg: $sgpr32
 stack:
   - { id: 0, size: 4096, alignment: 4 }

 body: |
   bb.0:

     ; GFX6-LABEL: name: load_private_s32_from_1_fi_offset_4095
-    ; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
+    ; GFX6: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
     ; GFX6: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
     ; GFX9-LABEL: name: load_private_s32_from_1_fi_offset_4095
-    ; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
+    ; GFX9: [[BUFFER_LOAD_UBYTE_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_UBYTE_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (load 1, addrspace 5)
     ; GFX9: $vgpr0 = COPY [[BUFFER_LOAD_UBYTE_OFFEN]]
     %0:vgpr(p5) = G_FRAME_INDEX %stack.0
     %1:vgpr(s32) = G_CONSTANT i32 4095
     %2:vgpr(p5) = G_PTR_ADD %0, %1
     %3:vgpr(s32) = G_LOAD %2 :: (load 1, align 1, addrspace 5)
     $vgpr0 = COPY %3

 ...

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-store-private.mir

@@ ... @@
 stack:
   - { id: 0, size: 4096, alignment: 4 }

 body: |
   bb.0:

     ; GFX6-LABEL: name: function_store_private_s32_to_1_fi_offset_4095
     ; GFX6: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
-    ; GFX6: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
+    ; GFX6: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
     ; GFX9-LABEL: name: function_store_private_s32_to_1_fi_offset_4095
     ; GFX9: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
-    ; GFX9: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
+    ; GFX9: BUFFER_STORE_BYTE_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 4095, 0, 0, 0, 0, 0, implicit $exec :: (store 1, addrspace 5)
     %0:vgpr(p5) = G_FRAME_INDEX %stack.0
     %1:vgpr(s32) = G_CONSTANT i32 4095
     %2:vgpr(p5) = G_PTR_ADD %0, %1
     %3:vgpr(s32) = G_CONSTANT i32 0
     G_STORE %3, %2 :: (store 1, align 1, addrspace 5)

 ...

llvm/test/CodeGen/AMDGPU/amdpal-callable.ll

@@ ... @@
 }

 attributes #0 = { nounwind }

 ; GCN: amdpal.pipelines:
 ; GCN-NEXT: - .registers:
 ; SDAG-NEXT: 0x2e12 (COMPUTE_PGM_RSRC1): 0xaf01ca{{$}}
 ; SDAG-NEXT: 0x2e13 (COMPUTE_PGM_RSRC2): 0x8001{{$}}
-; GISEL-NEXT: 0x2e12 (COMPUTE_PGM_RSRC1): 0xaf01cf{{$}}
+; GISEL-NEXT: 0x2e12 (COMPUTE_PGM_RSRC1): 0xaf01ca{{$}}
 ; GISEL-NEXT: 0x2e13 (COMPUTE_PGM_RSRC2): 0x8001{{$}}
 ; GCN-NEXT: .shader_functions:
 ; GCN-NEXT: dynamic_stack:
 ; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
 ; GCN-NEXT: dynamic_stack_loop:
 ; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
 ; GCN-NEXT: multiple_stack:
 ; GCN-NEXT: .stack_frame_size_in_bytes: 0x24{{$}}
 ; GCN-NEXT: no_stack:
 ; GCN-NEXT: .stack_frame_size_in_bytes: 0{{$}}
 ; GCN-NEXT: no_stack_call:
 ; GCN-NEXT: .stack_frame_size_in_bytes: 0{{$}}
 ; GCN-NEXT: no_stack_extern_call:
 ; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
 ; GCN-NEXT: no_stack_extern_call_many_args:
 ; SDAG-NEXT: .stack_frame_size_in_bytes: 0x90{{$}}
-; GISEL-NEXT: .stack_frame_size_in_bytes: 0xd0{{$}}
+; GISEL-NEXT: .stack_frame_size_in_bytes: 0x90{{$}}
 ; GCN-NEXT: no_stack_indirect_call:
 ; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
 ; GCN-NEXT: simple_lds:
 ; GCN-NEXT: .stack_frame_size_in_bytes: 0{{$}}
 ; GCN-NEXT: simple_lds_recurse:
 ; GCN-NEXT: .stack_frame_size_in_bytes: 0x10{{$}}
 ; GCN-NEXT: simple_stack:
 ; GCN-NEXT: .stack_frame_size_in_bytes: 0x14{{$}}

llvm/test/CodeGen/AMDGPU/fold-fi-mubuf.mir

Show First 20 Lines • Show All 48 Lines • ▼ Show 20 Lines
body: \|		body: \|
bb.0:		bb.0:
liveins: $sgpr12_sgpr13_sgpr14_sgpr15		liveins: $sgpr12_sgpr13_sgpr14_sgpr15

; GCN-LABEL: name: kernel_no_fold_fi_non_stack_rsrc		; GCN-LABEL: name: kernel_no_fold_fi_non_stack_rsrc
; GCN: liveins: $sgpr12_sgpr13_sgpr14_sgpr15		; GCN: liveins: $sgpr12_sgpr13_sgpr14_sgpr15
; GCN: [[COPY:%[0-9]+]]:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15		; GCN: [[COPY:%[0-9]+]]:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec		; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
; GCN: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[V_MOV_B32_e32_]], [[COPY]], $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec		; GCN: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[V_MOV_B32_e32_]], [[COPY]], 0, 0, 0, 0, 0, 0, 0, implicit $exec
; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]		; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]
; GCN: SI_RETURN_TO_EPILOG $vgpr0		; GCN: SI_RETURN_TO_EPILOG $vgpr0
%0:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15		%0:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
%2:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec		%2:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
%3:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN %2, %0, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec		%3:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN %2, %0, 0, 0, 0, 0, 0, 0, 0, implicit $exec
$vgpr0 = COPY %3		$vgpr0 = COPY %3
SI_RETURN_TO_EPILOG $vgpr0		SI_RETURN_TO_EPILOG $vgpr0

...		...

---		---
name: kernel_no_fold_fi_non_stack_soffset		name: kernel_no_fold_fi_non_stack_soffset
tracksRegLiveness: true		tracksRegLiveness: true
Show All 39 Lines	machineFunctionInfo:
isEntryFunction: true		isEntryFunction: true
scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'		scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
stackPtrOffsetReg: '$sgpr32'		stackPtrOffsetReg: '$sgpr32'
body: \|		body: \|
bb.0:		bb.0:

; GCN-LABEL: name: kernel_fold_fi_mubuf		; GCN-LABEL: name: kernel_fold_fi_mubuf
; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec		; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec
; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec		; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec		; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]		; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
; GCN: S_ENDPGM 0, implicit $vgpr0		; GCN: S_ENDPGM 0, implicit $vgpr0
%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec		%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec		%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec

BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec		BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec		%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
$vgpr0 = COPY %2		$vgpr0 = COPY %2
S_ENDPGM 0, implicit $vgpr0		S_ENDPGM 0, implicit $vgpr0

...		...


# Functions have an unswizzled SP/FP relative to the wave offset		# Functions have an unswizzled SP/FP relative to the wave offset
---		---
▲ Show 20 Lines • Show All 45 Lines • ▼ Show 20 Lines
body: \|		body: \|
bb.0:		bb.0:
liveins: $sgpr12_sgpr13_sgpr14_sgpr15		liveins: $sgpr12_sgpr13_sgpr14_sgpr15

; GCN-LABEL: name: function_no_fold_fi_non_stack_rsrc		; GCN-LABEL: name: function_no_fold_fi_non_stack_rsrc
; GCN: liveins: $sgpr12_sgpr13_sgpr14_sgpr15		; GCN: liveins: $sgpr12_sgpr13_sgpr14_sgpr15
; GCN: [[COPY:%[0-9]+]]:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15		; GCN: [[COPY:%[0-9]+]]:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec		; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
; GCN: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[V_MOV_B32_e32_]], [[COPY]], $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec		; GCN: [[BUFFER_LOAD_DWORD_IDXEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN [[V_MOV_B32_e32_]], [[COPY]], 0, 0, 0, 0, 0, 0, 0, implicit $exec
; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]		; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_IDXEN]]
; GCN: SI_RETURN_TO_EPILOG $vgpr0		; GCN: SI_RETURN_TO_EPILOG $vgpr0
%0:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15		%0:sgpr_128 = COPY $sgpr12_sgpr13_sgpr14_sgpr15
%2:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec		%2:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
%3:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN %2, %0, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec		%3:vgpr_32 = BUFFER_LOAD_DWORD_IDXEN %2, %0, 0, 0, 0, 0, 0, 0, 0, implicit $exec
$vgpr0 = COPY %3		$vgpr0 = COPY %3
SI_RETURN_TO_EPILOG $vgpr0		SI_RETURN_TO_EPILOG $vgpr0

...		...

---		---
name: function_no_fold_fi_non_stack_soffset		name: function_no_fold_fi_non_stack_soffset
tracksRegLiveness: true		tracksRegLiveness: true
frameInfo:		frameInfo:
maxAlignment: 4		maxAlignment: 4
localFrameSize: 4		localFrameSize: 4
stack:		stack:
- { id: 0, size: 4, alignment: 4, local-offset: 0 }		- { id: 0, size: 4, alignment: 4, local-offset: 0 }
machineFunctionInfo:		machineFunctionInfo:
isEntryFunction: false		isEntryFunction: false
scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'		scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
frameOffsetReg: '$sgpr32'		frameOffsetReg: '$sgpr32'
stackPtrOffsetReg: '$sgpr32'		stackPtrOffsetReg: '$sgpr32'
body: |		body: |
bb.0:		bb.0:

; GCN-LABEL: name: function_no_fold_fi_non_stack_soffset		; GCN-LABEL: name: function_no_fold_fi_non_stack_soffset
; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec		; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec
; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec		; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec		; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]		; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
; GCN: S_ENDPGM 0, implicit $vgpr0		; GCN: S_ENDPGM 0, implicit $vgpr0
%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec		%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec		%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec

BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec		BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec		%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
$vgpr0 = COPY %2		$vgpr0 = COPY %2
machineFunctionInfo:		machineFunctionInfo:
scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'		scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
frameOffsetReg: '$sgpr32'		frameOffsetReg: '$sgpr32'
stackPtrOffsetReg: '$sgpr32'		stackPtrOffsetReg: '$sgpr32'
body: |		body: |
bb.0:		bb.0:

; GCN-LABEL: name: function_fold_fi_mubuf_wave_relative		; GCN-LABEL: name: function_fold_fi_mubuf_wave_relative
; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec		; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec
; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec		; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec		; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]		; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
; GCN: S_ENDPGM 0, implicit $vgpr0		; GCN: S_ENDPGM 0, implicit $vgpr0
%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec		%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec		%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec

BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec		BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec		%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
$vgpr0 = COPY %2		$vgpr0 = COPY %2
machineFunctionInfo:		machineFunctionInfo:
scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'		scratchRSrcReg: '$sgpr0_sgpr1_sgpr2_sgpr3'
frameOffsetReg: '$sgpr32'		frameOffsetReg: '$sgpr32'
stackPtrOffsetReg: '$sgpr32'		stackPtrOffsetReg: '$sgpr32'
body: |		body: |
bb.0:		bb.0:

; GCN-LABEL: name: function_fold_fi_mubuf_stack_relative		; GCN-LABEL: name: function_fold_fi_mubuf_stack_relative
; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec		; GCN: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 7, implicit $exec
; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec		; GCN: BUFFER_STORE_DWORD_OFFEN [[V_MOV_B32_e32_]], %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec		; GCN: [[BUFFER_LOAD_DWORD_OFFEN:%[0-9]+]]:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %stack.0, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]		; GCN: $vgpr0 = COPY [[BUFFER_LOAD_DWORD_OFFEN]]
; GCN: S_ENDPGM 0, implicit $vgpr0		; GCN: S_ENDPGM 0, implicit $vgpr0
%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec		%0:vgpr_32 = V_MOV_B32_e32 %stack.0, implicit $exec
%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec		%1:vgpr_32 = V_MOV_B32_e32 7, implicit $exec

BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec		BUFFER_STORE_DWORD_OFFEN %1:vgpr_32, %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, 0, 0, 0, implicit $exec		%2:vgpr_32 = BUFFER_LOAD_DWORD_OFFEN %0:vgpr_32, $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0, 0, 0, 0, 0, 0, implicit $exec
$vgpr0 = COPY %2		$vgpr0 = COPY %2
S_ENDPGM 0, implicit $vgpr0		S_ENDPGM 0, implicit $vgpr0

...		...
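After this patch, the fold-fi-mubuf.mir expectations keep soffset at `0` when `%stack.0` is folded into vaddr. The old expectations rewrote it to `$sgpr32`. The sketch below is a simplified model of that folding decision. It assumes the two checks these tests suggest. First, the resource must be the function's scratch resource (compare function_no_fold_fi_non_stack_rsrc, which uses `$sgpr12_sgpr13_sgpr14_sgpr15` and is left unfolded). Second, soffset must still be the immediate 0. `MubufAccess` and `tryFoldFrameIndex` are hypothetical, and this is not the SIFoldOperands.cpp code.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified model of a BUFFER_*_OFFEN stack access before
// frame elimination. These are not LLVM classes.
struct MubufAccess {
  std::string VAddr;           // e.g. "%0", or "%stack.0" once folded
  std::string SRsrc;           // buffer resource operand
  std::optional<long> SOffImm; // engaged if soffset is an immediate
  std::string SOffReg;         // non-empty if soffset is a register
};

// Fold a frame index (materialized by V_MOV_B32_e32 %stack.N) into vaddr.
// Returns true if the fold happened.
bool tryFoldFrameIndex(MubufAccess &MI, const std::string &FrameIndex,
                       const std::string &ScratchRSrcReg) {
  // Only accesses through the scratch resource are treated as stack accesses.
  if (MI.SRsrc != ScratchRSrcReg)
    return false;
  // Assumed check: soffset must still be the initial immediate 0.
  if (!MI.SOffImm.has_value() || *MI.SOffImm != 0)
    return false;
  MI.VAddr = FrameIndex;
  // soffset is deliberately left as 0; eliminateFrameIndex decides later
  // whether it becomes a frame/stack register.
  return true;
}
```

The fold now changes only vaddr, so "wave relative" and "stack relative" inputs end up identical, and the soffset decision is made only once, in eliminateFrameIndex.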

llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll

; LocalStackSlotAllocation pass at offset 4096.		; LocalStackSlotAllocation pass at offset 4096.
;		;
; The %load1 access to %gep.large.offset initially used the stack		; The %load1 access to %gep.large.offset initially used the stack
; pointer register and directly referenced the frame index. After		; pointer register and directly referenced the frame index. After
; LocalStackSlotAllocation, it would no longer refer to a frame index		; LocalStackSlotAllocation, it would no longer refer to a frame index
; so eliminateFrameIndex would not adjust the access to use the		; so eliminateFrameIndex would not adjust the access to use the
; correct FP offset.		; correct FP offset.

define amdgpu_kernel void @local_stack_offset_uses_sp(i64 addrspace(1)* %out, i8 addrspace(1)* %in) {		define amdgpu_kernel void @local_stack_offset_uses_sp(i64 addrspace(1)* %out) {
scott.linder: I don't see the connection to the patch at hand. Is this just unrelated cleanup, or does this patch affect the dead code here?
cdevadas (Author): This is completely unrelated cleanup. This test was added as part of D87472; I noticed the dead argument and thought I could remove it. Let me know if this should go in as a separate NFC commit.
; MUBUF-LABEL: local_stack_offset_uses_sp:		; MUBUF-LABEL: local_stack_offset_uses_sp:
; MUBUF: ; %bb.0: ; %entry		; MUBUF: ; %bb.0: ; %entry
; MUBUF-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0		; MUBUF-NEXT: s_load_dwordx2 s[4:5], s[4:5], 0x0
; MUBUF-NEXT: s_add_u32 flat_scratch_lo, s6, s9		; MUBUF-NEXT: s_add_u32 flat_scratch_lo, s6, s9
; MUBUF-NEXT: s_addc_u32 flat_scratch_hi, s7, 0		; MUBUF-NEXT: s_addc_u32 flat_scratch_hi, s7, 0
; MUBUF-NEXT: s_add_u32 s0, s0, s9		; MUBUF-NEXT: s_add_u32 s0, s0, s9
; MUBUF-NEXT: v_mov_b32_e32 v1, 0x3000		; MUBUF-NEXT: v_mov_b32_e32 v1, 0x3000
; MUBUF-NEXT: s_addc_u32 s1, s1, 0		; MUBUF-NEXT: s_addc_u32 s1, s1, 0
entry:		entry:
%gep.small.offset = getelementptr inbounds [1060 x i64], [1060 x i64] addrspace(5)* %local.area, i64 0, i64 8		%gep.small.offset = getelementptr inbounds [1060 x i64], [1060 x i64] addrspace(5)* %local.area, i64 0, i64 8
%load0 = load volatile i64, i64 addrspace(5)* %gep.large.offset		%load0 = load volatile i64, i64 addrspace(5)* %gep.large.offset
%load1 = load volatile i64, i64 addrspace(5)* %gep.small.offset		%load1 = load volatile i64, i64 addrspace(5)* %gep.small.offset
%add0 = add i64 %load0, %load1		%add0 = add i64 %load0, %load1
store volatile i64 %add0, i64 addrspace(1)* %out		store volatile i64 %add0, i64 addrspace(1)* %out
ret void		ret void
}		}

define void @func_local_stack_offset_uses_sp(i64 addrspace(1)* %out, i8 addrspace(1)* %in) {		define void @func_local_stack_offset_uses_sp(i64 addrspace(1)* %out) {
; MUBUF-LABEL: func_local_stack_offset_uses_sp:		; MUBUF-LABEL: func_local_stack_offset_uses_sp:
; MUBUF: ; %bb.0: ; %entry		; MUBUF: ; %bb.0: ; %entry
; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)		; MUBUF-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; MUBUF-NEXT: s_add_u32 s4, s32, 0x7ffc0		; MUBUF-NEXT: s_add_u32 s4, s32, 0x7ffc0
; MUBUF-NEXT: s_mov_b32 s5, s33		; MUBUF-NEXT: s_mov_b32 s5, s33
; MUBUF-NEXT: s_and_b32 s33, s4, 0xfff80000		; MUBUF-NEXT: s_and_b32 s33, s4, 0xfff80000
; MUBUF-NEXT: v_lshrrev_b32_e64 v3, 6, s33		; MUBUF-NEXT: v_lshrrev_b32_e64 v3, 6, s33
; MUBUF-NEXT: v_add_u32_e32 v3, 0x1000, v3		; MUBUF-NEXT: v_add_u32_e32 v3, 0x1000, v3
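The MUBUF prologue checked in func_local_stack_offset_uses_sp realigns the frame pointer in wave-scaled ("unswizzled") units and then converts it to a per-lane address. The arithmetic can be reproduced as follows. The shift of 6 implies a wave size of 64. The sketch also assumes SP is always a multiple of 64 in these units, which is what makes adding `0x7ffc0` (0x80000 - 64) round up exactly like adding 0x80000 - 1. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: SP (s32) and FP (s33) in callable functions are
// "unswizzled", i.e. they count bytes for the whole wave.
constexpr uint32_t kWaveSizeLog2 = 6;    // from v_lshrrev_b32_e64 v3, 6, s33
constexpr uint32_t kWaveAlign = 0x80000; // 0x80000 >> 6 == 8192 bytes per lane

// s_add_u32 s4, s32, 0x7ffc0 ; s_and_b32 s33, s4, 0xfff80000
// Assumes sp is a multiple of the wave size (64), so adding
// kWaveAlign - 64 rounds up to the next multiple of kWaveAlign.
uint32_t realignFramePointer(uint32_t sp) {
  return (sp + (kWaveAlign - (1u << kWaveSizeLog2))) & ~(kWaveAlign - 1);
}

// v_lshrrev_b32_e64 v3, 6, s33 ; v_add_u32_e32 v3, 0x1000, v3
// Converts the wave-scaled FP to a per-lane byte address, then adds the
// local frame offset (4096 in this test).
uint32_t perLaneAddress(uint32_t fp, uint32_t localOffset) {
  return (fp >> kWaveSizeLog2) + localOffset;
}
```

For example, an SP of `0x40` realigns to `0x80000`, which is `0x2000` per lane. Adding the 4096-byte local offset gives `0x3000`.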

llvm/test/CodeGen/AMDGPU/stack-pointer-offset-relative-frameindex.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs | FileCheck -check-prefix=MUBUF %s			; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs | FileCheck -check-prefix=MUBUF %s
	; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -amdgpu-enable-flat-scratch -verify-machineinstrs | FileCheck -check-prefix=FLATSCR %s			; RUN: llc < %s -march=amdgcn -mcpu=gfx1010 -amdgpu-enable-flat-scratch -verify-machineinstrs | FileCheck -check-prefix=FLATSCR %s

	; FIXME: The MUBUF loads in this test output are incorrect, their SOffset			; During instruction selection, we use immediate const zero for soffset in
	; should use the frame offset register, not the ABI stack pointer register. We			; MUBUF stack accesses and let eliminateFrameIndex fix up this field to use
	; rely on the frame index argument of MUBUF stack accesses to survive until PEI			; the correct frame register whenever required.
	; so we can fix up the SOffset to use the correct frame register in
	; eliminateFrameIndex. Some things like LocalStackSlotAllocation can lift the
	; frame index up into something (e.g. `v_add_nc_u32`) that we cannot fold back
	; into the MUBUF instruction, and so we end up emitting an incorrect offset.
	; Fixing this may involve adding stack access pseudos so that we don't have to
	; speculatively refer to the ABI stack pointer register at all.

	; An assert was hit when frame offset register was used to address FrameIndex.
	define amdgpu_kernel void @kernel_background_evaluate(float addrspace(5)* %kg, <4 x i32> addrspace(1)* %input, <4 x float> addrspace(1)* %output, i32 %i) {			define amdgpu_kernel void @kernel_background_evaluate(float addrspace(5)* %kg, <4 x i32> addrspace(1)* %input, <4 x float> addrspace(1)* %output, i32 %i) {
	; MUBUF-LABEL: kernel_background_evaluate:			; MUBUF-LABEL: kernel_background_evaluate:
	; MUBUF: ; %bb.0: ; %entry			; MUBUF: ; %bb.0: ; %entry
	; MUBUF-NEXT: s_load_dword s0, s[0:1], 0x24			; MUBUF-NEXT: s_load_dword s0, s[0:1], 0x24
	; MUBUF-NEXT: s_mov_b32 s36, SCRATCH_RSRC_DWORD0			; MUBUF-NEXT: s_mov_b32 s36, SCRATCH_RSRC_DWORD0
	; MUBUF-NEXT: s_mov_b32 s37, SCRATCH_RSRC_DWORD1			; MUBUF-NEXT: s_mov_b32 s37, SCRATCH_RSRC_DWORD1
	; MUBUF-NEXT: s_mov_b32 s38, -1			; MUBUF-NEXT: s_mov_b32 s38, -1
	; MUBUF-NEXT: s_mov_b32 s39, 0x31c16000			; MUBUF-NEXT: s_mov_b32 s39, 0x31c16000
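The new comment says soffset starts as immediate zero and eliminateFrameIndex rewrites it only when required. The summary adds that it stays zero for entry functions with no dynamic allocas and no callsites. Below is a minimal model of that policy. It also covers the case the removed FIXME worried about: an access whose frame index was lifted out (for example by LocalStackSlotAllocation) is never visited, so its soffset stays 0. That is consistent only because its vaddr already carries the frame-pointer-relative address (compare the `v_lshrrev_b32_e64 v3, 6, s33` sequence in func_local_stack_offset_uses_sp). All types and functions here are hypothetical. Which register counts as "the appropriate frame/stack register" is simply passed in as an input, an assumption that glosses over how the real code chooses it.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model, not LLVM code.
struct Access {
  bool HasFrameIndex;  // vaddr still refers to a frame index
  std::string SOffset; // "0" (immediate) or a register name
};

struct FuncInfo {
  bool IsEntryFunction;
  bool HasDynamicAlloca;
  bool HasCalls;
  std::string FrameReg; // frame/stack register chosen by frame lowering
};

// Register that replaces the immediate-0 soffset, or nullopt to keep 0.
std::optional<std::string> soffsetRegister(const FuncInfo &F) {
  if (F.IsEntryFunction && !F.HasDynamicAlloca && !F.HasCalls)
    return std::nullopt;
  return F.FrameReg;
}

// Only accesses that still carry a frame index are visited, mirroring how
// frame elimination is driven by frame-index operands.
void eliminateFrameIndices(std::vector<Access> &Accesses, const FuncInfo &F) {
  for (Access &A : Accesses) {
    if (!A.HasFrameIndex)
      continue; // lifted accesses keep soffset == "0"
    assert(A.SOffset == "0" && "soffset must still be the initial zero");
    if (auto R = soffsetRegister(F))
      A.SOffset = *R;
    A.HasFrameIndex = false; // frame index replaced by its offset (not modeled)
  }
}
```

Under the old scheme, a lifted access would have kept a speculative `$sgpr32` that nothing ever corrected. With a zero start value, leaving such an access untouched is the correct outcome.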

Revision Contents

Diff 317931

llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-private.mir

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-store-private.mir

llvm/test/CodeGen/AMDGPU/amdpal-callable.ll

llvm/test/CodeGen/AMDGPU/fold-fi-mubuf.mir

llvm/test/CodeGen/AMDGPU/local-stack-alloc-block-sp-reference.ll

llvm/test/CodeGen/AMDGPU/stack-pointer-offset-relative-frameindex.ll