Details

Reviewers

tstellarAMD
arsenm

Commits

rGf52c3cf27251: AMDGPU: fix local stack slot allocation bugs
rL275108: AMDGPU: fix local stack slot allocation bugs

Summary

The main bug fix here is using the 32-bit encoding of V_ADD_I32 in
materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary
immediates work.

The second part is that we may now require the SegmentWaveByteOffset
even when there are initially no stack objects and VGPR spilling isn't
enabled, for stack slots that are allocated later. This means that some
bits become effectively dead and can be cleaned up.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602

Diff Detail

Repository: rL LLVM

Event Timeline

nhaehnle updated this revision to Diff 61349. Jun 21 2016, 3:40 AM

nhaehnle retitled this revision to AMDGPU: fix local stack slot allocation bugs.

nhaehnle updated this object.

nhaehnle added reviewers: arsenm, tstellarAMD.

nhaehnle added a subscriber: llvm-commits.

Herald added subscribers: kzhuravl, arsenm. Jun 21 2016, 3:40 AM

arsenm added inline comments. Jun 22 2016, 11:00 AM

lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
Lines 111–112 (on Diff #61349): We shouldn't need to always enable this if a new vreg is created for the constant.
lib/Target/AMDGPU/SIRegisterInfo.cpp
Lines 286–290 (on Diff #61349): This is all pre-RA, so there's no issue creating new virtual registers. A new vreg can be created if the immediate isn't a valid inline immediate for the constant, and then it can be folded/shrunk later if needed.

nhaehnle added inline comments. Jun 22 2016, 12:21 PM

lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
Lines 111–112 (on Diff #61349): Right, but is there a way to tell at this stage? This is the MachineFunctionInfo constructor...
lib/Target/AMDGPU/SIRegisterInfo.cpp
Lines 286–290 (on Diff #61349): Sure, the problem wasn't creating the virtual register; the problem was that the _e64 encoding only allows a very limited set of immediates. That is, the testcase in the diff fails with an "illegal immediate operand" error. If some other pass is supposed to pull the immediate out into a V_MOV before the instruction verifier runs, then that wasn't successful...

arsenm added inline comments. Jun 24 2016, 12:32 AM

lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
Lines 111–112 (on Diff #61349): This logic only decides whether private access is going to be needed. LocalStackSlotAllocation is purely an optimization and doesn't change that. You can create new vregs at that point, so you don't need to worry about the special reserved spill registers.
lib/Target/AMDGPU/SIRegisterInfo.cpp
Lines 286–290 (on Diff #61349): Yes, but the solution isn't to change the instruction encoding here. You don't need to pick the encoding because the instruction shrinking pass later will reduce it. Whenever possible we pick the e64 form and optimize it down later. You can either just always emit the mov of the constant to a new virtual register with the e64 form (which hopefully SIFoldOperands will take care of when legal), or directly check if it's a legal offset and change the emitted instruction.
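
The two options in the comment above can be sketched in standalone C++ (this is not LLVM code). The sketch assumes a simplified rule: an _e64 source operand can hold an integer only if it is an inline constant in the range [-16, 64], which is my understanding of the SI/VI targets discussed here. The names isInlineIntegerConstant and planFrameBaseAdd are hypothetical, and the returned strings are illustrative pseudo-assembly, not real llc output.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Assumption: integer inline constants cover [-16, 64]. Float inline
// constants are ignored because frame offsets are integers.
constexpr bool isInlineIntegerConstant(int64_t Imm) {
  return Imm >= -16 && Imm <= 64;
}

// 0x200 is the offset that appears in the test's CHECK lines; it is not an
// inline constant, so it cannot be an immediate operand of the _e64 add.
static_assert(!isInlineIntegerConstant(0x200), "0x200 needs a register");

// Hypothetical planner. With FoldInlineImm == false it mirrors the shape of
// the updated patch (always move the offset into a fresh register first).
// With FoldInlineImm == true it models the "check legality directly" option.
inline std::vector<std::string> planFrameBaseAdd(int64_t Offset,
                                                 bool FoldInlineImm = false) {
  if (Offset == 0)
    return {"v_mov_b32_e32 base, frameindex"};

  assert(Offset != 0 && "Non-zero offset expected");

  if (FoldInlineImm && isInlineIntegerConstant(Offset))
    return {"v_add_i32_e64 base, unused_carry, " + std::to_string(Offset) +
            ", frameindex"};

  return {"s_mov_b32 offset_reg, " + std::to_string(Offset),
          "v_add_i32_e64 base, unused_carry, offset_reg, frameindex"};
}
```

Calling planFrameBaseAdd(0x200) yields the two-instruction form, while planFrameBaseAdd(16, true) yields a single add with the immediate kept in place. Always emitting the move is the simpler choice and leaves the cleanup to later folding passes, as the comment suggests.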

Change encoding back to ADD_I32_e64, use temporary register for immediates.

The other issue isn't that simple, I believe, and the problem actually doesn't
have anything to do with local stack slot allocation per se. It persists
even when I disable that pass.

The issue is that if
(1) VGPR spilling isn't enabled and
(2) there aren't any stack objects initially but
(3) during instruction selection, new stack objects are created (in order to
lower extractelement instructions)
this messes with some assumptions around the existence of stack objects.
Virtual registers or not, in this case we do need to reserve various SGPRs
after the fact because they're needed for frame finalization. But reserved
registers are frozen during instruction selection.

Basically, we would need a hook either when a new stack object is created
during instruction selection or somewhere just before the end of instruction
selection (but again not after, because we have to reserve SGPRs).
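
The ordering problem described above can be illustrated with a toy model in standalone C++. This is only a sketch under stated assumptions: ToyReservedRegs is not LLVM's register bookkeeping, the register name wave_byte_offset_sgpr is made up for illustration, and the up-front check stands in for the decision made in the function-info constructor.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for reserved-register bookkeeping; not LLVM's API.
class ToyReservedRegs {
public:
  // Returns false (and changes nothing) once the set has been frozen.
  bool reserve(const std::string &Reg) {
    if (Frozen)
      return false;
    Regs.insert(Reg);
    return true;
  }
  void freeze() { Frozen = true; }
  bool isReserved(const std::string &Reg) const {
    return Regs.count(Reg) != 0;
  }

private:
  std::set<std::string> Regs;
  bool Frozen = false;
};

// Hypothetical register name, used only for this illustration.
static const char *const WaveOffsetSGPR = "wave_byte_offset_sgpr";

// Returns true if the SGPR is reserved by the time frame finalization would
// need it, for a function that starts with no stack objects.
inline bool lateStackObjectIsCovered(bool VGPRSpilling) {
  ToyReservedRegs Reserved;
  bool HasStackObjects = false; // condition (2): none initially

  // Up-front decision, taken before instruction selection.
  if (HasStackObjects || VGPRSpilling)
    Reserved.reserve(WaveOffsetSGPR);

  Reserved.freeze(); // reserved registers are frozen for selection

  HasStackObjects = true; // condition (3): lowering creates a stack slot
  if (!Reserved.isReserved(WaveOffsetSGPR)) {
    bool Accepted = Reserved.reserve(WaveOffsetSGPR); // too late
    assert(!Accepted && "a frozen set must reject late reservations");
    (void)Accepted;
  }
  return Reserved.isReserved(WaveOffsetSGPR);
}
```

In this model lateStackObjectIsCovered(true) returns true and lateStackObjectIsCovered(false) returns false, which is the gap the proposed hook would have to close: the reservation has to happen before the freeze, yet the need for it only becomes visible afterwards.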

To be honest, this is becoming a bit of a waste of time. In Mesa, we're
enabling VGPR spilling all the time anyway. Why is the option to disable
that even there? I'm a bit tempted to just enable VGPR spilling in the test
case and commit only the SIRegisterInfo part.

Fix the bug affecting Mesa while keeping an XFAIL test for the case without
VGPR spilling enabled.

curan added a subscriber: curan. Jul 10 2016, 5:30 AM

@nhaehnle: you can have my Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org> for Diff 62661 as well.

ping

We should probably just remove the VGPR spilling option

arsenm added inline comments. Jul 11 2016, 1:54 PM

test/CodeGen/AMDGPU/local-stack-slot-bug.ll
Line 9 (on Diff #62661): This should check a few of the stack instructions for the add to the pointer operand.

Added some CHECK lines.

Removing the vgpr-spilling option is best left to a separate commit.

LGTM

This revision is now accepted and ready to land. Jul 11 2016, 2:42 PM

Closed by commit rL275108: AMDGPU: fix local stack slot allocation bugs (authored by nha). Jul 11 2016, 2:51 PM

This revision was automatically updated to reflect the committed changes.

chapuni added a subscriber: chapuni. Jul 11 2016, 7:39 PM

chapuni added inline comments.

llvm/trunk/test/CodeGen/AMDGPU/selected-stack-object.ll
Line 1: You shouldn't XFAIL a test whose expected failure is an assertion failure. With -Asserts, the behavior is unknown: the test might appear to pass, or it might loop forever. Tweaked in r275144.

Diff 63584

llvm/trunk/lib/Target/AMDGPU/SIRegisterInfo.cpp

[279 lines not shown; context: void SIRegisterInfo::materializeFrameBaseRegister(MachineBasicBlock *MBB,]
   if (Offset == 0) {
     BuildMI(*MBB, Ins, DL, TII->get(AMDGPU::V_MOV_B32_e32), BaseReg)
       .addFrameIndex(FrameIdx);
     return;
   }

   MachineRegisterInfo &MRI = MF->getRegInfo();
   unsigned UnusedCarry = MRI.createVirtualRegister(&AMDGPU::SReg_64RegClass);
+  unsigned OffsetReg = MRI.createVirtualRegister(&AMDGPU::SReg_32RegClass);

+  BuildMI(*MBB, Ins, DL, TII->get(AMDGPU::S_MOV_B32), OffsetReg)
+    .addImm(Offset);
   BuildMI(*MBB, Ins, DL, TII->get(AMDGPU::V_ADD_I32_e64), BaseReg)
     .addReg(UnusedCarry, RegState::Define | RegState::Dead)
-    .addImm(Offset)
+    .addReg(OffsetReg, RegState::Kill)
     .addFrameIndex(FrameIdx);
 }

 void SIRegisterInfo::resolveFrameIndex(MachineInstr &MI, unsigned BaseReg,
                                        int64_t Offset) const {

   MachineBasicBlock *MBB = MI.getParent();
   MachineFunction *MF = MBB->getParent();
[30 lines not shown; context: #endif]
   // The offset is not legal, so we must insert an add of the offset.
   MachineRegisterInfo &MRI = MF->getRegInfo();
   unsigned NewReg = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
   DebugLoc DL = MI.getDebugLoc();

   assert(Offset != 0 && "Non-zero offset expected");

   unsigned UnusedCarry = MRI.createVirtualRegister(&AMDGPU::SReg_64RegClass);
+  unsigned OffsetReg = MRI.createVirtualRegister(&AMDGPU::SReg_32RegClass);

   // In the case the instruction already had an immediate offset, here only
   // the requested new offset is added because we are leaving the original
   // immediate in place.
+  BuildMI(*MBB, MI, DL, TII->get(AMDGPU::S_MOV_B32), OffsetReg)
+    .addImm(Offset);
   BuildMI(*MBB, MI, DL, TII->get(AMDGPU::V_ADD_I32_e64), NewReg)
     .addReg(UnusedCarry, RegState::Define | RegState::Dead)
-    .addImm(Offset)
+    .addReg(OffsetReg, RegState::Kill)
     .addReg(BaseReg);

   FIOp->ChangeToRegister(NewReg, false);
 }

 bool SIRegisterInfo::isFrameOffsetLegal(const MachineInstr *MI,
                                         unsigned BaseReg,
                                         int64_t Offset) const {
[649 lines not shown]

llvm/trunk/test/CodeGen/AMDGPU/local-stack-slot-bug.ll

				; RUN: llc -march=amdgcn -mcpu=verde -mattr=+vgpr-spilling -verify-machineinstrs < %s | FileCheck %s
				; RUN: llc -march=amdgcn -mcpu=tonga -mattr=+vgpr-spilling -verify-machineinstrs < %s | FileCheck %s

				; This used to fail due to a v_add_i32 instruction with an illegal immediate
				; operand that was created during Local Stack Slot Allocation. Test case derived
				; from https://bugs.freedesktop.org/show_bug.cgi?id=96602
				;
				; CHECK-LABEL: {{^}}main:
				; CHECK: v_lshlrev_b32_e32 [[BYTES:v[0-9]+]], 2, v0
				; CHECK: v_mov_b32_e32 [[HI_CONST:v[0-9]+]], 0x200
				; CHECK: v_mov_b32_e32 [[LO_CONST:v[0-9]+]], 0
				; CHECK: v_add_i32_e32 [[HI_OFF:v[0-9]+]], vcc, [[BYTES]], [[HI_CONST]]
				; CHECK: v_add_i32_e32 [[LO_OFF:v[0-9]+]], vcc, [[BYTES]], [[LO_CONST]]
				; CHECK: buffer_load_dword {{v[0-9]+}}, [[LO_OFF]], {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offen
				; CHECK: buffer_load_dword {{v[0-9]+}}, [[HI_OFF]], {{s\[[0-9]+:[0-9]+\]}}, {{s[0-9]+}} offen
				define amdgpu_ps float @main(i32 %idx) {
				main_body:
				%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
				%v2 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0x3FE7A693C0000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFEBE5DC40000000, float 0x3FEBE5DC40000000, float 0xBFEC71C720000000, float 0x3FEC71C6E0000000, float 0xBFEBE5DC60000000, float 0x3FEBE5DC40000000, float 0xBFEA477C20000000, float 0x3FEA477C20000000, float 0xBFE7A693C0000000, float 0x3FE7A69380000000, float 0xBFE41CFEA0000000, float 0xBFDF9B13E0000000, float 0xBFD5C53B80000000, float 0xBFC6326AC0000000, float 0x3EE0789320000000, float 0x3FC6326AC0000000, float 0x3FD5C53B80000000, float 0x3FDF9B13E0000000, float 0x3FE41CFE80000000>, i32 %idx
				%r = fadd float %v1, %v2
				ret float %r
				}

llvm/trunk/test/CodeGen/AMDGPU/selected-stack-object.ll

				; XFAIL: *
				; RUN: llc -march=amdgcn -mcpu=verde -verify-machineinstrs < %s | FileCheck %s

				; See also local-stack-slot-bug.ll
				; This fails because a stack object is created during instruction selection.

				; CHECK-LABEL: {{^}}main:
				define amdgpu_ps float @main(i32 %idx) {
				main_body:
				%v1 = extractelement <81 x float> <float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float undef, float 0x3FE41CFEA0000000, float 0xBFE7A693C0000000, float 0xBFEA477C60000000, float 0xBFEBE5DC60000000, float 0xBFEC71C720000000, float 0xBFEBE5DC60000000, float 0xBFEA477C60000000, float 0xBFE7A693C0000000, float 0xBFE41CFEA0000000, float 0x3FDF9B13E0000000, float 0x3FDF9B1380000000, float 0x3FD5C53B80000000, float 0x3FD5C53B00000000, float 0x3FC6326AC0000000, float 0x3FC63269E0000000, float 0xBEE05CEB00000000, float 0xBEE086A320000000, float 0xBFC63269E0000000, float 0xBFC6326AC0000000, float 0xBFD5C53B80000000, float 0xBFD5C53B80000000, float 0xBFDF9B13E0000000, float 0xBFDF9B1460000000, float 0xBFE41CFE80000000, float 0x3FE7A693C0000000, float 0x3FEA477C20000000, float 0x3FEBE5DC40000000, float 0x3FEC71C6E0000000, float 0x3FEBE5DC40000000, float 0x3FEA477C20000000, float 0x3FE7A693C0000000, float 0xBFE41CFE80000000>, i32 %idx
				ret float %v1
				}

This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: fix local stack slot allocation bugs
ClosedPublic
