This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: fix local stack slot allocation bugs
ClosedPublic

Authored by nhaehnle on Jun 21 2016, 3:40 AM.

Details

Summary

The main bug fix here is using the 32-bit encoding of V_ADD_I32 in
materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary
immediates work.

The second part is that we may now require the SegmentWaveByteOffset
even when there are initially no stack objects and VGPR spilling isn't
enabled, for stack slots that are allocated later. This means that some
bits become effectively dead and can be cleaned up.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602

Diff Detail

Event Timeline

nhaehnle updated this revision to Diff 61349.Jun 21 2016, 3:40 AM
nhaehnle retitled this revision from to AMDGPU: fix local stack slot allocation bugs.
nhaehnle updated this object.
nhaehnle added reviewers: arsenm, tstellarAMD.
nhaehnle added a subscriber: llvm-commits.
arsenm added inline comments.Jun 22 2016, 11:00 AM
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
111–112

We shouldn't need to always enable this if a new vreg is created for the constant

lib/Target/AMDGPU/SIRegisterInfo.cpp
286–290

This is all pre-RA, so there's no issue creating new virtual registers. A new vreg can be created if the immediate isn't a valid inline immediate for the constant, and then it can be folded/shrunk later if needed.

nhaehnle added inline comments.Jun 22 2016, 12:21 PM
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
111–112

Right, but is there a way to tell at this stage? This is the MachineFunctionInfo constructor...

lib/Target/AMDGPU/SIRegisterInfo.cpp
286–290

Sure, the problem wasn't creating the virtual register, the problem was that the _e64 encoding only allows a very limited set of immediates. That is, the testcase in the diff fails with a "illegal immediate operand" error.

If some other pass is supposed to pull the immediate out into a V_MOV before the instruction verifier runs, then that wasn't successful...

arsenm added inline comments.Jun 24 2016, 12:32 AM
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
111–112

This logic is just if private access is going to be needed. LocalStackSlotAllocation is purely an optimization and doesn't change that. You can create new vregs at that point so you don't need to worry about the special reserved spill registers

lib/Target/AMDGPU/SIRegisterInfo.cpp
286–290

Yes, but the solution isn't to change the instruction encoding here. You don't need to pick the encoding because the instruction shrinking pass later will reduce it. Whenever possible we pick the e64 form and optimize it down later. You can either just always emit the mov of the constant to a new virtual register with the e64 form (which hopefully SIFoldOperands will take care of when legal), or directly check if it's a legal offset and change the emitted instruction

nhaehnle updated this revision to Diff 62660.Jul 4 2016, 3:19 AM

Change encoding back to ADD_I32_e64, use temporary register for immediates.

The other issue isn't that simple, I believe, and the problem actually doesn't
have anything to do with local stack slot allocation per se. It persists
even when I disable that pass.

The issue is that if
(1) VGPR spilling isn't enabled and
(2) there aren't any stack objects initially but
(3) during instruction selection, new stack objects are created (in order to
lower extractelement instructions)
this messes with some assumptions around the existence of stack objects.
Virtual registers or not, in this case we do need to reserve various SGPRs
after the fact because they're needed for frame finalization. But reserved
registers are frozen during instruction selection.

Basically, we would need a hook either when a new stack object is created
during instruction selection or somewhere just before the end of instruction
selection (but again not after, because we have to reserve SGPRs).

To be honest, this is becoming a bit of a waste of time. In Mesa, we're
enabling VGPR spilling all the time anyway. Why is the option to disable
that even there? I'm a bit tempted to just enable VGPR spilling in the test
case and commit only the SIRegisterInfo part.

nhaehnle updated this revision to Diff 62661.Jul 4 2016, 3:28 AM

Fix the bug affecting Mesa while keeping an XFAIL test for the case without
VGPR spilling enabled.

curan added a subscriber: curan.Jul 10 2016, 5:30 AM
curan added a comment.Jul 10 2016, 7:28 AM

@nhaehnle: you can have my Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org> for Diff 62661 as well.

arsenm edited edge metadata.Jul 11 2016, 12:37 PM

We should probably just remove the VGPR spilling option

arsenm added inline comments.Jul 11 2016, 1:54 PM
test/CodeGen/AMDGPU/local-stack-slot-bug.ll
10

This should check a few of the stack instructions for the add to the pointer operand

nhaehnle updated this revision to Diff 63577.Jul 11 2016, 2:40 PM
nhaehnle edited edge metadata.

Added some CHECK lines.

Removing the vgpr-spilling option is best left to a separate commit.

arsenm accepted this revision.Jul 11 2016, 2:42 PM
arsenm edited edge metadata.

LGTM

This revision is now accepted and ready to land.Jul 11 2016, 2:42 PM
This revision was automatically updated to reflect the committed changes.
chapuni added inline comments.
llvm/trunk/test/CodeGen/AMDGPU/selected-stack-object.ll
1 ↗(On Diff #63584)

You shouldn't XFAIL if it expected assertion failure. With -Asserts, the behavior is unknown. It might pass apparently, or it might execute infinite loop.

Tweaked in r275144.