This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix-up cases where writelane has 2 SGPR operands
ClosedPublic

Authored by dstuttard on Sep 11 2018, 7:40 AM.

Details

Summary

Even though writelane doesn't have the same constraints as other valu
instructions it still can't violate the >1 SGPR operand constraint

Due to later register propagation (e.g. fixing up vgpr operands via
readfirstlane) changing writelane to only have a single SGPR is tricky.

This implementation puts a new check after SIFixSGPRCopies that prevents
multiple SGPRs being used in any writelane instructions.

The algorithm used is to check for trivial copy prop of suitable constants into
one of the SGPR operands and perform that if possible. If this isn't possible
put an explicit copy of Src1 SGPR into M0 and use that instead (this is
allowable for writelane as the constraint is for SGPR read-port and not
constant-bus access).

Diff Detail

Event Timeline

dstuttard created this revision.Sep 11 2018, 7:40 AM
rampitec added inline comments.Sep 11 2018, 12:39 PM
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
456

Why cannot you use loop in the runOnMachineFunction()?

481

You do not need vector here.

for (auto MO : {&Src0, &Src1})
dstuttard added inline comments.Sep 11 2018, 2:01 PM
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
456

I think we need all the sgpr to vgpr moves to have been completed before applying this fix since in some cases it might not be necessary.
I guess there's an argument for this to be done in a separate pass, or a later pass, then it could go into runOnMachineFunction - any suggestions?

481

Good point, I'll make this one.

rampitec added inline comments.Sep 11 2018, 2:15 PM
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
456

Given the semantics of writelane it is hard to believe its sources will be moved to VALU. Also if that is going to happen in general, it should have already happened by the time iterator would reach the instruction.

dstuttard updated this revision to Diff 165104.Sep 12 2018, 9:37 AM

Made suggested changes

dstuttard marked 5 inline comments as done.Sep 12 2018, 9:38 AM
dstuttard added inline comments.
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
456

Yes - I think you're right. I've changed it part of the main runOnMachineFunction.

rampitec added inline comments.Sep 12 2018, 12:01 PM
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
750

missing break.

dstuttard updated this revision to Diff 165215.Sep 13 2018, 1:32 AM
dstuttard marked 2 inline comments as done.

Added missing break

dstuttard added inline comments.Sep 13 2018, 1:34 AM
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
750

Doh

This revision is now accepted and ready to land.Sep 13 2018, 1:37 AM
arsenm requested changes to this revision.Sep 13 2018, 1:54 AM
arsenm added inline comments.
lib/Target/AMDGPU/Utils/AMDGPUMCUtils.cpp
23–25 ↗(On Diff #165215)

This seems more like something for TII

This revision now requires changes to proceed.Sep 13 2018, 1:54 AM

Should have a special check in the verifier

dstuttard updated this revision to Diff 165274.Sep 13 2018, 6:51 AM

Moved foldToImm into SIInstrInfo as suggested
Implemented check in verifyInstruction and checked that it worked when the fix was removed

dstuttard marked 3 inline comments as done.Sep 13 2018, 6:52 AM

I don't actually understand why this code is where it is? Why is SIFixSGPRCopies doing this? To clarify is this just an optimization? My initial reaction was that it was a fix, but looking at it again it seems like an optimization to me

lib/Target/AMDGPU/SIFixSGPRCopies.cpp
744–746

This should be a COPY?

dstuttard updated this revision to Diff 166681.Sep 24 2018, 7:26 AM

Updated a mov to a copy as per review comment

I don't actually understand why this code is where it is? Why is SIFixSGPRCopies doing this? To clarify is this just an optimization? My initial reaction was that it was a fix, but looking at it again it seems like an optimization to me

It isn't an optimization - it's a bug. We encountered this in graphics shaders - hence the requirement for the fix.

SIFixSGPRCopies does feel like a strange place to put this fix. It has to be somewhere late enough to catch the issue since it's a transformation that happens after isel that causes the problem in this case. Have you got any other suggestions as to where it could go instead? FixSGPR copies isn't perfect, but doesn't seem too unreasonable as it is an SGPR related fix-up.

nhaehnle accepted this revision.Oct 8 2018, 8:41 AM

LGTM, to be honest. Matt?

@arsenm Matt - good to go?

arsenm accepted this revision.Jun 14 2019, 7:20 AM

LGTM

This revision is now accepted and ready to land.Jun 14 2019, 7:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2019, 7:20 AM
dstuttard updated this revision to Diff 220667.Sep 18 2019, 7:44 AM

GFX10 support has gone in since this change was approved - gfx10 allows 2 sgprs
on the constant bus. Implementation updated to allow for this.

Also updated a MIR test that had an incorrect WRITELANE instruction with 2 SGPR
accesses (one of which was VCC_LO).

arsenm added inline comments.Sep 18 2019, 8:12 AM
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
715–716

I think there's a missing word here; " as the lane selector and doesn't"

717

Missing period

733–735

Is this really necessary? Will SIFoldOperands not get this for some reason?

lib/Target/AMDGPU/SIInstrInfo.cpp
2993

Extra empty line

2995

Should use Register

5353

I would expect this to handle only a single vreg def

Made some changes based on review

dstuttard marked 9 inline comments as done.Sep 19 2019, 10:17 AM
dstuttard added inline comments.
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
715–716

I think this might be a difference in language - so I've changed it to make it less confusing

733–735

Yes, SIFoldOperands can replace the operand with the immediate, but the problem is that this code is forced to change one of the SGPR operands to M0 if it doesn't know if the other can be immediate.
If we just detect that an immediate will be folded, but don't do it, then the verification step will fail.

So, the simplest thing to do is detect trivial cases and change the operand to an immediate.
In some cases it is possible that it will not detect an immediate, replace the second SGPR with M0 and then the SIFoldOperands will replace the other SGPR operand with an immediate. This case is rare and functionally correct, just slightly less efficient.

I've reverted moving the foldToImm function from the SDWA peephole pass and implemented a simpler detection of immediates.

lib/Target/AMDGPU/SIInstrInfo.cpp
5353

I reverted moving the foldToImm function to SIInstrInfo - it was simpler to write a new one that attempted to do less.

dstuttard marked 3 inline comments as done.Sep 20 2019, 1:29 AM
arsenm added inline comments.Sep 20 2019, 9:49 AM
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
737

No auto

738–739

The hasOneDef check is suspicious. You should be able to check getVRegDef and just a null check. This is missing a guard for virtual registers

dstuttard updated this revision to Diff 221267.Sep 23 2019, 1:58 AM

Updates in light of review comments

dstuttard marked 2 inline comments as done.Sep 23 2019, 1:59 AM

Made suggested changes - is this more in line with what you were thinking Matt?

Matt - are you now happy for me to submit this? (It is tagged as approved, but since you've made some extra comments I'm waiting for you to agree with the latest changes).

I think this should be good to go.

arsenm accepted this revision.Oct 8 2019, 10:48 AM

LGTM

arsenm added inline comments.Oct 8 2019, 12:41 PM
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
732–733

I'm working on a patch to stop reserving m0; I suspect this will avoid the need for the special case propagation

This revision was automatically updated to reflect the committed changes.