This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix-up cases where writelane has 2 SGPR operands
ClosedPublic

Authored by dstuttard on Sep 11 2018, 7:40 AM.

Details

Summary

Even though writelane doesn't have the same constraints as other valu
instructions it still can't violate the >1 SGPR operand constraint

Due to later register propagation (e.g. fixing up vgpr operands via
readfirstlane) changing writelane to only have a single SGPR is tricky.

This implementation puts a new check after SIFixSGPRCopies that prevents
multiple SGPRs being used in any writelane instructions.

The algorithm used is to check for trivial copy prop of suitable constants into
one of the SGPR operands and perform that if possible. If this isn't possible
put an explicit copy of Src1 SGPR into M0 and use that instead (this is
allowable for writelane as the constraint is for SGPR read-port and not
constant-bus access).

Diff Detail

Event Timeline

dstuttard created this revision.Sep 11 2018, 7:40 AM
rampitec added inline comments.Sep 11 2018, 12:39 PM
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
457 ↗(On Diff #164878)

Why cannot you use loop in the runOnMachineFunction()?

482 ↗(On Diff #164878)

You do not need vector here.

for (auto MO : {&Src0, &Src1})
dstuttard added inline comments.Sep 11 2018, 2:01 PM
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
457 ↗(On Diff #164878)

I think we need all the sgpr to vgpr moves to have been completed before applying this fix since in some cases it might not be necessary.
I guess there's an argument for this to be done in a separate pass, or a later pass, then it could go into runOnMachineFunction - any suggestions?

482 ↗(On Diff #164878)

Good point, I'll make this one.

rampitec added inline comments.Sep 11 2018, 2:15 PM
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
457 ↗(On Diff #164878)

Given the semantics of writelane it is hard to believe its sources will be moved to VALU. Also if that is going to happen in general, it should have already happened by the time iterator would reach the instruction.

dstuttard updated this revision to Diff 165104.Sep 12 2018, 9:37 AM

Made suggested changes

dstuttard marked 5 inline comments as done.Sep 12 2018, 9:38 AM
dstuttard added inline comments.
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
457 ↗(On Diff #164878)

Yes - I think you're right. I've changed it part of the main runOnMachineFunction.

rampitec added inline comments.Sep 12 2018, 12:01 PM
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
751 ↗(On Diff #165104)

missing break.

dstuttard updated this revision to Diff 165215.Sep 13 2018, 1:32 AM
dstuttard marked 2 inline comments as done.

Added missing break

dstuttard added inline comments.Sep 13 2018, 1:34 AM
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
751 ↗(On Diff #165104)

Doh

This revision is now accepted and ready to land.Sep 13 2018, 1:37 AM
arsenm requested changes to this revision.Sep 13 2018, 1:54 AM
arsenm added inline comments.
lib/Target/AMDGPU/Utils/AMDGPUMCUtils.cpp
23–25 ↗(On Diff #165215)

This seems more like something for TII

This revision now requires changes to proceed.Sep 13 2018, 1:54 AM

Should have a special check in the verifier

dstuttard updated this revision to Diff 165274.Sep 13 2018, 6:51 AM

Moved foldToImm into SIInstrInfo as suggested
Implemented check in verifyInstruction and checked that it worked when the fix was removed

dstuttard marked 3 inline comments as done.Sep 13 2018, 6:52 AM

I don't actually understand why this code is where it is? Why is SIFixSGPRCopies doing this? To clarify is this just an optimization? My initial reaction was that it was a fix, but looking at it again it seems like an optimization to me

lib/Target/AMDGPU/SIFixSGPRCopies.cpp
744–746 ↗(On Diff #165274)

This should be a COPY?

dstuttard updated this revision to Diff 166681.Sep 24 2018, 7:26 AM

Updated a mov to a copy as per review comment

I don't actually understand why this code is where it is? Why is SIFixSGPRCopies doing this? To clarify is this just an optimization? My initial reaction was that it was a fix, but looking at it again it seems like an optimization to me

It isn't an optimization - it's a bug. We encountered this in graphics shaders - hence the requirement for the fix.

SIFixSGPRCopies does feel like a strange place to put this fix. It has to be somewhere late enough to catch the issue since it's a transformation that happens after isel that causes the problem in this case. Have you got any other suggestions as to where it could go instead? FixSGPR copies isn't perfect, but doesn't seem too unreasonable as it is an SGPR related fix-up.

nhaehnle accepted this revision.Oct 8 2018, 8:41 AM

LGTM, to be honest. Matt?

@arsenm Matt - good to go?

arsenm accepted this revision.Jun 14 2019, 7:20 AM

LGTM

This revision is now accepted and ready to land.Jun 14 2019, 7:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2019, 7:20 AM
dstuttard updated this revision to Diff 220667.Sep 18 2019, 7:44 AM

GFX10 support has gone in since this change was approved - gfx10 allows 2 sgprs
on the constant bus. Implementation updated to allow for this.

Also updated a MIR test that had an incorrect WRITELANE instruction with 2 SGPR
accesses (one of which was VCC_LO).

arsenm added inline comments.Sep 18 2019, 8:12 AM
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
798–799 ↗(On Diff #220667)

I think there's a missing word here; " as the lane selector and doesn't"

800 ↗(On Diff #220667)

Missing period

816–818 ↗(On Diff #220667)

Is this really necessary? Will SIFoldOperands not get this for some reason?

lib/Target/AMDGPU/SIInstrInfo.cpp
3438 ↗(On Diff #220667)

Extra empty line

3440 ↗(On Diff #220667)

Should use Register

6494 ↗(On Diff #220667)

I would expect this to handle only a single vreg def

Made some changes based on review

dstuttard marked 9 inline comments as done.Sep 19 2019, 10:17 AM
dstuttard added inline comments.
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
798–799 ↗(On Diff #220667)

I think this might be a difference in language - so I've changed it to make it less confusing

816–818 ↗(On Diff #220667)

Yes, SIFoldOperands can replace the operand with the immediate, but the problem is that this code is forced to change one of the SGPR operands to M0 if it doesn't know if the other can be immediate.
If we just detect that an immediate will be folded, but don't do it, then the verification step will fail.

So, the simplest thing to do is detect trivial cases and change the operand to an immediate.
In some cases it is possible that it will not detect an immediate, replace the second SGPR with M0 and then the SIFoldOperands will replace the other SGPR operand with an immediate. This case is rare and functionally correct, just slightly less efficient.

I've reverted moving the foldToImm function from the SDWA peephole pass and implemented a simpler detection of immediates.

lib/Target/AMDGPU/SIInstrInfo.cpp
6494 ↗(On Diff #220667)

I reverted moving the foldToImm function to SIInstrInfo - it was simpler to write a new one that attempted to do less.

dstuttard marked 3 inline comments as done.Sep 20 2019, 1:29 AM
arsenm added inline comments.Sep 20 2019, 9:49 AM
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
820 ↗(On Diff #220882)

No auto

821–822 ↗(On Diff #220882)

The hasOneDef check is suspicious. You should be able to check getVRegDef and just a null check. This is missing a guard for virtual registers

dstuttard updated this revision to Diff 221267.Sep 23 2019, 1:58 AM

Updates in light of review comments

dstuttard marked 2 inline comments as done.Sep 23 2019, 1:59 AM

Made suggested changes - is this more in line with what you were thinking Matt?

Matt - are you now happy for me to submit this? (It is tagged as approved, but since you've made some extra comments I'm waiting for you to agree with the latest changes).

I think this should be good to go.

arsenm accepted this revision.Oct 8 2019, 10:48 AM

LGTM

arsenm added inline comments.Oct 8 2019, 12:41 PM
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
815–816 ↗(On Diff #221267)

I'm working on a patch to stop reserving m0; I suspect this will avoid the need for the special case propagation

This revision was automatically updated to reflect the committed changes.