This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Change FLAT SADDR to VADDR form in moveToVALU
ClosedPublic

Authored by rampitec on Apr 27 2021, 3:32 PM.

Details

Summary

Instead of legalizing saddr operand with a readfirstlane
when address is moved from SGPR to VGPR we can just
change the opcode.

Diff Detail

Event Timeline

rampitec created this revision.Apr 27 2021, 3:32 PM
rampitec requested review of this revision.Apr 27 2021, 3:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2021, 3:32 PM
Herald added a subscriber: wdng. · View Herald Transcript
Joe_Nash added inline comments.Apr 28 2021, 9:05 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
5037

I don't understand why this assert is true. Does some previous check guarantee that? Other than that LGTM, but please wait for @arsenm

rampitec added inline comments.Apr 28 2021, 9:11 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
5037

All flat global SADDR instructions have $vaddr component and other instructions cannot get to this point as getGlobalVaddrOp() will return -1. This changes in the followup D101408 which can also process flat scratch.

Joe_Nash added inline comments.Apr 28 2021, 9:17 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
5037

I see it now, thanks.

rampitec marked 2 inline comments as done.
rampitec updated this revision to Diff 341623.Apr 29 2021, 1:21 PM

Renamed function, added comments to helpers.

rampitec updated this revision to Diff 341659.Apr 29 2021, 2:52 PM

Remove dead def because it is simple, do not leave it to DCE.

rampitec updated this revision to Diff 341715.Apr 29 2021, 5:10 PM

Fixed broken operand use list after moveOperands().

arsenm added inline comments.Apr 29 2021, 6:46 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
5049–5063

At this point wouldn't it be simpler to just create a fresh new instruction and delete the old one?

rampitec added inline comments.Apr 29 2021, 6:47 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
5049–5063

Unfortunately callers expect iterator to be intact. Otherwise probably yes, although it would not be less code.

arsenm added inline comments.Apr 29 2021, 7:00 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
5049–5060

Probably should add a comment explaining why this nightmare is here

5049–5063

I think in some contexts in SIFixSGPRCopies the original instruction is erased, but it's a bit of a mess

llvm/test/CodeGen/AMDGPU/move-load-addr-to-valu.mir
36

Can you also add an IR test where this matters? I'm not understanding why the DAG divergence analysis would let this happen

rampitec added inline comments.Apr 29 2021, 7:14 PM
llvm/test/CodeGen/AMDGPU/move-load-addr-to-valu.mir
36

The test case shall be more than 100 instructions long, this is the threshold for memory dependency analysis to give up on the "noclobber" check. The divergence analysis tells it is ok, it is uniform. But noclobber is not set, so the address registers are not known not to be clobberd and we are here.

It really happens in large basic blocks. Probably it can also happen if we just had no SALU instructions to do some computations, so ended up with VALU which was pripagated, but that is not the case I have started with. I.e. it is really uniform, and readfirstlane is valid, but we don't want to issue it.

I am not really sure we want a test of that size. It will be more than obscure in the first place.

foad added inline comments.Apr 30 2021, 2:40 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
5027

Use Inst.getMF() ?

5050

Redundant assert, you've just tested this condition.

5055–5056

Are these last two lines really necessary? Hasn't MRI.moveOperands already handled this?

5062

VAddrDef can't be 0 here, that was already checked earlier.

llvm/test/CodeGen/AMDGPU/move-load-addr-to-valu.mir
36

You could test with -memdep-block-scan-limit=1 ?

rampitec updated this revision to Diff 341988.Apr 30 2021, 11:20 AM
rampitec marked 7 inline comments as done.

Addressed review comments. One thing left is a reasonable IR test.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
5055–5056

It did not. I have added the comment.

5062

Right, this is a part of D101408. Removed from this patch.

llvm/test/CodeGen/AMDGPU/move-load-addr-to-valu.mir
36

I will check if I can produce a reasonable IR test.

rampitec updated this revision to Diff 342016.Apr 30 2021, 1:17 PM
rampitec marked 2 inline comments as done.

Added IR test.

arsenm accepted this revision.Apr 30 2021, 4:16 PM

LGTM with test comment

llvm/test/CodeGen/AMDGPU/global-load-saddr-to-vaddr.ll
2 ↗(On Diff #342016)

Add a comment explaining what this is testing

21–23 ↗(On Diff #342016)

Oh, this is the case where the loop index/pointer is really uniform, but forced to be a vector branch. It's just better to do the VALU address computation than do it scalar and copy with readfirstlane

llvm/test/CodeGen/AMDGPU/move-load-addr-to-valu.mir
36

Whether or not this can be converted to SMRD is a different problem

This revision is now accepted and ready to land.Apr 30 2021, 4:16 PM
rampitec updated this revision to Diff 342080.Apr 30 2021, 4:25 PM
rampitec marked an inline comment as done.

Added comment to the test.

llvm/test/CodeGen/AMDGPU/global-load-saddr-to-vaddr.ll
21–23 ↗(On Diff #342016)

Yes, the memory dependency tracing was another case related to the opposite situation. This is just because we are producing VGPRs with a uniform value for the pointer.

This revision was automatically updated to reflect the committed changes.