This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Propagate defining src reg for AGPR to AGPR Copys
ClosedPublic

Authored by vangthao on Aug 27 2021, 9:56 AM.

Details

Summary

On targets that do not support AGPR to AGPR copying directly, try to find the
defining accvgpr_write and propagate its source vgpr register to the copies
before register allocation so the source vgpr register does not get clobbered.

The postrapseudos pass also attempt to propagate the defining accvgpr_write but
if the register to propagate is clobbered, it will give up and create new
temporary vgpr registers instead.

Diff Detail

Event Timeline

vangthao created this revision.Aug 27 2021, 9:56 AM
vangthao requested review of this revision.Aug 27 2021, 9:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 27 2021, 9:56 AM

This attempts to address the issue seen here https://reviews.llvm.org/D105062

arsenm added inline comments.Aug 27 2021, 10:03 AM
llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
237

Can skip this whole pass if this isn't gfx908

238–239

Since you're visiting all the defs anyway, could you unify this with the previous loop when an agpr register is found?

278

I'm worried this might not work correctly with overlapping subregisters

288–291

Should not need to totally trash the liveness and start over

rampitec added inline comments.Aug 27 2021, 12:23 PM
llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
238–239

Second to that. We already scanning all registers in the processReg() and all their defs, so it shall not be hard to find a def of an AGPR which is a COPY.

263

It is better to use isAGPRClass() instead. AV classes may become allocatable one day [soon].

vangthao updated this revision to Diff 369480.Aug 30 2021, 10:08 AM

Merged loop. Changed uses of TRI->hasAGPRs() to TRI->isAGPRClass(). Added some subregs test.

vangthao marked 4 inline comments as done.Aug 30 2021, 10:09 AM
vangthao added inline comments.Aug 30 2021, 10:13 AM
llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
278

I'm worried this might not work correctly with overlapping subregisters

I added some new tests. Do they address this? If not can you give an example?

288–291

Should not need to totally trash the liveness and start over

Is there a way to update the liveness without trashing it?

vangthao updated this revision to Diff 372360.Sep 13 2021, 4:06 PM

Fixed clang-tidy variable case style warning and rebased.

rampitec added inline comments.Sep 20 2021, 3:59 PM
llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
98

I do not think you need this case. This is a single 32 bit register, it will have a single def.

98–99

... and here.

103

You know you are dealing with an AGPR or SGPR right at the beginning of the function. You can quickly bail from here based on that.

143

You can just return false here.

rampitec added inline comments.Sep 20 2021, 4:14 PM
llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
103

DstReg always the same as Reg or at least should be unless it is not an implicit-def. At maximum I would expect assert(Reg == I.getOperand(0).getReg()) here.

106

Reg is always virtual, it is a result of index2VirtReg() call.

rampitec added inline comments.Sep 20 2021, 5:26 PM
llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
121

I am no sure why do you need to match SRC and DST subregs. Why not just copy a source from Def here whatever it is?

vangthao added inline comments.Sep 21 2021, 7:50 AM
llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
98

This may not necessarily be a single def. This may contain multiple defs if subregs are used. For example in the case that I am looking at:

undef %1.sub0:areg_128 = V_ACCVGPR_WRITE_B32_e64 %0.sub0:vreg_128, implicit $exec
%1.sub1:areg_128 = COPY %1.sub0:areg_128
%1.sub2:areg_128 = COPY %1.sub0:areg_128
%1.sub3:areg_128 = COPY %1.sub0:areg_128

These subregs are considered to be under %1 since def_instruction() does not consider different subregs as far as I can tell. I think index2VirtReg() does not consider subregs either.

121

Same as other comment. There may be subregs that def_instructions does not account for. We need to check if we are copying the correct subreg.

rampitec added inline comments.Sep 21 2021, 9:52 AM
llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
98–99

Same thing, it should continue outer loop.

121

I see.

137

Actually you can defer LIS update until after the loop. If you are dealing with many subregs of the same register you would only need to recreate interval once.

143

In fact not, you still want to continue outer loop.

vangthao updated this revision to Diff 374658.Sep 23 2021, 1:44 PM

Check if Reg is AGPR before doing anything in Copy switch case.
Remove isVirtual() check for Reg since index2VirtReg() guarantees virtual.
Defer LIS update until end.

vangthao marked 11 inline comments as done.Sep 23 2021, 1:46 PM
rampitec added inline comments.Sep 23 2021, 2:03 PM
llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
90

A largest tuple is 32, so maybe SmallSet<Register, 32>?

103

Can you hoist it out the loop? Reg is loop invariant.

175

Also replace it with hoisted IsAGPRDst.

vangthao updated this revision to Diff 374674.Sep 23 2021, 2:31 PM

Hoisted IsAGPRDst out of loop. Change SmallSet size to 32.

vangthao marked 3 inline comments as done.Sep 23 2021, 2:32 PM
This revision is now accepted and ready to land.Sep 23 2021, 2:32 PM
This revision was landed with ongoing or failed builds.Sep 23 2021, 3:20 PM
This revision was automatically updated to reflect the committed changes.
piotr added a subscriber: piotr.Sep 30 2021, 8:03 AM
piotr added inline comments.
llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
104

I'm seeing issues with this commit on targets without agpr (e.g. gfx900).
I think there is a change in behaviour here - previously on COPY there was an early return. Now, there is only a switch break, so the loop continues. Is this intended?

foad added inline comments.Nov 3 2021, 1:19 AM
llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
104

@vangthao ping! This is causing codegen differences on gfx900 which has no AGPRs. Can you please comment on whether that is expected, and if so why? Thanks!

vangthao added inline comments.Nov 3 2021, 1:57 AM
llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
104

Sorry I missed this comment.

This behavior is not intended. The issue was found and should be fixed in https://reviews.llvm.org/D113005

foad added inline comments.Nov 3 2021, 2:07 AM
llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
104

Thanks. We are testing that fix now.

piotr added a comment.Nov 3 2021, 2:09 AM

For the record, D113005 indeed fixes the CTS issue I was seeing with D108830. Thanks.