This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add optional tied-op for wwm-register's epilog spill restore
ClosedPublic

Authored by cdevadas on May 11 2023, 10:06 AM.

Details

Summary

The COPY inserted in the epilog block before return instruction as part
of ABI lowering, can get optimized during machine copy propagation if
the same register is used earlier in a wwm operation that demands the
prolog/epilog wwm-spill store/restore to preserve its inactive lanes.
With the spill restore in the epilog, the preceding COPY appears to be
dead during machine-cp. To avoid it, mark the same register as a tied-op
in the spill restore instruction to ensure a usage for the COPY.

Diff Detail

Event Timeline

cdevadas created this revision.May 11 2023, 10:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2023, 10:06 AM
cdevadas requested review of this revision.May 11 2023, 10:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2023, 10:06 AM

I noticed in the context we still have some forward scavenging. Those should have been replaced when we switched to using PEI in reverse.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1298

Exact register equality is almost always wrong

1669

Should replace this with a liveness query instead

cdevadas added inline comments.May 14 2023, 4:10 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1669

Using LiveRegs here won't help. The CSRs (including wwm-regs) are marked LiveOut by initLiveRegs() for an epilog block. In that sense, the liveness query will always hold true and we add the tied-op unconditionally for all spill reloads.
These registers are, in fact, liveout only after the insertion of the spill-restore. But we are in the process of inserting one, and we need to look one level backward to get their actual liveness. LiveRegs in its present form won't be helpful for that.

The intention of this patch is to tie the return value register to its own spill restore (if have one) to avoid an incorrect optimization currently performed on it due to the presence of the wwm-restore inserted afterward.
Instead of walking through the operands of RETURN instruction, I would prefer if we have a utility function that returns a regmask or the list of return value registers allocated for this function. But not sure such a thing exists.

arsenm added inline comments.May 15 2023, 2:13 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1669

I'm having trouble following this. The live-outness is already represented by the $vgpr0 use on the return instruction. You're special casing looking for the return instruction in the low level return utility. Is this just to filter out spills in other contexts, and that's why the liveness query doesn't work? If you performed the liveness query in buildEpilogRestore and passed in that you needed the tied operand, would that let you use the liveness query?

llvm/test/CodeGen/AMDGPU/tied-op-for-wwm-scratch-reg-spill-restore.mir
37

Can you add some additional tests with returned register tuples, and one where the def'd register is in the CSR range?

Also, test where the spill isn't used as the return value

cdevadas updated this revision to Diff 522256.May 15 2023, 10:30 AM

Used MachineInstr::readsRegister to check if RETURN instruction reads the spill-reg.
Also, added more tests.

arsenm accepted this revision.May 15 2023, 11:49 AM

LGTM with nit

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
1669

MBB.isReturnBlock is redundant with MI->isReturn. isReturnBlock just checks if the last instruction is a return

This revision is now accepted and ready to land.May 15 2023, 11:49 AM
cdevadas updated this revision to Diff 522431.May 15 2023, 10:10 PM

Removed the isReturn() blk check. Also, added a validity check on the iterator.

This revision was landed with ongoing or failed builds.May 15 2023, 10:37 PM
This revision was automatically updated to reflect the committed changes.