This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Split waterfall loop exec manipulation
ClosedPublic

Authored by critson on Mar 21 2022, 6:50 PM.

Details

Summary

Split waterfall loops into multiple blocks so that exec mask
manipulation (s_and_saveexec) does not occur in the middle of
a block.

VGPR live range optimizer is updated to handle waterfall loops
spanning multiple blocks.

Diff Detail

Event Timeline

critson created this revision.Mar 21 2022, 6:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 21 2022, 6:50 PM
critson requested review of this revision.Mar 21 2022, 6:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 21 2022, 6:50 PM
arsenm added inline comments.Mar 21 2022, 6:54 PM
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
337–339

I've thought about adding some kind of iterator which would automatically cross fallthrough blocks

critson added inline comments.Mar 21 2022, 7:00 PM
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
337–339

I can certainly see it being useful in a few places.

I agree this is a good idea and the change looks good, just only a few minor comments. btw, Is this the only case that we modify EXEC in middle of a block?

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
337–338

I would suggest you simply push LoopHeader and LoopEnd into the Blocks here, otherwise we have to go through the blocks by checking against predecessors/successors. The block appears between two blocks in the block-list may not appear between them in the CFG.
You can also place an assert the LoopHeader has unique successor, which should be LoopEnd.

571–572

Please help update the comment a little bit to reflect the new code. It is better to place an assert that Blocks.size() <=2 here. The reason is if we have more blocks, we can not clear the AliveBlocks bit in this simple way.

I agree this is a good idea and the change looks good, just only a few minor comments. btw, Is this the only case that we modify EXEC in middle of a block?

I don't think so. WWM/WQM still ends up in the middle of a block I believe. Kills also start in the middle of blocks, but are moved to terminators

critson updated this revision to Diff 417800.Mar 23 2022, 6:14 PM
critson marked 2 inline comments as done.
  • Address reviewer comments

I agree this is a good idea and the change looks good, just only a few minor comments. btw, Is this the only case that we modify EXEC in middle of a block?

I don't think so. WWM/WQM still ends up in the middle of a block I believe. Kills also start in the middle of blocks, but are moved to terminators

WWM/WQM is probably the only remaining major example -- which I can work on fixing.
Kills and demotes always get lowered such that manipulation occurs at end of a block.

At the moment, I feel that pseudos that introduce exec manipulation do not themselves need to be treated as exec manipulation, in terms of not being in the middle of a block.
The burden is on the code lowering them.
Of course we have to take care about reordering around these instructions, but generally convergent attributes take care of this.

llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
337–338

I have rewritten the loop to follow successors and validate there is a linear flow.
I also lifted the Block collection from the inner loop.
I don't want to limit this to just LoopHeader and LoopEnd, in case we extend the waterfall loop in future.

571–572

I have updated the comment.
Sorry, I still don't understand the Block.size() limit -- surely this works for any length waterfall loop if all the blocks are chained together linearly?

ruiling added inline comments.Mar 23 2022, 7:20 PM
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
571–572

I find that I was thinking about a situation not applicable here. The way that OldVarInfo was updated should still work. The problem here is we need to update NewVarInfo such that it may be alive in some block if there are more than two blocks. For example, if we have three blocks, the value defined in the first block and used in the third block should be alive in the second block. Seem handleVirtRegUse can help this. Please double check.

critson updated this revision to Diff 418098.Mar 24 2022, 5:56 PM
  • Update liveness for new registers
critson marked an inline comment as done.Mar 24 2022, 5:56 PM
critson added inline comments.
llvm/lib/Target/AMDGPU/SIOptimizeVGPRLiveRange.cpp
571–572

I have updated the logic with a new loop, can you take a look?

This revision is now accepted and ready to land.Mar 27 2022, 10:45 PM
This revision was landed with ongoing or failed builds.Mar 28 2022, 1:45 AM
This revision was automatically updated to reflect the committed changes.