This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Avoid flushing the vmcnt counter in loop preheaders if not necessary
ClosedPublic

Authored by bsaleil on Jul 21 2022, 4:11 PM.

Details

Summary

One of the conditions to flush the vmcnt counter in loop preheaders is: The loop contains a use of a vgpr that is defined out of the loop.
The code currently checks if a waitcnt is needed by looking at the score of that vgpr in the score brackets. This is not enough and may cause the generation of an unnecessary vmcnt flush. This patch fixed that case.

Diff Detail

Event Timeline

bsaleil created this revision.Jul 21 2022, 4:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2022, 4:11 PM
bsaleil requested review of this revision.Jul 21 2022, 4:11 PM
foad added inline comments.Jul 22 2022, 5:34 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1731

Maybe use .readsReg() here? See comments about undef in the test case. (Note that readsReg is not mutually exclusive with isDef, since a subreg def operand both reads and writes the undeerlying super reg. So you might also need to remove the "else" on line 1752.)

1740

Maybe you just need: if (Brackets.getRegScore(RegNo, VM_CNT) > Brackets.getScoreLB(VM_CNT))?

llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
570

It is weird to have undef on all the use operands in this test cause that means the instruction doesn't care about the value it reads - so really no waitcnt is required before any of those reads.

bsaleil updated this revision to Diff 450885.Aug 8 2022, 11:12 AM

Addressed review comments, added test cases with and without undef operands.

bsaleil marked 3 inline comments as done.Aug 8 2022, 11:12 AM
foad added inline comments.Aug 22 2022, 9:26 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1731

Sorry but I think I was wrong to suggest readsReg. A subreg write operand is "readsReg" because it "reads" the other subregs of the superreg, but for the purposes of waitcnt insertion we should not consider it as a read operand. So I think you don't need any change on line 1729 after all.

bsaleil updated this revision to Diff 462185.Sep 22 2022, 8:44 AM

@foad sorry for the delay. I removed the .readsReg() code and cleaned a bit the test case.

bsaleil marked an inline comment as done.Sep 22 2022, 8:44 AM
arsenm accepted this revision.Sep 22 2022, 10:18 AM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir
547

Should drop the operand check from the -NOTs

This revision is now accepted and ready to land.Sep 22 2022, 10:18 AM
foad accepted this revision.Sep 27 2022, 3:53 AM

LGTM, thanks!

bsaleil marked an inline comment as done.Sep 28 2022, 10:07 AM