One of the conditions to flush the vmcnt counter in loop preheaders is: The loop contains a use of a vgpr that is defined out of the loop.
The code currently checks if a waitcnt is needed by looking at the score of that vgpr in the score brackets. This is not enough and may cause the generation of an unnecessary vmcnt flush. This patch fixed that case.
Details
- Reviewers
foad nhaehnle arsenm - Group Reviewers
Restricted Project - Commits
- rGb556726ccc56: [AMDGPU] Avoid flushing the vmcnt counter in loop preheaders if not necessary
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | ||
---|---|---|
1733 | Maybe use .readsReg() here? See comments about undef in the test case. (Note that readsReg is not mutually exclusive with isDef, since a subreg def operand both reads and writes the undeerlying super reg. So you might also need to remove the "else" on line 1752.) | |
1742 | Maybe you just need: if (Brackets.getRegScore(RegNo, VM_CNT) > Brackets.getScoreLB(VM_CNT))? | |
llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir | ||
570 | It is weird to have undef on all the use operands in this test cause that means the instruction doesn't care about the value it reads - so really no waitcnt is required before any of those reads. |
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp | ||
---|---|---|
1733 | Sorry but I think I was wrong to suggest readsReg. A subreg write operand is "readsReg" because it "reads" the other subregs of the superreg, but for the purposes of waitcnt insertion we should not consider it as a read operand. So I think you don't need any change on line 1729 after all. |
@foad sorry for the delay. I removed the .readsReg() code and cleaned a bit the test case.
llvm/test/CodeGen/AMDGPU/waitcnt-vmcnt-loop.mir | ||
---|---|---|
547 | Should drop the operand check from the -NOTs |
Maybe use .readsReg() here? See comments about undef in the test case. (Note that readsReg is not mutually exclusive with isDef, since a subreg def operand both reads and writes the undeerlying super reg. So you might also need to remove the "else" on line 1752.)