Clear all kill flags on source register when folding a COPY.
This is necessary because the kills may now be out of order with the uses.
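As a rough sketch of the idea (not the actual SIFoldOperands change; the function name and parameters here are placeholders): once the COPY's source is folded into a later user, a killed marker left on an earlier use of that source may now precede a surviving use, so the conservative fix is to drop every kill flag on the source register.

#include "llvm/CodeGen/MachineRegisterInfo.h"
using namespace llvm;

// Conservative fix: after folding the source of a COPY into a later user,
// drop all kill flags on that source register, since the remaining kills
// may now be out of order with the remaining uses.
static void clearKillsOnFoldedCopySource(MachineRegisterInfo &MRI,
                                         Register SrcReg) {
  MRI.clearKillFlags(SrcReg);
}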
So you start with this:
%2:sreg_64_xexec = COPY %1.sub0_sub1:sgpr_128
%3:sreg_64_xexec = COPY killed %1.sub2_sub3:sgpr_128
%4:sreg_64 = COPY %2:sreg_64_xexec
and folding %2 into %4 gives this:
%3:sreg_64_xexec = COPY killed %1.sub2_sub3:sgpr_128
%4:sreg_64 = COPY %1.sub0_sub1:sgpr_128
but now the killed flag on the RHS of %3 is wrong. How is this handled in the normal non-subreg case?
Here's a test case that shows the same bug with no subregs involved:
---
name: fold_reg_kill
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $sgpr0
    %0:sreg_32 = COPY $sgpr0
    %1:sreg_32 = COPY %0
    %2:sreg_32 = S_ADD_U32 killed %0, killed %0, implicit-def $scc
    %3:sreg_32 = COPY %1
    S_ENDPGM 0, implicit %2, implicit %3
...
I wonder why we never normally see this in practice?
> I wonder why we never normally see this in practice?
I think we only go into this code if there is a COPY of the result of another COPY. Perhaps these are normally copy-propagated by some other pass before we get here?
> > I wonder why we never normally see this in practice?
> I think we only go into this code if there is a COPY of the result of another COPY. Perhaps these are normally copy-propagated by some other pass before we get here?
See: https://reviews.llvm.org/D52577#1248697 which introduced this code, and the comment: "Why is this necessary? PeepholeOpt runs first and should have eliminated redundant copies"
Anyway the patch looks OK to me, given that this code does not fire very often. One day we should take a closer look at whether it is still needed, or whether PeepholeOptimizer could be improved instead. (Or perhaps it already has been? I see D87939 landed since this code was written.)
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp:741
Don't need the braces.
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp:741–743
You've reinvented MRI.clearKillFlags.
In my specific case these extra COPYs come from SILoadStoreOptimizer.
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp:741–743
Thanks, I had missed there was a helper for this.
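For reference, MRI.clearKillFlags(Reg) is just the obvious loop over the register's use operands, roughly equivalent to this sketch:

#include "llvm/CodeGen/MachineRegisterInfo.h"
using namespace llvm;

// What MachineRegisterInfo::clearKillFlags(Reg) boils down to: walk every
// use operand of Reg and drop its kill marker.
static void clearKillFlagsByHand(MachineRegisterInfo &MRI, Register Reg) {
  for (MachineOperand &MO : MRI.use_operands(Reg))
    MO.setIsKill(false);
}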
> See: https://reviews.llvm.org/D52577#1248697 which introduced this code, and the comment: "Why is this necessary? PeepholeOpt runs first and should have eliminated redundant copies"
> In my specific case these extra COPYs come from SILoadStoreOptimizer.
Maybe SILoadStoreOptimizer should not insert copies at all. If it replaces:
%old0:vgpr_32 = BUFFER_LOAD_DWORD ...
%old1:vgpr_32 = BUFFER_LOAD_DWORD ...
with:
%new:vgpr_64 = BUFFER_LOAD_DWORDX2 ...
then perhaps it could just replace all uses of %old0 with %new.sub0?
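A sketch of that alternative (hypothetical helper, not existing SILoadStoreOptimizer code; OldReg, NewReg and SubIdx are placeholder names), assuming each user's operand constraints allow reading a subregister of the wider def:

#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
using namespace llvm;

// Rewrite every use of the old 32-bit vreg to read the matching subregister
// of the new 64-bit vreg, instead of materializing a COPY for it.
static void rewriteUsesToSubReg(MachineRegisterInfo &MRI, Register OldReg,
                                Register NewReg, unsigned SubIdx) {
  // setReg moves the operand to NewReg's use list, so use an early-increment
  // range to keep the iteration over OldReg's uses valid.
  for (MachineOperand &MO :
       llvm::make_early_inc_range(MRI.use_operands(OldReg))) {
    MO.setReg(NewReg);
    MO.setSubReg(SubIdx);
  }
}

If some user cannot take a subregister operand directly (for example a tied or class-constrained operand), a COPY would still be needed for that particular use.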