This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][SIFoldOperands] Clear kills when folding COPY
ClosedPublic

Authored by critson on Jul 27 2022, 4:14 AM.

Details

Summary

Clear all kill flags on source register when folding a COPY.
This is necessary because the kill flags may now be out of order with respect to the uses.

Diff Detail

Event Timeline

critson created this revision. Jul 27 2022, 4:14 AM
Herald added a project: Restricted Project. · View Herald Transcript Jul 27 2022, 4:14 AM
critson requested review of this revision. Jul 27 2022, 4:14 AM
foad added a comment. Jul 27 2022, 5:30 AM

So you start with this:

%2:sreg_64_xexec = COPY %1.sub0_sub1:sgpr_128
%3:sreg_64_xexec = COPY killed %1.sub2_sub3:sgpr_128
%4:sreg_64 = COPY %2:sreg_64_xexec

and folding %2 into %4 gives this:

%3:sreg_64_xexec = COPY killed %1.sub2_sub3:sgpr_128
%4:sreg_64 = COPY %1.sub0_sub1:sgpr_128

but now the killed flag on the RHS of %3 is wrong. How is this handled in the normal non-subreg case?
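The hazard can be modelled outside LLVM with a tiny stand-in for machine instructions (all types and names below are invented for illustration; the real pass works on MachineInstr/MachineOperand kill flags):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for a register use and an instruction; invented for
// illustration, not the real LLVM MachineInstr/MachineOperand classes.
struct Use {
  std::string Reg;
  bool IsKill = false; // marks what was believed to be the last use
};
struct Inst {
  std::vector<Use> Uses;
};

// A kill flag is valid only if no later instruction in the block still
// uses the register. After a fold rewrites a use of %2 into a use of
// %1.sub0_sub1 *below* the instruction that killed %1, this check fails.
bool killFlagsAreValid(const std::vector<Inst> &Block,
                       const std::string &Reg) {
  bool SeenKill = false;
  for (const Inst &I : Block)
    for (const Use &U : I.Uses)
      if (U.Reg == Reg) {
        if (SeenKill)
          return false; // a use appears after the register was "killed"
        if (U.IsKill)
          SeenKill = true;
      }
  return true;
}
```

Running the check on the folded sequence above (a kill of %1 followed by a new use of %1) reports the stale flag; clearing the kill flags on the source register restores validity.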

foad added a comment (edited). Jul 27 2022, 5:41 AM

Here's a test case that shows the same bug with no subregs involved:

---
name: fold_reg_kill
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $sgpr0

    %0:sreg_32 = COPY $sgpr0
    %1:sreg_32 = COPY %0
    %2:sreg_32 = S_ADD_U32 killed %0, killed %0, implicit-def $scc
    %3:sreg_32 = COPY %1
    S_ENDPGM 0, implicit %2, implicit %3
...

I wonder why we never normally see this in practice?

> I wonder why we never normally see this in practice?

Perhaps it is happening, but without the verifier enabled?

critson updated this revision to Diff 448023. Jul 27 2022, 6:03 AM
  • Generalise beyond subregister case
critson retitled this revision from [AMDGPU][SIFoldOperands] Clear kills when folding subreg COPY to [AMDGPU][SIFoldOperands] Clear kills when folding COPY. Jul 27 2022, 6:04 AM
critson edited the summary of this revision.
foad added a comment. Jul 27 2022, 8:26 AM

> I wonder why we never normally see this in practice?

I think we only go into this code if there is a COPY of the result of another COPY. Perhaps these are normally copy-propagated by some other pass before we get here?

foad added a comment (edited). Jul 27 2022, 8:33 AM

> I wonder why we never normally see this in practice?

I think we only go into this code if there is a COPY of the result of another COPY. Perhaps these are normally copy-propagated by some other pass before we get here?

See: https://reviews.llvm.org/D52577#1248697 which introduced this code, and the comment: "Why is this necessary? PeepholeOpt runs first and should have eliminated redundant copies"

foad accepted this revision. Jul 27 2022, 9:46 AM
foad added a reviewer: hliao.

Anyway the patch looks OK to me, given that this code does not fire very often. One day we should take a closer look at whether it is still needed, or whether PeepholeOptimizer could be improved instead. (Or perhaps it already has been? I see D87939 landed since this code was written.)

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
741

Don't need the braces.

This revision is now accepted and ready to land. Jul 27 2022, 9:46 AM
arsenm added inline comments. Jul 27 2022, 4:35 PM
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
741–743

You've reinvented MRI.clearKillFlags
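MachineRegisterInfo::clearKillFlags(Reg) walks every use operand of a virtual register and drops its kill flag. A self-contained sketch of that behaviour, using toy types invented here rather than the real API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for a register use and an instruction, not the real
// LLVM classes; enough to model kill-flag clearing.
struct Use {
  std::string Reg;
  bool IsKill = false;
};
struct Inst {
  std::vector<Use> Uses;
};

// Conservative fix: after folding, any kill flag on the source register
// may be stale, so drop them all. This mirrors what MRI.clearKillFlags
// does for real machine operands via the register's use list.
void clearKillFlags(std::vector<Inst> &Block, const std::string &Reg) {
  for (Inst &I : Block)
    for (Use &U : I.Uses)
      if (U.Reg == Reg)
        U.IsKill = false;
}
```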

critson marked 2 inline comments as done. Jul 27 2022, 8:01 PM

>> I wonder why we never normally see this in practice?

> I think we only go into this code if there is a COPY of the result of another COPY. Perhaps these are normally copy-propagated by some other pass before we get here?

> See: https://reviews.llvm.org/D52577#1248697 which introduced this code, and the comment: "Why is this necessary? PeepholeOpt runs first and should have eliminated redundant copies"

In my specific case these extra COPYs come from SILoadStoreOptimizer.

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
741–743

Thanks, I had missed there was a helper for this.

This revision was landed with ongoing or failed builds. Jul 27 2022, 8:01 PM
This revision was automatically updated to reflect the committed changes.
critson marked an inline comment as done.
foad added a comment. Jul 29 2022, 2:27 AM

>> See: https://reviews.llvm.org/D52577#1248697 which introduced this code, and the comment: "Why is this necessary? PeepholeOpt runs first and should have eliminated redundant copies"

> In my specific case these extra COPYs come from SILoadStoreOptimizer.

Maybe SILoadStoreOptimizer should not insert copies at all. If it replaces:

%old0:vgpr_32 = BUFFER_LOAD_DWORD ...
%old1:vgpr_32 = BUFFER_LOAD_DWORD ...

with:

%new:vgpr_64 = BUFFER_LOAD_DWORDX2 ...

then perhaps it could just replace all uses of %old0 with %new.sub0?
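That alternative can be sketched with the same kind of invented toy types: instead of emitting a COPY from %new.sub0 into a fresh register, rewrite the existing uses directly (the real pass would use something like MachineRegisterInfo::replaceRegWith):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins, not the real LLVM classes.
struct Use {
  std::string Reg;
};
struct Inst {
  std::vector<Use> Uses;
};

// Rewrite every use of OldReg (e.g. "%old0") to NewReg (e.g. "%new.sub0"),
// so no intermediate COPY -- and hence no later COPY fold -- is needed.
void replaceAllUses(std::vector<Inst> &Block, const std::string &OldReg,
                    const std::string &NewReg) {
  for (Inst &I : Block)
    for (Use &U : I.Uses)
      if (U.Reg == OldReg)
        U.Reg = NewReg;
}
```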