This patch merges a consecutive sequence of
s_or_saveexec s_o, s_i
s_xor exec, exec, s_o
into a single
s_andn2_saveexec s_o, s_i instruction.
This patch also cleans up the SIOptimizeExecMasking pass a bit.
Paths
| Differential D129073
[AMDGPU] Combine s_or_saveexec, s_xor instructions. ClosedPublic Authored by tsymalla on Jul 4 2022, 4:16 AM.
Details Summary This patch merges a consecutive sequence of s_or_saveexec s_o, s_i into a single s_andn2_saveexec s_o, s_i instruction.
Diff Detail
Event TimelineComment Actions
Please put that in a separate NFC patch. It really does make life easier for reviewers. Comment Actions
Will do that. Comment Actions Cleanup submitted as separate NFC change in https://reviews.llvm.org/D129086, will change this patch after the NFC has landed Comment Actions This is a good start. However, I have some high-level questions:
Comment Actions
The primary reason is we can't produce the terminators with output registers before register allocation in case we need to insert spills live out of the block for the save exec. RA needs to insert spills before terminators
Comment Actions
Good idea. I will try to reuse the existing scan to inject the optimizations, if possible.
I don't know what exactly the reasons are to have two passes. However, as in SIOptimizeExecMasking S_*BINOP*_{B32, B64} instructions are swapped with their SAVEEXEC counterpart, it made sense to me to introduce the change here. This is the last time such instructions are inserted by a pass, so it's likely the pattern I am trying to match will only appear after SIOptimizeExecMasking has run. But please correct me if I'm wrong. Comment Actions
We only had post-RA pass initially. AFAIR I have added pre-RA to actually save registers (D35967). Comment Actions
It can probably start from the first terminator, scan backwards, and bail with the first instruction not modifying the exec. Comment Actions
That makes sense, thanks. tsymalla added inline comments.
Comment Actions Merge two scans into one to reduce compile time. Comment Actions Seems fine to me if you update the commit message and the comment.
tsymalla marked an inline comment as done.
Comment Actions This looks pretty good already, but I do have a bunch of code quality comments inline.
tsymalla added inline comments.
This revision is now accepted and ready to land.Jul 21 2022, 3:32 AM This revision was landed with ongoing or failed builds.Jul 21 2022, 5:16 AM Closed by commit rGfd64a857ee7b: [AMDGPU] Combine s_or_saveexec, s_xor instructions. (authored by tsymalla). · Explain Why This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 446444 llvm/lib/Target/AMDGPU/SIOptimizeExecMasking.cpp
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.memmove.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/localizer.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll
llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll
llvm/test/CodeGen/AMDGPU/branch-relaxation.ll
llvm/test/CodeGen/AMDGPU/bypass-div.ll
llvm/test/CodeGen/AMDGPU/collapse-endcf.ll
llvm/test/CodeGen/AMDGPU/else.ll
llvm/test/CodeGen/AMDGPU/kill-infinite-loop.ll
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.softwqm.ll
llvm/test/CodeGen/AMDGPU/multi-divergent-exit-region.ll
llvm/test/CodeGen/AMDGPU/s_or_saveexec_xor_combine.mir
llvm/test/CodeGen/AMDGPU/set-inactive-wwm-overwrite.ll
llvm/test/CodeGen/AMDGPU/sgpr-control-flow.ll
llvm/test/CodeGen/AMDGPU/transform-block-with-return-to-epilog.ll
llvm/test/CodeGen/AMDGPU/valu-i1.ll
llvm/test/CodeGen/AMDGPU/vcmp-saveexec-to-vcmpx.ll
llvm/test/CodeGen/AMDGPU/vgpr-liverange.ll
llvm/test/CodeGen/AMDGPU/wqm.ll
|
Why the empty line?