This allows moving the condition from the intrinsic to the standard ICmp
opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic
is an identity for retrieving the SGPR mask.
We can also get the mask from and i1, or i1, and xor i1.
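As a rough illustration of why icmp.i1(src, false, NE) can be treated as an identity, the C++ sketch below models a wave as an array of per-lane booleans and the SGPR mask as a 64-bit integer. This is a toy model, not LLVM code: the names kWaveSize, LaneBools, packMask, icmpNeFalse and zipLanes are hypothetical, and the 64-lane wave size is an assumption made for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: a 64-lane wave, with one i1 value per lane.
constexpr std::size_t kWaveSize = 64;
using LaneBools = std::array<bool, kWaveSize>;

// Pack a per-lane i1 into a 64-bit mask (bit i corresponds to lane i).
// This stands in for the SGPR mask mentioned in the summary.
std::uint64_t packMask(const LaneBools &lanes) {
  std::uint64_t mask = 0;
  for (std::size_t i = 0; i < kWaveSize; ++i)
    if (lanes[i])
      mask |= (std::uint64_t{1} << i);
  return mask;
}

// Toy model of icmp.i1(src, false, NE): compare every lane against false
// and collect the per-lane results into a mask.
std::uint64_t icmpNeFalse(const LaneBools &src) {
  LaneBools result{};
  for (std::size_t i = 0; i < kWaveSize; ++i)
    result[i] = (src[i] != false);
  return packMask(result);
}

// Apply a binary i1 operation lane by lane (used to model and/or/xor i1).
LaneBools zipLanes(const LaneBools &a, const LaneBools &b,
                   bool (*op)(bool, bool)) {
  LaneBools out{};
  for (std::size_t i = 0; i < kWaveSize; ++i)
    out[i] = op(a[i], b[i]);
  return out;
}
```

In this model, icmpNeFalse(src) always equals packMask(src), and the mask of a lane-wise and, or, or xor equals the bitwise and, or, or xor of the two input masks, which is the intuition behind reading the mask directly from those operations.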
Differential D52060
AMDGPU: Add a fast path for icmp.i1(src, false, NE)
Closed, Public
Authored by mareko on Sep 13 2018, 2:30 PM.
Event Timeline
Herald added subscribers: t-tye, tpr, dstuttard and 4 others. Sep 13 2018, 2:30 PM
What do you mean by that? I'm not sure what you mean.
In InstCombineCalls we whitelist bitwidth sizes that are legal, so if the input compare is an i1 compare, it will fold into the intrinsic.

AMDGPU: Add a fast path for icmp.i1(src, false, NE)

Summary:
And we can also get the mask from and i1, or i1, xor i1.
Don't fold icmp in InstCombineCalls.

Reviewers: arsenm, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D52060
Closed by commit rL351150: AMDGPU: Add a fast path for icmp.i1(src, false, NE) (authored by mareko).
This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 175387
lib/Target/AMDGPU/SIISelLowering.cpp
lib/Target/AMDGPU/SIInstructions.td
lib/Transforms/InstCombine/InstCombineCalls.cpp
test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll
test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll
Needs test in InstCombine