This allows moving the condition from the intrinsic to the standard ICmp
instruction, so that LLVM can simplify it. The icmp.i1(src, false, NE)
intrinsic call is an identity for retrieving the SGPR mask.
We can also get the mask from and i1, or i1, and xor i1.
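One way to see why and i1, or i1, and xor i1 can feed the mask directly is that, lane by lane, the mask of a bitwise i1 operation equals the same bitwise operation applied to the two masks. The toy Python model below is a sketch under stated assumptions, not LLVM code: a mask bit is set only for lanes that are active in EXEC and whose i1 value is true, the wave is shortened to 8 lanes for readability (real AMDGPU wavefronts are wider), and ballot and per_lane are hypothetical helper names.

```python
# Toy model (assumption, not LLVM code): a wave of 8 lanes, where each lane
# holds an i1 value and exec_mask marks which lanes are active.

def ballot(lanes, exec_mask):
    """Hypothetical model of icmp.i1(src, false, NE): set bit i of the
    returned mask when lane i is active and its i1 value is true."""
    mask = 0
    for i, value in enumerate(lanes):
        if value and (exec_mask >> i) & 1:
            mask |= 1 << i
    return mask


def per_lane(op, a, b):
    """Apply a bitwise i1 operation lane by lane (the per-lane view)."""
    return [op(x, y) for x, y in zip(a, b)]


a = [True, False, True, True, False, False, True, False]
b = [True, True, False, True, False, True, True, False]
exec_mask = 0b01111111  # lane 7 is inactive

ma = ballot(a, exec_mask)
mb = ballot(b, exec_mask)

# The mask of (and/or/xor i1) equals the same bitwise op on the two masks,
# so the masks can be combined directly instead of re-comparing per lane.
assert ballot(per_lane(lambda x, y: x and y, a, b), exec_mask) == ma & mb
assert ballot(per_lane(lambda x, y: x or y, a, b), exec_mask) == ma | mb
assert ballot(per_lane(lambda x, y: x != y, a, b), exec_mask) == ma ^ mb
```

Inactive lanes contribute 0 to both input masks and to the result, so the identity also holds for xor, where a stray 1 would otherwise break it.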
Differential D52060
AMDGPU: Add a fast path for icmp.i1(src, false, NE)
Authored by mareko on Sep 13 2018, 2:30 PM.
In InstCombineCalls we whitelist bit widths that are legal, so if the input compare is an i1 compare, it will fold into the intrinsic.

AMDGPU: Add a fast path for icmp.i1(src, false, NE)

Summary:
We can also get the mask from and i1, or i1, and xor i1.
Don't fold icmp in InstCombineCalls.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52060
Needs a test in InstCombine.