This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Enable cast-folding in logic(cast(icmp), cast(icmp))
ClosedPublic

Authored by mreisinger on Jul 19 2016, 6:49 AM.

Details

Summary

Currently, InstCombine is already able to fold expressions of the form logic(cast(A), cast(B)) to the simpler form cast(logic(A, B)), where logic designates one of and/or/xor. This transformation is implemented in foldCastedBitwiseLogic() in InstCombineAndOrXor.cpp. However, this optimization will not be performed if both A and B are icmp instructions. The decision to preclude casts of icmp instructions originates in r48715 in combination with r261707, and can be best understood by the title of the former one:

Transform (zext (or (icmp), (icmp))) to (or (zext (cimp), (zext icmp))) if at least one of the (zext icmp) can be transformed to eliminate an icmp.

Apparently, it introduced a transformation that is a reverse of the transformation that is done in foldCastedBitwiseLogic(). Its purpose is to expose pairs of zext icmp that would subsequently be optimized by transformZExtICmp() in InstCombineCasts.cpp. Therefore, in order to avoid an endless loop of switching back and forth between these two transformations, the one in foldCastedBitwiseLogic() has been restricted to exclude icmp instructions which is mirrored in the responsible check:

if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src)) && ...

This check seems to sort out more cases than necessary because:

  • the reverse transformation is obviously done for or instructions only
  • and also not every zext icmp pair is necessarily the result of this reverse transformation

Therefore we now remove this check and replace it by a more finegrained one in shouldOptimizeCast() that now rejects only those logic(zext(icmp), zext(icmp)) that would be able to be optimized by transformZExtICmp(), which also avoids the mentioned endless loop. That means we are now able to also simplify expressions of the form logic(cast(icmp), cast(icmp)) to cast(logic(icmp, icmp)) (cast being an arbitrary CastInst).

As an example, consider the following IR snippet

%1 = icmp sgt i64 %a, %b
%2 = zext i1 %1 to i8
%3 = icmp slt i64 %a, %c
%4 = zext i1 %3 to i8
%5 = and i8 %2, %4

which would now be transformed to

%1 = icmp sgt i64 %a, %b
%2 = icmp slt i64 %a, %c
%3 = and i1 %1, %2
%4 = zext i1 %3 to i8

This issue became apparent when experimenting with the programming language Julia, which makes use of LLVM. Currently, Julia lowers its Bool datatype to LLVM's i8 (also see https://github.com/JuliaLang/julia/pull/17225). In fact, the above IR example is the lowered form of the Julia snippet (a > b) & (a < c). Like shown above, this may introduce zext operations, casting between i1 and i8, which could for example hinder ScalarEvolution and Polly on certain code.

Diff Detail

Event Timeline

mreisinger updated this revision to Diff 64482.Jul 19 2016, 6:49 AM
mreisinger retitled this revision from to [InstCombine] Enable cast-folding in logic(cast(icmp), cast(icmp)).
mreisinger updated this object.
mreisinger added reviewers: grosser, vtjnash.
mreisinger added a subscriber: llvm-commits.
mreisinger updated this object.Jul 19 2016, 7:12 AM
majnemer accepted this revision.Jul 19 2016, 8:37 AM
majnemer added a reviewer: majnemer.
majnemer added a subscriber: majnemer.

LGTM with nits addressed.

test/Transforms/InstCombine/zext.ll
78–82

I'd recommend pattern matching these similar to the test above.

94–101

Ditto.

This revision is now accepted and ready to land.Jul 19 2016, 8:37 AM
mreisinger updated this revision to Diff 64510.Jul 19 2016, 9:14 AM
mreisinger edited edge metadata.

Thank you, I updated the test cases accordingly.

grosser closed this revision.Jul 19 2016, 1:33 PM
grosser edited edge metadata.
test/Transforms/InstCombine/zext.ll
90

Could you add a corresponding version for the other logical operators?

111

Nice.

Thank you for committing. See https://reviews.llvm.org/D22561 for additional test cases.