Page MenuHomePhabricator

[InstCombine] Fold (xor (min/max X, Y), -1) -> (max/min ~X, ~Y) when X and Y are freely invertible.
ClosedPublic

Authored by craig.topper on Sep 11 2018, 11:30 PM.

Details

Summary

This allows the xor to be removed completely.

This might help with recomitting r341674, but seems good regardless.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Sep 11 2018, 11:30 PM
spatel added inline comments.Sep 12 2018, 7:11 AM
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2903 ↗(On Diff #165011)

typo

2908–2910 ↗(On Diff #165011)

Move createMinMax() from InstCombineSelect.cpp, so we can use it here and above?

test/Transforms/InstCombine/xor.ll
813–816 ↗(On Diff #165011)

The test should start in canonical min/max form, so we're not depending on other folds to exercise the new one:

%a = sub i32 -2, %x
%b = icmp sgt i32 %a, 0
%c = select i1 %1, i32 %a, i32 0
%d = xor i32 %c, -1

...and of course, I'd prefer to have the test in place ahead of the patch.

827–830 ↗(On Diff #165011)

This doesn't provide much extra coverage - just changing the constant? Better to have a test with a vector type and/or change the sub to an add?

More test cases and fix an infinite loop.

spatel added a reviewer: anna.Sep 12 2018, 1:48 PM

Can you split off the inf-loop fix? That solves PR38915, right?
https://bugs.llvm.org/show_bug.cgi?id=38915

I don’t know if it fixes that PR. This transform causes an infinite loop on the tests where both max/min operands are invertible but not constants. Test50 and test51

This whole patch does fix the infinite loop from PR38915, but it requires the whole patch and not just the change in InstCombineSelect.cpp

Fix the typo in the comment. I missed it earlier in my haste to board a plane.

anna added a comment.Sep 13 2018, 8:33 AM

This whole patch does fix the infinite loop from PR38915, but it requires the whole patch and not just the change in InstCombineSelect.cpp

The whole patch fixes our internal regression as well (PR38915 was the bug point reduced version of the regression). Thanks for the fix!

I looked a bit closer at what's going on in PR38915. This patch fixes that particular case, but I'm worried that we still have the same bug lurking if we commit this without solving the underlying problem.

I think these combines are fighting because we don't have a way to accurately assess 'one-use' in terms of another min/max within IsFreeToInvert(). This could be the best argument yet for integer min/max intrinsics because we're going to have to keep special-casing / adding those !X->hasNUsesOrMore(3) hacks to account for min/max transforms...or maybe this + whatever else is needed to recommit rL341674 is enough?

The problem might also be solved by adjusting the min/max bailout that we have within visitICmp. We don't optimize cmps that are part of min/max because we're scared to break a min/max.

There's another underlying problem that I'd like to look that I've been putting off: we have a bunch of min/max of min/max transforms in instcombine, but they really belong in instsimplify because we're just deleting ops. It's not a complete solution, but if we fix that, we're less likely to have this problem in instcombine because some patterns would be killed before we could get into trouble.

spatel accepted this revision.Sep 13 2018, 10:40 AM

I don't know how to solve the infinite loop bug completely yet (if anyone else does, please suggest), and given that this solves the existing bug while adding an optimization, I'll say LGTM.

But please do:

  1. Commit the tests with baseline checks.
  2. Add a TODO comment about using createMinMax() (we're dropping metadata in all of the related transforms, so it would be good to handle that in 1 place if possible)
  3. Add this test for the existing bug:
define i32 @PR38915(i32 %x, i32 %y, i32 %z) {
  %xn = sub i32 0, %x
  %yn = sub i32 0, %y
  %c1 = icmp sgt i32 %xn, %yn
  %m1 = select i1 %c1, i32 %xn, i32 %yn
  %m1n = xor i32 %m1, -1
  %c2 = icmp sgt i32 %m1n, %z
  %m2 = select i1 %c2, i32 %m1n, i32 %z
  %m2n = xor i32 %m2, -1
  ret i32 %m2n
}
This revision is now accepted and ready to land.Sep 13 2018, 10:40 AM
This revision was automatically updated to reflect the committed changes.