This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] remove one-use restriction for not (cmp P, A, B) --> cmp P', A, B
Needs ReviewPublic

Authored by spatel on Jul 9 2017, 9:11 AM.

Details

Summary

We can always move the 'not' op of a compare into the predicate of the compare itself to eliminate the not (xor).

The test case (poo) that looks worse was discussed in D33172 / PR32791 ( https://bugs.llvm.org/show_bug.cgi?id=32791 ). The common value is easy for CSE to detect, so there should be no actual regressions in that case.

D35044 reminded me that I had a draft patch for this ready to go once PR32791 was resolved. That change will make this cleaner, but it doesn't affect the functionality.

This patch will help clear the way for the remaining fixes needed to resolve:
https://bugs.llvm.org/show_bug.cgi?id=32706 (making 'not' ops rather than random 'xor' canonical in IR).

Diff Detail

Event Timeline

spatel created this revision.Jul 9 2017, 9:11 AM
spatel edited the summary of this revision. (Show Details)Jul 9 2017, 9:12 AM
craig.topper edited edge metadata.Jul 10 2017, 8:56 AM

Are any of the improved tests, cases that wouldn't be improved by my earlier proposal to demorgan logical ops on compare when one side is free to invert and the other side isn't. Which I think avoids the CSE issues.

test/Transforms/SimplifyCFG/merge-cond-stores.ll
45

This test also got worse, but I guess it will CSE too?

Are any of the improved tests, cases that wouldn't be improved by my earlier proposal to demorgan logical ops on compare when one side is free to invert and the other side isn't. Which I think avoids the CSE issues.

I was wondering about that too. You're referring to D32725? We're getting the same outcome on the SimplifyCFG tests - 'test_simple' and 'test_recursive', but yes, we have the CSE-related diff on 'test_simple_commuted'.

test/Transforms/SimplifyCFG/merge-cond-stores.ll
45

Right - we'd expect X3 to be matched with X2 and deleted by -early-cse.

spatel added a comment.Aug 1 2017, 6:15 AM

Ping.

For reference, I mentioned this patch in a related discussion about instcombine and folding of cmps:
http://lists.llvm.org/pipermail/llvm-dev/2017-July/115452.html

Based on that thread, I think we do want to proceed with this canonicalization, rather than the inverse transform suggested in:
https://bugs.llvm.org/show_bug.cgi?id=27431
...but let me know if there's a better way.

We definitely should do something to fix an invert of a compare feeding an 'and' or 'or' op. I've seen it prevent folding 'and of icmps' or 'or of icmps' because we won't look through the not in those folds. The ones I've seen would have been handled by something like D32725.

This doesn't do anything to fix PR27431 right?

Do we have any real motivating examples that don't involve the compares being used by bitwise operations that D32725 would take care of?

spatel added a comment.Aug 1 2017, 2:44 PM

We definitely should do something to fix an invert of a compare feeding an 'and' or 'or' op. I've seen it prevent folding 'and of icmps' or 'or of icmps' because we won't look through the not in those folds. The ones I've seen would have been handled by something like D32725.

This doesn't do anything to fix PR27431 right?

int a;
char b;
int fn1() { return a && b || a && b; }

This eliminates a 'not' op, but doesn't actually solve that example. Currently, we have this at -O2:

define i32 @fn1(i32 %a, i8 signext %b) {
entry:
  %tobool = icmp ne i32 %a, 0
  %tobool.not = xor i1 %tobool, true
  %tobool1 = icmp eq i8 %b, 0
  %or.cond = or i1 %tobool1, %tobool.not
  br i1 %or.cond, label %lor.rhs, label %lor.end

lor.rhs:                                          ; preds = %entry
  %tobool4 = icmp ne i8 %b, 0
  %0 = and i1 %tobool, %tobool4
  %phitmp = zext i1 %0 to i32
  br label %lor.end

lor.end:                                          ; preds = %entry, %lor.rhs
  %1 = phi i32 [ %phitmp, %lor.rhs ], [ 1, %entry ]
  ret i32 %1
}

With this patch applied, we get:

define i32 @fn1(i32 %a, i8 signext %b)  {
entry:
  %tobool.not = icmp eq i32 %a, 0
  %tobool1 = icmp eq i8 %b, 0
  %or.cond = or i1 %tobool.not, %tobool1
  br i1 %or.cond, label %lor.rhs, label %lor.end

lor.rhs:                                          ; preds = %entry
  %tobool = icmp ne i32 %a, 0
  %tobool4 = icmp ne i8 %b, 0
  %0 = and i1 %tobool, %tobool4
  %phitmp = zext i1 %0 to i32
  br label %lor.end

lor.end:                                          ; preds = %entry, %lor.rhs
  %1 = phi i32 [ %phitmp, %lor.rhs ], [ 1, %entry ]
  ret i32 %1
}

This form (more clearly DeMorganized ops) seems easier to have GVN match?

Do we have any real motivating examples that don't involve the compares being used by bitwise operations that D32725 would take care of?

My motivation is actually PR32706...although I've lost track of how these things actually interact to conflict with each other and infinite loop.

That's not really the IR for that C code is it. The C code has two global variables and the IR has two arguments.

spatel added a comment.Aug 1 2017, 4:15 PM

That's not really the IR for that C code is it. The C code has two global variables and the IR has two arguments.

Oops - that's correct. I simplified the example from PR27431 to:

int fn1(int a, char b) { return (a && b) || (a && b); }

It shows the same problem though AFAICT.

davide edited edge metadata.Aug 1 2017, 4:19 PM

That's not really the IR for that C code is it. The C code has two global variables and the IR has two arguments.

Oops - that's correct. I simplified the example from PR27431 to:

int fn1(int a, char b) { return (a && b) || (a && b); }

It shows the same problem though AFAICT.

Unless I'm reading this example incorrectly, which may be possible, this is clearly an example of redundant expression.
FWIW, you should be able to prove that:
a' OR a can be simplified to a
if a is structurally equivalent to a'

In this case the two things are trivially structurally equivalent.
Same opcode, same operands.

This, FWIW, it's the whole point of doing value numbering.

The C code with the global variables never creates a 'not' operation and never combines the 4 compares with 0. For your modified version, it looks like something combined the two compares with A into a single compare and the xor and with this patch instcombine undoes that. Isn't that moving in the opposite direction of PR27431?

spatel added a comment.Aug 2 2017, 4:39 AM

The C code with the global variables never creates a 'not' operation and never combines the 4 compares with 0. For your modified version, it looks like something combined the two compares with A into a single compare and the xor and with this patch instcombine undoes that. Isn't that moving in the opposite direction of PR27431?

Yes, you're correct - the 'not' isn't in the version with the globals for some reason. But the end result is similar - we fail to recognize the DeMorgan equivalents.

And yes, this is going in the other direction from the suggestion in PR27431. There are 3 options as I see it:

  1. We standardize instcombine to always transform not(cmp) to the inverted predicate (this patch); we enhance GVN or other passes to recognize DeMorgan patterns (which we may want to do first and anyway).
  2. We standardize instcombine to not transform not(cmp) and enhance its pattern matching to account for the extra 'not' ops.
  3. We live with the current instcombine behavior (not(cmp) is transformed with only 1-use), and add folds like D32725. That patch alone isn't enough to catch the PR27431 example with globals, is it?

I don't believe D32725 won't do anything to the patch with globals in it. I don't believe an xor ever exists there.

I think the globals test is somehow different do to the presence of the loads of the globals that don't exist in the argument test.

I've been wondering if there's some way to canonicalize (br (and cmp ne, cmp ne) and (br (or cmp eq, cmp eq)) to the same thing by inverting the branch targets in one which would allow CSE to catch it. But I think it gets difficult when there are more than 2 compares involved that need to be canonicalized since you need to walk a tree to make sure you can flip all the compares.