This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Apply deMorgan to (and/or (not cmp1), cmp2) when cmp1 has multiple uses, but cmp2 has a single use
AbandonedPublic

Authored by craig.topper on May 1 2017, 5:19 PM.

Details

Reviewers
spatel
efriedma
Summary

This patch applies demorgan when we detect an icmp of 2 cmps where one cmp is inverted, but the not can't be folded into the predicate because the cmp has another usage. If we are able to fold a not into the other cmp, we can do the transform and hope the remaining not can be folded away further up the chain. And even if its not folded away, we haven't increased the number of operations.

I plan to add more test cases in the InstCombine directory before committing this, but wanted to get an opinion on whether it makes sense.

Interestingly, the one test that changed here is for the transformation that caused the extra xors in my test case too.

Diff Detail

Event Timeline

craig.topper created this revision.May 1 2017, 5:19 PM
spatel edited edge metadata.

The transform seems reasonable, but I wonder if we can pull more of the code for DeMorgan folds together. Maybe this can be added to matchDeMorgansLaws? Also, I'm trying to clean up the block in D32665. Below that, we use "IsFreeToInvert" and that does have a param to account for multi-use cases. Could/should we use IsFreeToInvert instead of writing a special match for this case?

As you said, it would be great to have the tests for various cases under test/Transforms/InstCombine, so we'll know what happens in those cases. I suspect we have a lot of transforms where we're either stopping short because of !hasOneUse() or overstepping because that condition is missing.

efriedma edited edge metadata.May 2 2017, 11:46 AM

I'm not quite sure about this the way it's written. This doesn't reduce the total number of instructions on its own, and it seems like an overly complicated pattern. You could approach this from a different angle: fold xor(icmp(pred, x, y)) to (icmp (!pred, x, y)) even if the inner icmp has multiple uses. Or maybe this is okay as-is; my intuition about what's right here isn't strong.

I guess my thought was that even if we were unable to get rid of the not later, an extra not was slightly preferable to creating a new compare instruction.

spatel added a comment.May 3 2017, 8:56 AM

I'm not quite sure about this the way it's written. This doesn't reduce the total number of instructions on its own, and it seems like an overly complicated pattern. You could approach this from a different angle: fold xor(icmp(pred, x, y)) to (icmp (!pred, x, y)) even if the inner icmp has multiple uses. Or maybe this is okay as-is; my intuition about what's right here isn't strong.

I think that folding to icmp with inverse predicate (regardless of number of uses) is the right thing to do because it eliminates a dependency between icmp and xor (not). Plus, we have tons of folds/analysis for icmp.

In other words, we should get rid of the m_OneUse here:

// xor (cmp A, B), true = not (cmp A, B) = !cmp A, B
ICmpInst::Predicate Pred;
if (match(Op0, m_OneUse(m_Cmp(Pred, m_Value(), m_Value()))) &&

There was at least one missing fold exposed by this change in a regression test when I checked it previously.

should we also lift the one use check for compares in IsFreeToInvert?

What about floating point compares? We probably don't have as many folds for them.

spatel added a comment.May 3 2017, 3:57 PM

should we also lift the one use check for compares in IsFreeToInvert?

Yes, I think that makes sense. @efriedma - do you foresee problems?
I forgot that I already filed this bug that will result if we lift the one-use restriction for // xor (cmp A, B), true = not (cmp A, B) = !cmp A, B :
https://bugs.llvm.org/show_bug.cgi?id=32791

What about floating point compares? We probably don't have as many folds for them.

IMO, it's just a bonus that we have lots of icmp folds. The main reason we should not care about uses for these folds is that we always reduce the dependency chain by inverting the predicate. If we're missing folds based on inverted predicates, we can add those (and might need to do that first to avoid regressions). If either icmp or fcmp is less efficient than 'not' for some target, then it can invert this in the backend.

Examples:

declare void @bar(i1, i1)

define void @icmp_neg(i8 %a) {
  %cmp = icmp eq i8 %a, 1
  %neg = xor i1 %cmp, -1  <--- must wait for cmp
  call void @bar(i1 %cmp, i1 %neg)
  ret void
}

define void @icmp_inv(i8 %a) {
  %cmp = icmp eq i8 %a, 1
  %neg = icmp ne i8 %a, 1    <--- can execute immediately
  call void @bar(i1 %cmp, i1 %neg)
  ret void
}

define void @fcmp_neg(float %a) {
  %cmp = fcmp oeq float %a, 1.0
  %neg = xor i1 %cmp, -1
  call void @bar(i1 %cmp, i1 %neg)
  ret void
}

define void @fcmp_inv(float %a) {
  %cmp = fcmp oeq float %a, 1.0
  %neg = fcmp une float %a, 1.0
  call void @bar(i1 %cmp, i1 %neg)
  ret void
}
craig.topper abandoned this revision.May 4 2017, 3:00 PM

I forgot that I already filed this bug that will result if we lift the one-use restriction for // xor (cmp A, B), true = not (cmp A, B) = !cmp A, B :
https://bugs.llvm.org/show_bug.cgi?id=32791

The opposite transform was suggested in:
https://bugs.llvm.org/show_bug.cgi?id=27431
Ie, turning an inverted predicate into a 'not' op might already be a recognized pattern by other passes.