This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Add optimizations for icmp eq/ne (mul(X, Y), 0)
ClosedPublic

Authored by goldstein.w.n on Jan 2 2023, 11:21 AM.

Details

Summary
  1. Add checks if X and/or Y are odd. The Odd values are unnecessary to the icmp: isZero(Odd * N) == isZero(N)
  1. If neither X nor Y is known odd, then if X * Y cannot overflow AND if X and/or Y is non-zero, the non-zero values are unnecessary to the icmp.

Diff Detail

Event Timeline

goldstein.w.n created this revision.Jan 2 2023, 11:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 2 2023, 11:21 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
goldstein.w.n requested review of this revision.Jan 2 2023, 11:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 2 2023, 11:21 AM

Note:

I have not verified the transformations in this patch individually, I verified the net changes from D140850, D140851, and D140852 in bulk.

nikic added a reviewer: spatel.Jan 2 2023, 1:15 PM
nikic added a comment.Jan 4 2023, 3:59 AM

Proofs:

Both odd: https://alive2.llvm.org/ce/z/9qgwMo
One odd: https://alive2.llvm.org/ce/z/vRqUQO
Non-zero nuw: https://alive2.llvm.org/ce/z/3Bqx2-
Non-zero nsw: https://alive2.llvm.org/ce/z/AybG6r

From the test diffs, I don't see any cases where we actually hit the true/false case. Presumably, this get's reliably handled by the "icmp eq isKnownNonZero, 0" fold in InstSimplify. If that's the case, we can omit handling for that.

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1331

YExtra -> YOdd if I understand the intention correctly.

Edit: Hm, or is the naming like this because it's odd in one case and non-zero in the other?

1336

cast<OverflowingBinaryOperator> would always work (including for mul constant expressions).

1340

llvm:: shouldn't be needed.

1349

Can use getBool(Cmp.Type(), Pred == ICmpInst::ICMP_NE) here.

goldstein.w.n added a comment.EditedJan 4 2023, 9:09 AM

Thanks for writing those.

For future reference, I ran alive2 on the entire test file locally (basically opt -passes=instcombine tests-before.ll -o tests-after.ll; ./alive2 tests-before.ll tests-after.ll Is there a way to get that into godbolt or is the only way to do 1 function at a time with src and tgt?

From the test diffs, I don't see any cases where we actually hit the true/false case. Presumably, this get's reliably handled by the "icmp eq isKnownNonZero, 0" fold in InstSimplify. If that's the case, we can omit handling for that.

We start hitting it in: https://reviews.llvm.org/D140851
See: define i64 @mul_assume_V_oddV_s64_setz(i64 %v, i64 %other)

Before then the missing cases in analysis was getting in the way. Could probably create a case where
it works at this patch. Would you like to add one?

goldstein.w.n added inline comments.Jan 4 2023, 9:18 AM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1331

YExtra -> YOdd if I understand the intention correctly.

Edit: Hm, or is the naming like this because it's odd in one case and non-zero in the other?

Yeah it was originally YOdd but to avoid having the rewrite the cases for YNonZero I renamed to extra. I can add a comment in V2 or do something like:

bool YOdd, YNonZero;
YNonZero = false;
YOdd = XKnown.countMaxTrailingZeros() == 0;
...
if(!YOdd && !XOdd) {
YNonZero = sKnownNonZero(X, DL, 0, Q.AC, Q.CxtI, Q.DT);
...
}
XExtra = YOdd | YNonZero;
...

Let me know what you prefer.

1336

cast<OverflowingBinaryOperator> would always work (including for mul constant expressions).

nikic added a comment.Jan 4 2023, 9:43 AM

Thanks for writing those.

For future reference, I ran alive2 on the entire test file locally (basically opt -passes=instcombine tests-before.ll -o tests-after.ll; ./alive2 tests-before.ll tests-after.ll Is there a way to get that into godbolt or is the only way to do 1 function at a time with src and tgt?

I think it only works with src/tgt. It is also possible to let alive-tv run passes over the input, but of course that requires the changes to already be implemented.

From the test diffs, I don't see any cases where we actually hit the true/false case. Presumably, this get's reliably handled by the "icmp eq isKnownNonZero, 0" fold in InstSimplify. If that's the case, we can omit handling for that.

We start hitting it in: https://reviews.llvm.org/D140851
See: define i64 @mul_assume_V_oddV_s64_setz(i64 %v, i64 %other)

Before then the missing cases in analysis was getting in the way. Could probably create a case where
it works at this patch. Would you like to add one?

It's not really clear to me that that folds only in conjunction with this patch. I suspect that one also folds via InstSimplify, just with strengthened assume handling.

Make {X|Y}Extra more clear. Remove so extra verbose code

goldstein.w.n marked 3 inline comments as done.Jan 4 2023, 4:11 PM

Propegating changes from rebase

Rebase + update tests

goldstein.w.n retitled this revision from [Patch 2/4]: Add optimizations for icmp eq/ne (mul(X, Y), 0) to [InstCombine] Add optimizations for icmp eq/ne (mul(X, Y), 0).Jan 17 2023, 11:35 AM
goldstein.w.n marked an inline comment as done.Jan 19 2023, 11:00 AM
nikic added inline comments.Jan 26 2023, 6:23 AM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1311

Something of a style nit, but I think it would be more elegant to write the contents of this if as follows:

KnownBits XKnown = computeKnownBits(X, 0, &Cmp);
if (XKnown.countMaxTrailingZeros() != 0)
  return new ICmpInst(Pred, Y, Cmp.getOperand(1));

KnownBits YKnown = computeKnownBits(Y, 0, &Cmp);
if (YKnown.countMaxTrailingZeros() != 0)
  return new ICmpInst(Pred, X, Cmp.getOperand(1));

if (BO0->hasNoUnsignedWrap() || BO0->hasNoSignedWrap()) {
  const SimplifyQuery Q = SQ.getWithInstruction(&Cmp);
  if (isKnownNonZero(X, DL, 0, Q.AC, Q.CxtI, Q.DT))
    return new ICmpInst(Pred, Y, Cmp.getOperand(1));
  if (isKnownNonZero(Y, DL, 0, Q.AC, Q.CxtI, Q.DT))
    return new ICmpInst(Pred, X, Cmp.getOperand(1));
}

We don't really need to handle the case where both are unneeded explicitly (it will be picked up when simplifying the icmp, though in practice it will even be simplified before it reaches here), and then I think the code is simpler if you just forgo the XUnneeeded etc variables entirely.

Could also move the comments above this whole if to the respective clauses.

goldstein.w.n marked an inline comment as done.Jan 26 2023, 10:58 AM

Rebase + don't both with constant fold + restructure/cleanup

nikic accepted this revision.Jan 26 2023, 11:51 AM

Implementation LGTM

This revision is now accepted and ready to land.Jan 26 2023, 11:51 AM

Implementation LGTM

Thanks, is D140849 also good to push (as a general question, if there is a tests patch + a real patch, does real patch approval imply test patch approval).

Implementation LGTM

Are the tests okay? Or do they need work?

nikic added a comment.Jan 26 2023, 3:08 PM

Implementation LGTM

Thanks, is D140849 also good to push (as a general question, if there is a tests patch + a real patch, does real patch approval imply test patch approval).

Generally speaking, tests can be committed without review. That said...

Are the tests okay? Or do they need work?

I'm still rather unhappy about the tests for this patch. There are both way too many tests, and at the same time not enough of the tests we care about (e.g. there doesn't seem to be a single test using ne, and negative test coverage is pretty much impossible to make out in a 5000 line test file).

Here is what I would expect to be tested in some form:

  • X odd
  • Y odd
  • X non-zero nuw
  • Y non-zero nsw
  • (Possibly also Y non-zero nuw, X non-zero nsw for completeness)
  • (Possibly also X and Y odd / X and Y non-zero, which will already fold before the patch)
  • X non-zero, no nowrap flags (negative test)
  • icmp constant not zero (negative test)
  • icmp predicate not equality (negative test)
  • eq / ne predicates [maybe combined with other tests]
  • multi-use test (pass mul to call @use()) [maybe combined with other tests]
  • vector test [maybe combined with other tests]
  • assume to establish odd / non-zero [maybe combined with other tests]

Having all of these as orthogonal tests is fine, but some can also be combined. For example, you can have the X odd test use a scalar, and the Y odd test use a vector. You can have one establish oddness via or 1, and the other via an assume. Have some tests use an eq predicate, and some use an ne predicate. What there definitely shouldn't be is a combinatorial explosion of all combinations. We don't need a vector variant of each test, and we don't need to test many different ways of specifying known bits / non-zero for each pattern.

So there should be a total of 8 - 16 (approximately) tests here (depending on how orthogonal the tests are, and how thorough you want to be), each covering some important variation of the pattern. This makes it easy to both see that all relevant tests are present, including negative tests.

It may be worthwhile to take a look at some commits by @spatel / rotateright in the git log of llvm/test/Transforms/InstCombine to get an idea of how InstCombine test coverage usually looks like.

I hope this is helpful. I've tried to be overly detailed here, because the same basic problem also exists in other patches, e.g. D142270. I kind of regret bringing up vector test coverage, because having vector variants of every single test wasn't my intention (let alone for vector assumes, which aren't supported at all).

Implementation LGTM

Thanks, is D140849 also good to push (as a general question, if there is a tests patch + a real patch, does real patch approval imply test patch approval).

Generally speaking, tests can be committed without review. That said...

Are the tests okay? Or do they need work?

I'm still rather unhappy about the tests for this patch. There are both way too many tests, and at the same time not enough of the tests we care about (e.g. there doesn't seem to be a single test using ne, and negative test coverage is pretty much impossible to make out in a 5000 line test file).

Here is what I would expect to be tested in some form:

  • X odd
  • Y odd
  • X non-zero nuw
  • Y non-zero nsw
  • (Possibly also Y non-zero nuw, X non-zero nsw for completeness)
  • (Possibly also X and Y odd / X and Y non-zero, which will already fold before the patch)
  • X non-zero, no nowrap flags (negative test)
  • icmp constant not zero (negative test)
  • icmp predicate not equality (negative test)
  • eq / ne predicates [maybe combined with other tests]
  • multi-use test (pass mul to call @use()) [maybe combined with other tests]
  • vector test [maybe combined with other tests]
  • assume to establish odd / non-zero [maybe combined with other tests]

Having all of these as orthogonal tests is fine, but some can also be combined. For example, you can have the X odd test use a scalar, and the Y odd test use a vector. You can have one establish oddness via or 1, and the other via an assume. Have some tests use an eq predicate, and some use an ne predicate. What there definitely shouldn't be is a combinatorial explosion of all combinations. We don't need a vector variant of each test, and we don't need to test many different ways of specifying known bits / non-zero for each pattern.

So there should be a total of 8 - 16 (approximately) tests here (depending on how orthogonal the tests are, and how thorough you want to be), each covering some important variation of the pattern. This makes it easy to both see that all relevant tests are present, including negative tests.

It may be worthwhile to take a look at some commits by @spatel / rotateright in the git log of llvm/test/Transforms/InstCombine to get an idea of how InstCombine test coverage usually looks like.

I hope this is helpful. I've tried to be overly detailed here, because the same basic problem also exists in other patches, e.g. D142270. I kind of regret bringing up vector test coverage, because having vector variants of every single test wasn't my intention (let alone for vector assumes, which aren't supported at all).

This was, thank you!

I will update both.

Implementation LGTM

Thanks, is D140849 also good to push (as a general question, if there is a tests patch + a real patch, does real patch approval imply test patch approval).

Generally speaking, tests can be committed without review. That said...

Are the tests okay? Or do they need work?

I'm still rather unhappy about the tests for this patch. There are both way too many tests, and at the same time not enough of the tests we care about (e.g. there doesn't seem to be a single test using ne, and negative test coverage is pretty much impossible to make out in a 5000 line test file).

Here is what I would expect to be tested in some form:

  • X odd
  • Y odd
  • X non-zero nuw
  • Y non-zero nsw
  • (Possibly also Y non-zero nuw, X non-zero nsw for completeness)
  • (Possibly also X and Y odd / X and Y non-zero, which will already fold before the patch)
  • X non-zero, no nowrap flags (negative test)
  • icmp constant not zero (negative test)
  • icmp predicate not equality (negative test)
  • eq / ne predicates [maybe combined with other tests]
  • multi-use test (pass mul to call @use()) [maybe combined with other tests]
  • vector test [maybe combined with other tests]
  • assume to establish odd / non-zero [maybe combined with other tests]

The one addition is IMO select/br for establishing zero/nonzero.

Having all of these as orthogonal tests is fine, but some can also be combined. For example, you can have the X odd test use a scalar, and the Y odd test use a vector. You can have one establish oddness via or 1, and the other via an assume. Have some tests use an eq predicate, and some use an ne predicate. What there definitely shouldn't be is a combinatorial explosion of all combinations. We don't need a vector variant of each test, and we don't need to test many different ways of specifying known bits / non-zero for each pattern.

So there should be a total of 8 - 16 (approximately) tests here (depending on how orthogonal the tests are, and how thorough you want to be), each covering some important variation of the pattern. This makes it easy to both see that all relevant tests are present, including negative tests.

It may be worthwhile to take a look at some commits by @spatel / rotateright in the git log of llvm/test/Transforms/InstCombine to get an idea of how InstCombine test coverage usually looks like.

I hope this is helpful. I've tried to be overly detailed here, because the same basic problem also exists in other patches, e.g. D142270. I kind of regret bringing up vector test coverage, because having vector variants of every single test wasn't my intention (let alone for vector assumes, which aren't supported at all).

This was, thank you!

I will update both.

Implementation LGTM

Thanks, is D140849 also good to push (as a general question, if there is a tests patch + a real patch, does real patch approval imply test patch approval).

Generally speaking, tests can be committed without review. That said...

Are the tests okay? Or do they need work?

I'm still rather unhappy about the tests for this patch. There are both way too many tests, and at the same time not enough of the tests we care about (e.g. there doesn't seem to be a single test using ne, and negative test coverage is pretty much impossible to make out in a 5000 line test file).

Here is what I would expect to be tested in some form:

  • X odd
  • Y odd
  • X non-zero nuw
  • Y non-zero nsw
  • (Possibly also Y non-zero nuw, X non-zero nsw for completeness)
  • (Possibly also X and Y odd / X and Y non-zero, which will already fold before the patch)
  • X non-zero, no nowrap flags (negative test)
  • icmp constant not zero (negative test)
  • icmp predicate not equality (negative test)
  • eq / ne predicates [maybe combined with other tests]
  • multi-use test (pass mul to call @use()) [maybe combined with other tests]
  • vector test [maybe combined with other tests]
  • assume to establish odd / non-zero [maybe combined with other tests]

Having all of these as orthogonal tests is fine, but some can also be combined. For example, you can have the X odd test use a scalar, and the Y odd test use a vector. You can have one establish oddness via or 1, and the other via an assume. Have some tests use an eq predicate, and some use an ne predicate. What there definitely shouldn't be is a combinatorial explosion of all combinations. We don't need a vector variant of each test, and we don't need to test many different ways of specifying known bits / non-zero for each pattern.

So there should be a total of 8 - 16 (approximately) tests here (depending on how orthogonal the tests are, and how thorough you want to be), each covering some important variation of the pattern. This makes it easy to both see that all relevant tests are present, including negative tests.

It may be worthwhile to take a look at some commits by @spatel / rotateright in the git log of llvm/test/Transforms/InstCombine to get an idea of how InstCombine test coverage usually looks like.

I hope this is helpful. I've tried to be overly detailed here, because the same basic problem also exists in other patches, e.g. D142270. I kind of regret bringing up vector test coverage, because having vector variants of every single test wasn't my intention (let alone for vector assumes, which aren't supported at all).

Okay, removed 1/3 from each test from in the D142270 patches and ~4k lines from this one.

I'll be more reserved in the future. My general default is if in doubt test it, but guess thats not
fair to the reviewer.

Thank you for the help :)

This revision was automatically updated to reflect the committed changes.

Either this or D141875 cause the failure: https://lab.llvm.org/buildbot/#/builders/183/builds/10447

Going to revert while I look into it.

Bit confused by the failure, https://lab.llvm.org/buildbot/#/builders/183/builds/10448 which included this commit (and D141875) succeeded.