This is an archive of the discontinued LLVM Phabricator instance.

Differential D140850

[InstCombine] Add optimizations for icmp eq/ne (mul(X, Y), 0)
ClosedPublic

Authored by goldstein.w.n on Jan 2 2023, 11:21 AM.

Details

Reviewers

nikic
majnemer
spatel

Commits

rGbaf7f7e5752c: Recommit "Add optimizations for icmp eq/ne (mul(X, Y), 0)" 2nd Try
rGaa250ceb3ff1: Add optimizations for icmp eq/ne (mul(X, Y), 0)

Summary

Add checks for whether X and/or Y are odd. An odd factor is unnecessary to the icmp: isZero(Odd * N) == isZero(N).

If neither X nor Y is known odd, but X * Y cannot overflow (nuw or nsw) and X and/or Y is known non-zero, the non-zero factor is likewise unnecessary to the icmp.
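
The two claims in the summary can be sanity-checked by brute force on a small bit width. The following is only an illustrative sketch, not part of the patch: it uses 8-bit integers as a stand-in for an arbitrary iN type, and the function names are made up for this example. The Alive2 proofs linked later in the thread are the authoritative verification.

```cpp
#include <cassert>
#include <cstdint>

// Checks, for every 8-bit value, that multiplying by an odd factor never
// changes whether the wrapping product is zero: isZero(Odd * N) == isZero(N).
bool oddFactorPreservesZeroness() {
  for (unsigned Odd = 1; Odd < 256; Odd += 2)
    for (unsigned N = 0; N < 256; ++N) {
      uint8_t Product = static_cast<uint8_t>(Odd * N); // wraps modulo 2^8
      if ((Product == 0) != (N == 0))
        return false;
    }
  return true;
}

// Checks that if X is non-zero and X * Y does not overflow as an unsigned
// 8-bit multiply (the `nuw` case), the product is zero exactly when Y is zero.
bool nonZeroFactorPreservesZeronessNUW() {
  for (unsigned X = 1; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y) {
      unsigned Wide = X * Y;
      if (Wide > 0xFF)
        continue; // would overflow; `nuw` rules this input out
      if ((Wide == 0) != (Y == 0))
        return false;
    }
  return true;
}

// Same check for a signed 8-bit multiply that does not overflow (`nsw`).
bool nonZeroFactorPreservesZeronessNSW() {
  for (int X = -128; X < 128; ++X) {
    if (X == 0)
      continue;
    for (int Y = -128; Y < 128; ++Y) {
      int Wide = X * Y;
      if (Wide < -128 || Wide > 127)
        continue; // would overflow; `nsw` rules this input out
      if ((Wide == 0) != (Y == 0))
        return false;
    }
  }
  return true;
}
```

All three functions are expected to return true: an odd factor is invertible modulo a power of two, and a non-overflowing product of two non-zero integers cannot be zero.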

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

goldstein.w.n created this revision. Jan 2 2023, 11:21 AM

Herald added a project: Restricted Project. Jan 2 2023, 11:21 AM

Herald added a subscriber: hiraditya.

goldstein.w.n requested review of this revision. Jan 2 2023, 11:21 AM

Herald added a project: Restricted Project. Jan 2 2023, 11:21 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B205367: Diff 485875. Jan 2 2023, 11:22 AM

goldstein.w.n added a parent revision: D140849: [InstCombine] Add tests for binops with conditions/assume constraints. Jan 2 2023, 11:24 AM

goldstein.w.n added a child revision: D140851: [InstCombine]: Add cases for assume (X & Y != {0|Y}).

goldstein.w.n added reviewers: nikic, majnemer. Jan 2 2023, 11:28 AM

goldstein.w.n mentioned this in D140852: [Patch 4/4]: Use cannoical patterns `(A > C1 && A < C2)` and `(A & B != C)` in `isKnownNonZero`.

Note:

I have not verified the transformations in this patch individually; I verified the net changes from D140850, D140851, and D140852 in bulk.

goldstein.w.n mentioned this in D140851: [InstCombine]: Add cases for assume (X & Y != {0|Y}). Jan 2 2023, 11:31 AM

goldstein.w.n mentioned this in D140840: Tests + Improve cases for optimizing out some icmp(binop) patterns (mostly mul). Jan 2 2023, 11:33 AM

nikic added a reviewer: spatel. Jan 2 2023, 1:15 PM

Proofs:

Both odd: https://alive2.llvm.org/ce/z/9qgwMo
One odd: https://alive2.llvm.org/ce/z/vRqUQO
Non-zero nuw: https://alive2.llvm.org/ce/z/3Bqx2-
Non-zero nsw: https://alive2.llvm.org/ce/z/AybG6r

From the test diffs, I don't see any cases where we actually hit the true/false case. Presumably, this gets reliably handled by the "icmp eq isKnownNonZero, 0" fold in InstSimplify. If that's the case, we can omit handling for that.

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1314	YExtra -> YOdd if I understand the intention correctly. Edit: Hm, or is the naming like this because it's odd in one case and non-zero in the other?
1319	`cast<OverflowingBinaryOperator>` would always work (including for mul constant expressions).
1323	`llvm::` shouldn't be needed.
1332	Can use `getBool(Cmp.Type(), Pred == ICmpInst::ICMP_NE)` here.

In D140850#4025580, @nikic wrote:

Proofs:

Both odd: https://alive2.llvm.org/ce/z/9qgwMo
One odd: https://alive2.llvm.org/ce/z/vRqUQO
Non-zero nuw: https://alive2.llvm.org/ce/z/3Bqx2-
Non-zero nsw: https://alive2.llvm.org/ce/z/AybG6r

Thanks for writing those.

For future reference, I ran alive2 on the entire test file locally (basically opt -passes=instcombine tests-before.ll -o tests-after.ll; ./alive2 tests-before.ll tests-after.ll). Is there a way to get that into godbolt, or is the only way to do 1 function at a time with src and tgt?

From the test diffs, I don't see any cases where we actually hit the true/false case. Presumably, this gets reliably handled by the "icmp eq isKnownNonZero, 0" fold in InstSimplify. If that's the case, we can omit handling for that.

We start hitting it in: https://reviews.llvm.org/D140851
See: define i64 @mul_assume_V_oddV_s64_setz(i64 %v, i64 %other)

Before then, the missing cases in analysis were getting in the way. Could probably create a case where it works at this patch. Would you like to add one?

goldstein.w.n added inline comments. Jan 4 2023, 9:18 AM

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
1314	> YExtra -> YOdd if I understand the intention correctly. Edit: Hm, or is the naming like this because it's odd in one case and non-zero in the other?

Yeah, it was originally `YOdd`, but to avoid having to rewrite the cases for `YNonZero` I renamed it to extra. I can add a comment in V2 or do something like:

bool YOdd, YNonZero;
YNonZero = false;
YOdd = XKnown.countMaxTrailingZeros() == 0;
...
if (!YOdd && !XOdd) {
  YNonZero = isKnownNonZero(X, DL, 0, Q.AC, Q.CxtI, Q.DT);
  ...
}
XExtra = YOdd | YNonZero;
...

Let me know what you prefer.
1319	`cast<OverflowingBinaryOperator>` would always work (including for mul constant expressions).

In D140850#4026460, @goldstein.w.n wrote:

In D140850#4025580, @nikic wrote:

Proofs:

Both odd: https://alive2.llvm.org/ce/z/9qgwMo
One odd: https://alive2.llvm.org/ce/z/vRqUQO
Non-zero nuw: https://alive2.llvm.org/ce/z/3Bqx2-
Non-zero nsw: https://alive2.llvm.org/ce/z/AybG6r

Thanks for writing those.

For future reference, I ran alive2 on the entire test file locally (basically opt -passes=instcombine tests-before.ll -o tests-after.ll; ./alive2 tests-before.ll tests-after.ll). Is there a way to get that into godbolt, or is the only way to do 1 function at a time with src and tgt?

I think it only works with src/tgt. It is also possible to let alive-tv run passes over the input, but of course that requires the changes to already be implemented.

From the test diffs, I don't see any cases where we actually hit the true/false case. Presumably, this gets reliably handled by the "icmp eq isKnownNonZero, 0" fold in InstSimplify. If that's the case, we can omit handling for that.

We start hitting it in: https://reviews.llvm.org/D140851
See: define i64 @mul_assume_V_oddV_s64_setz(i64 %v, i64 %other)

Before then, the missing cases in analysis were getting in the way. Could probably create a case where it works at this patch. Would you like to add one?

It's not really clear to me that that folds only in conjunction with this patch. I suspect that one also folds via InstSimplify, just with strengthened assume handling.

Make {X|Y}Extra more clear. Remove some extra verbose code.

goldstein.w.n marked 3 inline comments as done. Jan 4 2023, 4:11 PM

Harbormaster completed remote builds in B205795: Diff 486422. Jan 4 2023, 5:06 PM

spatel mentioned this in D141086: [SDAG] try to avoid multiply for X*Y==0. Jan 5 2023, 1:22 PM

Propagating changes from rebase

Harbormaster completed remote builds in B205990: Diff 486679. Jan 5 2023, 3:48 PM

spatel mentioned this in rGbf82070ea465: [SDAG] try to avoid multiply for X*Y==0. Jan 6 2023, 6:07 AM

goldstein.w.n removed a child revision: D140851: [InstCombine]: Add cases for assume (X & Y != {0|Y}). Jan 10 2023, 10:59 AM

Rebase + update tests

Herald added a subscriber: StephenFan. Jan 11 2023, 12:23 PM

Harbormaster completed remote builds in B207181: Diff 488336. Jan 11 2023, 3:15 PM

Rebase

goldstein.w.n retitled this revision from [Patch 2/4]: Add optimizations for icmp eq/ne (mul(X, Y), 0) to [InstCombine] Add optimizations for icmp eq/ne (mul(X, Y), 0). Jan 17 2023, 11:35 AM

Harbormaster completed remote builds in B208301: Diff 489898. Jan 17 2023, 1:04 PM

goldstein.w.n marked an inline comment as done. Jan 19 2023, 11:00 AM

@nikic ping?

nikic added inline comments. Jan 26 2023, 6:23 AM

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp

1311

Something of a style nit, but I think it would be more elegant to write the contents of this if as follows:

KnownBits XKnown = computeKnownBits(X, 0, &Cmp);
if (XKnown.countMaxTrailingZeros() == 0) // X is known odd
  return new ICmpInst(Pred, Y, Cmp.getOperand(1));

KnownBits YKnown = computeKnownBits(Y, 0, &Cmp);
if (YKnown.countMaxTrailingZeros() == 0) // Y is known odd
  return new ICmpInst(Pred, X, Cmp.getOperand(1));

if (BO0->hasNoUnsignedWrap() || BO0->hasNoSignedWrap()) {
  const SimplifyQuery Q = SQ.getWithInstruction(&Cmp);
  if (isKnownNonZero(X, DL, 0, Q.AC, Q.CxtI, Q.DT))
    return new ICmpInst(Pred, Y, Cmp.getOperand(1));
  if (isKnownNonZero(Y, DL, 0, Q.AC, Q.CxtI, Q.DT))
    return new ICmpInst(Pred, X, Cmp.getOperand(1));
}

We don't really need to handle the case where both are unneeded explicitly (it will be picked up when simplifying the icmp, though in practice it will even be simplified before it reaches here), and then I think the code is simpler if you just forgo the XUnneeeded etc variables entirely.

Could also move the comments above this whole if to the respective clauses.
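
To make the order of checks in this suggestion easier to follow outside of LLVM, here is a small standalone model. It is a sketch under stated assumptions, not LLVM code: `KnownBits8`, `Compare`, and `pickOperand` are hypothetical names invented for illustration, the bit width is fixed at 8, and the non-zero facts are passed in as plain booleans instead of being computed by `isKnownNonZero`. The one detail it borrows from the real API is that a value is known odd when `countMaxTrailingZeros()` is 0, that is, when its lowest bit is known to be one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit stand-in for llvm::KnownBits. A set bit in Zero/One
// means that bit of the value is known to be 0/1 respectively.
struct KnownBits8 {
  uint8_t Zero;
  uint8_t One;

  // Upper bound on the number of trailing zero bits the value can have:
  // the position of the lowest bit that is known to be one (8 if none).
  unsigned countMaxTrailingZeros() const {
    unsigned N = 0;
    while (N < 8 && !(One & (1u << N)))
      ++N;
    return N;
  }
};

// Which value the rewritten icmp would compare against zero.
enum class Compare { X, Y, Mul };

// Oddness is checked first and needs no nowrap flag; non-zero-ness is only
// usable when the multiply is known not to wrap.
Compare pickOperand(KnownBits8 XKnown, KnownBits8 YKnown, bool HasNoWrap,
                    bool XNonZero, bool YNonZero) {
  if (XKnown.countMaxTrailingZeros() == 0) // X is odd, so only Y matters
    return Compare::Y;
  if (YKnown.countMaxTrailingZeros() == 0) // Y is odd, so only X matters
    return Compare::X;
  if (HasNoWrap) {
    if (XNonZero)
      return Compare::Y;
    if (YNonZero)
      return Compare::X;
  }
  return Compare::Mul; // no fold: keep comparing the product
}
```

For example, with `KnownBits8 Odd{0x00, 0x01}` and `KnownBits8 Unknown{0x00, 0x00}`, `pickOperand(Odd, Unknown, false, false, false)` selects `Compare::Y`, while a known non-zero X without a nowrap flag leaves the compare on the product.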

goldstein.w.n marked an inline comment as done. Jan 26 2023, 10:58 AM

Rebase + don't bother with constant fold + restructure/cleanup

Implementation LGTM

This revision is now accepted and ready to land. Jan 26 2023, 11:51 AM

In D140850#4083702, @nikic wrote:

Implementation LGTM

Thanks. Is D140849 also good to push? (As a general question, if there is a tests patch + a real patch, does real patch approval imply test patch approval?)

In D140850#4083702, @nikic wrote:

Implementation LGTM

Are the tests okay? Or do they need work?

Harbormaster completed remote builds in B210190: Diff 492523. Jan 26 2023, 1:04 PM

In D140850#4083749, @goldstein.w.n wrote:

In D140850#4083702, @nikic wrote:

Implementation LGTM

Thanks. Is D140849 also good to push? (As a general question, if there is a tests patch + a real patch, does real patch approval imply test patch approval?)

Generally speaking, tests can be committed without review. That said...

Are the tests okay? Or do they need work?

I'm still rather unhappy about the tests for this patch. There are both way too many tests, and at the same time not enough of the tests we care about (e.g. there doesn't seem to be a single test using ne, and negative test coverage is pretty much impossible to make out in a 5000 line test file).

Here is what I would expect to be tested in some form:

X odd
Y odd
X non-zero nuw
Y non-zero nsw
(Possibly also Y non-zero nuw, X non-zero nsw for completeness)
(Possibly also X and Y odd / X and Y non-zero, which will already fold before the patch)
X non-zero, no nowrap flags (negative test)
icmp constant not zero (negative test)
icmp predicate not equality (negative test)
eq / ne predicates [maybe combined with other tests]
multi-use test (pass mul to call @use()) [maybe combined with other tests]
vector test [maybe combined with other tests]
assume to establish odd / non-zero [maybe combined with other tests]

Having all of these as orthogonal tests is fine, but some can also be combined. For example, you can have the X odd test use a scalar, and the Y odd test use a vector. You can have one establish oddness via or 1, and the other via an assume. Have some tests use an eq predicate, and some use an ne predicate. What there definitely shouldn't be is a combinatorial explosion of all combinations. We don't need a vector variant of each test, and we don't need to test many different ways of specifying known bits / non-zero for each pattern.

So there should be a total of approximately 8 - 16 tests here (depending on how orthogonal the tests are, and how thorough you want to be), each covering some important variation of the pattern. This makes it easy to see that all relevant tests are present, including negative tests.

It may be worthwhile to take a look at some commits by @spatel / rotateright in the git log of llvm/test/Transforms/InstCombine to get an idea of what InstCombine test coverage usually looks like.

I hope this is helpful. I've tried to be overly detailed here, because the same basic problem also exists in other patches, e.g. D142270. I kind of regret bringing up vector test coverage, because having vector variants of every single test wasn't my intention (let alone for vector assumes, which aren't supported at all).
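
The three negative tests in the list above each guard against a rewrite that would be wrong. As a rough illustration (not taken from the patch or its test file), the counterexamples below show why on 8-bit integers; the function names are invented for this sketch, and the signed case relies on the usual two's-complement narrowing conversion.

```cpp
#include <cassert>
#include <cstdint>

// "X non-zero, no nowrap flags": without nuw/nsw the product may wrap to
// zero even though neither factor is zero (2 * 128 == 256 == 0 mod 2^8),
// so dropping the non-zero factor would change the result.
bool wrappingNonZeroCounterexample() {
  uint8_t X = 2, Y = 128;
  uint8_t Product = static_cast<uint8_t>(X * Y);
  return X != 0 && Y != 0 && Product == 0;
}

// "icmp constant not zero": (3 * V == 4) is not the same as (V == 4).
// With V = 4 the product is 12, so the two compares disagree.
bool nonZeroConstantCounterexample() {
  uint8_t V = 4;
  uint8_t Product = static_cast<uint8_t>(3 * V);
  return (Product == 4) != (V == 4);
}

// "icmp predicate not equality": (3 * V >= 0) is not the same as (V >= 0)
// for a signed compare. With V = 43 the product 129 wraps to -127 in i8.
bool nonEqualityPredicateCounterexample() {
  int8_t V = 43;
  int8_t Product = static_cast<int8_t>(static_cast<uint8_t>(3 * V));
  return (Product >= 0) != (V >= 0);
}
```

Each function returns true when its counterexample holds, which is the situation a negative test is meant to pin down: the fold must not fire for these patterns.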

In D140850#4084188, @nikic wrote:


I hope this is helpful. I've tried to be overly detailed here, because the same basic problem also exists in other patches, e.g. D142270. I kind of regret bringing up vector test coverage, because having vector variants of every single test wasn't my intention (let alone for vector assumes, which aren't supported at all).

This was, thank you!

I will update both.

In D140850#4084217, @goldstein.w.n wrote:

In D140850#4084188, @nikic wrote:

Here is what I would expect to be tested in some form:

X odd

Y odd

X non-zero nuw

Y non-zero nsw

(Possibly also Y non-zero nuw, X non-zero nsw for completeness)

(Possibly also X and Y odd / X and Y non-zero, which will already fold before the patch)

X non-zero, no nowrap flags (negative test)

icmp constant not zero (negative test)

icmp predicate not equality (negative test)

eq / ne predicates [maybe combined with other tests]

multi-use test (pass mul to call @use()) [maybe combined with other tests]

vector test [maybe combined with other tests]

assume to establish odd / non-zero [maybe combined with other tests]

The one addition is IMO select/br for establishing zero/nonzero.


Rebase

In D140850#4084188, @nikic wrote:


I hope this is helpful. I've tried to be overly detailed here, because the same basic problem also exists in other patches, e.g. D142270. I kind of regret bringing up vector test coverage, because having vector variants of every single test wasn't my intention (let alone for vector assumes, which aren't supported at all).

Okay, removed 1/3 from each test file in the D142270 patches and ~4k lines from this one.

I'll be more reserved in the future. My general default is "if in doubt, test it", but I guess that's not fair to the reviewer.

Thank you for the help :)

Harbormaster completed remote builds in B210243: Diff 492595. Jan 26 2023, 6:08 PM

Rebase

Harbormaster completed remote builds in B210441: Diff 492856. Jan 27 2023, 1:59 PM

Closed by commit rGaa250ceb3ff1: Add optimizations for icmp eq/ne (mul(X, Y), 0) (authored by goldstein.w.n). Jan 27 2023, 3:50 PM

This revision was automatically updated to reflect the committed changes.

goldstein.w.n added a commit: rGaa250ceb3ff1: Add optimizations for icmp eq/ne (mul(X, Y), 0).

Either this or D141875 caused the failure: https://lab.llvm.org/buildbot/#/builders/183/builds/10447

Going to revert while I look into it.

goldstein.w.n added a reverting change: rGe7e3d723a592: Revert "Add optimizations for icmp eq/ne (mul(X, Y), 0)". Jan 27 2023, 4:45 PM

A bit confused by the failure: https://lab.llvm.org/buildbot/#/builders/183/builds/10448, which included this commit (and D141875), succeeded.

goldstein.w.n added a commit: rGbaf7f7e5752c: Recommit "Add optimizations for icmp eq/ne (mul(X, Y), 0)" 2nd Try. Jan 27 2023, 6:39 PM

Revision Contents

Path                                                      Size
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp   42 lines
llvm/test/Transforms/InstCombine/icmp-binop.ll            27 lines
llvm/test/Transforms/InstCombine/pr38677.ll               4 lines

Diff 492933

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp

@@ Instruction *InstCombinerImpl::foldICmpWithZero(ICmpInst &Cmp) {
   if (match(Cmp.getOperand(0), m_URem(m_Value(X), m_Value(Y))) &&
       ICmpInst::isEquality(Pred)) {
     KnownBits XKnown = computeKnownBits(X, 0, &Cmp);
     KnownBits YKnown = computeKnownBits(Y, 0, &Cmp);
     if (XKnown.countMaxPopulation() == 1 && YKnown.countMinPopulation() >= 2)
       return new ICmpInst(Pred, X, Cmp.getOperand(1));
   }

+  // (icmp eq/ne (mul X Y)) -> (icmp eq/ne X/Y) if we know about whether X/Y are
+  // odd/non-zero/there is no overflow.
+  if (match(Cmp.getOperand(0), m_Mul(m_Value(X), m_Value(Y))) &&
+      ICmpInst::isEquality(Pred)) {
+
+    KnownBits XKnown = computeKnownBits(X, 0, &Cmp);
+    // if X % 2 != 0
+    // (icmp eq/ne Y)
+    if (XKnown.countMaxTrailingZeros() == 0)
+      return new ICmpInst(Pred, Y, Cmp.getOperand(1));
+
+    KnownBits YKnown = computeKnownBits(Y, 0, &Cmp);
+    // if Y % 2 != 0
+    // (icmp eq/ne X)
+    if (YKnown.countMaxTrailingZeros() == 0)
+      return new ICmpInst(Pred, X, Cmp.getOperand(1));
+
+    auto *BO0 = cast<OverflowingBinaryOperator>(Cmp.getOperand(0));
+    if (BO0->hasNoUnsignedWrap() || BO0->hasNoSignedWrap()) {
+      const SimplifyQuery Q = SQ.getWithInstruction(&Cmp);
+      // `isKnownNonZero` does more analysis than just `!KnownBits.One.isZero()`
+      // but to avoid unnecessary work, first just if this is an obvious case.
+
+      // if X non-zero and NoOverflow(X * Y)
+      // (icmp eq/ne Y)
+      if (!XKnown.One.isZero() || isKnownNonZero(X, DL, 0, Q.AC, Q.CxtI, Q.DT))
+        return new ICmpInst(Pred, Y, Cmp.getOperand(1));
+
+      // if Y non-zero and NoOverflow(X * Y)
+      // (icmp eq/ne X)
+      if (!YKnown.One.isZero() || isKnownNonZero(Y, DL, 0, Q.AC, Q.CxtI, Q.DT))
+        return new ICmpInst(Pred, X, Cmp.getOperand(1));
+    }
+    // Note, we are skipping cases:
+    // if Y % 2 != 0 AND X % 2 != 0
+    // (false/true)
+    // if X non-zero and Y non-zero and NoOverflow(X * Y)
+    // (false/true)
+    // Those can be simplified later as we would have already replaced the (icmp
+    // eq/ne (mul X, Y)) with (icmp eq/ne X/Y) and if X/Y is known non-zero that
+    // will fold to a constant elsewhere.
+  }
   return nullptr;
 }

 /// Fold icmp Pred X, C.
 /// TODO: This code structure does not make sense. The saturating add fold
 /// should be moved to some other helper and extended as noted below (it is also
 /// possible that code has been made unnecessary - do we canonicalize IR to
 /// overflow/saturating intrinsics or not?).

llvm/test/Transforms/InstCombine/icmp-binop.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -passes=instcombine -S \| FileCheck %s		; RUN: opt < %s -passes=instcombine -S \| FileCheck %s

declare void @use64(i64)		declare void @use64(i64)
declare void @llvm.assume(i1)		declare void @llvm.assume(i1)

define i1 @mul_unkV_oddC_eq(i32 %v) {		define i1 @mul_unkV_oddC_eq(i32 %v) {
; CHECK-LABEL: @mul_unkV_oddC_eq(		; CHECK-LABEL: @mul_unkV_oddC_eq(
; CHECK-NEXT: [[MUL:%.]] = mul i32 [[V:%.]], 3		; CHECK-NEXT: [[CMP:%.]] = icmp eq i32 [[V:%.]], 0
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[MUL]], 0
; CHECK-NEXT: ret i1 [[CMP]]		; CHECK-NEXT: ret i1 [[CMP]]
;		;
%mul = mul i32 %v, 3		%mul = mul i32 %v, 3
%cmp = icmp eq i32 %mul, 0		%cmp = icmp eq i32 %mul, 0
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @mul_unkV_oddC_eq_nonzero(i32 %v) {		define i1 @mul_unkV_oddC_eq_nonzero(i32 %v) {
; CHECK-LABEL: @mul_unkV_oddC_eq_nonzero(		; CHECK-LABEL: @mul_unkV_oddC_eq_nonzero(
; CHECK-NEXT: [[MUL:%.]] = mul i32 [[V:%.]], 3		; CHECK-NEXT: [[MUL:%.]] = mul i32 [[V:%.]], 3
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[MUL]], 4		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[MUL]], 4
; CHECK-NEXT: ret i1 [[CMP]]		; CHECK-NEXT: ret i1 [[CMP]]
;		;
%mul = mul i32 %v, 3		%mul = mul i32 %v, 3
%cmp = icmp eq i32 %mul, 4		%cmp = icmp eq i32 %mul, 4
ret i1 %cmp		ret i1 %cmp
}		}

define <2 x i1> @mul_unkV_oddC_ne_vec(<2 x i64> %v) {		define <2 x i1> @mul_unkV_oddC_ne_vec(<2 x i64> %v) {
; CHECK-LABEL: @mul_unkV_oddC_ne_vec(		; CHECK-LABEL: @mul_unkV_oddC_ne_vec(
; CHECK-NEXT: [[MUL:%.]] = mul <2 x i64> [[V:%.]], <i64 3, i64 3>		; CHECK-NEXT: [[CMP:%.]] = icmp ne <2 x i64> [[V:%.]], zeroinitializer
; CHECK-NEXT: [[CMP:%.*]] = icmp ne <2 x i64> [[MUL]], zeroinitializer
; CHECK-NEXT: ret <2 x i1> [[CMP]]		; CHECK-NEXT: ret <2 x i1> [[CMP]]
;		;
%mul = mul <2 x i64> %v, <i64 3, i64 3>		%mul = mul <2 x i64> %v, <i64 3, i64 3>
%cmp = icmp ne <2 x i64> %mul, <i64 0, i64 0>		%cmp = icmp ne <2 x i64> %mul, <i64 0, i64 0>
ret <2 x i1> %cmp		ret <2 x i1> %cmp
}		}

define i1 @mul_assumeoddV_asumeoddV_eq(i16 %v, i16 %v2) {		define i1 @mul_assumeoddV_asumeoddV_eq(i16 %v, i16 %v2) {
Show All 26 Lines	;
%mul = mul i8 %v, 3		%mul = mul i8 %v, 3
%cmp = icmp sge i8 %mul, 0		%cmp = icmp sge i8 %mul, 0
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @mul_reused_unkV_oddC_ne(i64 %v) {		define i1 @mul_reused_unkV_oddC_ne(i64 %v) {
; CHECK-LABEL: @mul_reused_unkV_oddC_ne(		; CHECK-LABEL: @mul_reused_unkV_oddC_ne(
; CHECK-NEXT: [[MUL:%.]] = mul i64 [[V:%.]], 3		; CHECK-NEXT: [[MUL:%.]] = mul i64 [[V:%.]], 3
; CHECK-NEXT: [[CMP:%.*]] = icmp ne i64 [[MUL]], 0		; CHECK-NEXT: [[CMP:%.*]] = icmp ne i64 [[V]], 0
; CHECK-NEXT: call void @use64(i64 [[MUL]])		; CHECK-NEXT: call void @use64(i64 [[MUL]])
; CHECK-NEXT: ret i1 [[CMP]]		; CHECK-NEXT: ret i1 [[CMP]]
;		;
%mul = mul i64 %v, 3		%mul = mul i64 %v, 3
%cmp = icmp ne i64 %mul, 0		%cmp = icmp ne i64 %mul, 0
call void @use64(i64 %mul)		call void @use64(i64 %mul)
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @mul_assumeoddV_unkV_eq(i16 %v, i16 %v2) {		define i1 @mul_assumeoddV_unkV_eq(i16 %v, i16 %v2) {
; CHECK-LABEL: @mul_assumeoddV_unkV_eq(		; CHECK-LABEL: @mul_assumeoddV_unkV_eq(
; CHECK-NEXT: [[LB:%.]] = and i16 [[V2:%.]], 1		; CHECK-NEXT: [[LB:%.]] = and i16 [[V2:%.]], 1
; CHECK-NEXT: [[ODD:%.*]] = icmp ne i16 [[LB]], 0		; CHECK-NEXT: [[ODD:%.*]] = icmp ne i16 [[LB]], 0
; CHECK-NEXT: call void @llvm.assume(i1 [[ODD]])		; CHECK-NEXT: call void @llvm.assume(i1 [[ODD]])
; CHECK-NEXT: [[MUL:%.]] = mul i16 [[V:%.]], [[V2]]		; CHECK-NEXT: [[CMP:%.]] = icmp eq i16 [[V:%.]], 0
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i16 [[MUL]], 0
; CHECK-NEXT: ret i1 [[CMP]]		; CHECK-NEXT: ret i1 [[CMP]]
;		;
%lb = and i16 %v2, 1		%lb = and i16 %v2, 1
%odd = icmp eq i16 %lb, 1		%odd = icmp eq i16 %lb, 1
call void @llvm.assume(i1 %odd)		call void @llvm.assume(i1 %odd)
%mul = mul i16 %v, %v2		%mul = mul i16 %v, %v2
%cmp = icmp eq i16 %mul, 0		%cmp = icmp eq i16 %mul, 0
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @mul_reusedassumeoddV_unkV_ne(i64 %v, i64 %v2) {		define i1 @mul_reusedassumeoddV_unkV_ne(i64 %v, i64 %v2) {
; CHECK-LABEL: @mul_reusedassumeoddV_unkV_ne(		; CHECK-LABEL: @mul_reusedassumeoddV_unkV_ne(
; CHECK-NEXT: [[LB:%.*]] = and i64 [[V:%.*]], 1		; CHECK-NEXT: [[LB:%.*]] = and i64 [[V:%.*]], 1
; CHECK-NEXT: [[ODD:%.*]] = icmp ne i64 [[LB]], 0		; CHECK-NEXT: [[ODD:%.*]] = icmp ne i64 [[LB]], 0
; CHECK-NEXT: call void @llvm.assume(i1 [[ODD]])		; CHECK-NEXT: call void @llvm.assume(i1 [[ODD]])
; CHECK-NEXT: [[MUL:%.*]] = mul i64 [[V]], [[V2:%.*]]		; CHECK-NEXT: [[MUL:%.*]] = mul i64 [[V]], [[V2:%.*]]
; CHECK-NEXT: [[CMP:%.*]] = icmp ne i64 [[MUL]], 0		; CHECK-NEXT: [[CMP:%.*]] = icmp ne i64 [[V2]], 0
; CHECK-NEXT: call void @use64(i64 [[MUL]])		; CHECK-NEXT: call void @use64(i64 [[MUL]])
; CHECK-NEXT: ret i1 [[CMP]]		; CHECK-NEXT: ret i1 [[CMP]]
;		;
%lb = and i64 %v, 1		%lb = and i64 %v, 1
%odd = icmp ne i64 %lb, 0		%odd = icmp ne i64 %lb, 0
call void @llvm.assume(i1 %odd)		call void @llvm.assume(i1 %odd)
%mul = mul i64 %v, %v2		%mul = mul i64 %v, %v2
%cmp = icmp ne i64 %mul, 0		%cmp = icmp ne i64 %mul, 0
call void @use64(i64 %mul)		call void @use64(i64 %mul)
ret i1 %cmp		ret i1 %cmp
}		}

define <2 x i1> @mul_setoddV_unkV_ne(<2 x i32> %v1, <2 x i32> %v2) {		define <2 x i1> @mul_setoddV_unkV_ne(<2 x i32> %v1, <2 x i32> %v2) {
; CHECK-LABEL: @mul_setoddV_unkV_ne(		; CHECK-LABEL: @mul_setoddV_unkV_ne(
; CHECK-NEXT: [[V:%.*]] = or <2 x i32> [[V1:%.*]], <i32 1, i32 1>		; CHECK-NEXT: [[CMP:%.*]] = icmp ne <2 x i32> [[V2:%.*]], zeroinitializer
; CHECK-NEXT: [[MUL:%.*]] = mul <2 x i32> [[V]], [[V2:%.*]]
; CHECK-NEXT: [[CMP:%.*]] = icmp ne <2 x i32> [[MUL]], zeroinitializer
; CHECK-NEXT: ret <2 x i1> [[CMP]]		; CHECK-NEXT: ret <2 x i1> [[CMP]]
;		;
%v = or <2 x i32> %v1, <i32 1, i32 1>		%v = or <2 x i32> %v1, <i32 1, i32 1>
%mul = mul <2 x i32> %v, %v2		%mul = mul <2 x i32> %v, %v2
%cmp = icmp ne <2 x i32> %mul, <i32 0, i32 0>		%cmp = icmp ne <2 x i32> %mul, <i32 0, i32 0>
ret <2 x i1> %cmp		ret <2 x i1> %cmp
}		}
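
The odd-factor cases above rest on the identity from the summary, isZero(Odd * N) == isZero(N). As a rough sanity check (not part of the patch, and not a description of how InstCombine derives it), the following sketch brute-forces that identity for 8-bit wrapping multiplication. The helper names mul_wrap, odd_factor_is_irrelevant, and even_counterexample are made up for illustration.

```python
def mul_wrap(x: int, y: int, bits: int) -> int:
    """Multiply like a plain `mul` without nuw/nsw: keep only the low `bits` bits."""
    return (x * y) & ((1 << bits) - 1)


def odd_factor_is_irrelevant(bits: int) -> bool:
    """Check isZero(odd * n) == isZero(n) for every odd factor and every n."""
    size = 1 << bits
    return all(
        (mul_wrap(odd, n, bits) == 0) == (n == 0)
        for odd in range(1, size, 2)
        for n in range(size)
    )


def even_counterexample(bits: int):
    """Return the first (even, n), both non-zero, whose wrapped product is 0."""
    size = 1 << bits
    for even in range(2, size, 2):
        for n in range(1, size):
            if mul_wrap(even, n, bits) == 0:
                return even, n
    return None


if __name__ == "__main__":
    print(odd_factor_is_irrelevant(8))
    print(even_counterexample(8))
```

An even factor does not get the same treatment: in i8, 2 * 128 wraps to 0 although 128 is non-zero, which is why the fold needs the factor to be known odd unless the no-overflow condition applies.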

[... 51 lines not shown ...]
;		;
%cmp = icmp eq i64 %mul, 0		%cmp = icmp eq i64 %mul, 0
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @mul_assumenzV_unkV_nsw_ne(i32 %v, i32 %v2) {		define i1 @mul_assumenzV_unkV_nsw_ne(i32 %v, i32 %v2) {
; CHECK-LABEL: @mul_assumenzV_unkV_nsw_ne(		; CHECK-LABEL: @mul_assumenzV_unkV_nsw_ne(
; CHECK-NEXT: [[NZ:%.*]] = icmp ne i32 [[V:%.*]], 0		; CHECK-NEXT: [[NZ:%.*]] = icmp ne i32 [[V:%.*]], 0
; CHECK-NEXT: call void @llvm.assume(i1 [[NZ]])		; CHECK-NEXT: call void @llvm.assume(i1 [[NZ]])
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[V]], [[V2:%.*]]		; CHECK-NEXT: [[CMP:%.*]] = icmp ne i32 [[V2:%.*]], 0
; CHECK-NEXT: [[CMP:%.*]] = icmp ne i32 [[MUL]], 0
; CHECK-NEXT: ret i1 [[CMP]]		; CHECK-NEXT: ret i1 [[CMP]]
;		;
%nz = icmp ne i32 %v, 0		%nz = icmp ne i32 %v, 0
call void @llvm.assume(i1 %nz)		call void @llvm.assume(i1 %nz)
%mul = mul nsw i32 %v, %v2		%mul = mul nsw i32 %v, %v2
%cmp = icmp ne i32 %mul, 0		%cmp = icmp ne i32 %mul, 0
ret i1 %cmp		ret i1 %cmp
}		}
[... 21 lines not shown ...]
;		;
%mul = mul nuw nsw <2 x i16> %v, %v2		%mul = mul nuw nsw <2 x i16> %v, %v2
%cmp = icmp ne <2 x i16> %mul, <i16 0, i16 0>		%cmp = icmp ne <2 x i16> %mul, <i16 0, i16 0>
ret <2 x i1> %cmp		ret <2 x i1> %cmp
}		}

define i1 @mul_setnzV_unkV_nuw_eq(i8 %v1, i8 %v2) {		define i1 @mul_setnzV_unkV_nuw_eq(i8 %v1, i8 %v2) {
; CHECK-LABEL: @mul_setnzV_unkV_nuw_eq(		; CHECK-LABEL: @mul_setnzV_unkV_nuw_eq(
; CHECK-NEXT: [[V:%.*]] = or i8 [[V1:%.*]], 2		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8 [[V2:%.*]], 0
; CHECK-NEXT: [[MUL:%.*]] = mul nuw i8 [[V]], [[V2:%.*]]
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8 [[MUL]], 0
; CHECK-NEXT: ret i1 [[CMP]]		; CHECK-NEXT: ret i1 [[CMP]]
;		;
%v = or i8 %v1, 2		%v = or i8 %v1, 2
%mul = mul nuw i8 %v, %v2		%mul = mul nuw i8 %v, %v2
%cmp = icmp eq i8 %mul, 0		%cmp = icmp eq i8 %mul, 0
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @mul_brnzV_unkV_nuw_eq(i64 %v, i64 %v2) {		define i1 @mul_brnzV_unkV_nuw_eq(i64 %v, i64 %v2) {
; CHECK-LABEL: @mul_brnzV_unkV_nuw_eq(		; CHECK-LABEL: @mul_brnzV_unkV_nuw_eq(
; CHECK-NEXT: [[NZ_NOT:%.*]] = icmp eq i64 [[V2:%.*]], 0		; CHECK-NEXT: [[NZ_NOT:%.*]] = icmp eq i64 [[V2:%.*]], 0
; CHECK-NEXT: br i1 [[NZ_NOT]], label [[FALSE:%.*]], label [[TRUE:%.*]]		; CHECK-NEXT: br i1 [[NZ_NOT]], label [[FALSE:%.*]], label [[TRUE:%.*]]
; CHECK: true:		; CHECK: true:
; CHECK-NEXT: [[MUL:%.*]] = mul nuw i64 [[V:%.*]], [[V2]]		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i64 [[V:%.*]], 0
; CHECK-NEXT: [[CMP:%.*]] = icmp eq i64 [[MUL]], 0
; CHECK-NEXT: ret i1 [[CMP]]		; CHECK-NEXT: ret i1 [[CMP]]
; CHECK: false:		; CHECK: false:
; CHECK-NEXT: call void @use64(i64 [[V]])		; CHECK-NEXT: call void @use64(i64 [[V]])
; CHECK-NEXT: ret i1 false		; CHECK-NEXT: ret i1 false
;		;
%nz = icmp ne i64 %v2, 0		%nz = icmp ne i64 %v2, 0
br i1 %nz, label %true, label %false		br i1 %nz, label %true, label %false
true:		true:
%mul = mul nuw i64 %v, %v2		%mul = mul nuw i64 %v, %v2
%cmp = icmp eq i64 %mul, 0		%cmp = icmp eq i64 %mul, 0
ret i1 %cmp		ret i1 %cmp
false:		false:
call void @use64(i64 %v)		call void @use64(i64 %v)
ret i1 false		ret i1 false
}		}
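
The nuw/nsw cases above depend on the second rule from the summary: if X * Y cannot overflow and X is known non-zero, X can be dropped from the comparison against zero. The sketch below checks that rule exhaustively for 8-bit values. Inputs whose product would overflow are skipped, on the assumption that nuw/nsw makes them poison. All names are hypothetical, and the code is only an arithmetic check, not a model of the InstCombine implementation.

```python
BITS = 8
MASK = (1 << BITS) - 1
SMIN, SMAX = -(1 << (BITS - 1)), (1 << (BITS - 1)) - 1


def to_signed(v: int) -> int:
    """Reinterpret an unsigned BITS-wide value as two's complement."""
    return v - (1 << BITS) if v > SMAX else v


def nonzero_factor_droppable(signed: bool) -> bool:
    """For every non-zero x and every y whose product does not overflow,
    check that (x * y == 0) agrees with (y == 0)."""
    for x in range(1, 1 << BITS):
        for y in range(1 << BITS):
            if signed:
                overflow = not SMIN <= to_signed(x) * to_signed(y) <= SMAX
            else:
                overflow = x * y > MASK
            if overflow:
                continue  # assumed poison under nsw/nuw, so not checked
            if (((x * y) & MASK) == 0) != (y == 0):
                return False
    return True


def wrapping_counterexample():
    """First (x, y), both non-zero, whose wrapped product is 0."""
    for x in range(1, 1 << BITS):
        for y in range(1, 1 << BITS):
            if (x * y) & MASK == 0:
                return x, y
    return None


if __name__ == "__main__":
    print(nonzero_factor_droppable(signed=False))
    print(nonzero_factor_droppable(signed=True))
    print(wrapping_counterexample())
```

Without a no-overflow flag the rule fails, for example 2 * 128 in i8, which matches the summary's requirement that the multiply cannot overflow before a merely non-zero (possibly even) factor is dropped.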

llvm/test/Transforms/InstCombine/pr38677.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	;RUN: opt -passes=instcombine -S %s | FileCheck %s			;RUN: opt -passes=instcombine -S %s | FileCheck %s

	@A = extern_weak global i32, align 4			@A = extern_weak global i32, align 4
	@B = extern_weak global i32, align 4			@B = extern_weak global i32, align 4

	define i32 @foo(i1 %which, ptr %dst) {			define i32 @foo(i1 %which, ptr %dst) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 true, label [[FINAL:%.*]], label [[DELAY:%.*]]			; CHECK-NEXT: br i1 true, label [[FINAL:%.*]], label [[DELAY:%.*]]
	; CHECK: delay:			; CHECK: delay:
	; CHECK-NEXT: br label [[FINAL]]			; CHECK-NEXT: br label [[FINAL]]
	; CHECK: final:			; CHECK: final:
	; CHECK-NEXT: [[USE2:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ select (i1 icmp eq (ptr @A, ptr @B), i32 2, i32 1), [[DELAY]] ]			; CHECK-NEXT: [[USE2:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ select (i1 icmp eq (ptr @A, ptr @B), i32 2, i32 1), [[DELAY]] ]
	; CHECK-NEXT: [[B7:%.*]] = mul i32 [[USE2]], 2147483647			; CHECK-NEXT: store i1 false, ptr [[DST:%.*]], align 1
	; CHECK-NEXT: [[C3:%.*]] = icmp eq i32 [[B7]], 0
	; CHECK-NEXT: store i1 [[C3]], ptr [[DST:%.*]], align 1
	; CHECK-NEXT: ret i32 [[USE2]]			; CHECK-NEXT: ret i32 [[USE2]]
	;			;
	entry:			entry:
	br i1 true, label %final, label %delay			br i1 true, label %final, label %delay

	delay: ; preds = %entry			delay: ; preds = %entry
	br label %final			br label %final

	final: ; preds = %delay, %entry			final: ; preds = %delay, %entry
	%use2 = phi i1 [ false, %entry ], [ icmp eq (ptr @A, ptr @B), %delay ]			%use2 = phi i1 [ false, %entry ], [ icmp eq (ptr @A, ptr @B), %delay ]
	%value = select i1 %use2, i32 2, i32 1			%value = select i1 %use2, i32 2, i32 1
	%B7 = mul i32 %value, 2147483647			%B7 = mul i32 %value, 2147483647
	%C3 = icmp ule i32 %B7, 0			%C3 = icmp ule i32 %B7, 0
	store i1 %C3, ptr %dst			store i1 %C3, ptr %dst
	ret i32 %value			ret i32 %value
	}			}
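
In pr38677.ll the new output stores a constant false. One way to see that this is consistent: %value is either 1 or 2, the multiplier 2147483647 is odd, and an unsigned `ule 0` comparison is the same as an equality test against 0. The sketch below simply evaluates both possible inputs with 32-bit wrapping; the function names b7 and c3 mirror the IR value names and are otherwise hypothetical.

```python
MASK32 = (1 << 32) - 1


def b7(value: int) -> int:
    """%B7 = mul i32 %value, 2147483647, with 32-bit wraparound."""
    return (value * 2147483647) & MASK32


def c3(value: int) -> bool:
    """%C3 = icmp ule i32 %B7, 0; unsigned, so true only when %B7 is 0."""
    return b7(value) <= 0


if __name__ == "__main__":
    # %value comes from `select i1 %use2, i32 2, i32 1`, so only 1 and 2 occur.
    for value in (1, 2):
        print(value, hex(b7(value)), c3(value))
```

Neither input produces a zero product, so the comparison is false on both paths.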

Revision Contents

Diff 492933
