This is an archive of the discontinued LLVM Phabricator instance.


Differential D154791

[InstCombine] Transform bitwise (A >> C - 1, zext(icmp)) -> zext (bitwise(A < 0, icmp)) fold.
ClosedPublic

Authored by XChy on Jul 9 2023, 7:32 AM.


Details

Reviewers

nikic
goldstein.w.n
k-arrows
spatel

Commits

rG8a0b2ca8217f: [InstCombine] Transform bitwise (A >> C - 1, zext(icmp)) -> zext (bitwise(A < 0…

Summary

This extends foldCastedBitwiseLogic to handle similar cases.
I recently submitted a patch that implements a single fold:

(A > 0) | (A < 0) -> zext (A != 0)

But it is not general enough, and the same problem comes up again for patterns like a < b & a >= b - 1.

So I generalize this fold by matching the pattern bitwise(A >> C - 1, zext(icmp)) and replacing A >> C - 1 with zext(A < 0).
(C is the scalar size in bits of the type of A.)
This gives bitwise(zext(A < 0), zext(icmp)), which the existing code in foldCastedBitwiseLogic folds into zext(bitwise(A < 0, icmp)).
Finally, any related icmp fold then applies automatically, because folds of bitwise(icmp, icmp) are already implemented.

The correctness follows from two folds that were previously proved and implemented:
A >> C - 1 -> zext(A < 0)
bitwise(zext(A), zext(B)) -> zext(bitwise(A, B))
The fold in this patch is the composition of these two.
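As a sanity check on the two component folds named in the summary, the following is a minimal sketch that tests them exhaustively on 8-bit values in plain C++. It does not use any LLVM API; the helper names (lshrSignBit, zextIsNegative, checkComponentFolds) and the choice of an 8-bit width are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// lshr A, BW - 1 for BW = 8: only the sign bit survives, as 0 or 1.
static uint8_t lshrSignBit(int8_t A) {
  return static_cast<uint8_t>(static_cast<uint8_t>(A) >> 7);
}

// zext(A < 0) back to the 8-bit type.
static uint8_t zextIsNegative(int8_t A) { return A < 0 ? 1 : 0; }

// Returns true if both component folds hold on every input checked.
bool checkComponentFolds() {
  // Fold 1: A >> BW - 1 == zext(A < 0), for all 256 values of A.
  for (int V = -128; V <= 127; ++V) {
    int8_t A = static_cast<int8_t>(V);
    if (lshrSignBit(A) != zextIsNegative(A))
      return false;
  }
  // Fold 2: bitwise(zext(a), zext(b)) == zext(bitwise(a, b)) for i1 a, b.
  for (int L = 0; L <= 1; ++L) {
    for (int R = 0; R <= 1; ++R) {
      bool A = L != 0, B = R != 0;
      uint8_t ZA = A, ZB = B; // zext i1 -> i8
      if ((ZA & ZB) != static_cast<uint8_t>(A && B)) return false;
      if ((ZA | ZB) != static_cast<uint8_t>(A || B)) return false;
      if ((ZA ^ ZB) != static_cast<uint8_t>(A != B)) return false;
    }
  }
  return true;
}
```

An exhaustive check at one small width is not a proof for every bit width, but it is a quick way to catch a mixed-up shift direction or a missing -1 in the shift amount.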

Related issue:
a < b | a > b
a < b & a >= b - 1
Related patch:
D154126

Diff Detail

Event Timeline

XChy created this revision.Jul 9 2023, 7:32 AM

Herald added subscribers: StephenFan, hiraditya. · View Herald TranscriptJul 9 2023, 7:32 AM

XChy requested review of this revision.Jul 9 2023, 7:32 AM

Herald added a subscriber: llvm-commits. · View Herald TranscriptJul 9 2023, 7:32 AM

Harbormaster completed remote builds in B243984: Diff 538433.Jul 9 2023, 7:33 AM

The basic idea here is reasonable, but you need to be very careful about infinite loops: If you replace the shift with zext+icmp and it does *not* get folded afterwards, it will be converted back to the shift, and so on. I don't think the fold is guaranteed to happen, e.g. due to some unlucky interaction with shouldOptimizeCast().

I would recommend to instead directly produce the zext(binop(icmp, icmp)) sequence, rather than letting the following fold handle it.

Please add:

Multi-use test.
Test where we do not get any beneficial fold out of converting the lshr back into an icmp.

Depending on how the latter case looks like, we might want to further limit this -- e.g. does it make sense to do this if the lshr and icmp work on different variables or not?
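The infinite-loop concern in the comment above can be pictured with a toy model. This is only a sketch under stated assumptions: the Form enum, the two rule functions, and reachesFixpoint are hypothetical names invented here and have nothing to do with InstCombine's real data structures. One rule expands the shift into zext(icmp), the other canonicalizes zext(A < 0) back into the shift, and the flag models whether the follow-up fold consumes the new form.

```cpp
#include <cassert>

// Toy stand-ins for the two forms of the same value (hypothetical).
enum class Form { LShrSignBit, ZExtICmpSlt0 };

// Hypothetical rule 1: the new fold rewrites the shift into zext(A < 0).
static Form expandShift(Form F) {
  return F == Form::LShrSignBit ? Form::ZExtICmpSlt0 : F;
}

// Hypothetical rule 2: an existing canonicalization turns zext(A < 0)
// back into the shift.
static Form canonicalizeToShift(Form F) {
  return F == Form::ZExtICmpSlt0 ? Form::LShrSignBit : F;
}

// Returns true if rewriting settles within MaxIters rounds.
// FollowUpFoldFires models whether the later fold consumes the zext(icmp)
// before rule 2 gets a chance to undo it.
bool reachesFixpoint(bool FollowUpFoldFires, int MaxIters) {
  Form F = Form::LShrSignBit;
  for (int I = 0; I < MaxIters; ++I) {
    F = expandShift(F);
    if (FollowUpFoldFires)
      return true; // zext(binop(icmp, icmp)) was formed; nothing reverts it.
    Form Next = canonicalizeToShift(F);
    if (Next == F)
      return true; // No rule changed anything.
    F = Next; // Back to the shift: the two rules ping-pong.
  }
  return false;
}
```

In this model, reachesFixpoint(false, N) returns false for any N, which is the situation the comment warns about. Producing the final zext(binop(icmp, icmp)) directly corresponds to never depending on the flag.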

Throughout the comments/summary/title, can you replace X with C to indicate it's a constant? Also, in a few places you forgot the -1 in its description as sizeof_bits(A) - 1.

goldstein.w.n added inline comments.Jul 9 2023, 11:58 AM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1723	`lshr`? The comments are all `shl`. Can you clarify one of them (looks like comments/summary is wrong).

goldstein.w.n added inline comments.Jul 9 2023, 12:00 PM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1718	'and/or' -> 'bitwise' no?
1726	Is there a reason you create `IsMatched` as opposed to just embedding the `match(...)` logic in the if statement?

XChy added a parent revision: D154789: [InstCombine] Add tests for bitwise (A >> C - 1, zext(icmp)) -> zext (bitwise(A<0, icmp)) fold (NFC).Jul 9 2023, 5:05 PM

XChy retitled this revision from [InstCombine] Transform bitwise (A << X, zext(icmp)) -> zext (bitwise(A < 0, icmp)) fold. to [InstCombine] Transform bitwise (A << C - 1, zext(icmp)) -> zext (bitwise(A < 0, icmp)) fold..

XChy edited the summary of this revision. (Show Details)

XChy retitled this revision from [InstCombine] Transform bitwise (A << C - 1, zext(icmp)) -> zext (bitwise(A < 0, icmp)) fold. to [InstCombine] Transform bitwise (A >> C - 1, zext(icmp)) -> zext (bitwise(A < 0, icmp)) fold..Jul 9 2023, 5:26 PM

XChy edited the summary of this revision. (Show Details)

In D154791#4483474, @nikic wrote:

The basic idea here is reasonable, but you need to be very careful about infinite loops: If you replace the shift with zext+icmp and it does *not* get folded afterwards, it will be converted back to the shift, and so on. I don't think the fold is guaranteed to happen, e.g. due to some unlucky interaction with shouldOptimizeCast().

I would recommend to instead directly produce the zext(binop(icmp, icmp)) sequence, rather than letting the following fold handle it.

Please add:

Multi-use test.

Test where we do not get any beneficial fold out of converting the lshr back into an icmp.

Depending on how the latter case looks like, we might want to further limit this -- e.g. does it make sense to do this if the lshr and icmp work on different variables or not?

Thanks for your review! I agree with you. Actually, I ran into infinite loops during development and seemed to solve them by letting the following fold handle the IR. But that was only checked against a few tests.

My original purpose was to use foldAndOrOfICmps to fold the icmps, i.e. to fold and/or(A < 0, icmp). From my perspective, if foldAndOrOfICmps doesn't fold the transformed IR, this fold should not happen.

I will add related tests here and try to find cases where we do not get any beneficial fold later.

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1718	Maybe `and/or` is right, since foldAndOrOfICmps does not fold `xor`. I'm not sure what `xor(icmp,icmp)` is folded into.
1723	You're right. I messed it up in the last patch too. I'll fix it.
1726	No, it's my personal habit to set a boolean variable when an expression is too long. If needed, I can remove it.

XChy updated this revision to Diff 538616.Jul 10 2023, 6:44 AM

Harbormaster completed remote builds in B244124: Diff 538616.Jul 10 2023, 7:28 AM

goldstein.w.n added inline comments.Jul 10 2023, 9:09 AM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1718	In that case, you need to only match `and/or` and probably should mention that in the summary.
1726	It's fine.

This is still missing multi-use tests. We'll need some m_OneUse guards to prevent unprofitable transforms.

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1715–1716	X -> BW
1716–1731
1734	Needs clang-format.
llvm/test/Transforms/InstCombine/and-or-icmps.ll
2794	The main multi-use test I'm looking for is one where the resulting binop does not get folded. That's the case where your current code will increase instructions, I believe.

XChy updated this revision to Diff 539026.Jul 11 2023, 4:48 AM

XChy marked 9 inline comments as done.

XChy edited the summary of this revision. (Show Details)

Harbormaster completed remote builds in B244422: Diff 539026.Jul 11 2023, 6:16 AM

goldstein.w.n added inline comments.Jul 11 2023, 9:47 AM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1715–1716	comment incorrect shift.
1725	Does `Op1` need to be an `icmp`, or can it just be any `i1`?
1726	nit: LogicOp != Instruction::Xor should go before the `match(...)` imo.

XChy updated this revision to Diff 539399.Jul 11 2023, 11:27 PM

XChy marked 2 inline comments as done.

XChy edited the summary of this revision. (Show Details)

XChy set the repository for this revision to rG LLVM Github Monorepo.

Harbormaster completed remote builds in B244679: Diff 539399.Jul 12 2023, 1:46 AM

[clang-format]

Harbormaster completed remote builds in B244744: Diff 539492.Jul 12 2023, 6:56 AM

XChy added inline comments.Jul 12 2023, 8:45 AM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1718	I noticed foldXorOfICmps fold just now. Maybe I can add some related tests here.
1725	When folding `A >> BW - 1 -> A < 0`, there are many possible folds for `(A < 0) bitwise (icmp)`. However, if `icmp` is replaced with an arbitrary i1, the result seldom folds further and we are left with only the single fold `A >> BW - 1 -> A < 0`, which is inefficient.
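To make the "many possible folds" point concrete, here is a minimal sketch that checks, by exhaustive search, four of the results that appear in the updated CHECK lines of and-or-icmps.ll in this revision. It is plain C++ with no LLVM API. The 16-bit width is an assumption chosen to keep the search small (the tests use i32 and i64), and the names signBit and checkTestExpectations are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// lshr X, BW - 1 for BW = 16: extracts the sign bit as 0 or 1.
static uint16_t signBit(int16_t X) {
  return static_cast<uint16_t>(static_cast<uint16_t>(X) >> 15);
}

// Returns true if every checked identity holds for all 16-bit inputs.
bool checkTestExpectations() {
  for (int V = INT16_MIN; V <= INT16_MAX; ++V) {
    int16_t X = static_cast<int16_t>(V);
    uint16_t U = static_cast<uint16_t>(X); // same bits, unsigned view
    uint16_t Sign = signBit(X);
    // and(lshr X, zext(X > -1))  -> 0
    if ((Sign & static_cast<uint16_t>(X > -1)) != 0) return false;
    // or(lshr X, zext(X >= -1))  -> 1
    if ((Sign | static_cast<uint16_t>(X >= -1)) != 1) return false;
    // and(lshr X, zext(X >= -1)) -> zext(X == -1)
    if ((Sign & static_cast<uint16_t>(X >= -1)) !=
        static_cast<uint16_t>(X == -1)) return false;
    // or(lshr X, zext(X > 1))    -> zext(X u> 1)
    if ((Sign | static_cast<uint16_t>(X > 1)) !=
        static_cast<uint16_t>(U > 1)) return false;
  }
  return true;
}
```

Each identity only simplifies because the second operand is a comparison on the same value; with an arbitrary i1 in its place there would be nothing further to combine.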

goldstein.w.n added inline comments.Jul 12 2023, 3:11 PM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1725	I see.
1726	It seems that in addressing this you just removed the LogicOp != Xor check in general...

I update the diff to ensure that we fold exactly what can be folded by foldXorOfICmps and foldAndOrOfICmps.

However, I'm not sure what unexpected result the code below may bring.

// remove the deferred 2 instructions : 
// icmp slt A, 0
// bitwise (A < 0, icmp) 
// otherwise there will be infinite loops of combining
Worklist.popDeferred()->eraseFromParent();
Worklist.popDeferred()->eraseFromParent();

In D154791#4497148, @XChy wrote:
I update the diff to ensure that we fold exactly what can be folded by foldXorOfICmps and foldAndOrOfICmps.

However, I'm not sure what unexpected result the code below may bring.
// remove the deferred 2 instructions : 
// icmp slt A, 0
// bitwise (A < 0, icmp) 
// otherwise there will be infinite loops of combining
Worklist.popDeferred()->eraseFromParent();
Worklist.popDeferred()->eraseFromParent();

Actually, I copy and edit the code to avoid infinite loops from:

Instruction *eraseInstFromFunction(Instruction &I) override {
  LLVM_DEBUG(dbgs() << "IC: ERASE " << I << '\n');
  assert(I.use_empty() && "Cannot erase instruction that is used!");
  salvageDebugInfo(I);

  // Make sure that we reprocess all operands now that we reduced their
  // use counts.
  SmallVector<Value *> Ops(I.operands());
  Worklist.remove(&I);
  I.eraseFromParent();
  for (Value *Op : Ops)
    Worklist.handleUseCountDecrement(Op);
  MadeIRChange = true;
  return nullptr; // Don't do anything with FI
}

I omit MadeIRChange = true; to avoid the infinite loops, which arise because MadeIRChange is set even though the same instructions are merely deferred and then erased (the IR does not actually change).

Harbormaster completed remote builds in B245077: Diff 539971.Jul 13 2023, 7:13 AM

goldstein.w.n added inline comments.Jul 13 2023, 3:53 PM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1738	@nikic, is this right?
llvm/test/Transforms/InstCombine/and-or-icmps.ll
2794	Since you're now supporting xor, you need to add some tests for it.

Add xor tests

Harbormaster completed remote builds in B245386: Diff 540411.Jul 14 2023, 8:01 AM

XChy marked an inline comment as done.Jul 15 2023, 6:32 AM

Limit the fold for multiuse cases

Harbormaster completed remote builds in B245679: Diff 540804.Jul 16 2023, 7:50 AM

ping.

goldstein.w.n added inline comments.Jul 16 2023, 11:06 AM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1736	I'm mostly okay with the change, but a little unhappy about this. It seems like a worrisome practice. I guess it works, but it would be nice if there were a better way to accomplish it. Generally I'd argue the best solution would be to refactor `fold<BinOp>OfICmp` to take the components rather than the final instructions, but those are both fairly complicated and propagate the instructions to other functions that would then need to be refactored. All in all, more work than it's worth. I'm going to defer my opinion to @nikic about whether this is okay. Other than my concern here, I'm basically ready to approve.

XChy added inline comments.Jul 16 2023, 8:26 PM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1736	Thanks for your comment! I agree with you. Erasing deferred instructions inside a fold function is not a good approach. It's better to let `InstCombinerImpl::run()` control instruction erasure. But there is no other way to determine whether some instructions can be folded by `fold<BinOp>OfICmp`, unless we either call it with the instructions already built and deferred, or extract a canFold<BinOp>OfICmp function that takes the components. The latter costs too much, since it may require refactoring all the sub-folds. For that reason I applied the former, but I didn't find any similar situations in InstCombine, so I just copied some of the instruction-erasure logic to solve the problem. I'll search for more similar cases to get a better solution if possible. I'd highly appreciate any other advice.

goldstein.w.n added inline comments.Jul 17 2023, 9:14 AM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1736	Unfortunately I don't really have a better idea than what you have here, but I want to hear nikic's opinion.

nikic added inline comments.Jul 18 2023, 8:17 AM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1736	We shouldn't do this kind of speculative transform. Either you can always do the transform based on a reasonably close heuristic (like zext and icmp on the same value) or change APIs in a way that allows doing the transform without materializing the icmp (I think this is not worth the trouble). My 2c here is that it would be okay to convert to the zext(bitwise(icmp, icmp)) form even if it doesn't always fold, as this seems like the better representation at the IR level to me. Even if it doesn't fold in InstCombine, this is the form that is more likely to be usefully optimized by other passes. If we really care, we can undo this in the backend.

Produce zext(bitwise(icmp, icmp))

Harbormaster completed remote builds in B246410: Diff 541812.Jul 19 2023, 12:22 AM

XChy added inline comments.Jul 20 2023, 9:15 PM

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1736	I see. I have reverted to the version producing `zext(bitwise(icmp, icmp))`.

ping.

Looks mostly fine, but I think the one-use check isn't quite correct.

llvm/test/Transforms/InstCombine/and-or-icmps.ll
2669	Not _fail?
2796	Creating more instructions here. You should probably always require one-use on the lshr rather than one-use on one of the operands.

Apply m_OneUse guard to lshr.
Fix nits

XChy marked an inline comment as done.Jul 23 2023, 8:35 PM

XChy added inline comments.Jul 23 2023, 8:45 PM

llvm/test/Transforms/InstCombine/and-or-icmps.ll
2796	If `icmp eq 100` is replaced with another icmp that can be optimized together with `icmp slt 0`, the resulting IR is better, apart from one extra instruction. Are there principles that determine whether a one-use guard is necessary, or whether a fold is too aggressive/conservative? Maybe I can apply them to similar situations in future contributions.

Harbormaster completed remote builds in B247549: Diff 543357.Jul 23 2023, 11:09 PM

nikic added inline comments.Jul 24 2023, 3:16 AM

llvm/test/Transforms/InstCombine/and-or-icmps.ll
2796	Based on this we should have m_OneUse on the zext(icmp) as well.
2796	For multi-use we want to avoid instruction increase in the worst case.

Add m_OneUse guard to zext(icmp).

llvm/test/Transforms/InstCombine/and-or-icmps.ll
2796	OK. Thanks for the suggestion.

LGTM

This revision is now accepted and ready to land.Jul 24 2023, 3:28 AM

In D154791#4527350, @nikic wrote:

LGTM

I don't have commit access, can you please land this for me? Please use 'XChy xxs_chy@outlook.com' for the commit.

This revision was landed with ongoing or failed builds.Jul 24 2023, 4:07 AM

Closed by commit rG8a0b2ca8217f: [InstCombine] Transform bitwise (A >> C - 1, zext(icmp)) -> zext (bitwise(A < 0… (authored by XChy, committed by nikic). · Explain Why

This revision was automatically updated to reflect the committed changes.

nikic added a commit: rG8a0b2ca8217f: [InstCombine] Transform bitwise (A >> C - 1, zext(icmp)) -> zext (bitwise(A < 0….

Harbormaster completed remote builds in B247605: Diff 543448.Jul 24 2023, 4:51 AM

goldstein.w.n mentioned this in D159327: [InstCombine] Modify all folds of `(and/or (cmp0), (cmp1))` to not quite a completed instruction; NFC.Aug 31 2023, 6:29 PM

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp	33 lines
llvm/test/Transforms/InstCombine/and-or-icmps.ll	66 lines

Diff 538433

llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp

Show First 20 Lines • Show All 1,706 Lines • ▼ Show 20 Lines

 }

 /// Fold {and,or,xor} (cast X), Y.
 Instruction *InstCombinerImpl::foldCastedBitwiseLogic(BinaryOperator &I) {
   auto LogicOpc = I.getOpcode();
   assert(I.isBitwiseLogicOp() && "Unexpected opcode for bitwise logic folding");

   Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);

-  // ( A << (X - 1) ) | ((A > 0) zext to iX)
-  // <=> A < 0 | A > 0
-  // <=> (A != 0) zext to iX
-  Value *A;
-  ICmpInst::Predicate Pred;
-
-  auto MatchOrZExtICmp = [&](Value *Op0, Value *Op1) -> bool {
-    return match(Op0, m_LShr(m_Value(A), m_SpecificInt(Op0->getType()->getScalarSizeInBits() - 1))) &&
-           match(Op1, m_ZExt(m_ICmp(Pred, m_Specific(A), m_Zero())));
-  };
-
-  if (LogicOpc == Instruction::Or &&
-      (MatchOrZExtICmp(Op0, Op1) || MatchOrZExtICmp(Op1, Op0)) &&
-      Pred == ICmpInst::ICMP_SGT) {
-    Value *Cmp =
-        Builder.CreateICmpNE(A, Constant::getNullValue(A->getType()));
-    return new ZExtInst(Cmp, A->getType());
-  }
+  // fold and/or(A << X - 1, zext(icmp)) (X is the scalar bits of the type of A)
+  // -> and/or(zext(A < 0), zext(icmp))
+  // -> zext(and/or(A < 0, icmp))
+  auto MatchBitwiseICmpZeroWithICmp = [&](Value *&Op0, Value *Op1) {
+    ICmpInst::Predicate Pred;
+    Value *A;
+    bool IsMatched =
+        match(Op0, m_LShr(m_Value(A),
+                          m_SpecificInt(Op0->getType()->getScalarSizeInBits() - 1))) &&
+        match(Op1, m_ZExt(m_ICmp(Pred, m_Value(), m_Value())));
+    if (IsMatched) {
+      Op0 = Builder.CreateZExt(
+          Builder.CreateICmpSLT(A, Constant::getNullValue(A->getType())),
+          A->getType());
+    }
+  };
+
+  MatchBitwiseICmpZeroWithICmp(Op0, Op1);
+  MatchBitwiseICmpZeroWithICmp(Op1, Op0);

   CastInst *Cast0 = dyn_cast<CastInst>(Op0);
   if (!Cast0)
     return nullptr;

   // This must be a cast from an integer or integer vector source type to allow
   // transformation of the logic operation to the source type.
   Type *DestTy = I.getType();
   Type *SrcTy = Cast0->getSrcTy();
   if (!SrcTy->isIntOrIntVectorTy())
     return nullptr;

nikic (inline, lines 1716–1731):

    return nullptr;
  - return ZExtInst::Create(
  -     ZExtInst::ZExt,
  + return new ZExtInst(
        Builder.CreateBinOp(

▲ Show 20 Lines • Show All 2,874 Lines • Show Last 20 Lines

llvm/test/Transforms/InstCombine/and-or-icmps.ll

Show First 20 Lines • Show All 2,565 Lines • ▼ Show 20 Lines	;
%rx = icmp ne <2 x i8> %x, <i8 -1, i8 -1>		%rx = icmp ne <2 x i8> %x, <i8 -1, i8 -1>
%ry = icmp ne <2 x i8> %y, <i8 -1, i8 undef>		%ry = icmp ne <2 x i8> %y, <i8 -1, i8 undef>
%r = or <2 x i1> %rx, %ry		%r = or <2 x i1> %rx, %ry
ret <2 x i1> %r		ret <2 x i1> %r
}		}

define i32 @icmp_slt_0_or_icmp_sgt_0_i32(i32 %x) {		define i32 @icmp_slt_0_or_icmp_sgt_0_i32(i32 %x) {
; CHECK-LABEL: @icmp_slt_0_or_icmp_sgt_0_i32(		; CHECK-LABEL: @icmp_slt_0_or_icmp_sgt_0_i32(
; CHECK-NEXT: [[TMP1:%.*]] = icmp ne i32 [[X:%.*]], 0		; CHECK-NEXT: [[E1:%.*]] = icmp ne i32 [[X:%.*]], 0
; CHECK-NEXT: [[E:%.*]] = zext i1 [[TMP1]] to i32		; CHECK-NEXT: [[E:%.*]] = zext i1 [[E1]] to i32
; CHECK-NEXT: ret i32 [[E]]		; CHECK-NEXT: ret i32 [[E]]
;		;
%A = icmp slt i32 %x, 0		%A = icmp slt i32 %x, 0
%B = icmp sgt i32 %x, 0		%B = icmp sgt i32 %x, 0
%C = zext i1 %A to i32		%C = zext i1 %A to i32
%D = zext i1 %B to i32		%D = zext i1 %B to i32
%E = or i32 %C, %D		%E = or i32 %C, %D
ret i32 %E		ret i32 %E
}		}

define i64 @icmp_slt_0_or_icmp_sgt_0_i64(i64 %x) {		define i64 @icmp_slt_0_or_icmp_sgt_0_i64(i64 %x) {
; CHECK-LABEL: @icmp_slt_0_or_icmp_sgt_0_i64(		; CHECK-LABEL: @icmp_slt_0_or_icmp_sgt_0_i64(
; CHECK-NEXT: [[TMP1:%.*]] = icmp ne i64 [[X:%.*]], 0		; CHECK-NEXT: [[E1:%.*]] = icmp ne i64 [[X:%.*]], 0
; CHECK-NEXT: [[E:%.*]] = zext i1 [[TMP1]] to i64		; CHECK-NEXT: [[E:%.*]] = zext i1 [[E1]] to i64
; CHECK-NEXT: ret i64 [[E]]		; CHECK-NEXT: ret i64 [[E]]
;		;
%A = icmp slt i64 %x, 0		%A = icmp slt i64 %x, 0
%B = icmp sgt i64 %x, 0		%B = icmp sgt i64 %x, 0
%C = zext i1 %A to i64		%C = zext i1 %A to i64
%D = zext i1 %B to i64		%D = zext i1 %B to i64
%E = or i64 %C, %D		%E = or i64 %C, %D
ret i64 %E		ret i64 %E
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	;
%C = ashr i64 %x, 62		%C = ashr i64 %x, 62
%D = zext i1 %B to i64		%D = zext i1 %B to i64
%E = or i64 %C, %D		%E = or i64 %C, %D
ret i64 %E		ret i64 %E
}		}

define <2 x i64> @icmp_slt_0_or_icmp_sgt_0_i64x2(<2 x i64> %x) {		define <2 x i64> @icmp_slt_0_or_icmp_sgt_0_i64x2(<2 x i64> %x) {
; CHECK-LABEL: @icmp_slt_0_or_icmp_sgt_0_i64x2(		; CHECK-LABEL: @icmp_slt_0_or_icmp_sgt_0_i64x2(
; CHECK-NEXT: [[TMP1:%.*]] = icmp ne <2 x i64> [[X:%.*]], zeroinitializer		; CHECK-NEXT: [[E1:%.*]] = icmp ne <2 x i64> [[X:%.*]], zeroinitializer
; CHECK-NEXT: [[E:%.*]] = zext <2 x i1> [[TMP1]] to <2 x i64>		; CHECK-NEXT: [[E:%.*]] = zext <2 x i1> [[E1]] to <2 x i64>
; CHECK-NEXT: ret <2 x i64> [[E]]		; CHECK-NEXT: ret <2 x i64> [[E]]
;		;
%A = icmp slt <2 x i64> %x, <i64 0,i64 0>		%A = icmp slt <2 x i64> %x, <i64 0,i64 0>
%B = icmp sgt <2 x i64> %x, <i64 0,i64 0>		%B = icmp sgt <2 x i64> %x, <i64 0,i64 0>
%C = zext <2 x i1> %A to <2 x i64>		%C = zext <2 x i1> %A to <2 x i64>
%D = zext <2 x i1> %B to <2 x i64>		%D = zext <2 x i1> %B to <2 x i64>
%E = or <2 x i64> %C, %D		%E = or <2 x i64> %C, %D
ret <2 x i64> %E		ret <2 x i64> %E
}		}
define <2 x i64> @icmp_slt_0_or_icmp_sgt_0_i64x2_fail(<2 x i64> %x) {		define <2 x i64> @icmp_slt_0_or_icmp_sgt_0_i64x2_fail(<2 x i64> %x) {
; CHECK-LABEL: @icmp_slt_0_or_icmp_sgt_0_i64x2_fail(		; CHECK-LABEL: @icmp_slt_0_or_icmp_sgt_0_i64x2_fail(
; CHECK-NEXT: [[B:%.*]] = icmp sgt <2 x i64> [[X:%.*]], <i64 1, i64 1>		; CHECK-NEXT: [[E1:%.*]] = icmp ugt <2 x i64> [[X:%.*]], <i64 1, i64 1>
; CHECK-NEXT: [[C:%.*]] = lshr <2 x i64> [[X]], <i64 63, i64 63>		; CHECK-NEXT: [[E:%.*]] = zext <2 x i1> [[E1]] to <2 x i64>
; CHECK-NEXT: [[D:%.*]] = zext <2 x i1> [[B]] to <2 x i64>
; CHECK-NEXT: [[E:%.*]] = or <2 x i64> [[C]], [[D]]
; CHECK-NEXT: ret <2 x i64> [[E]]		; CHECK-NEXT: ret <2 x i64> [[E]]
;		;
%B = icmp sgt <2 x i64> %x, <i64 1, i64 1>		%B = icmp sgt <2 x i64> %x, <i64 1, i64 1>
%C = lshr <2 x i64> %x, <i64 63, i64 63>		%C = lshr <2 x i64> %x, <i64 63, i64 63>
%D = zext <2 x i1> %B to <2 x i64>		%D = zext <2 x i1> %B to <2 x i64>
%E = or <2 x i64> %C, %D		%E = or <2 x i64> %C, %D
ret <2 x i64> %E		ret <2 x i64> %E

}		}

define i32 @icmps_slt_0_and_icmp_sge_neg1_i32(i32 %x) {		define i32 @icmps_slt_0_and_icmp_sge_neg1_i32(i32 %x) {
; CHECK-LABEL: @icmps_slt_0_and_icmp_sge_neg1_i32(		; CHECK-LABEL: @icmps_slt_0_and_icmp_sge_neg1_i32(
; CHECK-NEXT: [[A:%.*]] = icmp sgt i32 [[X:%.*]], -1		; CHECK-NEXT: ret i32 0
; CHECK-NEXT: [[B:%.*]] = zext i1 [[A]] to i32
; CHECK-NEXT: [[C:%.*]] = lshr i32 [[X]], 31
; CHECK-NEXT: [[D:%.*]] = and i32 [[C]], [[B]]
; CHECK-NEXT: ret i32 [[D]]
;		;
%A = icmp sgt i32 %x, -1		%A = icmp sgt i32 %x, -1
%B = zext i1 %A to i32		%B = zext i1 %A to i32
%C = lshr i32 %x, 31		%C = lshr i32 %x, 31
%D = and i32 %C, %B		%D = and i32 %C, %B
ret i32 %D		ret i32 %D
}		}

define i32 @icmps_slt_0_or_icmp_sge_neg1_i32(i32 %x) {		define i32 @icmps_slt_0_or_icmp_sge_neg1_i32(i32 %x) {
; CHECK-LABEL: @icmps_slt_0_or_icmp_sge_neg1_i32(		; CHECK-LABEL: @icmps_slt_0_or_icmp_sge_neg1_i32(
; CHECK-NEXT: [[A:%.*]] = icmp sgt i32 [[X:%.*]], -2		; CHECK-NEXT: ret i32 1
; CHECK-NEXT: [[B:%.*]] = zext i1 [[A]] to i32
; CHECK-NEXT: [[C:%.*]] = lshr i32 [[X]], 31
; CHECK-NEXT: [[D:%.*]] = or i32 [[C]], [[B]]
; CHECK-NEXT: ret i32 [[D]]
;		;
%A = icmp sge i32 %x, -1		%A = icmp sge i32 %x, -1
%B = zext i1 %A to i32		%B = zext i1 %A to i32
%C = lshr i32 %x, 31		%C = lshr i32 %x, 31
%D = or i32 %C, %B		%D = or i32 %C, %B
ret i32 %D		ret i32 %D
}		}

define i64 @icmps_slt_0_and_icmp_sge_neg1_i64(i64 %x) {		define i64 @icmps_slt_0_and_icmp_sge_neg1_i64(i64 %x) {
; CHECK-LABEL: @icmps_slt_0_and_icmp_sge_neg1_i64(		; CHECK-LABEL: @icmps_slt_0_and_icmp_sge_neg1_i64(
; CHECK-NEXT: [[A:%.*]] = icmp sgt i64 [[X:%.*]], -2		; CHECK-NEXT: [[D1:%.*]] = icmp eq i64 [[X:%.*]], -1
; CHECK-NEXT: [[B:%.*]] = zext i1 [[A]] to i64		; CHECK-NEXT: [[D:%.*]] = zext i1 [[D1]] to i64
; CHECK-NEXT: [[C:%.*]] = lshr i64 [[X]], 63
; CHECK-NEXT: [[D:%.*]] = and i64 [[C]], [[B]]
; CHECK-NEXT: ret i64 [[D]]		; CHECK-NEXT: ret i64 [[D]]
;		;
%A = icmp sge i64 %x, -1		%A = icmp sge i64 %x, -1
%B = zext i1 %A to i64		%B = zext i1 %A to i64
%C = lshr i64 %x, 63		%C = lshr i64 %x, 63
%D = and i64 %C, %B		%D = and i64 %C, %B
ret i64 %D		ret i64 %D
}		}

define i64 @icmps_slt_0_and_icmp_sge_neg1_i64_fail0(i64 %x) {		define i64 @icmps_slt_0_and_icmp_sge_neg1_i64_fail0(i64 %x) {
; CHECK-LABEL: @icmps_slt_0_and_icmp_sge_neg1_i64_fail0(		; CHECK-LABEL: @icmps_slt_0_and_icmp_sge_neg1_i64_fail0(
; CHECK-NEXT: [[A:%.*]] = icmp sgt i64 [[X:%.*]], -2		; CHECK-NEXT: [[D1:%.*]] = icmp eq i64 [[X:%.*]], -1
; CHECK-NEXT: [[B:%.*]] = zext i1 [[A]] to i64		; CHECK-NEXT: [[D:%.*]] = zext i1 [[D1]] to i64
; CHECK-NEXT: [[C:%.*]] = lshr i64 [[X]], 63
; CHECK-NEXT: [[D:%.*]] = and i64 [[C]], [[B]]
; CHECK-NEXT: ret i64 [[D]]		; CHECK-NEXT: ret i64 [[D]]
;		;
%A = icmp sge i64 %x, -1		%A = icmp sge i64 %x, -1
%B = zext i1 %A to i64		%B = zext i1 %A to i64
%C = lshr i64 %x, 63		%C = lshr i64 %x, 63
%D = and i64 %C, %B		%D = and i64 %C, %B
ret i64 %D		ret i64 %D
}		}

define i64 @icmps_slt_0_and_icmp_sge_neg1_i64_fail1(i64 %x) {		define i64 @icmps_slt_0_and_icmp_sge_neg1_i64_fail1(i64 %x) {
; CHECK-LABEL: @icmps_slt_0_and_icmp_sge_neg1_i64_fail1(		; CHECK-LABEL: @icmps_slt_0_and_icmp_sge_neg1_i64_fail1(
; CHECK-NEXT: [[A:%.*]] = icmp sgt i64 [[X:%.*]], -2		; CHECK-NEXT: [[D2:%.*]] = icmp eq i64 [[X:%.*]], -1
; CHECK-NEXT: [[B:%.*]] = zext i1 [[A]] to i64		; CHECK-NEXT: [[D:%.*]] = zext i1 [[D2]] to i64
; CHECK-NEXT: [[C1:%.*]] = lshr i64 [[X]], 63
; CHECK-NEXT: [[D:%.*]] = and i64 [[C1]], [[B]]
; CHECK-NEXT: ret i64 [[D]]		; CHECK-NEXT: ret i64 [[D]]
;		;
%A = icmp sge i64 %x, -1		%A = icmp sge i64 %x, -1
%B = zext i1 %A to i64		%B = zext i1 %A to i64
%C = ashr i64 %x, 63		%C = ashr i64 %x, 63
%D = and i64 %C, %B		%D = and i64 %C, %B
ret i64 %D		ret i64 %D
}		}
Show All 10 Lines	;
%B = zext i1 %A to i64		%B = zext i1 %A to i64
%C = lshr i64 %x, 62		%C = lshr i64 %x, 62
%D = and i64 %C, %B		%D = and i64 %C, %B
ret i64 %D		ret i64 %D
}		}

define i64 @icmps_slt_0_and_icmp_sge_neg1_i64_fail3(i64 %x) {		define i64 @icmps_slt_0_and_icmp_sge_neg1_i64_fail3(i64 %x) {
; CHECK-LABEL: @icmps_slt_0_and_icmp_sge_neg1_i64_fail3(		; CHECK-LABEL: @icmps_slt_0_and_icmp_sge_neg1_i64_fail3(
; CHECK-NEXT: [[A:%.*]] = icmp sgt i64 [[X:%.*]], -1		; CHECK-NEXT: ret i64 0
; CHECK-NEXT: [[B:%.*]] = zext i1 [[A]] to i64
; CHECK-NEXT: [[C:%.*]] = lshr i64 [[X]], 63
; CHECK-NEXT: [[D:%.*]] = and i64 [[C]], [[B]]
; CHECK-NEXT: ret i64 [[D]]
;		;
%A = icmp sgt i64 %x, -1		%A = icmp sgt i64 %x, -1
%B = zext i1 %A to i64		%B = zext i1 %A to i64
%C = lshr i64 %x, 63		%C = lshr i64 %x, 63
%D = and i64 %C, %B		%D = and i64 %C, %B
ret i64 %D		ret i64 %D
}		}

define <2 x i32> @icmps_slt_0_and_icmp_sge_neg1_i32x2(<2 x i32> %x) {		define <2 x i32> @icmps_slt_0_and_icmp_sge_neg1_i32x2(<2 x i32> %x) {
; CHECK-LABEL: @icmps_slt_0_and_icmp_sge_neg1_i32x2(		; CHECK-LABEL: @icmps_slt_0_and_icmp_sge_neg1_i32x2(
; CHECK-NEXT: [[A:%.*]] = icmp sgt <2 x i32> [[X:%.*]], <i32 -2, i32 -2>		; CHECK-NEXT: [[D1:%.*]] = icmp eq <2 x i32> [[X:%.*]], <i32 -1, i32 -1>
; CHECK-NEXT: [[B:%.*]] = zext <2 x i1> [[A]] to <2 x i32>		; CHECK-NEXT: [[D:%.*]] = zext <2 x i1> [[D1]] to <2 x i32>
; CHECK-NEXT: [[C:%.*]] = lshr <2 x i32> [[X]], <i32 31, i32 31>
; CHECK-NEXT: [[D:%.*]] = and <2 x i32> [[C]], [[B]]
; CHECK-NEXT: ret <2 x i32> [[D]]		; CHECK-NEXT: ret <2 x i32> [[D]]
;		;
%A = icmp sge <2 x i32> %x, <i32 -1, i32 -1>		%A = icmp sge <2 x i32> %x, <i32 -1, i32 -1>
%B = zext <2 x i1> %A to <2 x i32>		%B = zext <2 x i1> %A to <2 x i32>
%C = lshr <2 x i32> %x, <i32 31, i32 31>		%C = lshr <2 x i32> %x, <i32 31, i32 31>
%D = and <2 x i32> %C, %B		%D = and <2 x i32> %C, %B
ret <2 x i32> %D		ret <2 x i32> %D
}		}

define <2 x i32> @icmps_slt_0_and_icmp_sge_neg2_i32x2(<2 x i32> %x) {		define <2 x i32> @icmps_slt_0_and_icmp_sge_neg2_i32x2(<2 x i32> %x) {
; CHECK-LABEL: @icmps_slt_0_and_icmp_sge_neg2_i32x2(		; CHECK-LABEL: @icmps_slt_0_and_icmp_sge_neg2_i32x2(
; CHECK-NEXT: [[A:%.*]] = icmp sgt <2 x i32> [[X:%.*]], <i32 -3, i32 -3>		; CHECK-NEXT: [[D1:%.*]] = icmp ugt <2 x i32> [[X:%.*]], <i32 -3, i32 -3>
; CHECK-NEXT: [[B:%.*]] = zext <2 x i1> [[A]] to <2 x i32>		; CHECK-NEXT: [[D:%.*]] = zext <2 x i1> [[D1]] to <2 x i32>
; CHECK-NEXT: [[C:%.*]] = lshr <2 x i32> [[X]], <i32 31, i32 31>
; CHECK-NEXT: [[D:%.*]] = and <2 x i32> [[C]], [[B]]
; CHECK-NEXT: ret <2 x i32> [[D]]		; CHECK-NEXT: ret <2 x i32> [[D]]
;		;
%A = icmp sge <2 x i32> %x, <i32 -2, i32 -2>		%A = icmp sge <2 x i32> %x, <i32 -2, i32 -2>
%B = zext <2 x i1> %A to <2 x i32>		%B = zext <2 x i1> %A to <2 x i32>
%C = lshr <2 x i32> %x, <i32 31, i32 31>		%C = lshr <2 x i32> %x, <i32 31, i32 31>
%D = and <2 x i32> %C, %B		%D = and <2 x i32> %C, %B
ret <2 x i32> %D		ret <2 x i32> %D
}		}
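
The expected CHECK lines in the tests above can be sanity-checked by brute force at a narrower width. The following is a minimal sketch, not part of the patch: it models the i64 and i32 tests at 8 bits (so the shift amount 63 or 31 becomes 7), and the helper names `lshr7`, `ashr7`, `zext`, and `checkFolds` are hypothetical.

```cpp
#include <cstdint>

// Model of `lshr i8 %x, 7`: the sign bit moved down to bit 0.
static uint8_t lshr7(int8_t x) { return static_cast<uint8_t>(x) >> 7; }

// Model of `ashr i8 %x, 7`: all ones for negative x, zero otherwise.
static uint8_t ashr7(int8_t x) { return x < 0 ? 0xFF : 0x00; }

// Model of `zext i1 %c to i8`.
static uint8_t zext(bool c) { return c ? 1 : 0; }

// Returns true if each folded form matches the original for all 256 i8 values.
bool checkFolds() {
  for (int v = -128; v <= 127; ++v) {
    int8_t x = static_cast<int8_t>(v);
    uint8_t ux = static_cast<uint8_t>(x);

    // and(lshr x, 7), zext(x sge -1)  ->  zext(x eq -1)
    if ((lshr7(x) & zext(x >= -1)) != zext(x == -1)) return false;

    // and(ashr x, 7), zext(x sge -1)  ->  zext(x eq -1)
    if ((ashr7(x) & zext(x >= -1)) != zext(x == -1)) return false;

    // and(lshr x, 7), zext(x sgt -1)  ->  0
    if ((lshr7(x) & zext(x > -1)) != 0) return false;

    // and(lshr x, 7), zext(x sge -2)  ->  zext(x ugt -3), where -3 is 0xFD as i8
    if ((lshr7(x) & zext(x >= -2)) != zext(ux > 0xFD)) return false;
  }
  return true;
}
```

An exhaustive check at 8 bits does not prove the wider cases, but the folds depend only on the sign bit and on comparisons near -1, so it is a reasonable smoke test of the expected outputs.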
nikic: The main multi-use test I'm looking for is one where the resulting binop does not get folded. That's the case where your current code will increase instructions, I believe.
goldstein.w.n: Since you're now supporting xor, you need to add some tests for it.
nikic: Creating more instructions here. You should probably always require one-use on the lshr rather than one-use on one of the operands.
XChy (author): If `icmp eq 100` is replaced with another icmp that can be optimized together with `icmp slt 0`, the resulting IR is better, at the cost of just one extra instruction. Are there principles that determine whether a one-use guard is necessary, or whether a fold is too aggressive or too conservative? Maybe I can apply them to similar situations in future contributions.
nikic: For multi-use we want to avoid instruction increase in the worst case.
XChy (author): OK. Thanks for the suggestion.
nikic: Based on this we should have m_OneUse on the zext(icmp) as well.
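
The one-use reasoning in the comments above can be illustrated with a toy instruction count. This is a sketch under stated assumptions, not LLVM code: the function names are hypothetical, and it considers only the worst case nikic describes, where the resulting i1 binop is not simplified any further.

```cpp
// Original pattern: icmp, zext(icmp), lshr, bitwise op.
constexpr int instructionsBefore() { return 4; }

// After the fold, assuming the new i1 binop does not simplify:
// the original icmp, a new `icmp slt %x, 0`, the i1 binop, and one zext.
// An original lshr or zext that has other users stays alive and adds one each.
constexpr int instructionsAfterWorstCase(bool lshrHasOtherUses,
                                         bool zextHasOtherUses) {
  return 4 + (lshrHasOtherUses ? 1 : 0) + (zextHasOtherUses ? 1 : 0);
}
```

Under this model the count stays at four only when both the lshr and the zext(icmp) have a single use, which matches the request to put a one-use check on each of them.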
