This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Extracting common and-mask for shift operands of Or instruction
Needs Review, Public

Authored by opaparo on Oct 30 2017, 8:24 AM.

Details

Reviewers

craig.topper
spatel
zvi
m_zuckerman
lsaba
AndreiGrischenko

Summary

Adding an InstCombine transformation:
((V<<C3)&C1) | ((V<<C4)&C2) --> ((V&C5)<<C3) | ((V&C5)<<C4), if C5 = C1>>C3 == C2>>C4, for both logical shifts.
When executed, this transforms five instructions into four, saving one instruction.

These patterns will also be transformed:
((V&C5)<<C3) | ((V<<C4)&C2) --> ((V&C5)<<C3) | ((V&C5)<<C4)
((V<<C3)&C1) | ((V&C5)<<C4) --> ((V&C5)<<C3) | ((V&C5)<<C4)
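The rewrite above can be sanity-checked outside LLVM with plain 32-bit integers. The following is only a sketch for the shl-only case; the function names (`orOfMaskedShifts`, `orOfShiftedMask`, `shlFormsAgree`) are hypothetical and not part of the patch, and the check samples inputs rather than proving the identity.

```cpp
#include <cassert>
#include <cstdint>

// Original form: ((V << C3) & C1) | ((V << C4) & C2).
uint32_t orOfMaskedShifts(uint32_t V, unsigned C3, uint32_t C1, unsigned C4,
                          uint32_t C2) {
  return ((V << C3) & C1) | ((V << C4) & C2);
}

// Rewritten form: ((V & C5) << C3) | ((V & C5) << C4), sharing one 'and'.
uint32_t orOfShiftedMask(uint32_t V, uint32_t C5, unsigned C3, unsigned C4) {
  uint32_t Masked = V & C5;
  return (Masked << C3) | (Masked << C4);
}

// Returns true when the precondition C1>>C3 == C2>>C4 holds and both forms
// agree on every sampled input.
bool shlFormsAgree(unsigned C3, uint32_t C1, unsigned C4, uint32_t C2) {
  assert(C3 < 32 && C4 < 32 && "shift amount must be below the bit width");
  if ((C1 >> C3) != (C2 >> C4))
    return false; // Precondition fails: there is no common mask C5.
  uint32_t C5 = C1 >> C3;
  for (uint32_t Low = 0; Low <= 0xFFFF; ++Low) {
    uint32_t V = Low * 0x10001u; // Mirror the low 16 bits into the high half.
    if (orOfMaskedShifts(V, C3, C1, C4, C2) != orOfShiftedMask(V, C5, C3, C4))
      return false;
  }
  return true;
}
```

The constants of the or_and_shifts1 test (shift by 3 with mask 15, shift by 5 with mask 60) satisfy the precondition with C5 = 1.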

Diff Detail

Repository: rL LLVM

Event Timeline

opaparo created this revision. Oct 30 2017, 8:24 AM

Adding a pattern match for a shift of an 'and', ((V&C1)<<C2).
Although 'and' of a shift is the canonical form, this new form is also required in some cases. The new test multiuse3 demonstrates such a case.

Why do we canonicalize shift-left before 'and'?

Ie, shouldn't we prefer this:

define i8 @andshl(i8 %x) {
  %and = and i8 %x, 1
  %shl = shl i8 %and, 3
  ret i8 %shl
}

instead of this:

define i8 @andshl(i8 %x) {
  %and = shl i8 %x, 3
  %shl = and i8 %and, 8
  ret i8 %shl
}

...because doing the 'and' before the shift always uses a smaller constant?
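As a quick check of the two i8 functions in the question, the sketch below models them on `uint8_t` and compares them over all 256 inputs. The helper names are hypothetical; the only constants used are the ones from the IR above (mask 1 before the shift, mask 8 after it).

```cpp
#include <cstdint>

// 'and' first: the mask constant is 1.
uint8_t andThenShl(uint8_t X) {
  return static_cast<uint8_t>((X & 1u) << 3);
}

// 'shl' first: the mask constant has to be pre-shifted to 8.
uint8_t shlThenAnd(uint8_t X) {
  return static_cast<uint8_t>(static_cast<uint8_t>(X << 3) & 8u);
}

// Exhaustively compares both orderings over every i8 value.
bool formsMatchForAllI8() {
  for (unsigned X = 0; X <= 0xFF; ++X)
    if (andThenShl(static_cast<uint8_t>(X)) !=
        shlThenAnd(static_cast<uint8_t>(X)))
      return false;
  return true;
}
```

Both orderings compute the same value; what differs is only the size of the mask constant, which is the point of the question.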

opaparo added a reviewer: lsaba. Nov 20 2017, 6:38 AM

zvi added inline comments. Nov 20 2017, 11:00 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
95	This may be a nitpick, but I would find these helpers easier to follow if the argument and variable names matched the comments, e.g. Source -> V, ShiftBy -> C2, PreShiftMask -> C1 ...

lsaba added inline comments. Nov 21 2017, 3:26 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
88	both logical shifts (shl / lshr)
158	Check that it's the same logical shift
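Since the transformation is meant to cover both logical shifts, the relation between the post-shift and pre-shift masks can be sketched for shl and lshr alike. The sketch below assumes 32-bit values; the `LogicalShift` enum and all function names are hypothetical. It mirrors the idea in the patch of deriving the pre-shift mask by applying the inverse shift to the post-shift mask.

```cpp
#include <cassert>
#include <cstdint>

enum class LogicalShift { Shl, LShr };

// Applies a logical shift of the given kind.
uint32_t applyShift(LogicalShift Op, uint32_t V, unsigned Amt) {
  assert(Amt < 32 && "shift amount must be below the bit width");
  return Op == LogicalShift::Shl ? V << Amt : V >> Amt;
}

// Pre-shift mask: undo the shift on the post-shift mask with the inverse shift.
uint32_t preShiftMask(LogicalShift Op, uint32_t PostShiftMask, unsigned Amt) {
  LogicalShift Inverse =
      Op == LogicalShift::Shl ? LogicalShift::LShr : LogicalShift::Shl;
  return applyShift(Inverse, PostShiftMask, Amt);
}

// Checks (shift(V) & PostShiftMask) == shift(V & PreShiftMask) on sampled V.
bool andOfShiftEqualsShiftOfAnd(LogicalShift Op, uint32_t PostShiftMask,
                                unsigned Amt) {
  uint32_t Pre = preShiftMask(Op, PostShiftMask, Amt);
  for (uint32_t Low = 0; Low <= 0xFFFF; ++Low) {
    uint32_t V = Low * 0x10001u; // Mirror the low 16 bits into the high half.
    if ((applyShift(Op, V, Amt) & PostShiftMask) !=
        applyShift(Op, V & Pre, Amt))
      return false;
  }
  return true;
}
```

With the constants of the or_and_shifts2 test, the shl arm (shift 3, mask 896) and the lshr arm (shift 4, mask 7) both yield the pre-shift mask 112, which is why a single 'and' can feed both shifts there.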

LGTM after fixing the minor comments

This revision is now accepted and ready to land. Nov 21 2017, 3:26 AM

Please answer my question. It seems like that canonicalization might simplify or obviate the need for this patch.

This revision now requires changes to proceed. Nov 24 2017, 7:12 AM

davide removed a subscriber: davide. Nov 25 2017, 12:32 AM

In D39421#923308, @spatel wrote:
Why do we canonicalize shift-left before 'and'?

Ie, shouldn't we prefer this:
define i8 @andshl(i8 %x) {
  %and = and i8 %x, 1
  %shl = shl i8 %and, 3
  ret i8 %shl
}
instead of this:
define i8 @andshl(i8 %x) {
  %and = shl i8 %x, 3
  %shl = and i8 %and, 8
  ret i8 %shl
}
...because doing the 'and' before the shift always uses a smaller constant?

Hi,
Sorry for the delayed response.

I'm not sure I understand why the suggested canonicalization might simplify or obviate the need for this patch.
Consider my test case "multiuse3". Although InstCombine normally recognizes and canonicalizes something of the form

%1 = and i32 %x, 96
%2 = shl nuw nsw i32 %1, 6

It will not do so in this case, because '%1' has more than one use and the canonicalization requires a single use. If my transformation considered only the canonical form and not the non-canonical one, it could not handle this case.
One may argue that changing the canonical form will yield this code:

%1 = and i32 %x, 96
%2 = shl nuw nsw i32 %1, 6
%3 = lshr exact i32 %1, 1
%4 = and i32 %x, 30
%5 = shl nuw nsw i32 %4, 6
%6 = or i32 %2, %5
%7 = lshr exact i32 %4, 1
%8 = or i32 %3, %7
%9 = or i32 %8, %6
ret i32 %9

That would simplify the patch, since I would no longer need to consider the non-canonical form. However:

1. This holds only if both shifts are shl. In this example one of them is an lshr, so the transformation will not actually happen and the non-canonical form still needs to be considered.
2. This canonicalization will only occur if the intermediate results, i.e. "%4 = shl i32 %x, 6" and "%7 = lshr i32 %x, 1", have only one use. If those values were also used somewhere else, the canonicalization would not happen and again I would have to consider the non-canonical form.

As far as I understand, this is the way that the suggested canonicalization might simplify or obviate the need for this patch. If you meant something else, could you please elaborate?
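To make the multiuse3 case concrete, the sketch below transcribes the test's input IR and the form its updated CHECK lines expect into plain 32-bit C++ and compares them on sampled inputs. The function names are hypothetical, and this checks equivalence only; it says nothing about which pass should perform the rewrite.

```cpp
#include <cstdint>

// The multiuse3 input, written out instruction by instruction.
uint32_t multiuse3Original(uint32_t X) {
  uint32_t T1 = X & 96;
  uint32_t T2 = T1 << 6;
  uint32_t T3 = T1 >> 1;
  uint32_t T5 = (X << 6) & 1920; // 'and' of shl; 1920 == 30 << 6
  uint32_t T6 = T2 | T5;
  uint32_t T8 = (X >> 1) & 15;   // 'and' of lshr; 15 == 30 >> 1
  uint32_t T9 = T3 | T8;
  return T9 | T6;
}

// The form in the updated CHECK lines: one 'and' with 126 feeds both shifts.
uint32_t multiuse3Combined(uint32_t X) {
  uint32_t Masked = X & 126; // 126 == 96 | 30
  return (Masked >> 1) | (Masked << 6);
}

// Compares both forms on sampled inputs.
bool multiuse3FormsAgree() {
  for (uint32_t Low = 0; Low <= 0xFFFF; ++Low) {
    uint32_t V = Low * 0x10001u; // Mirror the low 16 bits into the high half.
    if (multiuse3Original(V) != multiuse3Combined(V))
      return false;
  }
  return true;
}
```

The masks 96 and 30 are disjoint pieces of 126, and each appears once before a shl by 6 and once before an lshr by 1, so the four masked shifts collapse into two shifts of a single masked value.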

In D39421#935341, @opaparo wrote:

I'm not sure I understand why the suggested canonization might simplify or obviate the need for this patch.
Consider my use case "multiuse3". Although InstCombine normally recognizes and canonize something of the form

I think inverting the canonicalization of shl+and would make your first test case optimize without this patch, so that's actually where I paused in reviewing the patch. Have you investigated that possibility? Currently, we end up inverting the canonicalization in the x86 backend (because a smaller constant mask can be created in fewer instruction bytes), so it would be better to "get it right" here in IR in the first place.

I understand the multi-use case better now with your explanation, so I agree that we want this patch to handle those cases too. But I don't think we should ignore the underlying canonicalization choices just because we know we want to catch the larger patterns.

spatel mentioned this in rL319182: [InstCombine] add tests from D39421 to show current transforms; NFC. Nov 28 2017, 8:41 AM

In D39421#937705, @spatel wrote:

I think inverting the canonicalization of shl+and would make your first test case optimize without this patch

Could you please explain why? I'm not sure I'm seeing it.

Currently, we end up inverting the canonicalization in the x86 backend (because a smaller constant mask can be created in less instruction bytes), so it would be better to "get it right" here in IR in the first place.
I understand the multi-use case better now with your explanation, so I agree that we want this patch to handle those cases too. But I don't think we should ignore the underlying canonicalization choices just because we know we want to catch the larger patterns.

I agree that this alternative canonicalization could prove to be beneficial and more correct. However, I feel that this discussion is orthogonal to this patch. If it is indeed decided to switch to the new form, then some of the code of this patch, along with several other pieces of code, may need to change accordingly.

In D39421#937773, @opaparo wrote:

In D39421#937705, @spatel wrote:

I think inverting the canonicalization of shl+and would make your first test case optimize without this patch

Could you please explain why? I'm not sure I'm seeing it.

If we invert the shl+and transform, we don't need this patch to reach optimal code for 3 out of the 6 tests:

define i32 @or_and_shifts1(i32 %x) {
  %1 = and i32 %x, 1
  %2 = shl nuw nsw i32 %1, 3
  %3 = and i32 %x, 1   ; <-- CSE will eliminate this
  %4 = shl nuw nsw i32 %3, 5
  %5 = or i32 %2, %4
  ret i32 %5
}

Similarly (what does this test check that is different from the above?):

define i32 @or_and_shift_shift_and(i32 %x) {
  %1 = and i32 %x, 7
  %2 = shl nuw nsw i32 %1, 3
  %3 = and i32 %x, 7  ; <-- CSE will eliminate this
  %4 = shl nuw nsw i32 %3, 2
  %5 = or i32 %2, %4
  ret i32 %5
}

And again:

define i32 @multiuse2(i32 %x) {
  %1 = and i32 %x, 126
  %2 = shl nuw nsw i32 %1, 8
  %3 = and i32 %x, 126 ; <-- CSE will eliminate this
  %4 = shl nuw nsw i32 %3, 1
  %5 = or i32 %2, %4
  ret i32 %5
}

Currently, we end up inverting the canonicalization in the x86 backend (because a smaller constant mask can be created in less instruction bytes), so it would be better to "get it right" here in IR in the first place.
I understand the multi-use case better now with your explanation, so I agree that we want this patch to handle those cases too. But I don't think we should ignore the underlying canonicalization choices just because we know we want to catch the larger patterns.

I agree that this alternative canonization could prove to be beneficial and more correct. However, I feel that this discussion is orthogonal to this patch, and if it would indeed be decided to switch to the new form then some of the code of this patch, along with several other pieces of code, may need to change accordingly.

Since we can eliminate the need for this patch in half of the tests (note: I checked in the tests at rL319182, so we can see what they look like currently), I don't think the underlying transform is orthogonal. If this patch would change with the inverted canonicalization, then that's more reason to view the inversion as a preliminary step for this patch. Otherwise, we're adding code unnecessarily. It's possible that inverting shl+and inhibits other folds, and if that's the case, then why not fix that too?

Here's the draft patch I used to check the tests above:

Index: lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
===================================================================
--- lib/Transforms/InstCombine/InstCombineAndOrXor.cpp	(revision 319170)
+++ lib/Transforms/InstCombine/InstCombineAndOrXor.cpp	(working copy)
@@ -1212,6 +1212,13 @@
       return BinaryOperator::CreateOr(And, ConstantInt::get(I.getType(),
                                                             Together));
     }
+    const APInt *ShlC;
+    if (match(Op0, m_OneUse(m_Shl(m_Value(X), m_APInt(ShlC))))) {
+      Constant *NewMask = ConstantInt::get(I.getType(), C->lshr(*ShlC));
+      Value *NewAnd = Builder.CreateAnd(X, NewMask);
+      return BinaryOperator::CreateShl(NewAnd, ConstantInt::get(I.getType(),
+                                                                *ShlC));
+    }
 
     // If the mask is only needed on one incoming arm, push the 'and' op up.
     if (match(Op0, m_OneUse(m_Xor(m_Value(X), m_Value(Y)))) ||
Index: lib/Transforms/InstCombine/InstCombineShifts.cpp
===================================================================
--- lib/Transforms/InstCombine/InstCombineShifts.cpp	(revision 319170)
+++ lib/Transforms/InstCombine/InstCombineShifts.cpp	(working copy)
@@ -505,7 +505,7 @@
       // If the operand is a bitwise operator with a constant RHS, and the
       // shift is the only use, we can pull it out of the shift.
       const APInt *Op0C;
-      if (match(Op0BO->getOperand(1), m_APInt(Op0C))) {
+      if (match(Op0BO->getOperand(1), m_APInt(Op0C)) && !isLeftShift) {
         if (canShiftBinOpWithConstantRHS(I, Op0BO, *Op0C)) {
           Constant *NewRHS = ConstantExpr::get(I.getOpcode(),
                                      cast<Constant>(Op0BO->getOperand(1)), Op1);

spatel added inline comments. Nov 29 2017, 6:08 AM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
146	Why are we looking for a pattern that early-cse can simplify? I think this is beyond the scope of instcombine.

opaparo added a reviewer: AndreiGrischenko. Dec 5 2017, 3:46 AM

Both this and D38037 are trying to start a pattern match with an 'or', but I'm curious if there's a 'trunc' in the larger source that creates these patterns? Either way, we're missing something bigger than patterns that start with 'or'.

For example, I was looking at PR31667:
https://bugs.llvm.org/show_bug.cgi?id=31667

Name: sub_mask_shift

%and1 = lshr i32 %x, 3
%shr1 = and i32 %and1, 8191
%and2 = lshr i32 %x, 1
%shr2 = and i32 %and2, 32767
%r = sub i32 %shr1, %shr2

...which was filed as a backend bug, but we wouldn't handle that in IR either:
https://rise4fun.com/Alive/id4

So I think there's some more general sequence that we want to capture and optimize, but it may be difficult to justify as part of instcombine?

Note that there is a proposal for a new pass where all of these might find a home:
D38313

opaparo set the repository for this revision to rL LLVM. Dec 14 2017, 5:49 AM

opaparo added a parent revision: D41233: [InstCombine] Canonizing 'and' before 'shl'.

Rebasing on parent revision and adding more test cases.

opaparo mentioned this in D38037: [InstCombine] Compacting or instructions whose operands are shift instructions. Dec 14 2017, 6:06 AM

In D39421#945064, @spatel wrote:
Both this and D38037 are trying to start a pattern match with an 'or', but I'm curious if there's a 'trunc' in the larger source that creates these patterns? Either way, we're missing something bigger than patterns that start with 'or'.

For example, I was looking at PR31667:
https://bugs.llvm.org/show_bug.cgi?id=31667

Name: sub_mask_shift
%and1 = lshr i32 %x, 3
%shr1 = and i32 %and1, 8191
%and2 = lshr i32 %x, 1
%shr2 = and i32 %and2, 32767
%r = sub i32 %shr1, %shr2
...which was filed as a backend bug, but we wouldn't handle that in IR either:
https://rise4fun.com/Alive/id4

So I think there's some more general sequence that we want to capture and optimize, but it may be difficult to justify as part of instcombine?

Note that there is a proposal for a new pass where all of these might find a home:
D38313

D38037 (now abandoned) was the first draft of this review.
There is no 'trunc' in the larger source that creates these patterns. My patch may address these issues you mentioned, but it is not directly related to them.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
146	Please check out the newly added tests multiuse4 and multiuse5. I believe they illustrate that this is indeed an instcombine transformation.

Rebasing on top of the new parent review D41354

opaparo edited parent revisions, added: D41354: [InstCombine] Extending InstructionSimplify; removed: D41233: [InstCombine] Canonizing 'and' before 'shl'. Dec 18 2017, 8:01 AM

opaparo added a child revision: D41574: [Transforms] Adding a WeakReassociate pass. Dec 25 2017, 5:27 AM

opaparo mentioned this in D41574: [Transforms] Adding a WeakReassociate pass. Dec 27 2017, 7:23 AM

Ping

opaparo removed a child revision: D41574: [Transforms] Adding a WeakReassociate pass. May 22 2018, 9:43 AM

Revision Contents

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp (86 lines)
test/Transforms/InstCombine/or-shifted-masks.ll (94 lines)

Diff 126938

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp

/// \param I Binary operator to transform.		/// \param I Binary operator to transform.
/// \return Pointer to node that must replace the original binary operator, or		/// \return Pointer to node that must replace the original binary operator, or
/// null pointer if no transformation was made.		/// null pointer if no transformation was made.
static Value *SimplifyBSwap(BinaryOperator &I,		static Value *SimplifyBSwap(BinaryOperator &I,
InstCombiner::BuilderTy &Builder) {		InstCombiner::BuilderTy &Builder) {
assert(I.isBitwiseLogicOp() && "Unexpected opcode for bswap simplifying");		assert(I.isBitwiseLogicOp() && "Unexpected opcode for bswap simplifying");

Value *OldLHS = I.getOperand(0);		Value *OldLHS = I.getOperand(0);
Value *OldRHS = I.getOperand(1);		Value *OldRHS = I.getOperand(1);

Value *NewLHS;		Value *NewLHS;
if (!match(OldLHS, m_BSwap(m_Value(NewLHS))))		if (!match(OldLHS, m_BSwap(m_Value(NewLHS))))
return nullptr;		return nullptr;

Value *NewRHS;		Value *NewRHS;
const APInt *C;		const APInt *C;

if (match(OldRHS, m_BSwap(m_Value(NewRHS)))) {		if (match(OldRHS, m_BSwap(m_Value(NewRHS)))) {
// OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )		// OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
if (!OldLHS->hasOneUse() && !OldRHS->hasOneUse())		if (!OldLHS->hasOneUse() && !OldRHS->hasOneUse())
return nullptr;		return nullptr;
// NewRHS initialized by the matcher.		// NewRHS initialized by the matcher.
} else if (match(OldRHS, m_APInt(C))) {		} else if (match(OldRHS, m_APInt(C))) {
// OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )		// OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )
Show All 34 Lines	if (Op->hasOneUse()) {
const APInt& AddRHS = OpRHS->getValue();		const APInt& AddRHS = OpRHS->getValue();

// Check to see if any bits below the one bit set in AndRHSV are set.		// Check to see if any bits below the one bit set in AndRHSV are set.
if ((AddRHS & (AndRHSV - 1)).isNullValue()) {		if ((AddRHS & (AndRHSV - 1)).isNullValue()) {
// If not, the only thing that can effect the output of the AND is		// If not, the only thing that can effect the output of the AND is
// the bit specified by AndRHSV. If that bit is set, the effect of		// the bit specified by AndRHSV. If that bit is set, the effect of
// the XOR is to toggle the bit. If it is clear, then the ADD has		// the XOR is to toggle the bit. If it is clear, then the ADD has
// no effect.		// no effect.
if ((AddRHS & AndRHSV).isNullValue()) { // Bit is not set, noop		if ((AddRHS & AndRHSV).isNullValue()) { // Bit is not set, noop
TheAnd.setOperand(0, X);		TheAnd.setOperand(0, X);
return &TheAnd;		return &TheAnd;
} else {		} else {
// Pull the XOR out of the AND.		// Pull the XOR out of the AND.
Value *NewAnd = Builder.CreateAnd(X, AndRHS);		Value *NewAnd = Builder.CreateAnd(X, AndRHS);
NewAnd->takeName(Op);		NewAnd->takeName(Op);
return BinaryOperator::CreateXor(NewAnd, AndRHS);		return BinaryOperator::CreateXor(NewAnd, AndRHS);
}		}
}		}
}		}
}		}
break;		break;
}		}
return nullptr;		return nullptr;
}		}

/// Emit a computation of: (V >= Lo && V < Hi) if Inside is true, otherwise		/// Emit a computation of: (V >= Lo && V < Hi) if Inside is true, otherwise
/// (V < Lo || V >= Hi). This method expects that Lo <= Hi. IsSigned indicates		/// (V < Lo || V >= Hi). This method expects that Lo <= Hi. IsSigned indicates
/// whether to treat V, Lo, and Hi as signed or not.		/// whether to treat V, Lo, and Hi as signed or not.
Value *InstCombiner::insertRangeTest(Value *V, const APInt &Lo, const APInt &Hi,		Value *InstCombiner::insertRangeTest(Value *V, const APInt &Lo, const APInt &Hi,
▲ Show 20 Lines • Show All 1,863 Lines • ▼ Show 20 Lines	// that this transformation will allow the new ORs to be optimized.
if (Op0->hasOneUse() && Op1->hasOneUse() &&		if (Op0->hasOneUse() && Op1->hasOneUse() &&
match(Op0, m_Select(m_Value(X), m_Value(A), m_Value(B))) &&		match(Op0, m_Select(m_Value(X), m_Value(A), m_Value(B))) &&
match(Op1, m_Select(m_Value(Y), m_Value(C), m_Value(D))) && X == Y) {		match(Op1, m_Select(m_Value(Y), m_Value(C), m_Value(D))) && X == Y) {
Value *orTrue = Builder.CreateOr(A, C);		Value *orTrue = Builder.CreateOr(A, C);
Value *orFalse = Builder.CreateOr(B, D);		Value *orFalse = Builder.CreateOr(B, D);
return SelectInst::Create(X, orTrue, orFalse);		return SelectInst::Create(X, orTrue, orFalse);
}		}
}		}

		  // ((V<<C3)&C1) | ((V<<C4)&C2) --> ((V&C5)<<C3) | ((V&C5)<<C4)
		  // if C5 = C1>>C3 == C2>>C4, for both logical shifts
		  //
		  // These patterns will also be transformed:
		  // ((V&C5)<<C3) | ((V<<C4)&C2) --> ((V&C5)<<C3) | ((V&C5)<<C4)
		  // ((V<<C3)&C1) | ((V&C5)<<C4) --> ((V&C5)<<C3) | ((V&C5)<<C4)
		  {
		    // match (V&C1)<<C2
		    auto MatchShiftOfAnd = [](Value *V, Value *&Source,
		                              Instruction::BinaryOps &ShiftOpcode,
		                              ConstantInt *&ShiftBy, ConstantInt *&PreShiftMask,
		                              ConstantInt *&PostShiftMask,
		                              BinaryOperator *&IntermediateInstr) -> bool {
		      BinaryOperator *Sh;
		      if (!match(V, m_BinOp(Sh)) ||
		          !match(Sh, m_LogicalShift(m_BinOp(IntermediateInstr),
		                                    m_ConstantInt(ShiftBy))) ||
		          !match(IntermediateInstr,
		                 m_And(m_Value(Source), m_ConstantInt(PreShiftMask))))
		        return false;
		      ShiftOpcode = Sh->getOpcode();
		      PostShiftMask = cast<ConstantInt>(
		          ConstantExpr::get(Sh->getOpcode(), PreShiftMask, ShiftBy));
		      return true;
		    };
		    // match (V<<C1)&C2
		    auto MatchAndOfShift = [](Value *V, Value *&Source,
		                              Instruction::BinaryOps &ShiftOpcode,
		                              ConstantInt *&ShiftBy, ConstantInt *&PreShiftMask,
		                              ConstantInt *&PostShiftMask,
		                              BinaryOperator *&IntermediateInstr) -> bool {
		      if (!match(V, m_And(m_BinOp(IntermediateInstr),
		                          m_ConstantInt(PostShiftMask))) ||
		          !match(IntermediateInstr,
		                 m_LogicalShift(m_Value(Source), m_ConstantInt(ShiftBy))))
		        return false;
		      ShiftOpcode = IntermediateInstr->getOpcode();
		      Instruction::BinaryOps InverseOpcode =
		          IntermediateInstr->getOpcode() == Instruction::Shl ? Instruction::LShr
		                                                             : Instruction::Shl;
		      PreShiftMask = cast<ConstantInt>(
		          ConstantExpr::get(InverseOpcode, PostShiftMask, ShiftBy));
		      return true;
		    };
		    Value *Source0, *Source1;
		    Instruction::BinaryOps ShiftOpcode0, ShiftOpcode1;
		    ConstantInt *ShiftBy0, *ShiftBy1, *PreShiftMask0, *PreShiftMask1,
		        *PostShiftMask0, *PostShiftMask1;
		    BinaryOperator *IntermediateInstr0, *IntermediateInstr1;
		    if ((MatchShiftOfAnd(Op0, Source0, ShiftOpcode0, ShiftBy0, PreShiftMask0,
		                         PostShiftMask0, IntermediateInstr0) ||
		         MatchAndOfShift(Op0, Source0, ShiftOpcode0, ShiftBy0, PreShiftMask0,
		                         PostShiftMask0, IntermediateInstr0)) &&
		        (MatchShiftOfAnd(Op1, Source1, ShiftOpcode1, ShiftBy1, PreShiftMask1,
		                         PostShiftMask1, IntermediateInstr1) ||
		         MatchAndOfShift(Op1, Source1, ShiftOpcode1, ShiftBy1, PreShiftMask1,
		                         PostShiftMask1, IntermediateInstr1)) &&
		        Source0 == Source1) {
		      if (ShiftBy0 == ShiftBy1 && ShiftOpcode0 == ShiftOpcode1) {
		        // ((V<<C3)&C1) | ((V<<C3)&C2) --> (V<<C3)&(C1|C2)
		        unsigned SavedInstructions = Op0->hasOneUse() + Op1->hasOneUse() +
		                                     IntermediateInstr0->hasOneUse() +
		                                     IntermediateInstr1->hasOneUse();
		        if (SavedInstructions >= 2) {
		          Value *Sh = Builder.CreateBinOp(ShiftOpcode0, Source0, ShiftBy0);
		          Constant *Mask = ConstantExpr::get(Instruction::Or, PostShiftMask0,
		                                             PostShiftMask1);
		          return BinaryOperator::CreateAnd(Sh, Mask);
		        }
		      }
		      Value *CommonOperand;
		      if (PreShiftMask0 == PreShiftMask1 &&
		          match(Op0, m_BinOp(m_Value(CommonOperand), m_Value())) &&
		          !match(Op1, m_BinOp(m_Specific(CommonOperand), m_Value())) &&
		          (IntermediateInstr0->hasOneUse() ||
		           IntermediateInstr1->hasOneUse())) {
		        Value *MaskedSource = Builder.CreateAnd(Source0, PreShiftMask0);
		        Value *NewOp0 =
		            Builder.CreateBinOp(ShiftOpcode0, MaskedSource, ShiftBy0);
		        Value *NewOp1 =
		            Builder.CreateBinOp(ShiftOpcode1, MaskedSource, ShiftBy1);
		        return BinaryOperator::CreateOr(NewOp0, NewOp1);
		      }
		    }
		  }

return Changed ? &I : nullptr;		return Changed ? &I : nullptr;
}		}

/// A ^ B can be specified using other logic ops in a variety of patterns. We		/// A ^ B can be specified using other logic ops in a variety of patterns. We
/// can fold these early and efficiently by morphing an existing instruction.		/// can fold these early and efficiently by morphing an existing instruction.
static Instruction *foldXorToXor(BinaryOperator &I,		static Instruction *foldXorToXor(BinaryOperator &I,
InstCombiner::BuilderTy &Builder) {		InstCombiner::BuilderTy &Builder) {
assert(I.getOpcode() == Instruction::Xor);		assert(I.getOpcode() == Instruction::Xor);

test/Transforms/InstCombine/or-shifted-masks.ll

	; RUN: opt -S -instcombine < %s | FileCheck %s			; RUN: opt -S -instcombine < %s | FileCheck %s

	define i32 @or_and_shifts1(i32 %x) {			define i32 @or_and_shifts1(i32 %x) {
	; CHECK-LABEL: @or_and_shifts1(			; CHECK-LABEL: @or_and_shifts1(
	; CHECK-NEXT: [[TMP1:%.*]] = and i32 %x, 1			; CHECK-NEXT: [[TMP1:%.*]] = and i32 %x, 1
	; CHECK-NEXT: [[TMP2:%.*]] = shl nuw nsw i32 [[TMP1]], 3			; CHECK-NEXT: [[TMP2:%.*]] = shl nuw nsw i32 [[TMP1]], 3
	; CHECK-NEXT: [[TMP3:%.*]] = and i32 %x, 1			; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i32 [[TMP1]], 5
	; CHECK-NEXT: [[TMP4:%.*]] = shl nuw nsw i32 [[TMP3]], 5			; CHECK-NEXT: [[TMP4:%.*]] = or i32 [[TMP2]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = or i32 [[TMP2]], [[TMP4]]			; CHECK-NEXT: ret i32 [[TMP4]]
	; CHECK-NEXT: ret i32 [[TMP5]]
	;			;
	%1 = shl i32 %x, 3			%1 = shl i32 %x, 3
	%2 = and i32 %1, 15			%2 = and i32 %1, 15
	%3 = shl i32 %x, 5			%3 = shl i32 %x, 5
	%4 = and i32 %3, 60			%4 = and i32 %3, 60
	%5 = or i32 %2, %4			%5 = or i32 %2, %4
	ret i32 %5			ret i32 %5
	}			}

	define i32 @or_and_shifts2(i32 %x) {			define i32 @or_and_shifts2(i32 %x) {
	; CHECK-LABEL: @or_and_shifts2(			; CHECK-LABEL: @or_and_shifts2(
	; CHECK-NEXT: [[TMP1:%.*]] = and i32 %x, 112			; CHECK-NEXT: [[TMP1:%.*]] = and i32 %x, 112
	; CHECK-NEXT: [[TMP2:%.*]] = shl nuw nsw i32 [[TMP1]], 3			; CHECK-NEXT: [[TMP2:%.*]] = shl nuw nsw i32 [[TMP1]], 3
	; CHECK-NEXT: [[TMP3:%.*]] = lshr i32 %x, 4			; CHECK-NEXT: [[TMP3:%.*]] = lshr exact i32 [[TMP1]], 4
	; CHECK-NEXT: [[TMP4:%.*]] = and i32 [[TMP3]], 7			; CHECK-NEXT: [[TMP4:%.*]] = or i32 [[TMP2]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = or i32 [[TMP2]], [[TMP4]]			; CHECK-NEXT: ret i32 [[TMP4]]
	; CHECK-NEXT: ret i32 [[TMP5]]
	;			;
	%1 = shl i32 %x, 3			%1 = shl i32 %x, 3
	%2 = and i32 %1, 896			%2 = and i32 %1, 896
	%3 = lshr i32 %x, 4			%3 = lshr i32 %x, 4
	%4 = and i32 %3, 7			%4 = and i32 %3, 7
	%5 = or i32 %2, %4			%5 = or i32 %2, %4
	ret i32 %5			ret i32 %5
	}			}

	define i32 @or_and_shift_shift_and(i32 %x) {			define i32 @or_and_shift_shift_and(i32 %x) {
	; CHECK-LABEL: @or_and_shift_shift_and(			; CHECK-LABEL: @or_and_shift_shift_and(
	; CHECK-NEXT: [[TMP1:%.*]] = and i32 %x, 7			; CHECK-NEXT: [[TMP1:%.*]] = and i32 %x, 7
	; CHECK-NEXT: [[TMP2:%.*]] = shl nuw nsw i32 [[TMP1]], 3			; CHECK-NEXT: [[TMP2:%.*]] = shl nuw nsw i32 [[TMP1]], 3
	; CHECK-NEXT: [[TMP3:%.*]] = and i32 %x, 7			; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i32 [[TMP1]], 2
	; CHECK-NEXT: [[TMP4:%.*]] = shl nuw nsw i32 [[TMP3]], 2			; CHECK-NEXT: [[TMP4:%.*]] = or i32 [[TMP2]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = or i32 [[TMP2]], [[TMP4]]			; CHECK-NEXT: ret i32 [[TMP4]]
	; CHECK-NEXT: ret i32 [[TMP5]]
	;			;
	%1 = and i32 %x, 7			%1 = and i32 %x, 7
	%2 = shl i32 %1, 3			%2 = shl i32 %1, 3
	%3 = shl i32 %x, 2			%3 = shl i32 %x, 2
	%4 = and i32 %3, 28			%4 = and i32 %3, 28
	%5 = or i32 %2, %4			%5 = or i32 %2, %4
	ret i32 %5			ret i32 %5
	}			}

	define i32 @multiuse1(i32 %x) {			define i32 @multiuse1(i32 %x) {
	; CHECK-LABEL: @multiuse1(			; CHECK-LABEL: @multiuse1(
	; CHECK-NEXT: [[TMP1:%.*]] = and i32 %x, 6			; CHECK-NEXT: [[TMP1:%.*]] = and i32 %x, 6
	; CHECK-NEXT: [[TMP2:%.*]] = shl nuw nsw i32 [[TMP1]], 6			; CHECK-NEXT: [[TMP2:%.*]] = lshr exact i32 [[TMP1]], 1
	; CHECK-NEXT: [[TMP3:%.*]] = lshr i32 %x, 1			; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i32 [[TMP1]], 6
	; CHECK-NEXT: [[TMP4:%.*]] = and i32 [[TMP3]], 3			; CHECK-NEXT: [[TMP4:%.*]] = or i32 [[TMP2]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = or i32 [[TMP4]], [[TMP2]]			; CHECK-NEXT: ret i32 [[TMP4]]
	; CHECK-NEXT: ret i32 [[TMP5]]
	;			;
	%1 = and i32 %x, 2			%1 = and i32 %x, 2
	%2 = and i32 %x, 4			%2 = and i32 %x, 4
	%3 = shl nuw nsw i32 %1, 6			%3 = shl nuw nsw i32 %1, 6
	%4 = lshr exact i32 %1, 1			%4 = lshr exact i32 %1, 1
	%5 = shl nuw nsw i32 %2, 6			%5 = shl nuw nsw i32 %2, 6
	%6 = lshr exact i32 %2, 1			%6 = lshr exact i32 %2, 1
	%7 = or i32 %3, %5			%7 = or i32 %3, %5
	%8 = or i32 %4, %6			%8 = or i32 %4, %6
	%9 = or i32 %8, %7			%9 = or i32 %8, %7
	ret i32 %9			ret i32 %9
	}			}

	define i32 @multiuse2(i32 %x) {			define i32 @multiuse2(i32 %x) {
	; CHECK-LABEL: @multiuse2(			; CHECK-LABEL: @multiuse2(
	; CHECK-NEXT: [[TMP1:%.*]] = and i32 %x, 126			; CHECK-NEXT: [[TMP1:%.*]] = and i32 %x, 126
	; CHECK-NEXT: [[TMP2:%.*]] = shl nuw nsw i32 [[TMP1]], 8			; CHECK-NEXT: [[TMP2:%.*]] = shl nuw nsw i32 [[TMP1]], 8
	; CHECK-NEXT: [[TMP3:%.*]] = and i32 %x, 126			; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i32 [[TMP1]], 1
	; CHECK-NEXT: [[TMP4:%.*]] = shl nuw nsw i32 [[TMP3]], 1			; CHECK-NEXT: [[TMP4:%.*]] = or i32 [[TMP2]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = or i32 [[TMP2]], [[TMP4]]			; CHECK-NEXT: ret i32 [[TMP4]]
	; CHECK-NEXT: ret i32 [[TMP5]]
	;			;
	%1 = and i32 %x, 6			%1 = and i32 %x, 6
	%2 = shl nuw nsw i32 %1, 8			%2 = shl nuw nsw i32 %1, 8
	%3 = shl nuw nsw i32 %1, 1			%3 = shl nuw nsw i32 %1, 1
	%4 = and i32 %x, 24			%4 = and i32 %x, 24
	%5 = shl nuw nsw i32 %4, 8			%5 = shl nuw nsw i32 %4, 8
	%6 = shl nuw nsw i32 %4, 1			%6 = shl nuw nsw i32 %4, 1
	%7 = and i32 %x, 96			%7 = and i32 %x, 96
	%8 = shl nuw nsw i32 %7, 8			%8 = shl nuw nsw i32 %7, 8
	%9 = shl nuw nsw i32 %7, 1			%9 = shl nuw nsw i32 %7, 1
	%10 = or i32 %2, %5			%10 = or i32 %2, %5
	%11 = or i32 %8, %10			%11 = or i32 %8, %10
	%12 = or i32 %9, %6			%12 = or i32 %9, %6
	%13 = or i32 %3, %12			%13 = or i32 %3, %12
	%14 = or i32 %11, %13			%14 = or i32 %11, %13
	ret i32 %14			ret i32 %14
	}			}

	define i32 @multiuse3(i32 %x) {			define i32 @multiuse3(i32 %x) {
	; CHECK-LABEL: @multiuse3(			; CHECK-LABEL: @multiuse3(
	; CHECK-NEXT: [[TMP1:%.*]] = lshr i32 %x, 1			; CHECK-NEXT: [[TMP1:%.*]] = and i32 %x, 126
	; CHECK-NEXT: [[TMP2:%.*]] = and i32 [[TMP1]], 48			; CHECK-NEXT: [[TMP2:%.*]] = lshr exact i32 [[TMP1]], 1
	; CHECK-NEXT: [[TMP3:%.*]] = and i32 %x, 126			; CHECK-NEXT: [[TMP3:%.*]] = shl nuw nsw i32 [[TMP1]], 6
	; CHECK-NEXT: [[TMP4:%.*]] = shl nuw nsw i32 [[TMP3]], 6			; CHECK-NEXT: [[TMP4:%.*]] = or i32 [[TMP2]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = lshr i32 %x, 1			; CHECK-NEXT: ret i32 [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = and i32 [[TMP5]], 15
	; CHECK-NEXT: [[TMP7:%.*]] = or i32 [[TMP2]], [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = or i32 [[TMP7]], [[TMP4]]
	; CHECK-NEXT: ret i32 [[TMP8]]
	;			;
	%1 = and i32 %x, 96			%1 = and i32 %x, 96
	%2 = shl nuw nsw i32 %1, 6			%2 = shl nuw nsw i32 %1, 6
	%3 = lshr exact i32 %1, 1			%3 = lshr exact i32 %1, 1
	%4 = shl i32 %x, 6			%4 = shl i32 %x, 6
	%5 = and i32 %4, 1920			%5 = and i32 %4, 1920
	%6 = or i32 %2, %5			%6 = or i32 %2, %5
	%7 = lshr i32 %x, 1			%7 = lshr i32 %x, 1
	%8 = and i32 %7, 15			%8 = and i32 %7, 15
	%9 = or i32 %3, %8			%9 = or i32 %3, %8
	%10 = or i32 %9, %6			%10 = or i32 %9, %6
	ret i32 %10			ret i32 %10
	}			}

	define i32 @multiuse4(i32 %x) local_unnamed_addr #0 {			define i32 @multiuse4(i32 %x) local_unnamed_addr #0 {
	; CHECK-LABEL: @multiuse4(			; CHECK-LABEL: @multiuse4(
	; CHECK-NEXT: [[TMP1:%.*]] = and i32 %x, 100663296			; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i32 %x, -1
	; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt i32 %x, -1			; CHECK-NEXT: br i1 [[TMP1]], label %if, label %else
	; CHECK-NEXT: br i1 [[TMP2]], label %if, label %else
	; CHECK: {{.}}if:{{.}}			; CHECK: {{.}}if:{{.}}
	; CHECK-NEXT: [[TMP3:%.*]] = lshr exact i32 [[TMP1]], 22			; CHECK-NEXT: [[TMP2:%.*]] = lshr i32 %x, 22
	; CHECK-NEXT: [[TMP4:%.*]] = lshr i32 %x, 22			; CHECK-NEXT: [[TMP3:%.*]] = and i32 [[TMP2]], 504
	; CHECK-NEXT: [[TMP5:%.*]] = and i32 [[TMP4]], 480
	; CHECK-NEXT: [[TMP6:%.*]] = or i32 [[TMP5]], [[TMP3]]
	; CHECK-NEXT: br label %end			; CHECK-NEXT: br label %end
	; CHECK: {{.}}else:{{.}}			; CHECK: {{.}}else:{{.}}
	; CHECK-NEXT: [[TMP7:%.*]] = lshr exact i32 [[TMP1]], 17			; CHECK-NEXT: [[TMP4:%.*]] = lshr i32 %x, 17
	; CHECK-NEXT: [[TMP8:%.*]] = lshr i32 %x, 17			; CHECK-NEXT: [[TMP5:%.*]] = and i32 [[TMP4]], 16128
	; CHECK-NEXT: [[TMP9:%.*]] = and i32 [[TMP8]], 15360
	; CHECK-NEXT: [[TMP10:%.*]] = or i32 [[TMP9]], [[TMP7]]
	; CHECK-NEXT: br label %end			; CHECK-NEXT: br label %end
	; CHECK: {{.}}end{{.}}			; CHECK: {{.}}end{{.}}
	; CHECK-NEXT: [[TMP11:%.*]] = phi i32 [ [[TMP6]], %if ], [ [[TMP10]], %else ]			; CHECK-NEXT: [[TMP6:%.*]] = phi i32 [ [[TMP3]], %if ], [ [[TMP5]], %else ]
	; CHECK-NEXT: ret i32 [[TMP11]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	%1 = and i32 %x, 100663296			%1 = and i32 %x, 100663296
	%2 = icmp sgt i32 %x, -1			%2 = icmp sgt i32 %x, -1
	br i1 %2, label %if, label %else			br i1 %2, label %if, label %else

	if:			if:
	%3 = lshr exact i32 %1, 22			%3 = lshr exact i32 %1, 22
	%4 = lshr i32 %x, 22			%4 = lshr i32 %x, 22
	Show All 10 Lines

	end:			end:
	%11 = phi i32 [ %6, %if ], [ %10, %else ]			%11 = phi i32 [ %6, %if ], [ %10, %else ]
	ret i32 %11			ret i32 %11
	}			}

	define i32 @multiuse5(i32 %x) local_unnamed_addr #0 {			define i32 @multiuse5(i32 %x) local_unnamed_addr #0 {
	; CHECK-LABEL: @multiuse5(			; CHECK-LABEL: @multiuse5(
	; CHECK-NEXT: [[TMP1:%.*]] = shl i32 %x, 5			; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt i32 %x, -1
	; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt i32 %x, -1			; CHECK-NEXT: br i1 [[TMP1]], label %if, label %else
	; CHECK-NEXT: br i1 [[TMP2]], label %if, label %else
	; CHECK: {{.}}if:{{.}}			; CHECK: {{.}}if:{{.}}
	; CHECK-NEXT: [[TMP3:%.*]] = and i32 [[TMP1]], 21760			; CHECK-NEXT: [[TMP2:%.*]] = and i32 %x, 2040
	; CHECK-NEXT: [[TMP4:%.*]] = and i32 %x, 1360
	; CHECK-NEXT: [[TMP5:%.*]] = shl nuw nsw i32 [[TMP4]], 5
	; CHECK-NEXT: [[TMP6:%.*]] = or i32 [[TMP5]], [[TMP3]]
	; CHECK-NEXT: br label %end			; CHECK-NEXT: br label %end
	; CHECK: {{.}}else:{{.}}			; CHECK: {{.}}else:{{.}}
	; CHECK-NEXT: [[TMP7:%.*]] = and i32 [[TMP1]], 5570560			; CHECK-NEXT: [[TMP3:%.*]] = and i32 %x, 522240
	; CHECK-NEXT: [[TMP8:%.*]] = and i32 %x, 348160
	; CHECK-NEXT: [[TMP9:%.*]] = shl nuw nsw i32 [[TMP8]], 5
	; CHECK-NEXT: [[TMP10:%.*]] = or i32 [[TMP9]], [[TMP7]]
	; CHECK-NEXT: br label %end			; CHECK-NEXT: br label %end
	; CHECK: {{.}}end{{.}}			; CHECK: {{.}}end:{{.}}
	; CHECK-NEXT: [[TMP11:%.*]] = phi i32 [ [[TMP6]], %if ], [ [[TMP10]], %else ]			; CHECK-NEXT: [[IN:%.*]] = phi i32 [ [[TMP2]], %if ], [ [[TMP3]], %else ]
	; CHECK-NEXT: ret i32 [[TMP11]]			; CHECK-NEXT: [[TMP4:%.*]] = shl nuw nsw i32 [[IN]], 5
				; CHECK-NEXT: ret i32 [[TMP4]]
	;			;
	%1 = shl i32 %x, 5			%1 = shl i32 %x, 5
	%2 = icmp sgt i32 %x, -1			%2 = icmp sgt i32 %x, -1
	br i1 %2, label %if, label %else			br i1 %2, label %if, label %else

	if:			if:
	%3 = and i32 %1, 21760			%3 = and i32 %1, 21760
	%4 = and i32 %x, 1360			%4 = and i32 %x, 1360
	Show All 16 Lines