This is an archive of the discontinued LLVM Phabricator instance.

Differential D20774

[InstCombine] look through bitcasts to find selects
Closed · Public

Authored by spatel on May 28 2016, 12:24 PM.


Details

Reviewers

RKSimon
chandlerc
majnemer

Commits

rG6cf18af1c504: [InstCombine] look through bitcasts to find selects
rL271676: [InstCombine] look through bitcasts to find selects

Summary

The motivating example for this patch is this IR produced via SSE intrinsics in C:

define <2 x i64> @gibson(<2 x i64> %a, <2 x i64> %b) {
  %t0 = bitcast <2 x i64> %a to <4 x i32>
  %t1 = bitcast <2 x i64> %b to <4 x i32>
  %cmp = icmp sgt <4 x i32> %t0, %t1
  %sext = sext <4 x i1> %cmp to <4 x i32>
  %t2 = bitcast <4 x i32> %sext to <2 x i64>
  %and = and <2 x i64> %t2, %a
  %neg = xor <4 x i32> %sext, <i32 -1, i32 -1, i32 -1, i32 -1>
  %neg2 = bitcast <4 x i32> %neg to <2 x i64>
  %and2 = and <2 x i64> %neg2, %b
  %or = or <2 x i64> %and, %and2
  ret <2 x i64> %or
}

For an AVX target, this is currently:

vpcmpgtd	%xmm1, %xmm0, %xmm2
vpand	%xmm0, %xmm2, %xmm0
vpandn	%xmm1, %xmm2, %xmm1
vpor	%xmm1, %xmm0, %xmm0
retq

With this patch, it becomes:

vpmaxsd	%xmm1, %xmm0, %xmm0
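You can check the summary's claim without running LLVM. The sketch below is a scalar model of @gibson and is not code from the patch. V2x64, v2x64, toV4x32, fromV4x32, gibson and pmaxsd are hypothetical names. std::memcpy stands in for an IR bitcast, which reinterprets the same 128 bits with a different lane grouping. pmaxsd models what vpmaxsd computes: a signed maximum of each i32 lane.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical scalar stand-in for a 128-bit vector viewed as <2 x i64>.
struct V2x64 { uint64_t lane[2]; };

static V2x64 v2x64(uint64_t l0, uint64_t l1) {
  V2x64 v;
  v.lane[0] = l0;
  v.lane[1] = l1;
  return v;
}

// An IR bitcast keeps the same bits; memcpy does the same on the host.
static void toV4x32(const V2x64 &v, int32_t out[4]) { std::memcpy(out, v.lane, 16); }
static V2x64 fromV4x32(const int32_t in[4]) {
  V2x64 v;
  std::memcpy(v.lane, in, 16);
  return v;
}

// @gibson, one IR instruction at a time.
static V2x64 gibson(const V2x64 &a, const V2x64 &b) {
  int32_t t0[4], t1[4], sext[4], neg[4];
  toV4x32(a, t0);                        // %t0
  toV4x32(b, t1);                        // %t1
  for (int i = 0; i < 4; ++i) {
    sext[i] = t0[i] > t1[i] ? -1 : 0;    // %cmp (icmp sgt), %sext
    neg[i] = ~sext[i];                   // %neg
  }
  V2x64 t2 = fromV4x32(sext);            // %t2
  V2x64 neg2 = fromV4x32(neg);           // %neg2
  V2x64 r;
  for (int i = 0; i < 2; ++i)            // %and, %and2, %or
    r.lane[i] = (t2.lane[i] & a.lane[i]) | (neg2.lane[i] & b.lane[i]);
  return r;
}

// What vpmaxsd computes: the signed maximum of each i32 lane.
static V2x64 pmaxsd(const V2x64 &a, const V2x64 &b) {
  int32_t x[4], y[4], m[4];
  toV4x32(a, x);
  toV4x32(b, y);
  for (int i = 0; i < 4; ++i)
    m[i] = x[i] > y[i] ? x[i] : y[i];
  return fromV4x32(m);
}
```

A bitcast only regroups the same bits. Each 32-bit region of the result therefore comes from whichever input's region compared greater, which is exactly a per-lane signed maximum. This holds regardless of host byte order.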

Diff Detail

Repository: rL LLVM

Event Timeline

spatel updated this revision to Diff 58899. · May 28 2016, 12:24 PM

spatel retitled this revision from to [InstCombine] look through bitcasts to find selects.

spatel updated this object.

spatel added reviewers: majnemer, RKSimon, chandlerc.

spatel added a subscriber: llvm-commits.

Herald added a subscriber: mcrosier. · May 28 2016, 12:24 PM

eli.friedman added a subscriber: eli.friedman. · May 28 2016, 1:09 PM

eli.friedman added inline comments.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
Line 1335 (on Diff #58899): This canonicalization seems incomplete... for example, if both operands are sext instructions, the one which belongs on the LHS is the one where the source is a bool, but this just picks randomly.
Line 1359 (on Diff #58899): Assuming bitcasts are free is kind of a big assumption... you could easily end up in situations where the bitcasts are not free. If X is scalar, the extra bitcasts should be folded away, but you could end up with bad effects on code like "((uint64_t)(vec1 == vec2)) & x".

spatel added inline comments. · May 28 2016, 1:35 PM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
Line 1335 (on Diff #58899): Yes, you're right. I was going to say it is just an existing bug for the sext/not cases, but it seems we don't optimize the bitcast case either:

define <2 x i64> @vecBitcastOp0(<4 x i32> %a, <8 x i16> %b) {
  %bc1 = bitcast <4 x i32> %a to <2 x i64>
  %bc2 = bitcast <8 x i16> %b to <2 x i64>
  %and = and <2 x i64> %bc1, %bc2
  ret <2 x i64> %and
}

I thought we'd eliminate one bitcast (randomly?) for this case and do the logic op in the format of one of the inputs. Is that a valid IR optimization?

Line 1359 (on Diff #58899): Would it be fair to assume that vector bitcasts to another vector type are free? If not, then I suppose I need to abandon this patch and try again in the DAG?

eli.friedman added inline comments. · May 28 2016, 2:27 PM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
Line 1335 (on Diff #58899): Well, you can't transform the return type of a function unconditionally; it could change the calling convention. But to actually answer your question, you can perform an "and" in any type of the right size as long as there aren't any padding bits involved; "and" is a bit-wise operation. Whether that's a good idea probably depends on the target; performing an "and" in an illegal type is likely to be expensive.
Line 1359 (on Diff #58899): Vector to vector bitcasts are free on most platforms, but not all. As a practical example, big-endian AArch64 has non-trivial vector bitcasts: a bitcast translates to a vector shuffle to put the elements in the right places. In theory, there's infrastructure for querying costs like this from transform passes; see TargetTransformInfo::getCastInstrCost. Not sure how that would work out in practice.
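Both of Eli's points can be made concrete with a host-side sketch. This is an illustration, not LLVM code. The helpers andAsLanes, sameBytes, pattern and lane0AfterBitcast are hypothetical, and std::memcpy again stands in for an IR bitcast. andAsLanes performs the same "and" on a 128-bit value viewed as <2 x i64>, <4 x i32>, or <8 x i16>. lane0AfterBitcast computes the i64 lane 0 produced by a <4 x i32> to <2 x i64> bitcast under each byte order, because IR bitcasts are defined in terms of memory layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// 128 bits as raw bytes; a vector bitcast only changes how they are grouped.
struct Bytes16 { unsigned char b[16]; };

// "and" performed on the value viewed as <16/sizeof(T) x T>, e.g.
// T = uint64_t for <2 x i64>, uint32_t for <4 x i32>, uint16_t for <8 x i16>.
template <typename T>
static Bytes16 andAsLanes(const Bytes16 &x, const Bytes16 &y) {
  constexpr std::size_t N = 16 / sizeof(T);
  T xl[N], yl[N];
  std::memcpy(xl, x.b, 16);  // bitcast to <N x T>
  std::memcpy(yl, y.b, 16);
  for (std::size_t i = 0; i < N; ++i)
    xl[i] = static_cast<T>(xl[i] & yl[i]);
  Bytes16 r;
  std::memcpy(r.b, xl, 16);  // bitcast back
  return r;
}

static bool sameBytes(const Bytes16 &p, const Bytes16 &q) {
  return std::memcmp(p.b, q.b, 16) == 0;
}

// Deterministic test input.
static Bytes16 pattern(unsigned seed) {
  Bytes16 r;
  for (unsigned i = 0; i < 16; ++i)
    r.b[i] = static_cast<unsigned char>(seed * 37u + i * 11u);
  return r;
}

// i64 lane 0 of "bitcast <4 x i32> <x0, x1, ...> to <2 x i64>".
static uint64_t lane0AfterBitcast(uint32_t x0, uint32_t x1, bool bigEndian) {
  return bigEndian ? (uint64_t(x0) << 32) | x1   // x0 lands in the high-order bytes
                   : (uint64_t(x1) << 32) | x0;  // x0 lands in the low-order bytes
}
```

The AArch64 remark can be read this way, under the assumption that a vector register keeps lane 0 in its low-order bits on both byte orders. The little-endian result (x1 in the high half) already matches the register contents, so that bitcast is free. The big-endian result needs the two halves swapped, and that swap is the shuffle Eli describes.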

spatel added inline comments. · May 28 2016, 3:05 PM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
Line 1335 (on Diff #58899): Sorry - I didn't finish that example properly here. We'd of course need a bitcast after the "promoted" logic op to get the types to match. Please let me know what you think of this example: https://llvm.org/bugs/show_bug.cgi?id=27925
Line 1359 (on Diff #58899): Based on my experience so far, I assume we'd just delay this kind of combining to the DAG rather than attempt it in IR in a target-dependent way. It sounds like that's what needs to happen in this case: the problem can't be generalized to all targets, so I'll make an x86 solution to start and other targets can pick it up if they like. I'll let this sit a bit in case anyone else wants to comment, but for now I'm assuming I will eventually abandon this patch. Thank you for the feedback, Eli!

It's going to take multiple patches/steps to get this to match in the DAG. Is there any concern about making the following transform in IR:

Before:

define <2 x i64> @gibson(<2 x i64> %a, <2 x i64> %b) {
  %t0 = bitcast <2 x i64> %a to <4 x i32>
  %t1 = bitcast <2 x i64> %b to <4 x i32>
  %cmp = icmp sgt <4 x i32> %t0, %t1
  %sext = sext <4 x i1> %cmp to <4 x i32>
  %t2 = bitcast <4 x i32> %sext to <2 x i64>
  %and = and <2 x i64> %t2, %a
  %neg = xor <4 x i32> %sext, <i32 -1, i32 -1, i32 -1, i32 -1>
  %neg2 = bitcast <4 x i32> %neg to <2 x i64>
  %and2 = and <2 x i64> %neg2, %b
  %or = or <2 x i64> %and, %and2
  ret <2 x i64> %or
}

After:

define <2 x i64> @max(<2 x i64> %a, <2 x i64> %b) {
  %t0 = bitcast <2 x i64> %a to <4 x i32>
  %t1 = bitcast <2 x i64> %b to <4 x i32>
  %cmp = icmp sgt <4 x i32> %t0, %t1
  %or = select <4 x i1> %cmp, <4 x i32> %t0, <4 x i32> %t1
  %r1 = bitcast <4 x i32> %or to <2 x i64>
  ret <2 x i64> %r1
}

We're trading 4 logic ops, a bitcast, and a sext for a select. Is that better / more canonical IR for all targets?
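The Before and After functions should agree for every condition, not only for the icmp sgt that makes this a max. An exhaustive check over all 16 values of a <4 x i1> condition illustrates that the rewrite preserves meaning for any condition. This mirrors the @bitcast_select test, where %cmp is an argument. Whether the rewrite is more canonical is the separate question asked above. The helper names are hypothetical, bit i of cmp models lane i of the condition, and std::memcpy models bitcast.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// "Before": sext, bitcast to <2 x i64>, then and / xor / and / or.
static void beforeForm(unsigned cmp, const uint64_t a[2], const uint64_t b[2],
                       uint64_t out[2]) {
  int32_t sext[4], neg[4];
  for (int i = 0; i < 4; ++i) {
    sext[i] = ((cmp >> i) & 1u) ? -1 : 0;  // %sext
    neg[i] = ~sext[i];                     // %neg
  }
  uint64_t t2[2], neg2[2];
  std::memcpy(t2, sext, 16);               // %t2
  std::memcpy(neg2, neg, 16);              // %neg2
  for (int i = 0; i < 2; ++i)
    out[i] = (t2[i] & a[i]) | (neg2[i] & b[i]);
}

// "After": bitcast the operands to <4 x i32>, select per lane, bitcast back.
static void afterForm(unsigned cmp, const uint64_t a[2], const uint64_t b[2],
                      uint64_t out[2]) {
  int32_t ta[4], tb[4], sel[4];
  std::memcpy(ta, a, 16);
  std::memcpy(tb, b, 16);
  for (int i = 0; i < 4; ++i)
    sel[i] = ((cmp >> i) & 1u) ? ta[i] : tb[i];
  std::memcpy(out, sel, 16);
}

// True if both forms agree for all 16 possible <4 x i1> conditions.
static bool formsAgree(uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1) {
  const uint64_t a[2] = {a0, a1};
  const uint64_t b[2] = {b0, b1};
  for (unsigned cmp = 0; cmp < 16; ++cmp) {
    uint64_t x[2], y[2];
    beforeForm(cmp, a, b, x);
    afterForm(cmp, a, b, y);
    if (x[0] != y[0] || x[1] != y[1])
      return false;
  }
  return true;
}
```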

In D20774#446452, @spatel wrote:

We're trading 4 logic ops, a bitcast, and a sext for a select. Is that better / more canonical IR for all targets?

I can answer my own question with a 'yes'. We already have this transform in matchSelectFromAndOr(). It looks like it just needs to be enhanced to see through the bitcasts.

spatel mentioned this in rL271603: [InstCombine] change tests to show a more obvious transform possibility. · Jun 2 2016, 3:52 PM

Patch updated:
Use the existing matchSelectFromAndOr() infrastructure to transform the larger select pattern that is present in the motivating example.
Hopefully, this alleviates any concern about the cost of bitcasts because we're eliminating all of the logic/sext instructions.
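The updated approach can be sketched with a toy expression tree. Each operand of the first "and" takes a turn as the mask, and the inverted mask is accepted as either not(sext) or sext(not). New in this patch, both masks may also sit behind a bitcast. Everything here (Op, Expr, matchSelect, foldOrOfAnds, the string results) is hypothetical and is not LLVM's PatternMatch API. It only mirrors the shape of the checks visible in the diff, including its four calls in the same operand order.

```cpp
#include <cassert>
#include <initializer_list>
#include <memory>
#include <string>
#include <utility>

// Hypothetical mini-IR for illustration only; this is not LLVM's API.
enum class Op { Leaf, SExt, Not, BitCast };

struct Expr {
  Op op;
  std::string name;                // set for Leaf nodes only
  std::shared_ptr<const Expr> in;  // operand of SExt, Not and BitCast
};
using P = std::shared_ptr<const Expr>;

static P leaf(const std::string &n) { return std::make_shared<Expr>(Expr{Op::Leaf, n, nullptr}); }
static P un(Op op, P x) { return std::make_shared<Expr>(Expr{op, "", std::move(x)}); }

// Masks as they appear in @bitcast_select: bc(sext c) and bc(not(sext c)).
static P bcMask(const std::string &c) { return un(Op::BitCast, un(Op::SExt, leaf(c))); }
static P bcNotMask(const std::string &c) {
  return un(Op::BitCast, un(Op::Not, un(Op::SExt, leaf(c))));
}

static std::string label(const P &e) { return e->op == Op::Leaf ? e->name : "<expr>"; }

// If e is sext(cond) with a leaf cond, return cond's name, else "".
static std::string sextCond(const P &e) {
  return (e->op == Op::SExt && e->in->op == Op::Leaf) ? e->in->name : std::string();
}

// Is e an inverted mask for cond: not(sext(cond)) or sext(not(cond))?
// Toy simplification: conditions are compared by name, whereas LLVM's
// m_Specific compares the Value pointer.
static bool isInvertedMask(const P &e, const std::string &cond) {
  if (e->op == Op::Not)
    return sextCond(e->in) == cond;
  if (e->op == Op::SExt && e->in->op == Op::Not)
    return e->in->in->op == Op::Leaf && e->in->in->name == cond;
  return false;
}

// One matchSelectFromAndOr(A, B, C, D) attempt on (A & C) | (B & D), with A
// as the mask. As in the diff, the bitcast form needs a bitcast on both masks.
static std::string matchSelect(const P &A, const P &B, const P &C, const P &D) {
  const bool bc = A->op == Op::BitCast;
  const std::string cond = sextCond(bc ? A->in : A);
  if (cond.empty())
    return "";
  auto inverted = [&](const P &m) {
    if (bc != (m->op == Op::BitCast))
      return false;
    return isInvertedMask(bc ? m->in : m, cond);
  };
  const P *other = inverted(D) ? &B : inverted(B) ? &D : nullptr;
  if (!other)
    return "";
  if (!bc)
    return "select " + cond + ", " + label(C) + ", " + label(*other);
  return "bc (select " + cond + ", (bc " + label(C) + "), (bc " + label(*other) + "))";
}

// (A & C) | (B & D): try each operand as the mask, in the diff's call order.
static std::string foldOrOfAnds(const P &A, const P &C, const P &B, const P &D) {
  for (const std::string &r : {matchSelect(A, B, C, D), matchSelect(B, A, D, C),
                               matchSelect(C, B, A, D), matchSelect(D, A, B, C)})
    if (!r.empty())
      return r;
  return "";
}
```

The driver returns the first permutation that matches. In this sketch, that is why the commuted operand orders exercised in logical-select.ll (swap_or_ops, swap_and_ops, swap_and_ops2) all reduce to the same select.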

LGTM with nits

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
Lines 1647–1689 (on Diff #59470): It would be nice if we could find a way to fold the code which handles the bitcast case with the code which doesn't.

This revision is now accepted and ready to land. · Jun 2 2016, 4:15 PM

spatel added inline comments. · Jun 2 2016, 4:25 PM

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
Lines 1647–1689 (on Diff #59470): Agreed - at the least, we can use m_CombineOr in the same way to make the code look similar. However, after my bot-killing / bug-raising misadventure in rL269728, I'm determined to make any refactoring a follow-up step. :) I'll add a FIXME comment in this patch. Thank you for the prompt review!

Closed by commit rL271676: [InstCombine] look through bitcasts to find selects (authored by spatel). · Jun 3 2016, 7:48 AM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D21190: [InstCombine] allow more than one use for vector cast folding with selects. · Jun 9 2016, 10:50 AM

spatel mentioned this in rL273011: [InstCombine] allow more than one use for vector bitcast folding with selects. · Jun 17 2016, 9:53 AM

spatel mentioned this in D21661: [InstCombine] refactor optional bitcasting in matchSelectFromAndOr() into one code path (NFCI). · Jun 23 2016, 2:59 PM

spatel mentioned this in D26641: [InstCombine] change bitwise logic type to eliminate bitcasts. · Nov 14 2016, 3:27 PM

spatel mentioned this in rL287707: [InstCombine] change bitwise logic type to eliminate bitcasts. · Nov 22 2016, 2:15 PM

Revision Contents

Path                                                            Size
llvm/trunk/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp   69 lines
llvm/trunk/test/Transforms/InstCombine/logical-select.ll        46 lines

Diff 59557

llvm/trunk/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp

@@ ... @@ Instruction *InstCombiner::MatchBSwap(BinaryOperator &I) {
   for (auto *Inst : Insts)
     Worklist.Add(Inst);
   return LastInst;
 }
 
 /// We have an expression of the form (A&C)|(B&D). Check if A is (cond?-1:0)
 /// and either B or D is ~(cond?-1,0) or (cond?0,-1), then we can simplify this
 /// expression to "cond ? C : D or B".
-static Instruction *matchSelectFromAndOr(Value *A, Value *B,
-                                         Value *C, Value *D) {
+static Instruction *matchSelectFromAndOr(Value *A, Value *B, Value *C, Value *D,
+                                         InstCombiner::BuilderTy &Builder) {
   // If A is not a select of -1/0, this cannot match.
   Value *Cond = nullptr;
-  if (!match(A, m_SExt(m_Value(Cond))) || !Cond->getType()->isIntegerTy(1))
-    return nullptr;
+  if (match(A, m_SExt(m_Value(Cond))) &&
+      Cond->getType()->getScalarType()->isIntegerTy(1)) {
 
-  // ((cond?-1:0)&C) | (B&(cond?0:-1)) -> cond ? C : B.
-  if (match(D, m_Not(m_SExt(m_Specific(Cond)))))
-    return SelectInst::Create(Cond, C, B);
-  if (match(D, m_SExt(m_Not(m_Specific(Cond)))))
-    return SelectInst::Create(Cond, C, B);
+    // ((cond ? -1:0) & C) | (B & (cond ? 0:-1)) -> cond ? C : B.
+    if (match(D, m_Not(m_SExt(m_Specific(Cond)))))
+      return SelectInst::Create(Cond, C, B);
+    if (match(D, m_SExt(m_Not(m_Specific(Cond)))))
+      return SelectInst::Create(Cond, C, B);
 
-  // ((cond?-1:0)&C) | ((cond?0:-1)&D) -> cond ? C : D.
-  if (match(B, m_Not(m_SExt(m_Specific(Cond)))))
-    return SelectInst::Create(Cond, C, D);
-  if (match(B, m_SExt(m_Not(m_Specific(Cond)))))
-    return SelectInst::Create(Cond, C, D);
+    // ((cond ? -1:0) & C) | ((cond ? 0:-1) & D) -> cond ? C : D.
+    if (match(B, m_Not(m_SExt(m_Specific(Cond)))))
+      return SelectInst::Create(Cond, C, D);
+    if (match(B, m_SExt(m_Not(m_Specific(Cond)))))
+      return SelectInst::Create(Cond, C, D);
+  }
+
+  // TODO: Refactor the pattern matching above and below so there's less code.
+
+  // The sign-extended boolean condition may be hiding behind a bitcast. In that
+  // case, look for the same patterns as above. However, we need to bitcast the
+  // input operands to the select and bitcast the output of the select to match
+  // the expected types.
+  if (match(A, m_BitCast(m_SExt(m_Value(Cond)))) &&
+      Cond->getType()->getScalarType()->isIntegerTy(1)) {
+
+    Type *SrcType = cast<BitCastInst>(A)->getSrcTy();
+
+    // ((bc Cond) & C) | (B & (bc ~Cond)) --> bc (select Cond, (bc C), (bc B))
+    if (match(D, m_CombineOr(m_BitCast(m_Not(m_SExt(m_Specific(Cond)))),
+                             m_BitCast(m_SExt(m_Not(m_Specific(Cond))))))) {
+      Value *BitcastC = Builder.CreateBitCast(C, SrcType);
+      Value *BitcastB = Builder.CreateBitCast(B, SrcType);
+      Value *Select = Builder.CreateSelect(Cond, BitcastC, BitcastB);
+      return CastInst::Create(Instruction::BitCast, Select, A->getType());
+    }
+
+    // ((bc Cond) & C) | ((bc ~Cond) & D) --> bc (select Cond, (bc C), (bc D))
+    if (match(B, m_CombineOr(m_BitCast(m_Not(m_SExt(m_Specific(Cond)))),
+                             m_BitCast(m_SExt(m_Not(m_Specific(Cond))))))) {
+      Value *BitcastC = Builder.CreateBitCast(C, SrcType);
+      Value *BitcastD = Builder.CreateBitCast(D, SrcType);
+      Value *Select = Builder.CreateSelect(Cond, BitcastC, BitcastD);
+      return CastInst::Create(Instruction::BitCast, Select, A->getType());
+    }
+  }
+
   return nullptr;
 }
 
 /// Fold (icmp)|(icmp) if possible.
 Value *InstCombiner::FoldOrOfICmps(ICmpInst *LHS, ICmpInst *RHS,
                                    Instruction *CxtI) {
   ICmpInst::Predicate LHSCC = LHS->getPredicate(), RHSCC = RHS->getPredicate();
@@ ... @@ if (C1 && C2) { // (A & C1)|(B & C2)
         V2 = Builder->CreateOr(V1, ConstantExpr::getOr(C3, C4), "bitfield");
         return BinaryOperator::CreateAnd(V2,
                                          Builder->getInt(C1->getValue()|C2->getValue()));
       }
     }
   }
 
   // (A & (C0?-1:0)) | (B & ~(C0?-1:0)) -> C0 ? A : B, and commuted variants.
-  if (Instruction *Match = matchSelectFromAndOr(A, B, C, D))
+  if (Instruction *Match = matchSelectFromAndOr(A, B, C, D, *Builder))
     return Match;
-  if (Instruction *Match = matchSelectFromAndOr(B, A, D, C))
+  if (Instruction *Match = matchSelectFromAndOr(B, A, D, C, *Builder))
     return Match;
-  if (Instruction *Match = matchSelectFromAndOr(C, B, A, D))
+  if (Instruction *Match = matchSelectFromAndOr(C, B, A, D, *Builder))
     return Match;
-  if (Instruction *Match = matchSelectFromAndOr(D, A, B, C))
+  if (Instruction *Match = matchSelectFromAndOr(D, A, B, C, *Builder))
     return Match;
 
   // ((A&~B)|(~A&B)) -> A^B
   if ((match(C, m_Not(m_Specific(D))) &&
        match(B, m_Not(m_Specific(A)))))
     return BinaryOperator::CreateXor(A, D);
   // ((~B&A)|(~A&B)) -> A^B
   if ((match(A, m_Not(m_Specific(D))) &&

llvm/trunk/test/Transforms/InstCombine/logical-select.ll

@@ ... @@
   %iftmp.1.0 = select i1 %t0, i32 -1, i32 0
   %t1 = and i32 %iftmp.1.0, %c
   %not = xor i32 %iftmp.1.0, -1
   %t2 = and i32 %not, %d
   %t3 = or i32 %t1, %t2
   ret i32 %t3
 }
 
-; FIXME: In the following tests, verify that a bitcast doesn't get in the way
+; In the following tests, verify that a bitcast doesn't get in the way
 ; of a select transform. These bitcasts are common in SSE/AVX and possibly
 ; other vector code because of canonicalization to i64 elements for vectors.
 
 define <2 x i64> @bitcast_select(<4 x i1> %cmp, <2 x i64> %a, <2 x i64> %b) {
 ; CHECK-LABEL: @bitcast_select(
-; CHECK-NEXT: [[SEXT:%.*]] = sext <4 x i1> %cmp to <4 x i32>
-; CHECK-NEXT: [[T2:%.*]] = bitcast <4 x i32> [[SEXT]] to <2 x i64>
-; CHECK-NEXT: [[AND:%.*]] = and <2 x i64> [[T2]], %a
-; CHECK-NEXT: [[NEG:%.*]] = xor <4 x i32> [[SEXT]], <i32 -1, i32 -1, i32 -1, i32 -1>
-; CHECK-NEXT: [[NEG2:%.*]] = bitcast <4 x i32> [[NEG]] to <2 x i64>
-; CHECK-NEXT: [[AND2:%.*]] = and <2 x i64> [[NEG2]], %b
-; CHECK-NEXT: [[OR:%.*]] = or <2 x i64> [[AND]], [[AND2]]
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i64> %a to <4 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> %b to <4 x i32>
+; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> %cmp, <4 x i32> [[TMP1]], <4 x i32> [[TMP2]]
+; CHECK-NEXT: [[OR:%.*]] = bitcast <4 x i32> [[TMP3]] to <2 x i64>
 ; CHECK-NEXT: ret <2 x i64> [[OR]]
 ;
   %sext = sext <4 x i1> %cmp to <4 x i32>
   %t2 = bitcast <4 x i32> %sext to <2 x i64>
   %and = and <2 x i64> %t2, %a
   %neg = xor <4 x i32> %sext, <i32 -1, i32 -1, i32 -1, i32 -1>
   %neg2 = bitcast <4 x i32> %neg to <2 x i64>
   %and2 = and <2 x i64> %neg2, %b
   %or = or <2 x i64> %and, %and2
   ret <2 x i64> %or
 }
 
 define <2 x i64> @bitcast_select_swap_or_ops(<4 x i1> %cmp, <2 x i64> %a, <2 x i64> %b) {
 ; CHECK-LABEL: @bitcast_select_swap_or_ops(
-; CHECK-NEXT: [[SEXT:%.*]] = sext <4 x i1> %cmp to <4 x i32>
-; CHECK-NEXT: [[T2:%.*]] = bitcast <4 x i32> [[SEXT]] to <2 x i64>
-; CHECK-NEXT: [[AND:%.*]] = and <2 x i64> [[T2]], %a
-; CHECK-NEXT: [[NEG:%.*]] = xor <4 x i32> [[SEXT]], <i32 -1, i32 -1, i32 -1, i32 -1>
-; CHECK-NEXT: [[NEG2:%.*]] = bitcast <4 x i32> [[NEG]] to <2 x i64>
-; CHECK-NEXT: [[AND2:%.*]] = and <2 x i64> [[NEG2]], %b
-; CHECK-NEXT: [[OR:%.*]] = or <2 x i64> [[AND2]], [[AND]]
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i64> %a to <4 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> %b to <4 x i32>
+; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> %cmp, <4 x i32> [[TMP1]], <4 x i32> [[TMP2]]
+; CHECK-NEXT: [[OR:%.*]] = bitcast <4 x i32> [[TMP3]] to <2 x i64>
 ; CHECK-NEXT: ret <2 x i64> [[OR]]
 ;
   %sext = sext <4 x i1> %cmp to <4 x i32>
   %t2 = bitcast <4 x i32> %sext to <2 x i64>
   %and = and <2 x i64> %t2, %a
   %neg = xor <4 x i32> %sext, <i32 -1, i32 -1, i32 -1, i32 -1>
   %neg2 = bitcast <4 x i32> %neg to <2 x i64>
   %and2 = and <2 x i64> %neg2, %b
   %or = or <2 x i64> %and2, %and
   ret <2 x i64> %or
 }
 
 define <2 x i64> @bitcast_select_swap_and_ops(<4 x i1> %cmp, <2 x i64> %a, <2 x i64> %b) {
 ; CHECK-LABEL: @bitcast_select_swap_and_ops(
-; CHECK-NEXT: [[SEXT:%.*]] = sext <4 x i1> %cmp to <4 x i32>
-; CHECK-NEXT: [[T2:%.*]] = bitcast <4 x i32> [[SEXT]] to <2 x i64>
-; CHECK-NEXT: [[AND:%.*]] = and <2 x i64> [[T2]], %a
-; CHECK-NEXT: [[NEG:%.*]] = xor <4 x i32> [[SEXT]], <i32 -1, i32 -1, i32 -1, i32 -1>
-; CHECK-NEXT: [[NEG2:%.*]] = bitcast <4 x i32> [[NEG]] to <2 x i64>
-; CHECK-NEXT: [[AND2:%.*]] = and <2 x i64> [[NEG2]], %b
-; CHECK-NEXT: [[OR:%.*]] = or <2 x i64> [[AND]], [[AND2]]
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i64> %a to <4 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> %b to <4 x i32>
+; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> %cmp, <4 x i32> [[TMP1]], <4 x i32> [[TMP2]]
+; CHECK-NEXT: [[OR:%.*]] = bitcast <4 x i32> [[TMP3]] to <2 x i64>
 ; CHECK-NEXT: ret <2 x i64> [[OR]]
 ;
   %sext = sext <4 x i1> %cmp to <4 x i32>
   %t2 = bitcast <4 x i32> %sext to <2 x i64>
   %and = and <2 x i64> %t2, %a
   %neg = xor <4 x i32> %sext, <i32 -1, i32 -1, i32 -1, i32 -1>
   %neg2 = bitcast <4 x i32> %neg to <2 x i64>
   %and2 = and <2 x i64> %b, %neg2
   %or = or <2 x i64> %and, %and2
   ret <2 x i64> %or
 }
 
 define <2 x i64> @bitcast_select_swap_and_ops2(<4 x i1> %cmp, <2 x i64> %a, <2 x i64> %b) {
 ; CHECK-LABEL: @bitcast_select_swap_and_ops2(
-; CHECK-NEXT: [[SEXT:%.*]] = sext <4 x i1> %cmp to <4 x i32>
-; CHECK-NEXT: [[T2:%.*]] = bitcast <4 x i32> [[SEXT]] to <2 x i64>
-; CHECK-NEXT: [[AND:%.*]] = and <2 x i64> [[T2]], %a
-; CHECK-NEXT: [[NEG:%.*]] = xor <4 x i32> [[SEXT]], <i32 -1, i32 -1, i32 -1, i32 -1>
-; CHECK-NEXT: [[NEG2:%.*]] = bitcast <4 x i32> [[NEG]] to <2 x i64>
-; CHECK-NEXT: [[AND2:%.*]] = and <2 x i64> [[NEG2]], %b
-; CHECK-NEXT: [[OR:%.*]] = or <2 x i64> [[AND]], [[AND2]]
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i64> %a to <4 x i32>
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast <2 x i64> %b to <4 x i32>
+; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> %cmp, <4 x i32> [[TMP1]], <4 x i32> [[TMP2]]
+; CHECK-NEXT: [[OR:%.*]] = bitcast <4 x i32> [[TMP3]] to <2 x i64>
 ; CHECK-NEXT: ret <2 x i64> [[OR]]
 ;
   %sext = sext <4 x i1> %cmp to <4 x i32>
   %t2 = bitcast <4 x i32> %sext to <2 x i64>
   %and = and <2 x i64> %a, %t2
   %neg = xor <4 x i32> %sext, <i32 -1, i32 -1, i32 -1, i32 -1>
   %neg2 = bitcast <4 x i32> %neg to <2 x i64>
   %and2 = and <2 x i64> %neg2, %b
   %or = or <2 x i64> %and, %and2
   ret <2 x i64> %or
 }