This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine][SSE] Demanded vector elements for scalar intrinsics
ClosedPublic

Authored by RKSimon on Feb 21 2016, 8:56 AM.

Details

Reviewers

spatel
majnemer
craig.topper

Commits

rG424da1637a8a: [InstCombine][SSE] Demanded vector elements for scalar intrinsics (Part 1 of 2)
rL267356: [InstCombine][SSE] Demanded vector elements for scalar intrinsics (Part 1 of 2)

Summary

This patch improves support for determining the demanded vector elements through SSE scalar intrinsics:

1 - recognise that we only need the lowest element of the second input for binary scalar operations (and all the elements of the first input)

2 - recognise that the roundss/roundsd intrinsics use the lowest element of the second input and the remaining elements from the first input
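
The two rules above can be sketched as bitmasks over vector lanes. This is a minimal standalone model, not LLVM code: `DemandedOperands`, `lowBits`, `highBits` and the two `demandedFor...` helpers are hypothetical names chosen for illustration, loosely mirroring what `APInt::getLowBitsSet` and `APInt::getHighBitsSet` do in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not LLVM's API): bit i set means "lane i of this
// operand can affect the result of the intrinsic".
struct DemandedOperands {
  std::uint32_t Arg0; // demanded lanes of the first input
  std::uint32_t Arg1; // demanded lanes of the second input
};

// Mask with the lowest Count bits set.
inline std::uint32_t lowBits(unsigned Count) {
  return Count >= 32 ? ~0u : (1u << Count) - 1u;
}

// Mask with the highest Count bits of a Width-lane vector set.
inline std::uint32_t highBits(unsigned Width, unsigned Count) {
  assert(Count <= Width && Width <= 32 && "lane counts out of range");
  return lowBits(Width) & ~lowBits(Width - Count);
}

// Rule 1: binary scalar ops (addss, subss, ...) need every lane of the first
// input (lane 0 for the arithmetic, the rest passed through) but only lane 0
// of the second input.
inline DemandedOperands demandedForBinaryScalar(unsigned Width) {
  return {lowBits(Width), lowBits(1)};
}

// Rule 2: roundss/roundsd take lane 0 from the second input and the
// remaining lanes from the first input.
inline DemandedOperands demandedForRoundScalar(unsigned Width) {
  return {highBits(Width, Width - 1), lowBits(1)};
}
```

For a <4 x float> operand this gives masks 0xF/0x1 for the binary ops and 0xE/0x1 for roundss; any lane whose bit is clear is one the combine is free to treat as undemanded.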

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon updated this revision to Diff 48623. Feb 21 2016, 8:56 AM

RKSimon retitled this revision to [InstCombine][SSE] Demanded vector elements for scalar intrinsics.

RKSimon updated this object.

RKSimon added reviewers: majnemer, craig.topper, spatel.

RKSimon set the repository for this revision to rL LLVM.

RKSimon added a subscriber: llvm-commits.

ping?

4 - addss gets simplified to a fadd call if we aren't interested in the pass through elements (can we do this for div/sqrt?)

I think so - please add a TODO comment. But this change really should be a separate patch review because it has FP exception behavior implications, doesn't it?

I'm happy to split this into multiple commits if anyone thinks it's necessary.

Yes, this should definitely be split up for committing for easier tracking and reversion if needed. I'd argue that it should be done for review purposes too as there are so many test cases to consider.

lib/Transforms/InstCombine/InstCombineCalls.cpp
1243–1258 ↗	(On Diff #48623)	Don't these also demand the lowest element of the first input vector?
1275–1278 ↗	(On Diff #48623)	Instead of returning immediately, isn't it more efficient to fall through to the check of the 2nd arg? I.e., replace both args in one shot if possible. That's what the code in SimplifyDemandedVectorElts() is doing if I'm reading it correctly.
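
The difference between returning after the first simplified operand and updating both operands in one visit can be illustrated with a toy model. Everything here is hypothetical (`Call`, `simplifyOperand`, the string-based "IR"); it only sketches why the one-shot style needs fewer visits, and is not how InstCombine represents values.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an intrinsic call with two operands.
struct Call {
  std::string Arg0, Arg1;
};

// Toy simplifier (an assumption for illustration): strips one redundant
// "insert(...)" wrapper and reports whether the operand changed.
inline bool simplifyOperand(std::string &Op) {
  const std::string Prefix = "insert(";
  if (Op.compare(0, Prefix.size(), Prefix) != 0 || Op.back() != ')')
    return false;
  Op = Op.substr(Prefix.size(), Op.size() - Prefix.size() - 1);
  return true;
}

// Early-return style: at most one operand is rewritten per visit.
inline bool visitEarlyReturn(Call &C) {
  if (simplifyOperand(C.Arg0))
    return true;
  if (simplifyOperand(C.Arg1))
    return true;
  return false;
}

// One-shot style: both operands are rewritten in the same visit and a
// single flag records whether anything changed.
inline bool visitMadeChange(Call &C) {
  bool MadeChange = false;
  if (simplifyOperand(C.Arg0))
    MadeChange = true;
  if (simplifyOperand(C.Arg1))
    MadeChange = true;
  return MadeChange;
}

// Counts how many visits report a change before a fixpoint is reached.
template <typename Visit> unsigned visitsUntilFixpoint(Call C, Visit V) {
  unsigned N = 0;
  while (V(C))
    ++N;
  return N;
}
```

With both operands wrapped once, the early-return visitor reports a change twice while the one-shot visitor reports it once, which is the saving the comment is pointing at.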

Cheers Sanjay, I'll split off the InstCombineSimplifyDemanded.cpp diffs into a separate patch

RKSimon mentioned this in rL266731: [InstCombine][X86] Regenerate SSE combine tests as part of setup for D17490. Apr 19 2016, 6:02 AM

RKSimon mentioned this in rL266732: [InstCombine][X86] Added extra tests introduced for D17490. Apr 19 2016, 6:05 AM

Split based on Sanjay's comments

RKSimon added inline comments. Apr 19 2016, 6:37 AM

lib/Transforms/InstCombine/InstCombineCalls.cpp
1374–1389 ↗	(On Diff #54186)	For the binary scalar intrinsics we need all the elements of the first input: the lowest is used in the operation and the remaining are all 'passed through' to the result.
1406–1409 ↗	(On Diff #54186)	I've included the 'MadeChange' bool so we can combine both in one pass. I've also done this in the existing similar COMI/UCOMI combine above.
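
As a sanity check on the pass-through description, here is a small reference model of the scalar add. It is a sketch written from the semantics described in the comment above (lane 0 computed, lanes 1..3 copied from the first input); `V4` and `addSS` are illustrative names, not part of the patch.

```cpp
#include <array>
#include <cassert>

using V4 = std::array<float, 4>;

// Reference model of the scalar add: lane 0 is A[0] + B[0], and lanes 1..3
// are passed through unchanged from the first input. Lanes 1..3 of B are
// never read, which is why only B's lowest element is demanded.
inline V4 addSS(const V4 &A, const V4 &B) {
  return {{A[0] + B[0], A[1], A[2], A[3]}};
}
```

Changing the upper three lanes of the second operand leaves the result untouched, whereas every lane of the first operand is visible in the output.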

LGTM - see inline for 1 minor change to the patch itself.

The rest of the comments are just 'thinking out loud' sort of questions so we have a paper trail for how we got here.
On that note, if you know of bug reports that track any of this, please do include them in the commit message.
https://llvm.org/bugs/show_bug.cgi?id=22206 is one for the sqrt intrinsic for whenever that part of the patch set lands.

lib/Transforms/InstCombine/InstCombineCalls.cpp
1374–1389 ↗	(On Diff #54186)	Got it - it was the code comment that threw me off. Please change to something like: "These intrinsics demand all elements of the first input because the high elements of that input are passed through. The low elements of both inputs are used for the actual binary op."
test/Transforms/InstCombine/x86-sse.ll
134–136 ↗	(On Diff #54186)	The next patch will delete these extra inserts?
test/Transforms/InstCombine/x86-sse2.ll
72–78 ↗	(On Diff #54186)	This case is, "If a tree falls in the IEEE754 woods..." right? I.e., with fast-math, we zap the whole thing, but I don't think we can under default settings.
105–118 ↗	(On Diff #54186)	This isn't part of this patch of course, but this isn't always right, is it? See previous comment about default behavior under IEEE754. To maintain correct exception behavior, we should have subtracted %b from %a. But I suppose this is ok because Clang/LLVM don't support changing the FP env? The patch that converts the intrinsic to regular IR will allow this to become "ret double 1.0"?
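
The concern about exception behavior can be made concrete with the C++ floating-point environment. The sketch below assumes a platform where <cfenv> status flags are supported and hardware division raises them (for example x86-64 with SSE); `discardedDivideRaisesFlag` is a hypothetical helper, and `volatile` is used only to stop the compiler from folding or deleting the operation in this example.

```cpp
#include <cassert>
#include <cfenv>

// Performs a division whose numeric result is never used and reports whether
// the divide-by-zero status flag was raised as a side effect.
inline bool discardedDivideRaisesFlag() {
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double Num = 1.0;
  volatile double Den = 0.0;
  volatile double Unused = Num / Den; // result is discarded on purpose
  (void)Unused;
  return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

If a program can observe that flag, deleting the unused operation changes observable behavior; if the FP environment is assumed to be the default and never inspected, the operation is dead and can be removed.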

This revision is now accepted and ready to land. Apr 19 2016, 9:09 AM

RKSimon mentioned this in D19318: [InstCombine][SSE] Demanded vector elements for scalar intrinsics (Part 2). Apr 20 2016, 5:10 AM

RKSimon added inline comments. Apr 20 2016, 5:21 AM

test/Transforms/InstCombine/x86-sse.ll
134–136 ↗	(On Diff #54186)	Yes D19318 handles those.
test/Transforms/InstCombine/x86-sse2.ll
105–118 ↗	(On Diff #54186)	Yes, this is what D19318 is handling - we shouldn't call the scalar intrinsics if we don't actually use the result and the demanded-elts logic has nixed the lowest element inputs.
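
To illustrate why the intrinsic call becomes unnecessary when only a pass-through lane is used, here is a sketch shaped like the test_add_sd_1 case in the diff: both vectors are built from scalars, the scalar add is applied, and lane 1 is extracted. `V2`, `addSD` and `extractLane1ViaIntrinsic` are hypothetical names, and the model ignores FP exception side effects.

```cpp
#include <array>
#include <cassert>

using V2 = std::array<double, 2>;

// Reference model of the scalar double add: lane 0 is computed, lane 1 is
// passed through from the first input.
inline V2 addSD(const V2 &A, const V2 &B) {
  return {{A[0] + B[0], A[1]}};
}

// Builds {A, 1.0} and {B, 2.0}, applies the scalar add, extracts lane 1.
// Lane 1 of the result is always the 1.0 inserted into the first vector,
// so neither A nor B can influence the returned value.
inline double extractLane1ViaIntrinsic(double A, double B) {
  V2 VA = {{A, 1.0}};
  V2 VB = {{B, 2.0}};
  return addSD(VA, VB)[1];
}
```

Under this model the function returns 1.0 for every input, which matches the "ret double 1.0" outcome anticipated in the earlier review comment.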

RKSimon mentioned this in rL267355: [InstCombine] Avoid updating argument demanded elements in separate passes. Apr 24 2016, 11:03 AM

Closed by commit rL267356: [InstCombine][SSE] Demanded vector elements for scalar intrinsics (Part 1 of 2) (authored by RKSimon). Apr 24 2016, 11:18 AM

This revision was automatically updated to reflect the committed changes.

RKSimon mentioned this in rL267357: [InstCombine][SSE] Demanded vector elements for scalar intrinsics (Part 2 of 2). Apr 24 2016, 11:29 AM

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp	52 lines
llvm/trunk/test/Transforms/InstCombine/x86-sse.ll	70 lines
llvm/trunk/test/Transforms/InstCombine/x86-sse2.ll	85 lines
llvm/trunk/test/Transforms/InstCombine/x86-sse41.ll	64 lines

Diff 54805

llvm/trunk/lib/Transforms/InstCombine/InstCombineCalls.cpp

[...]	Instruction *InstCombiner::visitCallInst(CallInst &CI) {
}		}

auto SimplifyDemandedVectorEltsLow = [this](Value *Op, unsigned Width,		auto SimplifyDemandedVectorEltsLow = [this](Value *Op, unsigned Width,
unsigned DemandedWidth) {		unsigned DemandedWidth) {
APInt UndefElts(Width, 0);		APInt UndefElts(Width, 0);
APInt DemandedElts = APInt::getLowBitsSet(Width, DemandedWidth);		APInt DemandedElts = APInt::getLowBitsSet(Width, DemandedWidth);
return SimplifyDemandedVectorElts(Op, DemandedElts, UndefElts);		return SimplifyDemandedVectorElts(Op, DemandedElts, UndefElts);
};		};
		auto SimplifyDemandedVectorEltsHigh = [this](Value *Op, unsigned Width,
		unsigned DemandedWidth) {
		APInt UndefElts(Width, 0);
		APInt DemandedElts = APInt::getHighBitsSet(Width, DemandedWidth);
		return SimplifyDemandedVectorElts(Op, DemandedElts, UndefElts);
		};

switch (II->getIntrinsicID()) {		switch (II->getIntrinsicID()) {
default: break;		default: break;
case Intrinsic::objectsize: {		case Intrinsic::objectsize: {
uint64_t Size;		uint64_t Size;
if (getObjectSize(II->getArgOperand(0), Size, DL, TLI)) {		if (getObjectSize(II->getArgOperand(0), Size, DL, TLI)) {
APInt APSize(II->getType()->getIntegerBitWidth(), Size);		APInt APSize(II->getType()->getIntegerBitWidth(), Size);
// Equality check to be sure that `Size` can fit in a value of type		// Equality check to be sure that `Size` can fit in a value of type
[...]	if (Value *V = SimplifyDemandedVectorEltsLow(Arg1, VWidth, 1)) {
II->setArgOperand(1, V);		II->setArgOperand(1, V);
MadeChange = true;		MadeChange = true;
}		}
if (MadeChange)		if (MadeChange)
return II;		return II;
break;		break;
}		}

		case Intrinsic::x86_sse_add_ss:
		case Intrinsic::x86_sse_sub_ss:
		case Intrinsic::x86_sse_mul_ss:
		case Intrinsic::x86_sse_div_ss:
		case Intrinsic::x86_sse_min_ss:
		case Intrinsic::x86_sse_max_ss:
		case Intrinsic::x86_sse_cmp_ss:
		case Intrinsic::x86_sse2_add_sd:
		case Intrinsic::x86_sse2_sub_sd:
		case Intrinsic::x86_sse2_mul_sd:
		case Intrinsic::x86_sse2_div_sd:
		case Intrinsic::x86_sse2_min_sd:
		case Intrinsic::x86_sse2_max_sd:
		case Intrinsic::x86_sse2_cmp_sd: {
		// These intrinsics only demand the lowest element of the second input
		// vector.
		Value *Arg1 = II->getArgOperand(1);
		unsigned VWidth = Arg1->getType()->getVectorNumElements();
		if (Value *V = SimplifyDemandedVectorEltsLow(Arg1, VWidth, 1)) {
		II->setArgOperand(1, V);
		return II;
		}
		break;
		}

		case Intrinsic::x86_sse41_round_ss:
		case Intrinsic::x86_sse41_round_sd: {
		// These intrinsics demand the upper elements of the first input vector and
		// the lowest element of the second input vector.
		bool MadeChange = false;
		Value *Arg0 = II->getArgOperand(0);
		Value *Arg1 = II->getArgOperand(1);
		unsigned VWidth = Arg0->getType()->getVectorNumElements();
		if (Value *V = SimplifyDemandedVectorEltsHigh(Arg0, VWidth, VWidth - 1)) {
		II->setArgOperand(0, V);
		MadeChange = true;
		}
		if (Value *V = SimplifyDemandedVectorEltsLow(Arg1, VWidth, 1)) {
		II->setArgOperand(1, V);
		MadeChange = true;
		}
		if (MadeChange)
		return II;
		break;
		}

// Constant fold ashr( <A x Bi>, Ci ).		// Constant fold ashr( <A x Bi>, Ci ).
// Constant fold lshr( <A x Bi>, Ci ).		// Constant fold lshr( <A x Bi>, Ci ).
// Constant fold shl( <A x Bi>, Ci ).		// Constant fold shl( <A x Bi>, Ci ).
case Intrinsic::x86_sse2_psrai_d:		case Intrinsic::x86_sse2_psrai_d:
case Intrinsic::x86_sse2_psrai_w:		case Intrinsic::x86_sse2_psrai_w:
case Intrinsic::x86_avx2_psrai_d:		case Intrinsic::x86_avx2_psrai_d:
case Intrinsic::x86_avx2_psrai_w:		case Intrinsic::x86_avx2_psrai_w:
case Intrinsic::x86_sse2_psrli_d:		case Intrinsic::x86_sse2_psrli_d:
[...]

llvm/trunk/test/Transforms/InstCombine/x86-sse.ll

[...]	;
%4 = insertelement <4 x float> %3, float 3.000000e+00, i32 3		%4 = insertelement <4 x float> %3, float 3.000000e+00, i32 3
%5 = tail call <4 x float> @llvm.x86.sse.rsqrt.ss(<4 x float> %4)		%5 = tail call <4 x float> @llvm.x86.sse.rsqrt.ss(<4 x float> %4)
%6 = extractelement <4 x float> %5, i32 3		%6 = extractelement <4 x float> %5, i32 3
ret float %6		ret float %6
}		}

define <4 x float> @test_add_ss(<4 x float> %a, <4 x float> %b) {		define <4 x float> @test_add_ss(<4 x float> %a, <4 x float> %b) {
; CHECK-LABEL: @test_add_ss(		; CHECK-LABEL: @test_add_ss(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> %b, float 1.000000e+00, i32 1		; CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.x86.sse.add.ss(<4 x float> %a, <4 x float> %b)
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 2.000000e+00, i32 2		; CHECK-NEXT: ret <4 x float> [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 3.000000e+00, i32 3
; CHECK-NEXT: [[TMP4:%.*]] = tail call <4 x float> @llvm.x86.sse.add.ss(<4 x float> %a, <4 x float> [[TMP3]])
; CHECK-NEXT: ret <4 x float> [[TMP4]]
;		;
%1 = insertelement <4 x float> %b, float 1.000000e+00, i32 1		%1 = insertelement <4 x float> %b, float 1.000000e+00, i32 1
%2 = insertelement <4 x float> %1, float 2.000000e+00, i32 2		%2 = insertelement <4 x float> %1, float 2.000000e+00, i32 2
%3 = insertelement <4 x float> %2, float 3.000000e+00, i32 3		%3 = insertelement <4 x float> %2, float 3.000000e+00, i32 3
%4 = tail call <4 x float> @llvm.x86.sse.add.ss(<4 x float> %a, <4 x float> %3)		%4 = tail call <4 x float> @llvm.x86.sse.add.ss(<4 x float> %a, <4 x float> %3)
ret <4 x float> %4		ret <4 x float> %4
}		}

define float @test_add_ss_0(float %a, float %b) {		define float @test_add_ss_0(float %a, float %b) {
; CHECK-LABEL: @test_add_ss_0(		; CHECK-LABEL: @test_add_ss_0(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> undef, float %a, i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> undef, float %a, i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 1.000000e+00, i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 1.000000e+00, i32 1
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 2.000000e+00, i32 2		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 2.000000e+00, i32 2
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> [[TMP3]], float 3.000000e+00, i32 3		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> [[TMP3]], float 3.000000e+00, i32 3
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> undef, float %b, i32 0		; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> undef, float %b, i32 0
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x float> [[TMP5]], float 4.000000e+00, i32 1		; CHECK-NEXT: [[TMP6:%.*]] = tail call <4 x float> @llvm.x86.sse.add.ss(<4 x float> [[TMP4]], <4 x float> [[TMP5]])
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> [[TMP6]], float 5.000000e+00, i32 2		; CHECK-NEXT: [[R:%.*]] = extractelement <4 x float> [[TMP6]], i32 0
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> [[TMP7]], float 6.000000e+00, i32 3
; CHECK-NEXT: [[TMP9:%.*]] = tail call <4 x float> @llvm.x86.sse.add.ss(<4 x float> [[TMP4]], <4 x float> [[TMP8]])
; CHECK-NEXT: [[R:%.*]] = extractelement <4 x float> [[TMP9]], i32 0
; CHECK-NEXT: ret float [[R]]		; CHECK-NEXT: ret float [[R]]
;		;
%1 = insertelement <4 x float> undef, float %a, i32 0		%1 = insertelement <4 x float> undef, float %a, i32 0
%2 = insertelement <4 x float> %1, float 1.000000e+00, i32 1		%2 = insertelement <4 x float> %1, float 1.000000e+00, i32 1
%3 = insertelement <4 x float> %2, float 2.000000e+00, i32 2		%3 = insertelement <4 x float> %2, float 2.000000e+00, i32 2
%4 = insertelement <4 x float> %3, float 3.000000e+00, i32 3		%4 = insertelement <4 x float> %3, float 3.000000e+00, i32 3
%5 = insertelement <4 x float> undef, float %b, i32 0		%5 = insertelement <4 x float> undef, float %b, i32 0
%6 = insertelement <4 x float> %5, float 4.000000e+00, i32 1		%6 = insertelement <4 x float> %5, float 4.000000e+00, i32 1
[...]	;
%5 = insertelement <4 x float> undef, float %b, i32 0		%5 = insertelement <4 x float> undef, float %b, i32 0
%6 = tail call <4 x float> @llvm.x86.sse.add.ss(<4 x float> %4, <4 x float> %5)		%6 = tail call <4 x float> @llvm.x86.sse.add.ss(<4 x float> %4, <4 x float> %5)
%7 = extractelement <4 x float> %6, i32 1		%7 = extractelement <4 x float> %6, i32 1
ret float %7		ret float %7
}		}

define <4 x float> @test_sub_ss(<4 x float> %a, <4 x float> %b) {		define <4 x float> @test_sub_ss(<4 x float> %a, <4 x float> %b) {
; CHECK-LABEL: @test_sub_ss(		; CHECK-LABEL: @test_sub_ss(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> %b, float 1.000000e+00, i32 1		; CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.x86.sse.sub.ss(<4 x float> %a, <4 x float> %b)
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 2.000000e+00, i32 2		; CHECK-NEXT: ret <4 x float> [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 3.000000e+00, i32 3
; CHECK-NEXT: [[TMP4:%.*]] = tail call <4 x float> @llvm.x86.sse.sub.ss(<4 x float> %a, <4 x float> [[TMP3]])
; CHECK-NEXT: ret <4 x float> [[TMP4]]
;		;
%1 = insertelement <4 x float> %b, float 1.000000e+00, i32 1		%1 = insertelement <4 x float> %b, float 1.000000e+00, i32 1
%2 = insertelement <4 x float> %1, float 2.000000e+00, i32 2		%2 = insertelement <4 x float> %1, float 2.000000e+00, i32 2
%3 = insertelement <4 x float> %2, float 3.000000e+00, i32 3		%3 = insertelement <4 x float> %2, float 3.000000e+00, i32 3
%4 = tail call <4 x float> @llvm.x86.sse.sub.ss(<4 x float> %a, <4 x float> %3)		%4 = tail call <4 x float> @llvm.x86.sse.sub.ss(<4 x float> %a, <4 x float> %3)
ret <4 x float> %4		ret <4 x float> %4
}		}

[...]	;
%5 = insertelement <4 x float> undef, float %b, i32 0		%5 = insertelement <4 x float> undef, float %b, i32 0
%6 = tail call <4 x float> @llvm.x86.sse.sub.ss(<4 x float> %4, <4 x float> %5)		%6 = tail call <4 x float> @llvm.x86.sse.sub.ss(<4 x float> %4, <4 x float> %5)
%7 = extractelement <4 x float> %6, i32 2		%7 = extractelement <4 x float> %6, i32 2
ret float %7		ret float %7
}		}

define <4 x float> @test_mul_ss(<4 x float> %a, <4 x float> %b) {		define <4 x float> @test_mul_ss(<4 x float> %a, <4 x float> %b) {
; CHECK-LABEL: @test_mul_ss(		; CHECK-LABEL: @test_mul_ss(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> %b, float 1.000000e+00, i32 1		; CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.x86.sse.mul.ss(<4 x float> %a, <4 x float> %b)
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 2.000000e+00, i32 2		; CHECK-NEXT: ret <4 x float> [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 3.000000e+00, i32 3
; CHECK-NEXT: [[TMP4:%.*]] = tail call <4 x float> @llvm.x86.sse.mul.ss(<4 x float> %a, <4 x float> [[TMP3]])
; CHECK-NEXT: ret <4 x float> [[TMP4]]
;		;
%1 = insertelement <4 x float> %b, float 1.000000e+00, i32 1		%1 = insertelement <4 x float> %b, float 1.000000e+00, i32 1
%2 = insertelement <4 x float> %1, float 2.000000e+00, i32 2		%2 = insertelement <4 x float> %1, float 2.000000e+00, i32 2
%3 = insertelement <4 x float> %2, float 3.000000e+00, i32 3		%3 = insertelement <4 x float> %2, float 3.000000e+00, i32 3
%4 = tail call <4 x float> @llvm.x86.sse.mul.ss(<4 x float> %a, <4 x float> %3)		%4 = tail call <4 x float> @llvm.x86.sse.mul.ss(<4 x float> %a, <4 x float> %3)
ret <4 x float> %4		ret <4 x float> %4
}		}

[...]	;
%5 = insertelement <4 x float> undef, float %b, i32 0		%5 = insertelement <4 x float> undef, float %b, i32 0
%6 = tail call <4 x float> @llvm.x86.sse.mul.ss(<4 x float> %4, <4 x float> %5)		%6 = tail call <4 x float> @llvm.x86.sse.mul.ss(<4 x float> %4, <4 x float> %5)
%7 = extractelement <4 x float> %6, i32 3		%7 = extractelement <4 x float> %6, i32 3
ret float %7		ret float %7
}		}

define <4 x float> @test_div_ss(<4 x float> %a, <4 x float> %b) {		define <4 x float> @test_div_ss(<4 x float> %a, <4 x float> %b) {
; CHECK-LABEL: @test_div_ss(		; CHECK-LABEL: @test_div_ss(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> %b, float 1.000000e+00, i32 1		; CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.x86.sse.div.ss(<4 x float> %a, <4 x float> %b)
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 2.000000e+00, i32 2		; CHECK-NEXT: ret <4 x float> [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 3.000000e+00, i32 3
; CHECK-NEXT: [[TMP4:%.*]] = tail call <4 x float> @llvm.x86.sse.div.ss(<4 x float> %a, <4 x float> [[TMP3]])
; CHECK-NEXT: ret <4 x float> [[TMP4]]
;		;
%1 = insertelement <4 x float> %b, float 1.000000e+00, i32 1		%1 = insertelement <4 x float> %b, float 1.000000e+00, i32 1
%2 = insertelement <4 x float> %1, float 2.000000e+00, i32 2		%2 = insertelement <4 x float> %1, float 2.000000e+00, i32 2
%3 = insertelement <4 x float> %2, float 3.000000e+00, i32 3		%3 = insertelement <4 x float> %2, float 3.000000e+00, i32 3
%4 = tail call <4 x float> @llvm.x86.sse.div.ss(<4 x float> %a, <4 x float> %3)		%4 = tail call <4 x float> @llvm.x86.sse.div.ss(<4 x float> %a, <4 x float> %3)
ret <4 x float> %4		ret <4 x float> %4
}		}

define float @test_div_ss_0(float %a, float %b) {		define float @test_div_ss_0(float %a, float %b) {
; CHECK-LABEL: @test_div_ss_0(		; CHECK-LABEL: @test_div_ss_0(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> undef, float %a, i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> undef, float %a, i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 1.000000e+00, i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 1.000000e+00, i32 1
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 2.000000e+00, i32 2		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 2.000000e+00, i32 2
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> [[TMP3]], float 3.000000e+00, i32 3		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> [[TMP3]], float 3.000000e+00, i32 3
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> undef, float %b, i32 0		; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> undef, float %b, i32 0
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x float> [[TMP5]], float 4.000000e+00, i32 1		; CHECK-NEXT: [[TMP6:%.*]] = tail call <4 x float> @llvm.x86.sse.div.ss(<4 x float> [[TMP4]], <4 x float> [[TMP5]])
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> [[TMP6]], float 5.000000e+00, i32 2		; CHECK-NEXT: [[R:%.*]] = extractelement <4 x float> [[TMP6]], i32 0
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> [[TMP7]], float 6.000000e+00, i32 3
; CHECK-NEXT: [[TMP9:%.*]] = tail call <4 x float> @llvm.x86.sse.div.ss(<4 x float> [[TMP4]], <4 x float> [[TMP8]])
; CHECK-NEXT: [[R:%.*]] = extractelement <4 x float> [[TMP9]], i32 0
; CHECK-NEXT: ret float [[R]]		; CHECK-NEXT: ret float [[R]]
;		;
%1 = insertelement <4 x float> undef, float %a, i32 0		%1 = insertelement <4 x float> undef, float %a, i32 0
%2 = insertelement <4 x float> %1, float 1.000000e+00, i32 1		%2 = insertelement <4 x float> %1, float 1.000000e+00, i32 1
%3 = insertelement <4 x float> %2, float 2.000000e+00, i32 2		%3 = insertelement <4 x float> %2, float 2.000000e+00, i32 2
%4 = insertelement <4 x float> %3, float 3.000000e+00, i32 3		%4 = insertelement <4 x float> %3, float 3.000000e+00, i32 3
%5 = insertelement <4 x float> undef, float %b, i32 0		%5 = insertelement <4 x float> undef, float %b, i32 0
%6 = insertelement <4 x float> %5, float 4.000000e+00, i32 1		%6 = insertelement <4 x float> %5, float 4.000000e+00, i32 1
[...]	;
%5 = insertelement <4 x float> undef, float %b, i32 0		%5 = insertelement <4 x float> undef, float %b, i32 0
%6 = tail call <4 x float> @llvm.x86.sse.div.ss(<4 x float> %4, <4 x float> %5)		%6 = tail call <4 x float> @llvm.x86.sse.div.ss(<4 x float> %4, <4 x float> %5)
%7 = extractelement <4 x float> %6, i32 1		%7 = extractelement <4 x float> %6, i32 1
ret float %7		ret float %7
}		}

define <4 x float> @test_min_ss(<4 x float> %a, <4 x float> %b) {		define <4 x float> @test_min_ss(<4 x float> %a, <4 x float> %b) {
; CHECK-LABEL: @test_min_ss(		; CHECK-LABEL: @test_min_ss(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> %b, float 1.000000e+00, i32 1		; CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.x86.sse.min.ss(<4 x float> %a, <4 x float> %b)
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 2.000000e+00, i32 2		; CHECK-NEXT: ret <4 x float> [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 3.000000e+00, i32 3
; CHECK-NEXT: [[TMP4:%.*]] = tail call <4 x float> @llvm.x86.sse.min.ss(<4 x float> %a, <4 x float> [[TMP3]])
; CHECK-NEXT: ret <4 x float> [[TMP4]]
;		;
%1 = insertelement <4 x float> %b, float 1.000000e+00, i32 1		%1 = insertelement <4 x float> %b, float 1.000000e+00, i32 1
%2 = insertelement <4 x float> %1, float 2.000000e+00, i32 2		%2 = insertelement <4 x float> %1, float 2.000000e+00, i32 2
%3 = insertelement <4 x float> %2, float 3.000000e+00, i32 3		%3 = insertelement <4 x float> %2, float 3.000000e+00, i32 3
%4 = tail call <4 x float> @llvm.x86.sse.min.ss(<4 x float> %a, <4 x float> %3)		%4 = tail call <4 x float> @llvm.x86.sse.min.ss(<4 x float> %a, <4 x float> %3)
ret <4 x float> %4		ret <4 x float> %4
}		}

[...]	;
%5 = insertelement <4 x float> undef, float %b, i32 0		%5 = insertelement <4 x float> undef, float %b, i32 0
%6 = tail call <4 x float> @llvm.x86.sse.min.ss(<4 x float> %4, <4 x float> %5)		%6 = tail call <4 x float> @llvm.x86.sse.min.ss(<4 x float> %4, <4 x float> %5)
%7 = extractelement <4 x float> %6, i32 2		%7 = extractelement <4 x float> %6, i32 2
ret float %7		ret float %7
}		}

define <4 x float> @test_max_ss(<4 x float> %a, <4 x float> %b) {		define <4 x float> @test_max_ss(<4 x float> %a, <4 x float> %b) {
; CHECK-LABEL: @test_max_ss(		; CHECK-LABEL: @test_max_ss(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> %b, float 1.000000e+00, i32 1		; CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.x86.sse.max.ss(<4 x float> %a, <4 x float> %b)
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 2.000000e+00, i32 2		; CHECK-NEXT: ret <4 x float> [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 3.000000e+00, i32 3
; CHECK-NEXT: [[TMP4:%.*]] = tail call <4 x float> @llvm.x86.sse.max.ss(<4 x float> %a, <4 x float> [[TMP3]])
; CHECK-NEXT: ret <4 x float> [[TMP4]]
;		;
%1 = insertelement <4 x float> %b, float 1.000000e+00, i32 1		%1 = insertelement <4 x float> %b, float 1.000000e+00, i32 1
%2 = insertelement <4 x float> %1, float 2.000000e+00, i32 2		%2 = insertelement <4 x float> %1, float 2.000000e+00, i32 2
%3 = insertelement <4 x float> %2, float 3.000000e+00, i32 3		%3 = insertelement <4 x float> %2, float 3.000000e+00, i32 3
%4 = tail call <4 x float> @llvm.x86.sse.max.ss(<4 x float> %a, <4 x float> %3)		%4 = tail call <4 x float> @llvm.x86.sse.max.ss(<4 x float> %a, <4 x float> %3)
ret <4 x float> %4		ret <4 x float> %4
}		}

[...]	;
%5 = insertelement <4 x float> undef, float %b, i32 0		%5 = insertelement <4 x float> undef, float %b, i32 0
%6 = tail call <4 x float> @llvm.x86.sse.max.ss(<4 x float> %4, <4 x float> %5)		%6 = tail call <4 x float> @llvm.x86.sse.max.ss(<4 x float> %4, <4 x float> %5)
%7 = extractelement <4 x float> %6, i32 3		%7 = extractelement <4 x float> %6, i32 3
ret float %7		ret float %7
}		}

define <4 x float> @test_cmp_ss(<4 x float> %a, <4 x float> %b) {		define <4 x float> @test_cmp_ss(<4 x float> %a, <4 x float> %b) {
; CHECK-LABEL: @test_cmp_ss(		; CHECK-LABEL: @test_cmp_ss(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> %b, float 1.000000e+00, i32 1		; CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.x86.sse.cmp.ss(<4 x float> %a, <4 x float> %b, i8 0)
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 2.000000e+00, i32 2		; CHECK-NEXT: ret <4 x float> [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 3.000000e+00, i32 3
; CHECK-NEXT: [[TMP4:%.*]] = tail call <4 x float> @llvm.x86.sse.cmp.ss(<4 x float> %a, <4 x float> [[TMP3]], i8 0)
; CHECK-NEXT: ret <4 x float> [[TMP4]]
;		;
%1 = insertelement <4 x float> %b, float 1.000000e+00, i32 1		%1 = insertelement <4 x float> %b, float 1.000000e+00, i32 1
%2 = insertelement <4 x float> %1, float 2.000000e+00, i32 2		%2 = insertelement <4 x float> %1, float 2.000000e+00, i32 2
%3 = insertelement <4 x float> %2, float 3.000000e+00, i32 3		%3 = insertelement <4 x float> %2, float 3.000000e+00, i32 3
%4 = tail call <4 x float> @llvm.x86.sse.cmp.ss(<4 x float> %a, <4 x float> %3, i8 0)		%4 = tail call <4 x float> @llvm.x86.sse.cmp.ss(<4 x float> %a, <4 x float> %3, i8 0)
ret <4 x float> %4		ret <4 x float> %4
}		}

define float @test_cmp_ss_0(float %a, float %b) {		define float @test_cmp_ss_0(float %a, float %b) {
; CHECK-LABEL: @test_cmp_ss_0(		; CHECK-LABEL: @test_cmp_ss_0(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> undef, float %a, i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> undef, float %a, i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 1.000000e+00, i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 1.000000e+00, i32 1
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 2.000000e+00, i32 2		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 2.000000e+00, i32 2
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> [[TMP3]], float 3.000000e+00, i32 3		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> [[TMP3]], float 3.000000e+00, i32 3
; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> undef, float %b, i32 0		; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> undef, float %b, i32 0
; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x float> [[TMP5]], float 4.000000e+00, i32 1		; CHECK-NEXT: [[TMP6:%.*]] = tail call <4 x float> @llvm.x86.sse.cmp.ss(<4 x float> [[TMP4]], <4 x float> [[TMP5]], i8 0)
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> [[TMP6]], float 5.000000e+00, i32 2		; CHECK-NEXT: [[R:%.*]] = extractelement <4 x float> [[TMP6]], i32 0
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> [[TMP7]], float 6.000000e+00, i32 3
; CHECK-NEXT: [[TMP9:%.*]] = tail call <4 x float> @llvm.x86.sse.cmp.ss(<4 x float> [[TMP4]], <4 x float> [[TMP8]], i8 0)
; CHECK-NEXT: [[R:%.*]] = extractelement <4 x float> [[TMP9]], i32 0
; CHECK-NEXT: ret float [[R]]		; CHECK-NEXT: ret float [[R]]
;		;
%1 = insertelement <4 x float> undef, float %a, i32 0		%1 = insertelement <4 x float> undef, float %a, i32 0
%2 = insertelement <4 x float> %1, float 1.000000e+00, i32 1		%2 = insertelement <4 x float> %1, float 1.000000e+00, i32 1
%3 = insertelement <4 x float> %2, float 2.000000e+00, i32 2		%3 = insertelement <4 x float> %2, float 2.000000e+00, i32 2
%4 = insertelement <4 x float> %3, float 3.000000e+00, i32 3		%4 = insertelement <4 x float> %3, float 3.000000e+00, i32 3
%5 = insertelement <4 x float> undef, float %b, i32 0		%5 = insertelement <4 x float> undef, float %b, i32 0
%6 = insertelement <4 x float> %5, float 4.000000e+00, i32 1		%6 = insertelement <4 x float> %5, float 4.000000e+00, i32 1
[...]

llvm/trunk/test/Transforms/InstCombine/x86-sse2.ll

[...]	;
%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1		%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
%3 = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %2)		%3 = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %2)
%4 = extractelement <2 x double> %3, i32 1		%4 = extractelement <2 x double> %3, i32 1
ret double %4		ret double %4
}		}

define <2 x double> @test_add_sd(<2 x double> %a, <2 x double> %b) {		define <2 x double> @test_add_sd(<2 x double> %a, <2 x double> %b) {
; CHECK-LABEL: @test_add_sd(		; CHECK-LABEL: @test_add_sd(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> %b, double 2.000000e+00, i32 1		; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.add.sd(<2 x double> %a, <2 x double> %b)
; CHECK-NEXT: [[TMP2:%.*]] = tail call <2 x double> @llvm.x86.sse2.add.sd(<2 x double> %a, <2 x double> [[TMP1]])		; CHECK-NEXT: ret <2 x double> [[TMP1]]
; CHECK-NEXT: ret <2 x double> [[TMP2]]
;		;
%1 = insertelement <2 x double> %b, double 2.000000e+00, i32 1		%1 = insertelement <2 x double> %b, double 2.000000e+00, i32 1
%2 = tail call <2 x double> @llvm.x86.sse2.add.sd(<2 x double> %a, <2 x double> %1)		%2 = tail call <2 x double> @llvm.x86.sse2.add.sd(<2 x double> %a, <2 x double> %1)
ret <2 x double> %2		ret <2 x double> %2
}		}

define double @test_add_sd_0(double %a, double %b) {		define double @test_add_sd_0(double %a, double %b) {
; CHECK-LABEL: @test_add_sd_0(		; CHECK-LABEL: @test_add_sd_0(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %a, i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %a, i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double 1.000000e+00, i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double 1.000000e+00, i32 1
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> undef, double %b, i32 0		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> undef, double %b, i32 0
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double 2.000000e+00, i32 1		; CHECK-NEXT: [[TMP4:%.*]] = tail call <2 x double> @llvm.x86.sse2.add.sd(<2 x double> [[TMP2]], <2 x double> [[TMP3]])
; CHECK-NEXT: [[TMP5:%.*]] = tail call <2 x double> @llvm.x86.sse2.add.sd(<2 x double> [[TMP2]], <2 x double> [[TMP4]])		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP5]], i32 0		; CHECK-NEXT: ret double [[TMP5]]
; CHECK-NEXT: ret double [[TMP6]]
;		;
%1 = insertelement <2 x double> undef, double %a, i32 0		%1 = insertelement <2 x double> undef, double %a, i32 0
%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1		%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
%3 = insertelement <2 x double> undef, double %b, i32 0		%3 = insertelement <2 x double> undef, double %b, i32 0
%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1		%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
%5 = tail call <2 x double> @llvm.x86.sse2.add.sd(<2 x double> %2, <2 x double> %4)		%5 = tail call <2 x double> @llvm.x86.sse2.add.sd(<2 x double> %2, <2 x double> %4)
%6 = extractelement <2 x double> %5, i32 0		%6 = extractelement <2 x double> %5, i32 0
ret double %6		ret double %6
}		}

define double @test_add_sd_1(double %a, double %b) {		define double @test_add_sd_1(double %a, double %b) {
; CHECK-LABEL: @test_add_sd_1(		; CHECK-LABEL: @test_add_sd_1(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %a, i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %a, i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double 1.000000e+00, i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double 1.000000e+00, i32 1
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> undef, double %b, i32 0		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> undef, double %b, i32 0
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double 2.000000e+00, i32 1		; CHECK-NEXT: [[TMP4:%.*]] = tail call <2 x double> @llvm.x86.sse2.add.sd(<2 x double> [[TMP2]], <2 x double> [[TMP3]])
; CHECK-NEXT: [[TMP5:%.*]] = tail call <2 x double> @llvm.x86.sse2.add.sd(<2 x double> [[TMP2]], <2 x double> [[TMP4]])		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP5]], i32 1		; CHECK-NEXT: ret double [[TMP5]]
; CHECK-NEXT: ret double [[TMP6]]
;		;
%1 = insertelement <2 x double> undef, double %a, i32 0		%1 = insertelement <2 x double> undef, double %a, i32 0
%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1		%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
%3 = insertelement <2 x double> undef, double %b, i32 0		%3 = insertelement <2 x double> undef, double %b, i32 0
%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1		%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
%5 = tail call <2 x double> @llvm.x86.sse2.add.sd(<2 x double> %2, <2 x double> %4)		%5 = tail call <2 x double> @llvm.x86.sse2.add.sd(<2 x double> %2, <2 x double> %4)
%6 = extractelement <2 x double> %5, i32 1		%6 = extractelement <2 x double> %5, i32 1
ret double %6		ret double %6
}		}

define <2 x double> @test_sub_sd(<2 x double> %a, <2 x double> %b) {		define <2 x double> @test_sub_sd(<2 x double> %a, <2 x double> %b) {
; CHECK-LABEL: @test_sub_sd(		; CHECK-LABEL: @test_sub_sd(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> %b, double 2.000000e+00, i32 1		; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.sub.sd(<2 x double> %a, <2 x double> %b)
; CHECK-NEXT: [[TMP2:%.*]] = tail call <2 x double> @llvm.x86.sse2.sub.sd(<2 x double> %a, <2 x double> [[TMP1]])		; CHECK-NEXT: ret <2 x double> [[TMP1]]
; CHECK-NEXT: ret <2 x double> [[TMP2]]
;		;
%1 = insertelement <2 x double> %b, double 2.000000e+00, i32 1		%1 = insertelement <2 x double> %b, double 2.000000e+00, i32 1
%2 = tail call <2 x double> @llvm.x86.sse2.sub.sd(<2 x double> %a, <2 x double> %1)		%2 = tail call <2 x double> @llvm.x86.sse2.sub.sd(<2 x double> %a, <2 x double> %1)
ret <2 x double> %2		ret <2 x double> %2
}		}

define double @test_sub_sd_0(double %a, double %b) {		define double @test_sub_sd_0(double %a, double %b) {
; CHECK-LABEL: @test_sub_sd_0(		; CHECK-LABEL: @test_sub_sd_0(
; CHECK-NEXT: [[TMP1:%.*]] = fsub double %a, %b		; CHECK-NEXT: [[TMP1:%.*]] = fsub double %a, %b
; CHECK-NEXT: ret double [[TMP1]]		; CHECK-NEXT: ret double [[TMP1]]
;		;
%1 = insertelement <2 x double> undef, double %a, i32 0		%1 = insertelement <2 x double> undef, double %a, i32 0
%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1		%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
%3 = insertelement <2 x double> undef, double %b, i32 0		%3 = insertelement <2 x double> undef, double %b, i32 0
%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1		%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
%5 = tail call <2 x double> @llvm.x86.sse2.sub.sd(<2 x double> %2, <2 x double> %4)		%5 = tail call <2 x double> @llvm.x86.sse2.sub.sd(<2 x double> %2, <2 x double> %4)
%6 = extractelement <2 x double> %5, i32 0		%6 = extractelement <2 x double> %5, i32 0
ret double %6		ret double %6
}		}

define double @test_sub_sd_1(double %a, double %b) {		define double @test_sub_sd_1(double %a, double %b) {
; CHECK-LABEL: @test_sub_sd_1(		; CHECK-LABEL: @test_sub_sd_1(
; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.sub.sd(<2 x double> <double undef, double 1.000000e+00>, <2 x double> <double undef, double 2.000000e+00>)		; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.sub.sd(<2 x double> <double undef, double 1.000000e+00>, <2 x double> undef)
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[TMP1]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
; CHECK-NEXT: ret double [[TMP2]]		; CHECK-NEXT: ret double [[TMP2]]
;		;
%1 = insertelement <2 x double> undef, double %a, i32 0		%1 = insertelement <2 x double> undef, double %a, i32 0
%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1		%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
%3 = insertelement <2 x double> undef, double %b, i32 0		%3 = insertelement <2 x double> undef, double %b, i32 0
%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1		%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
%5 = tail call <2 x double> @llvm.x86.sse2.sub.sd(<2 x double> %2, <2 x double> %4)		%5 = tail call <2 x double> @llvm.x86.sse2.sub.sd(<2 x double> %2, <2 x double> %4)
%6 = extractelement <2 x double> %5, i32 1		%6 = extractelement <2 x double> %5, i32 1
ret double %6		ret double %6
}		}

define <2 x double> @test_mul_sd(<2 x double> %a, <2 x double> %b) {		define <2 x double> @test_mul_sd(<2 x double> %a, <2 x double> %b) {
; CHECK-LABEL: @test_mul_sd(		; CHECK-LABEL: @test_mul_sd(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> %b, double 2.000000e+00, i32 1		; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.mul.sd(<2 x double> %a, <2 x double> %b)
; CHECK-NEXT: [[TMP2:%.*]] = tail call <2 x double> @llvm.x86.sse2.mul.sd(<2 x double> %a, <2 x double> [[TMP1]])		; CHECK-NEXT: ret <2 x double> [[TMP1]]
; CHECK-NEXT: ret <2 x double> [[TMP2]]
;		;
%1 = insertelement <2 x double> %b, double 2.000000e+00, i32 1		%1 = insertelement <2 x double> %b, double 2.000000e+00, i32 1
%2 = tail call <2 x double> @llvm.x86.sse2.mul.sd(<2 x double> %a, <2 x double> %1)		%2 = tail call <2 x double> @llvm.x86.sse2.mul.sd(<2 x double> %a, <2 x double> %1)
ret <2 x double> %2		ret <2 x double> %2
}		}

define double @test_mul_sd_0(double %a, double %b) {		define double @test_mul_sd_0(double %a, double %b) {
; CHECK-LABEL: @test_mul_sd_0(		; CHECK-LABEL: @test_mul_sd_0(
; CHECK-NEXT: [[TMP1:%.*]] = fmul double %a, %b		; CHECK-NEXT: [[TMP1:%.*]] = fmul double %a, %b
; CHECK-NEXT: ret double [[TMP1]]		; CHECK-NEXT: ret double [[TMP1]]
;		;
%1 = insertelement <2 x double> undef, double %a, i32 0		%1 = insertelement <2 x double> undef, double %a, i32 0
%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1		%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
%3 = insertelement <2 x double> undef, double %b, i32 0		%3 = insertelement <2 x double> undef, double %b, i32 0
%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1		%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
%5 = tail call <2 x double> @llvm.x86.sse2.mul.sd(<2 x double> %2, <2 x double> %4)		%5 = tail call <2 x double> @llvm.x86.sse2.mul.sd(<2 x double> %2, <2 x double> %4)
%6 = extractelement <2 x double> %5, i32 0		%6 = extractelement <2 x double> %5, i32 0
ret double %6		ret double %6
}		}

define double @test_mul_sd_1(double %a, double %b) {		define double @test_mul_sd_1(double %a, double %b) {
; CHECK-LABEL: @test_mul_sd_1(		; CHECK-LABEL: @test_mul_sd_1(
; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.mul.sd(<2 x double> <double undef, double 1.000000e+00>, <2 x double> <double undef, double 2.000000e+00>)		; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.mul.sd(<2 x double> <double undef, double 1.000000e+00>, <2 x double> undef)
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[TMP1]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
; CHECK-NEXT: ret double [[TMP2]]		; CHECK-NEXT: ret double [[TMP2]]
;		;
%1 = insertelement <2 x double> undef, double %a, i32 0		%1 = insertelement <2 x double> undef, double %a, i32 0
%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1		%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
%3 = insertelement <2 x double> undef, double %b, i32 0		%3 = insertelement <2 x double> undef, double %b, i32 0
%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1		%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
%5 = tail call <2 x double> @llvm.x86.sse2.mul.sd(<2 x double> %2, <2 x double> %4)		%5 = tail call <2 x double> @llvm.x86.sse2.mul.sd(<2 x double> %2, <2 x double> %4)
%6 = extractelement <2 x double> %5, i32 1		%6 = extractelement <2 x double> %5, i32 1
ret double %6		ret double %6
}		}

define <2 x double> @test_div_sd(<2 x double> %a, <2 x double> %b) {		define <2 x double> @test_div_sd(<2 x double> %a, <2 x double> %b) {
; CHECK-LABEL: @test_div_sd(		; CHECK-LABEL: @test_div_sd(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> %b, double 2.000000e+00, i32 1		; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.div.sd(<2 x double> %a, <2 x double> %b)
; CHECK-NEXT: [[TMP2:%.*]] = tail call <2 x double> @llvm.x86.sse2.div.sd(<2 x double> %a, <2 x double> [[TMP1]])		; CHECK-NEXT: ret <2 x double> [[TMP1]]
; CHECK-NEXT: ret <2 x double> [[TMP2]]
;		;
%1 = insertelement <2 x double> %b, double 2.000000e+00, i32 1		%1 = insertelement <2 x double> %b, double 2.000000e+00, i32 1
%2 = tail call <2 x double> @llvm.x86.sse2.div.sd(<2 x double> %a, <2 x double> %1)		%2 = tail call <2 x double> @llvm.x86.sse2.div.sd(<2 x double> %a, <2 x double> %1)
ret <2 x double> %2		ret <2 x double> %2
}		}

define double @test_div_sd_0(double %a, double %b) {		define double @test_div_sd_0(double %a, double %b) {
; CHECK-LABEL: @test_div_sd_0(		; CHECK-LABEL: @test_div_sd_0(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %a, i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %a, i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double 1.000000e+00, i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double 1.000000e+00, i32 1
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> undef, double %b, i32 0		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> undef, double %b, i32 0
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double 2.000000e+00, i32 1		; CHECK-NEXT: [[TMP4:%.*]] = tail call <2 x double> @llvm.x86.sse2.div.sd(<2 x double> [[TMP2]], <2 x double> [[TMP3]])
; CHECK-NEXT: [[TMP5:%.*]] = tail call <2 x double> @llvm.x86.sse2.div.sd(<2 x double> [[TMP2]], <2 x double> [[TMP4]])		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP5]], i32 0		; CHECK-NEXT: ret double [[TMP5]]
; CHECK-NEXT: ret double [[TMP6]]
;		;
%1 = insertelement <2 x double> undef, double %a, i32 0		%1 = insertelement <2 x double> undef, double %a, i32 0
%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1		%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
%3 = insertelement <2 x double> undef, double %b, i32 0		%3 = insertelement <2 x double> undef, double %b, i32 0
%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1		%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
%5 = tail call <2 x double> @llvm.x86.sse2.div.sd(<2 x double> %2, <2 x double> %4)		%5 = tail call <2 x double> @llvm.x86.sse2.div.sd(<2 x double> %2, <2 x double> %4)
%6 = extractelement <2 x double> %5, i32 0		%6 = extractelement <2 x double> %5, i32 0
ret double %6		ret double %6
}		}

define double @test_div_sd_1(double %a, double %b) {		define double @test_div_sd_1(double %a, double %b) {
; CHECK-LABEL: @test_div_sd_1(		; CHECK-LABEL: @test_div_sd_1(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %a, i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %a, i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double 1.000000e+00, i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double 1.000000e+00, i32 1
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> undef, double %b, i32 0		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> undef, double %b, i32 0
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double 2.000000e+00, i32 1		; CHECK-NEXT: [[TMP4:%.*]] = tail call <2 x double> @llvm.x86.sse2.div.sd(<2 x double> [[TMP2]], <2 x double> [[TMP3]])
; CHECK-NEXT: [[TMP5:%.*]] = tail call <2 x double> @llvm.x86.sse2.div.sd(<2 x double> [[TMP2]], <2 x double> [[TMP4]])		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP5]], i32 1		; CHECK-NEXT: ret double [[TMP5]]
; CHECK-NEXT: ret double [[TMP6]]
;		;
%1 = insertelement <2 x double> undef, double %a, i32 0		%1 = insertelement <2 x double> undef, double %a, i32 0
%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1		%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
%3 = insertelement <2 x double> undef, double %b, i32 0		%3 = insertelement <2 x double> undef, double %b, i32 0
%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1		%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
%5 = tail call <2 x double> @llvm.x86.sse2.div.sd(<2 x double> %2, <2 x double> %4)		%5 = tail call <2 x double> @llvm.x86.sse2.div.sd(<2 x double> %2, <2 x double> %4)
%6 = extractelement <2 x double> %5, i32 1		%6 = extractelement <2 x double> %5, i32 1
ret double %6		ret double %6
}		}

define <2 x double> @test_min_sd(<2 x double> %a, <2 x double> %b) {		define <2 x double> @test_min_sd(<2 x double> %a, <2 x double> %b) {
; CHECK-LABEL: @test_min_sd(		; CHECK-LABEL: @test_min_sd(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> %b, double 2.000000e+00, i32 1		; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.min.sd(<2 x double> %a, <2 x double> %b)
; CHECK-NEXT: [[TMP2:%.*]] = tail call <2 x double> @llvm.x86.sse2.min.sd(<2 x double> %a, <2 x double> [[TMP1]])		; CHECK-NEXT: ret <2 x double> [[TMP1]]
; CHECK-NEXT: ret <2 x double> [[TMP2]]
;		;
%1 = insertelement <2 x double> %b, double 2.000000e+00, i32 1		%1 = insertelement <2 x double> %b, double 2.000000e+00, i32 1
%2 = tail call <2 x double> @llvm.x86.sse2.min.sd(<2 x double> %a, <2 x double> %1)		%2 = tail call <2 x double> @llvm.x86.sse2.min.sd(<2 x double> %a, <2 x double> %1)
ret <2 x double> %2		ret <2 x double> %2
}		}

define double @test_min_sd_0(double %a, double %b) {		define double @test_min_sd_0(double %a, double %b) {
; CHECK-LABEL: @test_min_sd_0(		; CHECK-LABEL: @test_min_sd_0(
[... 9 lines not shown ...]	;
%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1		%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
%5 = tail call <2 x double> @llvm.x86.sse2.min.sd(<2 x double> %2, <2 x double> %4)		%5 = tail call <2 x double> @llvm.x86.sse2.min.sd(<2 x double> %2, <2 x double> %4)
%6 = extractelement <2 x double> %5, i32 0		%6 = extractelement <2 x double> %5, i32 0
ret double %6		ret double %6
}		}

define double @test_min_sd_1(double %a, double %b) {		define double @test_min_sd_1(double %a, double %b) {
; CHECK-LABEL: @test_min_sd_1(		; CHECK-LABEL: @test_min_sd_1(
; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.min.sd(<2 x double> <double undef, double 1.000000e+00>, <2 x double> <double undef, double 2.000000e+00>)		; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.min.sd(<2 x double> <double undef, double 1.000000e+00>, <2 x double> undef)
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[TMP1]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
; CHECK-NEXT: ret double [[TMP2]]		; CHECK-NEXT: ret double [[TMP2]]
;		;
%1 = insertelement <2 x double> undef, double %a, i32 0		%1 = insertelement <2 x double> undef, double %a, i32 0
%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1		%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
%3 = insertelement <2 x double> undef, double %b, i32 0		%3 = insertelement <2 x double> undef, double %b, i32 0
%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1		%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
%5 = tail call <2 x double> @llvm.x86.sse2.min.sd(<2 x double> %2, <2 x double> %4)		%5 = tail call <2 x double> @llvm.x86.sse2.min.sd(<2 x double> %2, <2 x double> %4)
%6 = extractelement <2 x double> %5, i32 1		%6 = extractelement <2 x double> %5, i32 1
ret double %6		ret double %6
}		}

define <2 x double> @test_max_sd(<2 x double> %a, <2 x double> %b) {		define <2 x double> @test_max_sd(<2 x double> %a, <2 x double> %b) {
; CHECK-LABEL: @test_max_sd(		; CHECK-LABEL: @test_max_sd(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> %b, double 2.000000e+00, i32 1		; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.max.sd(<2 x double> %a, <2 x double> %b)
; CHECK-NEXT: [[TMP2:%.*]] = tail call <2 x double> @llvm.x86.sse2.max.sd(<2 x double> %a, <2 x double> [[TMP1]])		; CHECK-NEXT: ret <2 x double> [[TMP1]]
; CHECK-NEXT: ret <2 x double> [[TMP2]]
;		;
%1 = insertelement <2 x double> %b, double 2.000000e+00, i32 1		%1 = insertelement <2 x double> %b, double 2.000000e+00, i32 1
%2 = tail call <2 x double> @llvm.x86.sse2.max.sd(<2 x double> %a, <2 x double> %1)		%2 = tail call <2 x double> @llvm.x86.sse2.max.sd(<2 x double> %a, <2 x double> %1)
ret <2 x double> %2		ret <2 x double> %2
}		}

define double @test_max_sd_0(double %a, double %b) {		define double @test_max_sd_0(double %a, double %b) {
; CHECK-LABEL: @test_max_sd_0(		; CHECK-LABEL: @test_max_sd_0(
[... 9 lines not shown ...]	;
%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1		%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
%5 = tail call <2 x double> @llvm.x86.sse2.max.sd(<2 x double> %2, <2 x double> %4)		%5 = tail call <2 x double> @llvm.x86.sse2.max.sd(<2 x double> %2, <2 x double> %4)
%6 = extractelement <2 x double> %5, i32 0		%6 = extractelement <2 x double> %5, i32 0
ret double %6		ret double %6
}		}

define double @test_max_sd_1(double %a, double %b) {		define double @test_max_sd_1(double %a, double %b) {
; CHECK-LABEL: @test_max_sd_1(		; CHECK-LABEL: @test_max_sd_1(
; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.max.sd(<2 x double> <double undef, double 1.000000e+00>, <2 x double> <double undef, double 2.000000e+00>)		; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.max.sd(<2 x double> <double undef, double 1.000000e+00>, <2 x double> undef)
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[TMP1]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
; CHECK-NEXT: ret double [[TMP2]]		; CHECK-NEXT: ret double [[TMP2]]
;		;
%1 = insertelement <2 x double> undef, double %a, i32 0		%1 = insertelement <2 x double> undef, double %a, i32 0
%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1		%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
%3 = insertelement <2 x double> undef, double %b, i32 0		%3 = insertelement <2 x double> undef, double %b, i32 0
%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1		%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
%5 = tail call <2 x double> @llvm.x86.sse2.max.sd(<2 x double> %2, <2 x double> %4)		%5 = tail call <2 x double> @llvm.x86.sse2.max.sd(<2 x double> %2, <2 x double> %4)
%6 = extractelement <2 x double> %5, i32 1		%6 = extractelement <2 x double> %5, i32 1
ret double %6		ret double %6
}		}

define <2 x double> @test_cmp_sd(<2 x double> %a, <2 x double> %b) {		define <2 x double> @test_cmp_sd(<2 x double> %a, <2 x double> %b) {
; CHECK-LABEL: @test_cmp_sd(		; CHECK-LABEL: @test_cmp_sd(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> %b, double 2.000000e+00, i32 1		; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse2.cmp.sd(<2 x double> %a, <2 x double> %b, i8 0)
; CHECK-NEXT: [[TMP2:%.*]] = tail call <2 x double> @llvm.x86.sse2.cmp.sd(<2 x double> %a, <2 x double> [[TMP1]], i8 0)		; CHECK-NEXT: ret <2 x double> [[TMP1]]
; CHECK-NEXT: ret <2 x double> [[TMP2]]
;		;
%1 = insertelement <2 x double> %b, double 2.000000e+00, i32 1		%1 = insertelement <2 x double> %b, double 2.000000e+00, i32 1
%2 = tail call <2 x double> @llvm.x86.sse2.cmp.sd(<2 x double> %a, <2 x double> %1, i8 0)		%2 = tail call <2 x double> @llvm.x86.sse2.cmp.sd(<2 x double> %a, <2 x double> %1, i8 0)
ret <2 x double> %2		ret <2 x double> %2
}		}

define double @test_cmp_sd_0(double %a, double %b) {		define double @test_cmp_sd_0(double %a, double %b) {
; CHECK-LABEL: @test_cmp_sd_0(		; CHECK-LABEL: @test_cmp_sd_0(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %a, i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %a, i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double 1.000000e+00, i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double 1.000000e+00, i32 1
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> undef, double %b, i32 0		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> undef, double %b, i32 0
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double 2.000000e+00, i32 1		; CHECK-NEXT: [[TMP4:%.*]] = tail call <2 x double> @llvm.x86.sse2.cmp.sd(<2 x double> [[TMP2]], <2 x double> [[TMP3]], i8 0)
; CHECK-NEXT: [[TMP5:%.*]] = tail call <2 x double> @llvm.x86.sse2.cmp.sd(<2 x double> [[TMP2]], <2 x double> [[TMP4]], i8 0)		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP5]], i32 0		; CHECK-NEXT: ret double [[TMP5]]
; CHECK-NEXT: ret double [[TMP6]]
;		;
%1 = insertelement <2 x double> undef, double %a, i32 0		%1 = insertelement <2 x double> undef, double %a, i32 0
%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1		%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
%3 = insertelement <2 x double> undef, double %b, i32 0		%3 = insertelement <2 x double> undef, double %b, i32 0
%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1		%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
%5 = tail call <2 x double> @llvm.x86.sse2.cmp.sd(<2 x double> %2, <2 x double> %4, i8 0)		%5 = tail call <2 x double> @llvm.x86.sse2.cmp.sd(<2 x double> %2, <2 x double> %4, i8 0)
%6 = extractelement <2 x double> %5, i32 0		%6 = extractelement <2 x double> %5, i32 0
ret double %6		ret double %6
}		}

define double @test_cmp_sd_1(double %a, double %b) {		define double @test_cmp_sd_1(double %a, double %b) {
; CHECK-LABEL: @test_cmp_sd_1(		; CHECK-LABEL: @test_cmp_sd_1(
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %a, i32 0		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %a, i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double 1.000000e+00, i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double 1.000000e+00, i32 1
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> undef, double %b, i32 0		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> undef, double %b, i32 0
; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double 2.000000e+00, i32 1		; CHECK-NEXT: [[TMP4:%.*]] = tail call <2 x double> @llvm.x86.sse2.cmp.sd(<2 x double> [[TMP2]], <2 x double> [[TMP3]], i8 0)
; CHECK-NEXT: [[TMP5:%.*]] = tail call <2 x double> @llvm.x86.sse2.cmp.sd(<2 x double> [[TMP2]], <2 x double> [[TMP4]], i8 0)		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP5]], i32 1		; CHECK-NEXT: ret double [[TMP5]]
; CHECK-NEXT: ret double [[TMP6]]
;		;
%1 = insertelement <2 x double> undef, double %a, i32 0		%1 = insertelement <2 x double> undef, double %a, i32 0
%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1		%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
%3 = insertelement <2 x double> undef, double %b, i32 0		%3 = insertelement <2 x double> undef, double %b, i32 0
%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1		%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
%5 = tail call <2 x double> @llvm.x86.sse2.cmp.sd(<2 x double> %2, <2 x double> %4, i8 0)		%5 = tail call <2 x double> @llvm.x86.sse2.cmp.sd(<2 x double> %2, <2 x double> %4, i8 0)
%6 = extractelement <2 x double> %5, i32 1		%6 = extractelement <2 x double> %5, i32 1
ret double %6		ret double %6
[... 205 lines not shown ...]
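
To make the expected-output changes in these tests easier to follow, here is a minimal C++ sketch of the lane behaviour described in the summary. It is a toy model, not the InstCombine implementation. The names `V2`, `addSd`, `roundSd` and `checkDemandedLanes` are hypothetical, and `std::ceil` is only a stand-in for whatever rounding mode the immediate operand selects.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Two-lane vector standing in for <2 x double>. Hypothetical name.
using V2 = std::array<double, 2>;

// Toy model of a binary scalar op such as add.sd:
// lane 0 is computed from lane 0 of both inputs,
// lane 1 is passed through from the first input.
// Lane 1 of the second input is never read.
V2 addSd(const V2 &a, const V2 &b) { return {a[0] + b[0], a[1]}; }

// Toy model of round.sd:
// lane 0 is the rounded lane 0 of the second input,
// lane 1 is passed through from the first input.
// Lane 0 of the first input and lane 1 of the second are never read.
// std::ceil is an assumption of this sketch, not the real rounding control.
V2 roundSd(const V2 &a, const V2 &b) { return {std::ceil(b[0]), a[1]}; }

// Changing a lane that is not demanded leaves the result unchanged.
void checkDemandedLanes() {
  const V2 a{1.0, 5.0};
  assert(addSd(a, {2.0, 7.0}) == addSd(a, {2.0, -99.0}));
  assert(roundSd({1.0, 5.0}, {2.5, 7.0}) == roundSd({-4.0, 5.0}, {2.5, 0.0}));
}
```

In this model, overwriting lane 1 of the second input never changes the result of `addSd`, which matches the `insertelement` into lane 1 of `%b` disappearing from the expected output of `test_add_sd`. Likewise `roundSd` ignores lane 0 of its first input, consistent with the `undef` lane 0 in the first operand of the updated round tests.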

llvm/trunk/test/Transforms/InstCombine/x86-sse41.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S | FileCheck %s			; RUN: opt < %s -instcombine -S | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

	define <2 x double> @test_round_sd(<2 x double> %a, <2 x double> %b) {			define <2 x double> @test_round_sd(<2 x double> %a, <2 x double> %b) {
	; CHECK-LABEL: @test_round_sd(			; CHECK-LABEL: @test_round_sd(
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> %a, double 1.000000e+00, i32 0			; CHECK-NEXT: [[TMP1:%.*]] = tail call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> %a, <2 x double> %b, i32 10)
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> %b, double 2.000000e+00, i32 1			; CHECK-NEXT: ret <2 x double> [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = tail call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> [[TMP1]], <2 x double> [[TMP2]], i32 10)
	; CHECK-NEXT: ret <2 x double> [[TMP3]]
	;			;
	%1 = insertelement <2 x double> %a, double 1.000000e+00, i32 0			%1 = insertelement <2 x double> %a, double 1.000000e+00, i32 0
	%2 = insertelement <2 x double> %b, double 2.000000e+00, i32 1			%2 = insertelement <2 x double> %b, double 2.000000e+00, i32 1
	%3 = tail call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> %1, <2 x double> %2, i32 10)			%3 = tail call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> %1, <2 x double> %2, i32 10)
	ret <2 x double> %3			ret <2 x double> %3
	}			}

	define double @test_round_sd_0(double %a, double %b) {			define double @test_round_sd_0(double %a, double %b) {
	; CHECK-LABEL: @test_round_sd_0(			; CHECK-LABEL: @test_round_sd_0(
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %a, i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %b, i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double 1.000000e+00, i32 1			; CHECK-NEXT: [[TMP2:%.*]] = tail call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> <double undef, double 1.000000e+00>, <2 x double> [[TMP1]], i32 10)
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> undef, double %b, i32 0			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double 2.000000e+00, i32 1			; CHECK-NEXT: ret double [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = tail call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> [[TMP2]], <2 x double> [[TMP4]], i32 10)
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP5]], i32 0
	; CHECK-NEXT: ret double [[TMP6]]
	;			;
	%1 = insertelement <2 x double> undef, double %a, i32 0			%1 = insertelement <2 x double> undef, double %a, i32 0
	%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1			%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
	%3 = insertelement <2 x double> undef, double %b, i32 0			%3 = insertelement <2 x double> undef, double %b, i32 0
	%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1			%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
	%5 = tail call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> %2, <2 x double> %4, i32 10)			%5 = tail call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> %2, <2 x double> %4, i32 10)
	%6 = extractelement <2 x double> %5, i32 0			%6 = extractelement <2 x double> %5, i32 0
	ret double %6			ret double %6
	}			}

	define double @test_round_sd_1(double %a, double %b) {			define double @test_round_sd_1(double %a, double %b) {
	; CHECK-LABEL: @test_round_sd_1(			; CHECK-LABEL: @test_round_sd_1(
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %a, i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double %b, i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double 1.000000e+00, i32 1			; CHECK-NEXT: [[TMP2:%.*]] = tail call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> <double undef, double 1.000000e+00>, <2 x double> [[TMP1]], i32 10)
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> undef, double %b, i32 0			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double 2.000000e+00, i32 1			; CHECK-NEXT: ret double [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = tail call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> [[TMP2]], <2 x double> [[TMP4]], i32 10)
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP5]], i32 1
	; CHECK-NEXT: ret double [[TMP6]]
	;			;
	%1 = insertelement <2 x double> undef, double %a, i32 0			%1 = insertelement <2 x double> undef, double %a, i32 0
	%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1			%2 = insertelement <2 x double> %1, double 1.000000e+00, i32 1
	%3 = insertelement <2 x double> undef, double %b, i32 0			%3 = insertelement <2 x double> undef, double %b, i32 0
	%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1			%4 = insertelement <2 x double> %3, double 2.000000e+00, i32 1
	%5 = tail call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> %2, <2 x double> %4, i32 10)			%5 = tail call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> %2, <2 x double> %4, i32 10)
	%6 = extractelement <2 x double> %5, i32 1			%6 = extractelement <2 x double> %5, i32 1
	ret double %6			ret double %6
	}			}

	define <4 x float> @test_round_ss(<4 x float> %a, <4 x float> %b) {			define <4 x float> @test_round_ss(<4 x float> %a, <4 x float> %b) {
	; CHECK-LABEL: @test_round_ss(			; CHECK-LABEL: @test_round_ss(
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> %a, float 1.000000e+00, i32 1			; CHECK-NEXT: [[TMP1:%.*]] = tail call <4 x float> @llvm.x86.sse41.round.ss(<4 x float> <float undef, float 1.000000e+00, float 2.000000e+00, float 3.000000e+00>, <4 x float> %b, i32 10)
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 2.000000e+00, i32 2			; CHECK-NEXT: ret <4 x float> [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 3.000000e+00, i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> %b, float 1.000000e+00, i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float 2.000000e+00, i32 2
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x float> [[TMP5]], float 3.000000e+00, i32 3
	; CHECK-NEXT: [[TMP7:%.*]] = tail call <4 x float> @llvm.x86.sse41.round.ss(<4 x float> [[TMP3]], <4 x float> [[TMP6]], i32 10)
	; CHECK-NEXT: ret <4 x float> [[TMP7]]
	;			;
	%1 = insertelement <4 x float> %a, float 1.000000e+00, i32 1			%1 = insertelement <4 x float> %a, float 1.000000e+00, i32 1
	%2 = insertelement <4 x float> %1, float 2.000000e+00, i32 2			%2 = insertelement <4 x float> %1, float 2.000000e+00, i32 2
	%3 = insertelement <4 x float> %2, float 3.000000e+00, i32 3			%3 = insertelement <4 x float> %2, float 3.000000e+00, i32 3
	%4 = insertelement <4 x float> %b, float 1.000000e+00, i32 1			%4 = insertelement <4 x float> %b, float 1.000000e+00, i32 1
	%5 = insertelement <4 x float> %4, float 2.000000e+00, i32 2			%5 = insertelement <4 x float> %4, float 2.000000e+00, i32 2
	%6 = insertelement <4 x float> %5, float 3.000000e+00, i32 3			%6 = insertelement <4 x float> %5, float 3.000000e+00, i32 3
	%7 = tail call <4 x float> @llvm.x86.sse41.round.ss(<4 x float> %3, <4 x float> %6, i32 10)			%7 = tail call <4 x float> @llvm.x86.sse41.round.ss(<4 x float> %3, <4 x float> %6, i32 10)
	ret <4 x float> %7			ret <4 x float> %7
	}			}
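For readers following the round.ss expectations above, here is a rough lane model of why the right-hand CHECK lines can leave lane 0 of the first operand as undef and ignore lanes 1..3 of the second. It is only a sketch and not the patch's code: `round_ss_model` is a hypothetical helper, and treating the immediate `10` as "round toward +infinity, exceptions suppressed" (modelled with `math.ceil`) is an assumption about the encoding.

```python
import math


def round_ss_model(a, b):
    """Hypothetical lane model of llvm.x86.sse41.round.ss.

    Lane 0 of the result is the rounded lane 0 of the second operand `b`;
    lanes 1..3 are passed through from the first operand `a`.
    Rounding is modelled as ceil (assumed meaning of immediate 10).
    """
    return [float(math.ceil(b[0]))] + list(a[1:])


# Operands shaped like the test inputs: constants in the upper lanes.
a = [0.5, 1.0, 2.0, 3.0]
b = [2.25, 4.0, 5.0, 6.0]
full = round_ss_model(a, b)

# Replace the lanes the model never reads with a poison-like value (NaN here).
a_reduced = [float("nan")] + a[1:]        # lane 0 of `a` is not demanded
b_reduced = [b[0]] + [float("nan")] * 3   # lanes 1..3 of `b` are not demanded
reduced = round_ss_model(a_reduced, b_reduced)

# The result is unchanged, which is the property the folded CHECK lines rely on.
assert full == reduced
```

The same reasoning applies to round.sd with two lanes. Note that this only models which operand lanes are read; it says nothing about how demand on the call's result is propagated.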

	define float @test_round_ss_0(float %a, float %b) {			define float @test_round_ss_0(float %a, float %b) {
	; CHECK-LABEL: @test_round_ss_0(			; CHECK-LABEL: @test_round_ss_0(
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> undef, float %a, i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> undef, float %b, i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 1.000000e+00, i32 1			; CHECK-NEXT: [[TMP2:%.*]] = tail call <4 x float> @llvm.x86.sse41.round.ss(<4 x float> <float undef, float 1.000000e+00, float 2.000000e+00, float 3.000000e+00>, <4 x float> [[TMP1]], i32 10)
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 2.000000e+00, i32 2			; CHECK-NEXT: [[R:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> [[TMP3]], float 3.000000e+00, i32 3
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> undef, float %b, i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x float> [[TMP5]], float 4.000000e+00, i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> [[TMP6]], float 5.000000e+00, i32 2
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> [[TMP7]], float 6.000000e+00, i32 3
	; CHECK-NEXT: [[TMP9:%.*]] = tail call <4 x float> @llvm.x86.sse41.round.ss(<4 x float> [[TMP4]], <4 x float> [[TMP8]], i32 10)
	; CHECK-NEXT: [[R:%.*]] = extractelement <4 x float> [[TMP9]], i32 0
	; CHECK-NEXT: ret float [[R]]			; CHECK-NEXT: ret float [[R]]
	;			;
	%1 = insertelement <4 x float> undef, float %a, i32 0			%1 = insertelement <4 x float> undef, float %a, i32 0
	%2 = insertelement <4 x float> %1, float 1.000000e+00, i32 1			%2 = insertelement <4 x float> %1, float 1.000000e+00, i32 1
	%3 = insertelement <4 x float> %2, float 2.000000e+00, i32 2			%3 = insertelement <4 x float> %2, float 2.000000e+00, i32 2
	%4 = insertelement <4 x float> %3, float 3.000000e+00, i32 3			%4 = insertelement <4 x float> %3, float 3.000000e+00, i32 3
	%5 = insertelement <4 x float> undef, float %b, i32 0			%5 = insertelement <4 x float> undef, float %b, i32 0
	%6 = insertelement <4 x float> %5, float 4.000000e+00, i32 1			%6 = insertelement <4 x float> %5, float 4.000000e+00, i32 1
	%7 = insertelement <4 x float> %6, float 5.000000e+00, i32 2			%7 = insertelement <4 x float> %6, float 5.000000e+00, i32 2
	%8 = insertelement <4 x float> %7, float 6.000000e+00, i32 3			%8 = insertelement <4 x float> %7, float 6.000000e+00, i32 3
	%9 = tail call <4 x float> @llvm.x86.sse41.round.ss(<4 x float> %4, <4 x float> %8, i32 10)			%9 = tail call <4 x float> @llvm.x86.sse41.round.ss(<4 x float> %4, <4 x float> %8, i32 10)
	%r = extractelement <4 x float> %9, i32 0			%r = extractelement <4 x float> %9, i32 0
	ret float %r			ret float %r
	}			}

	define float @test_round_ss_2(float %a, float %b) {			define float @test_round_ss_2(float %a, float %b) {
	; CHECK-LABEL: @test_round_ss_2(			; CHECK-LABEL: @test_round_ss_2(
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> undef, float %a, i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> undef, float %b, i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> [[TMP1]], float 1.000000e+00, i32 1			; CHECK-NEXT: [[TMP2:%.*]] = tail call <4 x float> @llvm.x86.sse41.round.ss(<4 x float> <float undef, float 1.000000e+00, float 2.000000e+00, float 3.000000e+00>, <4 x float> [[TMP1]], i32 10)
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> [[TMP2]], float 2.000000e+00, i32 2			; CHECK-NEXT: [[R:%.*]] = extractelement <4 x float> [[TMP2]], i32 2
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> [[TMP3]], float 3.000000e+00, i32 3
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> undef, float %b, i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x float> [[TMP5]], float 4.000000e+00, i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> [[TMP6]], float 5.000000e+00, i32 2
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> [[TMP7]], float 6.000000e+00, i32 3
	; CHECK-NEXT: [[TMP9:%.*]] = tail call <4 x float> @llvm.x86.sse41.round.ss(<4 x float> [[TMP4]], <4 x float> [[TMP8]], i32 10)
	; CHECK-NEXT: [[R:%.*]] = extractelement <4 x float> [[TMP9]], i32 2
	; CHECK-NEXT: ret float [[R]]			; CHECK-NEXT: ret float [[R]]
	;			;
	%1 = insertelement <4 x float> undef, float %a, i32 0			%1 = insertelement <4 x float> undef, float %a, i32 0
	%2 = insertelement <4 x float> %1, float 1.000000e+00, i32 1			%2 = insertelement <4 x float> %1, float 1.000000e+00, i32 1
	%3 = insertelement <4 x float> %2, float 2.000000e+00, i32 2			%3 = insertelement <4 x float> %2, float 2.000000e+00, i32 2
	%4 = insertelement <4 x float> %3, float 3.000000e+00, i32 3			%4 = insertelement <4 x float> %3, float 3.000000e+00, i32 3
	%5 = insertelement <4 x float> undef, float %b, i32 0			%5 = insertelement <4 x float> undef, float %b, i32 0
	%6 = insertelement <4 x float> %5, float 4.000000e+00, i32 1			%6 = insertelement <4 x float> %5, float 4.000000e+00, i32 1