This is an archive of the discontinued LLVM Phabricator instance.

Fix for bootstrap bug introduced in r244921
ClosedPublic

Authored by nemanjai on Sep 3 2015, 4:34 AM.

Details

Reviewers

wschmidt
kbarton
seurer
hfinkel

Summary

Since I have enabled building vectors using vector shuffles for v2i64 when direct moves are available, there was something I missed in PPCDAGToDAGISel::Select.
Namely, when an ISD::VECTOR_SHUFFLE is fed by an ISD::SCALAR_TO_VECTOR which is fed by an unindexed load, we were transforming the node to a VSX load and splat. However, since we produce an MTVSRD and a swap for ISD::SCALAR_TO_VECTOR, this is no longer the right thing to do.
So I have added logic to not change this scalar load to a VSX load-and-splat when we are building a v2i64 and have direct moves.

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai updated this revision to Diff 33929. Sep 3 2015, 4:34 AM

nemanjai retitled this revision to "Fix for bootstrap bug introduced in r244921".

nemanjai updated this object.

nemanjai added reviewers: wschmidt, seurer, kbarton, hfinkel.

nemanjai set the repository for this revision to rL LLVM.

nemanjai added a subscriber: llvm-commits.

First, congrats on sorting this out; bootstrap issues are always a pain. I have one minor inline comment. Also, can you add a test case for this situation?

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
2798	Minor nit: Positive logic is easier to read than negative logic. My preference would be to compute DirectMovable and use !DirectMovable here instead.

DeMorgan-ized the negative logic as per Bill's comment.
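The readability point can be illustrated with a minimal sketch. This is not the code from the patch: the parameter names `isV2I64` and `hasDirectMove` and both function names are hypothetical stand-ins for the condition discussed above. The two forms are equivalent by De Morgan's law; the second names the positive condition once and negates it at the point of use.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real predicates; not taken from the patch.
// Negative form: "not a v2i64, or no direct moves".
bool mayUseLoadSplatNegative(bool isV2I64, bool hasDirectMove) {
  return !isV2I64 || !hasDirectMove;
}

// Positive form: compute the positive condition once, then negate it.
bool mayUseLoadSplatPositive(bool isV2I64, bool hasDirectMove) {
  const bool DirectMovable = isV2I64 && hasDirectMove;
  return !DirectMovable;
}
```

Both functions return the same value for all four input combinations, so the choice between them is purely about how the condition reads.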
Added a test case for which a similar shuffle produces a VSX load and splat with v2f64.

LGTM! Thanks.

This revision is now accepted and ready to land. Sep 4 2015, 10:53 AM

Since I have enabled building vectors using vector shuffles for v2i64 when direct moves are available, there was something I missed in PPCDAGToDAGISel::Select. Namely, when an ISD::VECTOR_SHUFFLE is fed by an ISD::SCALAR_TO_VECTOR which is fed by an unindexed load, we were transforming the node to a VSX load and splat. However, since we produce an MTVSRD and a swap for ISD::SCALAR_TO_VECTOR, this is no longer the right thing to do.

So we have shuffle(scalar_to_vector(load), <0, 0>), and we had been producing a vsx load+splat. Now, instead, we have a load + direct move + swap? Is that better?

So I have added logic to not change this scalar load to a VSX load-and-splat when we are building a v2i64 and have direct moves.
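For readers unfamiliar with the `<0, 0>` mask in the question above, here is a standalone sketch of how a two-element shuffle mask can be recognised as a splat of element 0. It mirrors the `DM[i]` loop visible in the diff for PPCISelDAGToDAG.cpp, but the function names `collapseMask` and `isSplatOfElementZero` are hypothetical, and the sketch assumes both shuffle operands are the same node, as the real code checks with `Op1 == Op2`.

```cpp
#include <array>
#include <cassert>

// Collapse a 2-element shuffle mask onto one source vector, assuming both
// shuffle operands are the same node. Mask values 0 and 1 index the first
// operand, 2 and 3 index the second, and a negative value means "undef".
std::array<int, 2> collapseMask(const std::array<int, 2> &Mask) {
  std::array<int, 2> DM{};
  for (int i = 0; i < 2; ++i)
    DM[i] = (Mask[i] <= 0 || Mask[i] == 2) ? 0 : 1;
  return DM;
}

// True when every lane reads element 0, i.e. the shuffle is a splat of it.
bool isSplatOfElementZero(const std::array<int, 2> &Mask) {
  const std::array<int, 2> DM = collapseMask(Mask);
  return DM[0] == 0 && DM[1] == 0;
}
```

Under these assumptions the masks `<0, 0>`, `<0, 2>` and `<undef, 0>` all count as a splat of element 0, while `<0, 1>` does not.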

Hi Hal, sorry I was on vacation so I'm only getting to this now.
I agree that this code sequence in isolation is not better than what we had in the particular case. However, the direct move sequence is applicable in every situation whereas the old simpler sequence only occurred in the specific sequence you mentioned. Furthermore, if the loaded value is used for non-vector computations as well, it is still in the register we moved it from - although I have no idea if this actually happens or if LLVM infrastructure knows this.
Please advise whether I should:

1. Check this in as-is and move on
2. Add some info in the README about improving this sequence
3. Re-implement SCALAR_TO_VECTOR of v2i64 as custom lowering, detect this sequence and do not emit the direct move
4. Do something different - i.e. a peephole optimization or something along those lines

I am of the opinion that going with options 1 or 2 is fine as the new sequence is only marginally worse than what we had and it is a lot of work to get this small win.

In D12596#248753, @nemanjai wrote:

Hi Hal, sorry I was on vacation so I'm only getting to this now.
I agree that this code sequence in isolation is not better than what we had in the particular case. However, the direct move sequence is applicable in every situation whereas the old simpler sequence only occurred in the specific sequence you mentioned. Furthermore, if the loaded value is used for non-vector computations as well, it is still in the register we moved it from - although I have no idea if this actually happens or if LLVM infrastructure knows this.
Please advise whether I should:

1. Check this in as-is and move on
2. Add some info in the README about improving this sequence
3. Re-implement SCALAR_TO_VECTOR of v2i64 as custom lowering, detect this sequence and do not emit the direct move
4. Do something different - i.e. a peephole optimization or something along those lines

I am of the opinion that going with options 1 or 2 is fine as the new sequence is only marginally worse than what we had and it is a lot of work to get this small win.

Something more like (3), although you might be able to do that in TableGen too (we have patterns that use callbacks to match shuffle patterns already).

Before that, however, I still don't understand the problem. Why does it matter how SCALAR_TO_VECTOR is lowered? We're selecting the shuffle node to be the LXVDSX, so it becomes the result of that shuffle (which does represent a splat). The SCALAR_TO_VECTOR is then dead regardless of how it is lowered, if it is lowered at all.

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
2797	You'd need to explain why in the comment.

This revision now requires changes to proceed. Sep 25 2015, 1:31 PM

hfinkel added inline comments. Sep 25 2015, 1:35 PM

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
2796	Also, is this optimization missing some hasOneUse() checks? It seems like we might need to make sure that the load and the scalar_to_vector have no other uses.

Hal, thank you for your comments and suggestions regarding this fix.
It is quite obvious that I did not adequately understand the problem. When you suggested the hasOneUse() calls, I tried that, but I clearly did it incorrectly because I didn't understand the problem.

So the crux of the problem is that the load being replaced has a user of its chain (in this case, a store that nullifies the unique_ptr in the source). Because we introduced the LXVDSX, the use of the chain in the store was not updated and it still used the chain from the load (which goes away). The store was then free to move up (before the LXVDSX) and we end up with a load and splat of a null. I do not see a way to chain the store to this target specific node.

The reason we never hit this before the legalization of SCALAR_TO_VECTOR for v2i64 is that we never got in this code because the BUILD_VECTOR for v2i64 was never lowered to a vector_shuffle. I mistakenly assumed that my lowering code for SCALAR_TO_VECTOR was somehow broken and was causing this issue so my investigation took me down the wrong path.
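The role of the `hasOneUse()` checks can be sketched with a toy model. This is not LLVM's `SDNode`; `ToyNode`, `hasOneUseOfResult` and `safeToFoldLoad` are hypothetical names. The sketch assumes that a node-wide use count includes users of the chain result, which is my understanding of why a check on the load node also rules out the chained store described above; treat that as an assumption rather than a statement about the real API.

```cpp
#include <cassert>
#include <vector>

// Toy model, not LLVM's SDNode: a node produces numbered results, and every
// use records which result it consumes. For a load, result 0 is the loaded
// value and result 1 is the chain.
struct ToyNode {
  std::vector<unsigned> UsedResults; // one entry per use

  void addUse(unsigned ResultNo) { UsedResults.push_back(ResultNo); }

  // Node-wide query: exactly one use across all results, chain included.
  bool hasOneUse() const { return UsedResults.size() == 1; }

  // Per-result query: exactly one use of the given result.
  bool hasOneUseOfResult(unsigned ResultNo) const {
    unsigned Count = 0;
    for (unsigned R : UsedResults)
      if (R == ResultNo)
        ++Count;
    return Count == 1;
  }
};

// Hypothetical guard: fold the load into a load-and-splat only when nothing
// else, including a chain user such as a later store, depends on the load.
bool safeToFoldLoad(const ToyNode &Load, const ToyNode &ScalarToVector) {
  return Load.hasOneUse() && ScalarToVector.hasOneUse();
}
```

In this model a load whose value feeds only the scalar_to_vector passes the guard, but once a store also uses the load's chain the node-wide count is two and the fold is rejected, even though the value result still has a single use.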

In D12596#268715, @nemanjai wrote:

So the crux of the problem is that the load being replaced has a user of its chain (in this case, a store that nullifies the unique_ptr in the source). Because we introduced the LXVDSX, the use of the chain in the store was not updated and it still used the chain from the load (which goes away). The store was then free to move up (before the LXVDSX) and we end up with a load and splat of a null. I do not see a way to chain the store to this target specific node.

So, you're going to have to ensure that the DAG nodes for LXVDSX have chains on them. This is an oversight in the way things are implemented today. See what's done for LXVD2X:

def PPClxvd2x  : SDNode<"PPCISD::LXVD2X", SDT_PPClxvd2x,
                        [SDNPHasChain, SDNPMayLoad]>;

So you'll need to add LXVDSX to the PPCISD enumeration in PPCISelLowering.h, add an entry like the above in PPCInstrVSX.td, and make sure we expand to that node type in the DAGtoDAG code. Then you'll have a chain that you can manipulate.

I should clarify that this is a bug that's independent of your work, so I think it should be treated separately.

In D12596#268816, @wschmidt wrote:

So, you're going to have to ensure that the DAG nodes for LXVDSX have chains on them. This is an oversight in the way things are implemented today. See what's done for LXVD2X:

def PPClxvd2x  : SDNode<"PPCISD::LXVD2X", SDT_PPClxvd2x,
                        [SDNPHasChain, SDNPMayLoad]>;

So you'll need to add LXVDSX to the PPCISD enumeration in PPCISelLowering.h, add an entry like the above in PPCInstrVSX.td, and make sure we expand to that node type in the DAGtoDAG code. Then you'll have a chain that you can manipulate.

You don't need to add special ISD nodes to do instruction selection in DAGToDAG (only in ISelLowering). LXVDSX is already tagged as mayLoad, and so is already assumed to carry a chain operand. In DAGToDAG we directly generate machine-instruction SDAG nodes. You just need to make sure that chain users are appropriately updated before returning.
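What "updating chain users" buys can be shown with a toy dependency graph. This is a sketch, not LLVM's SelectionDAG: `ChainNode`, `replaceAllUses` and `dependsOn` are hypothetical names, and a single operand list stands in for both chain and data edges. The store that consumed the old load's chain is repointed at the replacement node, so it can no longer be ordered ahead of it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model, not LLVM's SelectionDAG: each node lists the nodes it must be
// ordered after (its chain and data operands).
struct ChainNode {
  const char *Name;
  std::vector<ChainNode *> Operands;
};

// Repoint every operand that refers to Old so that it refers to New. This
// mimics updating the chain users before the old load is dropped.
void replaceAllUses(std::vector<ChainNode *> &Nodes, ChainNode *Old,
                    ChainNode *New) {
  for (ChainNode *N : Nodes)
    std::replace(N->Operands.begin(), N->Operands.end(), Old, New);
}

// True when User is ordered after Def, directly or transitively.
bool dependsOn(const ChainNode *User, const ChainNode *Def) {
  for (const ChainNode *Op : User->Operands)
    if (Op == Def || dependsOn(Op, Def))
      return true;
  return false;
}
```

Before the rewiring, a store chained to the original load has no ordering edge to the new load-and-splat node, which matches the failure described earlier in the thread, where the store moved above the LXVDSX. After `replaceAllUses`, the store depends on the replacement node instead.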

In D12596#268887, @hfinkel wrote:

You don't need to add special ISD nodes to do instruction selection in DAGToDAG (only in ISelLowering). LXVDSX is already tagged as mayLoad, and so is already assumed to carry a chain operand. In DAGToDAG we directly generate machine-instruction SDAG nodes. You just need to make sure that chain users are appropriately updated before returning.

OK, I am confused by the need for SDNPHasChain, then. Is that redundant when SDNPMayLoad is specified? Just a curiosity question.

In D12596#268905, @wschmidt wrote:

OK, I am confused by the need for SDNPHasChain, then. Is that redundant when SDNPMayLoad is specified? Just a curiosity question.

No, it is not. But we only need these when defining matching patterns for custom nodes. For instructions themselves, there's nothing being matched (since we're doing isel manually).

In any case, this change now LGTM.

This revision is now accepted and ready to land. Oct 27 2015, 12:34 PM

Committed revision 251798.

Revision Contents

Path                                                    Size
lib/Target/PowerPC/PPCISelDAGToDAG.cpp                  2 lines
lib/Target/PowerPC/PPCISelLowering.cpp                  3 lines
test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll    23 lines
test/CodeGen/PowerPC/vsx.ll                             16 lines

Diff 37563

lib/Target/PowerPC/PPCISelDAGToDAG.cpp

   if (PPCSubTarget->hasVSX() && (N->getValueType(0) == MVT::v2f64 ||

       for (int i = 0; i < 2; ++i)
         if (SVN->getMaskElt(i) <= 0 || SVN->getMaskElt(i) == 2)
           DM[i] = 0;
         else
           DM[i] = 1;

       if (Op1 == Op2 && DM[0] == 0 && DM[1] == 0 &&
           Op1.getOpcode() == ISD::SCALAR_TO_VECTOR &&
           isa<LoadSDNode>(Op1.getOperand(0))) {
         LoadSDNode *LD = cast<LoadSDNode>(Op1.getOperand(0));
         SDValue Base, Offset;

-        if (LD->isUnindexed() &&
+        if (LD->isUnindexed() && LD->hasOneUse() && Op1.hasOneUse() &&
             (LD->getMemoryVT() == MVT::f64 ||
              LD->getMemoryVT() == MVT::i64) &&
             SelectAddrIdxOnly(LD->getBasePtr(), Base, Offset)) {
           SDValue Chain = LD->getChain();
           SDValue Ops[] = { Base, Offset, Chain };
           return CurDAG->SelectNodeTo(N, PPC::LXVDSX,
                                       N->getValueType(0), Ops);
         }

lib/Target/PowerPC/PPCISelLowering.cpp

   if (Subtarget.hasVSX()) {
     if (Subtarget.hasP8Vector()) {
       setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v4f32, Legal);
       setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v4f32, Legal);
     }
     if (Subtarget.hasDirectMove()) {
       setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v16i8, Legal);
       setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v8i16, Legal);
       setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v4i32, Legal);
-      // FIXME: this is causing bootstrap failures, disable temporarily
-      //setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v2i64, Legal);
+      setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v2i64, Legal);
       setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v16i8, Legal);
       setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v8i16, Legal);
       setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v4i32, Legal);
       setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v2i64, Legal);
     }
     setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v2f64, Legal);

     setOperationAction(ISD::FFLOOR, MVT::v2f64, Legal);

test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll

 ; RUN: llc < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr8 | FileCheck %s
 ; RUN: llc < %s -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr8 | FileCheck %s -check-prefix=CHECK-LE

 ; The build[csilf] functions simply test the scalar_to_vector handling with
 ; direct moves. This corresponds to the "insertelement" instruction. Subsequent
 ; to this, there will be a splat corresponding to the shufflevector.

+@d = common global double 0.000000e+00, align 8
+
 ; Function Attrs: nounwind
 define <16 x i8> @buildc(i8 zeroext %a) {
 entry:
   %a.addr = alloca i8, align 1
   store i8 %a, i8* %a.addr, align 1
   %0 = load i8, i8* %a.addr, align 1
   %splat.splatinsert = insertelement <16 x i8> undef, i8 %0, i32 0
   %splat.splat = shufflevector <16 x i8> %splat.splatinsert, <16 x i8> undef, <16 x i32> zeroinitializer
 [38 lines not shown]
 define <2 x i64> @buildl(i64 %a) {
 entry:
   %a.addr = alloca i64, align 8
   store i64 %a, i64* %a.addr, align 8
   %0 = load i64, i64* %a.addr, align 8
   %splat.splatinsert = insertelement <2 x i64> undef, i64 %0, i32 0
   %splat.splat = shufflevector <2 x i64> %splat.splatinsert, <2 x i64> undef, <2 x i32> zeroinitializer
   ret <2 x i64> %splat.splat
-; FIXME-CHECK: mtvsrd {{[0-9]+}}, 3
-; FIXME-CHECK-LE: mtvsrd [[REG1:[0-9]+]], 3
-; FIXME-CHECK-LE: xxswapd {{[0-9]+}}, [[REG1]]
+; CHECK: mtvsrd {{[0-9]+}}, 3
+; CHECK-LE: mtvsrd [[REG1:[0-9]+]], 3
+; CHECK-LE: xxswapd {{[0-9]+}}, [[REG1]]
 }

 ; Function Attrs: nounwind
 define <4 x float> @buildf(float %a) {
 entry:
   %a.addr = alloca float, align 4
   store float %a, float* %a.addr, align 4
   %0 = load float, float* %a.addr, align 4
   %splat.splatinsert = insertelement <4 x float> undef, float %0, i32 0
   %splat.splat = shufflevector <4 x float> %splat.splatinsert, <4 x float> undef, <4 x i32> zeroinitializer
   ret <4 x float> %splat.splat
 ; CHECK: xscvdpspn {{[0-9]+}}, 1
 ; CHECK-LE: xscvdpspn [[REG1:[0-9]+]], 1
 ; CHECK-LE: xxsldwi {{[0-9]+}}, [[REG1]], [[REG1]], 1
 }

+; The optimization to remove stack operations from PPCDAGToDAGISel::Select
+; should still trigger for v2f64, producing an lxvdsx.
+; Function Attrs: nounwind
+define <2 x double> @buildd() #0 {
+entry:
+  %0 = load double, double* @d, align 8
+  %splat.splatinsert = insertelement <2 x double> undef, double %0, i32 0
+  %splat.splat = shufflevector <2 x double> %splat.splatinsert, <2 x double> undef, <2 x i32> zeroinitializer
+  ret <2 x double> %splat.splat
+; CHECK: ld [[REG1:[0-9]+]], .LC0@toc@l
+; CHECK: lxvdsx 34, 0, [[REG1]]
+; CHECK-LE: ld [[REG1:[0-9]+]], .LC0@toc@l
+; CHECK-LE: lxvdsx 34, 0, [[REG1]]
+}
+
 ; Function Attrs: nounwind
 define signext i8 @getsc0(<16 x i8> %vsc) {
 entry:
   %vsc.addr = alloca <16 x i8>, align 16
   store <16 x i8> %vsc, <16 x i8>* %vsc.addr, align 16
   %0 = load <16 x i8>, <16 x i8>* %vsc.addr, align 16
   %vecext = extractelement <16 x i8> %0, i32 0
   ret i8 %vecext

test/CodeGen/PowerPC/vsx.ll

 ; CHECK-FISL-DAG: addi [[R2:[0-9]+]], 1, -16
 ; CHECK-FISL-DAG: addi [[R3:[0-9]+]], 3, 2
 ; CHECK-FISL-DAG: std [[R1]], -8(1)
 ; CHECK-FISL-DAG: std [[R3]], -16(1)
 ; CHECK-FISL-DAG: lxvd2x 0, 0, [[R2]]
 ; CHECK-FISL: blr

 ; CHECK-LE-LABEL: @test80
-; FIXME-CHECK-LE-DAG: mtvsrd [[R1:[0-9]+]], 3
-; FIXME-CHECK-LE-DAG: addi [[R2:[0-9]+]], {{[0-9]+}}, .LCPI
-; FIXME-CHECK-LE-DAG: xxswapd [[V1:[0-9]+]], [[R1]]
-; FIXME-CHECK-LE-DAG: lxvd2x [[V2:[0-9]+]], 0, [[R2]]
-; FIXME-CHECK-LE-DAG: xxspltd 34, [[V1]]
-; FIXME-CHECK-LE-DAG: xxswapd 35, [[V2]]
-; FIXME-CHECK-LE: vaddudm 2, 2, 3
-; FIXME-CHECK-LE: blr
+; CHECK-LE-DAG: mtvsrd [[R1:[0-9]+]], 3
+; CHECK-LE-DAG: addi [[R2:[0-9]+]], {{[0-9]+}}, .LCPI
+; CHECK-LE-DAG: xxswapd [[V1:[0-9]+]], [[R1]]
+; CHECK-LE-DAG: lxvd2x [[V2:[0-9]+]], 0, [[R2]]
+; CHECK-LE-DAG: xxspltd 34, [[V1]]
+; CHECK-LE-DAG: xxswapd 35, [[V2]]
+; CHECK-LE: vaddudm 2, 2, 3
+; CHECK-LE: blr
 }

 define <2 x double> @test81(<4 x float> %b) {
   %w = bitcast <4 x float> %b to <2 x double>
   ret <2 x double> %w

 ; CHECK-LABEL: @test81
 ; CHECK: blr
 [23 lines not shown]