This is an archive of the discontinued LLVM Phabricator instance.

Scalar to vector conversions using direct moves
ClosedPublic

Authored by nemanjai on Jul 23 2015, 12:53 PM.

Details

Reviewers

wschmidt
kbarton
seurer
hfinkel

Summary

We currently always build vectors from non-constant scalar values using stack operations, which incurs the load-hit-store hazard. This patch is the first in a series that will allow operations such as scalar to vector, extract vector element, etc. to be done using direct move instructions rather than memory operations.
Since this is just the first patch, the code in some cases is not yet optimal, but all scalar to vector operations involve fewer memory operations.
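
For context, here is a minimal sketch of the kind of source that builds a vector from a non-constant scalar. It is not taken from the patch: it assumes the GCC/Clang `vector_size` extension, and `v4i32` and `splatWord` are hypothetical names. Going by the review discussion, code like this would be expected to reach instruction selection as a scalar_to_vector node followed by a splat.

```cpp
#include <cassert>
#include <cstdint>

// GCC/Clang vector extension (an assumption of this sketch):
// four 32-bit integers packed into one 16-byte vector.
typedef int32_t v4i32 __attribute__((vector_size(16)));

// Hypothetical helper: build a vector whose four lanes all hold the
// same run-time (non-constant) scalar value.
v4i32 splatWord(int32_t x) {
  v4i32 v = {x, x, x, x};
  return v;
}
```

Compiling a function like this for a Power8 target with and without direct moves would be one way to compare the generated code against the stack-based sequence the summary describes.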

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai updated this revision to Diff 30516.Jul 23 2015, 12:53 PM

nemanjai retitled this revision from to Scalar to vector conversions using direct moves.

nemanjai updated this object.

nemanjai added reviewers: wschmidt, hfinkel, kbarton, seurer.

nemanjai set the repository for this revision to rL LLVM.

nemanjai added a subscriber: llvm-commits.

hfinkel added inline comments.Jul 25 2015, 10:38 AM

lib/Target/PowerPC/PPCISelLowering.cpp
7501	Just return the result of the DAG.getNode(...) call. Then you don't even need the { } around the body of the if. That having been said, it does not seem like you have to do this at all. Just declare the relevant SCALAR_TO_VECTOR as Legal, and pattern match it in the usual way in the .td file.
lib/Target/PowerPC/PPCISelLowering.h
135 ↗	(On Diff #30516)	Why are you adding these? What generates them?
lib/Target/PowerPC/PPCInstrVSX.td
1195	Given that I don't see anything that actually generates these ISD nodes, I'm suspicious that you don't actually have tests for these.
1231	Maybe we should name this BE_LOW_BYTE, or something like that?

nemanjai added inline comments.Jul 27 2015, 9:50 AM

lib/Target/PowerPC/PPCISelLowering.cpp
7501	I agree with the first point and can do as you suggested. However, I felt that this way is a little cleaner than doing it all in the .td file. If I try to use PPCmtvsrdVec in an output pattern, tblgen gives me the message: Cannot use 'PPCmtvsrdVec' in an output pattern! Of course, I could do away with that SDAG node altogether and provide the same patterns for scalar_to_vector as I did for the new nodes. I just felt that the node that explicitly names the move to VSR to be more descriptive and allow for a different implementation of scalar_to_vector should a need for that arise in the future. And again, I will implement it however you would prefer.
lib/Target/PowerPC/PPCISelLowering.h
135 ↗	(On Diff #30516)	Thank you for catching this. It turns out that this is one of those events where I have tried a number of approaches to get the results I want until I settled on the approach that I plan to stick with. However, these SDAG nodes were missed in the re-re-re-factoring :). The idea was to emit these SDAG nodes when we need scalar to vector conversions so that they can be matched, but since they will only match one instruction, they're kind of pointless. I will remove them in the next review.
lib/Target/PowerPC/PPCInstrVSX.td
1195	Yes, sorry about this. See above. I will remove the SD nodes and the patterns from these instructions.
1231	The BE version of this move will put the value into the most significant byte in the VSR. Similarly with the halfword, word and doubleword versions. The LE versions will place it into the corresponding least significant element. Maybe BE_MSB? BE_HI_BYTE? And correspondingly for the half, word and double?
1242	The naming is different with LE since what is happening is fundamentally different. The move is the same for the byte, halfword and word. The doubleword version is the same move as for BE. That's why the LE versions have the "MoveWord" portion, then the "CopyWord" portion (for the regclass copy) and finally the LE_BHW to signify that this is a byte/half/word correctly aligned for the shift to element 0 on LE. I can put a comment to this end in the code.
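
The big-endian placement described in these comments can be sketched in plain C++. This is only an illustrative model, not code from the patch: `rotl64`, `rldicr`, `beByte0`, `beHalf0` and `beWord0` are hypothetical helper names, and each function returns just the value of doubleword 0 of the VSR, which is the doubleword that mtvsrd writes.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 64-bit value left by sh bits (0 <= sh < 64).
static uint64_t rotl64(uint64_t x, unsigned sh) {
  return sh == 0 ? x : (x << sh) | (x >> (64 - sh));
}

// Model of rldicr RA, RS, SH, ME: rotate left by SH, then keep IBM-numbered
// bits 0..ME (bit 0 is the most significant bit) and clear everything else.
static uint64_t rldicr(uint64_t rs, unsigned sh, unsigned me) {
  uint64_t mask = ~0ULL << (63 - me);
  return rotl64(rs, sh) & mask;
}

// Doubleword 0 of the VSR after (MTVSRD (RLDICR x, 56, 7)):
// the low byte of x ends up in the most significant byte.
static uint64_t beByte0(uint64_t x) { return rldicr(x, 56, 7); }

// Same idea for a halfword (shift 48, keep bits 0..15).
static uint64_t beHalf0(uint64_t x) { return rldicr(x, 48, 15); }

// Same idea for a word (shift 32, keep bits 0..31).
static uint64_t beWord0(uint64_t x) { return rldicr(x, 32, 31); }
```

In this model `beByte0(0xAB)` yields `0xAB00000000000000`, i.e. the scalar lands in byte element 0 in big-endian numbering. Bits above the element are cleared by the mask, which matters because the upper half of the GPR is undefined after the INSERT_SUBREG into an IMPLICIT_DEF.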

Removed the custom code that isn't necessary.
Added a test case specifically for scalar_to_vector nodes.
Renamed the DAG's in PPCInstrVSX.td.
Reconciled the predicate for the v4f32 version (it was predicated on HasDirectMove in the .td file and on hasP8Vector() in the c++ code).

A few issues and questions...

lib/Target/PowerPC/PPCInstrVSX.td
1195	The comment at line 1195 is wrong. These are scalar conversions between single and double precision, right?
1240	BE_DWORD_0? This isn't an LE pattern, so it needs a BE designator, correct? Even though it's used in an LE pattern, it is LE doubleword 1 there, so naming it BE_DWORD_0 is more expressive.
1247	I find LE_BHW and LE_DBL to be pretty incomprehensible names. For consistency and understanding, I would name them LE_WORD_0 and LE_DWORD_0. Also, LE_CPW is arguably LE_WORD_1 and LE_CPD is LE_DWORD_1. This gives the more understandable: dag LE_DWORD_1 = (v2i64 (COPY_TO_REGCLASS BE_DWORD_0, VSRC)); reinforcing that LE doubleword 1 is the same as BE doubleword 0. I don't much care what you call LE_MVW so long as it is readable and implies moving an integer to a vector register. Longer meaningful names beat brief incomprehensible ones. Of course that's still better than long incomprehensible ones. ;)
test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
74	I was confused by this at first, but now I get it. In all of these tests, the code generation that you are checking for is just for the %splat.splatinsert calculation, right? This is the part that translates into a scalar_to_vector node. I assume that the generated code follows this up with code to splat element 0 to the entire result register, correct?

wschmidt added inline comments.Jul 29 2015, 12:53 PM

lib/Target/PowerPC/PPCInstrVSX.td
1195	Withdrawn per online discussion.

nemanjai added inline comments.Jul 29 2015, 12:54 PM

lib/Target/PowerPC/PPCInstrVSX.td
1240	Fair enough, the value needs to be shifted/swapped for LE, so I'll rename it to BE_DWORD_0. And the subsequent swap will put it in LE doubleword 0.
1247	OK, I will rename them as you suggest: LE_BHW -> LE_WORD_0, LE_CPW -> LE_WORD_1, LE_DBL -> LE_DWORD_0, LE_CPD -> LE_DWORD_1.
test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
74	Yes, the subsequent instruction will be a splat (VSX if we have one, VMX otherwise). I can add those to the check patterns, but I can't verify the correct register (because of the "off-by-32 relationship" between the VSX and VMX registers). I was actually hoping to use the FileCheck capability of checking line numbers (ranges), but that actually looks for the word line I think. So I can add the CHECK: vsplat[bhw] and just confirm the right element in the splat instruction if you'd like.
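
The doubleword naming agreed on above (BE doubleword 0 holds the same data as LE doubleword 1, and the swap produces LE doubleword 0) can be illustrated with a small C++ model. This is a sketch under stated assumptions rather than anything from the patch: `VSR`, `mtvsrd` and `xxpermdi` here are hypothetical stand-ins for the instructions, the register is stored in big-endian element order, and the sentinel fill value only marks the doubleword that mtvsrd leaves undefined.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A 128-bit VSR modelled as two doublewords in big-endian element order:
// index 0 is the most significant doubleword (BE doubleword 0, LE doubleword 1).
using VSR = std::array<uint64_t, 2>;

// Model of mtvsrd: the GPR value goes to BE doubleword 0. The other
// doubleword is architecturally undefined; a sentinel stands in for it here.
static VSR mtvsrd(uint64_t gpr, uint64_t undefinedFill = 0xDEADBEEFDEADBEEFULL) {
  return {gpr, undefinedFill};
}

// Model of xxpermdi XT, XA, XB, DM: the high bit of DM picks the doubleword
// taken from XA, the low bit picks the doubleword taken from XB.
static VSR xxpermdi(const VSR &xa, const VSR &xb, unsigned dm) {
  return {xa[(dm >> 1) & 1], xb[dm & 1]};
}
```

With both inputs set to the same register and DM = 2, the two doublewords trade places, so a value moved in with mtvsrd ends up at index 1 of the model, which is LE doubleword 0.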

wschmidt added inline comments.Jul 29 2015, 1:10 PM

test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
74	No, that's ok. Just add a comment at the beginning of the file indicating that you are just checking the code generated for the insertelement. Alternatively, you can make this more obvious by putting the CHECK comment lines right after the insertelement in each test variant.

@hfinkel@anl.gov Hi Hal, if you get the chance, can you please have a look and let me know if you're ok with the updates? Thank you.

LGTM.

This revision is now accepted and ready to land.Aug 9 2015, 8:02 PM

Committed revision 244921.

nemanjai mentioned this in D12032: Vector element extraction without stack operations on Power 8.Aug 14 2015, 6:03 AM

nemanjai mentioned this in rL249822: Vector element extraction without stack operations on Power 8.Oct 9 2015, 4:14 AM

Revision Contents

Path

Size

lib/

Target/

PowerPC/

PPCISelLowering.cpp

10 lines

PPCInstrVSX.td

74 lines

PPCVSXCopy.cpp

7 lines

test/

CodeGen/

PowerPC/

fp-int-conversions-direct-moves.ll

24 lines

p8-scalar_vector_conversions.ll

75 lines

vsx.ll

6 lines

vsx_scalar_ld_st.ll

6 lines

MC/

Disassembler/

PowerPC/

vsx.txt

12 lines

PowerPC/

vsx.s

12 lines

Diff 30861

lib/Target/PowerPC/PPCISelLowering.cpp


[...]	if (Subtarget.hasAltivec()) {
// Altivec does not contain unordered floating-point compare instructions		// Altivec does not contain unordered floating-point compare instructions
setCondCodeAction(ISD::SETUO, MVT::v4f32, Expand);		setCondCodeAction(ISD::SETUO, MVT::v4f32, Expand);
setCondCodeAction(ISD::SETUEQ, MVT::v4f32, Expand);		setCondCodeAction(ISD::SETUEQ, MVT::v4f32, Expand);
setCondCodeAction(ISD::SETO, MVT::v4f32, Expand);		setCondCodeAction(ISD::SETO, MVT::v4f32, Expand);
setCondCodeAction(ISD::SETONE, MVT::v4f32, Expand);		setCondCodeAction(ISD::SETONE, MVT::v4f32, Expand);

if (Subtarget.hasVSX()) {		if (Subtarget.hasVSX()) {
setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v2f64, Legal);		setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v2f64, Legal);
		if (Subtarget.hasP8Vector())
		setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v4f32, Legal);
		if (Subtarget.hasDirectMove()) {
		setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v16i8, Legal);
		setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v8i16, Legal);
		setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v4i32, Legal);
		setOperationAction(ISD::SCALAR_TO_VECTOR, MVT::v2i64, Legal);
		}
setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v2f64, Legal);		setOperationAction(ISD::EXTRACT_VECTOR_ELT, MVT::v2f64, Legal);

setOperationAction(ISD::FFLOOR, MVT::v2f64, Legal);		setOperationAction(ISD::FFLOOR, MVT::v2f64, Legal);
setOperationAction(ISD::FCEIL, MVT::v2f64, Legal);		setOperationAction(ISD::FCEIL, MVT::v2f64, Legal);
setOperationAction(ISD::FTRUNC, MVT::v2f64, Legal);		setOperationAction(ISD::FTRUNC, MVT::v2f64, Legal);
setOperationAction(ISD::FNEARBYINT, MVT::v2f64, Legal);		setOperationAction(ISD::FNEARBYINT, MVT::v2f64, Legal);
setOperationAction(ISD::FROUND, MVT::v2f64, Legal);		setOperationAction(ISD::FROUND, MVT::v2f64, Legal);

[...]	SDValue PPCTargetLowering::LowerSIGN_EXTEND_INREG(SDValue Op,
return SDValue();		return SDValue();
}		}

SDValue PPCTargetLowering::LowerSCALAR_TO_VECTOR(SDValue Op,		SDValue PPCTargetLowering::LowerSCALAR_TO_VECTOR(SDValue Op,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
SDLoc dl(Op);		SDLoc dl(Op);
// Create a stack slot that is 16-byte aligned.		// Create a stack slot that is 16-byte aligned.
MachineFrameInfo *FrameInfo = DAG.getMachineFunction().getFrameInfo();		MachineFrameInfo *FrameInfo = DAG.getMachineFunction().getFrameInfo();
int FrameIdx = FrameInfo->CreateStackObject(16, 16, false);		int FrameIdx = FrameInfo->CreateStackObject(16, 16, false);
		hfinkel: Just return the result of the DAG.getNode(...) call. Then you don't even need the { } around the body of the if. That having been said, it does not seem like you have to do this at all. Just declare the relevant SCALAR_TO_VECTOR as Legal, and pattern match it in the usual way in the .td file.
		nemanjai: I agree with the first point and can do as you suggested. However, I felt that this way is a little cleaner than doing it all in the .td file. If I try to use PPCmtvsrdVec in an output pattern, tblgen gives me the message: Cannot use 'PPCmtvsrdVec' in an output pattern! Of course, I could do away with that SDAG node altogether and provide the same patterns for scalar_to_vector as I did for the new nodes. I just felt that the node that explicitly names the move to VSR to be more descriptive and allow for a different implementation of scalar_to_vector should a need for that arise in the future. And again, I will implement it however you would prefer.
EVT PtrVT = getPointerTy(DAG.getDataLayout());		EVT PtrVT = getPointerTy(DAG.getDataLayout());
SDValue FIdx = DAG.getFrameIndex(FrameIdx, PtrVT);		SDValue FIdx = DAG.getFrameIndex(FrameIdx, PtrVT);

// Store the input value into Value#0 of the stack slot.		// Store the input value into Value#0 of the stack slot.
SDValue Store = DAG.getStore(DAG.getEntryNode(), dl,		SDValue Store = DAG.getStore(DAG.getEntryNode(), dl,
Op.getOperand(0), FIdx, MachinePointerInfo(),		Op.getOperand(0), FIdx, MachinePointerInfo(),
false, false, 0);		false, false, 0);
// Load it out.		// Load it out.
[...]	PPCTargetLowering::getScratchRegisters(CallingConv::ID) const {

return ScratchRegs;		return ScratchRegs;
}		}

bool		bool
PPCTargetLowering::shouldExpandBuildVectorWithShuffles(		PPCTargetLowering::shouldExpandBuildVectorWithShuffles(
EVT VT , unsigned DefinedValues) const {		EVT VT , unsigned DefinedValues) const {
if (VT == MVT::v2i64)		if (VT == MVT::v2i64)
return false;		return Subtarget.hasDirectMove(); // Don't need stack ops with direct moves

if (Subtarget.hasQPX()) {		if (Subtarget.hasQPX()) {
if (VT == MVT::v4f32 || VT == MVT::v4f64 || VT == MVT::v4i1)		if (VT == MVT::v4f32 || VT == MVT::v4f64 || VT == MVT::v4i1)
return true;		return true;
}		}

return TargetLowering::shouldExpandBuildVectorWithShuffles(VT, DefinedValues);		return TargetLowering::shouldExpandBuildVectorWithShuffles(VT, DefinedValues);
}		}
[...]

lib/Target/PowerPC/PPCInstrVSX.td

[...]	let AddedComplexity = 400 in { // Prefer VSX patterns over non-VSX patterns.
let IsVSXFMAAlt = 1 in		let IsVSXFMAAlt = 1 in
def XSNMSUBMSP : XX3Form<60, 153,		def XSNMSUBMSP : XX3Form<60, 153,
(outs vssrc:$XT),		(outs vssrc:$XT),
(ins vssrc:$XTi, vssrc:$XA, vssrc:$XB),		(ins vssrc:$XTi, vssrc:$XA, vssrc:$XB),
"xsnmsubmsp $XT, $XA, $XB", IIC_VecFP, []>,		"xsnmsubmsp $XT, $XA, $XB", IIC_VecFP, []>,
RegConstraint<"$XTi = $XT">, NoEncode<"$XTi">,		RegConstraint<"$XTi = $XT">, NoEncode<"$XTi">,
AltVSXFMARel;		AltVSXFMARel;
}		}

		// Single Precision Conversions (FP <-> INT)
		def XSCVSXDSP : XX2Form<60, 312,
		(outs vssrc:$XT), (ins vsfrc:$XB),
		"xscvsxdsp $XT, $XB", IIC_VecFP,
		[(set f32:$XT, (PPCfcfids f64:$XB))]>;
		def XSCVUXDSP : XX2Form<60, 296,
		(outs vssrc:$XT), (ins vsfrc:$XB),
		"xscvuxdsp $XT, $XB", IIC_VecFP,
		[(set f32:$XT, (PPCfcfidus f64:$XB))]>;

		// Conversions between vector and scalar single precision
		hfinkel: Given that I don't see anything that actually generates these ISD nodes, I'm suspicious that you don't actually have tests for these.
		nemanjai: Yes, sorry about this. See above. I will remove the SD nodes and the patterns from these instructions.
		wschmidt: The comment at line 1195 is wrong. These are scalar conversions between single and double precision, right?
		wschmidt: Withdrawn per online discussion.
		def XSCVDPSPN : XX2Form<60, 267, (outs vsrc:$XT), (ins vssrc:$XB),
		"xscvdpspn $XT, $XB", IIC_VecFP, []>;
		def XSCVSPDPN : XX2Form<60, 331, (outs vssrc:$XT), (ins vsrc:$XB),
		"xscvspdpn $XT, $XB", IIC_VecFP, []>;

} // AddedComplexity = 400		} // AddedComplexity = 400
} // HasP8Vector		} // HasP8Vector

let Predicates = [HasDirectMove, HasVSX] in {		let Predicates = [HasDirectMove, HasVSX] in {
// VSX direct move instructions		// VSX direct move instructions
def MFVSRD : XX1_RS6_RD5_XO<31, 51, (outs g8rc:$rA), (ins vsfrc:$XT),		def MFVSRD : XX1_RS6_RD5_XO<31, 51, (outs g8rc:$rA), (ins vsfrc:$XT),
"mfvsrd $rA, $XT", IIC_VecGeneral,		"mfvsrd $rA, $XT", IIC_VecGeneral,
[(set i64:$rA, (PPCmfvsr f64:$XT))]>,		[(set i64:$rA, (PPCmfvsr f64:$XT))]>,
Requires<[In64BitMode]>;		Requires<[In64BitMode]>;
def MFVSRWZ : XX1_RS6_RD5_XO<31, 115, (outs gprc:$rA), (ins vsfrc:$XT),		def MFVSRWZ : XX1_RS6_RD5_XO<31, 115, (outs gprc:$rA), (ins vsfrc:$XT),
"mfvsrwz $rA, $XT", IIC_VecGeneral,		"mfvsrwz $rA, $XT", IIC_VecGeneral,
[(set i32:$rA, (PPCmfvsr f64:$XT))]>;		[(set i32:$rA, (PPCmfvsr f64:$XT))]>;
def MTVSRD : XX1_RS6_RD5_XO<31, 179, (outs vsfrc:$XT), (ins g8rc:$rA),		def MTVSRD : XX1_RS6_RD5_XO<31, 179, (outs vsfrc:$XT), (ins g8rc:$rA),
"mtvsrd $XT, $rA", IIC_VecGeneral,		"mtvsrd $XT, $rA", IIC_VecGeneral,
[(set f64:$XT, (PPCmtvsra i64:$rA))]>,		[(set f64:$XT, (PPCmtvsra i64:$rA))]>,
Requires<[In64BitMode]>;		Requires<[In64BitMode]>;
def MTVSRWA : XX1_RS6_RD5_XO<31, 211, (outs vsfrc:$XT), (ins gprc:$rA),		def MTVSRWA : XX1_RS6_RD5_XO<31, 211, (outs vsfrc:$XT), (ins gprc:$rA),
"mtvsrwa $XT, $rA", IIC_VecGeneral,		"mtvsrwa $XT, $rA", IIC_VecGeneral,
[(set f64:$XT, (PPCmtvsra i32:$rA))]>;		[(set f64:$XT, (PPCmtvsra i32:$rA))]>;
def MTVSRWZ : XX1_RS6_RD5_XO<31, 243, (outs vsfrc:$XT), (ins gprc:$rA),		def MTVSRWZ : XX1_RS6_RD5_XO<31, 243, (outs vsfrc:$XT), (ins gprc:$rA),
"mtvsrwz $XT, $rA", IIC_VecGeneral,		"mtvsrwz $XT, $rA", IIC_VecGeneral,
[(set f64:$XT, (PPCmtvsrz i32:$rA))]>;		[(set f64:$XT, (PPCmtvsrz i32:$rA))]>;
} // HasDirectMove, HasVSX		} // HasDirectMove, HasVSX

		/* Direct moves of various size entities from GPR's into VSR's. Each lines
		the value up into element 0 (both BE and LE). Namely, entities smaller than
		a doubleword are shifted left and moved for BE. For LE, they're moved, then
		swapped to go into the least significant element of the VSR.
		*/
		def Moves {
		dag BE_BYTE_0 = (MTVSRD
		hfinkel: Maybe we should name this BE_LOW_BYTE, or something like that?
		nemanjai: The BE version of this move will put the value into the most significant byte in the VSR. Similarly with the halfword, word and doubleword versions. The LE versions will place it into the corresponding least significant element. Maybe BE_MSB? BE_HI_BYTE? And correspondingly for the half, word and double?
		(RLDICR
		(INSERT_SUBREG (i64 (IMPLICIT_DEF)), $A, sub_32), 56, 7));
		dag BE_HALF_0 = (MTVSRD
		(RLDICR
		(INSERT_SUBREG (i64 (IMPLICIT_DEF)), $A, sub_32), 48, 15));
		dag BE_WORD_0 = (MTVSRD
		(RLDICR
		(INSERT_SUBREG (i64 (IMPLICIT_DEF)), $A, sub_32), 32, 31));
		dag DBLWORD_0 = (MTVSRD $A);
		wschmidt: BE_DWORD_0? This isn't an LE pattern, so it needs a BE designator, correct? Even though it's used in an LE pattern, it is LE doubleword 1 there, so naming it BE_DWORD_0 is more expressive.
		nemanjai: Fair enough, the value needs to be shifted/swapped for LE, so I'll rename it to BE_DWORD_0. And the subsequent swap will put it in LE doubleword 0.

		dag LE_MVW = (MTVSRD (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $A, sub_32));
		nemanjai: The naming is different with LE since what is happening is fundamentally different. The move is the same for the byte, halfword and word. The doubleword version is the same move as for BE. That's why the LE versions have the "MoveWord" portion, then the "CopyWord" portion (for the regclass copy) and finally the LE_BHW to signify that this is a byte/half/word correctly aligned for the shift to element 0 on LE. I can put a comment to this end in the code.
		dag LE_CPW = (v2i64 (COPY_TO_REGCLASS LE_MVW, VSRC));
		dag LE_BHW = (XXPERMDI LE_CPW, LE_CPW, 2);
		dag LE_CPD = (v2i64 (COPY_TO_REGCLASS DBLWORD_0, VSRC));
		dag LE_DBL = (XXPERMDI LE_CPD, LE_CPD, 2);
		}
		wschmidt: I find LE_BHW and LE_DBL to be pretty incomprehensible names. For consistency and understanding, I would name them LE_WORD_0 and LE_DWORD_0. Also, LE_CPW is arguably LE_WORD_1 and LE_CPD is LE_DWORD_1. This gives the more understandable: dag LE_DWORD_1 = (v2i64 (COPY_TO_REGCLASS BE_DWORD_0, VSRC)); reinforcing that LE doubleword 1 is the same as BE doubleword 0. I don't much care what you call LE_MVW so long as it is readable and implies moving an integer to a vector register. Longer meaningful names beat brief incomprehensible ones. Of course that's still better than long incomprehensible ones. ;)
		nemanjai: OK, I will rename them as you suggest: LE_BHW -> LE_WORD_0, LE_CPW -> LE_WORD_1, LE_DBL -> LE_DWORD_0, LE_CPD -> LE_DWORD_1.

		let Predicates = [IsBigEndian, HasP8Vector] in {
		def : Pat<(v4f32 (scalar_to_vector f32:$A)),
		(v4f32 (XSCVDPSPN $A))>;
		} // IsBigEndian, HasP8Vector

		let Predicates = [IsBigEndian, HasDirectMove] in {
		def : Pat<(v16i8 (scalar_to_vector i32:$A)),
		(v16i8 (COPY_TO_REGCLASS Moves.BE_BYTE_0, VSRC))>;
		def : Pat<(v8i16 (scalar_to_vector i32:$A)),
		(v8i16 (COPY_TO_REGCLASS Moves.BE_HALF_0, VSRC))>;
		def : Pat<(v4i32 (scalar_to_vector i32:$A)),
		(v4i32 (COPY_TO_REGCLASS Moves.BE_WORD_0, VSRC))>;
		def : Pat<(v2i64 (scalar_to_vector i64:$A)),
		(v2i64 (COPY_TO_REGCLASS Moves.DBLWORD_0, VSRC))>;
		} // IsBigEndian, HasDirectMove

		let Predicates = [IsLittleEndian, HasP8Vector] in {
		def : Pat<(v4f32 (scalar_to_vector f32:$A)),
		(v4f32 (XXSLDWI (XSCVDPSPN $A), (XSCVDPSPN $A), 1))>;
		} // IsLittleEndian, HasP8Vector

		let Predicates = [IsLittleEndian, HasDirectMove] in {
		def : Pat<(v16i8 (scalar_to_vector i32:$A)),
		(v16i8 (COPY_TO_REGCLASS Moves.LE_BHW, VSRC))>;
		def : Pat<(v8i16 (scalar_to_vector i32:$A)),
		(v8i16 (COPY_TO_REGCLASS Moves.LE_BHW, VSRC))>;
		def : Pat<(v4i32 (scalar_to_vector i32:$A)),
		(v4i32 (COPY_TO_REGCLASS Moves.LE_BHW, VSRC))>;
		def : Pat<(v2i64 (scalar_to_vector i64:$A)),
		(v2i64 Moves.LE_DBL)>;
		} // IsLittleEndian, HasDirectMove
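
The little-endian v4f32 pattern above rotates the converted value with XXSLDWI so that it lands in LE element 0. A minimal C++ sketch of that word shift follows. It is an illustrative model only: `Words` and `xxsldwi` are hypothetical names, words are stored in big-endian order, and the starting assumption that XSCVDPSPN leaves its result in big-endian word 0 comes from my reading of the ISA rather than from this review.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A 128-bit VSR modelled as four words in big-endian element order:
// index 0 is the most significant word, index 3 is LE element 0.
using Words = std::array<uint32_t, 4>;

// Model of xxsldwi XT, XA, XB, SHW: treat XA followed by XB as eight
// words and take the four consecutive words starting at index SHW.
static Words xxsldwi(const Words &xa, const Words &xb, unsigned shw) {
  Words out{};
  for (unsigned i = 0; i < 4; ++i) {
    unsigned idx = (shw & 3) + i; // index into the XA || XB concatenation
    out[i] = idx < 4 ? xa[idx] : xb[idx - 4];
  }
  return out;
}
```

With the same register passed as both inputs and a shift of 1, the words rotate left by one position, so the word that started at index 0 ends up at index 3, which is LE element 0 in this model.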

lib/Target/PowerPC/PPCVSXCopy.cpp

[...]	struct PPCVSXCopy : public MachineFunctionPass {
bool IsVRReg(unsigned Reg, MachineRegisterInfo &MRI) {		bool IsVRReg(unsigned Reg, MachineRegisterInfo &MRI) {
return IsRegInClass(Reg, &PPC::VRRCRegClass, MRI);		return IsRegInClass(Reg, &PPC::VRRCRegClass, MRI);
}		}

bool IsF8Reg(unsigned Reg, MachineRegisterInfo &MRI) {		bool IsF8Reg(unsigned Reg, MachineRegisterInfo &MRI) {
return IsRegInClass(Reg, &PPC::F8RCRegClass, MRI);		return IsRegInClass(Reg, &PPC::F8RCRegClass, MRI);
}		}

		bool IsVSFReg(unsigned Reg, MachineRegisterInfo &MRI) {
		return IsRegInClass(Reg, &PPC::VSFRCRegClass, MRI);
		}

protected:		protected:
bool processBlock(MachineBasicBlock &MBB) {		bool processBlock(MachineBasicBlock &MBB) {
bool Changed = false;		bool Changed = false;

MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();		MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
for (MachineBasicBlock::iterator I = MBB.begin(), IE = MBB.end();		for (MachineBasicBlock::iterator I = MBB.begin(), IE = MBB.end();
I != IE; ++I) {		I != IE; ++I) {
MachineInstr *MI = I;		MachineInstr *MI = I;
if (!MI->isFullCopy())		if (!MI->isFullCopy())
continue;		continue;

MachineOperand &DstMO = MI->getOperand(0);		MachineOperand &DstMO = MI->getOperand(0);
MachineOperand &SrcMO = MI->getOperand(1);		MachineOperand &SrcMO = MI->getOperand(1);

if ( IsVSReg(DstMO.getReg(), MRI) &&		if ( IsVSReg(DstMO.getReg(), MRI) &&
!IsVSReg(SrcMO.getReg(), MRI)) {		!IsVSReg(SrcMO.getReg(), MRI)) {
// This is a copy to a VSX register from a non-VSX register.		// This is a copy to a VSX register from a non-VSX register.
Changed = true;		Changed = true;

const TargetRegisterClass *SrcRC =		const TargetRegisterClass *SrcRC =
IsVRReg(SrcMO.getReg(), MRI) ? &PPC::VSHRCRegClass :		IsVRReg(SrcMO.getReg(), MRI) ? &PPC::VSHRCRegClass :
&PPC::VSLRCRegClass;		&PPC::VSLRCRegClass;
assert((IsF8Reg(SrcMO.getReg(), MRI) ||		assert((IsF8Reg(SrcMO.getReg(), MRI) ||
IsVRReg(SrcMO.getReg(), MRI)) &&		IsVRReg(SrcMO.getReg(), MRI) ||
		IsVSFReg(SrcMO.getReg(), MRI)) &&
"Unknown source for a VSX copy");		"Unknown source for a VSX copy");

unsigned NewVReg = MRI.createVirtualRegister(SrcRC);		unsigned NewVReg = MRI.createVirtualRegister(SrcRC);
BuildMI(MBB, MI, MI->getDebugLoc(),		BuildMI(MBB, MI, MI->getDebugLoc(),
TII->get(TargetOpcode::SUBREG_TO_REG), NewVReg)		TII->get(TargetOpcode::SUBREG_TO_REG), NewVReg)
.addImm(1) // add 1, not 0, because there is no implicit clearing		.addImm(1) // add 1, not 0, because there is no implicit clearing
// of the high bits.		// of the high bits.
.addOperand(SrcMO)		.addOperand(SrcMO)
[...]

test/CodeGen/PowerPC/fp-int-conversions-direct-moves.ll

	Show All 18 Lines
	entry:			entry:
	%arg.addr = alloca i8, align 1			%arg.addr = alloca i8, align 1
	store i8 %arg, i8* %arg.addr, align 1			store i8 %arg, i8* %arg.addr, align 1
	%0 = load i8, i8* %arg.addr, align 1			%0 = load i8, i8* %arg.addr, align 1
	%conv = uitofp i8 %0 to float			%conv = uitofp i8 %0 to float
	ret float %conv			ret float %conv
	; CHECK-LABEL: @_Z6testfcc			; CHECK-LABEL: @_Z6testfcc
	; CHECK: mtvsrwz [[MOVEREG01:[0-9]+]], 3			; CHECK: mtvsrwz [[MOVEREG01:[0-9]+]], 3
	; FIXME: Once we have XSCVUXDSP implemented, this will change			; CHECK: xscvuxdsp 1, [[MOVEREG01]]
	; CHECK: fcfidus 1, [[MOVEREG01]]
	}			}

	; Function Attrs: nounwind			; Function Attrs: nounwind
	define zeroext i8 @_Z6testcdd(double %arg) {			define zeroext i8 @_Z6testcdd(double %arg) {
	entry:			entry:
	%arg.addr = alloca double, align 8			%arg.addr = alloca double, align 8
	store double %arg, double* %arg.addr, align 8			store double %arg, double* %arg.addr, align 8
	%0 = load double, double* %arg.addr, align 8			%0 = load double, double* %arg.addr, align 8
	Show All 35 Lines
	entry:			entry:
	%arg.addr = alloca i8, align 1			%arg.addr = alloca i8, align 1
	store i8 %arg, i8* %arg.addr, align 1			store i8 %arg, i8* %arg.addr, align 1
	%0 = load i8, i8* %arg.addr, align 1			%0 = load i8, i8* %arg.addr, align 1
	%conv = uitofp i8 %0 to float			%conv = uitofp i8 %0 to float
	ret float %conv			ret float %conv
	; CHECK-LABEL: @_Z7testfuch			; CHECK-LABEL: @_Z7testfuch
	; CHECK: mtvsrwz [[MOVEREG03:[0-9]+]], 3			; CHECK: mtvsrwz [[MOVEREG03:[0-9]+]], 3
	; FIXME: Once we have XSCVUXDSP implemented, this will change			; CHECK: xscvuxdsp 1, [[MOVEREG03]]
	; CHECK: fcfidus 1, [[MOVEREG03]]
	}			}

	; Function Attrs: nounwind			; Function Attrs: nounwind
	define zeroext i8 @_Z7testucdd(double %arg) {			define zeroext i8 @_Z7testucdd(double %arg) {
	entry:			entry:
	%arg.addr = alloca double, align 8			%arg.addr = alloca double, align 8
	store double %arg, double* %arg.addr, align 8			store double %arg, double* %arg.addr, align 8
	%0 = load double, double* %arg.addr, align 8			%0 = load double, double* %arg.addr, align 8
	Show All 35 Lines
	entry:			entry:
	%arg.addr = alloca i16, align 2			%arg.addr = alloca i16, align 2
	store i16 %arg, i16* %arg.addr, align 2			store i16 %arg, i16* %arg.addr, align 2
	%0 = load i16, i16* %arg.addr, align 2			%0 = load i16, i16* %arg.addr, align 2
	%conv = sitofp i16 %0 to float			%conv = sitofp i16 %0 to float
	ret float %conv			ret float %conv
	; CHECK-LABEL: @_Z6testfss			; CHECK-LABEL: @_Z6testfss
	; CHECK: mtvsrwa [[MOVEREG05:[0-9]+]], 3			; CHECK: mtvsrwa [[MOVEREG05:[0-9]+]], 3
	; FIXME: Once we have XSCVSXDSP implemented, this will change			; CHECK: xscvsxdsp 1, [[MOVEREG05]]
	; CHECK: fcfids 1, [[MOVEREG05]]
	}			}

	; Function Attrs: nounwind			; Function Attrs: nounwind
	define signext i16 @_Z6testsdd(double %arg) {			define signext i16 @_Z6testsdd(double %arg) {
	entry:			entry:
	%arg.addr = alloca double, align 8			%arg.addr = alloca double, align 8
	store double %arg, double* %arg.addr, align 8			store double %arg, double* %arg.addr, align 8
	%0 = load double, double* %arg.addr, align 8			%0 = load double, double* %arg.addr, align 8
	Show All 35 Lines
	entry:			entry:
	%arg.addr = alloca i16, align 2			%arg.addr = alloca i16, align 2
	store i16 %arg, i16* %arg.addr, align 2			store i16 %arg, i16* %arg.addr, align 2
	%0 = load i16, i16* %arg.addr, align 2			%0 = load i16, i16* %arg.addr, align 2
	%conv = uitofp i16 %0 to float			%conv = uitofp i16 %0 to float
	ret float %conv			ret float %conv
	; CHECK-LABEL: @_Z7testfust			; CHECK-LABEL: @_Z7testfust
	; CHECK: mtvsrwz [[MOVEREG07:[0-9]+]], 3			; CHECK: mtvsrwz [[MOVEREG07:[0-9]+]], 3
	; FIXME: Once we have XSCVUXDSP implemented, this will change			; CHECK: xscvuxdsp 1, [[MOVEREG07]]
	; CHECK: fcfidus 1, [[MOVEREG07]]
	}			}

	; Function Attrs: nounwind			; Function Attrs: nounwind
	define zeroext i16 @_Z7testusdd(double %arg) {			define zeroext i16 @_Z7testusdd(double %arg) {
	entry:			entry:
	%arg.addr = alloca double, align 8			%arg.addr = alloca double, align 8
	store double %arg, double* %arg.addr, align 8			store double %arg, double* %arg.addr, align 8
	%0 = load double, double* %arg.addr, align 8			%0 = load double, double* %arg.addr, align 8
	Show All 35 Lines
	entry:			entry:
	%arg.addr = alloca i32, align 4			%arg.addr = alloca i32, align 4
	store i32 %arg, i32* %arg.addr, align 4			store i32 %arg, i32* %arg.addr, align 4
	%0 = load i32, i32* %arg.addr, align 4			%0 = load i32, i32* %arg.addr, align 4
	%conv = sitofp i32 %0 to float			%conv = sitofp i32 %0 to float
	ret float %conv			ret float %conv
	; CHECK-LABEL: @_Z6testfii			; CHECK-LABEL: @_Z6testfii
	; CHECK: mtvsrwa [[MOVEREG09:[0-9]+]], 3			; CHECK: mtvsrwa [[MOVEREG09:[0-9]+]], 3
	; FIXME: Once we have XSCVSXDSP implemented, this will change			; CHECK: xscvsxdsp 1, [[MOVEREG09]]
	; CHECK: fcfids 1, [[MOVEREG09]]
	}			}

	; Function Attrs: nounwind			; Function Attrs: nounwind
	define signext i32 @_Z6testidd(double %arg) {			define signext i32 @_Z6testidd(double %arg) {
	entry:			entry:
	%arg.addr = alloca double, align 8			%arg.addr = alloca double, align 8
	store double %arg, double* %arg.addr, align 8			store double %arg, double* %arg.addr, align 8
	%0 = load double, double* %arg.addr, align 8			%0 = load double, double* %arg.addr, align 8
	Show All 35 Lines
	entry:			entry:
	%arg.addr = alloca i32, align 4			%arg.addr = alloca i32, align 4
	store i32 %arg, i32* %arg.addr, align 4			store i32 %arg, i32* %arg.addr, align 4
	%0 = load i32, i32* %arg.addr, align 4			%0 = load i32, i32* %arg.addr, align 4
	%conv = uitofp i32 %0 to float			%conv = uitofp i32 %0 to float
	ret float %conv			ret float %conv
	; CHECK-LABEL: @_Z7testfuij			; CHECK-LABEL: @_Z7testfuij
	; CHECK: mtvsrwz [[MOVEREG11:[0-9]+]], 3			; CHECK: mtvsrwz [[MOVEREG11:[0-9]+]], 3
	; FIXME: Once we have XSCVUXDSP implemented, this will change			; CHECK: xscvuxdsp 1, [[MOVEREG11]]
	; CHECK: fcfidus 1, [[MOVEREG11]]
	}			}

	; Function Attrs: nounwind			; Function Attrs: nounwind
	define zeroext i32 @_Z7testuidd(double %arg) {			define zeroext i32 @_Z7testuidd(double %arg) {
	entry:			entry:
	%arg.addr = alloca double, align 8			%arg.addr = alloca double, align 8
	store double %arg, double* %arg.addr, align 8			store double %arg, double* %arg.addr, align 8
	%0 = load double, double* %arg.addr, align 8			%0 = load double, double* %arg.addr, align 8
	[35 lines not shown]
	entry:			entry:
	%arg.addr = alloca i64, align 8			%arg.addr = alloca i64, align 8
	store i64 %arg, i64* %arg.addr, align 8			store i64 %arg, i64* %arg.addr, align 8
	%0 = load i64, i64* %arg.addr, align 8			%0 = load i64, i64* %arg.addr, align 8
	%conv = sitofp i64 %0 to float			%conv = sitofp i64 %0 to float
	ret float %conv			ret float %conv
	; CHECK-LABEL:@_Z7testfllx			; CHECK-LABEL:@_Z7testfllx
	; CHECK: mtvsrd [[MOVEREG13:[0-9]+]], 3			; CHECK: mtvsrd [[MOVEREG13:[0-9]+]], 3
	; FIXME: Once we have XSCVSXDSP implemented, this will change			; CHECK: xscvsxdsp 1, [[MOVEREG13]]
	; CHECK: fcfids 1, [[MOVEREG13]]
	}			}

	; Function Attrs: nounwind			; Function Attrs: nounwind
	define i64 @_Z7testlldd(double %arg) {			define i64 @_Z7testlldd(double %arg) {
	entry:			entry:
	%arg.addr = alloca double, align 8			%arg.addr = alloca double, align 8
	store double %arg, double* %arg.addr, align 8			store double %arg, double* %arg.addr, align 8
	%0 = load double, double* %arg.addr, align 8			%0 = load double, double* %arg.addr, align 8
	[35 lines not shown]
	entry:			entry:
	%arg.addr = alloca i64, align 8			%arg.addr = alloca i64, align 8
	store i64 %arg, i64* %arg.addr, align 8			store i64 %arg, i64* %arg.addr, align 8
	%0 = load i64, i64* %arg.addr, align 8			%0 = load i64, i64* %arg.addr, align 8
	%conv = uitofp i64 %0 to float			%conv = uitofp i64 %0 to float
	ret float %conv			ret float %conv
	; CHECK-LABEL: @_Z8testfully			; CHECK-LABEL: @_Z8testfully
	; CHECK: mtvsrd [[MOVEREG15:[0-9]+]], 3			; CHECK: mtvsrd [[MOVEREG15:[0-9]+]], 3
	; FIXME: Once we have XSCVUXDSP implemented, this will change			; CHECK: xscvuxdsp 1, [[MOVEREG15]]
	; CHECK: fcfidus 1, [[MOVEREG15]]
	}			}

	; Function Attrs: nounwind			; Function Attrs: nounwind
	define i64 @_Z8testulldd(double %arg) {			define i64 @_Z8testulldd(double %arg) {
	entry:			entry:
	%arg.addr = alloca double, align 8			%arg.addr = alloca double, align 8
	store double %arg, double* %arg.addr, align 8			store double %arg, double* %arg.addr, align 8
	%0 = load double, double* %arg.addr, align 8			%0 = load double, double* %arg.addr, align 8
	[19 lines not shown]

test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll

				; RUN: llc < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr8 | FileCheck %s
				; RUN: llc < %s -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr8 | FileCheck %s -check-prefix=CHECK-LE

				; Function Attrs: nounwind
				define <16 x i8> @buildc(i8 zeroext %a) {
				entry:
				%a.addr = alloca i8, align 1
				store i8 %a, i8* %a.addr, align 1
				%0 = load i8, i8* %a.addr, align 1
				%splat.splatinsert = insertelement <16 x i8> undef, i8 %0, i32 0
				%splat.splat = shufflevector <16 x i8> %splat.splatinsert, <16 x i8> undef, <16 x i32> zeroinitializer
				ret <16 x i8> %splat.splat
				; CHECK: sldi [[REG1:[0-9]+]], 3, 56
				; CHECK: mtvsrd {{[0-9]+}}, [[REG1]]
				; CHECK-LE: mtvsrd [[REG1:[0-9]+]], 3
				; CHECK-LE: xxswapd {{[0-9]+}}, [[REG1]]
				}

				; Function Attrs: nounwind
				define <8 x i16> @builds(i16 zeroext %a) {
				entry:
				%a.addr = alloca i16, align 2
				store i16 %a, i16* %a.addr, align 2
				%0 = load i16, i16* %a.addr, align 2
				%splat.splatinsert = insertelement <8 x i16> undef, i16 %0, i32 0
				%splat.splat = shufflevector <8 x i16> %splat.splatinsert, <8 x i16> undef, <8 x i32> zeroinitializer
				ret <8 x i16> %splat.splat
				; CHECK: sldi [[REG1:[0-9]+]], 3, 48
				; CHECK: mtvsrd {{[0-9]+}}, [[REG1]]
				; CHECK-LE: mtvsrd [[REG1:[0-9]+]], 3
				; CHECK-LE: xxswapd {{[0-9]+}}, [[REG1]]
				}

				; Function Attrs: nounwind
				define <4 x i32> @buildi(i32 zeroext %a) {
				entry:
				%a.addr = alloca i32, align 4
				store i32 %a, i32* %a.addr, align 4
				%0 = load i32, i32* %a.addr, align 4
				%splat.splatinsert = insertelement <4 x i32> undef, i32 %0, i32 0
				%splat.splat = shufflevector <4 x i32> %splat.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer
				ret <4 x i32> %splat.splat
				; CHECK: sldi [[REG1:[0-9]+]], 3, 32
				; CHECK: mtvsrd {{[0-9]+}}, [[REG1]]
				; CHECK-LE: mtvsrd [[REG1:[0-9]+]], 3
				; CHECK-LE: xxswapd {{[0-9]+}}, [[REG1]]
				}

				; Function Attrs: nounwind
				define <2 x i64> @buildl(i64 %a) {
				entry:
				%a.addr = alloca i64, align 8
				store i64 %a, i64* %a.addr, align 8
				%0 = load i64, i64* %a.addr, align 8
				%splat.splatinsert = insertelement <2 x i64> undef, i64 %0, i32 0
				%splat.splat = shufflevector <2 x i64> %splat.splatinsert, <2 x i64> undef, <2 x i32> zeroinitializer
				ret <2 x i64> %splat.splat
				; CHECK: mtvsrd {{[0-9]+}}, 3
				; CHECK-LE: mtvsrd [[REG1:[0-9]+]], 3
				; CHECK-LE: xxswapd {{[0-9]+}}, [[REG1]]
				}

				; Function Attrs: nounwind
				define <4 x float> @buildf(float %a) {
				entry:
				%a.addr = alloca float, align 4
				store float %a, float* %a.addr, align 4
				%0 = load float, float* %a.addr, align 4
				%splat.splatinsert = insertelement <4 x float> undef, float %0, i32 0
				%splat.splat = shufflevector <4 x float> %splat.splatinsert, <4 x float> undef, <4 x i32> zeroinitializer
				ret <4 x float> %splat.splat
				; CHECK: xscvdpspn {{[0-9]+}}, 1
				; CHECK-LE: xscvdpspn [[REG1:[0-9]+]], 1
				; CHECK-LE: xxsldwi {{[0-9]+}}, [[REG1]], [[REG1]], 1
				wschmidt: I was confused by this at first, but now I get it. In all of these tests, the code generation that you are checking for is just for the %splat.splatinsert calculation, right? This is the part that translates into a scalar_to_vector node. I assume that the generated code follows this up with code to splat element 0 to the entire result register, correct?
				nemanjai (author): Yes, the subsequent instruction will be a splat (VSX if we have one, VMX otherwise). I can add those to the check patterns, but I can't verify the correct register (because of the "off-by-32 relationship" between the VSX and VMX registers). I was actually hoping to use the FileCheck capability of checking line numbers (ranges), but that actually looks for the word "line", I think. So I can add the CHECK: vsplat[bhw] and just confirm the right element in the splat instruction if you'd like.
				wschmidt: No, that's OK. Just add a comment at the beginning of the file indicating that you are just checking the code generated for the insertelement. Alternatively, you can make this more obvious by putting the CHECK comment lines right after the insertelement in each test variant.
				}
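
A minimal sketch of the second option suggested in the comments above, using buildc: the CHECK lines sit directly after the insertelement, so it is clear that they cover only the scalar_to_vector code and not the splat that follows. The function body and check patterns are copied unchanged from the test above. Only their placement and the explanatory comment are new, and this is an illustration rather than the form the committed test necessarily took.

```llvm
; Function Attrs: nounwind
define <16 x i8> @buildc(i8 zeroext %a) {
entry:
  %a.addr = alloca i8, align 1
  store i8 %a, i8* %a.addr, align 1
  %0 = load i8, i8* %a.addr, align 1
  %splat.splatinsert = insertelement <16 x i8> undef, i8 %0, i32 0
; The checks below cover only the insertelement (scalar_to_vector);
; the splat emitted for the shufflevector is deliberately not checked.
; CHECK: sldi [[REG1:[0-9]+]], 3, 56
; CHECK: mtvsrd {{[0-9]+}}, [[REG1]]
; CHECK-LE: mtvsrd [[REG1:[0-9]+]], 3
; CHECK-LE: xxswapd {{[0-9]+}}, [[REG1]]
  %splat.splat = shufflevector <16 x i8> %splat.splatinsert, <16 x i8> undef, <16 x i32> zeroinitializer
  ret <16 x i8> %splat.splat
}
```

FileCheck matches the CHECK lines against llc's output in the order they appear in the file, not by where they sit relative to the IR, so moving them this way changes readability only.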

test/CodeGen/PowerPC/vsx.ll

	[1,220 lines not shown]
	; CHECK-FISL-DAG: addi [[R2:[0-9]+]], 1, -16			; CHECK-FISL-DAG: addi [[R2:[0-9]+]], 1, -16
	; CHECK-FISL-DAG: addi [[R3:[0-9]+]], 3, 2			; CHECK-FISL-DAG: addi [[R3:[0-9]+]], 3, 2
	; CHECK-FISL-DAG: std [[R1]], -8(1)			; CHECK-FISL-DAG: std [[R1]], -8(1)
	; CHECK-FISL-DAG: std [[R3]], -16(1)			; CHECK-FISL-DAG: std [[R3]], -16(1)
	; CHECK-FISL-DAG: lxvd2x 0, 0, [[R2]]			; CHECK-FISL-DAG: lxvd2x 0, 0, [[R2]]
	; CHECK-FISL: blr			; CHECK-FISL: blr

	; CHECK-LE-LABEL: @test80			; CHECK-LE-LABEL: @test80
	; CHECK-LE-DAG: addi [[R1:[0-9]+]], 1, -16			; CHECK-LE-DAG: mtvsrd [[R1:[0-9]+]], 3
	; CHECK-LE-DAG: addi [[R2:[0-9]+]], {{[0-9]+}}, .LCPI			; CHECK-LE-DAG: addi [[R2:[0-9]+]], {{[0-9]+}}, .LCPI
	; CHECK-LE-DAG: lxvd2x [[V1:[0-9]+]], 0, [[R1]]			; CHECK-LE-DAG: xxswapd [[V1:[0-9]+]], [[R1]]
	; CHECK-LE-DAG: lxvd2x [[V2:[0-9]+]], 0, [[R2]]			; CHECK-LE-DAG: lxvd2x [[V2:[0-9]+]], 0, [[R2]]
	; CHECK-LE-DAG: xxswapd 34, [[V1]]			; CHECK-LE-DAG: xxspltd 34, [[V1]]
	; CHECK-LE-DAG: xxswapd 35, [[V2]]			; CHECK-LE-DAG: xxswapd 35, [[V2]]
	; CHECK-LE: vaddudm 2, 2, 3			; CHECK-LE: vaddudm 2, 2, 3
	; CHECK-LE: blr			; CHECK-LE: blr
	}			}

	define <2 x double> @test81(<4 x float> %b) {			define <2 x double> @test81(<4 x float> %b) {
	%w = bitcast <4 x float> %b to <2 x double>			%w = bitcast <4 x float> %b to <2 x double>
	ret <2 x double> %w			ret <2 x double> %w
	[26 lines not shown]

test/CodeGen/PowerPC/vsx_scalar_ld_st.ll

	[49 lines not shown]
	entry:			entry:
	%ff = alloca float, align 4			%ff = alloca float, align 4
	%0 = load i32, i32* @i, align 4			%0 = load i32, i32* @i, align 4
	%conv = sitofp i32 %0 to float			%conv = sitofp i32 %0 to float
	store volatile float %conv, float* %ff, align 4			store volatile float %conv, float* %ff, align 4
	ret void			ret void
	; CHECK-LABEL: @intToFlt			; CHECK-LABEL: @intToFlt
	; CHECK: lxsiwax [[REGLD2:[0-9]+]],			; CHECK: lxsiwax [[REGLD2:[0-9]+]],
	; FIXME: the below will change when the VSX form is implemented			; CHECK: xscvsxdsp {{[0-9]}}, [[REGLD2]]
	; CHECK: fcfids {{[0-9]}}, [[REGLD2]]
	}			}

	; Function Attrs: nounwind			; Function Attrs: nounwind
	define void @dblToUInt() #0 {			define void @dblToUInt() #0 {
	entry:			entry:
	%uiui = alloca i32, align 4			%uiui = alloca i32, align 4
	%0 = load double, double* @d, align 8			%0 = load double, double* @d, align 8
	%conv = fptoui double %0 to i32			%conv = fptoui double %0 to i32
	[35 lines not shown]
	entry:			entry:
	%ff = alloca float, align 4			%ff = alloca float, align 4
	%0 = load i32, i32* @ui, align 4			%0 = load i32, i32* @ui, align 4
	%conv = uitofp i32 %0 to float			%conv = uitofp i32 %0 to float
	store volatile float %conv, float* %ff, align 4			store volatile float %conv, float* %ff, align 4
	ret void			ret void
	; CHECK-LABEL: @uIntToFlt			; CHECK-LABEL: @uIntToFlt
	; CHECK: lxsiwzx [[REGLD4:[0-9]+]],			; CHECK: lxsiwzx [[REGLD4:[0-9]+]],
	; FIXME: the below will change when the VSX form is implemented			; CHECK: xscvuxdsp {{[0-9]+}}, [[REGLD4]]
	; CHECK: fcfidus {{[0-9]+}}, [[REGLD4]]
	}			}

	; Function Attrs: nounwind			; Function Attrs: nounwind
	define void @dblToFloat() #0 {			define void @dblToFloat() #0 {
	entry:			entry:
	%ff = alloca float, align 4			%ff = alloca float, align 4
	%0 = load double, double* @d, align 8			%0 = load double, double* @d, align 8
	%conv = fptrunc double %0 to float			%conv = fptrunc double %0 to float
	[19 lines not shown]

test/MC/Disassembler/PowerPC/vsx.txt

	[51 lines not shown]
	0xf3 0x1f 0xd9 0x1c			0xf3 0x1f 0xd9 0x1c

	# CHECK: xscpsgndp 7, 63, 27			# CHECK: xscpsgndp 7, 63, 27
	0xf0 0xff 0xdd 0x84			0xf0 0xff 0xdd 0x84

	# CHECK: xscvdpsp 7, 27			# CHECK: xscvdpsp 7, 27
	0xf0 0xe0 0xdc 0x24			0xf0 0xe0 0xdc 0x24

				# CHECK: xscvdpspn 7, 27
				0xf0 0xe0 0xdc 0x2c

	# CHECK: xscvdpsxds 7, 27			# CHECK: xscvdpsxds 7, 27
	0xf0 0xe0 0xdd 0x60			0xf0 0xe0 0xdd 0x60

	# CHECK: xscvdpsxws 7, 27			# CHECK: xscvdpsxws 7, 27
	0xf0 0xe0 0xd9 0x60			0xf0 0xe0 0xd9 0x60

	# CHECK: xscvdpuxds 7, 27			# CHECK: xscvdpuxds 7, 27
	0xf0 0xe0 0xdd 0x20			0xf0 0xe0 0xdd 0x20

	# CHECK: xscvdpuxws 7, 27			# CHECK: xscvdpuxws 7, 27
	0xf0 0xe0 0xd9 0x20			0xf0 0xe0 0xd9 0x20

	# CHECK: xscvspdp 7, 27			# CHECK: xscvspdp 7, 27
	0xf0 0xe0 0xdd 0x24			0xf0 0xe0 0xdd 0x24

				# CHECK: xscvspdpn 7, 27
				0xf0 0xe0 0xdd 0x2c

				# CHECK: xscvsxdsp 7, 27
				0xf0 0xe0 0xdc 0xe0

	# CHECK: xscvsxddp 7, 27			# CHECK: xscvsxddp 7, 27
	0xf0 0xe0 0xdd 0xe0			0xf0 0xe0 0xdd 0xe0

				# CHECK: xscvuxdsp 7, 27
				0xf0 0xe0 0xdc 0xa0

	# CHECK: xscvuxddp 7, 27			# CHECK: xscvuxddp 7, 27
	0xf0 0xe0 0xdd 0xa0			0xf0 0xe0 0xdd 0xa0

	# CHECK: xsdivsp 7, 63, 27			# CHECK: xsdivsp 7, 63, 27
	0xf0 0xff 0xd8 0xc4			0xf0 0xff 0xd8 0xc4

	# CHECK: xsdivdp 7, 63, 27			# CHECK: xsdivdp 7, 63, 27
	0xf0 0xff 0xd9 0xc4			0xf0 0xff 0xd9 0xc4
	[444 lines not shown]

test/MC/PowerPC/vsx.s

	[56 lines not shown]
	# CHECK-LE: xscmpudp 6, 63, 27 # encoding: [0x1c,0xd9,0x1f,0xf3]			# CHECK-LE: xscmpudp 6, 63, 27 # encoding: [0x1c,0xd9,0x1f,0xf3]
	xscmpudp 6, 63, 27			xscmpudp 6, 63, 27
	# CHECK-BE: xscpsgndp 7, 63, 27 # encoding: [0xf0,0xff,0xdd,0x84]			# CHECK-BE: xscpsgndp 7, 63, 27 # encoding: [0xf0,0xff,0xdd,0x84]
	# CHECK-LE: xscpsgndp 7, 63, 27 # encoding: [0x84,0xdd,0xff,0xf0]			# CHECK-LE: xscpsgndp 7, 63, 27 # encoding: [0x84,0xdd,0xff,0xf0]
	xscpsgndp 7, 63, 27			xscpsgndp 7, 63, 27
	# CHECK-BE: xscvdpsp 7, 27 # encoding: [0xf0,0xe0,0xdc,0x24]			# CHECK-BE: xscvdpsp 7, 27 # encoding: [0xf0,0xe0,0xdc,0x24]
	# CHECK-LE: xscvdpsp 7, 27 # encoding: [0x24,0xdc,0xe0,0xf0]			# CHECK-LE: xscvdpsp 7, 27 # encoding: [0x24,0xdc,0xe0,0xf0]
	xscvdpsp 7, 27			xscvdpsp 7, 27
				# CHECK-BE: xscvdpspn 7, 27 # encoding: [0xf0,0xe0,0xdc,0x2c]
				# CHECK-LE: xscvdpspn 7, 27 # encoding: [0x2c,0xdc,0xe0,0xf0]
				xscvdpspn 7, 27
	# CHECK-BE: xscvdpsxds 7, 27 # encoding: [0xf0,0xe0,0xdd,0x60]			# CHECK-BE: xscvdpsxds 7, 27 # encoding: [0xf0,0xe0,0xdd,0x60]
	# CHECK-LE: xscvdpsxds 7, 27 # encoding: [0x60,0xdd,0xe0,0xf0]			# CHECK-LE: xscvdpsxds 7, 27 # encoding: [0x60,0xdd,0xe0,0xf0]
	xscvdpsxds 7, 27			xscvdpsxds 7, 27
	# CHECK-BE: xscvdpsxws 7, 27 # encoding: [0xf0,0xe0,0xd9,0x60]			# CHECK-BE: xscvdpsxws 7, 27 # encoding: [0xf0,0xe0,0xd9,0x60]
	# CHECK-LE: xscvdpsxws 7, 27 # encoding: [0x60,0xd9,0xe0,0xf0]			# CHECK-LE: xscvdpsxws 7, 27 # encoding: [0x60,0xd9,0xe0,0xf0]
	xscvdpsxws 7, 27			xscvdpsxws 7, 27
	# CHECK-BE: xscvdpuxds 7, 27 # encoding: [0xf0,0xe0,0xdd,0x20]			# CHECK-BE: xscvdpuxds 7, 27 # encoding: [0xf0,0xe0,0xdd,0x20]
	# CHECK-LE: xscvdpuxds 7, 27 # encoding: [0x20,0xdd,0xe0,0xf0]			# CHECK-LE: xscvdpuxds 7, 27 # encoding: [0x20,0xdd,0xe0,0xf0]
	xscvdpuxds 7, 27			xscvdpuxds 7, 27
	# CHECK-BE: xscvdpuxws 7, 27 # encoding: [0xf0,0xe0,0xd9,0x20]			# CHECK-BE: xscvdpuxws 7, 27 # encoding: [0xf0,0xe0,0xd9,0x20]
	# CHECK-LE: xscvdpuxws 7, 27 # encoding: [0x20,0xd9,0xe0,0xf0]			# CHECK-LE: xscvdpuxws 7, 27 # encoding: [0x20,0xd9,0xe0,0xf0]
	xscvdpuxws 7, 27			xscvdpuxws 7, 27
	# CHECK-BE: xscvspdp 7, 27 # encoding: [0xf0,0xe0,0xdd,0x24]			# CHECK-BE: xscvspdp 7, 27 # encoding: [0xf0,0xe0,0xdd,0x24]
	# CHECK-LE: xscvspdp 7, 27 # encoding: [0x24,0xdd,0xe0,0xf0]			# CHECK-LE: xscvspdp 7, 27 # encoding: [0x24,0xdd,0xe0,0xf0]
	xscvspdp 7, 27			xscvspdp 7, 27
				# CHECK-BE: xscvspdpn 7, 27 # encoding: [0xf0,0xe0,0xdd,0x2c]
				# CHECK-LE: xscvspdpn 7, 27 # encoding: [0x2c,0xdd,0xe0,0xf0]
				xscvspdpn 7, 27
				# CHECK-BE: xscvsxdsp 7, 27 # encoding: [0xf0,0xe0,0xdc,0xe0]
				# CHECK-LE: xscvsxdsp 7, 27 # encoding: [0xe0,0xdc,0xe0,0xf0]
				xscvsxdsp 7, 27
	# CHECK-BE: xscvsxddp 7, 27 # encoding: [0xf0,0xe0,0xdd,0xe0]			# CHECK-BE: xscvsxddp 7, 27 # encoding: [0xf0,0xe0,0xdd,0xe0]
	# CHECK-LE: xscvsxddp 7, 27 # encoding: [0xe0,0xdd,0xe0,0xf0]			# CHECK-LE: xscvsxddp 7, 27 # encoding: [0xe0,0xdd,0xe0,0xf0]
	xscvsxddp 7, 27			xscvsxddp 7, 27
				# CHECK-BE: xscvuxdsp 7, 27 # encoding: [0xf0,0xe0,0xdc,0xa0]
				# CHECK-LE: xscvuxdsp 7, 27 # encoding: [0xa0,0xdc,0xe0,0xf0]
				xscvuxdsp 7, 27
	# CHECK-BE: xscvuxddp 7, 27 # encoding: [0xf0,0xe0,0xdd,0xa0]			# CHECK-BE: xscvuxddp 7, 27 # encoding: [0xf0,0xe0,0xdd,0xa0]
	# CHECK-LE: xscvuxddp 7, 27 # encoding: [0xa0,0xdd,0xe0,0xf0]			# CHECK-LE: xscvuxddp 7, 27 # encoding: [0xa0,0xdd,0xe0,0xf0]
	xscvuxddp 7, 27			xscvuxddp 7, 27
	# CHECK-BE: xsdivsp 7, 63, 27 # encoding: [0xf0,0xff,0xd8,0xc4]			# CHECK-BE: xsdivsp 7, 63, 27 # encoding: [0xf0,0xff,0xd8,0xc4]
	# CHECK-LE: xsdivsp 7, 63, 27 # encoding: [0xc4,0xd8,0xff,0xf0]			# CHECK-LE: xsdivsp 7, 63, 27 # encoding: [0xc4,0xd8,0xff,0xf0]
	xsdivsp 7, 63, 27			xsdivsp 7, 63, 27
	# CHECK-BE: xsdivdp 7, 63, 27 # encoding: [0xf0,0xff,0xd9,0xc4]			# CHECK-BE: xsdivdp 7, 63, 27 # encoding: [0xf0,0xff,0xd9,0xc4]
	# CHECK-LE: xsdivdp 7, 63, 27 # encoding: [0xc4,0xd9,0xff,0xf0]			# CHECK-LE: xsdivdp 7, 63, 27 # encoding: [0xc4,0xd9,0xff,0xf0]
	[447 lines not shown]

Revision Contents

Diff 30861
