This is an archive of the discontinued LLVM Phabricator instance.

Differential D20443

[PowerPC] - Legalize illegal vector types by widening rather than integer promotion
Closed, Public

Authored by nemanjai on May 19 2016, 11:26 AM.

Details

Reviewers

wschmidt
cycheng
kbarton
amehsan
hfinkel

Summary

The legalization for an ISD::LOAD node when the type is v4i8 will convert it into an extended load to a vector type which ends up producing very bad code (see below). This patch simply adds a DAG combine that will convert such a load into an i32 load followed by a bitcast. This ends up collapsing into something more usable.
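The equivalence this combine relies on can be illustrated outside LLVM. The following is a minimal standalone sketch, not code from the patch, and both function names are hypothetical. It suggests that four separate byte loads produce the same lanes as one 32-bit load followed by a bit-level reinterpretation (the analogue of the bitcast).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: four separate byte loads, analogous to an lbz sequence.
std::array<std::uint8_t, 4> loadBytewise(const std::uint8_t *p) {
  assert(p != nullptr);
  return {p[0], p[1], p[2], p[3]};
}

// Hypothetical helper: one 32-bit load, then reinterpret the bits as 4 bytes.
std::array<std::uint8_t, 4> loadWordThenBitcast(const std::uint8_t *p) {
  assert(p != nullptr);
  std::uint32_t word;
  std::memcpy(&word, p, sizeof(word));            // single i32 load
  std::array<std::uint8_t, 4> lanes;
  std::memcpy(lanes.data(), &word, sizeof(word)); // bitcast i32 -> <4 x i8>
  return lanes;
}
```

Because both copies go through memory in the same byte order, the comparison holds regardless of host endianness.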

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai updated this revision to Diff 57828. May 19 2016, 11:26 AM

nemanjai retitled this revision to [PowerPC] - Combine loads of v4i8 to loads of i32 followed by bitcast.

nemanjai updated this object.

nemanjai added reviewers: hfinkel, kbarton, amehsan, cycheng, wschmidt.

nemanjai set the repository for this revision to rL LLVM.

nemanjai added a subscriber: llvm-commits.

junbuml added a subscriber: junbuml. May 19 2016, 11:30 AM

An example of the type of code we were getting...
With a code pattern such as this:

define <16 x i8> @test(i32* %s, i32* %t) {
entry:
  %0 = bitcast i32* %s to <4 x i8>*
  %1 = load <4 x i8>, <4 x i8>* %0, align 4
  %2 = shufflevector <4 x i8> %1, <4 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
  ret <16 x i8> %2
}

We now get the following:

lwz 3, 0(3)
mtvsrd 34, 3
xxswapd  0, 34
xxspltw 34, 0, 3

and before this change, we were getting:

lbz 5, 0(3)
lbz 6, 1(3)
addis 4, 2, .LCPI0_0@toc@ha
addi 4, 4, .LCPI0_0@toc@l
mtvsrd 34, 5
lbz 5, 2(3)
lbz 3, 3(3)
lxvd2x 0, 0, 4
mtvsrd 35, 6
xxswapd  34, 34
mtvsrd 36, 5
mtvsrd 37, 3
xxswapd  35, 35
xxswapd  36, 36
xxswapd  37, 37
vmrglw 2, 3, 2
xxswapd  50, 0
vmrglw 19, 5, 4
vperm 2, 19, 2, 18
xxsldwi 12, 34, 34, 2
mfvsrwz 3, 34
xxsldwi 1, 34, 34, 1
xxsldwi 2, 34, 34, 3
mtvsrd 34, 3
mfvsrwz 3, 12
mfvsrwz 4, 1
mtvsrd 35, 3
mfvsrwz 3, 2
mtvsrd 36, 4
mtvsrd 37, 3
addis 3, 2, .LCPI0_1@toc@ha
xxswapd  34, 34
addi 3, 3, .LCPI0_1@toc@l
xxswapd  37, 37
lxvd2x 13, 0, 3
xxswapd  35, 35
xxswapd  36, 36
vmrglb 2, 5, 2
xxswapd  51, 13
vmrglb 3, 4, 3
vmrglh 2, 2, 3
xxspltw 34, 34, 3
vperm 2, 2, 2, 19

The TOC loads above were for materializing constant mask vectors for the vector shuffles that this degrades into.

This code pattern is a simulation of one that comes out of SROA and this patch provides a big improvement in one of the benchmarks.
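To see why a single word splat is enough for this pattern, here is a small standalone sketch (hypothetical helper names, not LLVM code). It models the shufflevector mask from the example as out[i] = src[mask[i]] and compares the result with replicating the 4-byte group across the 16-byte vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Applies a shufflevector-style mask to a 4-lane source: out[i] = src[mask[i]].
// Only indices into the first operand are modeled, since the second is undef.
std::array<std::uint8_t, 16> shuffle(const std::array<std::uint8_t, 4> &src,
                                     const std::array<int, 16> &mask) {
  std::array<std::uint8_t, 16> out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    assert(mask[i] >= 0 && mask[i] < 4);
    out[i] = src[static_cast<std::size_t>(mask[i])];
  }
  return out;
}

// Replicates the 4-byte group across all four 32-bit words of the vector.
std::array<std::uint8_t, 16> splatWord(const std::array<std::uint8_t, 4> &src) {
  std::array<std::uint8_t, 16> out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = src[i % 4];
  return out;
}
```

With the mask <0, 1, 2, 3> repeated four times, both functions return the same 16 bytes, which matches the load-then-xxspltw shape of the improved sequence.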

Looks like you forgot the test case in the patch.

In D20443#434737, @hfinkel wrote:

Looks like you forgot the test case in the patch.

Argh, yes I did. Too busy putting the details into the comments :D. I'll re-post in a few minutes (and one of these days I'll get used to running "svn add").

Added the test case that I forgot in the initial patch.

amehsan added inline comments. May 20 2016, 7:01 AM

lib/Target/PowerPC/PPCISelLowering.cpp
10622–10635	Instead of fixing the issue for this particular pattern, can't we change type legalization, so that it always converts v4i8 to i32? This fixes the problem at hand, but if we have a different code pattern that includes v4i8, the current way of legalizing v4i8 in type legalization will kick in which seems to generate inefficient code, by promoting it to a larger vector and adding permutes and similar instructions. I think we may need to change how type legalization handles v4i8, instead of fixing this particular pattern.

hfinkel added inline comments. May 20 2016, 10:24 AM

lib/Target/PowerPC/PPCISelLowering.cpp

10622–10635

To this point, we currently have code like this:

// We promote all non-typed operations to v4i32.
setOperationAction(ISD::AND   , VT, Promote);
AddPromotedToType (ISD::AND   , VT, MVT::v4i32);
setOperationAction(ISD::OR    , VT, Promote);
AddPromotedToType (ISD::OR    , VT, MVT::v4i32);
setOperationAction(ISD::XOR   , VT, Promote);
AddPromotedToType (ISD::XOR   , VT, MVT::v4i32);
setOperationAction(ISD::LOAD  , VT, Promote);
AddPromotedToType (ISD::LOAD  , VT, MVT::v4i32);
setOperationAction(ISD::SELECT, VT, Promote);
AddPromotedToType (ISD::SELECT, VT, MVT::v4i32);
setOperationAction(ISD::SELECT_CC, VT, Promote);
AddPromotedToType (ISD::SELECT_CC, VT, MVT::v4i32);
setOperationAction(ISD::STORE, VT, Promote);
AddPromotedToType (ISD::STORE, VT, MVT::v4i32);

maybe something similar would work in this area as well?

nemanjai added inline comments. May 20 2016, 10:44 AM

lib/Target/PowerPC/PPCISelLowering.cpp
10622–10635	Actually, you bring up a very good point. We really should be doing something better with legalization of v4i8 (and I imagine all vectors narrower than vectors our hardware actually handles). However, I don't think we can legalize it as a scalar type. We should actually be widening the vector (rather than promoting the integer element type). I'll re-post this patch to do that instead.

This is a fundamentally different approach from the first attempt. As Ehsan suggested, I've updated how we legalize non-legal vector types. With this patch, if we do not support the vector type, we will widen it rather than performing an integer promotion which would often require scalarizing. However, widening can only be done when the bit-size of the vector element is a multiple of 8 (as we do not support any vectors made up of fractional byte elements).
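The decision rule described here can be sketched in isolation. This is a simplified model rather than the patch itself: the enum and function names are hypothetical, the fallback is left as an opaque "target default", and the lane-count helper assumes 128-bit vector registers.

```cpp
#include <cassert>

// Hypothetical stand-ins for the legalization actions discussed above.
enum class VectorAction { WidenVector, TargetDefault };

// Mirrors the byte-multiple test: widen when the element size is a whole
// number of bytes, otherwise defer to the target-independent default.
constexpr VectorAction preferredVectorAction(unsigned elementBits) {
  return (elementBits % 8 == 0) ? VectorAction::WidenVector
                                : VectorAction::TargetDefault;
}

// Assumption: 128-bit vector registers. Returns the lane count after widening
// while keeping the element type, e.g. 8-bit elements -> 16 lanes (v16i8).
// The caller must pass a non-zero element size.
constexpr unsigned widenedLaneCount(unsigned elementBits) {
  return 128 / elementBits;
}
```

Under these assumptions a v4i8 value keeps its i8 elements and grows to 16 lanes, whereas a vector of i1 elements falls through to the default handling.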

There are instances where this approach produces worse code. This is exposed in some of the functions in the vsx.ll test case. To address one of those, this patch implements a DAG combine for a conversion of a v2i32 to v2f64 so that it remains a 2-instruction sequence. Similar DAG combines can later be implemented for other affected code patterns.
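The element-pair check at the heart of that combine can be sketched as a standalone function. The function name and the kNoMatch sentinel are hypothetical; the index values follow the SubvecIdx logic in DAGCombineBuildVector in this revision, and the sketch makes no claim about how the resulting node is selected.

```cpp
#include <cassert>

// Hypothetical sentinel meaning "pattern not recognized, do not combine".
constexpr int kNoMatch = -1;

// Given the two constant extract indices feeding the build_vector, returns
// the subvector index the combine would use, or kNoMatch when the pair is
// neither (0, 1) nor (2, 3). The index flips with endianness.
constexpr int subvectorIndex(int firstElem, int secondElem,
                             bool isLittleEndian) {
  return (firstElem == 0 && secondElem == 1)   ? (isLittleEndian ? 1 : 0)
         : (firstElem == 2 && secondElem == 3) ? (isLittleEndian ? 0 : 1)
                                               : kNoMatch;
}
```

Any other pair of indices, such as (1, 2), is rejected, so the generic lowering handles it.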

I've done some lightweight performance testing (the LNT tests on a quiet machine) and here are the results with benchmarks that take less than 5s to execute omitted (BEFORE_TIME is without this patch and AFTER_TIME is with this patch):

BENCHMARK_NAME                                                                            BEFORE_TIME AFTER_TIME  ABS_DIFF  PCT_DIFF
MultiSource/Applications/SPASS/Output/SPASS                                                    6.050     6.049    -0.001     -0.02%
MultiSource/Applications/JM/lencod/Output/lencod                                               5.102     5.138     0.036      0.70%
MultiSource/Applications/lambda-0.1.3/Output/lambda                                            5.891     5.889    -0.002     -0.04%
MultiSource/Applications/hexxagon/Output/hexxagon                                             13.814    13.808    -0.007     -0.05%
MultiSource/Applications/lua/Output/lua                                                       22.924    22.919    -0.004     -0.02%
MultiSource/Benchmarks/SciMark2-C/Output/scimark2                                             72.514    72.508    -0.006     -0.01%
MultiSource/Benchmarks/nbench/Output/nbench                                                   19.489    19.189    -0.300     -1.54%
MultiSource/Benchmarks/NPB-serial/is/Output/is                                                10.415    10.417     0.002      0.02%
MultiSource/Benchmarks/ASC_Sequoia/IRSmk/Output/IRSmk                                         32.249    32.987     0.739      2.29%
MultiSource/Benchmarks/ASC_Sequoia/AMGmk/Output/AMGmk                                         10.481    10.482     0.000      0.00%
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/Output/CrystalMk                                  7.537     7.537    -0.001     -0.01%
MultiSource/Benchmarks/TSVC/Reductions-flt/Output/Reductions-flt                               5.548     5.548     0.000      0.00%
MultiSource/Benchmarks/TSVC/Searching-flt/Output/Searching-flt                                 6.452     6.452    -0.001     -0.01%
MultiSource/Benchmarks/TSVC/GlobalDataFlow-dbl/Output/GlobalDataFlow-dbl                       6.823     6.825     0.002      0.03%
MultiSource/Benchmarks/TSVC/Reductions-dbl/Output/Reductions-dbl                               5.509     5.509     0.001      0.01%
MultiSource/Benchmarks/TSVC/Searching-dbl/Output/Searching-dbl                                 6.253     6.253     0.000      0.00%
MultiSource/Benchmarks/PAQ8p/Output/paq8p                                                    110.282    73.201   -37.080    -33.62%
MultiSource/Benchmarks/Bullet/Output/bullet                                                    5.242     5.209    -0.033     -0.63%
MultiSource/Benchmarks/7zip/Output/7zip-benchmark                                              7.853     7.842    -0.011     -0.14%
MultiSource/Benchmarks/mafft/Output/pairlocalalign                                            26.034    26.056     0.022      0.08%
SingleSource/Benchmarks/CoyoteBench/Output/almabench                                          30.836    30.827    -0.009     -0.03%
SingleSource/Benchmarks/CoyoteBench/Output/huffbench                                          21.345    21.350     0.005      0.02%
SingleSource/Benchmarks/Shootout/Output/sieve                                                  5.268     5.267    -0.001     -0.01%
SingleSource/Benchmarks/Shootout/Output/lists                                                  7.607     7.605    -0.002     -0.03%
SingleSource/Benchmarks/Shootout/Output/methcall                                              10.474    10.474     0.000      0.00%
SingleSource/Benchmarks/Shootout-C++/Output/lists                                             11.707    11.733     0.026      0.22%
SingleSource/Benchmarks/Shootout-C++/Output/methcall                                          11.177    11.175    -0.002     -0.02%
SingleSource/Benchmarks/Misc/Output/flops                                                      6.340     6.343     0.003      0.05%
SingleSource/Benchmarks/Misc/Output/salsa20                                                    6.292     6.291    -0.001     -0.01%
SingleSource/Benchmarks/Misc/Output/ReedSolomon                                                6.903     6.905     0.002      0.03%
SingleSource/Benchmarks/Misc-C++/Large/Output/sphereflake                                      6.154     6.154    -0.000     -0.00%
SingleSource/Benchmarks/Misc-C++/Output/stepanov_v1p2                                          9.809     9.809    -0.000     -0.00%
SingleSource/Benchmarks/Adobe-C++/Output/stepanov_abstraction                                  5.733     5.733    -0.000     -0.01%
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/2mm/Output/2mm                       18.724    16.745    -1.980    -10.57%
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/3mm/Output/3mm                       29.825    28.844    -0.981     -3.29%
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gemm/Output/gemm                      8.889     8.518    -0.371     -4.18%
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/Output/symm                     13.260    14.198     0.937      7.07%
SingleSource/Benchmarks/Linpack/Output/linpack-pc                                             13.215    13.214    -0.001     -0.01%
SingleSource/Benchmarks/SmallPT/Output/smallpt                                                12.783    12.781    -0.001     -0.01%
SingleSource/UnitTests/Vector/Altivec/Output/alti.expandfft                                   23.281    23.291     0.009      0.04%
SingleSource/UnitTests/Vectorizer/Output/gcc-loops                                             7.511     7.511     0.000      0.00%
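For reading the table: the ABS_DIFF and PCT_DIFF columns appear consistent with the arithmetic sketched below, where negative values mean the patched build ran faster. The function names are hypothetical and the times are assumed to be in seconds.

```cpp
#include <cassert>
#include <cmath>

// ABS_DIFF column: after minus before, in the same time unit as the inputs.
double absDiff(double before, double after) { return after - before; }

// PCT_DIFF column: the absolute difference relative to the baseline, in percent.
double pctDiff(double before, double after) {
  assert(before > 0.0);
  return 100.0 * (after - before) / before;
}
```

For example, the paq8p row (110.282 before, 73.201 after) gives roughly -37.08 and -33.62%.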

amehsan added inline comments. Jun 2 2016, 2:48 PM

lib/Target/PowerPC/PPCISelLowering.cpp
10307	This can be sunk further down, right before where we actually use it.
10308–10309	If I am reading everything correctly, you need to add some code here to prevent an assertion. The comment in include/llvm/CodeGen/ISDOpcodes.h above the definition of EXTRACT_VECTOR_ELT says that the index into the vector might be variable. The implementation of getConstantOperandVal uses a cast<ConstantSDNode>, which ends up "causing an assertion failure if it is not really an instance of the right type" according to http://llvm.org/docs/ProgrammersManual.html#the-isa-cast-and-dyn-cast-templates

nemanjai added inline comments. Jun 3 2016, 2:29 AM

lib/Target/PowerPC/PPCISelLowering.cpp
10308–10309	Yes, that is a very good point. I forgot to add a check that a node from which I'm extracting a constant is actually a constant node :). Will update the patch now.

I had forgotten to check whether the operands of the extractelement nodes are actually constants, which can cause asserts. I have added the checks now.

nemanjai added inline comments. Jun 7 2016, 7:21 AM

lib/Target/PowerPC/PPCISelLowering.cpp
10302	If there are no further requests, I won't post another review for this. The final patch will replace these "isa" calls with dyn_cast calls, and the corresponding getConstantOperandVal() calls will be replaced with getZExtValue() calls on the resulting ConstantSDNodes.
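The checked-downcast idiom being discussed can be shown with a toy model. The classes below are not LLVM's, and standard dynamic_cast stands in for LLVM's dyn_cast purely for illustration. The point is that the type test and the conversion happen in one step, so a non-constant index is rejected instead of triggering an assertion.

```cpp
#include <cassert>
#include <cstdint>

// Toy node hierarchy; these are NOT LLVM's classes.
struct Node {
  virtual ~Node() = default;
};
struct ConstantNode : Node {
  explicit ConstantNode(std::uint64_t v) : value(v) {}
  std::uint64_t getZExtValue() const { return value; }
  std::uint64_t value;
};
struct VariableNode : Node {};

// Checked-downcast idiom: test and convert in one step, returning false
// instead of asserting when the index operand is not a constant.
bool tryGetConstantIndex(const Node &operand, std::uint64_t &index) {
  if (const auto *c = dynamic_cast<const ConstantNode *>(&operand)) {
    index = c->getZExtValue();
    return true;
  }
  return false;
}
```

A caller would bail out of the combine when tryGetConstantIndex returns false, which corresponds to the early "return SDValue();" path in the patch.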

As we discussed, before you commit the change, please add -verify-machineinstrs to your regression tests. No need to upload the patch again. Thanks.

nemanjai retitled this revision from [PowerPC] - Combine loads of v4i8 to loads of i32 followed by bitcast to [PowerPC] - Legalize illegal vector types by widening rather than integer promotion. Jun 20 2016, 1:54 PM

nemanjai edited edge metadata.

Just a friendly reminder that it would be nice to get this patch into trunk before we branch off 3.9. Please review and comment accordingly.

I apologize for the delay. Some minor requests below, but otherwise, LGTM.

lib/Target/PowerPC/PPCISelLowering.h
141	This name does not make it clear that you're converting to FP values. Let's name these SINT_VEC_TO_FP and UINT_VEC_TO_FP. Also note there that this is used for illegal vector types.
443	Can you say something about why it is worse? N years from now someone will wonder whether this is still true ;)
lib/Target/PowerPC/PPCInstrVSX.td
72	The existing naming convention here seems to have lower-case latter parts. Not sure why, but we might as well follow it for now. def PPCsvec2fp : ... def PPCuvec2fp : ...

This revision is now accepted and ready to land. Jul 4 2016, 5:48 AM

Committed revision 274535.

lib/Target/PowerPC/PPCISelLowering.h
443	Thank you. Will do (and I'll fix the typo - they're not "vectory" types :)).

Revision Contents

Path                                               Size
lib/Target/PowerPC/PPCISelLowering.h               16 lines
lib/Target/PowerPC/PPCISelLowering.cpp             55 lines
lib/Target/PowerPC/PPCInstrVSX.td                  18 lines
test/Analysis/CostModel/PowerPC/load_store.ll      2 lines
test/CodeGen/PowerPC/load-v4i8-improved.ll         23 lines
test/CodeGen/PowerPC/p8altivec-shuffles-pred.ll    3 lines
test/CodeGen/PowerPC/vec_cmp.ll                    6 lines
test/CodeGen/PowerPC/vsx.ll                        94 lines

Diff 59512

lib/Target/PowerPC/PPCISelLowering.h

@@ (131 lines hidden) @@ enum NodeType : unsigned {
   MFVSR,

   /// Direct move from a GPR to a VSX register (algebraic)
   MTVSRA,

   /// Direct move from a GPR to a VSX register (zero)
   MTVSRZ,

+  /// Extract a subvector from signed integer vector and convert to FP
+  SINT_VEC_TO_VEC,
+
+  /// Extract a subvector from unsigned integer vector and convert to FP
+  UINT_VEC_TO_VEC,
+
   // FIXME: Remove these once the ANDI glue bug is fixed:
   /// i1 = ANDIo_1_[EQ|GT]_BIT(i32 or i64 x) - Represents the result of the
   /// eq or gt bit of CR0 after executing andi. x, 1. This is used to
   /// implement truncation of i32 or i64 to i1.
   ANDIo_1_EQ_BIT, ANDIo_1_GT_BIT,

   // READ_TIME_BASE - A read of the 64-bit time-base register on a 32-bit
   // target (returns (Lo, Hi)). It takes a chain operand.
@@ (279 lines hidden) @@ namespace llvm {
 public:
   explicit PPCTargetLowering(const PPCTargetMachine &TM,
                              const PPCSubtarget &STI);

   /// getTargetNodeName() - This method returns the name of a target specific
   /// DAG node.
   const char *getTargetNodeName(unsigned Opcode) const override;

+  /// LegalizeTypeAction - The code we generate when illegal vector types are
+  /// legalized by promoting the integer element type is much worse than code
+  /// we generate if we widen the type for applicable vectory types.
+  TargetLoweringBase::LegalizeTypeAction getPreferredVectorAction(EVT VT)
+    const override {
+    if (VT.getVectorElementType().getSizeInBits() % 8 == 0)
+      return TypeWidenVector;
+    return TargetLoweringBase::getPreferredVectorAction(VT);
+  }
   bool useSoftFloat() const override;

   MVT getScalarShiftAmountTy(const DataLayout &, EVT) const override {
     return MVT::i32;
   }

   bool isCheapToSpeculateCttz() const override {
     return true;
@@ (449 lines hidden) @@ LowerCall_32SVR4(SDValue Chain, SDValue Callee, CallingConv::ID CallConv,
                    SDLoc dl, SelectionDAG &DAG,
                    SmallVectorImpl<SDValue> &InVals,
                    ImmutableCallSite *CS) const;

   SDValue lowerEH_SJLJ_SETJMP(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerEH_SJLJ_LONGJMP(SDValue Op, SelectionDAG &DAG) const;

   SDValue DAGCombineExtBoolTrunc(SDNode *N, DAGCombinerInfo &DCI) const;
+  SDValue DAGCombineBuildVector(SDNode *N, DAGCombinerInfo &DCI) const;
   SDValue DAGCombineTruncBoolExt(SDNode *N, DAGCombinerInfo &DCI) const;
   SDValue combineFPToIntToFP(SDNode *N, DAGCombinerInfo &DCI) const;

   SDValue getRsqrtEstimate(SDValue Operand, DAGCombinerInfo &DCI,
                            unsigned &RefinementSteps,
                            bool &UseOneConstNR) const override;
   SDValue getRecipEstimate(SDValue Operand, DAGCombinerInfo &DCI,
                            unsigned &RefinementSteps) const override;
@@ (29 lines hidden) @@

lib/Target/PowerPC/PPCISelLowering.cpp

@@ (845 lines hidden) @@ if (!isPPC64) {
     setLibcallName(RTLIB::SRL_I128, nullptr);
     setLibcallName(RTLIB::SRA_I128, nullptr);
   }

   setStackPointerRegisterToSaveRestore(isPPC64 ? PPC::X1 : PPC::R1);

   // We have target-specific dag combine patterns for the following nodes:
   setTargetDAGCombine(ISD::SINT_TO_FP);
+  setTargetDAGCombine(ISD::BUILD_VECTOR);
   if (Subtarget.hasFPCVT())
     setTargetDAGCombine(ISD::UINT_TO_FP);
   setTargetDAGCombine(ISD::LOAD);
   setTargetDAGCombine(ISD::STORE);
   setTargetDAGCombine(ISD::BR_CC);
   if (Subtarget.useCRBits())
     setTargetDAGCombine(ISD::BRCOND);
   setTargetDAGCombine(ISD::BSWAP);
@@ (175 lines hidden) @@ const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
   case PPCISD::RET_FLAG: return "PPCISD::RET_FLAG";
   case PPCISD::READ_TIME_BASE: return "PPCISD::READ_TIME_BASE";
   case PPCISD::EH_SJLJ_SETJMP: return "PPCISD::EH_SJLJ_SETJMP";
   case PPCISD::EH_SJLJ_LONGJMP: return "PPCISD::EH_SJLJ_LONGJMP";
   case PPCISD::MFOCRF: return "PPCISD::MFOCRF";
   case PPCISD::MFVSR: return "PPCISD::MFVSR";
   case PPCISD::MTVSRA: return "PPCISD::MTVSRA";
   case PPCISD::MTVSRZ: return "PPCISD::MTVSRZ";
+  case PPCISD::SINT_VEC_TO_VEC: return "PPCISD::SINT_VEC_TO_VEC";
+  case PPCISD::UINT_VEC_TO_VEC: return "PPCISD::UINT_VEC_TO_VEC";
   case PPCISD::ANDIo_1_EQ_BIT: return "PPCISD::ANDIo_1_EQ_BIT";
   case PPCISD::ANDIo_1_GT_BIT: return "PPCISD::ANDIo_1_GT_BIT";
   case PPCISD::VCMP: return "PPCISD::VCMP";
   case PPCISD::VCMPo: return "PPCISD::VCMPo";
   case PPCISD::LBRX: return "PPCISD::LBRX";
   case PPCISD::STBRX: return "PPCISD::STBRX";
   case PPCISD::LFIWAX: return "PPCISD::LFIWAX";
   case PPCISD::LFIWZX: return "PPCISD::LFIWZX";
@@ (9,212 lines hidden) @@ SDValue PPCTargetLowering::DAGCombineExtBoolTrunc(SDNode *N,
   SDValue ShiftCst =
     DAG.getConstant(N->getValueSizeInBits(0) - PromBits, dl, ShiftAmountTy);
   return DAG.getNode(
       ISD::SRA, dl, N->getValueType(0),
       DAG.getNode(ISD::SHL, dl, N->getValueType(0), N->getOperand(0), ShiftCst),
       ShiftCst);
 }

+SDValue PPCTargetLowering::DAGCombineBuildVector(SDNode *N,
+                                                 DAGCombinerInfo &DCI) const {
+  assert(N->getOpcode() == ISD::BUILD_VECTOR &&
+         "Should be called with a BUILD_VECTOR node");
+
+  SelectionDAG &DAG = DCI.DAG;
+  SDLoc dl(N);
+  if (N->getValueType(0) != MVT::v2f64 || !Subtarget.hasVSX())
+    return SDValue();
+
+  // Looking for:
+  // (build_vector ([su]int_to_fp (extractelt 0)), [su]int_to_fp (extractelt 1))
+  if (N->getOperand(0).getOpcode() != ISD::SINT_TO_FP &&
+      N->getOperand(0).getOpcode() != ISD::UINT_TO_FP)
+    return SDValue();
+  if (N->getOperand(1).getOpcode() != ISD::SINT_TO_FP &&
+      N->getOperand(1).getOpcode() != ISD::UINT_TO_FP)
+    return SDValue();
+  if (N->getOperand(0).getOpcode() != N->getOperand(1).getOpcode())
+    return SDValue();
+
+  SDValue Ext1 = N->getOperand(0).getOperand(0);
+  SDValue Ext2 = N->getOperand(1).getOperand(0);
+  if(Ext1.getOpcode() != ISD::EXTRACT_VECTOR_ELT ||
+     Ext2.getOpcode() != ISD::EXTRACT_VECTOR_ELT ||
+     !isa<ConstantSDNode>(Ext1.getOperand(1)) ||
+     !isa<ConstantSDNode>(Ext2.getOperand(1)))
+    return SDValue();
+  if (Ext1.getValueType() != MVT::i32 ||
+      Ext2.getValueType() != MVT::i32)
+  if (Ext1.getOperand(0) != Ext2.getOperand(0))
+    return SDValue();
+
+  int FirstElem = Ext1.getConstantOperandVal(1);
+  int SecondElem = Ext2.getConstantOperandVal(1);
+  int SubvecIdx;
+  if (FirstElem == 0 && SecondElem == 1)
+    SubvecIdx = Subtarget.isLittleEndian() ? 1 : 0;
+  else if (FirstElem == 2 && SecondElem == 3)
+    SubvecIdx = Subtarget.isLittleEndian() ? 0 : 1;
+  else
+    return SDValue();
+
+  SDValue SrcVec = Ext1.getOperand(0);
+  auto NodeType = (N->getOperand(1).getOpcode() == ISD::SINT_TO_FP) ?
+    PPCISD::SINT_VEC_TO_VEC : PPCISD::UINT_VEC_TO_VEC;
+  return DAG.getNode(NodeType, dl, MVT::v2f64,
+                     SrcVec, DAG.getIntPtrConstant(SubvecIdx, dl));
+}
+
 SDValue PPCTargetLowering::combineFPToIntToFP(SDNode *N,
                                               DAGCombinerInfo &DCI) const {
   assert((N->getOpcode() == ISD::SINT_TO_FP ||
           N->getOpcode() == ISD::UINT_TO_FP) &&
          "Need an int -> FP conversion node here");

   if (!Subtarget.has64BitSupport())
     return SDValue();
@@ (280 lines hidden) @@ case ISD::LOAD: {

     // For little endian, VSX loads require generating lxvd2x/xxswapd.
     if (VT.isSimple()) {
       MVT LoadVT = VT.getSimpleVT();
       if (Subtarget.hasVSX() && Subtarget.isLittleEndian() &&
           (LoadVT == MVT::v2f64 || LoadVT == MVT::v2i64 ||
            LoadVT == MVT::v4f32 || LoadVT == MVT::v4i32))
         return expandVSXLoadForLE(N, DCI);
     }

     // We sometimes end up with a 64-bit integer load, from which we extract
     // two single-precision floating-point numbers. This happens with
     // std::complex<float>, and other similar structures, because of the way we
     // canonicalize structure copies. However, if we lack direct moves,
     // then the final bitcasts from the extracted integer values to the
     // floating-point numbers turn into store/load pairs. Even with direct moves,
     // just loading the two floating-point numbers is likely better.
     auto ReplaceTwoFloatLoad = [&]() {
       if (VT != MVT::i64)
         return false;

       if (LD->getExtensionType() != ISD::NON_EXTLOAD ||
           LD->isVolatile())
         return false;

       // We're looking for a sequence like this:
       // t13: i64,ch = load<LD8[%ref.tmp]> t0, t6, undef:i64
       // t16: i64 = srl t13, Constant:i32<32>
       // t17: i32 = truncate t16
       // t18: f32 = bitcast t17
[...]
		if (LHS.getOpcode() == ISD::INTRINSIC_WO_CHAIN &&

return DAG.getNode(PPCISD::COND_BRANCH, dl, MVT::Other, N->getOperand(0),		return DAG.getNode(PPCISD::COND_BRANCH, dl, MVT::Other, N->getOperand(0),
DAG.getConstant(CompOpc, dl, MVT::i32),		DAG.getConstant(CompOpc, dl, MVT::i32),
DAG.getRegister(PPC::CR6, MVT::i32),		DAG.getRegister(PPC::CR6, MVT::i32),
N->getOperand(4), CompNode.getValue(1));		N->getOperand(4), CompNode.getValue(1));
}		}
break;		break;
}		}
		case ISD::BUILD_VECTOR:
		return DAGCombineBuildVector(N, DCI);
}		}

return SDValue();		return SDValue();
}		}

SDValue		SDValue
PPCTargetLowering::BuildSDIVPow2(SDNode *N, const APInt &Divisor,		PPCTargetLowering::BuildSDIVPow2(SDNode *N, const APInt &Divisor,
SelectionDAG &DAG,		SelectionDAG &DAG,
[...]

lib/Target/PowerPC/PPCInstrVSX.td

[...]
def SDT_PPClxvd2x : SDTypeProfile<1, 1, [
SDTCisVT<0, v2f64>, SDTCisPtrTy<1>		SDTCisVT<0, v2f64>, SDTCisPtrTy<1>
]>;		]>;
def SDT_PPCstxvd2x : SDTypeProfile<0, 2, [		def SDT_PPCstxvd2x : SDTypeProfile<0, 2, [
SDTCisVT<0, v2f64>, SDTCisPtrTy<1>		SDTCisVT<0, v2f64>, SDTCisPtrTy<1>
]>;		]>;
def SDT_PPCxxswapd : SDTypeProfile<1, 1, [		def SDT_PPCxxswapd : SDTypeProfile<1, 1, [
SDTCisSameAs<0, 1>		SDTCisSameAs<0, 1>
]>;		]>;
		def SDTVecConv : SDTypeProfile<1, 2, [
		SDTCisVec<0>, SDTCisVec<1>, SDTCisPtrTy<2>
		]>;

def PPClxvd2x : SDNode<"PPCISD::LXVD2X", SDT_PPClxvd2x,		def PPClxvd2x : SDNode<"PPCISD::LXVD2X", SDT_PPClxvd2x,
[SDNPHasChain, SDNPMayLoad]>;		[SDNPHasChain, SDNPMayLoad]>;
def PPCstxvd2x : SDNode<"PPCISD::STXVD2X", SDT_PPCstxvd2x,		def PPCstxvd2x : SDNode<"PPCISD::STXVD2X", SDT_PPCstxvd2x,
[SDNPHasChain, SDNPMayStore]>;		[SDNPHasChain, SDNPMayStore]>;
def PPCxxswapd : SDNode<"PPCISD::XXSWAPD", SDT_PPCxxswapd, [SDNPHasChain]>;		def PPCxxswapd : SDNode<"PPCISD::XXSWAPD", SDT_PPCxxswapd, [SDNPHasChain]>;
def PPCmfvsr : SDNode<"PPCISD::MFVSR", SDTUnaryOp, []>;		def PPCmfvsr : SDNode<"PPCISD::MFVSR", SDTUnaryOp, []>;
def PPCmtvsra : SDNode<"PPCISD::MTVSRA", SDTUnaryOp, []>;		def PPCmtvsra : SDNode<"PPCISD::MTVSRA", SDTUnaryOp, []>;
def PPCmtvsrz : SDNode<"PPCISD::MTVSRZ", SDTUnaryOp, []>;		def PPCmtvsrz : SDNode<"PPCISD::MTVSRZ", SDTUnaryOp, []>;
		def PPCSV2V : SDNode<"PPCISD::SINT_VEC_TO_VEC", SDTVecConv, []>;
hfinkel: The existing naming convention here seems to have lower-case latter parts. Not sure why, but we might as well follow it for now.

    def PPCsvec2fp : ...
    def PPCuvec2fp : ...
		def PPCUV2V : SDNode<"PPCISD::UINT_VEC_TO_VEC", SDTVecConv, []>;

multiclass XX3Form_Rcr<bits<6> opcode, bits<7> xo, string asmbase,		multiclass XX3Form_Rcr<bits<6> opcode, bits<7> xo, string asmbase,
string asmstr, InstrItinClass itin, Intrinsic Int,		string asmstr, InstrItinClass itin, Intrinsic Int,
ValueType OutTy, ValueType InTy> {		ValueType OutTy, ValueType InTy> {
let BaseName = asmbase in {		let BaseName = asmbase in {
def NAME : XX3Form_Rc<opcode, xo, (outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),		def NAME : XX3Form_Rc<opcode, xo, (outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),
!strconcat(asmbase, !strconcat(" ", asmstr)), itin,		!strconcat(asmbase, !strconcat(" ", asmstr)), itin,
[(set OutTy:$XT, (Int InTy:$XA, InTy:$XB))]>;		[(set OutTy:$XT, (Int InTy:$XA, InTy:$XB))]>;
[...]
let Uses = [RM] in {
def XVCVSXDSP : XX2Form<60, 440,		def XVCVSXDSP : XX2Form<60, 440,
(outs vsrc:$XT), (ins vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvsxdsp $XT, $XB", IIC_VecFP, []>;		"xvcvsxdsp $XT, $XB", IIC_VecFP, []>;
def XVCVSXWDP : XX2Form<60, 248,		def XVCVSXWDP : XX2Form<60, 248,
(outs vsrc:$XT), (ins vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvsxwdp $XT, $XB", IIC_VecFP, []>;		"xvcvsxwdp $XT, $XB", IIC_VecFP, []>;
def XVCVSXWSP : XX2Form<60, 184,		def XVCVSXWSP : XX2Form<60, 184,
(outs vsrc:$XT), (ins vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvsxwsp $XT, $XB", IIC_VecFP, []>;		"xvcvsxwsp $XT, $XB", IIC_VecFP,
		[(set v4f32:$XT, (sint_to_fp v4i32:$XB))]>;
def XVCVUXDDP : XX2Form<60, 488,		def XVCVUXDDP : XX2Form<60, 488,
(outs vsrc:$XT), (ins vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvuxddp $XT, $XB", IIC_VecFP,		"xvcvuxddp $XT, $XB", IIC_VecFP,
[(set v2f64:$XT, (uint_to_fp v2i64:$XB))]>;		[(set v2f64:$XT, (uint_to_fp v2i64:$XB))]>;
def XVCVUXDSP : XX2Form<60, 424,		def XVCVUXDSP : XX2Form<60, 424,
(outs vsrc:$XT), (ins vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvuxdsp $XT, $XB", IIC_VecFP, []>;		"xvcvuxdsp $XT, $XB", IIC_VecFP, []>;
def XVCVUXWDP : XX2Form<60, 232,		def XVCVUXWDP : XX2Form<60, 232,
[...]
// | i32 | undef | i32 | undef |		// | i32 | undef | i32 | undef |
// so we need to shift everything to the left by one i32 (word) before		// so we need to shift everything to the left by one i32 (word) before
// the conversion.		// the conversion.
def : Pat<(sext_inreg v2i64:$C, v2i32),		def : Pat<(sext_inreg v2i64:$C, v2i32),
(XVCVDPSXDS (XVCVSXWDP (XXSLDWI $C, $C, 1)))>;		(XVCVDPSXDS (XVCVSXWDP (XXSLDWI $C, $C, 1)))>;
def : Pat<(v2f64 (sint_to_fp (sext_inreg v2i64:$C, v2i32))),		def : Pat<(v2f64 (sint_to_fp (sext_inreg v2i64:$C, v2i32))),
(XVCVSXWDP (XXSLDWI $C, $C, 1))>;		(XVCVSXWDP (XXSLDWI $C, $C, 1))>;

		def : Pat<(v2f64 (PPCSV2V v4i32:$C, 0)),
		(v2f64 (XVCVSXWDP (v2i64 (XXMRGHW $C, $C))))>;
		def : Pat<(v2f64 (PPCSV2V v4i32:$C, 1)),
		(v2f64 (XVCVSXWDP (v2i64 (XXMRGLW $C, $C))))>;

		def : Pat<(v2f64 (PPCUV2V v4i32:$C, 0)),
		(v2f64 (XVCVUXWDP (v2i64 (XXMRGHW $C, $C))))>;
		def : Pat<(v2f64 (PPCUV2V v4i32:$C, 1)),
		(v2f64 (XVCVUXWDP (v2i64 (XXMRGLW $C, $C))))>;
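The four patterns above pair a word merge with a word-to-double conversion. The following standalone C++ sketch models the lane movement for the signed case. It assumes big-endian word numbering and my reading of the instruction semantics (`xxmrghw`/`xxmrglw` interleave the high or low words of their two inputs, and `xvcvsxwdp` converts words 0 and 2). The types and function names are hypothetical stand-ins, not LLVM or intrinsic APIs, and the sketch says nothing about how the lowering code chooses the second operand on little-endian targets.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Four 32-bit words, indexed in big-endian element order (word 0 is the
// most significant word of the register).
using V4I32 = std::array<int32_t, 4>;
using V2F64 = std::array<double, 2>;

// Model of xxmrghw: interleave the two high words of A and B.
V4I32 xxmrghw(const V4I32 &A, const V4I32 &B) {
  return {A[0], B[0], A[1], B[1]};
}

// Model of xxmrglw: interleave the two low words of A and B.
V4I32 xxmrglw(const V4I32 &A, const V4I32 &B) {
  return {A[2], B[2], A[3], B[3]};
}

// Model of xvcvsxwdp: convert the signed words in positions 0 and 2 to
// double precision; words 1 and 3 are ignored.
V2F64 xvcvsxwdp(const V4I32 &B) {
  return {static_cast<double>(B[0]), static_cast<double>(B[2])};
}

// Model of the (PPCSV2V v4i32:$C, Half) patterns: Half == 0 converts the
// high pair of words, Half == 1 converts the low pair.
V2F64 signedVecToVec(const V4I32 &C, unsigned Half) {
  return xvcvsxwdp(Half == 0 ? xxmrghw(C, C) : xxmrglw(C, C));
}
```

Merging the input with itself places each source word in an even word position, which is where the conversion reads its operands. For the input `{-1, 2, 3, -4}`, half 0 produces `{-1.0, 2.0}` and half 1 produces `{3.0, -4.0}` in this model.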

// Loads.		// Loads.
def : Pat<(v2f64 (load xoaddr:$src)), (LXVD2X xoaddr:$src)>;		def : Pat<(v2f64 (load xoaddr:$src)), (LXVD2X xoaddr:$src)>;
def : Pat<(v2i64 (load xoaddr:$src)), (LXVD2X xoaddr:$src)>;		def : Pat<(v2i64 (load xoaddr:$src)), (LXVD2X xoaddr:$src)>;
def : Pat<(v4i32 (load xoaddr:$src)), (LXVW4X xoaddr:$src)>;		def : Pat<(v4i32 (load xoaddr:$src)), (LXVW4X xoaddr:$src)>;
def : Pat<(v2f64 (PPClxvd2x xoaddr:$src)), (LXVD2X xoaddr:$src)>;		def : Pat<(v2f64 (PPClxvd2x xoaddr:$src)), (LXVD2X xoaddr:$src)>;

// Stores.		// Stores.
def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, xoaddr:$dst),		def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, xoaddr:$dst),
[...]

test/Analysis/CostModel/PowerPC/load_store.ll

[...]
define i32 @loads(i32 %arg) {
load i32, i32* undef, align 4		load i32, i32* undef, align 4
; CHECK: cost of 2 {{.*}} load		; CHECK: cost of 2 {{.*}} load
load i64, i64* undef, align 4		load i64, i64* undef, align 4
; CHECK: cost of 4 {{.*}} load		; CHECK: cost of 4 {{.*}} load
load i128, i128* undef, align 4		load i128, i128* undef, align 4

; FIXME: There actually are sub-vector Altivec loads, and so we could handle		; FIXME: There actually are sub-vector Altivec loads, and so we could handle
; this with a small expense, but we don't currently.		; this with a small expense, but we don't currently.
; CHECK: cost of 48 {{.*}} load		; CHECK: cost of 42 {{.*}} load
load <4 x i16>, <4 x i16>* undef, align 2		load <4 x i16>, <4 x i16>* undef, align 2

; CHECK: cost of 2 {{.*}} load		; CHECK: cost of 2 {{.*}} load
load <4 x i32>, <4 x i32>* undef, align 4		load <4 x i32>, <4 x i32>* undef, align 4

; CHECK: cost of 46 {{.*}} load		; CHECK: cost of 46 {{.*}} load
load <3 x float>, <3 x float>* undef, align 1		load <3 x float>, <3 x float>* undef, align 1

ret i32 undef		ret i32 undef
}		}

test/CodeGen/PowerPC/load-v4i8-improved.ll

				; RUN: llc -mcpu=pwr8 -mtriple=powerpc64le-unknown-linux-gnu < %s | FileCheck \
				; RUN: -implicit-check-not vmrg -implicit-check-not=vperm %s
				; RUN: llc -mcpu=pwr8 -mtriple=powerpc64-unknown-linux-gnu < %s | FileCheck \
				; RUN: -implicit-check-not vmrg -implicit-check-not=vperm %s \
				; RUN: --check-prefix=CHECK-BE

				define <16 x i8> @test(i32* %s, i32* %t) {
				entry:
				%0 = bitcast i32* %s to <4 x i8>*
				%1 = load <4 x i8>, <4 x i8>* %0, align 4
				%2 = shufflevector <4 x i8> %1, <4 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
				ret <16 x i8> %2
				; CHECK-LABEL: test
				; CHECK: lwz [[GPR:[0-9]+]], 0(3)
				; CHECK: mtvsrd [[VSR:[0-9]+]], [[GPR]]
				; CHECK: xxswapd [[SWP:[0-9]+]], [[VSR]]
				; CHECK: xxspltw 34, [[SWP]], 3
				; CHECK-BE-LABEL: test
				; CHECK-BE: lwz [[GPR:[0-9]+]], 0(3)
				; CHECK-BE: sldi [[SHL:[0-9]+]], [[GPR]], 32
				; CHECK-BE: mtvsrd [[VSR:[0-9]+]], [[SHL]]
				; CHECK-BE: xxspltw 34, [[VSR]], 0
				}

test/CodeGen/PowerPC/p8altivec-shuffles-pred.ll

	; RUN: llc < %s | FileCheck %s			; RUN: llc < %s | FileCheck %s
	target datalayout = "E-m:e-i64:64-n32:64"			target datalayout = "E-m:e-i64:64-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	; Function Attrs: nounwind			; Function Attrs: nounwind
	define <2 x i32> @test1(<4 x i32> %wide.vec) #0 {			define <2 x i32> @test1(<4 x i32> %wide.vec) #0 {
	entry:			entry:
	%strided.vec = shufflevector <4 x i32> %wide.vec, <4 x i32> undef, <2 x i32> <i32 0, i32 2>			%strided.vec = shufflevector <4 x i32> %wide.vec, <4 x i32> undef, <2 x i32> <i32 0, i32 2>
	ret <2 x i32> %strided.vec			ret <2 x i32> %strided.vec

	; CHECK-LABEL: @test1			; CHECK-LABEL: @test1
	; CHECK: vsldoi 2, 2, 2, 12			; CHECK: vsldoi [[TGT:[0-9]+]], 2, 2, 8
				; CHECK: vmrghw 2, 2, [[TGT]]
	; CHECK: blr			; CHECK: blr
	}			}

	; Function Attrs: nounwind			; Function Attrs: nounwind
	define <16 x i8> @test2(<16 x i8> %wide.vec) #0 {			define <16 x i8> @test2(<16 x i8> %wide.vec) #0 {
	entry:			entry:
	%strided.vec = shufflevector <16 x i8> %wide.vec, <16 x i8> undef, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 8, i32 9, i32 10, i32 11>			%strided.vec = shufflevector <16 x i8> %wide.vec, <16 x i8> undef, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 8, i32 9, i32 10, i32 11>
	ret <16 x i8> %strided.vec			ret <16 x i8> %strided.vec

	; CHECK-LABEL: @test2			; CHECK-LABEL: @test2
	; CHECK: vsldoi 2, 2, 2, 12			; CHECK: vsldoi 2, 2, 2, 12
	; CHECK: blr			; CHECK: blr
	}			}

	attributes #0 = { nounwind "target-cpu"="pwr7" }			attributes #0 = { nounwind "target-cpu"="pwr7" }

test/CodeGen/PowerPC/vec_cmp.ll

[...]


	define <4 x i8> @v4si8_cmp(<4 x i8> %x, <4 x i8> %y) nounwind readnone {			define <4 x i8> @v4si8_cmp(<4 x i8> %x, <4 x i8> %y) nounwind readnone {
	%cmp = icmp eq <4 x i8> %x, %y			%cmp = icmp eq <4 x i8> %x, %y
	%sext = sext <4 x i1> %cmp to <4 x i8>			%sext = sext <4 x i1> %cmp to <4 x i8>
	ret <4 x i8> %sext			ret <4 x i8> %sext
	}			}
	; CHECK-LABEL: v4si8_cmp:			; CHECK-LABEL: v4si8_cmp:
	; CHECK: vcmpequw {{[0-9]+}}, {{[0-9]+}}, {{[0-9]+}}			; CHECK: vcmpequb {{[0-9]+}}, {{[0-9]+}}, {{[0-9]+}}


	define <8 x i8> @v8si8_cmp(<8 x i8> %x, <8 x i8> %y) nounwind readnone {			define <8 x i8> @v8si8_cmp(<8 x i8> %x, <8 x i8> %y) nounwind readnone {
	%cmp = icmp eq <8 x i8> %x, %y			%cmp = icmp eq <8 x i8> %x, %y
	%sext = sext <8 x i1> %cmp to <8 x i8>			%sext = sext <8 x i1> %cmp to <8 x i8>
	ret <8 x i8> %sext			ret <8 x i8> %sext
	}			}
	; CHECK-LABEL: v8si8_cmp:			; CHECK-LABEL: v8si8_cmp:
	; CHECK: vcmpequh {{[0-9]+}}, {{[0-9]+}}, {{[0-9]+}}			; CHECK: vcmpequb {{[0-9]+}}, {{[0-9]+}}, {{[0-9]+}}


	; Additional tests for v16i8 since it is a altivec native type			; Additional tests for v16i8 since it is a altivec native type

	define <16 x i8> @v16si8_cmp_eq(<16 x i8> %x, <16 x i8> %y) nounwind readnone {			define <16 x i8> @v16si8_cmp_eq(<16 x i8> %x, <16 x i8> %y) nounwind readnone {
	%cmp = icmp eq <16 x i8> %x, %y			%cmp = icmp eq <16 x i8> %x, %y
	%sext = sext <16 x i1> %cmp to <16 x i8>			%sext = sext <16 x i1> %cmp to <16 x i8>
	ret <16 x i8> %sext			ret <16 x i8> %sext
[...]


	define <4 x i16> @v4si16_cmp(<4 x i16> %x, <4 x i16> %y) nounwind readnone {			define <4 x i16> @v4si16_cmp(<4 x i16> %x, <4 x i16> %y) nounwind readnone {
	%cmp = icmp eq <4 x i16> %x, %y			%cmp = icmp eq <4 x i16> %x, %y
	%sext = sext <4 x i1> %cmp to <4 x i16>			%sext = sext <4 x i1> %cmp to <4 x i16>
	ret <4 x i16> %sext			ret <4 x i16> %sext
	}			}
	; CHECK-LABEL: v4si16_cmp:			; CHECK-LABEL: v4si16_cmp:
	; CHECK: vcmpequw {{[0-9]+}}, {{[0-9]+}}, {{[0-9]+}}			; CHECK: vcmpequh {{[0-9]+}}, {{[0-9]+}}, {{[0-9]+}}


	; Additional tests for v8i16 since it is an altivec native type			; Additional tests for v8i16 since it is an altivec native type

	define <8 x i16> @v8si16_cmp_eq(<8 x i16> %x, <8 x i16> %y) nounwind readnone {			define <8 x i16> @v8si16_cmp_eq(<8 x i16> %x, <8 x i16> %y) nounwind readnone {
	entry:			entry:
	%cmp = icmp eq <8 x i16> %x, %y			%cmp = icmp eq <8 x i16> %x, %y
	%sext = sext <8 x i1> %cmp to <8 x i16>			%sext = sext <8 x i1> %cmp to <8 x i16>
[...]

test/CodeGen/PowerPC/vsx.ll

[...]
	; CHECK-LE: blr			; CHECK-LE: blr
	}			}

	define <2 x double> @test68(<2 x i32> %a) {			define <2 x double> @test68(<2 x i32> %a) {
	%w = sitofp <2 x i32> %a to <2 x double>			%w = sitofp <2 x i32> %a to <2 x double>
	ret <2 x double> %w			ret <2 x double> %w

	; CHECK-LABEL: @test68			; CHECK-LABEL: @test68
	; CHECK: xxsldwi [[V1:[0-9]+]], 34, 34, 1			; CHECK: xxmrghw [[V1:[0-9]+]]
	; CHECK: xvcvsxwdp 34, [[V1]]			; CHECK: xvcvsxwdp 34, [[V1]]
	; CHECK: blr			; CHECK: blr

	; CHECK-LE-LABEL: @test68			; CHECK-LE-LABEL: @test68
	; CHECK-LE: xxsldwi [[V1:[0-9]+]], 34, 34, 1			; CHECK-LE: xxmrglw [[V1:[0-9]+]], 34, 34
	; CHECK-LE: xvcvsxwdp 34, [[V1]]			; CHECK-LE: xvcvsxwdp 34, [[V1]]
	; CHECK-LE: blr			; CHECK-LE: blr
	}			}

				; This gets scalarized so the code isn't great
	define <2 x double> @test69(<2 x i16> %a) {			define <2 x double> @test69(<2 x i16> %a) {
	%w = sitofp <2 x i16> %a to <2 x double>			%w = sitofp <2 x i16> %a to <2 x double>
	ret <2 x double> %w			ret <2 x double> %w

	; CHECK-LABEL: @test69			; CHECK-LABEL: @test69
	; CHECK: vspltisw [[V1:[0-9]+]], 8			; CHECK-DAG: lfiwax
	; CHECK: vadduwm [[V2:[0-9]+]], [[V1]], [[V1]]			; CHECK-DAG: lfiwax
	; CHECK: vslw [[V3:[0-9]+]], {{[0-9]+}}, [[V2]]			; CHECK-DAG: xscvsxddp
	; CHECK: vsraw {{[0-9]+}}, [[V3]], [[V2]]			; CHECK-DAG: xscvsxddp
	; CHECK: xxsldwi [[V4:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}, 1			; CHECK: xxmrghd
	; CHECK: xvcvsxwdp 34, [[V4]]
	; CHECK: blr			; CHECK: blr

	; CHECK-LE-LABEL: @test69			; CHECK-LE-LABEL: @test69
	; CHECK-LE: vspltisw [[V1:[0-9]+]], 8			; CHECK-LE: mfvsrd
	; CHECK-LE: vadduwm [[V2:[0-9]+]], [[V1]], [[V1]]			; CHECK-LE: mtvsrwa
	; CHECK-LE: vslw [[V3:[0-9]+]], {{[0-9]+}}, [[V2]]			; CHECK-LE: mtvsrwa
	; CHECK-LE: vsraw {{[0-9]+}}, [[V3]], [[V2]]			; CHECK-LE: xscvsxddp
	; CHECK-LE: xxsldwi [[V4:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}, 1			; CHECK-LE: xscvsxddp
	; CHECK-LE: xvcvsxwdp 34, [[V4]]			; CHECK-LE: xxspltd
				; CHECK-LE: xxspltd
				; CHECK-LE: xxmrgld
	; CHECK-LE: blr			; CHECK-LE: blr
	}			}

				; This gets scalarized so the code isn't great
	define <2 x double> @test70(<2 x i8> %a) {			define <2 x double> @test70(<2 x i8> %a) {
	%w = sitofp <2 x i8> %a to <2 x double>			%w = sitofp <2 x i8> %a to <2 x double>
	ret <2 x double> %w			ret <2 x double> %w

	; CHECK-LABEL: @test70			; CHECK-LABEL: @test70
	; CHECK: vspltisw [[V1:[0-9]+]], 12			; CHECK-DAG: lfiwax
	; CHECK: vadduwm [[V2:[0-9]+]], [[V1]], [[V1]]			; CHECK-DAG: lfiwax
	; CHECK: vslw [[V3:[0-9]+]], {{[0-9]+}}, [[V2]]			; CHECK-DAG: xscvsxddp
	; CHECK: vsraw {{[0-9]+}}, [[V3]], [[V2]]			; CHECK-DAG: xscvsxddp
	; CHECK: xxsldwi [[V4:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}, 1			; CHECK: xxmrghd
	; CHECK: xvcvsxwdp 34, [[V4]]
	; CHECK: blr			; CHECK: blr

	; CHECK-LE-LABEL: @test70			; CHECK-LE-LABEL: @test70
	; CHECK-LE: vspltisw [[V1:[0-9]+]], 12			; CHECK-LE: mfvsrd
	; CHECK-LE: vadduwm [[V2:[0-9]+]], [[V1]], [[V1]]			; CHECK-LE: mtvsrwa
	; CHECK-LE: vslw [[V3:[0-9]+]], {{[0-9]+}}, [[V2]]			; CHECK-LE: mtvsrwa
	; CHECK-LE: vsraw {{[0-9]+}}, [[V3]], [[V2]]			; CHECK-LE: xscvsxddp
	; CHECK-LE: xxsldwi [[V4:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}, 1			; CHECK-LE: xscvsxddp
	; CHECK-LE: xvcvsxwdp 34, [[V4]]			; CHECK-LE: xxspltd
				; CHECK-LE: xxspltd
				; CHECK-LE: xxmrgld
	; CHECK-LE: blr			; CHECK-LE: blr
	}			}

				; This gets scalarized so the code isn't great
	define <2 x i32> @test80(i32 %v) {			define <2 x i32> @test80(i32 %v) {
	%b1 = insertelement <2 x i32> undef, i32 %v, i32 0			%b1 = insertelement <2 x i32> undef, i32 %v, i32 0
	%b2 = shufflevector <2 x i32> %b1, <2 x i32> undef, <2 x i32> zeroinitializer			%b2 = shufflevector <2 x i32> %b1, <2 x i32> undef, <2 x i32> zeroinitializer
	%i = add <2 x i32> %b2, <i32 2, i32 3>			%i = add <2 x i32> %b2, <i32 2, i32 3>
	ret <2 x i32> %i			ret <2 x i32> %i

	; CHECK-REG-LABEL: @test80			; CHECK-REG-LABEL: @test80
	; CHECK-REG-DAG: addi [[R1:[0-9]+]], 3, 3			; CHECK-REG: stw 3, -16(1)
	; CHECK-REG-DAG: addi [[R2:[0-9]+]], 1, -16			; CHECK-REG: addi [[R1:[0-9]+]], 1, -16
	; CHECK-REG-DAG: addi [[R3:[0-9]+]], 3, 2			; CHECK-REG: addis [[R2:[0-9]+]]
	; CHECK-REG: std [[R1]], -8(1)			; CHECK-REG: addi [[R2]], [[R2]]
	; CHECK-REG: std [[R3]], -16(1)			; CHECK-REG-DAG: lxvw4x [[VS1:[0-9]+]], 0, [[R1]]
	; CHECK-REG: lxvd2x 34, 0, [[R2]]			; CHECK-REG-DAG: lxvw4x 35, 0, [[R2]]
	; CHECK-REG-NOT: stxvd2x			; CHECK-REG: xxspltw 34, [[VS1]], 0
				; CHECK-REG: vadduwm 2, 2, 3
				; CHECK-REG-NOT: stxvw4x
	; CHECK-REG: blr			; CHECK-REG: blr

	; CHECK-FISL-LABEL: @test80			; CHECK-FISL-LABEL: @test80
	; CHECK-FISL-DAG: addi [[R1:[0-9]+]], 3, 3			; CHECK-FISL: mr 4, 3
	; CHECK-FISL-DAG: addi [[R2:[0-9]+]], 1, -16			; CHECK-FISL: stw 4, -16(1)
	; CHECK-FISL-DAG: addi [[R3:[0-9]+]], 3, 2			; CHECK-FISL: addi [[R1:[0-9]+]], 1, -16
	; CHECK-FISL-DAG: std [[R1]], -8(1)			; CHECK-FISL-DAG: lxvw4x [[VS1:[0-9]+]], 0, [[R1]]
	; CHECK-FISL-DAG: std [[R3]], -16(1)			; CHECK-FISL-DAG: xxspltw {{[0-9]+}}, [[VS1]], 0
	; CHECK-FISL-DAG: lxvd2x 0, 0, [[R2]]			; CHECK-FISL: addis [[R2:[0-9]+]]
				; CHECK-FISL: addi [[R2]], [[R2]]
				; CHECK-FISL-DAG: lxvw4x {{[0-9]+}}, 0, [[R2]]
				; CHECK-FISL: vadduwm
				; CHECK-FISL-NOT: stxvw4x
	; CHECK-FISL: blr			; CHECK-FISL: blr

	; CHECK-LE-LABEL: @test80			; CHECK-LE-LABEL: @test80
	; CHECK-LE-DAG: mtvsrd [[R1:[0-9]+]], 3			; CHECK-LE-DAG: mtvsrd [[R1:[0-9]+]], 3
				; CHECK-LE-DAG: xxswapd [[V1:[0-9]+]], [[R1]]
	; CHECK-LE-DAG: addi [[R2:[0-9]+]], {{[0-9]+}}, .LCPI			; CHECK-LE-DAG: addi [[R2:[0-9]+]], {{[0-9]+}}, .LCPI
	; CHECK-LE-DAG: lxvd2x [[V2:[0-9]+]], 0, [[R2]]			; CHECK-LE-DAG: lxvd2x [[V2:[0-9]+]], 0, [[R2]]
	; CHECK-LE-DAG: xxspltd 34, [[R1]]			; CHECK-LE-DAG: xxspltw 34, [[V1]]
	; CHECK-LE-DAG: xxswapd 35, [[V2]]			; CHECK-LE-DAG: xxswapd 35, [[V2]]
	; CHECK-LE: vaddudm 2, 2, 3			; CHECK-LE: vadduwm 2, 2, 3
	; CHECK-LE: blr			; CHECK-LE: blr
	}			}

	define <2 x double> @test81(<4 x float> %b) {			define <2 x double> @test81(<4 x float> %b) {
	%w = bitcast <4 x float> %b to <2 x double>			%w = bitcast <4 x float> %b to <2 x double>
	ret <2 x double> %w			ret <2 x double> %w

	; CHECK-LABEL: @test81			; CHECK-LABEL: @test81
[...]
