This is an archive of the discontinued LLVM Phabricator instance.

Differential D46101

[PowerPC] vectorize Sum of Absolute Difference
Abandoned · Public

Authored by inouehrs on Apr 25 2018, 7:52 PM.

Details

Reviewers

hfinkel
echristo
craig.topper
kbarton
nemanjai
sfertile
lei
syzaara
RKSimon

Summary

This patch enables vectorization of the Sum of Absolute Differences (SAD) pattern, which the X86 backend already supports.
For example, the following code is compiled into vector max/min and sum-across instructions.

uint8_t *pix1, *pix2;
unsigned i_sum = 0;
for( unsigned x = 0; x < 16; x++ )
  i_sum += abs( pix1[x] - pix2[x] );
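The PPC lowering in this patch relies on the identity |a - b| = max(a, b) - min(a, b), followed by two sum-across steps. The sketch below is a scalar C++ model written only for illustration. The names `sadReference`, `sadMaxMin` and `absDiffSigned` are hypothetical. The per-lane loops only approximate what vmaxub/vminub (or vmaxsb/vminsb), vsububm, vsum4ubs and vsumsws do on a 16-byte vector.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Illustrative scalar model only; these names are hypothetical, not LLVM APIs.
using V16 = std::array<uint8_t, 16>;

// Reference result: the loop from the summary.
unsigned sadReference(const V16 &a, const V16 &b) {
  unsigned sum = 0;
  for (unsigned x = 0; x < 16; x++)
    sum += std::abs(a[x] - b[x]);
  return sum;
}

// Unsigned bytes: |a - b| == max(a, b) - min(a, b), and the difference is at
// most 255, so a wrap-around byte subtract (like vsububm) is exact.
unsigned sadMaxMin(const V16 &a, const V16 &b) {
  std::array<uint32_t, 4> words{};  // like vsum4ubs: one word per 4 bytes
  for (unsigned i = 0; i < 16; i++) {
    uint8_t hi = a[i] > b[i] ? a[i] : b[i];  // vmaxub
    uint8_t lo = a[i] > b[i] ? b[i] : a[i];  // vminub
    uint8_t absDiff = static_cast<uint8_t>(hi - lo);
    assert(absDiff == std::abs(a[i] - b[i]));
    words[i / 4] += absDiff;
  }
  // Like vsumsws: add the four words (at most 16 * 255, so no saturation).
  return words[0] + words[1] + words[2] + words[3];
}

// Signed bytes (the sext case, vmaxsb/vminsb): max - min can reach 255, which
// still fits an unsigned byte, so the same wrap-around subtract is exact.
uint8_t absDiffSigned(int8_t a, int8_t b) {
  int8_t hi = a > b ? a : b;
  int8_t lo = a > b ? b : a;
  return static_cast<uint8_t>(static_cast<uint8_t>(hi) -
                              static_cast<uint8_t>(lo));
}
```

The signed variant matches the shape that the `@func8s` test in ppc64_basicSAD.ll checks for: vminsb and vmaxsb feed vsububm, followed by vsum4ubs and vsumsws.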

To implement this, I moved some helper functions from X86ISelLowering.cpp into the parent class TargetLowering, generalizing them slightly so that the PPC backend can reuse them.
Is this an acceptable approach, or would it be better to implement the analysis as a DAG combine (and introduce a new opcode such as ISD::SAD)?

This patch supports only ppc64le so far. If it is accepted, I will add big-endian support as well as support for the new POWER9 instructions.

Event Timeline

inouehrs created this revision. Apr 25 2018, 7:52 PM

RKSimon added a reviewer: RKSimon. Apr 28 2018, 8:44 AM

RKSimon added a subscriber: RKSimon.

RKSimon added inline comments.

lib/CodeGen/SelectionDAG/TargetLowering.cpp
4362	In a more general setting we probably need to support ABS(SUB(ZEXT(x), ZEXT(y))) as well
lib/Target/PowerPC/PPCISelLowering.cpp
12110	This (and the comment) needs updating for PPC, maybe merge the 2 ifs to just: if (VT != MVT::v16i8 && VT != MVT::v8i16) return SDValue();
12130	How come PPC doesn't use the ISD::SMAX/SMIN/UMAX/UMIN opcodes? It'd remove a lot of this duplication.
test/CodeGen/PowerPC/ppc64_basicSAD.ll
1	Why not use utils/update_llc_test_checks.py ?
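The first inline comment concerns the ABS form of this pattern. The four select shapes that detectExtAbsDiff accepts all compute abs(d) for d = EXT(x) - EXT(y), so ABS(SUB(ZEXT(x), ZEXT(y))) describes the same value. The small scalar check below makes the equivalence concrete. It assumes plain `int` arithmetic, and the helper names `selectGT`, `selectLT` and `allShapesAreAbs` are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

// Scalar model (hypothetical helpers, not LLVM API) of the select shapes that
// detectExtAbsDiff accepts, with d = EXT(x) - EXT(y):
//   select(d > 0, d, 0 - d)     select(d > -1, d, 0 - d)
//   select(d < 0, 0 - d, d)     select(d < 1, 0 - d, d)
int selectGT(int d, int threshold) { return d > threshold ? d : 0 - d; }
int selectLT(int d, int threshold) { return d < threshold ? 0 - d : d; }

// Returns true if every accepted shape equals abs(d) for all d in [lo, hi].
bool allShapesAreAbs(int lo, int hi) {
  assert(lo <= hi);
  for (int d = lo; d <= hi; d++) {
    int expected = std::abs(d);
    if (selectGT(d, 0) != expected || selectGT(d, -1) != expected ||
        selectLT(d, 0) != expected || selectLT(d, 1) != expected)
      return false;
  }
  return true;
}
```

The ranges worth checking are [-255, 255] for i8 inputs and [-65535, 65535] for i16 inputs.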

@inouehrs Are you still looking at this at all?

Herald added a subscriber: jsji. Sep 29 2018, 4:15 AM

@RKSimon Other colleagues at IBM are working on this and will hopefully submit a separate patch, so I am abandoning this revision.
Thank you very much for your comments.

RKSimon mentioned this in D49837: [SelectionDAG] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect. Feb 4 2019, 3:36 AM

Revision Contents

Path                                             Size
include/llvm/CodeGen/TargetLowering.h            12 lines
lib/CodeGen/SelectionDAG/TargetLowering.cpp      179 lines
lib/Target/PowerPC/PPCISelLowering.cpp           107 lines
lib/Target/X86/X86ISelLowering.cpp               166 lines
test/CodeGen/PowerPC/ppc64_basicSAD.ll           132 lines

Diff 144053

include/llvm/CodeGen/TargetLowering.h

@@ ... @@ public:
   /// Check whether parameters to a call that are passed in callee saved
   /// registers are the same as from the calling function. This needs to be
   /// checked for tail call eligibility.
   bool parametersInCSRMatch(const MachineRegisterInfo &MRI,
                             const uint32_t *CallerPreservedMask,
                             const SmallVectorImpl<CCValAssign> &ArgLocs,
                             const SmallVectorImpl<SDValue> &OutVals) const;

+  static bool isBasicSADPattern(SelectionDAG &DAG, SDNode *Extract,
+                                SDValue &Zext0, SDValue &Zext1,
+                                ArrayRef<EVT> CandidateDataTypes,
+                                ArrayRef<ISD::NodeType> CandidateExtOps);
+
+  static bool detectExtAbsDiff(const SDValue &Select, SDValue &Op0,
+                               SDValue &Op1, ArrayRef<EVT> CandidateDataTypes,
+                               ArrayRef<ISD::NodeType> CandidateExtOps);
+
+  static SDValue matchBinOpReduction(SDNode *Extract, unsigned &BinOp,
+                                     ArrayRef<ISD::NodeType> CandidateBinOps);
+
   //===--------------------------------------------------------------------===//
   // TargetLowering Optimization Methods
   //

   /// A convenience struct that encapsulates a DAG, and two SDValues for
   /// returning information from TargetLowering to its clients that want to
   /// combine.
   struct TargetLoweringOpt {

lib/CodeGen/SelectionDAG/TargetLowering.cpp

@@ ... @@ if (C->isNullValue() && CC == ISD::SETEQ) {
       SDValue Clz = DAG.getNode(ISD::CTLZ, dl, VT, Zext);
       SDValue Scc = DAG.getNode(ISD::SRL, dl, VT, Clz,
                                 DAG.getConstant(Log2b, dl, MVT::i32));
       return DAG.getNode(ISD::TRUNCATE, dl, MVT::i32, Scc);
     }
   }
   return SDValue();
 }

+// Match a binop + shuffle pyramid that represents a horizontal reduction over
+// the elements of a vector.
+// Returns the vector that is being reduced on, or SDValue() if a reduction
+// was not matched.
+SDValue
+TargetLowering::matchBinOpReduction(SDNode *Extract, unsigned &BinOp,
+                                    ArrayRef<ISD::NodeType> CandidateBinOps) {
+  // The pattern must end in an extract from index 0.
+  if ((Extract->getOpcode() != ISD::EXTRACT_VECTOR_ELT) ||
+      !isNullConstant(Extract->getOperand(1)))
+    return SDValue();
+
+  SDValue Op = Extract->getOperand(0);
+  unsigned Stages = Log2_32(Op.getValueType().getVectorNumElements());
+
+  // Match against one of the candidate binary ops.
+  if (llvm::none_of(CandidateBinOps, [Op](ISD::NodeType BinOp) {
+        return Op.getOpcode() == unsigned(BinOp);
+      }))
+    return SDValue();
+
+  // At each stage, we're looking for something that looks like:
+  // %s = shufflevector <8 x i32> %op, <8 x i32> undef,
+  //                    <8 x i32> <i32 2, i32 3, i32 undef, i32 undef,
+  //                               i32 undef, i32 undef, i32 undef, i32 undef>
+  // %a = binop <8 x i32> %op, %s
+  // Where the mask changes according to the stage. E.g. for a 3-stage pyramid,
+  // we expect something like:
+  // <4,5,6,7,u,u,u,u>
+  // <2,3,u,u,u,u,u,u>
+  // <1,u,u,u,u,u,u,u>
+  unsigned CandidateBinOp = Op.getOpcode();
+  for (unsigned i = 0; i < Stages; ++i) {
+    if (Op.getOpcode() != CandidateBinOp)
+      return SDValue();
+
+    ShuffleVectorSDNode *Shuffle =
+        dyn_cast<ShuffleVectorSDNode>(Op.getOperand(0).getNode());
+    if (Shuffle) {
+      Op = Op.getOperand(1);
+    } else {
+      Shuffle = dyn_cast<ShuffleVectorSDNode>(Op.getOperand(1).getNode());
+      Op = Op.getOperand(0);
+    }
+
+    // The first operand of the shuffle should be the same as the other operand
+    // of the binop.
+    if (!Shuffle || Shuffle->getOperand(0) != Op)
+      return SDValue();
+
+    // Verify the shuffle has the expected (at this stage of the pyramid) mask.
+    for (int Index = 0, MaskEnd = 1 << i; Index < MaskEnd; ++Index)
+      if (Shuffle->getMaskElt(Index) != MaskEnd + Index)
+        return SDValue();
+  }
+
+  BinOp = CandidateBinOp;
+  return Op;
+}
+
+// Given a select, detect the following pattern:
+// 1: %2 = zext <N x i8> %0 to <N x i32>
+// 2: %3 = zext <N x i8> %1 to <N x i32>
+// 3: %4 = sub nsw <N x i32> %2, %3
+// 4: %5 = icmp sgt <N x i32> %4, [0 x N] or [-1 x N]
+// 5: %6 = sub nsw <N x i32> zeroinitializer, %4
+// 6: %7 = select <N x i1> %5, <N x i32> %4, <N x i32> %6
+// This is useful as it is the input into a SAD pattern.
+bool
+TargetLowering::detectExtAbsDiff(const SDValue &Select, SDValue &Op0,
+                                 SDValue &Op1,
+                                 ArrayRef<EVT> CandidateDataTypes,
+                                 ArrayRef<ISD::NodeType> CandidateExtOps) {
+  // Check the condition of the select instruction is greater-than.
+  SDValue SetCC = Select->getOperand(0);
+  if (SetCC.getOpcode() != ISD::SETCC)
+    return false;
+  ISD::CondCode CC = cast<CondCodeSDNode>(SetCC.getOperand(2))->get();
+  if (CC != ISD::SETGT && CC != ISD::SETLT)
+    return false;
+
+  SDValue SelectOp1 = Select->getOperand(1);
+  SDValue SelectOp2 = Select->getOperand(2);
+
+  // The following instructions assume SelectOp1 is the subtraction operand
+  // and SelectOp2 is the negation operand.
+  // In the case of SETLT this is the other way around.
+  if (CC == ISD::SETLT)
+    std::swap(SelectOp1, SelectOp2);
+
+  // The second operand of the select should be the negation of the first
+  // operand, which is implemented as 0 - SelectOp1.
+  if (!(SelectOp2.getOpcode() == ISD::SUB &&
+        ISD::isBuildVectorAllZeros(SelectOp2.getOperand(0).getNode()) &&
+        SelectOp2.getOperand(1) == SelectOp1))
+    return false;
+
+  // The first operand of SetCC is the first operand of the select, which is the
+  // difference between the two input vectors.
+  if (SetCC.getOperand(0) != SelectOp1)
+    return false;
+
+  // In SetLT case, The second operand of the comparison can be either 1 or 0.
+  APInt SplatVal;
+  if ((CC == ISD::SETLT) &&
+      !((ISD::isConstantSplatVector(SetCC.getOperand(1).getNode(), SplatVal) &&
+         SplatVal.isOneValue()) ||
+        (ISD::isBuildVectorAllZeros(SetCC.getOperand(1).getNode()))))
+    return false;
+
+  // In SetGT case, The second operand of the comparison can be either -1 or 0.
+  if ((CC == ISD::SETGT) &&
+      !(ISD::isBuildVectorAllZeros(SetCC.getOperand(1).getNode()) ||
+        ISD::isBuildVectorAllOnes(SetCC.getOperand(1).getNode())))
+    return false;
+
+  // The first operand of the select is the difference between the two input
+  // vectors.
+  if (SelectOp1.getOpcode() != ISD::SUB)
+    return false;
+
+  Op0 = SelectOp1.getOperand(0);
+  Op1 = SelectOp1.getOperand(1);
+
+  // Check if the data type and signedness match for two input vector.
+  if (Op0.getOpcode() != Op1.getOpcode() ||
+      Op0.getOperand(0).getValueType() != Op1.getOperand(0).getValueType())
+    return false;
+
+  // Match against one of the candidate extension type.
+  if (llvm::none_of(CandidateExtOps, [Op0](ISD::NodeType ExtOp) {
+        return Op0.getOpcode() == unsigned(ExtOp);
+      }))
+    return false;
+
+  // Match against one of the candidate data type.
+  if (llvm::none_of(CandidateDataTypes, [Op0](EVT DT) {
+        return Op0.getOperand(0).getValueType().getVectorElementType() == DT;
+      }))
+    return false;
+
+  return true;
+}
+
+bool
+TargetLowering::isBasicSADPattern(SelectionDAG &DAG, SDNode *Extract,
+                                  SDValue &Zext0, SDValue &Zext1,
+                                  ArrayRef<EVT> CandidateDataTypes,
+                                  ArrayRef<ISD::NodeType> CandidateExtOps) {
+  // Match shuffle + add pyramid.
+  unsigned BinOp = 0;
+  SDValue Root = matchBinOpReduction(Extract, BinOp, {ISD::ADD});
+
+  // The operand is expected to be extended by one of extension opcode
+  // in CandidateExtOps from a data type in CandidateDataTypes
+  // (verified in detectExtAbsDiff).
+  // In order to convert to i64 and above, additional any/zero/sign
+  // extend is expected.
+  // The zero extend from 32 bit has no mathematical effect on the result.
+  // Also the sign extend is basically zero extend
+  // (extends the sign bit which is zero).
+  // So it is correct to skip the sign/zero extend instruction.
+  if (Root && (Root.getOpcode() == ISD::SIGN_EXTEND ||
+               Root.getOpcode() == ISD::ZERO_EXTEND ||
+               Root.getOpcode() == ISD::ANY_EXTEND))
+    Root = Root.getOperand(0);
+
+  // If there was a match, we want Root to be a select that is the root of an
+  // abs-diff pattern.
+  if (!Root || (Root.getOpcode() != ISD::VSELECT))
+    return false;
+
+  // Check whether we have an abs-diff pattern feeding into the select.
+  if (!detectExtAbsDiff(Root, Zext0, Zext1, CandidateDataTypes, CandidateExtOps))
+    return false;
+
+  return true;
+}
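matchBinOpReduction walks the shuffle + add pyramid backwards, from the final extract to the source vector. The scalar sketch below builds the per-stage masks that its loop checks and applies them in source order, showing that lane 0 ends up holding the full reduction. It is an illustration only: `stageMask` and `pyramidSum` are hypothetical names. Undef mask lanes are modeled as -1, and undef data lanes as 0; lane 0 never reads the undef data lanes.

```cpp
#include <cassert>
#include <vector>

// Mask for stage k (counting from the extract backwards): the first 2^k lanes
// read lanes 2^k .. 2^(k+1)-1, and every other lane is undef (-1 here).
// This mirrors the "Shuffle->getMaskElt(Index) != MaskEnd + Index" check.
std::vector<int> stageMask(unsigned NumElts, unsigned k) {
  std::vector<int> Mask(NumElts, -1);
  int MaskEnd = 1 << k;
  for (int Index = 0; Index < MaskEnd; ++Index)
    Mask[Index] = MaskEnd + Index;
  return Mask;
}

// Applies the stages in source order (widest shuffle first) with an add as
// the binop, and returns lane 0, i.e. what EXTRACT_VECTOR_ELT at index 0 reads.
int pyramidSum(std::vector<int> V) {
  unsigned NumElts = V.size();
  assert(NumElts && (NumElts & (NumElts - 1)) == 0 && "power of two");
  unsigned Stages = 0;
  while ((1u << Stages) < NumElts)
    ++Stages;
  for (unsigned k = Stages; k-- > 0;) {
    std::vector<int> Mask = stageMask(NumElts, k);
    std::vector<int> Shuf(NumElts, 0);  // undef lanes modeled as 0
    for (unsigned i = 0; i < NumElts; ++i)
      if (Mask[i] >= 0)
        Shuf[i] = V[Mask[i]];
    for (unsigned i = 0; i < NumElts; ++i)
      V[i] += Shuf[i];
  }
  return V[0];
}
```

For 8 elements this produces exactly the three masks listed in the code comment (<4,5,6,7,u,u,u,u>, <2,3,u,...>, <1,u,...>).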

lib/Target/PowerPC/PPCISelLowering.cpp


@@ ... @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
   }

   // Use reciprocal estimates.
   if (TM.Options.UnsafeFPMath) {
     setTargetDAGCombine(ISD::FDIV);
     setTargetDAGCombine(ISD::FSQRT);
   }

+  setTargetDAGCombine(ISD::EXTRACT_VECTOR_ELT);
+
   // Darwin long double math library functions have $LDBL128 appended.
   if (Subtarget.isDarwin()) {
     setLibcallName(RTLIB::COS_PPCF128, "cosl$LDBL128");
     setLibcallName(RTLIB::POW_PPCF128, "powl$LDBL128");
     setLibcallName(RTLIB::REM_PPCF128, "fmodl$LDBL128");
     setLibcallName(RTLIB::SIN_PPCF128, "sinl$LDBL128");
     setLibcallName(RTLIB::SQRT_PPCF128, "sqrtl$LDBL128");
     setLibcallName(RTLIB::LOG_PPCF128, "logl$LDBL128");
@@ ... @@ if ((Op.getOperand(0).getOpcode() == ISD::FP_TO_UINT &&
     }

     return FP;
   }

   return SDValue();
 }

+static SDValue combineBasicSADPattern(SDNode *Extract, SelectionDAG &DAG,
+                                      const PPCSubtarget &Subtarget) {
+  // Currently, we support SAD pattern only on ppc64le with VSX
+  if (!(Subtarget.hasAltivec() && Subtarget.isPPC64() &&
+        Subtarget.isLittleEndian()))
+    return SDValue();
+
+  // Verify the type we're extracting from is any integer type above i16.
+  EVT VT = Extract->getOperand(0).getValueType();
+  if (!VT.isSimple() || !(VT.getVectorElementType().getSizeInBits() > 16))
+    return SDValue();
+
+  // We handle upto v16i* for SSE2 / v32i* for AVX / v64i* for AVX512.
+  // TODO: We should be able to handle larger vectors by splitting them before
+  // feeding them into several SADs, and then reducing over those.
+  if (VT.getVectorNumElements() != 16 && VT.getVectorNumElements() != 8)
+    return SDValue();
+
+  SDValue Zext0, Zext1;
+  if (!TargetLowering::isBasicSADPattern(DAG, Extract, Zext0, Zext1,
+                                         {MVT::i8, MVT::i16},
+                                         {ISD::ZERO_EXTEND, ISD::SIGN_EXTEND}))
+    return SDValue();
+
+  EVT SrcVT = Zext0.getOperand(0).getValueType();
+  bool IsSigned = (Zext0.getOpcode() == ISD::SIGN_EXTEND);
+
+  SDLoc DL(Extract);
+  SDValue VZero = SDValue(DAG.getMachineNode(PPC::V_SET0, DL, MVT::v4i32), 0);
+  SDNode *MaxNode, *MinNode;
+  if (SrcVT == MVT::v16i8) {
+    if (IsSigned) {
+      MaxNode = DAG.getMachineNode(PPC::VMAXSB, DL, MVT::v16i8,
+                                   Zext0.getOperand(0), Zext1.getOperand(0));
+      MinNode = DAG.getMachineNode(PPC::VMINSB, DL, MVT::v16i8,
+                                   Zext0.getOperand(0), Zext1.getOperand(0));
+    }
+    else {
+      MaxNode = DAG.getMachineNode(PPC::VMAXUB, DL, MVT::v16i8,
+                                   Zext0.getOperand(0), Zext1.getOperand(0));
+      MinNode = DAG.getMachineNode(PPC::VMINUB, DL, MVT::v16i8,
+                                   Zext0.getOperand(0), Zext1.getOperand(0));
+    }
+    SDNode *AbsNode = DAG.getMachineNode(PPC::VSUBUBM, DL, MVT::v16i8,
+                                         SDValue(MaxNode, 0),
+                                         SDValue(MinNode, 0));
+    SDNode *Sum1Node = DAG.getMachineNode(PPC::VSUM4UBS, DL, MVT::v4i32,
+                                          SDValue(AbsNode, 0), VZero);
+    SDNode *Sum2Node = DAG.getMachineNode(PPC::VSUMSWS, DL, MVT::v4i32,
+                                          SDValue(Sum1Node, 0), VZero);
+    return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i32,
+                       SDValue(Sum2Node, 0), Extract->getOperand(1));
+  }
+  if (SrcVT == MVT::v8i16) {
+    if (IsSigned) {
+      MaxNode = DAG.getMachineNode(PPC::VMAXSH, DL, MVT::v8i16,
+                                   Zext0.getOperand(0), Zext1.getOperand(0));
+      MinNode = DAG.getMachineNode(PPC::VMINSH, DL, MVT::v8i16,
+                                   Zext0.getOperand(0), Zext1.getOperand(0));
+    }
+    else {
+      MaxNode = DAG.getMachineNode(PPC::VMAXUH, DL, MVT::v8i16,
+                                   Zext0.getOperand(0), Zext1.getOperand(0));
+      MinNode = DAG.getMachineNode(PPC::VMINUH, DL, MVT::v8i16,
+                                   Zext0.getOperand(0), Zext1.getOperand(0));
+    }
+    SDNode *AbsNode = DAG.getMachineNode(PPC::VSUBUHM, DL, MVT::v8i16,
+                                         SDValue(MaxNode, 0),
+                                         SDValue(MinNode, 0));
+
+    // We cannot use VSUM4SHS since the absolute value in AbsNode is unsigned.
+    // So we create two zero-extended v4i32 vectors from input v8i16 vector
+    // and execute two VSUMSWS instructions.
+    SmallVector<SDValue, 16> Mask1, Mask2;
+    for (unsigned i = 0; i < 16; i++)
+      if (i & 2) {
+        Mask1.push_back(DAG.getConstant(0, DL, MVT::i32));
+        Mask2.push_back(DAG.getConstant(0, DL, MVT::i32));
+      }
+      else {
+        Mask1.push_back(DAG.getConstant(29 - i, DL, MVT::i32));
+        Mask2.push_back(DAG.getConstant(0xFF, DL, MVT::i32));
+      }
+    SDValue VMask1 = DAG.getBuildVector(MVT::v16i8, DL, Mask1);
+    SDValue VMask2 = DAG.getBuildVector(MVT::v16i8, DL, Mask2);
+    SDNode *AbsOddNode = DAG.getMachineNode(PPC::VPERM, DL, MVT::v8i16,
+                                            VZero, SDValue(AbsNode, 0),
+                                            VMask1);
+    SDNode *AbsEvenNode = DAG.getMachineNode(PPC::VAND, DL, MVT::v8i16,
+                                             SDValue(AbsNode, 0), VMask2);
+    SDNode *Sum1Node = DAG.getMachineNode(PPC::VSUMSWS, DL, MVT::v4i32,
+                                          SDValue(AbsEvenNode, 0), VZero);
+    SDNode *Sum2Node = DAG.getMachineNode(PPC::VSUMSWS, DL, MVT::v4i32,
+                                          SDValue(AbsOddNode, 0),
+                                          SDValue(Sum1Node, 0));
+    return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i32,
+                       SDValue(Sum2Node, 0), Extract->getOperand(1));
+  }
+  return SDValue();
+}
+
 // expandVSXLoadForLE - Convert VSX loads (which may be intrinsics for
 // builtins) into loads with swaps.
 SDValue PPCTargetLowering::expandVSXLoadForLE(SDNode *N,
                                               DAGCombinerInfo &DCI) const {
   SelectionDAG &DAG = DCI.DAG;
   SDLoc dl(N);
   SDValue Chain;
   SDValue Base;
@@ ... @@ if (LHS.getOpcode() == ISD::INTRINSIC_WO_CHAIN &&
                          DAG.getConstant(CompOpc, dl, MVT::i32),
                          DAG.getRegister(PPC::CR6, MVT::i32),
                          N->getOperand(4), CompNode.getValue(1));
     }
     break;
   }
   case ISD::BUILD_VECTOR:
     return DAGCombineBuildVector(N, DCI);
+
+  case ISD::EXTRACT_VECTOR_ELT:
+    return combineBasicSADPattern(N, DAG, Subtarget);
   }

   return SDValue();
 }

 SDValue
 PPCTargetLowering::BuildSDIVPow2(SDNode *N, const APInt &Divisor,
                                  SelectionDAG &DAG,
                                  std::vector<SDNode *> *Created) const {
   // fold (sdiv X, pow2)
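The v8i16 path carries a comment explaining that VSUM4SHS cannot be used because the halfword absolute differences are unsigned. The scalar model below illustrates that point; it is not the patch's code, and `sumAsSignedHalfwords` and `sumAsZeroExtendedWords` are hypothetical names. It shows the wrong result for a difference of 40000. It also shows how the approach in that comment avoids the problem: split the halfwords into two zero-extended word vectors, then sum the words.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative scalar model; the names below are hypothetical.
using V8 = std::array<uint16_t, 8>;

// What a signed-halfword sum (as VSUM4SHS performs) would see: any value
// >= 0x8000 is read as negative.
int64_t sumAsSignedHalfwords(const V8 &AbsDiff) {
  int64_t Sum = 0;
  for (uint16_t H : AbsDiff)
    Sum += H >= 0x8000 ? int64_t(H) - 0x10000 : int64_t(H);
  return Sum;
}

// The approach from the patch comment: zero-extend the even and odd halfwords
// into two vectors of 32-bit words, then add the words (two VSUMSWS there).
int64_t sumAsZeroExtendedWords(const V8 &AbsDiff) {
  std::array<uint32_t, 4> Even{}, Odd{};
  for (unsigned w = 0; w < 4; ++w) {
    Even[w] = AbsDiff[2 * w];      // upper halfword of the word is zero
    Odd[w] = AbsDiff[2 * w + 1];   // moved down and zero-extended
  }
  int64_t Sum = 0;
  for (unsigned w = 0; w < 4; ++w)
    Sum += int64_t(Even[w]) + int64_t(Odd[w]);
  return Sum;
}
```

For the input {40000, 1, 2, 3, 4, 5, 6, 7}, the zero-extended sum is 40028. The signed reinterpretation is off by exactly 65536.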

lib/Target/X86/X86ISelLowering.cpp


Show First 20 Lines • Show All 31,178 Lines • ▼ Show 20 Lines	if (N0.hasOneUse() && LogicOp1.getOpcode() == ISD::BITCAST &&
!isa<ConstantSDNode>(LogicOp1.getOperand(0))) {		!isa<ConstantSDNode>(LogicOp1.getOperand(0))) {
SDValue CastedOp0 = DAG.getBitcast(VT, LogicOp0);		SDValue CastedOp0 = DAG.getBitcast(VT, LogicOp0);
return DAG.getNode(FPOpcode, DL0, VT, LogicOp1.getOperand(0), CastedOp0);		return DAG.getNode(FPOpcode, DL0, VT, LogicOp1.getOperand(0), CastedOp0);
}		}

return SDValue();		return SDValue();
}		}

// Match a binop + shuffle pyramid that represents a horizontal reduction over
// the elements of a vector.
// Returns the vector that is being reduced on, or SDValue() if a reduction
// was not matched.
static SDValue matchBinOpReduction(SDNode *Extract, unsigned &BinOp,
ArrayRef<ISD::NodeType> CandidateBinOps) {
// The pattern must end in an extract from index 0.
if ((Extract->getOpcode() != ISD::EXTRACT_VECTOR_ELT) \|\|
!isNullConstant(Extract->getOperand(1)))
return SDValue();

SDValue Op = Extract->getOperand(0);
unsigned Stages = Log2_32(Op.getValueType().getVectorNumElements());

// Match against one of the candidate binary ops.
if (llvm::none_of(CandidateBinOps, [Op](ISD::NodeType BinOp) {
return Op.getOpcode() == unsigned(BinOp);
}))
return SDValue();

// At each stage, we're looking for something that looks like:
// %s = shufflevector <8 x i32> %op, <8 x i32> undef,
// <8 x i32> <i32 2, i32 3, i32 undef, i32 undef,
// i32 undef, i32 undef, i32 undef, i32 undef>
// %a = binop <8 x i32> %op, %s
// Where the mask changes according to the stage. E.g. for a 3-stage pyramid,
// we expect something like:
// <4,5,6,7,u,u,u,u>
// <2,3,u,u,u,u,u,u>
// <1,u,u,u,u,u,u,u>
unsigned CandidateBinOp = Op.getOpcode();
for (unsigned i = 0; i < Stages; ++i) {
if (Op.getOpcode() != CandidateBinOp)
return SDValue();

ShuffleVectorSDNode *Shuffle =
dyn_cast<ShuffleVectorSDNode>(Op.getOperand(0).getNode());
if (Shuffle) {
Op = Op.getOperand(1);
} else {
Shuffle = dyn_cast<ShuffleVectorSDNode>(Op.getOperand(1).getNode());
Op = Op.getOperand(0);
}

// The first operand of the shuffle should be the same as the other operand
// of the binop.
if (!Shuffle \|\| Shuffle->getOperand(0) != Op)
return SDValue();

// Verify the shuffle has the expected (at this stage of the pyramid) mask.
for (int Index = 0, MaskEnd = 1 << i; Index < MaskEnd; ++Index)
if (Shuffle->getMaskElt(Index) != MaskEnd + Index)
return SDValue();
}

BinOp = CandidateBinOp;
return Op;
}

// Given a select, detect the following pattern:
// 1: %2 = zext <N x i8> %0 to <N x i32>
// 2: %3 = zext <N x i8> %1 to <N x i32>
// 3: %4 = sub nsw <N x i32> %2, %3
// 4: %5 = icmp sgt <N x i32> %4, [0 x N] or [-1 x N]
// 5: %6 = sub nsw <N x i32> zeroinitializer, %4
// 6: %7 = select <N x i1> %5, <N x i32> %4, <N x i32> %6
// This is useful as it is the input into a SAD pattern.
static bool detectZextAbsDiff(const SDValue &Select, SDValue &Op0,
SDValue &Op1) {
// Check the condition of the select instruction is greater-than.
SDValue SetCC = Select->getOperand(0);
if (SetCC.getOpcode() != ISD::SETCC)
return false;
ISD::CondCode CC = cast<CondCodeSDNode>(SetCC.getOperand(2))->get();
if (CC != ISD::SETGT && CC != ISD::SETLT)
return false;

SDValue SelectOp1 = Select->getOperand(1);
SDValue SelectOp2 = Select->getOperand(2);

// The following instructions assume SelectOp1 is the subtraction operand
// and SelectOp2 is the negation operand.
// In the case of SETLT this is the other way around.
if (CC == ISD::SETLT)
std::swap(SelectOp1, SelectOp2);

// The second operand of the select should be the negation of the first
// operand, which is implemented as 0 - SelectOp1.
if (!(SelectOp2.getOpcode() == ISD::SUB &&
ISD::isBuildVectorAllZeros(SelectOp2.getOperand(0).getNode()) &&
SelectOp2.getOperand(1) == SelectOp1))
return false;

// The first operand of SetCC is the first operand of the select, which is the
// difference between the two input vectors.
if (SetCC.getOperand(0) != SelectOp1)
return false;

// In SetLT case, The second operand of the comparison can be either 1 or 0.
APInt SplatVal;
if ((CC == ISD::SETLT) &&
!((ISD::isConstantSplatVector(SetCC.getOperand(1).getNode(), SplatVal) &&
SplatVal.isOneValue()) \|\|
(ISD::isBuildVectorAllZeros(SetCC.getOperand(1).getNode()))))
return false;

// In SetGT case, The second operand of the comparison can be either -1 or 0.
if ((CC == ISD::SETGT) &&
!(ISD::isBuildVectorAllZeros(SetCC.getOperand(1).getNode()) \|\|
ISD::isBuildVectorAllOnes(SetCC.getOperand(1).getNode())))
return false;

// The first operand of the select is the difference between the two input
// vectors.
if (SelectOp1.getOpcode() != ISD::SUB)
return false;

Op0 = SelectOp1.getOperand(0);
Op1 = SelectOp1.getOperand(1);

// Check if the operands of the sub are zero-extended from vectors of i8.
if (Op0.getOpcode() != ISD::ZERO_EXTEND \|\|
Op0.getOperand(0).getValueType().getVectorElementType() != MVT::i8 \|\|
Op1.getOpcode() != ISD::ZERO_EXTEND \|\|
Op1.getOperand(0).getValueType().getVectorElementType() != MVT::i8)
return false;

return true;
}

// Given two zexts of <k x i8> to <k x i32>, create a PSADBW of the inputs		// Given two zexts of <k x i8> to <k x i32>, create a PSADBW of the inputs
// to these zexts.		// to these zexts.
static SDValue createPSADBW(SelectionDAG &DAG, const SDValue &Zext0,		static SDValue createPSADBW(SelectionDAG &DAG, const SDValue &Zext0,
const SDValue &Zext1, const SDLoc &DL,		const SDValue &Zext1, const SDLoc &DL,
const X86Subtarget &Subtarget) {		const X86Subtarget &Subtarget) {
// Find the appropriate width for the PSADBW.		// Find the appropriate width for the PSADBW.
EVT InVT = Zext0.getOperand(0).getValueType();		EVT InVT = Zext0.getOperand(0).getValueType();
unsigned RegSize = std::max(128u, InVT.getSizeInBits());		unsigned RegSize = std::max(128u, InVT.getSizeInBits());
Show All 28 Lines	if (!Subtarget.hasSSE41())
return SDValue();		return SDValue();

EVT ExtractVT = Extract->getValueType(0);		EVT ExtractVT = Extract->getValueType(0);
if (ExtractVT != MVT::i16 && ExtractVT != MVT::i8)		if (ExtractVT != MVT::i16 && ExtractVT != MVT::i8)
return SDValue();		return SDValue();

// Check for SMAX/SMIN/UMAX/UMIN horizontal reduction patterns.		// Check for SMAX/SMIN/UMAX/UMIN horizontal reduction patterns.
unsigned BinOp;		unsigned BinOp;
SDValue Src = matchBinOpReduction(		SDValue Src = TargetLowering::matchBinOpReduction(
Extract, BinOp, {ISD::SMAX, ISD::SMIN, ISD::UMAX, ISD::UMIN});		Extract, BinOp, {ISD::SMAX, ISD::SMIN, ISD::UMAX, ISD::UMIN});
if (!Src)		if (!Src)
return SDValue();		return SDValue();

EVT SrcVT = Src.getValueType();		EVT SrcVT = Src.getValueType();
EVT SrcSVT = SrcVT.getScalarType();		EVT SrcSVT = SrcVT.getScalarType();
if (SrcSVT != ExtractVT \|\| (SrcVT.getSizeInBits() % 128) != 0)		if (SrcSVT != ExtractVT \|\| (SrcVT.getSizeInBits() % 128) != 0)
return SDValue();		return SDValue();
▲ Show 20 Lines • Show All 63 Lines • ▼ Show 20 Lines	static SDValue combineHorizontalPredicateResult(SDNode *Extract,
EVT ExtractVT = Extract->getValueType(0);		EVT ExtractVT = Extract->getValueType(0);
unsigned BitWidth = ExtractVT.getSizeInBits();		unsigned BitWidth = ExtractVT.getSizeInBits();
if (ExtractVT != MVT::i64 && ExtractVT != MVT::i32 && ExtractVT != MVT::i16 &&		if (ExtractVT != MVT::i64 && ExtractVT != MVT::i32 && ExtractVT != MVT::i16 &&
ExtractVT != MVT::i8)		ExtractVT != MVT::i8)
return SDValue();		return SDValue();

// Check for OR(any_of) and AND(all_of) horizontal reduction patterns.		// Check for OR(any_of) and AND(all_of) horizontal reduction patterns.
unsigned BinOp = 0;		unsigned BinOp = 0;
SDValue Match = matchBinOpReduction(Extract, BinOp, {ISD::OR, ISD::AND});		SDValue Match = TargetLowering::matchBinOpReduction(Extract, BinOp, {ISD::OR, ISD::AND});
if (!Match)		if (!Match)
return SDValue();		return SDValue();

// EXTRACT_VECTOR_ELT can require implicit extension of the vector element		// EXTRACT_VECTOR_ELT can require implicit extension of the vector element
// which we can't support here for now.		// which we can't support here for now.
if (Match.getScalarValueSizeInBits() != BitWidth)		if (Match.getScalarValueSizeInBits() != BitWidth)
return SDValue();		return SDValue();

▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	static SDValue combineHorizontalPredicateResult(SDNode *Extract,
SDValue Ones = DAG.getAllOnesConstant(DL, ResVT);		SDValue Ones = DAG.getAllOnesConstant(DL, ResVT);
SDValue Res = DAG.getBitcast(MaskVT, Match);		SDValue Res = DAG.getBitcast(MaskVT, Match);
Res = DAG.getNode(X86ISD::MOVMSK, DL, MVT::i32, Res);		Res = DAG.getNode(X86ISD::MOVMSK, DL, MVT::i32, Res);
Res = DAG.getSelectCC(DL, Res, DAG.getConstant(CompareBits, DL, MVT::i32),		Res = DAG.getSelectCC(DL, Res, DAG.getConstant(CompareBits, DL, MVT::i32),
Ones, Zero, CondCode);		Ones, Zero, CondCode);
return DAG.getSExtOrTrunc(Res, DL, ExtractVT);		return DAG.getSExtOrTrunc(Res, DL, ExtractVT);
}		}

		static bool detectZextAbsDiff(const SDValue &SelectOp, SDValue &Op0,
		SDValue &Op1) {
		return TargetLowering::detectExtAbsDiff(SelectOp, Op0, Op1, {MVT::i8},
		{ISD::ZERO_EXTEND});
		}

static SDValue combineBasicSADPattern(SDNode *Extract, SelectionDAG &DAG,		static SDValue combineBasicSADPattern(SDNode *Extract, SelectionDAG &DAG,
const X86Subtarget &Subtarget) {		const X86Subtarget &Subtarget) {
// PSADBW is only supported on SSE2 and up.		// PSADBW is only supported on SSE2 and up.
if (!Subtarget.hasSSE2())		if (!Subtarget.hasSSE2())
return SDValue();		return SDValue();

// Verify the type we're extracting from is any integer type above i16.		// Verify the type we're extracting from is any integer type above i16.
EVT VT = Extract->getOperand(0).getValueType();		EVT VT = Extract->getOperand(0).getValueType();
if (!VT.isSimple() \|\| !(VT.getVectorElementType().getSizeInBits() > 16))		if (!VT.isSimple() \|\| !(VT.getVectorElementType().getSizeInBits() > 16))
return SDValue();		return SDValue();

unsigned RegSize = 128;		unsigned RegSize = 128;
if (Subtarget.useBWIRegs())		if (Subtarget.useBWIRegs())
RegSize = 512;		RegSize = 512;
 else if (Subtarget.hasAVX())
 RegSize = 256;
 
 // We handle upto v16i* for SSE2 / v32i* for AVX / v64i* for AVX512.
 // TODO: We should be able to handle larger vectors by splitting them before
 // feeding them into several SADs, and then reducing over those.
 if (RegSize / VT.getVectorNumElements() < 8)
 return SDValue();
 
-// Match shuffle + add pyramid.
-unsigned BinOp = 0;
-SDValue Root = matchBinOpReduction(Extract, BinOp, {ISD::ADD});
-
-// The operand is expected to be zero extended from i8
-// (verified in detectZextAbsDiff).
-// In order to convert to i64 and above, additional any/zero/sign
-// extend is expected.
-// The zero extend from 32 bit has no mathematical effect on the result.
-// Also the sign extend is basically zero extend
-// (extends the sign bit which is zero).
-// So it is correct to skip the sign/zero extend instruction.
-if (Root && (Root.getOpcode() == ISD::SIGN_EXTEND ||
-Root.getOpcode() == ISD::ZERO_EXTEND ||
-Root.getOpcode() == ISD::ANY_EXTEND))
-Root = Root.getOperand(0);
-
-// If there was a match, we want Root to be a select that is the root of an
-// abs-diff pattern.
-if (!Root || (Root.getOpcode() != ISD::VSELECT))
-return SDValue();
-
-// Check whether we have an abs-diff pattern feeding into the select.
 SDValue Zext0, Zext1;
-if (!detectZextAbsDiff(Root, Zext0, Zext1))
+if (!TargetLowering::isBasicSADPattern(DAG, Extract, Zext0, Zext1, {MVT::i8},
+{ISD::ZERO_EXTEND}))
 return SDValue();
 
 // Create the SAD instruction.
 SDLoc DL(Extract);
 SDValue SAD = createPSADBW(DAG, Zext0, Zext1, DL, Subtarget);
 
 // If the original vector was wider than 8 elements, sum over the results
 // in the SAD vector.
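The hunk above drops X86's own matching of the shuffle + add pyramid and of the VSELECT abs-diff root, and, judging from its arguments, hands that job to the shared TargetLowering::isBasicSADPattern helper. As a rough scalar illustration of the value that pattern computes (not the helper's implementation), the sketch below models the IR used in ppc64_basicSAD.ll. The names absDiff, pyramidReduce and checkAgainstLoop are hypothetical and exist only for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar model of the per-lane part of the IR in ppc64_basicSAD.ll:
// zext both i8 inputs to i32, subtract, then select(d > -1, d, -d) == abs(d).
template <std::size_t N>
std::array<int32_t, N> absDiff(const std::array<uint8_t, N> &A,
                               const std::array<uint8_t, N> &B) {
  std::array<int32_t, N> R{};
  for (std::size_t I = 0; I < N; ++I) {
    int32_t D = int32_t(A[I]) - int32_t(B[I]); // zext + sub
    R[I] = D > -1 ? D : -D;                    // icmp sgt -1 + select
  }
  return R;
}

// The "shuffle + add pyramid": each step shuffles the upper half of the live
// lanes down onto the lower half and adds, so after log2(N) steps lane 0
// holds the total, which the IR reads with extractelement ..., i32 0.
template <std::size_t N>
int32_t pyramidReduce(std::array<int32_t, N> V) {
  static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");
  for (std::size_t Half = N / 2; Half >= 1; Half /= 2)
    for (std::size_t I = 0; I < Half; ++I)
      V[I] += V[I + Half];
  return V[0];
}

// Each abs-diff lane is in [0, 255] and 16 lanes sum to at most 4080, so the
// sign bit is never set: extending these i32 values to i64 with zext or sext
// gives the same result, which is why the removed code could skip the extend.
static_assert(16 * 255 < INT32_MAX, "16-lane byte SAD fits in i32");

// Usage check: the pyramid agrees with a plain left-to-right sum.
inline void checkAgainstLoop(const std::array<uint8_t, 16> &A,
                             const std::array<uint8_t, 16> &B) {
  std::array<int32_t, 16> D = absDiff(A, B);
  int32_t Loop = 0;
  for (int32_t X : D)
    Loop += X;
  assert(pyramidReduce(D) == Loop);
}
```

Because integer addition is associative, the pyramid and a linear loop give the same total; the pyramid shape matters only because it is what the vectorizer emits and what the DAG matcher has to recognize.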

test/CodeGen/PowerPC/ppc64_basicSAD.ll

This file was added.

				; RUN: llc < %s -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr8 -verify-machineinstrs | FileCheck %s
				RKSimon (inline comment): Why not use utils/update_llc_test_checks.py?

				define zeroext i32 @func8s(i8* nocapture readonly %pix1, i8* nocapture readonly %pix2) {
				; CHECK-LABEL: @func8s
				; CHECK-DAG: vminsb [[MIN:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}
				; CHECK-DAG: vmaxsb [[MAX:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}
				; CHECK-DAG: vxor [[ZERO:[0-9]+]], [[ZERO]], [[ZERO]]
				; CHECK: vsububm [[ABS:[0-9]+]], [[MAX]], [[MIN]]
				; CHECK: vsum4ubs [[SUM1:[0-9]+]], [[ABS]], [[ZERO]]
				; CHECK: vsumsws [[SUM2:[0-9]+]], [[SUM1]], [[ZERO]]
				; CHECK: mfvsrwz {{[0-9]+}}
				entry:
				%0 = bitcast i8* %pix1 to <16 x i8>*
				%1 = load <16 x i8>, <16 x i8>* %0, align 1
				%2 = sext <16 x i8> %1 to <16 x i32>
				%3 = bitcast i8* %pix2 to <16 x i8>*
				%4 = load <16 x i8>, <16 x i8>* %3, align 1
				%5 = sext <16 x i8> %4 to <16 x i32>
				%6 = sub nsw <16 x i32> %2, %5
				%7 = icmp sgt <16 x i32> %6, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
				%8 = sub nsw <16 x i32> zeroinitializer, %6
				%9 = select <16 x i1> %7, <16 x i32> %6, <16 x i32> %8
				%rdx.shuf = shufflevector <16 x i32> %9, <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				%bin.rdx = add nsw <16 x i32> %9, %rdx.shuf
				%rdx.shuf12 = shufflevector <16 x i32> %bin.rdx, <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				%bin.rdx13 = add nsw <16 x i32> %bin.rdx, %rdx.shuf12
				%rdx.shuf14 = shufflevector <16 x i32> %bin.rdx13, <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				%bin.rdx15 = add nsw <16 x i32> %bin.rdx13, %rdx.shuf14
				%rdx.shuf16 = shufflevector <16 x i32> %bin.rdx15, <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				%bin.rdx17 = add nsw <16 x i32> %bin.rdx15, %rdx.shuf16
				%10 = extractelement <16 x i32> %bin.rdx17, i32 0
				ret i32 %10
				}

				define zeroext i32 @func8u(i8* nocapture readonly %pix1, i8* nocapture readonly %pix2) {
				; CHECK-LABEL: @func8u
				; CHECK-DAG: vminub [[MIN:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}
				; CHECK-DAG: vmaxub [[MAX:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}
				; CHECK-DAG: vxor [[ZERO:[0-9]+]], [[ZERO]], [[ZERO]]
				; CHECK: vsububm [[ABS:[0-9]+]], [[MAX]], [[MIN]]
				; CHECK: vsum4ubs [[SUM1:[0-9]+]], [[ABS]], [[ZERO]]
				; CHECK: vsumsws [[SUM2:[0-9]+]], [[SUM1]], [[ZERO]]
				; CHECK: mfvsrwz {{[0-9]+}}
				entry:
				%0 = bitcast i8* %pix1 to <16 x i8>*
				%1 = load <16 x i8>, <16 x i8>* %0, align 1
				%2 = zext <16 x i8> %1 to <16 x i32>
				%3 = bitcast i8* %pix2 to <16 x i8>*
				%4 = load <16 x i8>, <16 x i8>* %3, align 1
				%5 = zext <16 x i8> %4 to <16 x i32>
				%6 = sub nsw <16 x i32> %2, %5
				%7 = icmp sgt <16 x i32> %6, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
				%8 = sub nsw <16 x i32> zeroinitializer, %6
				%9 = select <16 x i1> %7, <16 x i32> %6, <16 x i32> %8
				%rdx.shuf = shufflevector <16 x i32> %9, <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				%bin.rdx = add <16 x i32> %9, %rdx.shuf
				%rdx.shuf12 = shufflevector <16 x i32> %bin.rdx, <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				%bin.rdx13 = add <16 x i32> %bin.rdx, %rdx.shuf12
				%rdx.shuf14 = shufflevector <16 x i32> %bin.rdx13, <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				%bin.rdx15 = add <16 x i32> %bin.rdx13, %rdx.shuf14
				%rdx.shuf16 = shufflevector <16 x i32> %bin.rdx15, <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				%bin.rdx17 = add <16 x i32> %bin.rdx15, %rdx.shuf16
				%10 = extractelement <16 x i32> %bin.rdx17, i32 0
				ret i32 %10
				}


				define signext i32 @func16s(i16* nocapture readonly %pix1, i16* nocapture readonly %pix2) {
				; CHECK-LABEL: @func16s
				; CHECK-DAG: vminsh [[MIN:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}
				; CHECK-DAG: vmaxsh [[MAX:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}
				; CHECK-DAG: vxor [[ZERO:[0-9]+]], [[ZERO]], [[ZERO]]
				; CHECK: vsubuhm [[ABS:[0-9]+]], [[MAX]], [[MIN]]
				; CHECK-DAG: vand [[EVEN:[0-9]+]], [[ABS]], {{[0-9]+}}
				; CHECK-DAG: vperm [[ODD:[0-9]+]], [[ZERO]], [[ABS]], {{[0-9]+}}
				; CHECK: vsumsws [[SUM1:[0-9]+]], [[EVEN]], [[ZERO]]
				; CHECK: vsumsws [[SUM2:[0-9]+]], [[ODD]], [[SUM1]]
				; CHECK: mfvsrwz {{[0-9]+}}

				entry:
				%0 = bitcast i16* %pix1 to <8 x i16>*
				%1 = load <8 x i16>, <8 x i16>* %0, align 2
				%2 = sext <8 x i16> %1 to <8 x i32>
				%3 = bitcast i16* %pix2 to <8 x i16>*
				%4 = load <8 x i16>, <8 x i16>* %3, align 2
				%5 = sext <8 x i16> %4 to <8 x i32>
				%6 = sub nsw <8 x i32> %2, %5
				%7 = icmp sgt <8 x i32> %6, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
				%8 = sub nsw <8 x i32> zeroinitializer, %6
				%9 = select <8 x i1> %7, <8 x i32> %6, <8 x i32> %8
				%rdx.shuf = shufflevector <8 x i32> %9, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
				%bin.rdx = add nsw <8 x i32> %9, %rdx.shuf
				%rdx.shuf12 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				%bin.rdx13 = add nsw <8 x i32> %bin.rdx, %rdx.shuf12
				%rdx.shuf14 = shufflevector <8 x i32> %bin.rdx13, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				%bin.rdx15 = add nsw <8 x i32> %bin.rdx13, %rdx.shuf14
				%10 = extractelement <8 x i32> %bin.rdx15, i32 0
				ret i32 %10
				}

				define signext i32 @func16u(i16* nocapture readonly %pix1, i16* nocapture readonly %pix2) {
				; CHECK-LABEL: @func16u
				; CHECK-DAG: vminuh [[MIN:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}
				; CHECK-DAG: vmaxuh [[MAX:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}
				; CHECK-DAG: vxor [[ZERO:[0-9]+]], [[ZERO]], [[ZERO]]
				; CHECK: vsubuhm [[ABS:[0-9]+]], [[MAX]], [[MIN]]
				; CHECK-DAG: vand [[EVEN:[0-9]+]], [[ABS]], {{[0-9]+}}
				; CHECK-DAG: vperm [[ODD:[0-9]+]], [[ZERO]], [[ABS]], {{[0-9]+}}
				; CHECK: vsumsws [[SUM1:[0-9]+]], [[EVEN]], [[ZERO]]
				; CHECK: vsumsws [[SUM2:[0-9]+]], [[ODD]], [[SUM1]]
				; CHECK: mfvsrwz {{[0-9]+}}

				entry:
				%0 = bitcast i16* %pix1 to <8 x i16>*
				%1 = load <8 x i16>, <8 x i16>* %0, align 2
				%2 = zext <8 x i16> %1 to <8 x i32>
				%3 = bitcast i16* %pix2 to <8 x i16>*
				%4 = load <8 x i16>, <8 x i16>* %3, align 2
				%5 = zext <8 x i16> %4 to <8 x i32>
				%6 = sub nsw <8 x i32> %2, %5
				%7 = icmp sgt <8 x i32> %6, <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>
				%8 = sub nsw <8 x i32> zeroinitializer, %6
				%9 = select <8 x i1> %7, <8 x i32> %6, <8 x i32> %8
				%rdx.shuf = shufflevector <8 x i32> %9, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
				%bin.rdx = add nsw <8 x i32> %9, %rdx.shuf
				%rdx.shuf12 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				%bin.rdx13 = add nsw <8 x i32> %bin.rdx, %rdx.shuf12
				%rdx.shuf14 = shufflevector <8 x i32> %bin.rdx13, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				%bin.rdx15 = add nsw <8 x i32> %bin.rdx13, %rdx.shuf14
				%10 = extractelement <8 x i32> %bin.rdx15, i32 0
				ret i32 %10
				}
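The CHECK lines above expect a specific Altivec sequence. The scalar C++ model below is written only to show why that sequence produces the same value as the IR reductions. It ignores lane order and endianness (which halfword vand keeps, which word vsumsws writes), models only func8u and func16s (func8s and func16u differ only in the signedness of the min/max instructions), and all function and type names in it are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <limits>

using V16u8 = std::array<uint8_t, 16>;
using V8i16 = std::array<int16_t, 8>;
using V8u16 = std::array<uint16_t, 8>;
using V4i32 = std::array<int32_t, 4>;

// vmaxub/vminub then vsububm: max - min never wraps, so each byte is |a - b|.
V16u8 absDiffBytes(const V16u8 &A, const V16u8 &B) {
  V16u8 R{};
  for (int I = 0; I < 16; ++I)
    R[I] = uint8_t(std::max(A[I], B[I]) - std::min(A[I], B[I]));
  return R;
}

// vmaxsh/vminsh then vsubuhm: the subtraction is modulo 2^16, and the true
// |a - b| of two int16 values is at most 65535, so it survives as a uint16.
V8u16 absDiffHalves(const V8i16 &A, const V8i16 &B) {
  V8u16 R{};
  for (int I = 0; I < 8; ++I)
    R[I] = uint16_t(uint16_t(std::max(A[I], B[I])) -
                    uint16_t(std::min(A[I], B[I])));
  return R;
}

// vsum4ubs V, ZERO: each 32-bit lane becomes the sum of its four bytes.
// (The instruction saturates, but 4 * 255 cannot reach the limit.)
V4i32 sum4Bytes(const V16u8 &V) {
  V4i32 R{};
  for (int W = 0; W < 4; ++W)
    for (int J = 0; J < 4; ++J)
      R[W] += V[4 * W + J];
  return R;
}

// vand with a mask / vperm against ZERO: take one halfword of every 32-bit
// lane, zero-extended, giving the EVEN (Which = 0) or ODD (Which = 1) vector.
V4i32 halvesAsWords(const V8u16 &V, int Which) {
  V4i32 R{};
  for (int W = 0; W < 4; ++W)
    R[W] = V[2 * W + Which];
  return R;
}

// vsumsws V, ACC: the four signed words of V plus one word of ACC, with
// signed saturation, land in a single result word (modelled as a scalar).
int32_t sumAcross(const V4i32 &V, int32_t Acc) {
  int64_t S = Acc;
  for (int32_t X : V)
    S += X;
  S = std::min<int64_t>(S, std::numeric_limits<int32_t>::max());
  S = std::max<int64_t>(S, std::numeric_limits<int32_t>::min());
  return int32_t(S);
}

// func8u: vsububm -> vsum4ubs -> vsumsws; mfvsrwz then moves the word out.
uint32_t sad8(const V16u8 &A, const V16u8 &B) {
  return uint32_t(sumAcross(sum4Bytes(absDiffBytes(A, B)), 0));
}

// func16s: two vsumsws, the second accumulating onto the first's result.
uint32_t sad16(const V8i16 &A, const V8i16 &B) {
  V8u16 D = absDiffHalves(A, B);
  int32_t Sum1 = sumAcross(halvesAsWords(D, 0), 0);    // vsumsws EVEN, ZERO
  int32_t Sum2 = sumAcross(halvesAsWords(D, 1), Sum1); // vsumsws ODD, SUM1
  return uint32_t(Sum2);
}
```

The halfword case needs the even/odd split because there is no halfword-granular sum-across instruction in this sequence; widening each halfword into its own word lets vsumsws do the accumulation. With at most 8 * 65535 = 524280 in total, the saturation in vsumsws is never reached.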

Revision Contents

Diff 144053

include/llvm/CodeGen/TargetLowering.h

lib/CodeGen/SelectionDAG/TargetLowering.cpp

lib/Target/PowerPC/PPCISelLowering.cpp

lib/Target/X86/X86ISelLowering.cpp

test/CodeGen/PowerPC/ppc64_basicSAD.ll
