Details

Reviewers

hfinkel
echristo
kbarton
sfertile
nemanjai
syzaara
lei

Commits

rG33486787cb6e: [PowerPC] fix incorrect vectorization of abs() on POWER9
rL330497: [PowerPC] fix incorrect vectorization of abs() on POWER9

Summary

Vectorized loops with abs() return incorrect results on POWER9. This patch fixes it.
For example, the following code returns a negative result if the input values are negative, even though it sums the absolute values of the inputs. This problem causes test failures for libvpx.

int vpx_satd_c(const int16_t *coeff, int length) {
  int satd = 0;
  for (int i = 0; i < length; ++i) satd += abs(coeff[i]);
  return satd;
}

For vector absolute value and vector absolute difference on POWER9, LLVM generates the VABSDUW (Vector Absolute Difference Unsigned Word) instruction or one of its variants.
Since these instructions operate on unsigned integers, we need an adjustment for signed integers.
For abs(sub(a, b)), we generate VABSDUW(a+0x80000000, b+0x80000000). Otherwise, abs(sub(-1, 0)) returns 0xFFFFFFFF(=-1) instead of 1. For abs(a), we generate VABSDUW(a+0x80000000, 0x80000000).
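
The adjustment can be checked with a scalar model. The following is a minimal sketch, not the patch itself: `vabsduw_lane`, `absdiff_signed` and `abs_signed` are hypothetical helper names, and the model assumes that one 32-bit lane of VABSDUW computes an unsigned absolute difference.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of one 32-bit lane of VABSDUW:
// the absolute difference of two unsigned words.
inline uint32_t vabsduw_lane(uint32_t a, uint32_t b) {
  return a > b ? a - b : b - a;
}

// abs(a - b) for signed inputs: add 0x80000000 (modulo 2^32) to both
// operands first, so that signed order maps onto unsigned order.
inline uint32_t absdiff_signed(int32_t a, int32_t b) {
  const uint32_t kSignBit = 0x80000000u;
  return vabsduw_lane(static_cast<uint32_t>(a) + kSignBit,
                      static_cast<uint32_t>(b) + kSignBit);
}

// abs(a): the second operand is 0 + 0x80000000.
inline uint32_t abs_signed(int32_t a) { return absdiff_signed(a, 0); }
```

Without the bias, `vabsduw_lane(0xFFFFFFFF, 0)` yields 0xFFFFFFFF, which is the incorrect result described above; with the bias, `absdiff_signed(-1, 0)` yields 1.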

Diff Detail

Repository: rL LLVM

Event Timeline

inouehrs created this revision. Apr 11 2018, 8:33 AM

fix for halfword and byte cases

I imagine the constant materialization and splatting direct moves would not cost anything additional to what I suggested in an amortized sense (i.e. if LICM takes them out of the loop). However, I think it's still useful to produce code with lower path length.

Also, I can't really think of a better way to do it for the halfword version. I imagine there isn't a better way.
Finally for consistency, I would just use vxor instead of the various add opcodes. I think that is semantically equivalent to the modulo adds (correct me if I'm wrong here).
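
The equivalence asked about here can be checked exhaustively for the halfword case. This is a small sketch with a hypothetical function name, `xorMatchesModuloAdd16`; it only models the arithmetic on 16-bit values and does not touch any LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if, for every 16-bit value v, XOR with the sign bit (0x8000)
// gives the same result as adding 0x8000 modulo 2^16.
inline bool xorMatchesModuloAdd16() {
  for (uint32_t v = 0; v <= 0xFFFFu; ++v) {
    uint16_t viaXor = static_cast<uint16_t>(v ^ 0x8000u);
    uint16_t viaAdd = static_cast<uint16_t>(v + 0x8000u);  // carry out is dropped
    if (viaXor != viaAdd)
      return false;
  }
  return true;
}
```

Adding the top bit can only change the top bit, because the carry out of the most significant position is discarded, so the two operations agree for every input.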

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
4805 ↗	(On Diff #142023)	Shouldn't this check `ZERO_EXTEND_VECTOR_INREG` as well? Or is that a node we can't have this late?
4810 ↗	(On Diff #142023)	It seems that for the `v4i32` type, we should be able to just use `xvnegsp` rather than loading the immediate, moving and adding.
4828 ↗	(On Diff #142023)	We should just be able to do something like:
    xxspltib 35, 128 # Mask
    vxor 0, 3, 2     # Flip sign
    vabsduh ...      # The actual absdiff

inouehrs added inline comments. Apr 18 2018, 12:39 AM

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
4805 ↗	(On Diff #142023)	In my understanding `ZERO_EXTEND_VECTOR_INREG` is created in the legalize phase, while this code is for the initial selection phase. So I think this code will not find `ZERO_EXTEND_VECTOR_INREG` node here.
4810 ↗	(On Diff #142023)	Do you know whether it is safe to use a floating point instruction for integer data if the bit pattern is that of a NaN or Inf?
4828 ↗	(On Diff #142023)	Good catch. I will update to use `xxspltib`. VSX splat immediate supports 8-bit immediate while older VMX splat immediate supports only 5 bits.
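
To illustrate why the immediate width matters for the byte mask, here is a rough sketch. The helper names are hypothetical, and the model assumes that the 5-bit VMX splat immediate is sign-extended to the element width while the 8-bit VSX immediate is used as is.

```cpp
#include <cassert>
#include <cstdint>

// Assumed model of a 5-bit signed splat immediate: values 0..31 are
// sign-extended, giving bytes in the range -16..15.
inline uint8_t splatByteFromSimm5(unsigned imm5) {
  int v = static_cast<int>(imm5 & 0x1Fu);
  if (v >= 16)
    v -= 32;  // sign-extend from 5 bits
  return static_cast<uint8_t>(v);
}

// Assumed model of an 8-bit splat immediate: the byte is used unchanged.
inline uint8_t splatByteFromImm8(unsigned imm8) {
  return static_cast<uint8_t>(imm8 & 0xFFu);
}

// True if some 5-bit immediate produces the requested byte.
inline bool reachableWithSimm5(uint8_t target) {
  for (unsigned i = 0; i < 32; ++i)
    if (splatByteFromSimm5(i) == target)
      return true;
  return false;
}
```

Under these assumptions the sign-bit mask 0x80 is not reachable with a 5-bit immediate, while an 8-bit immediate of 128 produces it directly.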

nemanjai added inline comments. Apr 18 2018, 3:40 AM

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
4810 ↗	(On Diff #142023)	I think it's OK according to the ISA since it doesn't modify any special registers or do anything special for NaN/Inf. The description just says that it copies the contents with the high bit of each word element complemented, so I think this is just a bitwise operation rather than a vector fp operation.
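
The behaviour described in this comment can be modelled as a plain bitwise operation. The sketch below is an assumption-based model, not the instruction definition: `flipHighBitPerWord` is a hypothetical name, and the function simply complements the high bit of each 32-bit word.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

// Model of "copy the contents with the high bit of each word complemented".
inline V4 flipHighBitPerWord(const V4 &v) {
  V4 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = v[i] ^ 0x80000000u;
  return r;
}
```

In this model the bit patterns of +Inf (0x7F800000) and a quiet NaN (0x7FC00000) are treated like any other word: only the top bit changes, and applying the function twice restores the input.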

addressed comments from @nemanjai

inouehrs added inline comments. Apr 18 2018, 10:53 AM

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
4810 ↗	(On Diff #142023)	As far as I tested, it works at least on POWER9.

Sorry, I find the code kind of difficult to follow now. This is exacerbated by the fact that we end up creating nodes with AddOpcode that produce unary operations, which is very counterintuitive. I think it would be much easier to follow if you made it flow more naturally. Perhaps something along the lines of:

MachineSDNode *flipSign(...)
  if (Type == v4i32)
    <produce and return XVNEGSP>
  if (Type == v8i16)
    <produce the { 0x8000, 0x8000, ... } vector> // The implicit CSE in the DAG will ensure we don't get multiple nodes
  else if (Type == v16i8)
    <produce the { 0x80, 0x80, ... } vector>     // The implicit CSE in the DAG will ensure we don't get multiple nodes
  if (InputOp == Zero)
    <return vector from above>

  <produce and return the add/xor>
...
if (SkipAdjust)
  <just produce VABSDU[BHW], replace, return>
if (Opcode == SUB)
  Op1 = flipSign(Operand1)
  Op2 = flipSign(Operand2)
else
  Op1 = flipSign(Operand1)
  Op2 = flipSign(Zero)
<produce VABSDU[BHW] with Op1/Op2, replace, return>

I think that if it's structured in a similar way, it is much easier to follow the code and see exactly what is going on.
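
The control flow suggested above can be sketched on scalar lanes. This is only an illustration under stated assumptions: `signBit`, `flipSign`, `absDiffUnsigned`, `selectAbsSub` and `selectAbs` are hypothetical names, each unsigned type stands in for one lane of the corresponding vector type, and no SelectionDAG API is involved.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Mask with only the most significant bit of U set (0x80, 0x8000, 0x80000000).
template <typename U> U signBit() {
  static_assert(std::is_unsigned<U>::value, "lanes are modelled as unsigned");
  return static_cast<U>(U(1) << (sizeof(U) * 8 - 1));
}

// Counterpart of flipSign(...) in the outline: flip the MSB of one lane.
template <typename U> U flipSign(U x) {
  return static_cast<U>(x ^ signBit<U>());
}

// Model of one lane of VABSDU[BHW]: unsigned absolute difference.
template <typename U> U absDiffUnsigned(U a, U b) {
  return a > b ? static_cast<U>(a - b) : static_cast<U>(b - a);
}

// abs(sub(a, b)): skip the adjustment when both inputs are known to be
// zero-extended, otherwise flip the sign bit of both operands.
template <typename U> U selectAbsSub(U a, U b, bool inputsZeroExtended) {
  if (inputsZeroExtended)
    return absDiffUnsigned(a, b);
  return absDiffUnsigned(flipSign(a), flipSign(b));
}

// abs(a): flip the sign bit of the input and of zero.
template <typename U> U selectAbs(U a) {
  return absDiffUnsigned(flipSign(a), flipSign(U(0)));
}
```

For example, `selectAbs<uint16_t>(0xFFFD)` (the bit pattern of -3) yields 3, and `selectAbsSub<uint32_t>(0xFFFFFFFF, 0, false)` yields 1.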

refactoring based on the suggestion from @nemanjai

I think this flows much more cleanly now. And if you reorder it to not create the bool temporary and return early, I think it'll be even better. Feel free to do that on the commit, I don't think this needs another review cycle.

BTW. Using V_SET0 or XXLXORz instead of XXSPLTIB to produce a zero vector might be a better choice since it has a 1-cycle lower maximum latency (but in practice, this shouldn't be noticeable).

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
330 ↗	(On Diff #143252)	This is a member function and you're passing `CurDAG` which is a data member of the same class, right? If so, please remove the argument. Also, the `SDLoc` is easy enough to get from `N`.
3977 ↗	(On Diff #143252)	Since you're setting the output parameter rather than returning it... s/returns/sets it to
3992 ↗	(On Diff #143252)	I suppose we could do this by:
    vspltish 5, 1
    vspltish 6, 15
    vslh 5, 6, 5
As the two splats can be done in parallel. But this might very well be worse since it uses an extra vector register and due to the dispatch rules for vector operations. Up to you of course.
4842 ↗	(On Diff #143252)	Can you just move this condition below where you set the opcodes and then we don't need `SkipAdjust` (since we now use it in only one place)?
4856 ↗	(On Diff #143252)	Can you please return here so the remaining code does not need to be nested into an `else`?

This revision is now accepted and ready to land. Apr 20 2018, 6:01 AM

Closed by commit rL330497: [PowerPC] fix incorrect vectorization of abs() on POWER9 (authored by inouehrs). Apr 21 2018, 2:35 AM

This revision was automatically updated to reflect the committed changes.

Diff 143437

llvm/trunk/lib/Target/PowerPC/PPCISelDAGToDAG.cpp

[... 321 lines not shown ...]
 private:
   SDValue combineToCMPB(SDNode *N);
   void foldBoolExts(SDValue &Res, SDNode *&N);

   bool AllUsersSelectZero(SDNode *N);
   void SwapAllSelectUsers(SDNode *N);

   bool isOffsetMultipleOf(SDNode *N, unsigned Val) const;
   void transferMemOperands(SDNode *N, SDNode *Result);
+  MachineSDNode *flipSignBit(const SDValue &N, SDNode **SignBit = nullptr);
 };

 } // end anonymous namespace

 /// InsertVRSaveCode - Once the entire function has been instruction selected,
 /// all virtual registers are created and all machine instructions are built,
 /// check to see if we need to save/restore VRSAVE. If so, do it.
 void PPCDAGToDAGISel::InsertVRSaveCode(MachineFunction &Fn) {
[... 3,627 lines not shown ...]

 void PPCDAGToDAGISel::transferMemOperands(SDNode *N, SDNode *Result) {
   // Transfer memoperands.
   MachineSDNode::mmo_iterator MemOp = MF->allocateMemRefsArray(1);
   MemOp[0] = cast<MemSDNode>(N)->getMemOperand();
   cast<MachineSDNode>(Result)->setMemRefs(MemOp, MemOp + 1);
 }

+/// This method returns a node after flipping the MSB of each element
+/// of vector integer type. Additionally, if SignBitVec is non-null,
+/// this method sets a node with one at MSB of all elements
+/// and zero at other bits in SignBitVec.
+MachineSDNode *
+PPCDAGToDAGISel::flipSignBit(const SDValue &N, SDNode **SignBitVec) {
+  SDLoc dl(N);
+  EVT VecVT = N.getValueType();
+  if (VecVT == MVT::v4i32) {
+    if (SignBitVec) {
+      SDNode *ZV = CurDAG->getMachineNode(PPC::V_SET0, dl, MVT::v4i32);
+      *SignBitVec = CurDAG->getMachineNode(PPC::XVNEGSP, dl, VecVT,
+                                           SDValue(ZV, 0));
+    }
+    return CurDAG->getMachineNode(PPC::XVNEGSP, dl, VecVT, N);
+  }
+  else if (VecVT == MVT::v8i16) {
+    SDNode *Hi = CurDAG->getMachineNode(PPC::LIS, dl, MVT::i32,
+                                        getI32Imm(0x8000, dl));
+    SDNode *ScaImm = CurDAG->getMachineNode(PPC::ORI, dl, MVT::i32,
+                                            SDValue(Hi, 0),
+                                            getI32Imm(0x8000, dl));
+    SDNode *VecImm = CurDAG->getMachineNode(PPC::MTVSRWS, dl, VecVT,
+                                            SDValue(ScaImm, 0));
+    /*
+    Alternatively, we can do this as follow to use VRF instead of GPR.
+      vspltish 5, 1
+      vspltish 6, 15
+      vslh 5, 6, 5
+    */
+    if (SignBitVec) *SignBitVec = VecImm;
+    return CurDAG->getMachineNode(PPC::VADDUHM, dl, VecVT, N,
+                                  SDValue(VecImm, 0));
+  }
+  else if (VecVT == MVT::v16i8) {
+    SDNode *VecImm = CurDAG->getMachineNode(PPC::XXSPLTIB, dl, MVT::i32,
+                                            getI32Imm(0x80, dl));
+    if (SignBitVec) *SignBitVec = VecImm;
+    return CurDAG->getMachineNode(PPC::VADDUBM, dl, VecVT, N,
+                                  SDValue(VecImm, 0));
+  }
+  else
+    llvm_unreachable("Unsupported vector data type for flipSignBit");
+}

 // Select - Convert the specified operand from a target-independent to a
 // target-specific node if it hasn't already been changed.
 void PPCDAGToDAGISel::Select(SDNode *N) {
   SDLoc dl(N);
   if (N->isMachineOpcode()) {
     N->setNodeId(-1);
     return; // Already selected.
   }
[... 797 lines not shown ...]
       if ((Elt & 1) == 0) {
         SDNode *Tmp1 = CurDAG->getMachineNode(Opc1, dl, VT, EltVal);
         EltVal = getI32Imm(-16, dl);
         SDNode *Tmp2 = CurDAG->getMachineNode(Opc1, dl, VT, EltVal);
         ReplaceNode(N, CurDAG->getMachineNode(Opc2, dl, VT, SDValue(Tmp1, 0),
                                               SDValue(Tmp2, 0)));
         return;
       }
     }
+  case ISD::ABS: {
+    assert(PPCSubTarget->hasP9Vector() && "ABS is supported with P9 Vector");
+
+    // For vector absolute difference, we use VABSDUW instruction of POWER9.
+    // Since VABSDU instructions are for unsigned integers, we need adjustment
+    // for signed integers.
+    // For abs(sub(a, b)), we generate VABSDUW(a+0x80000000, b+0x80000000).
+    // Otherwise, abs(sub(-1, 0)) returns 0xFFFFFFFF(=-1) instead of 1.
+    // For abs(a), we generate VABSDUW(a+0x80000000, 0x80000000).
+    EVT VecVT = N->getOperand(0).getValueType();
+    SDNode *AbsOp = nullptr;
+    unsigned AbsOpcode;
+
+    if (VecVT == MVT::v4i32)
+      AbsOpcode = PPC::VABSDUW;
+    else if (VecVT == MVT::v8i16)
+      AbsOpcode = PPC::VABSDUH;
+    else if (VecVT == MVT::v16i8)
+      AbsOpcode = PPC::VABSDUB;
+    else
+      llvm_unreachable("Unsupported vector data type for ISD::ABS");
+
+    // Even for signed integers, we can skip adjustment if all values are
+    // known to be positive (as signed integer) due to zero-extended inputs.
+    if (N->getOperand(0).getOpcode() == ISD::SUB &&
+        N->getOperand(0)->getOperand(0).getOpcode() == ISD::ZERO_EXTEND &&
+        N->getOperand(0)->getOperand(1).getOpcode() == ISD::ZERO_EXTEND) {
+      AbsOp = CurDAG->getMachineNode(AbsOpcode, dl, VecVT,
+                                     SDValue(N->getOperand(0)->getOperand(0)),
+                                     SDValue(N->getOperand(0)->getOperand(1)));
+      ReplaceNode(N, AbsOp);
+      return;
+    }
+    if (N->getOperand(0).getOpcode() == ISD::SUB) {
+      SDValue SubVal = N->getOperand(0);
+      SDNode *Op0 = flipSignBit(SubVal->getOperand(0));
+      SDNode *Op1 = flipSignBit(SubVal->getOperand(1));
+      AbsOp = CurDAG->getMachineNode(AbsOpcode, dl, VecVT,
+                                     SDValue(Op0, 0), SDValue(Op1, 0));
+    }
+    else {
+      SDNode *Op1 = nullptr;
+      SDNode *Op0 = flipSignBit(N->getOperand(0), &Op1);
+      AbsOp = CurDAG->getMachineNode(AbsOpcode, dl, VecVT, SDValue(Op0, 0),
+                                     SDValue(Op1, 0));
+    }
+    ReplaceNode(N, AbsOp);
+    return;
+  }
   }

   SelectCode(N);
 }

 // If the target supports the cmpb instruction, do the idiom recognition here.
 // We don't do this as a DAG combine because we don't want to do it as nodes
 // are being combined (because we might miss part of the eventual idiom). We
[... 1,345 lines not shown ...]

llvm/trunk/lib/Target/PowerPC/PPCInstrAltivec.td

[... 1,498 lines not shown ...]
 def VABSDUB : VXForm_1<1027, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB),
                        [(set v16i8:$vD, (int_ppc_altivec_vabsdub v16i8:$vA, v16i8:$vB))]>;
 def VABSDUH : VXForm_1<1091, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB),
                        "vabsduh $vD, $vA, $vB", IIC_VecGeneral,
                        [(set v8i16:$vD, (int_ppc_altivec_vabsduh v8i16:$vA, v8i16:$vB))]>;
 def VABSDUW : VXForm_1<1155, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB),
                        "vabsduw $vD, $vA, $vB", IIC_VecGeneral,
                        [(set v4i32:$vD, (int_ppc_altivec_vabsduw v4i32:$vA, v4i32:$vB))]>;

-def : Pat<(v16i8:$vD (abs v16i8:$vA)),
-          (v16i8 (VABSDUB $vA, (v16i8 (V_SET0B))))>;
-def : Pat<(v8i16:$vD (abs v8i16:$vA)),
-          (v8i16 (VABSDUH $vA, (v8i16 (V_SET0H))))>;
-def : Pat<(v4i32:$vD (abs v4i32:$vA)),
-          (v4i32 (VABSDUW $vA, (v4i32 (V_SET0))))>;
-
-def : Pat<(v16i8:$vD (abs (sub v16i8:$vA, v16i8:$vB))),
-          (v16i8 (VABSDUB $vA, $vB))>;
-def : Pat<(v8i16:$vD (abs (sub v8i16:$vA, v8i16:$vB))),
-          (v8i16 (VABSDUH $vA, $vB))>;
-def : Pat<(v4i32:$vD (abs (sub v4i32:$vA, v4i32:$vB))),
-          (v4i32 (VABSDUW $vA, $vB))>;
-
 } // end HasP9Altivec

llvm/trunk/test/CodeGen/PowerPC/ppc64-P9-vabsd.ll

 ; RUN: llc < %s -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr9 -verify-machineinstrs | FileCheck %s
 ; RUN: llc < %s -mtriple=powerpc64-unknown-linux-gnu -mcpu=pwr9 -verify-machineinstrs | FileCheck %s
 ; RUN: llc < %s -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr8 -verify-machineinstrs | FileCheck %s -check-prefix=CHECK-PWR8 -implicit-check-not vabsdu

 ; Function Attrs: nounwind readnone
 define <4 x i32> @simple_absv_32(<4 x i32> %a) local_unnamed_addr {
 entry:
   %sub.i = sub <4 x i32> zeroinitializer, %a
   %0 = tail call <4 x i32> @llvm.ppc.altivec.vmaxsw(<4 x i32> %a, <4 x i32> %sub.i)
   ret <4 x i32> %0
 ; CHECK-LABEL: simple_absv_32
-; CHECK: vxor [[ZERO:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}
-; CHECK-NEXT: vabsduw 2, 2, [[ZERO]]
+; CHECK-DAG: vxor {{[0-9]+}}, [[REG:[0-9]+]], [[REG]]
+; CHECK-DAG: xvnegsp 34, 34
+; CHECK-DAG: xvnegsp 35, {{[0-9]+}}
+; CHECK-NEXT: vabsduw 2, 2, {{[0-9]+}}
 ; CHECK-NEXT: blr
 ; CHECK-PWR8-LABEL: simple_absv_32
 ; CHECK-PWR8: xxlxor
 ; CHECK-PWR8: vsubuwm
 ; CHECK-PWR8: vmaxsw
 ; CHECK-PWR8: blr
 }

 ; Function Attrs: nounwind readnone
 define <4 x i32> @simple_absv_32_swap(<4 x i32> %a) local_unnamed_addr {
 entry:
   %sub.i = sub <4 x i32> zeroinitializer, %a
   %0 = tail call <4 x i32> @llvm.ppc.altivec.vmaxsw(<4 x i32> %sub.i, <4 x i32> %a)
   ret <4 x i32> %0
 ; CHECK-LABEL: simple_absv_32_swap
-; CHECK: vxor [[ZERO:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}
-; CHECK-NEXT: vabsduw 2, 2, [[ZERO]]
+; CHECK-DAG: vxor {{[0-9]+}}, [[REG:[0-9]+]], [[REG]]
+; CHECK-DAG: xvnegsp 34, 34
+; CHECK-DAG: xvnegsp 35, {{[0-9]+}}
+; CHECK-NEXT: vabsduw 2, 2, {{[0-9]+}}
 ; CHECK-NEXT: blr
 ; CHECK-PWR8-LABEL: simple_absv_32_swap
 ; CHECK-PWR8: xxlxor
 ; CHECK-PWR8: vsubuwm
 ; CHECK-PWR8: vmaxsw
 ; CHECK-PWR8: blr
 }

 define <8 x i16> @simple_absv_16(<8 x i16> %a) local_unnamed_addr {
 entry:
   %sub.i = sub <8 x i16> zeroinitializer, %a
   %0 = tail call <8 x i16> @llvm.ppc.altivec.vmaxsh(<8 x i16> %a, <8 x i16> %sub.i)
   ret <8 x i16> %0
 ; CHECK-LABEL: simple_absv_16
-; CHECK: vxor [[ZERO:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}
-; CHECK-NEXT: vabsduh 2, 2, [[ZERO]]
+; CHECK: mtvsrws {{[0-9]+}}, {{[0-9]+}}
+; CHECK-NEXT: vadduhm 2, 2, [[IMM:[0-9]+]]
+; CHECK-NEXT: vabsduh 2, 2, [[IMM]]
 ; CHECK-NEXT: blr
 ; CHECK-PWR8-LABEL: simple_absv_16
 ; CHECK-PWR8: xxlxor
 ; CHECK-PWR8: vsubuhm
 ; CHECK-PWR8: vmaxsh
 ; CHECK-PWR8: blr
 }

 ; Function Attrs: nounwind readnone
 define <16 x i8> @simple_absv_8(<16 x i8> %a) local_unnamed_addr {
 entry:
   %sub.i = sub <16 x i8> zeroinitializer, %a
   %0 = tail call <16 x i8> @llvm.ppc.altivec.vmaxsb(<16 x i8> %a, <16 x i8> %sub.i)
   ret <16 x i8> %0
 ; CHECK-LABEL: simple_absv_8
-; CHECK: vxor [[ZERO:[0-9]+]], {{[0-9]+}}, {{[0-9]+}}
-; CHECK-NEXT: vabsdub 2, 2, [[ZERO]]
+; CHECK: xxspltib {{[0-9]+}}, 128
+; CHECK-NEXT: vaddubm 2, 2, [[IMM:[0-9]+]]
+; CHECK-NEXT: vabsdub 2, 2, [[IMM]]
 ; CHECK-NEXT: blr
 ; CHECK-PWR8-LABEL: simple_absv_8
 ; CHECK-PWR8: xxlxor
 ; CHECK-PWR8: vsububm
 ; CHECK-PWR8: vmaxsb
 ; CHECK-PWR8: blr
 }

 ; The select pattern can only be detected for v4i32.
 ; Function Attrs: norecurse nounwind readnone
 define <4 x i32> @sub_absv_32(<4 x i32> %a, <4 x i32> %b) local_unnamed_addr {
 entry:
   %0 = sub nsw <4 x i32> %a, %b
   %1 = icmp sgt <4 x i32> %0, <i32 -1, i32 -1, i32 -1, i32 -1>
   %2 = sub <4 x i32> zeroinitializer, %0
   %3 = select <4 x i1> %1, <4 x i32> %0, <4 x i32> %2
   ret <4 x i32> %3
 ; CHECK-LABEL: sub_absv_32
-; CHECK: vabsduw 2, 2, 3
+; CHECK-DAG: xvnegsp 34, 34
+; CHECK-DAG: xvnegsp 35, 35
+; CHECK-NEXT: vabsduw 2, 2, 3
 ; CHECK-NEXT: blr
 ; CHECK-PWR8-LABEL: sub_absv_32
 ; CHECK-PWR8: vsubuwm
 ; CHECK-PWR8: xxlxor
 ; CHECK-PWR8: blr
 }

 ; FIXME: This does not produce the ISD::ABS that we are looking for.
[... 268 lines not shown ...]

This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] fix incorrect vectorization of abs() on POWER9
ClosedPublic
