This is an archive of the discontinued LLVM Phabricator instance.

[X86] Recognize a splat of negate in isFNEG
Closed, Public

Authored by eraman on Jun 21 2018, 5:54 PM.

Details

Reviewers

craig.topper
RKSimon
spatel

Commits

rG10fd92dd9413: [X86] Recognize a splat of negate in isFNEG
rL339043: [X86] Recognize a splat of negate in isFNEG

Summary

Expand isFNEG so that we generate the appropriate F(N)M(ADD|SUB) instructions in more cases. For example, the following sequence

a = _mm256_broadcast_ss(f)
d = _mm256_fnmadd_ps(a, b, c)

generates an fsub and fma without this patch and an fnma with this change.
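
To make the arithmetic concrete, here is a minimal sketch in portable C++. It uses scalar loops rather than the AVX intrinsics from the summary, and it is not LLVM code. The helper names splat, negate, fmadd and fnmadd are illustrative assumptions; fnmadd models the -(a * b) + c semantics with std::fma. The point is that an fma fed by a negated splat produces the same lanes as an fnma fed by the plain splat, so folding the negate into the opcode preserves the result.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec8 = std::array<float, 8>;

// Broadcast one scalar to all eight lanes (models a splat shuffle).
Vec8 splat(float x) {
  Vec8 v;
  v.fill(x);
  return v;
}

// Lane-wise negation.
Vec8 negate(const Vec8 &v) {
  Vec8 r;
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = -v[i];
  return r;
}

// Lane-wise fused multiply-add: a * b + c with a single rounding.
Vec8 fmadd(const Vec8 &a, const Vec8 &b, const Vec8 &c) {
  Vec8 r;
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = std::fma(a[i], b[i], c[i]);
  return r;
}

// Lane-wise fused negated multiply-add: -(a * b) + c with a single rounding.
Vec8 fnmadd(const Vec8 &a, const Vec8 &b, const Vec8 &c) {
  Vec8 r;
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = std::fma(-a[i], b[i], c[i]);
  return r;
}
```

In this model, fmadd(negate(splat(f)), b, c) and fmadd(splat(-f), b, c) both equal fnmadd(splat(f), b, c), which covers negating after the splat as well as before it.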

Diff Detail

Repository: rL LLVM

Event Timeline

eraman created this revision. Jun 21 2018, 5:54 PM

Harbormaster completed remote builds in B19609: Diff 152418. Jun 21 2018, 5:54 PM

Herald added a subscriber: llvm-commits. Jun 21 2018, 5:54 PM

craig.topper added reviewers: RKSimon, spatel. Jun 21 2018, 6:25 PM

RKSimon mentioned this in rL335342: [X86] Regenerate tests to include fma comments. Jun 22 2018, 5:46 AM

I've updated avx2-fma-fneg-combine.ll to get rid of those comment changes. Please can you rebase?

Rebase after r335342

Harbormaster completed remote builds in B19623: Diff 152475. Jun 22 2018, 7:26 AM

RKSimon added inline comments. Jun 22 2018, 8:21 AM

test/CodeGen/X86/avx2-fma-fneg-combine.ll
131 ↗	(On Diff #152475)	Sorry I should have asked you to commit this test to trunk as well so this patch shows the codegen diff

Does this patch deal with the case if we negate the scalar before doing the splat?

define <8 x float> @test7(float %a, <8 x float> %b, <8 x float> %c) {

%t0 = fsub float -0.0, %a
%t1 = insertelement <8 x float> undef, float %t0, i32 0
%t2 = shufflevector <8 x float> %t1, <8 x float> undef, <8 x i32> zeroinitializer
%t3 = tail call <8 x float> @llvm.fma.v8f32(<8 x float> %t2, <8 x float> %b, <8 x float> %c)
ret <8 x float> %t3

}

That transform was suggested as a canonicalization in:
https://bugs.llvm.org/show_bug.cgi?id=37463

No, this doesn't deal with this. Should I start from the negate and push it
down if it is transitively used by fma?

Rebase after adding the test case at r335367

Harbormaster completed remote builds in B19631: Diff 152511. Jun 22 2018, 10:38 AM

In D48467#1140891, @eraman wrote:

No, this doesn't deal with this. Should I start from the negate and push it
down if it is transitively used by fma?

That seems backwards. Why don't we start from the fma and pattern match an fneg of 1 of its operands (looking through a splat as needed)?

Looks like there's already logic in place for this in X86's combineFMA() with a possibly related FIXME comment?

RKSimon added inline comments. Jun 22 2018, 11:45 AM

lib/Target/X86/X86ISelLowering.cpp
31185 ↗	(On Diff #152511)	Please move the isFNEG as a NFC commit
31210 ↗	(On Diff #152511)	Can this be written as a llvm::any_of pattern?
31214 ↗	(On Diff #152511)	Early out if !SVOp || !SVOp->isSplat()
31215 ↗	(On Diff #152511)	Don't use auto for non-obvious (casts etc.) cases.
31218 ↗	(On Diff #152511)	It's unlikely that Op1 isn't undef, but if you have cases of this you could handle both ops by testing SVOp->getSplatIndex()

I am going to take the approach suggested by Sanjay and expand isFNEG to
handle splat of a negated scalar. Then, the current combineFMA should take
care of the rest. I got the isFNEG to work with shuffle (and work on the
current test case) but haven't yet handled the insertelement case. I am out
traveling for the next ten days and will send a revised patch after I am
back.
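
The idea of looking through a splat to find the negate can be sketched on a toy expression tree. This is a standalone C++ model, not the SelectionDAG code: Node, leaf, neg, splat, stripNegate and print are all hypothetical names introduced for illustration, and the real patch additionally handles INSERT_VECTOR_ELT and requires the second shuffle operand to be undef.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Node;
using NodePtr = std::shared_ptr<Node>;

// A tiny expression tree: a named value, a negation, or a splat of a value.
struct Node {
  enum Kind { Leaf, Neg, Splat };
  Kind kind;
  std::string name; // used only by Leaf
  NodePtr op;       // operand of Neg or Splat
};

NodePtr leaf(std::string n) {
  return std::make_shared<Node>(Node{Node::Leaf, std::move(n), nullptr});
}
NodePtr neg(NodePtr x) {
  return std::make_shared<Node>(Node{Node::Neg, "", std::move(x)});
}
NodePtr splat(NodePtr x) {
  return std::make_shared<Node>(Node{Node::Splat, "", std::move(x)});
}

// If n computes -X for some X, return X; otherwise return nullptr.
// A splat of a negated value is treated as the negation of a splat,
// so the result is a newly built splat of the un-negated operand.
NodePtr stripNegate(const NodePtr &n) {
  if (n->kind == Node::Neg)
    return n->op;
  if (n->kind == Node::Splat)
    if (NodePtr inner = stripNegate(n->op))
      return splat(inner);
  return nullptr;
}

// Render a tree as text, e.g. "splat(neg(a))".
std::string print(const NodePtr &n) {
  switch (n->kind) {
  case Node::Leaf:
    return n->name;
  case Node::Neg:
    return "neg(" + print(n->op) + ")";
  case Node::Splat:
    return "splat(" + print(n->op) + ")";
  }
  return "";
}
```

Because stripNegate may build a new splat node, the matcher is no longer a pure query. The same design consequence appears in the patch, where isFNEG gains a SelectionDAG parameter.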

Implement this by expanding the patterns generated by isFNEG

Harbormaster completed remote builds in B20012: Diff 154035. Jul 3 2018, 6:16 PM

eraman retitled this revision from [X86] Recognize an fnma in the presence of an intervening shuffle. to [X86] Recognize a splat of negate in isFNEG. Jul 3 2018, 6:17 PM

eraman edited the summary of this revision.

RKSimon added inline comments. Jul 4 2018, 5:37 AM

test/CodeGen/X86/avx2-fma-fneg-combine.ll
139 ↗	(On Diff #154035)	Please can you commit this test to trunk so the patch shows the codegen diff?

Rebase after r336404

Harbormaster completed remote builds in B20084: Diff 154342. Jul 5 2018, 6:00 PM

eraman marked an inline comment as done. Jul 9 2018, 4:16 PM

eraman added inline comments.

test/CodeGen/X86/avx2-fma-fneg-combine.ll
139 ↗	(On Diff #154035)	Forgot to respond here, but I have committed the test and rebased the patch after the commit.

RKSimon added inline comments. Jul 10 2018, 3:41 AM

lib/Target/X86/X86ISelLowering.cpp
36774 ↗	(On Diff #154342)	if (auto *SVOp = dyn_cast<ShuffleVectorSDNode>(Op.getNode()))
36849 ↗	(On Diff #154342)	if (Opc == ISD::FSUB) std::swap(Op0, Op1); return Negate(Op0, Op1);
36856 ↗	(On Diff #154342)	Now that we're creating new nodes inside isFNEG() I wonder if we should try to avoid repetition of isFNEG() in combineXor etc. - Maybe split combineFneg into combineFneg and combineFnegPatterns which take the isFNEG result?

eraman marked 3 inline comments as done. Jul 11 2018, 12:17 PM

eraman added inline comments.

lib/Target/X86/X86ISelLowering.cpp
36856 ↗	(On Diff #154342)	Instead of splitting combineFneg, I have removed the assert below and returned early if isFNEG returns an empty node.

Address review comments.

Harbormaster completed remote builds in B20274: Diff 155045. Jul 11 2018, 12:18 PM

RKSimon added inline comments. Jul 16 2018, 9:25 AM

lib/Target/X86/X86ISelLowering.cpp
36829 ↗	(On Diff #155045)	auto *BV = dyn_cast<BuildVectorSDNode>(Op1)
36834 ↗	(On Diff #155045)	Constant *C

eraman marked 2 inline comments as done. Jul 16 2018, 4:02 PM

eraman added inline comments.

lib/Target/X86/X86ISelLowering.cpp
36829 ↗	(On Diff #155045)	I have also changed the following line to if (auto *CN = BV->getConstantFPSplatNode())

Address Simon's comments.

Harbormaster completed remote builds in B20426: Diff 155779. Jul 16 2018, 4:03 PM

RKSimon added inline comments. Jul 17 2018, 2:57 AM

lib/Target/X86/X86ISelLowering.cpp
36852 ↗	(On Diff #155779)	Are there any circumstances that this isn't a ConstantFP? getTargetConstantFromNode peeks through bitcasts so don't you need to use dyn_cast_or_null?
36863 ↗	(On Diff #155779)	dyn_cast_or_null?

Change cast_or_null to dyn_cast_or_null

lib/Target/X86/X86ISelLowering.cpp
36852 ↗	(On Diff #155779)	I have changed it to dyn_cast_or_null, but thinking about it I don't think that is needed. First, the current code reads:

  if (Op1.getOpcode() == X86ISD::VBROADCAST) {
    if (auto *C = getTargetConstantFromNode(Op1.getOperand(0)))
      if (isSignMask(cast<ConstantFP>(C)))

The x86 vbroadcast instruction broadcasts floating point values, so I think the cast<ConstantFP> is right.

RKSimon added inline comments. Jul 18 2018, 7:41 AM

lib/Target/X86/X86ISelLowering.cpp
36852 ↗	(On Diff #155779)	If you look at getTargetConstantFromNode, the first thing it does is call peekThroughBitcasts, so the constant can have any type. In fact we should probably be checking that the vector's element size is correct as well. This is the kind of thing a fuzz test finds in 3 months' time...
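
The element-size concern can be illustrated without any LLVM types. The sketch below is standalone C++ with hypothetical helpers (bitsOf, splitInto32, allSignMask32); it does not reproduce how getTargetConstantFromNode works. It assumes IEEE-754 doubles and little-endian lane order when a 64-bit element is reinterpreted as two 32-bit elements. A vector of double -0.0 is a sign mask per 64-bit element, but viewed as 32-bit elements only every other element has just the sign bit set, so a check that ignores the element width can reach the wrong answer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Raw bit pattern of a double.
std::uint64_t bitsOf(double d) {
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// A sign mask has only the top bit of the element set.
bool isSignMask64(std::uint64_t bits) {
  return bits == (std::uint64_t{1} << 63);
}
bool isSignMask32(std::uint32_t bits) {
  return bits == (std::uint32_t{1} << 31);
}

// Reinterpret 64-bit elements as twice as many 32-bit elements,
// low half first (little-endian lane order; an assumption of this sketch).
std::vector<std::uint32_t> splitInto32(const std::vector<std::uint64_t> &elts) {
  std::vector<std::uint32_t> out;
  for (std::uint64_t e : elts) {
    out.push_back(static_cast<std::uint32_t>(e));
    out.push_back(static_cast<std::uint32_t>(e >> 32));
  }
  return out;
}

// True only if every 32-bit element is exactly the 32-bit sign mask.
bool allSignMask32(const std::vector<std::uint32_t> &elts) {
  for (std::uint32_t e : elts)
    if (!isSignMask32(e))
      return false;
  return true;
}
```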

eraman added inline comments. Jul 26 2018, 11:06 AM

lib/Target/X86/X86ISelLowering.cpp
36852 ↗	(On Diff #155779)	I don't get the comment about the vector's element size. Which vector's element size should be checked here?

RKSimon added inline comments. Jul 30 2018, 6:30 AM

lib/Target/X86/X86ISelLowering.cpp
36852 ↗	(On Diff #155779)	Looking at this again, you could simplify a lot of this by using getTargetConstantBitsFromNode instead to extract all the bits for you, avoiding all the BROADCAST/BUILD_VECTOR special cases as getTargetConstantBitsFromNode should do all of that already.

RKSimon mentioned this in rL338358: [X86][SSE] isFNEG - Use getTargetConstantBitsFromNode to handle all constant… Jul 31 2018, 3:13 AM

@eraman Please can you take a look at rL338358 - this shows how to use getTargetConstantBitsFromNode to avoid a lot of extra checks you need from using getTargetConstantFromNode directly

Updates to work with r338358.

Harbormaster completed remote builds in B21009: Diff 158802. Aug 2 2018, 11:08 AM

In D48467#1181907, @RKSimon wrote:

@eraman Please can you take a look at rL338358 - this shows how to use getTargetConstantBitsFromNode to avoid a lot of extra checks you need from using getTargetConstantFromNode directly

I have updated the code to make use of this. I had to extend getTargetConstantBitsFromNode to support build vector of ConstantFPSDNodes. PTAL.

RKSimon added inline comments. Aug 3 2018, 3:36 AM

lib/Target/X86/X86ISelLowering.cpp
36937 ↗	(On Diff #158802)	unsigned - only use auto when the type is very obvious
36965 ↗	(On Diff #158802)	Do we have test coverage for ignoring the undefs?
36970 ↗	(On Diff #158802)	for (unsigned I = 0, E = EltBits.size(); I < E; I++)
36980 ↗	(On Diff #158802)	Why did you create the lambda? Why not just inline Negate?

eraman marked 3 inline comments as done. Aug 3 2018, 10:24 AM

eraman added inline comments.

lib/Target/X86/X86ISelLowering.cpp
36965 ↗	(On Diff #158802)	test7 of avx2-fma-fneg-combine.ll has an fsub where all but one element of the constant is undef.
36980 ↗	(On Diff #158802)	Carryover from an initial version where I thought the lambda made sense.
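
The "ignore the undefs" behaviour discussed above can be sketched in standalone C++. This is a simplified model rather than the patch itself: the two plain vectors stand in for the UndefElts and EltBits outputs of getTargetConstantBitsFromNode, the element width is fixed at 32 bits, and allDefinedAreSignMask is a hypothetical name. An element passes if it is undef or if its bits are exactly the sign mask.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if every element that is not undef equals the 32-bit sign mask.
// undef[i] == true means element i is undefined and its bits are ignored.
bool allDefinedAreSignMask(const std::vector<bool> &undef,
                           const std::vector<std::uint32_t> &bits) {
  if (undef.size() != bits.size())
    return false;
  for (std::size_t i = 0, e = bits.size(); i < e; ++i)
    if (!undef[i] && bits[i] != 0x80000000u)
      return false;
  return true;
}
```

The constant in test7, <-0.0, undef, undef, ...>, corresponds to one defined sign-mask element followed by seven undef elements, which this check accepts.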

Address review comments.

Harbormaster completed remote builds in B21046: Diff 159040. Aug 3 2018, 10:25 AM

LGTM

This revision is now accepted and ready to land. Aug 6 2018, 7:10 AM

Closed by commit rL339043: [X86] Recognize a splat of negate in isFNEG (authored by eraman). Aug 6 2018, 12:24 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp	95 lines
llvm/trunk/test/CodeGen/X86/avx2-fma-fneg-combine.ll	23 lines

Diff 159352

llvm/trunk/lib/Target/X86/X86ISelLowering.cpp

[5,627 lines not shown]	for (unsigned i = 0, e = Op.getNumOperands(); i != e; ++i) {
UndefSrcElts.setBit(i);		UndefSrcElts.setBit(i);
continue;		continue;
}		}
auto *Cst = cast<ConstantSDNode>(Src);		auto *Cst = cast<ConstantSDNode>(Src);
SrcEltBits[i] = Cst->getAPIntValue().zextOrTrunc(SrcEltSizeInBits);		SrcEltBits[i] = Cst->getAPIntValue().zextOrTrunc(SrcEltSizeInBits);
}		}
return CastBitData(UndefSrcElts, SrcEltBits);		return CastBitData(UndefSrcElts, SrcEltBits);
}		}
		if (ISD::isBuildVectorOfConstantFPSDNodes(Op.getNode())) {
		unsigned SrcEltSizeInBits = VT.getScalarSizeInBits();
		unsigned NumSrcElts = SizeInBits / SrcEltSizeInBits;

		APInt UndefSrcElts(NumSrcElts, 0);
		SmallVector<APInt, 64> SrcEltBits(NumSrcElts, APInt(SrcEltSizeInBits, 0));
		for (unsigned i = 0, e = Op.getNumOperands(); i != e; ++i) {
		const SDValue &Src = Op.getOperand(i);
		if (Src.isUndef()) {
		UndefSrcElts.setBit(i);
		continue;
		}
		auto *Cst = cast<ConstantFPSDNode>(Src);
		APInt RawBits = Cst->getValueAPF().bitcastToAPInt();
		SrcEltBits[i] = RawBits.zextOrTrunc(SrcEltSizeInBits);
		}
		return CastBitData(UndefSrcElts, SrcEltBits);
		}

// Extract constant bits from constant pool vector.		// Extract constant bits from constant pool vector.
if (auto *Cst = getTargetConstantFromNode(Op)) {		if (auto *Cst = getTargetConstantFromNode(Op)) {
Type *CstTy = Cst->getType();		Type *CstTy = Cst->getType();
if (!CstTy->isVectorTy() || (SizeInBits != CstTy->getPrimitiveSizeInBits()))		if (!CstTy->isVectorTy() || (SizeInBits != CstTy->getPrimitiveSizeInBits()))
return false;		return false;

unsigned SrcEltSizeInBits = CstTy->getScalarSizeInBits();		unsigned SrcEltSizeInBits = CstTy->getScalarSizeInBits();
[31,322 lines not shown]	static SDValue combineTruncate(SDNode *N, SelectionDAG &DAG,
if (SDValue V = combineVectorSignBitsTruncation(N, DL, DAG, Subtarget))		if (SDValue V = combineVectorSignBitsTruncation(N, DL, DAG, Subtarget))
return V;		return V;

return combineVectorTruncation(N, DAG, Subtarget);		return combineVectorTruncation(N, DAG, Subtarget);
}		}

/// Returns the negated value if the node \p N flips sign of FP value.		/// Returns the negated value if the node \p N flips sign of FP value.
///		///
/// FP-negation node may have different forms: FNEG(x) or FXOR (x, 0x80000000).		/// FP-negation node may have different forms: FNEG(x), FXOR (x, 0x80000000)
		/// or FSUB(0, x)
/// AVX512F does not have FXOR, so FNEG is lowered as		/// AVX512F does not have FXOR, so FNEG is lowered as
/// (bitcast (xor (bitcast x), (bitcast ConstantFP(0x80000000)))).		/// (bitcast (xor (bitcast x), (bitcast ConstantFP(0x80000000)))).
/// In this case we go though all bitcasts.		/// In this case we go though all bitcasts.
static SDValue isFNEG(SDNode *N) {		/// This also recognizes splat of a negated value and returns the splat of that
		/// value.
		static SDValue isFNEG(SelectionDAG &DAG, SDNode *N) {
if (N->getOpcode() == ISD::FNEG)		if (N->getOpcode() == ISD::FNEG)
return N->getOperand(0);		return N->getOperand(0);

SDValue Op = peekThroughBitcasts(SDValue(N, 0));		SDValue Op = peekThroughBitcasts(SDValue(N, 0));
if (Op.getOpcode() != X86ISD::FXOR && Op.getOpcode() != ISD::XOR)		auto VT = Op->getValueType(0);
		if (auto SVOp = dyn_cast<ShuffleVectorSDNode>(Op.getNode())) {
		// For a VECTOR_SHUFFLE(VEC1, VEC2), if the VEC2 is undef, then the negate
		// of this is VECTOR_SHUFFLE(-VEC1, UNDEF). The mask can be anything here.
		if (!SVOp->getOperand(1).isUndef())
		return SDValue();
		if (SDValue NegOp0 = isFNEG(DAG, SVOp->getOperand(0).getNode()))
		return DAG.getVectorShuffle(VT, SDLoc(SVOp), NegOp0, DAG.getUNDEF(VT),
		SVOp->getMask());
		return SDValue();
		}
		unsigned Opc = Op.getOpcode();
		if (Opc == ISD::INSERT_VECTOR_ELT) {
		// Negate of INSERT_VECTOR_ELT(UNDEF, V, INDEX) is INSERT_VECTOR_ELT(UNDEF,
		// -V, INDEX).
		SDValue InsVector = Op.getOperand(0);
		SDValue InsVal = Op.getOperand(1);
		if (!InsVector.isUndef())
		return SDValue();
		if (SDValue NegInsVal = isFNEG(DAG, InsVal.getNode()))
		return DAG.getNode(ISD::INSERT_VECTOR_ELT, SDLoc(Op), VT, InsVector,
		NegInsVal, Op.getOperand(2));
		return SDValue();
		}

		if (Opc != X86ISD::FXOR && Opc != ISD::XOR && Opc != ISD::FSUB)
return SDValue();		return SDValue();

SDValue Op1 = peekThroughBitcasts(Op.getOperand(1));		SDValue Op1 = peekThroughBitcasts(Op.getOperand(1));
if (!Op1.getValueType().isFloatingPoint())		if (!Op1.getValueType().isFloatingPoint())
return SDValue();		return SDValue();

// Extract constant bits and see if they are all sign bit masks.		SDValue Op0 = peekThroughBitcasts(Op.getOperand(0));

		// For XOR and FXOR, we want to check if constant bits of Op1 are sign bit
		// masks. For FSUB, we have to check if constant bits of Op0 are sign bit
		// masks and hence we swap the operands.
		if (Opc == ISD::FSUB)
		std::swap(Op0, Op1);

APInt UndefElts;		APInt UndefElts;
SmallVector<APInt, 16> EltBits;		SmallVector<APInt, 16> EltBits;
		// Extract constant bits and see if they are all sign bit masks. Ignore the
		// undef elements.
if (getTargetConstantBitsFromNode(Op1, Op1.getScalarValueSizeInBits(),		if (getTargetConstantBitsFromNode(Op1, Op1.getScalarValueSizeInBits(),
UndefElts, EltBits, false, false))		UndefElts, EltBits,
if (llvm::all_of(EltBits, [](APInt &I) { return I.isSignMask(); }))		/* AllowWholeUndefs */ true,
return peekThroughBitcasts(Op.getOperand(0));		/* AllowPartialUndefs */ false)) {
		for (unsigned I = 0, E = EltBits.size(); I < E; I++)
		if (!UndefElts[I] && !EltBits[I].isSignMask())
		return SDValue();

		return peekThroughBitcasts(Op0);
		}

return SDValue();		return SDValue();
}		}

/// Do target-specific dag combines on floating point negations.		/// Do target-specific dag combines on floating point negations.
static SDValue combineFneg(SDNode *N, SelectionDAG &DAG,		static SDValue combineFneg(SDNode *N, SelectionDAG &DAG,
const X86Subtarget &Subtarget) {		const X86Subtarget &Subtarget) {
EVT OrigVT = N->getValueType(0);		EVT OrigVT = N->getValueType(0);
SDValue Arg = isFNEG(N);		SDValue Arg = isFNEG(DAG, N);
assert(Arg.getNode() && "N is expected to be an FNEG node");		if (!Arg)
		return SDValue();

EVT VT = Arg.getValueType();		EVT VT = Arg.getValueType();
EVT SVT = VT.getScalarType();		EVT SVT = VT.getScalarType();
SDLoc DL(N);		SDLoc DL(N);

// Let legalize expand this if it isn't a legal type yet.		// Let legalize expand this if it isn't a legal type yet.
if (!DAG.getTargetLoweringInfo().isTypeLegal(VT))		if (!DAG.getTargetLoweringInfo().isTypeLegal(VT))
return SDValue();		return SDValue();
[98 lines not shown]	if (SDValue SetCC = foldXor1SetCC(N, DAG))
return SetCC;		return SetCC;

if (SDValue RV = foldXorTruncShiftIntoCmp(N, DAG))		if (SDValue RV = foldXorTruncShiftIntoCmp(N, DAG))
return RV;		return RV;

if (SDValue FPLogic = convertIntLogicToFPLogic(N, DAG, Subtarget))		if (SDValue FPLogic = convertIntLogicToFPLogic(N, DAG, Subtarget))
return FPLogic;		return FPLogic;

if (isFNEG(N))
return combineFneg(N, DAG, Subtarget);		return combineFneg(N, DAG, Subtarget);
return SDValue();
}		}

static SDValue combineBEXTR(SDNode *N, SelectionDAG &DAG,		static SDValue combineBEXTR(SDNode *N, SelectionDAG &DAG,
TargetLowering::DAGCombinerInfo &DCI,		TargetLowering::DAGCombinerInfo &DCI,
const X86Subtarget &Subtarget) {		const X86Subtarget &Subtarget) {
SDValue Op0 = N->getOperand(0);		SDValue Op0 = N->getOperand(0);
SDValue Op1 = N->getOperand(1);		SDValue Op1 = N->getOperand(1);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
[116 lines not shown]	static SDValue combineFOr(SDNode *N, SelectionDAG &DAG,
// F[X]OR(0.0, x) -> x		// F[X]OR(0.0, x) -> x
if (isNullFPScalarOrVectorConst(N->getOperand(0)))		if (isNullFPScalarOrVectorConst(N->getOperand(0)))
return N->getOperand(1);		return N->getOperand(1);

// F[X]OR(x, 0.0) -> x		// F[X]OR(x, 0.0) -> x
if (isNullFPScalarOrVectorConst(N->getOperand(1)))		if (isNullFPScalarOrVectorConst(N->getOperand(1)))
return N->getOperand(0);		return N->getOperand(0);

if (isFNEG(N))
if (SDValue NewVal = combineFneg(N, DAG, Subtarget))		if (SDValue NewVal = combineFneg(N, DAG, Subtarget))
return NewVal;		return NewVal;

return lowerX86FPLogicOp(N, DAG, Subtarget);		return lowerX86FPLogicOp(N, DAG, Subtarget);
}		}

/// Do target-specific dag combines on X86ISD::FMIN and X86ISD::FMAX nodes.		/// Do target-specific dag combines on X86ISD::FMIN and X86ISD::FMAX nodes.
static SDValue combineFMinFMax(SDNode *N, SelectionDAG &DAG) {		static SDValue combineFMinFMax(SDNode *N, SelectionDAG &DAG) {
assert(N->getOpcode() == X86ISD::FMIN || N->getOpcode() == X86ISD::FMAX);		assert(N->getOpcode() == X86ISD::FMIN || N->getOpcode() == X86ISD::FMAX);

[668 lines not shown]	static SDValue combineFMA(SDNode *N, SelectionDAG &DAG,
if ((ScalarVT != MVT::f32 && ScalarVT != MVT::f64) || !Subtarget.hasAnyFMA())		if ((ScalarVT != MVT::f32 && ScalarVT != MVT::f64) || !Subtarget.hasAnyFMA())
return SDValue();		return SDValue();

SDValue A = N->getOperand(0);		SDValue A = N->getOperand(0);
SDValue B = N->getOperand(1);		SDValue B = N->getOperand(1);
SDValue C = N->getOperand(2);		SDValue C = N->getOperand(2);

auto invertIfNegative = [&DAG](SDValue &V) {		auto invertIfNegative = [&DAG](SDValue &V) {
if (SDValue NegVal = isFNEG(V.getNode())) {		if (SDValue NegVal = isFNEG(DAG, V.getNode())) {
V = DAG.getBitcast(V.getValueType(), NegVal);		V = DAG.getBitcast(V.getValueType(), NegVal);
return true;		return true;
}		}
// Look through extract_vector_elts. If it comes from an FNEG, create a		// Look through extract_vector_elts. If it comes from an FNEG, create a
// new extract from the FNEG input.		// new extract from the FNEG input.
if (V.getOpcode() == ISD::EXTRACT_VECTOR_ELT &&		if (V.getOpcode() == ISD::EXTRACT_VECTOR_ELT &&
isNullConstant(V.getOperand(1))) {		isNullConstant(V.getOperand(1))) {
if (SDValue NegVal = isFNEG(V.getOperand(0).getNode())) {		if (SDValue NegVal = isFNEG(DAG, V.getOperand(0).getNode())) {
NegVal = DAG.getBitcast(V.getOperand(0).getValueType(), NegVal);		NegVal = DAG.getBitcast(V.getOperand(0).getValueType(), NegVal);
V = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SDLoc(V), V.getValueType(),		V = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SDLoc(V), V.getValueType(),
NegVal, V.getOperand(1));		NegVal, V.getOperand(1));
return true;		return true;
}		}
}		}

return false;		return false;
[16 lines not shown]
}		}

// Combine FMADDSUB(A, B, FNEG(C)) -> FMSUBADD(A, B, C)		// Combine FMADDSUB(A, B, FNEG(C)) -> FMSUBADD(A, B, C)
static SDValue combineFMADDSUB(SDNode *N, SelectionDAG &DAG,		static SDValue combineFMADDSUB(SDNode *N, SelectionDAG &DAG,
const X86Subtarget &Subtarget) {		const X86Subtarget &Subtarget) {
SDLoc dl(N);		SDLoc dl(N);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

SDValue NegVal = isFNEG(N->getOperand(2).getNode());		SDValue NegVal = isFNEG(DAG, N->getOperand(2).getNode());
if (!NegVal)		if (!NegVal)
return SDValue();		return SDValue();

unsigned NewOpcode;		unsigned NewOpcode;
switch (N->getOpcode()) {		switch (N->getOpcode()) {
default: llvm_unreachable("Unexpected opcode!");		default: llvm_unreachable("Unexpected opcode!");
case X86ISD::FMADDSUB: NewOpcode = X86ISD::FMSUBADD; break;		case X86ISD::FMADDSUB: NewOpcode = X86ISD::FMSUBADD; break;
case X86ISD::FMADDSUB_RND: NewOpcode = X86ISD::FMSUBADD_RND; break;		case X86ISD::FMADDSUB_RND: NewOpcode = X86ISD::FMSUBADD_RND; break;
[2,873 lines not shown]

llvm/trunk/test/CodeGen/X86/avx2-fma-fneg-combine.ll

[112 lines not shown]	entry:
ret <2 x double> %sub.i		ret <2 x double> %sub.i
}		}

declare <2 x double> @llvm.x86.fma.vfmadd.pd(<2 x double> %a, <2 x double> %b, <2 x double> %c)		declare <2 x double> @llvm.x86.fma.vfmadd.pd(<2 x double> %a, <2 x double> %b, <2 x double> %c)

define <8 x float> @test7(float %a, <8 x float> %b, <8 x float> %c) {		define <8 x float> @test7(float %a, <8 x float> %b, <8 x float> %c) {
; X32-LABEL: test7:		; X32-LABEL: test7:
; X32: # %bb.0: # %entry		; X32: # %bb.0: # %entry
; X32-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero		; X32-NEXT: vbroadcastss {{[0-9]+}}(%esp), %ymm2
; X32-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero		; X32-NEXT: vfnmadd213ps {{.*#+}} ymm0 = -(ymm2 * ymm0) + ymm1
; X32-NEXT: vsubps %ymm2, %ymm3, %ymm2
; X32-NEXT: vbroadcastss %xmm2, %ymm2
; X32-NEXT: vfmadd213ps {{.*#+}} ymm0 = (ymm2 * ymm0) + ymm1
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test7:		; X64-LABEL: test7:
; X64: # %bb.0: # %entry		; X64: # %bb.0: # %entry
; X64-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
; X64-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero
; X64-NEXT: vsubps %ymm0, %ymm3, %ymm0
; X64-NEXT: vbroadcastss %xmm0, %ymm0		; X64-NEXT: vbroadcastss %xmm0, %ymm0
; X64-NEXT: vfmadd213ps {{.*#+}} ymm0 = (ymm1 * ymm0) + ymm2		; X64-NEXT: vfnmadd213ps {{.*#+}} ymm0 = -(ymm1 * ymm0) + ymm2
; X64-NEXT: retq		; X64-NEXT: retq
entry:		entry:
%0 = insertelement <8 x float> undef, float %a, i32 0		%0 = insertelement <8 x float> undef, float %a, i32 0
%1 = fsub <8 x float> <float -0.000000e+00, float undef, float undef, float undef, float undef, float undef, float undef, float undef>, %0		%1 = fsub <8 x float> <float -0.000000e+00, float undef, float undef, float undef, float undef, float undef, float undef, float undef>, %0
%2 = shufflevector <8 x float> %1, <8 x float> undef, <8 x i32> zeroinitializer		%2 = shufflevector <8 x float> %1, <8 x float> undef, <8 x i32> zeroinitializer
%3 = tail call <8 x float> @llvm.fma.v8f32(<8 x float> %2, <8 x float> %b, <8 x float> %c)		%3 = tail call <8 x float> @llvm.fma.v8f32(<8 x float> %2, <8 x float> %b, <8 x float> %c)
ret <8 x float> %3		ret <8 x float> %3

}		}

define <8 x float> @test8(float %a, <8 x float> %b, <8 x float> %c) {		define <8 x float> @test8(float %a, <8 x float> %b, <8 x float> %c) {
; X32-LABEL: test8:		; X32-LABEL: test8:
; X32: # %bb.0: # %entry		; X32: # %bb.0: # %entry
; X32-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero		; X32-NEXT: vbroadcastss {{[0-9]+}}(%esp), %ymm2
; X32-NEXT: vbroadcastss {{.*#+}} xmm3 = [-0,-0,-0,-0]		; X32-NEXT: vfnmadd213ps {{.*#+}} ymm0 = -(ymm2 * ymm0) + ymm1
; X32-NEXT: vxorps %xmm3, %xmm2, %xmm2
; X32-NEXT: vbroadcastss %xmm2, %ymm2
; X32-NEXT: vfmadd213ps {{.*#+}} ymm0 = (ymm2 * ymm0) + ymm1
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test8:		; X64-LABEL: test8:
; X64: # %bb.0: # %entry		; X64: # %bb.0: # %entry
; X64-NEXT: vbroadcastss {{.*#+}} xmm3 = [-0,-0,-0,-0]
; X64-NEXT: vxorps %xmm3, %xmm0, %xmm0
; X64-NEXT: vbroadcastss %xmm0, %ymm0		; X64-NEXT: vbroadcastss %xmm0, %ymm0
; X64-NEXT: vfmadd213ps {{.*#+}} ymm0 = (ymm1 * ymm0) + ymm2		; X64-NEXT: vfnmadd213ps {{.*#+}} ymm0 = -(ymm1 * ymm0) + ymm2
; X64-NEXT: retq		; X64-NEXT: retq
entry:		entry:
%0 = fsub float -0.0, %a		%0 = fsub float -0.0, %a
%1 = insertelement <8 x float> undef, float %0, i32 0		%1 = insertelement <8 x float> undef, float %0, i32 0
%2 = shufflevector <8 x float> %1, <8 x float> undef, <8 x i32> zeroinitializer		%2 = shufflevector <8 x float> %1, <8 x float> undef, <8 x i32> zeroinitializer
%3 = tail call <8 x float> @llvm.fma.v8f32(<8 x float> %2, <8 x float> %b, <8 x float> %c)		%3 = tail call <8 x float> @llvm.fma.v8f32(<8 x float> %2, <8 x float> %b, <8 x float> %c)
ret <8 x float> %3		ret <8 x float> %3
}		}

declare <8 x float> @llvm.fma.v8f32(<8 x float> %a, <8 x float> %b, <8 x float> %c)		declare <8 x float> @llvm.fma.v8f32(<8 x float> %a, <8 x float> %b, <8 x float> %c)
