This is an archive of the discontinued LLVM Phabricator instance.

[X86] Recognize a splat of negate in isFNEG
ClosedPublic

Authored by eraman on Jun 21 2018, 5:54 PM.

Details

Reviewers

craig.topper
RKSimon
spatel

Commits

rG10fd92dd9413: [X86] Recognize a splat of negate in isFNEG
rL339043: [X86] Recognize a splat of negate in isFNEG

Summary

Expand isFNEG so that we generate the appropriate F(N)M(ADD|SUB) instructions in more cases. For example, the following sequence

a = _mm256_broadcast_ss(f)
d = _mm256_fnmadd_ps(a, b, c)

generates an fsub and fma without this patch and an fnma with this change.
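
To make the summary concrete, here is a minimal sketch in portable scalar C++ (not the AVX intrinsics themselves) of the two instruction sequences. The names splat, negate, fma8, fnmadd8 and checkSummaryExample are hypothetical stand-ins chosen for illustration; they model lane-wise semantics only and say nothing about the instructions LLVM actually emits.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec8 = std::array<float, 8>;

// Stand-in for _mm256_broadcast_ss: copy one scalar into all eight lanes.
Vec8 splat(float f) {
  Vec8 v;
  v.fill(f);
  return v;
}

// Lane-wise negation (models the separate fsub step).
Vec8 negate(const Vec8 &a) {
  Vec8 r;
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = -a[i];
  return r;
}

// Lane-wise fused multiply-add: a*b + c.
Vec8 fma8(const Vec8 &a, const Vec8 &b, const Vec8 &c) {
  Vec8 r;
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = std::fma(a[i], b[i], c[i]);
  return r;
}

// Lane-wise fused negated multiply-add: -(a*b) + c.
Vec8 fnmadd8(const Vec8 &a, const Vec8 &b, const Vec8 &c) {
  Vec8 r;
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = std::fma(-a[i], b[i], c[i]);
  return r;
}

// Small self-check: negate-then-fma and fnmadd agree lane for lane.
void checkSummaryExample() {
  const Vec8 b{1.f, 2.f, 3.f, 4.f, 5.f, 6.f, 7.f, 8.f};
  const Vec8 c{0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f};
  assert(fma8(negate(splat(1.5f)), b, c) == fnmadd8(splat(1.5f), b, c));
}
```

In this model, negating the splat and then doing a plain fma gives the same lanes as a single fnmadd on the un-negated splat, which is the equivalence the patch relies on.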

Diff Detail

Repository

rL LLVM

Build Status

Buildable 20012
Build 20012: arc lint + arc unit

Event Timeline

eraman created this revision. Jun 21 2018, 5:54 PM

Harbormaster completed remote builds in B19609: Diff 152418. Jun 21 2018, 5:54 PM

Herald added a subscriber: llvm-commits. Jun 21 2018, 5:54 PM

craig.topper added reviewers: RKSimon, spatel. Jun 21 2018, 6:25 PM

RKSimon mentioned this in rL335342: [X86] Regenerate tests to include fma comments. Jun 22 2018, 5:46 AM

I've updated avx2-fma-fneg-combine.ll to get rid of those comment changes. Please can you rebase?

Rebase after r335342

Harbormaster completed remote builds in B19623: Diff 152475. Jun 22 2018, 7:26 AM

RKSimon added inline comments. Jun 22 2018, 8:21 AM

test/CodeGen/X86/avx2-fma-fneg-combine.ll
131	Sorry, I should have asked you to commit this test to trunk as well so this patch shows the codegen diff.

Does this patch deal with the case if we negate the scalar before doing the splat?

define <8 x float> @test7(float %a, <8 x float> %b, <8 x float> %c) {
  %t0 = fsub float -0.0, %a
  %t1 = insertelement <8 x float> undef, float %t0, i32 0
  %t2 = shufflevector <8 x float> %t1, <8 x float> undef, <8 x i32> zeroinitializer
  %t3 = tail call <8 x float> @llvm.fma.v8f32(<8 x float> %t2, <8 x float> %b, <8 x float> %c)
  ret <8 x float> %t3
}

That transform was suggested as a canonicalization in:
https://bugs.llvm.org/show_bug.cgi?id=37463

No, this doesn't deal with this. Should I start from the negate and push it
down if it is transitively used by fma?

Rebase after adding the test case at r335367

Harbormaster completed remote builds in B19631: Diff 152511. Jun 22 2018, 10:38 AM

In D48467#1140891, @eraman wrote:

No, this doesn't deal with this. Should I start from the negate and push it
down if it is transitively used by fma?

That seems backwards. Why don't we start from the fma and pattern match an fneg of 1 of its operands (looking through a splat as needed)?

Looks like there's already logic in place for this in X86's combineFMA() with a possibly related FIXME comment?
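
The approach suggested here (start at the fma and look for a negated operand, seeing through a splat) can be sketched on a toy expression tree. Everything below is hypothetical illustration code: Node, Kind, stripNegate and combineFma are invented names, not LLVM's SDNode API, and the real combineFMA() handles many more cases.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression nodes; hypothetical stand-ins, not LLVM's SDNode.
enum class Kind { Leaf, FNeg, Splat, Fma, Fnma };

struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
  Kind kind;
  std::string name;         // only meaningful for Leaf
  std::vector<NodePtr> ops; // operands, in order
};

NodePtr leaf(std::string n) {
  return std::make_shared<Node>(Node{Kind::Leaf, std::move(n), {}});
}

NodePtr make(Kind k, std::vector<NodePtr> ops) {
  return std::make_shared<Node>(Node{k, "", std::move(ops)});
}

// If n negates some value, return that value; otherwise return nullptr.
// A splat of a negated scalar is treated as the negation of splat(scalar).
NodePtr stripNegate(const NodePtr &n) {
  assert(n && "null node");
  if (n->kind == Kind::FNeg)
    return n->ops[0];
  if (n->kind == Kind::Splat)
    if (NodePtr inner = stripNegate(n->ops[0]))
      return make(Kind::Splat, {inner});
  return nullptr;
}

// Start from the fma and look for a negated multiplicand (operand 0 or 1).
NodePtr combineFma(const NodePtr &n) {
  if (n->kind != Kind::Fma)
    return n;
  assert(n->ops.size() == 3 && "fma takes three operands");
  for (std::size_t i = 0; i < 2; ++i) {
    if (NodePtr positive = stripNegate(n->ops[i])) {
      std::vector<NodePtr> ops = n->ops;
      ops[i] = positive;
      return make(Kind::Fnma, std::move(ops));
    }
  }
  return n;
}

// Render a tree as text so the rewrite is easy to inspect.
std::string show(const NodePtr &n) {
  switch (n->kind) {
  case Kind::Leaf:
    return n->name;
  case Kind::FNeg:
    return "fneg(" + show(n->ops[0]) + ")";
  case Kind::Splat:
    return "splat(" + show(n->ops[0]) + ")";
  case Kind::Fma:
  case Kind::Fnma: {
    std::string head = n->kind == Kind::Fma ? "fma(" : "fnma(";
    return head + show(n->ops[0]) + ", " + show(n->ops[1]) + ", " +
           show(n->ops[2]) + ")";
  }
  }
  return "";
}
```

With this sketch, fma(splat(fneg(a)), b, c) is rewritten to fnma(splat(a), b, c), while an fma with no negated multiplicand is returned unchanged. Keeping the splat-awareness inside the negate matcher means the fma combine itself needs no new cases.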

RKSimon added inline comments. Jun 22 2018, 11:45 AM

lib/Target/X86/X86ISelLowering.cpp
31185	Please move the isFNEG as a NFC commit
31210	Can this be written as a llvm::any_of pattern?
31214	Early out if !SVOp || !SVOp->isSplat()
31215	Don't use auto for non-obvious (casts etc.) cases.
31218	It's unlikely that Op1 isn't undef, but if you have cases of this you could handle both ops by testing SVOp->getSplatIndex()

I am going to take the approach suggested by Sanjay and expand isFNEG to
handle splat of a negated scalar. Then, the current combineFMA should take
care of the rest. I got the isFNEG to work with shuffle (and work on the
current test case) but haven't yet handled the insertelement case. I am out
traveling for the next ten days and will send a revised patch after I am
back.

Implement this by expanding the patterns generated by isFNEG

Harbormaster completed remote builds in B20012: Diff 154035. Jul 3 2018, 6:16 PM

eraman retitled this revision from [X86] Recognize an fnma in the presence of an intervening shuffle. to [X86] Recognize a splat of negate in isFNEG. Jul 3 2018, 6:17 PM

eraman edited the summary of this revision.

RKSimon added inline comments. Jul 4 2018, 5:37 AM

test/CodeGen/X86/avx2-fma-fneg-combine.ll
139	Please can you commit this test to trunk so the patch shows the codegen diff?

Rebase after r336404

Harbormaster completed remote builds in B20084: Diff 154342. Jul 5 2018, 6:00 PM

eraman marked an inline comment as done. Jul 9 2018, 4:16 PM

eraman added inline comments.

test/CodeGen/X86/avx2-fma-fneg-combine.ll
139	Forgot to respond here, but I have committed the test and rebased the patch after the commit.

RKSimon added inline comments. Jul 10 2018, 3:41 AM

lib/Target/X86/X86ISelLowering.cpp
36774	if (auto *SVOp = dyn_cast<ShuffleVectorSDNode>(Op.getNode()))
36849	if (Opc == ISD::FSUB)
	  std::swap(Op0, Op1);
	return Negate(Op0, Op1);
36856	Now that we're creating new nodes inside isFNEG() I wonder if we should try to avoid repetition of isFNEG() in combineXor etc. - Maybe split combineFneg into combineFneg and combineFnegPatterns which take the isFNEG result?

eraman marked 3 inline comments as done. Jul 11 2018, 12:17 PM

eraman added inline comments.

lib/Target/X86/X86ISelLowering.cpp
36856	Instead of splitting combineFneg, I have removed the assert below and returned early if isFNEG returns an empty node.

Address review comments.

Harbormaster completed remote builds in B20274: Diff 155045. Jul 11 2018, 12:18 PM

RKSimon added inline comments. Jul 16 2018, 9:25 AM

lib/Target/X86/X86ISelLowering.cpp
36829	auto *BV = dyn_cast<BuildVectorSDNode>(Op1)
36834	Constant *C

eraman marked 2 inline comments as done. Jul 16 2018, 4:02 PM

eraman added inline comments.

lib/Target/X86/X86ISelLowering.cpp
36829	I have also changed the following line to if (auto *CN = BV->getConstantFPSplatNode())

Address Simon's comments.

Harbormaster completed remote builds in B20426: Diff 155779. Jul 16 2018, 4:03 PM

RKSimon added inline comments. Jul 17 2018, 2:57 AM

lib/Target/X86/X86ISelLowering.cpp
36852	Are there any circumstances that this isn't a ConstantFP? getTargetConstantFromNode peeks through bitcasts so don't you need to use dyn_cast_or_null?
36863	dyn_cast_or_null?

Change cast_or_null to dyn_cast_or_null

lib/Target/X86/X86ISelLowering.cpp
36852	I have changed it to dyn_cast_or_null, but thinking about it I don't think that is needed. First, the current code reads:
	if (Op1.getOpcode() == X86ISD::VBROADCAST) {
	  if (auto *C = getTargetConstantFromNode(Op1.getOperand(0)))
	    if (isSignMask(cast<ConstantFP>(C)))
	The x86 vbroadcast instruction broadcasts floating point values, so I think the cast<ConstantFP> is right.

RKSimon added inline comments. Jul 18 2018, 7:41 AM

lib/Target/X86/X86ISelLowering.cpp
36852	If you look at getTargetConstantFromNode the first thing it does is call peekThroughBitcasts so it can have any type. In fact we should probably be checking that the vector's element size is correct as well. This is the kind of thing a fuzz test finds in 3 months time......
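
The concern about an unchecked cast can be illustrated by analogy in standard C++. This is only a sketch: the Constant, ConstantFP and ConstantInt classes below are hypothetical toy types, and dynamic_cast is used as a rough stand-in for a checked, null-tolerant downcast; it is not LLVM's casting machinery. The point is that once a bitcast has been looked through, the constant found may be an integer constant carrying the same bit pattern.

```cpp
#include <cassert>
#include <cstdint>

// Toy constant hierarchy; hypothetical, for illustration only.
struct Constant {
  virtual ~Constant() = default;
};

struct ConstantFP : Constant {
  std::uint32_t bits; // raw IEEE-754 single-precision bit pattern
  explicit ConstantFP(std::uint32_t b) : bits(b) {}
};

struct ConstantInt : Constant {
  std::uint32_t value;
  explicit ConstantInt(std::uint32_t v) : value(v) {}
};

// Checked downcast: yields nullptr for a null input or a non-FP constant.
const ConstantFP *asConstantFP(const Constant *c) {
  return dynamic_cast<const ConstantFP *>(c);
}

// Only an FP constant whose bits equal the 32-bit sign mask qualifies.
bool isSignMaskConstant(const Constant *c) {
  if (const ConstantFP *fp = asConstantFP(c))
    return fp->bits == 0x80000000u;
  return false;
}

// Small self-check covering the FP, integer and null cases.
void checkCastExample() {
  ConstantFP negZero(0x80000000u);
  ConstantInt sameBits(0x80000000u);
  assert(isSignMaskConstant(&negZero));
  assert(!isSignMaskConstant(&sameBits));
  assert(!isSignMaskConstant(nullptr));
}
```

An unchecked cast in the integer case would reinterpret an object of the wrong type, which is the kind of failure a checked cast that can return null avoids.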

eraman added inline comments. Jul 26 2018, 11:06 AM

lib/Target/X86/X86ISelLowering.cpp
36852	I don't get the comment about vector's element size. Which vector's element size should be checked here?

RKSimon added inline comments. Jul 30 2018, 6:30 AM

lib/Target/X86/X86ISelLowering.cpp
36852	Looking at this again, you could simplify a lot of this by using getTargetConstantBitsFromNode instead to extract all the bits for you, avoiding all the BROADCAST/BUILD_VECTOR special cases as getTargetConstantBitsFromNode should do all of that already.
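
As a rough sketch of why working on extracted bits simplifies the check: once a constant has been flattened to per-element bits plus an undef flag per lane, one loop covers the broadcast, build-vector and constant-pool forms alike. The ConstantBits struct and function names below are hypothetical and do not reproduce the signature of getTargetConstantBitsFromNode. The sketch also assumes 32-bit elements, and requiring at least one defined lane is an assumption made here, not something the review states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flattened view of a constant vector: raw element bits plus a
// per-element flag marking undef lanes.
struct ConstantBits {
  std::vector<std::uint32_t> eltBits;
  std::vector<bool> undef;
};

constexpr std::uint32_t kSignMask32 = 0x80000000u;

// True if every defined lane holds `wanted` and at least one lane is defined.
// Undef lanes are skipped, since any value may be chosen for them.
bool allDefinedLanesEqual(const ConstantBits &c, std::uint32_t wanted) {
  assert(c.eltBits.size() == c.undef.size() && "mismatched lane counts");
  bool sawDefined = false;
  for (std::size_t i = 0, e = c.eltBits.size(); i < e; ++i) {
    if (c.undef[i])
      continue;
    if (c.eltBits[i] != wanted)
      return false;
    sawDefined = true;
  }
  return sawDefined;
}

// XOR with a splat of the sign mask flips the sign of every lane.
bool isSignMaskSplat(const ConstantBits &c) {
  return allDefinedLanesEqual(c, kSignMask32);
}
```

A constant with one defined sign-mask lane and the rest undef passes this check, while a vector mixing the sign mask with any other defined value does not.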

RKSimon mentioned this in rL338358: [X86][SSE] isFNEG - Use getTargetConstantBitsFromNode to handle all constant…. Jul 31 2018, 3:13 AM

@eraman Please can you take a look at rL338358 - this shows how to use getTargetConstantBitsFromNode to avoid a lot of extra checks you need from using getTargetConstantFromNode directly

Updates to work with r338358.

Harbormaster completed remote builds in B21009: Diff 158802. Aug 2 2018, 11:08 AM

In D48467#1181907, @RKSimon wrote:

@eraman Please can you take a look at rL338358 - this shows how to use getTargetConstantBitsFromNode to avoid a lot of extra checks you need from using getTargetConstantFromNode directly

I have updated the code to make use of this. I had to extend getTargetConstantBitsFromNode to support build vector of ConstantFPSDNodes. PTAL.

RKSimon added inline comments. Aug 3 2018, 3:36 AM

lib/Target/X86/X86ISelLowering.cpp
36937	unsigned - only use auto when the type is very obvious
36965	Do we have test coverage for ignoring the undefs?
36970	for (unsigned I = 0, E = EltBits.size(); I < E; I++)
36980	Why did you create the lambda? Why not just inline Negate?

eraman marked 3 inline comments as done. Aug 3 2018, 10:24 AM

eraman added inline comments.

lib/Target/X86/X86ISelLowering.cpp
36965	test7 of avx2-fma-fneg-combine.ll has an fsub with all but one element of the constant being undef.
36980	Carryover from an initial version where I thought the lambda made sense.

Address review comments.

Harbormaster completed remote builds in B21046: Diff 159040. Aug 3 2018, 10:25 AM

LGTM

This revision is now accepted and ready to land. Aug 6 2018, 7:10 AM

Closed by commit rL339043: [X86] Recognize a splat of negate in isFNEG (authored by eraman). Aug 6 2018, 12:24 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/X86/X86ISelLowering.cpp	93 lines
test/CodeGen/X86/avx2-fma-fneg-combine.ll	32 lines

Diff 154035

lib/Target/X86/X86ISelLowering.cpp

	[27,245 lines not shown]
	break;			break;
	}			}

	unsigned SVTNumElts = SVT.getVectorNumElements();			unsigned SVTNumElts = SVT.getVectorNumElements();
	ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(N);			ShuffleVectorSDNode *SVOp = cast<ShuffleVectorSDNode>(N);
	for (unsigned i = 0, e = SVTNumElts; i != e && CanFold; ++i)			for (unsigned i = 0, e = SVTNumElts; i != e && CanFold; ++i)
	CanFold = SVOp->getMaskElt(i) == (int)(i * 2);			CanFold = SVOp->getMaskElt(i) == (int)(i * 2);
	for (unsigned i = SVTNumElts, e = NumElts; i != e && CanFold; ++i)			for (unsigned i = SVTNumElts, e = NumElts; i != e && CanFold; ++i)
	CanFold = SVOp->getMaskElt(i) < 0;			CanFold = SVOp->getMaskElt(i) < 0;

	if (CanFold) {			if (CanFold) {
	SDValue BC00 = DAG.getBitcast(VT, BC0.getOperand(0));			SDValue BC00 = DAG.getBitcast(VT, BC0.getOperand(0));
	SDValue BC01 = DAG.getBitcast(VT, BC0.getOperand(1));			SDValue BC01 = DAG.getBitcast(VT, BC0.getOperand(1));
	SDValue NewBinOp = DAG.getNode(BC0.getOpcode(), dl, VT, BC00, BC01);			SDValue NewBinOp = DAG.getNode(BC0.getOpcode(), dl, VT, BC00, BC01);
	return DAG.getVectorShuffle(VT, dl, NewBinOp, N1, SVOp->getMask());			return DAG.getVectorShuffle(VT, dl, NewBinOp, N1, SVOp->getMask());
	}			}
	}			}
	}			}

	// Combine a vector_shuffle that is equal to build_vector load1, load2, load3,			// Combine a vector_shuffle that is equal to build_vector load1, load2, load3,
	// load4, <0, 1, 2, 3> into a 128-bit load if the load addresses are			// load4, <0, 1, 2, 3> into a 128-bit load if the load addresses are
	// consecutive, non-overlapping, and in the right order.			// consecutive, non-overlapping, and in the right order.
	SmallVector<SDValue, 16> Elts;			SmallVector<SDValue, 16> Elts;
	for (unsigned i = 0, e = VT.getVectorNumElements(); i != e; ++i) {			for (unsigned i = 0, e = VT.getVectorNumElements(); i != e; ++i) {
	if (SDValue Elt = getShuffleScalarElt(N, i, DAG, 0)) {			if (SDValue Elt = getShuffleScalarElt(N, i, DAG, 0)) {
	Elts.push_back(Elt);			Elts.push_back(Elt);
	continue;			continue;
	}			}
	Elts.clear();			Elts.clear();
	break;			break;
	}			}

	if (Elts.size() == VT.getVectorNumElements())			if (Elts.size() == VT.getVectorNumElements())
	if (SDValue LD =			if (SDValue LD =
	EltsFromConsecutiveLoads(VT, Elts, dl, DAG, Subtarget, true))			EltsFromConsecutiveLoads(VT, Elts, dl, DAG, Subtarget, true))
	return LD;			return LD;

	// For AVX2, we sometimes want to combine			// For AVX2, we sometimes want to combine
	// (vector_shuffle <mask> (concat_vectors t1, undef)			// (vector_shuffle <mask> (concat_vectors t1, undef)
	// (concat_vectors t2, undef))			// (concat_vectors t2, undef))
	// Into:			// Into:
	// (vector_shuffle <mask> (concat_vectors t1, t2), undef)			// (vector_shuffle <mask> (concat_vectors t1, t2), undef)
	// Since the latter can be efficiently lowered with VPERMD/VPERMQ			// Since the latter can be efficiently lowered with VPERMD/VPERMQ
	if (SDValue ShufConcat = combineShuffleOfConcatUndef(N, DAG, Subtarget))			if (SDValue ShufConcat = combineShuffleOfConcatUndef(N, DAG, Subtarget))
	return ShufConcat;			return ShufConcat;

	if (isTargetShuffle(N->getOpcode())) {			if (isTargetShuffle(N->getOpcode())) {
	SDValue Op(N, 0);			SDValue Op(N, 0);
	if (SDValue Shuffle = combineTargetShuffle(Op, DAG, DCI, Subtarget))			if (SDValue Shuffle = combineTargetShuffle(Op, DAG, DCI, Subtarget))
	return Shuffle;			return Shuffle;
	[5,464 lines not shown]
	if (SDValue V = combineVectorSignBitsTruncation(N, DL, DAG, Subtarget))			if (SDValue V = combineVectorSignBitsTruncation(N, DL, DAG, Subtarget))
	return V;			return V;

	return combineVectorTruncation(N, DAG, Subtarget);			return combineVectorTruncation(N, DAG, Subtarget);
	}			}

 /// Returns the negated value if the node \p N flips sign of FP value.
 ///
-/// FP-negation node may have different forms: FNEG(x) or FXOR (x, 0x80000000).
+/// FP-negation node may have different forms: FNEG(x), FXOR (x, 0x80000000)
+/// or FSUB(0, x)
 /// AVX512F does not have FXOR, so FNEG is lowered as
 /// (bitcast (xor (bitcast x), (bitcast ConstantFP(0x80000000)))).
 /// In this case we go though all bitcasts.
-static SDValue isFNEG(SDNode *N) {
+/// This also recognizes splat of a negated value and returns the splat of that
+/// value.
+static SDValue isFNEG(SelectionDAG &DAG, SDNode *N) {
   if (N->getOpcode() == ISD::FNEG)
     return N->getOperand(0);

   SDValue Op = peekThroughBitcasts(SDValue(N, 0));
-  if (Op.getOpcode() != X86ISD::FXOR && Op.getOpcode() != ISD::XOR)
+  auto VT = Op->getValueType(0);
+  ShuffleVectorSDNode *SVOp = dyn_cast<ShuffleVectorSDNode>(Op.getNode());
+  if (SVOp) {
+    // For a VECTOR_SHUFFLE(VEC1, VEC2), if the VEC2 is undef, then the negate
+    // of this is VECTOR_SHUFFLE(-VEC1, UNDEF). The mask can be anything here.
+    if (!SVOp->getOperand(1).isUndef())
+      return SDValue();
+    if (SDValue NegOp0 = isFNEG(DAG, SVOp->getOperand(0).getNode()))
+      return DAG.getVectorShuffle(VT, SDLoc(SVOp), NegOp0, DAG.getUNDEF(VT),
+                                  SVOp->getMask());
+    return SDValue();
+  }
+  auto Opc = Op.getOpcode();
+  if (Opc == ISD::INSERT_VECTOR_ELT) {
+    // Negate of INSERT_VECTOR_ELT(UNDEF, V, INDEX) is INSERT_VECTOR_ELT(UNDEF,
+    // -V, INDEX).
+    SDValue InsVector = Op.getOperand(0);
+    SDValue InsVal = Op.getOperand(1);
+    if (!InsVector.isUndef())
+      return SDValue();
+    if (SDValue NegInsVal = isFNEG(DAG, InsVal.getNode()))
+      return DAG.getNode(ISD::INSERT_VECTOR_ELT, SDLoc(Op), VT, InsVector,
+                         NegInsVal, Op.getOperand(2));
+    return SDValue();
+  }
+
+  if (Opc != X86ISD::FXOR && Opc != ISD::XOR && Opc != ISD::FSUB)
     return SDValue();

   SDValue Op1 = peekThroughBitcasts(Op.getOperand(1));
   if (!Op1.getValueType().isFloatingPoint())
     return SDValue();

   SDValue Op0 = peekThroughBitcasts(Op.getOperand(0));

   unsigned EltBits = Op1.getScalarValueSizeInBits();
   auto isSignMask = [&](const ConstantFP *C) {
     return C->getValueAPF().bitcastToAPInt() == APInt::getSignMask(EltBits);
   };

   // There is more than one way to represent the same constant on
   // the different X86 targets. The type of the node may also depend on size.
   // - load scalar value and broadcast
   // - BUILD_VECTOR node
   // - load from a constant pool.
   // We check all variants here.
-  if (Op1.getOpcode() == X86ISD::VBROADCAST) {
-    if (auto *C = getTargetConstantFromNode(Op1.getOperand(0)))
-      if (isSignMask(cast<ConstantFP>(C)))
-        return Op0;
-
-  } else if (BuildVectorSDNode *BV = dyn_cast<BuildVectorSDNode>(Op1)) {
-    if (ConstantFPSDNode *CN = BV->getConstantFPSplatNode())
-      if (isSignMask(CN->getConstantFPValue()))
-        return Op0;
-
-  } else if (auto *C = getTargetConstantFromNode(Op1)) {
-    if (C->getType()->isVectorTy()) {
-      if (auto *SplatV = C->getSplatValue())
-        if (isSignMask(cast<ConstantFP>(SplatV)))
-          return Op0;
-    } else if (auto *FPConst = dyn_cast<ConstantFP>(C))
-      if (isSignMask(FPConst))
-        return Op0;
-  }
-  return SDValue();
+  auto IsNeg = [=](const ConstantFP *Val) {
+    return (isSignMask(Val) && Opc != ISD::FSUB) ||
+           (Val->isZero() && Opc == ISD::FSUB);
+  };
+
+  auto Negate = [=](SDValue Op0, SDValue Op1) {
+    if (Op1.getOpcode() == X86ISD::VBROADCAST) {
+      if (auto *C = cast_or_null<ConstantFP>(
+              getTargetConstantFromNode(Op1.getOperand(0))))
+        if (IsNeg(C))
+          return Op0;
+
+    } else if (BuildVectorSDNode *BV = dyn_cast<BuildVectorSDNode>(Op1)) {
+      if (ConstantFPSDNode *CN = BV->getConstantFPSplatNode())
+        if (IsNeg(CN->getConstantFPValue()))
+          return Op0;
+
+    } else if (auto *C = getTargetConstantFromNode(Op1)) {
+      if (C->getType()->isVectorTy()) {
+        if (auto *SplatV = cast_or_null<ConstantFP>(C->getSplatValue()))
+          if (IsNeg(SplatV))
+            return Op0;
+      } else if (auto *FPConst = dyn_cast<ConstantFP>(C))
+        if (IsNeg(FPConst))
+          return Op0;
+    }
+    return SDValue();
+  };
+  if (Opc == ISD::FSUB)
+    return Negate(Op1, Op0);
+  else
+    return Negate(Op0, Op1);
 }

	/// Do target-specific dag combines on floating point negations.			/// Do target-specific dag combines on floating point negations.
	static SDValue combineFneg(SDNode *N, SelectionDAG &DAG,			static SDValue combineFneg(SDNode *N, SelectionDAG &DAG,
	const X86Subtarget &Subtarget) {			const X86Subtarget &Subtarget) {
	EVT OrigVT = N->getValueType(0);			EVT OrigVT = N->getValueType(0);
	SDValue Arg = isFNEG(N);			SDValue Arg = isFNEG(DAG, N);
	assert(Arg.getNode() && "N is expected to be an FNEG node");			assert(Arg.getNode() && "N is expected to be an FNEG node");

	EVT VT = Arg.getValueType();			EVT VT = Arg.getValueType();
	EVT SVT = VT.getScalarType();			EVT SVT = VT.getScalarType();
	SDLoc DL(N);			SDLoc DL(N);

	// Let legalize expand this if it isn't a legal type yet.			// Let legalize expand this if it isn't a legal type yet.
	if (!DAG.getTargetLoweringInfo().isTypeLegal(VT))			if (!DAG.getTargetLoweringInfo().isTypeLegal(VT))
	[17 lines not shown]
	switch (Arg.getOpcode()) {			switch (Arg.getOpcode()) {
	case ISD::FMA: NewOpcode = X86ISD::FNMSUB; break;			case ISD::FMA: NewOpcode = X86ISD::FNMSUB; break;
	case X86ISD::FMSUB: NewOpcode = X86ISD::FNMADD; break;			case X86ISD::FMSUB: NewOpcode = X86ISD::FNMADD; break;
	case X86ISD::FNMADD: NewOpcode = X86ISD::FMSUB; break;			case X86ISD::FNMADD: NewOpcode = X86ISD::FMSUB; break;
	case X86ISD::FNMSUB: NewOpcode = ISD::FMA; break;			case X86ISD::FNMSUB: NewOpcode = ISD::FMA; break;
	case X86ISD::FMADD_RND: NewOpcode = X86ISD::FNMSUB_RND; break;			case X86ISD::FMADD_RND: NewOpcode = X86ISD::FNMSUB_RND; break;
	case X86ISD::FMSUB_RND: NewOpcode = X86ISD::FNMADD_RND; break;			case X86ISD::FMSUB_RND: NewOpcode = X86ISD::FNMADD_RND; break;
	case X86ISD::FNMADD_RND: NewOpcode = X86ISD::FMSUB_RND; break;			case X86ISD::FNMADD_RND: NewOpcode = X86ISD::FMSUB_RND; break;
	case X86ISD::FNMSUB_RND: NewOpcode = X86ISD::FMADD_RND; break;			case X86ISD::FNMSUB_RND: NewOpcode = X86ISD::FMADD_RND; break;
	// We can't handle scalar intrinsic node here because it would only			// We can't handle scalar intrinsic node here because it would only
	// invert one element and not the whole vector. But we could try to handle			// invert one element and not the whole vector. But we could try to handle
	// a negation of the lower element only.			// a negation of the lower element only.
	}			}
	}			}
	if (NewOpcode)			if (NewOpcode)
	return DAG.getBitcast(OrigVT, DAG.getNode(NewOpcode, DL, VT,			return DAG.getBitcast(OrigVT, DAG.getNode(NewOpcode, DL, VT,
	Arg.getNode()->ops()));			Arg.getNode()->ops()));

	return SDValue();			return SDValue();
	}			}

	static SDValue lowerX86FPLogicOp(SDNode *N, SelectionDAG &DAG,			static SDValue lowerX86FPLogicOp(SDNode *N, SelectionDAG &DAG,
	const X86Subtarget &Subtarget) {			const X86Subtarget &Subtarget) {
	MVT VT = N->getSimpleValueType(0);			MVT VT = N->getSimpleValueType(0);
	// If we have integer vector types available, use the integer opcodes.			// If we have integer vector types available, use the integer opcodes.
	if (VT.isVector() && Subtarget.hasSSE2()) {			if (VT.isVector() && Subtarget.hasSSE2()) {
	SDLoc dl(N);			SDLoc dl(N);

	MVT IntVT = MVT::getVectorVT(MVT::i64, VT.getSizeInBits() / 64);			MVT IntVT = MVT::getVectorVT(MVT::i64, VT.getSizeInBits() / 64);

	SDValue Op0 = DAG.getBitcast(IntVT, N->getOperand(0));			SDValue Op0 = DAG.getBitcast(IntVT, N->getOperand(0));
	SDValue Op1 = DAG.getBitcast(IntVT, N->getOperand(1));			SDValue Op1 = DAG.getBitcast(IntVT, N->getOperand(1));
	unsigned IntOpcode;			unsigned IntOpcode;
	switch (N->getOpcode()) {			switch (N->getOpcode()) {
	default: llvm_unreachable("Unexpected FP logic op");			default: llvm_unreachable("Unexpected FP logic op");
	case X86ISD::FOR: IntOpcode = ISD::OR; break;			case X86ISD::FOR: IntOpcode = ISD::OR; break;
	case X86ISD::FXOR: IntOpcode = ISD::XOR; break;			case X86ISD::FXOR: IntOpcode = ISD::XOR; break;
	case X86ISD::FAND: IntOpcode = ISD::AND; break;			case X86ISD::FAND: IntOpcode = ISD::AND; break;
	case X86ISD::FANDN: IntOpcode = X86ISD::ANDNP; break;			case X86ISD::FANDN: IntOpcode = X86ISD::ANDNP; break;
	}			}
	SDValue IntOp = DAG.getNode(IntOpcode, dl, IntVT, Op0, Op1);			SDValue IntOp = DAG.getNode(IntOpcode, dl, IntVT, Op0, Op1);
	return DAG.getBitcast(VT, IntOp);			return DAG.getBitcast(VT, IntOp);
	}			}
	return SDValue();			return SDValue();
	}			}


	/// Fold a xor(setcc cond, val), 1 --> setcc (inverted(cond), val)			/// Fold a xor(setcc cond, val), 1 --> setcc (inverted(cond), val)
	static SDValue foldXor1SetCC(SDNode *N, SelectionDAG &DAG) {			static SDValue foldXor1SetCC(SDNode *N, SelectionDAG &DAG) {
	if (N->getOpcode() != ISD::XOR)			if (N->getOpcode() != ISD::XOR)
	return SDValue();			return SDValue();
	[31 lines not shown]
	return SetCC;			return SetCC;

	if (SDValue RV = foldXorTruncShiftIntoCmp(N, DAG))			if (SDValue RV = foldXorTruncShiftIntoCmp(N, DAG))
	return RV;			return RV;

	if (SDValue FPLogic = convertIntLogicToFPLogic(N, DAG, Subtarget))			if (SDValue FPLogic = convertIntLogicToFPLogic(N, DAG, Subtarget))
	return FPLogic;			return FPLogic;

	if (isFNEG(N))			if (isFNEG(DAG, N))
	return combineFneg(N, DAG, Subtarget);			return combineFneg(N, DAG, Subtarget);
	return SDValue();			return SDValue();
	}			}

	static SDValue combineBEXTR(SDNode *N, SelectionDAG &DAG,			static SDValue combineBEXTR(SDNode *N, SelectionDAG &DAG,
	TargetLowering::DAGCombinerInfo &DCI,			TargetLowering::DAGCombinerInfo &DCI,
	const X86Subtarget &Subtarget) {			const X86Subtarget &Subtarget) {
	SDValue Op0 = N->getOperand(0);			SDValue Op0 = N->getOperand(0);
	[9 lines not shown]
	if (auto *Cst1 = dyn_cast<ConstantSDNode>(Op1)) {			if (auto *Cst1 = dyn_cast<ConstantSDNode>(Op1)) {
	// Reduce Cst1 to the bottom 16-bits.			// Reduce Cst1 to the bottom 16-bits.
	// NOTE: SimplifyDemandedBits won't do this for constants.			// NOTE: SimplifyDemandedBits won't do this for constants.
	const APInt &Val1 = Cst1->getAPIntValue();			const APInt &Val1 = Cst1->getAPIntValue();
	APInt MaskedVal1 = Val1 & 0xFFFF;			APInt MaskedVal1 = Val1 & 0xFFFF;
	if (MaskedVal1 != Val1)			if (MaskedVal1 != Val1)
	return DAG.getNode(X86ISD::BEXTR, SDLoc(N), VT, Op0,			return DAG.getNode(X86ISD::BEXTR, SDLoc(N), VT, Op0,
	DAG.getConstant(MaskedVal1, SDLoc(N), VT));			DAG.getConstant(MaskedVal1, SDLoc(N), VT));
	}			}

	// Only bottom 16-bits of the control bits are required.			// Only bottom 16-bits of the control bits are required.
	KnownBits Known;			KnownBits Known;
	APInt DemandedMask(APInt::getLowBitsSet(NumBits, 16));			APInt DemandedMask(APInt::getLowBitsSet(NumBits, 16));
	if (TLI.SimplifyDemandedBits(Op1, DemandedMask, Known, TLO)) {			if (TLI.SimplifyDemandedBits(Op1, DemandedMask, Known, TLO)) {
	DCI.CommitTargetLoweringOpt(TLO);			DCI.CommitTargetLoweringOpt(TLO);
	return SDValue(N, 0);			return SDValue(N, 0);
	}			}
	[11 lines not shown]
	/// undefined elements even if the input parameter does. This makes it suitable			/// undefined elements even if the input parameter does. This makes it suitable
	/// to be used as a replacement operand with operations (eg, bitwise-and) where			/// to be used as a replacement operand with operations (eg, bitwise-and) where
	/// an undef should not propagate.			/// an undef should not propagate.
	static SDValue getNullFPConstForNullVal(SDValue V, SelectionDAG &DAG,			static SDValue getNullFPConstForNullVal(SDValue V, SelectionDAG &DAG,
	const X86Subtarget &Subtarget) {			const X86Subtarget &Subtarget) {
	if (!isNullFPScalarOrVectorConst(V))			if (!isNullFPScalarOrVectorConst(V))
	return SDValue();			return SDValue();

	if (V.getValueType().isVector())			if (V.getValueType().isVector())
	return getZeroVector(V.getSimpleValueType(), Subtarget, DAG, SDLoc(V));			return getZeroVector(V.getSimpleValueType(), Subtarget, DAG, SDLoc(V));

	return V;			return V;
	}			}

	static SDValue combineFAndFNotToFAndn(SDNode *N, SelectionDAG &DAG,			static SDValue combineFAndFNotToFAndn(SDNode *N, SelectionDAG &DAG,
	const X86Subtarget &Subtarget) {			const X86Subtarget &Subtarget) {
	SDValue N0 = N->getOperand(0);			SDValue N0 = N->getOperand(0);
	SDValue N1 = N->getOperand(1);			SDValue N1 = N->getOperand(1);
	EVT VT = N->getValueType(0);			EVT VT = N->getValueType(0);
	SDLoc DL(N);			SDLoc DL(N);

	// Vector types are handled in combineANDXORWithAllOnesIntoANDNP().			// Vector types are handled in combineANDXORWithAllOnesIntoANDNP().
	if (!((VT == MVT::f32 && Subtarget.hasSSE1()) ||			if (!((VT == MVT::f32 && Subtarget.hasSSE1()) ||
	(VT == MVT::f64 && Subtarget.hasSSE2()) ||			(VT == MVT::f64 && Subtarget.hasSSE2()) ||
				RKSimon: Why did you create the lambda? Why not just inline Negate?
				eraman: Carryover from an initial version where I thought the lambda made sense.
	(VT == MVT::v4f32 && Subtarget.hasSSE1() && !Subtarget.hasSSE2())))			(VT == MVT::v4f32 && Subtarget.hasSSE1() && !Subtarget.hasSSE2())))
	return SDValue();			return SDValue();

	auto isAllOnesConstantFP = [](SDValue V) {			auto isAllOnesConstantFP = [](SDValue V) {
	if (V.getSimpleValueType().isVector())			if (V.getSimpleValueType().isVector())
	return ISD::isBuildVectorAllOnes(V.getNode());			return ISD::isBuildVectorAllOnes(V.getNode());
	auto *C = dyn_cast<ConstantFPSDNode>(V);			auto *C = dyn_cast<ConstantFPSDNode>(V);
	return C && C->getConstantFPValue()->isAllOnesValue();			return C && C->getConstantFPValue()->isAllOnesValue();
	... (49 lines not shown) ...
	// F[X]OR(0.0, x) -> x			// F[X]OR(0.0, x) -> x
	if (isNullFPScalarOrVectorConst(N->getOperand(0)))			if (isNullFPScalarOrVectorConst(N->getOperand(0)))
	return N->getOperand(1);			return N->getOperand(1);

	// F[X]OR(x, 0.0) -> x			// F[X]OR(x, 0.0) -> x
	if (isNullFPScalarOrVectorConst(N->getOperand(1)))			if (isNullFPScalarOrVectorConst(N->getOperand(1)))
	return N->getOperand(0);			return N->getOperand(0);

	if (isFNEG(N))			if (isFNEG(DAG, N))
	if (SDValue NewVal = combineFneg(N, DAG, Subtarget))			if (SDValue NewVal = combineFneg(N, DAG, Subtarget))
	return NewVal;			return NewVal;

	return lowerX86FPLogicOp(N, DAG, Subtarget);			return lowerX86FPLogicOp(N, DAG, Subtarget);
	}			}

	/// Do target-specific dag combines on X86ISD::FMIN and X86ISD::FMAX nodes.			/// Do target-specific dag combines on X86ISD::FMIN and X86ISD::FMAX nodes.
	static SDValue combineFMinFMax(SDNode *N, SelectionDAG &DAG) {			static SDValue combineFMinFMax(SDNode *N, SelectionDAG &DAG) {
	... (640 lines not shown) ...
	EVT ScalarVT = VT.getScalarType();			EVT ScalarVT = VT.getScalarType();
	if ((ScalarVT != MVT::f32 && ScalarVT != MVT::f64) || !Subtarget.hasAnyFMA())			if ((ScalarVT != MVT::f32 && ScalarVT != MVT::f64) || !Subtarget.hasAnyFMA())
	return SDValue();			return SDValue();

	SDValue A = N->getOperand(0);			SDValue A = N->getOperand(0);
	SDValue B = N->getOperand(1);			SDValue B = N->getOperand(1);
	SDValue C = N->getOperand(2);			SDValue C = N->getOperand(2);

	auto invertIfNegative = [](SDValue &V) {			auto invertIfNegative = [&DAG](SDValue &V) {
	if (SDValue NegVal = isFNEG(V.getNode())) {			if (SDValue NegVal = isFNEG(DAG, V.getNode())) {
	V = NegVal;			V = NegVal;
	return true;			return true;
	}			}
	return false;			return false;
	};			};

	// Do not convert the passthru input of scalar intrinsics.			// Do not convert the passthru input of scalar intrinsics.
	// FIXME: We could allow negations of the lower element only.			// FIXME: We could allow negations of the lower element only.
	... (80 lines not shown) ...
	}			}

	// Combine FMADDSUB(A, B, FNEG(C)) -> FMSUBADD(A, B, C)			// Combine FMADDSUB(A, B, FNEG(C)) -> FMSUBADD(A, B, C)
	static SDValue combineFMADDSUB(SDNode *N, SelectionDAG &DAG,			static SDValue combineFMADDSUB(SDNode *N, SelectionDAG &DAG,
	const X86Subtarget &Subtarget) {			const X86Subtarget &Subtarget) {
	SDLoc dl(N);			SDLoc dl(N);
	EVT VT = N->getValueType(0);			EVT VT = N->getValueType(0);

	SDValue NegVal = isFNEG(N->getOperand(2).getNode());			SDValue NegVal = isFNEG(DAG, N->getOperand(2).getNode());
	if (!NegVal)			if (!NegVal)
	return SDValue();			return SDValue();

	unsigned NewOpcode;			unsigned NewOpcode;
	switch (N->getOpcode()) {			switch (N->getOpcode()) {
	default: llvm_unreachable("Unexpected opcode!");			default: llvm_unreachable("Unexpected opcode!");
	case X86ISD::FMADDSUB: NewOpcode = X86ISD::FMSUBADD; break;			case X86ISD::FMADDSUB: NewOpcode = X86ISD::FMSUBADD; break;
	case X86ISD::FMADDSUB_RND: NewOpcode = X86ISD::FMSUBADD_RND; break;			case X86ISD::FMADDSUB_RND: NewOpcode = X86ISD::FMSUBADD_RND; break;
	... (2,721 lines not shown) ...
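
As a hedged illustration of the combine named in the comment on combineFMADDSUB, FMADDSUB(A, B, FNEG(C)) -> FMSUBADD(A, B, C), the following scalar C++ sketch models the two operations lane by lane. The type `Vec4` and the functions `fmaddsub`, `fmsubadd`, `negate`, `sameBits`, and `demoFmaddsub` are hypothetical names made up for this illustration, not LLVM APIs. The model assumes the usual x86 lane convention: FMADDSUB subtracts C in even lanes and adds it in odd lanes, and FMSUBADD does the opposite.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>

// Hypothetical 4-lane model of a <4 x double> value; not an LLVM type.
using Vec4 = std::array<double, 4>;

// Lane-wise negation, standing in for an FNEG node.
Vec4 negate(const Vec4 &v) {
  Vec4 r{};
  for (std::size_t i = 0; i < v.size(); ++i)
    r[i] = -v[i];
  return r;
}

// Assumed FMADDSUB semantics: even lanes a*b - c, odd lanes a*b + c.
Vec4 fmaddsub(const Vec4 &a, const Vec4 &b, const Vec4 &c) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = std::fma(a[i], b[i], (i % 2 == 0) ? -c[i] : c[i]);
  return r;
}

// Assumed FMSUBADD semantics: even lanes a*b + c, odd lanes a*b - c.
Vec4 fmsubadd(const Vec4 &a, const Vec4 &b, const Vec4 &c) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = std::fma(a[i], b[i], (i % 2 == 0) ? c[i] : -c[i]);
  return r;
}

// Bitwise comparison, so that the sign of zero is checked as well.
bool sameBits(const Vec4 &x, const Vec4 &y) {
  return std::memcmp(x.data(), y.data(), sizeof(double) * x.size()) == 0;
}

// Small self-check on arbitrary sample inputs.
void demoFmaddsub() {
  Vec4 a = {1.5, -2.0, 0.25, 3.0};
  Vec4 b = {4.0, 0.5, -8.0, 1.0};
  Vec4 c = {0.125, 7.0, -1.0, 2.5};
  assert(sameBits(fmaddsub(a, b, negate(c)), fmsubadd(a, b, c)));
}
```

In this model, negating C only swaps which lanes add and which lanes subtract, which is why the negation can be folded into the opcode instead of being emitted as a separate instruction.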

test/CodeGen/X86/avx2-fma-fneg-combine.ll

... (112 lines not shown) ...	entry:
ret <2 x double> %sub.i		ret <2 x double> %sub.i
}		}

declare <2 x double> @llvm.x86.fma.vfmadd.pd(<2 x double> %a, <2 x double> %b, <2 x double> %c)		declare <2 x double> @llvm.x86.fma.vfmadd.pd(<2 x double> %a, <2 x double> %b, <2 x double> %c)

define <8 x float> @test7(float %a, <8 x float> %b, <8 x float> %c) {		define <8 x float> @test7(float %a, <8 x float> %b, <8 x float> %c) {
; X32-LABEL: test7:		; X32-LABEL: test7:
; X32: # %bb.0: # %entry		; X32: # %bb.0: # %entry
; X32-NEXT: vmovss {{.*#+}} xmm2 = mem[0],zero,zero,zero		; X32-NEXT: vbroadcastss {{[0-9]+}}(%esp), %ymm2
; X32-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero		; X32-NEXT: vfnmadd213ps {{.*#+}} ymm0 = -(ymm2 * ymm0) + ymm1
; X32-NEXT: vsubps %ymm2, %ymm3, %ymm2
; X32-NEXT: vbroadcastss %xmm2, %ymm2
; X32-NEXT: vfmadd213ps {{.*#+}} ymm0 = (ymm2 * ymm0) + ymm1
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test7:		; X64-LABEL: test7:
; X64: # %bb.0: # %entry		; X64: # %bb.0: # %entry
; X64-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
; X64-NEXT: vmovss {{.*#+}} xmm3 = mem[0],zero,zero,zero
; X64-NEXT: vsubps %ymm0, %ymm3, %ymm0
; X64-NEXT: vbroadcastss %xmm0, %ymm0		; X64-NEXT: vbroadcastss %xmm0, %ymm0
; X64-NEXT: vfmadd213ps {{.*#+}} ymm0 = (ymm1 * ymm0) + ymm2		; X64-NEXT: vfnmadd213ps {{.*#+}} ymm0 = -(ymm1 * ymm0) + ymm2
; X64-NEXT: retq		; X64-NEXT: retq
entry:		entry:
%0 = insertelement <8 x float> undef, float %a, i32 0		%0 = insertelement <8 x float> undef, float %a, i32 0
		RKSimon: Sorry, I should have asked you to commit this test to trunk as well so this patch shows the codegen diff.
%1 = fsub <8 x float> <float -0.000000e+00, float undef, float undef, float undef, float undef, float undef, float undef, float undef>, %0		%1 = fsub <8 x float> <float -0.000000e+00, float undef, float undef, float undef, float undef, float undef, float undef, float undef>, %0
%2 = shufflevector <8 x float> %1, <8 x float> undef, <8 x i32> zeroinitializer		%2 = shufflevector <8 x float> %1, <8 x float> undef, <8 x i32> zeroinitializer
%3 = tail call <8 x float> @llvm.fma.v8f32(<8 x float> %2, <8 x float> %b, <8 x float> %c)		%3 = tail call <8 x float> @llvm.fma.v8f32(<8 x float> %2, <8 x float> %b, <8 x float> %c)
ret <8 x float> %3		ret <8 x float> %3

}		}

		define <8 x float> @test8(float %a, <8 x float> %b, <8 x float> %c) {
		RKSimon: Please can you commit this test to trunk so the patch shows the codegen diff?
		eraman: Forgot to respond here, but I have committed the test and rebased the patch after the commit.
		; X32-LABEL: test8:
		; X32: # %bb.0: # %entry
		; X32-NEXT: vbroadcastss {{[0-9]+}}(%esp), %ymm2
		; X32-NEXT: vfnmadd213ps {{.*#+}} ymm0 = -(ymm2 * ymm0) + ymm1
		; X32-NEXT: retl
		;
		; X64-LABEL: test8:
		; X64: # %bb.0: # %entry
		; X64-NEXT: vbroadcastss %xmm0, %ymm0
		; X64-NEXT: vfnmadd213ps {{.*#+}} ymm0 = -(ymm1 * ymm0) + ymm2
		; X64-NEXT: retq
		entry:
		%0 = fsub float -0.0, %a
		%1 = insertelement <8 x float> undef, float %0, i32 0
		%2 = shufflevector <8 x float> %1, <8 x float> undef, <8 x i32> zeroinitializer
		%3 = tail call <8 x float> @llvm.fma.v8f32(<8 x float> %2, <8 x float> %b, <8 x float> %c)
		ret <8 x float> %3
		}

declare <8 x float> @llvm.fma.v8f32(<8 x float> %a, <8 x float> %b, <8 x float> %c)		declare <8 x float> @llvm.fma.v8f32(<8 x float> %a, <8 x float> %b, <8 x float> %c)
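
The reasoning behind test7 and test8 can be sketched in plain C++, under stated assumptions. test7 negates a vector and then splats lane 0; test8 negates the scalar and then splats it. In both cases the FMA sees a splat of a negated value, so the result should match a negated multiply-add of the un-negated splat, which is what the new vfnmadd213ps CHECK lines encode. The names `Vec8`, `splat`, `splatOfNegate`, `negateThenSplat`, `fmadd`, `fnmadd`, `sameBits`, and `demoSplatNegate` are hypothetical and exist only for this illustration; this is a scalar model, not the DAG combine itself.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>

// Hypothetical 8-lane model of <8 x float>; none of these names are LLVM APIs.
using Vec8 = std::array<float, 8>;

// Broadcast one scalar to every lane (insertelement + zero-mask shufflevector).
Vec8 splat(float x) {
  Vec8 v{};
  v.fill(x);
  return v;
}

// test8 shape: negate the scalar with "fsub -0.0, a", then splat it.
Vec8 splatOfNegate(float a) { return splat(-0.0f - a); }

// test7 shape: negate a vector whose lane 0 holds a, then splat lane 0.
// The remaining lanes are undef in the IR; here they hold an arbitrary
// value that never reaches the result.
Vec8 negateThenSplat(float a) {
  Vec8 v{};
  v.fill(123.0f);
  v[0] = a;
  for (float &lane : v)
    lane = -0.0f - lane;
  return splat(v[0]);
}

// Lane-wise fused multiply-add: a*b + c.
Vec8 fmadd(const Vec8 &a, const Vec8 &b, const Vec8 &c) {
  Vec8 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = std::fma(a[i], b[i], c[i]);
  return r;
}

// Lane-wise fused negated multiply-add: -(a*b) + c.
Vec8 fnmadd(const Vec8 &a, const Vec8 &b, const Vec8 &c) {
  Vec8 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = std::fma(-a[i], b[i], c[i]);
  return r;
}

// Bitwise comparison, so that the sign of zero is checked as well.
bool sameBits(const Vec8 &x, const Vec8 &y) {
  return std::memcmp(x.data(), y.data(), sizeof(float) * x.size()) == 0;
}

// Small self-check on arbitrary sample inputs.
void demoSplatNegate() {
  Vec8 b = {1.5f, -2.0f, 0.0f, 3.25f, -0.5f, 8.0f, 0.001f, -7.0f};
  Vec8 c = {0.5f, 1.0f, -0.0f, 2.0f, 4.0f, -8.0f, 16.0f, 0.25f};
  Vec8 expected = fnmadd(splat(2.5f), b, c);
  assert(sameBits(fmadd(splatOfNegate(2.5f), b, c), expected));
  assert(sameBits(fmadd(negateThenSplat(2.5f), b, c), expected));
}
```

Because both orderings feed the same operand to the FMA in this model, recognizing the negate through the splat lets the separate subtract be dropped, which matches the removed vsubps lines in the test7 diff above.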


Revision Contents

Diff 154035

lib/Target/X86/X86ISelLowering.cpp

test/CodeGen/X86/avx2-fma-fneg-combine.ll
