This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

llvm/
  lib/Target/X86/
    X86ISelLowering.cpp
  test/CodeGen/X86/
    avx2-fma-fneg-combine.ll
    fma-fneg-combine-2.ll

Differential D146494

[X86] Combine constant vector inputs for FMA
ClosedPublic

Authored by e-kud on Mar 20 2023, 7:49 PM.


Details

Reviewers

pengfei
goldstein.w.n
RKSimon

Commits

rGc5276f772890: [X86] Combine constant vector inputs for FMA

Summary

Inspired by https://discourse.llvm.org/t/folding-memory-into-fma/69217
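The combine relies on a simple identity: an FMA whose multiplicand is the constant `-C` computes the same value as a negated-product FMA (x86 `vfnmadd`, i.e. `-(a * b) + c`) that uses `C`, so only one of `C` and `-C` has to be materialized. A minimal scalar sketch, assuming `std::fma` as the reference for fused semantics; `fnmadd` and `negatedConstantMatches` are illustrative helpers, not LLVM or intrinsic APIs:

```cpp
#include <cassert>
#include <cmath>

// Illustrative helper modelling x86 VFNMADD semantics, -(a * b) + c,
// with a single rounding like a real fused multiply-add.
double fnmadd(double a, double b, double c) { return std::fma(-a, b, c); }

// fma(a, -k, c) and fnmadd(a, k, c) round the same exact value,
// a * (-k) + c, so they produce the same result for finite inputs.
bool negatedConstantMatches(double a, double k, double c) {
  assert(std::isfinite(a) && std::isfinite(k) && std::isfinite(c));
  return std::fma(a, -k, c) == fnmadd(a, k, c);
}
```

Because negation is exact, both sides round one identical exact value, so the check does not depend on the particular constants (for example the `0.5` / `-0.5` pair used in `test9`).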

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

e-kud created this revision. Mar 20 2023, 7:49 PM

Herald added a project: Restricted Project. Mar 20 2023, 7:49 PM

Herald added subscribers: pengfei, hiraditya.

Harbormaster completed remote builds in B220615: Diff 506827. Mar 20 2023, 7:50 PM

Limit to constant splat vectors.

Harbormaster completed remote builds in B221185: Diff 507565. Mar 22 2023, 5:55 PM

Rebased

e-kud added a parent revision: D147017: [X86] Add a fneg test for fma with a splat constant vector. Mar 27 2023, 7:05 PM

e-kud published this revision for review. Mar 27 2023, 7:09 PM

e-kud added reviewers: pengfei, goldstein.w.n, RKSimon.

Herald added a project: Restricted Project. Mar 27 2023, 7:09 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B222149: Diff 508868. Mar 27 2023, 7:29 PM

pengfei added inline comments. Mar 27 2023, 7:36 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
54037	Why just limit to splat constant?

RKSimon added inline comments. Mar 28 2023, 3:49 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
54037	Also - why the AVX2 and hasOneUse limits?

RKSimon added inline comments. Mar 28 2023, 3:51 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
54037	Sorry - the hasOneUse is probably necessary to stop ping-ponging between negated constants

goldstein.w.n added inline comments. Mar 28 2023, 10:20 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
54037	> Sorry - the hasOneUse is probably necessary to stop ping-ponging between negated constants
	That should be commented explicitly, as these subtle checks that prevent infinite loops can easily be removed by accident. Also, why AVX2 still?
54037	> Also - why the AVX2 and hasOneUse limits?
54037	> Why just limit to splat constant?
	Should be match on ImmConstant no?

e-kud added inline comments. Mar 28 2023, 7:02 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
54037	> Also - why the AVX2 and hasOneUse limits?
	> Why just limit to splat constant?
	This is to limit the combine to broadcasts; it is probably better to work with broadcasts explicitly. I tried a more general approach, but it does not seem very profitable. E.g.:
	@@ -1,7 +1,7 @@
	-; FMA3-LABEL: negated_constant_v4f64:
	 ; FMA3:       # %bb.0:
	 ; FMA3-NEXT:    vmovapd {{.*#+}} ymm2 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
	-; FMA3-NEXT:    vfmadd213pd {{.*#+}} ymm2 = (ymm0 * ymm2) + ymm1
	-; FMA3-NEXT:    vfmadd231pd {{.*#+}} ymm1 = (ymm0 * mem) + ymm1
	-; FMA3-NEXT:    vaddpd %ymm1, %ymm2, %ymm0
	+; FMA3-NEXT:    vmovapd %ymm2, %ymm3
	+; FMA3-NEXT:    vfmadd213pd {{.*#+}} ymm3 = (ymm0 * ymm3) + ymm1
	+; FMA3-NEXT:    vfnmadd213pd {{.*#+}} ymm2 = -(ymm0 * ymm2) + ymm1
	+; FMA3-NEXT:    vaddpd %ymm2, %ymm3, %ymm0
	 ; FMA3-NEXT:    retq
	What do you think?
54037	> Should be match on ImmConstant no?
	AFAIK, `m_*` matchers work on IR, but here we are working with the DAG. Am I missing something?
54037	> Sorry - the hasOneUse is probably necessary to stop ping-ponging between negated constants
	Yes. The problem appears when each constant initially has several users; then we start ping-ponging between them. To solve it thoroughly, we need to check whether a constant can be eliminated completely: take the two constant vectors and check that all of their users are FMAs. That seems outside the combiner's context to me, doesn't it? However, I see some code that checks an instruction's users during combining, so maybe that is the right way.
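To illustrate the ping-ponging, here is a toy model (hypothetical, not LLVM code) in which each FMA user refers either to a constant `K` or to its negation `-K`, and both constants are assumed to stay alive because they have other users. A rule that switches to the other constant whenever it exists flips every user on every pass and never reaches a fixed point; a rule that always rewrites toward one canonical constant settles after a single pass. Choosing `-K` as the canonical constant is arbitrary here.

```cpp
#include <cassert>
#include <vector>

// Toy FMA user: refers to the constant K (false) or to -K (true).
struct FmaUser {
  bool usesNegated;
};

// Naive rule: "if the other constant exists, switch to it". Every user is
// flipped on every pass, so the pass always reports a change.
bool naivePass(std::vector<FmaUser> &users) {
  bool changed = false;
  for (FmaUser &u : users) {
    u.usesNegated = !u.usesNegated;
    changed = true;
  }
  return changed;
}

// Canonical rule: only ever rewrite toward one preferred constant (-K here).
// Users already on it are left alone, so a second pass changes nothing.
bool canonicalPass(std::vector<FmaUser> &users) {
  bool changed = false;
  for (FmaUser &u : users) {
    if (!u.usesNegated) {
      u.usesNegated = true;
      changed = true;
    }
  }
  return changed;
}

// Runs passes until one makes no change or the budget is exhausted; returns
// the number of passes that changed something.
template <typename Pass>
int runToFixedPoint(std::vector<FmaUser> users, Pass pass, int budget) {
  int changingPasses = 0;
  while (changingPasses < budget && pass(users))
    ++changingPasses;
  return changingPasses;
}
```

With the naive rule the loop only stops because of the budget, which is the infinite-loop hazard that the `hasOneUse` check, or a full all-users check with a fixed preference, guards against.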

Addressed the comments.

pengfei added inline comments. Mar 30 2023, 7:58 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
53984–53985	Do we need to check it again given it's the only condition checked in line 54073?
53986	This doesn't explain why we need to iterate all its uses?
53988	Is `==` intended? Shouldn't it be covered by `User->getOpcode() != ISD::FMA`?
53996	Maybe use `cast`?
53999–54000	Any case can fall into here?
54008–54010	Is the comment for line 54015? Why check the uses again?
54012	ditto.
llvm/test/CodeGen/X86/fma-fneg-combine-2.ll
132	We can make the elements different now, right?

Harbormaster completed remote builds in B222901: Diff 509881. Mar 30 2023, 8:51 PM

Addressed comments.
Added comments.
Fixed negative value search in case of undefs.
Updated tests.

e-kud added inline comments. Mar 31 2023, 8:39 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
53984–53985	> Do we need to check it again given it's the only condition checked in line 54073?
	Dropped it.
53986	> This doesn't explain why we need to iterate all its uses?
	I added comments describing the approach. Generally, we don't want to optimize anything if there are other non-FMA consumers of the constant `BUILD_VECTOR`, because it will be loaded anyway. So we want to understand which one (the inverted or the original) can be eliminated. If both can be eliminated, we need to settle on a particular vector, because we have no context to tell which one is the original and which one is inverted, even among a single instruction's operands. If there is a concern about the complexity of this check, it can be replaced with a simpler `hasOneUse`, though that limits the scope.
53996	> Maybe use `cast`?
53999–54000	> Maybe use `cast`?
	> Any case can fall into here?
	It is completely OK to have a constant vector with `undef` values, e.g. `<2 x double> <double undef, double undef>`; there are some tests with such vectors. So we can't use `cast` here. I assumed that such a constant may appear during optimization.
54008–54010	I've described the approach above. Here we check the users of the inverted vector.
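The approach described in the `53986` reply can be summarized as a small decision function. This is a sketch under simplifying assumptions: `Opcode`, `Choice`, and `chooseConstant` are hypothetical stand-ins, and the real combine walks `SDNode` use lists and returns the inverted `BUILD_VECTOR` rather than an enum.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-ins for DAG opcodes; only "is it an FMA" matters here.
enum class Opcode { FMA, FAdd, Store };

// A constant is only worth replacing if every user is an FMA; otherwise it
// has to be materialized anyway.
bool allUsersAreFMA(const std::vector<Opcode> &users) {
  for (Opcode op : users)
    if (op != Opcode::FMA)
      return false;
  return true;
}

enum class Choice { KeepOriginal, UseInverted };

// `invertedUsers` is std::nullopt when the inverted vector does not exist in
// the DAG; `originalPreferred` stands for the tie-break applied when both
// vectors could be eliminated.
Choice chooseConstant(const std::vector<Opcode> &originalUsers,
                      std::optional<std::vector<Opcode>> invertedUsers,
                      bool originalPreferred) {
  assert(!originalUsers.empty() && "the original vector must be in use");
  if (!allUsersAreFMA(originalUsers))
    return Choice::KeepOriginal;  // It will be loaded anyway.
  if (!invertedUsers)
    return Choice::KeepOriginal;  // Nothing to reuse.
  if (!allUsersAreFMA(*invertedUsers))
    return Choice::UseInverted;   // Only the original can go away.
  // Both could be eliminated: pick one deterministically to avoid ping-pong.
  return originalPreferred ? Choice::KeepOriginal : Choice::UseInverted;
}
```

Replacing `allUsersAreFMA` with a single-use check corresponds to the simpler `hasOneUse` variant mentioned in the reply: cheaper, but it skips constants shared by several FMAs.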

Harbormaster completed remote builds in B223106: Diff 510166. Mar 31 2023, 9:55 PM

Thanks @e-kud, LGTM.

This revision is now accepted and ready to land. Mar 31 2023, 10:57 PM

Rebase

e-kud requested review of this revision. Apr 7 2023, 5:33 PM

Harbormaster completed remote builds in B224316: Diff 511831. Apr 7 2023, 5:55 PM

RKSimon added inline comments. Apr 13 2023, 8:16 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
54178	Couldn't both V and NV have negative values now that they aren't splats?

e-kud marked an inline comment as done. Apr 14 2023, 8:04 PM

e-kud added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
54178	It seems to me that they can't. If `V` is `<x0, x1, ..., xn>`, then `NV` is `<-x0, -x1, ..., -xn>`, so each element of `V` has the opposite sign of the corresponding element of `NV`. The corner case is a `V` consisting only of `undef`s, but the original and negated versions of such a vector are the same, so everything should be handled correctly.
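A minimal sketch of the negation being discussed, assuming undef lanes are modelled as `std::nullopt` (`ConstVec` and `invertConstVec` are illustrative names, not LLVM types). Defined lanes are negated and undef lanes stay undef, which is also why the patch uses `dyn_cast` with an undef fallback rather than `cast`; the `<double -5.000000e-01, double undef, ...>` constant in `test10` is such a vector.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of a constant BUILD_VECTOR: a double is a ConstantFP
// lane, std::nullopt stands in for an undef lane.
using ConstVec = std::vector<std::optional<double>>;

// Negate every defined lane and keep undef lanes undef, so the result has
// undefs in exactly the same positions as the input.
ConstVec invertConstVec(const ConstVec &v) {
  ConstVec out;
  out.reserve(v.size());
  for (const std::optional<double> &lane : v)
    out.push_back(lane ? std::optional<double>(-*lane) : std::nullopt);
  assert(out.size() == v.size());
  return out;
}
```

Applying the function twice returns the original vector, and an all-undef vector is its own inverse, which is the corner case mentioned in the reply.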

goldstein.w.n added inline comments. Apr 25 2023, 4:46 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
54173	persistently
54178	But couldn't both `V` and `NV` contain some negative and some positive values?

e-kud marked 8 inline comments as done. Apr 26 2023, 7:11 PM

e-kud added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
54178	The idea here is to prefer the vector whose first operand is negative; we don't care about the other operands. A given element can be negative in only one of `V` and `NV`, not in both. The pitfall is that instead of an FP value we may encounter an undef, which is why we iterate over the operands until the first non-undef value is found. `V` and `NV` have undefs in the same positions. So `V` and `NV` may each contain both negative and positive values, but we consider only the first non-undef operand.
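The tie-break can be sketched as follows, again modelling undef lanes as `std::nullopt` (`Lanes` and `isPreferredVector` are hypothetical names). Because undef lanes sit in the same positions in `V` and `NV`, their first defined lanes have opposite signs, so exactly one of the two is preferred whenever any lane is defined; that is what keeps the choice consistent.

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <vector>

// Hypothetical lane model: std::nullopt stands in for an undef lane.
using Lanes = std::vector<std::optional<double>>;

// Tie-break used when both a vector and its negation could be eliminated:
// keep the one whose first defined lane is negative (sign bit set) and
// ignore later lanes. An all-undef vector is its own negation, so the answer
// does not matter for it; this sketch returns false.
bool isPreferredVector(const Lanes &v) {
  assert(!v.empty() && "vectors have at least one lane");
  for (const std::optional<double> &lane : v) {
    if (!lane)
      continue;  // Skip undef lanes; they line up in V and NV.
    return std::signbit(*lane);
  }
  return false;
}
```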

Rebase. Spelling.

e-kud marked 2 inline comments as done. May 12 2023, 10:05 AM

Harbormaster completed remote builds in B231637: Diff 521700. May 12 2023, 11:18 AM

@e-kud do you want me to commit it for you? Or are you waiting for another approval?

In D146494#4339563, @pengfei wrote:

@e-kud do you want me to commit it for you? Or are you waiting for another approval?

@pengfei yes, I'd appreciate it! I additionally wanted to check on SPEC: there are minor binary size reductions and several cases where this combine applies, both eliminating broadcasts and using a register operand instead of memory.

craig.topper added a subscriber: craig.topper. May 15 2023, 5:20 PM

craig.topper added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
54487	Can we use `llvm::any_of` or `!llvm::all_of` with a lambda?
54495	Capitalize `op`
54510	Can we use `llvm::any_of` or `!llvm::all_of` with a lambda?
54514	"consistently" is probably a better word than "peristently"
54517	llvm::any_of

Addressed the comments.

e-kud marked 5 inline comments as done. May 15 2023, 7:27 PM

e-kud added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
54514	Indeed :)
54517	I don't think that `any_of` is much better here. We need to stop traversing after the first non-undef value (this is controlled by the lambda's return value) and return `true` or `false` (through a variable from the enclosing context). We would then ignore the result of `any_of` and use only that variable. It looks artificial to me. Probably I'm missing something. I added an extra comment here.
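For comparison, the two shapes under discussion can be written side by side. The first mirrors the loop with a `break`; the second is a hypothetical alternative (not what the patch does) that uses `std::find_if` to locate the first defined lane, which avoids the captured result variable that makes the `any_of` form feel artificial. Undef lanes are again modelled as `std::nullopt`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <optional>
#include <vector>

// Hypothetical lane model: std::nullopt stands in for an undef lane.
using Lanes = std::vector<std::optional<double>>;

// Loop form: stop at the first defined lane and record its sign.
bool firstDefinedIsNegativeLoop(const Lanes &v) {
  bool isNegative = false;
  for (const std::optional<double> &lane : v) {
    if (!lane)
      continue;
    isNegative = std::signbit(*lane);
    break;
  }
  return isNegative;
}

// find_if form: the predicate has no side effects; the sign test happens
// on the returned iterator.
bool firstDefinedIsNegativeFindIf(const Lanes &v) {
  auto it = std::find_if(v.begin(), v.end(),
                         [](const std::optional<double> &lane) {
                           return lane.has_value();
                         });
  return it != v.end() && std::signbit(**it);
}
```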

craig.topper added inline comments. May 15 2023, 7:37 PM

llvm/lib/Target/X86/X86ISelLowering.cpp
54517	Oops. I think I missed the break statement.

Harbormaster completed remote builds in B232172: Diff 522410. May 15 2023, 8:18 PM

This revision was not accepted when it landed; it landed in state Needs Review. May 17 2023, 12:21 AM

Closed by commit rGc5276f772890: [X86] Combine constant vector inputs for FMA (authored by e-kud, committed by pengfei).

This revision was automatically updated to reflect the committed changes.

pengfei added a commit: rGc5276f772890: [X86] Combine constant vector inputs for FMA.

Revision Contents

Path                                              Size
llvm/lib/Target/X86/X86ISelLowering.cpp           46 lines
llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll    28 lines
llvm/test/CodeGen/X86/fma-fneg-combine-2.ll       26 lines

Diff 509881

llvm/lib/Target/X86/X86ISelLowering.cpp


...
   }

   if (SDValue NewAdd = promoteExtBeforeAdd(N, DAG, Subtarget))
     return NewAdd;

   return SDValue();
 }

+static SDValue getInvertedVectorForFMA(SDValue V, SelectionDAG &DAG) {
+  assert(ISD::isBuildVectorOfConstantFPSDNodes(V.getNode()) &&
+         "ConstantFP build vector expected");
+  // Check if we can eliminate a constant completely
+  for (const SDNode *User : V->uses()) {
+    if (User->getOpcode() != ISD::FMA || User->getOpcode() == ISD::STRICT_FMA)
+      return SDValue();
+  }
+  // Form an inverted vector
+  SmallVector<SDValue, 8> Ops;
+  EVT VT = V.getValueType();
+  EVT EltVT = VT.getVectorElementType();
+  for (auto op : V->op_values()) {
+    if (auto *Cst = dyn_cast<ConstantFPSDNode>(op)) {
+      Ops.push_back(DAG.getConstantFP(-Cst->getValueAPF(), SDLoc(op), EltVT));
+    } else {
+      assert(op.isUndef());
+      Ops.push_back(DAG.getUNDEF(EltVT));
+    }
+  }
+
+  SDNode *NV = DAG.getNodeIfExists(ISD::BUILD_VECTOR, DAG.getVTList(VT), Ops);
+  if (!NV)
+    return SDValue();
+
+  // If the inverted value also can be eliminated, we have to persistancy
+  // prefer one of the values. We prefer a constant with negative value on the
+  // first element.
+  for (const SDNode *User : NV->uses())
+    if (User->getOpcode() != ISD::FMA || User->getOpcode() == ISD::STRICT_FMA)
+      return SDValue(NV, 0);
+
+  if (cast<ConstantFPSDNode>(V->getOperand(0))->isNegative())
+    return SDValue();
+
+  return SDValue(NV, 0);
+}

 static SDValue combineFMA(SDNode *N, SelectionDAG &DAG,
                           TargetLowering::DAGCombinerInfo &DCI,
                           const X86Subtarget &Subtarget) {
   SDLoc dl(N);
   EVT VT = N->getValueType(0);
   bool IsStrict = N->isStrictFPOpcode() || N->isTargetStrictFPOpcode();

   // Let legalize expand this if it isn't a legal type yet.
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
   if (!TLI.isTypeLegal(VT))
     return SDValue();

   SDValue A = N->getOperand(IsStrict ? 1 : 0);
   SDValue B = N->getOperand(IsStrict ? 2 : 1);
   SDValue C = N->getOperand(IsStrict ? 3 : 2);

   // If the operation allows fast-math and the target does not support FMA,
   // split this into mul+add to avoid libcall(s).
   SDNodeFlags Flags = N->getFlags();
   if (!IsStrict && Flags.hasAllowReassociation() &&
       TLI.isOperationExpand(ISD::FMA, VT)) {
     SDValue Fmul = DAG.getNode(ISD::FMUL, dl, VT, A, B, Flags);
     return DAG.getNode(ISD::FADD, dl, VT, Fmul, C, Flags);
   }

...
       SDValue Vec = V.getOperand(0);
       if (SDValue NegV = TLI.getCheaperNegatedExpression(
               Vec, DAG, LegalOperations, CodeSize)) {
         V = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SDLoc(V), V.getValueType(),
                         NegV, V.getOperand(1));
         return true;
       }
     }
+    // Lookup if there is an inverted version of constant vector V in DAG.
+    if (ISD::isBuildVectorOfConstantFPSDNodes(V.getNode())) {
+      if (SDValue NegV = getInvertedVectorForFMA(V, DAG)) {
+        V = NegV;
+        return true;
+      }
+    }
     return false;
   };

   // Do not convert the passthru input of scalar intrinsics.
   // FIXME: We could allow negations of the lower element only.
   bool NegA = invertIfNegative(A);
   bool NegB = invertIfNegative(B);
   bool NegC = invertIfNegative(C);
...
                                               DAG, DCI, Subtarget))
     return V;

   if (VT.isVector())
     if (SDValue R = PromoteMaskArithmetic(N, DAG, Subtarget))
       return R;

   if (SDValue NewAdd = promoteExtBeforeAdd(N, DAG, Subtarget))
     return NewAdd;

   if (SDValue R = combineOrCmpEqZeroToCtlzSrl(N, DAG, DCI, Subtarget))
     return R;

   // TODO: Combine with any target/faux shuffle.
   if (N0.getOpcode() == X86ISD::PACKUS && N0.getValueSizeInBits() == 128 &&
       VT.getScalarSizeInBits() == N0.getOperand(0).getScalarValueSizeInBits()) {
     SDValue N00 = N0.getOperand(0);
     SDValue N01 = N0.getOperand(1);
     unsigned NumSrcEltBits = N00.getScalarValueSizeInBits();
     APInt ZeroMask = APInt::getHighBitsSet(NumSrcEltBits, NumSrcEltBits / 2);
     if ((N00.isUndef() || DAG.MaskedValueIsZero(N00, ZeroMask)) &&
         (N01.isUndef() || DAG.MaskedValueIsZero(N01, ZeroMask))) {
...
   // Fold movmsk(icmp_sgt(x,-1)) -> not(movmsk(x)) to improve folding of movmsk
   // results with scalar comparisons.
   if (Src.getOpcode() == X86ISD::PCMPGT &&
       ISD::isBuildVectorAllOnes(Src.getOperand(1).getNode())) {
     SDLoc DL(N);
     APInt NotMask = APInt::getLowBitsSet(NumBits, NumElts);
     return DAG.getNode(ISD::XOR, DL, VT,
                        DAG.getNode(X86ISD::MOVMSK, DL, VT, Src.getOperand(0)),
                        DAG.getConstant(NotMask, DL, VT));
   }

   // Fold movmsk(icmp_eq(and(x,c1),0)) -> movmsk(not(shl(x,c2)))
   // iff pow2splat(c1).
   // Use KnownBits to determine if only a single bit is non-zero
   // in each element (pow2 or zero), and shift that bit to the msb.
   if (Src.getOpcode() == X86ISD::PCMPEQ &&
       ISD::isBuildVectorAllZeros(Src.getOperand(1).getNode())) {
     KnownBits KnownSrc = DAG.computeKnownBits(Src.getOperand(0));
     if (KnownSrc.countMaxPopulation() == 1) {
       SDLoc DL(N);
       MVT ShiftVT = SrcVT;
       SDValue ShiftSrc = Src.getOperand(0);
       if (ShiftVT.getScalarType() == MVT::i8) {
         // vXi8 shifts - we only care about the signbit so can use PSLLW.
         ShiftVT = MVT::getVectorVT(MVT::i16, NumElts / 2);
         ShiftSrc = DAG.getBitcast(ShiftVT, ShiftSrc);
       }
       unsigned ShiftAmt = KnownSrc.countMinLeadingZeros();
       ShiftSrc = getTargetVShiftByConstNode(X86ISD::VSHLI, DL, ShiftVT,
                                             ShiftSrc, ShiftAmt, DAG);
       ShiftSrc = DAG.getNOT(DL, DAG.getBitcast(SrcVT, ShiftSrc), SrcVT);
       return DAG.getNode(X86ISD::MOVMSK, DL, VT, ShiftSrc);
     }
   }

   // Simplify the inputs.
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
   APInt DemandedMask(APInt::getAllOnes(NumBits));
   if (TLI.SimplifyDemandedBits(SDValue(N, 0), DemandedMask, DCI))
     return SDValue(N, 0);

   return SDValue();
 }

 static SDValue combineX86GatherScatter(SDNode *N, SelectionDAG &DAG,
                                        TargetLowering::DAGCombinerInfo &DCI,
                                        const X86Subtarget &Subtarget) {
...
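For context on how the `invertIfNegative` results are used: the lines elided after `bool NegC = invertIfNegative(C);` choose a negated FMA variant. The sketch below is an assumption about that step, modelled on the x86 forms `vfmadd` (a*b+c), `vfmsub` (a*b-c), `vfnmadd` (-(a*b)+c) and `vfnmsub` (-(a*b)-c): inverting both multiplicands cancels out, so only their parity matters for the product. `FmaKind`, `pickFmaKind` and `evalFma` are hypothetical names, not LLVM helpers.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of the four x86 FMA flavours.
enum class FmaKind { FMAdd, FMSub, FNMAdd, FNMSub };

// After inverting operands, the product changes sign when exactly one of A
// and B was inverted, and the addend changes sign when C was inverted.
FmaKind pickFmaKind(bool negA, bool negB, bool negC) {
  bool negMul = negA != negB;
  if (negMul)
    return negC ? FmaKind::FNMSub : FmaKind::FNMAdd;
  return negC ? FmaKind::FMSub : FmaKind::FMAdd;
}

// Evaluates a flavour on scalars with a single rounding via std::fma.
double evalFma(FmaKind k, double a, double b, double c) {
  switch (k) {
  case FmaKind::FMAdd:  return std::fma(a, b, c);
  case FmaKind::FMSub:  return std::fma(a, b, -c);
  case FmaKind::FNMAdd: return std::fma(-a, b, c);
  case FmaKind::FNMSub: return std::fma(-a, b, -c);
  }
  assert(false && "unknown FmaKind");
  return 0.0;
}
```

In `test9` below, only the multiplicand `0.5` is replaced by the existing `-0.5` (the addend `-0.5` is already the preferred constant), which under this model selects `FNMAdd` and matches the new `vfnmadd213pd` check line.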

llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll

...
 ; X64-NEXT:    retq
   %t3 = tail call nsz <8 x float> @llvm.fma.v8f32(<8 x float> %t2, <8 x float> %b, <8 x float> %c)
   ret <8 x float> %t3
 }

 define <4 x double> @test9(<4 x double> %a) {
 ; X32-LABEL: test9:
 ; X32:       # %bb.0:
 ; X32-NEXT:    vbroadcastsd {{.*#+}} ymm1 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
-; X32-NEXT:    vbroadcastsd {{.*#+}} ymm2 = [5.0E-1,5.0E-1,5.0E-1,5.0E-1]
-; X32-NEXT:    vfmadd213pd {{.*#+}} ymm0 = (ymm2 * ymm0) + ymm1
+; X32-NEXT:    vfnmadd213pd {{.*#+}} ymm0 = -(ymm1 * ymm0) + ymm1
 ; X32-NEXT:    retl
 ;
 ; X64-LABEL: test9:
 ; X64:       # %bb.0:
 ; X64-NEXT:    vbroadcastsd {{.*#+}} ymm1 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
-; X64-NEXT:    vbroadcastsd {{.*#+}} ymm2 = [5.0E-1,5.0E-1,5.0E-1,5.0E-1]
-; X64-NEXT:    vfmadd213pd {{.*#+}} ymm0 = (ymm2 * ymm0) + ymm1
+; X64-NEXT:    vfnmadd213pd {{.*#+}} ymm0 = -(ymm1 * ymm0) + ymm1
 ; X64-NEXT:    retq
   %t = tail call <4 x double> @llvm.fma.v4f64(<4 x double> %a, <4 x double> <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>, <4 x double> <double -5.000000e-01, double -5.000000e-01, double -5.000000e-01, double -5.000000e-01>)
   ret <4 x double> %t
 }

define <4 x double> @test10(<4 x double> %a, <4 x double> %b) {		define <4 x double> @test10(<4 x double> %a, <4 x double> %b) {
; X32-LABEL: test10:		; X32-LABEL: test10:
; X32: # %bb.0:		; X32: # %bb.0:
; X32-NEXT: vbroadcastsd {{.*#+}} ymm2 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]		; X32-NEXT: vbroadcastsd {{.*#+}} ymm2 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
; X32-NEXT: vfmadd213pd {{.*#+}} ymm2 = (ymm0 * ymm2) + ymm1		; X32-NEXT: vmovapd %ymm2, %ymm3
; X32-NEXT: vbroadcastsd {{.*#+}} ymm3 = [5.0E-1,5.0E-1,5.0E-1,5.0E-1]
; X32-NEXT: vfmadd213pd {{.*#+}} ymm3 = (ymm0 * ymm3) + ymm1		; X32-NEXT: vfmadd213pd {{.*#+}} ymm3 = (ymm0 * ymm3) + ymm1
; X32-NEXT: vaddpd %ymm3, %ymm2, %ymm0		; X32-NEXT: vfnmadd213pd {{.*#+}} ymm2 = -(ymm0 * ymm2) + ymm1
		; X32-NEXT: vaddpd %ymm2, %ymm3, %ymm0
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test10:		; X64-LABEL: test10:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: vbroadcastsd {{.*#+}} ymm2 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]		; X64-NEXT: vbroadcastsd {{.*#+}} ymm2 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
; X64-NEXT: vfmadd213pd {{.*#+}} ymm2 = (ymm0 * ymm2) + ymm1		; X64-NEXT: vmovapd %ymm2, %ymm3
; X64-NEXT: vbroadcastsd {{.*#+}} ymm3 = [5.0E-1,5.0E-1,5.0E-1,5.0E-1]
; X64-NEXT: vfmadd213pd {{.*#+}} ymm3 = (ymm0 * ymm3) + ymm1		; X64-NEXT: vfmadd213pd {{.*#+}} ymm3 = (ymm0 * ymm3) + ymm1
; X64-NEXT: vaddpd %ymm3, %ymm2, %ymm0		; X64-NEXT: vfnmadd213pd {{.*#+}} ymm2 = -(ymm0 * ymm2) + ymm1
		; X64-NEXT: vaddpd %ymm2, %ymm3, %ymm0
; X64-NEXT: retq		; X64-NEXT: retq
%t0 = tail call <4 x double> @llvm.fma.v4f64(<4 x double> %a, <4 x double> <double -5.000000e-01, double undef, double -5.000000e-01, double -5.000000e-01>, <4 x double> %b)		%t0 = tail call <4 x double> @llvm.fma.v4f64(<4 x double> %a, <4 x double> <double -5.000000e-01, double undef, double -5.000000e-01, double -5.000000e-01>, <4 x double> %b)
%t1 = tail call <4 x double> @llvm.fma.v4f64(<4 x double> %a, <4 x double> <double 5.000000e-01, double undef, double 5.000000e-01, double 5.000000e-01>, <4 x double> %b)		%t1 = tail call <4 x double> @llvm.fma.v4f64(<4 x double> %a, <4 x double> <double 5.000000e-01, double undef, double 5.000000e-01, double 5.000000e-01>, <4 x double> %b)
%t2 = fadd <4 x double> %t0, %t1		%t2 = fadd <4 x double> %t0, %t1
ret <4 x double> %t2		ret <4 x double> %t2
}		}
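In test10 the two multiplicands are negations of each other; lane 1 is undef in both, so it places no constraint. The new CHECK lines appear to keep only $-C$ and rewrite $t_1$ into the negated-product form. As a worked sketch of that identity:

$$
t_0 = a \cdot (-C) + b, \qquad t_1 = a \cdot C + b = -\bigl(a \cdot (-C)\bigr) + b, \qquad C = \langle 0.5,\ \mathrm{undef},\ 0.5,\ 0.5 \rangle
$$

Both FMAs then read the single $-C$ broadcast. The added `vmovapd` copy is needed because a 213-form FMA overwrites its destination register, which here is the one holding $-C$.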

define <4 x double> @test11(<4 x double> %a) {		define <4 x double> @test11(<4 x double> %a) {
; X32-LABEL: test11:		; X32-LABEL: test11:
; X32: # %bb.0:		; X32: # %bb.0:
; X32-NEXT: vbroadcastsd {{.*#+}} ymm1 = [5.0E-1,5.0E-1,5.0E-1,5.0E-1]		; X32-NEXT: vbroadcastsd {{.*#+}} ymm1 = [5.0E-1,5.0E-1,5.0E-1,5.0E-1]
; X32-NEXT: vaddpd %ymm1, %ymm0, %ymm2		; X32-NEXT: vaddpd %ymm1, %ymm0, %ymm0
; X32-NEXT: vbroadcastsd {{.*#+}} ymm0 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]		; X32-NEXT: vfmsub213pd {{.*#+}} ymm0 = (ymm1 * ymm0) - ymm1
; X32-NEXT: vfmadd231pd {{.*#+}} ymm0 = (ymm1 * ymm2) + ymm0
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test11:		; X64-LABEL: test11:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: vbroadcastsd {{.*#+}} ymm1 = [5.0E-1,5.0E-1,5.0E-1,5.0E-1]		; X64-NEXT: vbroadcastsd {{.*#+}} ymm1 = [5.0E-1,5.0E-1,5.0E-1,5.0E-1]
; X64-NEXT: vaddpd %ymm1, %ymm0, %ymm2		; X64-NEXT: vaddpd %ymm1, %ymm0, %ymm0
; X64-NEXT: vbroadcastsd {{.*#+}} ymm0 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]		; X64-NEXT: vfmsub213pd {{.*#+}} ymm0 = (ymm1 * ymm0) - ymm1
; X64-NEXT: vfmadd231pd {{.*#+}} ymm0 = (ymm1 * ymm2) + ymm0
; X64-NEXT: retq		; X64-NEXT: retq
%t0 = fadd <4 x double> %a, <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>		%t0 = fadd <4 x double> %a, <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>
%t1 = tail call <4 x double> @llvm.fma.v4f64(<4 x double> %t0, <4 x double> <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>, <4 x double> <double -5.000000e-01, double -5.000000e-01, double -5.000000e-01, double -5.000000e-01>)		%t1 = tail call <4 x double> @llvm.fma.v4f64(<4 x double> %t0, <4 x double> <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>, <4 x double> <double -5.000000e-01, double -5.000000e-01, double -5.000000e-01, double -5.000000e-01>)
ret <4 x double> %t1		ret <4 x double> %t1
}		}
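In test11 the FMA's addend is the negation of its multiplicand $C$, and $C$ is already live for the preceding `fadd`. The new CHECK lines are consistent with folding that negation into the subtracting form of the instruction:

$$
t_1 = t_0 \cdot C + (-C) = t_0 \cdot C - C, \qquad t_0 = a + C, \qquad C = \langle 0.5,\ 0.5,\ 0.5,\ 0.5 \rangle
$$

As a result, `vfmsub213pd` reuses the $C$ broadcast, and the separate $-0.5$ broadcast is no longer needed.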

define <4 x double> @test12(<4 x double> %a) {		define <4 x double> @test12(<4 x double> %a) {
; X32-LABEL: test12:		; X32-LABEL: test12:
(45 lines not shown)

llvm/test/CodeGen/X86/fma-fneg-combine-2.ll

(123 lines not shown)
		; FMA4-NEXT: retq
%fma = call nsz float @llvm.fma.f32(float %x, float -42.0, float %m)		%fma = call nsz float @llvm.fma.f32(float %x, float -42.0, float %m)
%nfma = fneg float %fma		%nfma = fneg float %fma
ret float %nfma		ret float %nfma
}		}

define <4 x double> @negated_constant_v4f64(<4 x double> %a) {		define <4 x double> @negated_constant_v4f64(<4 x double> %a) {
; FMA3-LABEL: negated_constant_v4f64:		; FMA3-LABEL: negated_constant_v4f64:
; FMA3: # %bb.0:		; FMA3: # %bb.0:
; FMA3-NEXT: vmovapd {{.*#+}} ymm1 = [5.0E-1,5.0E-1,5.0E-1,5.0E-1]		; FMA3-NEXT: vmovapd {{.*#+}} ymm1 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
		pengfei (inline comment): We can make the elements different now, right?
; FMA3-NEXT: vfmadd213pd {{.*#+}} ymm0 = (ymm1 * ymm0) + mem		; FMA3-NEXT: vfnmadd213pd {{.*#+}} ymm0 = -(ymm1 * ymm0) + ymm1
; FMA3-NEXT: retq		; FMA3-NEXT: retq
;		;
; FMA4-LABEL: negated_constant_v4f64:		; FMA4-LABEL: negated_constant_v4f64:
; FMA4: # %bb.0:		; FMA4: # %bb.0:
; FMA4-NEXT: vmovapd {{.*#+}} ymm1 = [5.0E-1,5.0E-1,5.0E-1,5.0E-1]		; FMA4-NEXT: vmovapd {{.*#+}} ymm1 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
; FMA4-NEXT: vfmaddpd {{.*#+}} ymm0 = (ymm0 * ymm1) + mem		; FMA4-NEXT: vfnmaddpd {{.*#+}} ymm0 = -(ymm0 * ymm1) + ymm1
; FMA4-NEXT: retq		; FMA4-NEXT: retq
%t = tail call <4 x double> @llvm.fma.v4f64(<4 x double> %a, <4 x double> <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>, <4 x double> <double -5.000000e-01, double -5.000000e-01, double -5.000000e-01, double -5.000000e-01>)		%t = tail call <4 x double> @llvm.fma.v4f64(<4 x double> %a, <4 x double> <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>, <4 x double> <double -5.000000e-01, double -5.000000e-01, double -5.000000e-01, double -5.000000e-01>)
ret <4 x double> %t		ret <4 x double> %t
}		}

define <4 x double> @negated_constant_v4f64_2fmas(<4 x double> %a, <4 x double> %b) {		define <4 x double> @negated_constant_v4f64_2fmas(<4 x double> %a, <4 x double> %b) {
; FMA3-LABEL: negated_constant_v4f64_2fmas:		; FMA3-LABEL: negated_constant_v4f64_2fmas:
; FMA3: # %bb.0:		; FMA3: # %bb.0:
; FMA3-NEXT: vmovapd {{.*#+}} ymm2 = <-5.0E-1,u,-5.0E-1,-5.0E-1>		; FMA3-NEXT: vmovapd {{.*#+}} ymm2 = <-5.0E-1,u,-5.0E-1,-5.0E-1>
; FMA3-NEXT: vfmadd213pd {{.*#+}} ymm2 = (ymm0 * ymm2) + ymm1		; FMA3-NEXT: vmovapd %ymm2, %ymm3
; FMA3-NEXT: vfmadd231pd {{.*#+}} ymm1 = (ymm0 * mem) + ymm1		; FMA3-NEXT: vfmadd213pd {{.*#+}} ymm3 = (ymm0 * ymm3) + ymm1
; FMA3-NEXT: vaddpd %ymm1, %ymm2, %ymm0		; FMA3-NEXT: vfnmadd213pd {{.*#+}} ymm2 = -(ymm0 * ymm2) + ymm1
		; FMA3-NEXT: vaddpd %ymm2, %ymm3, %ymm0
; FMA3-NEXT: retq		; FMA3-NEXT: retq
;		;
; FMA4-LABEL: negated_constant_v4f64_2fmas:		; FMA4-LABEL: negated_constant_v4f64_2fmas:
; FMA4: # %bb.0:		; FMA4: # %bb.0:
; FMA4-NEXT: vfmaddpd {{.*#+}} ymm2 = (ymm0 * mem) + ymm1		; FMA4-NEXT: vmovapd {{.*#+}} ymm2 = <-5.0E-1,u,-5.0E-1,-5.0E-1>
; FMA4-NEXT: vfmaddpd {{.*#+}} ymm0 = (ymm0 * mem) + ymm1		; FMA4-NEXT: vfmaddpd {{.*#+}} ymm3 = (ymm0 * ymm2) + ymm1
; FMA4-NEXT: vaddpd %ymm0, %ymm2, %ymm0		; FMA4-NEXT: vfnmaddpd {{.*#+}} ymm0 = -(ymm0 * ymm2) + ymm1
		; FMA4-NEXT: vaddpd %ymm0, %ymm3, %ymm0
; FMA4-NEXT: retq		; FMA4-NEXT: retq
%t0 = tail call <4 x double> @llvm.fma.v4f64(<4 x double> %a, <4 x double> <double -5.000000e-01, double undef, double -5.000000e-01, double -5.000000e-01>, <4 x double> %b)		%t0 = tail call <4 x double> @llvm.fma.v4f64(<4 x double> %a, <4 x double> <double -5.000000e-01, double undef, double -5.000000e-01, double -5.000000e-01>, <4 x double> %b)
%t1 = tail call <4 x double> @llvm.fma.v4f64(<4 x double> %a, <4 x double> <double 5.000000e-01, double undef, double 5.000000e-01, double 5.000000e-01>, <4 x double> %b)		%t1 = tail call <4 x double> @llvm.fma.v4f64(<4 x double> %a, <4 x double> <double 5.000000e-01, double undef, double 5.000000e-01, double 5.000000e-01>, <4 x double> %b)
%t2 = fadd <4 x double> %t0, %t1		%t2 = fadd <4 x double> %t0, %t1
ret <4 x double> %t2		ret <4 x double> %t2
}		}

define <4 x double> @negated_constant_v4f64_fadd(<4 x double> %a) {		define <4 x double> @negated_constant_v4f64_fadd(<4 x double> %a) {
; FMA3-LABEL: negated_constant_v4f64_fadd:		; FMA3-LABEL: negated_constant_v4f64_fadd:
; FMA3: # %bb.0:		; FMA3: # %bb.0:
; FMA3-NEXT: vmovapd {{.*#+}} ymm1 = [5.0E-1,5.0E-1,5.0E-1,5.0E-1]		; FMA3-NEXT: vmovapd {{.*#+}} ymm1 = [5.0E-1,5.0E-1,5.0E-1,5.0E-1]
; FMA3-NEXT: vaddpd %ymm1, %ymm0, %ymm0		; FMA3-NEXT: vaddpd %ymm1, %ymm0, %ymm0
; FMA3-NEXT: vfmadd213pd {{.*#+}} ymm0 = (ymm1 * ymm0) + mem		; FMA3-NEXT: vfmsub213pd {{.*#+}} ymm0 = (ymm1 * ymm0) - ymm1
; FMA3-NEXT: retq		; FMA3-NEXT: retq
;		;
; FMA4-LABEL: negated_constant_v4f64_fadd:		; FMA4-LABEL: negated_constant_v4f64_fadd:
; FMA4: # %bb.0:		; FMA4: # %bb.0:
; FMA4-NEXT: vmovapd {{.*#+}} ymm1 = [5.0E-1,5.0E-1,5.0E-1,5.0E-1]		; FMA4-NEXT: vmovapd {{.*#+}} ymm1 = [5.0E-1,5.0E-1,5.0E-1,5.0E-1]
; FMA4-NEXT: vaddpd %ymm1, %ymm0, %ymm0		; FMA4-NEXT: vaddpd %ymm1, %ymm0, %ymm0
; FMA4-NEXT: vfmaddpd {{.*#+}} ymm0 = (ymm0 * ymm1) + mem		; FMA4-NEXT: vfmsubpd {{.*#+}} ymm0 = (ymm0 * ymm1) - ymm1
; FMA4-NEXT: retq		; FMA4-NEXT: retq
%t0 = fadd <4 x double> %a, <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>		%t0 = fadd <4 x double> %a, <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>
%t1 = tail call <4 x double> @llvm.fma.v4f64(<4 x double> %t0, <4 x double> <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>, <4 x double> <double -5.000000e-01, double -5.000000e-01, double -5.000000e-01, double -5.000000e-01>)		%t1 = tail call <4 x double> @llvm.fma.v4f64(<4 x double> %t0, <4 x double> <double 5.000000e-01, double 5.000000e-01, double 5.000000e-01, double 5.000000e-01>, <4 x double> <double -5.000000e-01, double -5.000000e-01, double -5.000000e-01, double -5.000000e-01>)
ret <4 x double> %t1		ret <4 x double> %t1
}		}

define <4 x double> @negated_constant_v4f64_2fadd(<4 x double> %a) {		define <4 x double> @negated_constant_v4f64_2fadd(<4 x double> %a) {
; FMA3-LABEL: negated_constant_v4f64_2fadd:		; FMA3-LABEL: negated_constant_v4f64_2fadd:
(47 lines not shown)

Diff 509881

llvm/lib/Target/X86/X86ISelLowering.cpp

llvm/test/CodeGen/X86/avx2-fma-fneg-combine.ll

llvm/test/CodeGen/X86/fma-fneg-combine-2.ll
