This is an archive of the discontinued LLVM Phabricator instance.

Differential D75982

[DAGCombine] Respect the uses when combine FMA for a*b+/-c*d
Closed · Public

Authored by steven.zhang on Mar 11 2020, 4:32 AM.


Details

Reviewers

arsenm
RKSimon
spatel
jsji
nemanjai
cameron.mcinally
uweigand
lebedev.ri

Group Reviewers

Restricted Project

Commits

rGd577193c0f74: [DAGCombine] Respect the uses when combine FMA for a*b+/-c*d

Summary

If the DAG looks like a*b+c*d, it can be folded into either fma(a, b, c*d) or fma(c, d, a*b). https://reviews.llvm.org/D11855 improved this by taking the uses of a*b and c*d into account when choosing between the two.
Similarly, a*b-c*d can be folded into either fma(a, b, -c*d) or fma(-c, d, a*b). This patch takes the uses of a*b and c*d into account to make the best choice in that case as well.
This is the motivating case:

define double @fsub1(double %a, double %b, double %c, double %d)  {
entry:
  %mul = fmul fast double %b, %a
  %mul1 = fmul fast double %d, %c
  %sub = fsub fast double %mul, %mul1
  %mul3 = fmul fast double %mul, %sub
  ret double %mul3
}

 define double @fsub1(double %a, double %b, double %c, double %d)  {
 ; CHECK-LABEL: fsub1:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xsmuldp 3, 4, 3
 ; CHECK-NEXT:    xsmuldp 0, 2, 1
-; CHECK-NEXT:    xsmsubadp 3, 2, 1
-; CHECK-NEXT:    xsmuldp 1, 0, 3
+; CHECK-NEXT:    fmr 1, 0
+; CHECK-NEXT:    xsnmsubadp 1, 4, 3
+; CHECK-NEXT:    xsmuldp 1, 0, 1

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

steven.zhang created this revision. · Mar 11 2020, 4:32 AM

Herald added a project: Restricted Project. · Mar 11 2020, 4:33 AM

Herald added subscribers: wuzish, kbarton, hiraditya, wdng.

Harbormaster failed remote builds in B48795: Diff 249582! · Mar 11 2020, 5:09 AM

If their uses are the same, add a target hook to allow some platforms, such as PowerPC, to make the choice, because the two foldings differ in precision depending on whether a*b or c*d is rounded.
On PowerPC, we have some precision-sensitive floating-point libraries that depend on the slight difference between rounding a*b and rounding c*d. So we need some way to control the behavior of this combine.

To me that sounds like strict floating point semantics should be used there?

In D75982#1916780, @lebedev.ri wrote:

If their uses are the same, add a target hook to allow some platforms, such as PowerPC, to make the choice, because the two foldings differ in precision depending on whether a*b or c*d is rounded.
On PowerPC, we have some precision-sensitive floating-point libraries that depend on the slight difference between rounding a*b and rounding c*d. So we need some way to control the behavior of this combine.

To me that sounds like strict floating point semantics should be used there?

I have no idea whether LLVM has a suitable flag to indicate these semantics. If it does, it would be great to guard this with that flag instead of the target hook. Do you have any suggestions?

To make the semantics clear, the precision difference arises as follows (please ignore this if you have already understood it from the patch description):

// current implementation
x = a*b - c*d
-->
t = c*d;        (rounding happens)
x = a*b - t;    (fma(a, b, -t))

// another folding is:
x = a*b - c*d
-->
t = a*b;        (rounding happens)
x = t - c*d;    (fma(-c, d, t))

Which operand of the expression "lhs op rhs" is evaluated (and therefore rounded) first is not specified. Currently, LLVM prefers to evaluate the rhs first in this case.
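The two foldings above can be reproduced outside the compiler with explicit std::fma calls. The following is only a sketch: the function names and the input values are made up for illustration and are not part of the patch, and it assumes IEEE double arithmetic in round-to-nearest, compiled without fast-math. The inputs are chosen so that a*b is inexact in double while c*d is exact.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: models fma(a, b, -(c*d)).
// c*d is rounded first; a*b stays exact inside the FMA.
double sub_fold_ab(double a, double b, double c, double d) {
  double t = c * d;          // rounding happens here
  return std::fma(a, b, -t);
}

// Hypothetical helper: models fma(-c, d, a*b).
// a*b is rounded first; c*d stays exact inside the FMA.
double sub_fold_cd(double a, double b, double c, double d) {
  double t = a * b;          // rounding happens here
  return std::fma(-c, d, t);
}

// Worked example: the exact value of a*b - c*d is 2^-60.
void demo_fold_difference() {
  const double eps = std::ldexp(1.0, -30);   // 2^-30
  const double a = 1.0 + eps, b = 1.0 + eps; // a*b = 1 + 2^-29 + 2^-60, not representable
  const double c = 1.0 + 2.0 * eps, d = 1.0; // c*d = 1 + 2^-29, representable
  // Folding a*b keeps the 2^-60 term.
  assert(sub_fold_ab(a, b, c, d) == std::ldexp(1.0, -60));
  // Folding c*d rounds a*b to 1 + 2^-29 first, so the difference collapses to 0.
  assert(sub_fold_cd(a, b, c, d) == 0.0);
}
```

Both results are valid contractions of the same expression; they differ only in which product is rounded before the fused operation.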

NFC update the code to make it easier to follow.

I don't think this is the right way to solve the problem. From what I can tell, there is no guarantee that this target hook will always work the way you want. What if the source code got reassociated in IR, so that the operands you're expecting as A/B/C/D are swapped?

If the math libs require that some FMA operation is performed to maintain precision, then they should be written using fma() or equivalent in whatever source language the library is written in:
https://en.cppreference.com/w/cpp/numeric/math/fma

Once we are in LLVM IR, that calculation should be maintained using:
http://llvm.org/docs/LangRef.html#int-fma

That said, I'm still not sure if this is all going to work as expected yet. Given the 'fast' flag, the compiler has the ability to do just about anything with FP calculations. But my guess is that we'll be ok in IR and codegen by using the fma intrinsic.
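As an illustration of the suggestion above, here is a sketch of the kind of source-level code that depends on a specific FMA being performed: recovering the rounding error of a product with an explicit std::fma call. The struct and function names are hypothetical and nothing here is taken from the libraries mentioned in this review. It assumes IEEE double arithmetic, no overflow or underflow, and compilation without fast-math.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical type: a product split so that hi + lo equals a*b exactly.
struct TwoProduct {
  double hi; // a*b rounded to double
  double lo; // a*b - hi, computed exactly by one FMA
};

TwoProduct two_product(double a, double b) {
  double hi = a * b;               // the rounded product
  double lo = std::fma(a, b, -hi); // explicit FMA: exact product minus hi
  return {hi, lo};
}

void demo_two_product() {
  const double a = 1.0 + std::ldexp(1.0, -40); // 1 + 2^-40
  TwoProduct p = two_product(a, a);            // exact a*a = 1 + 2^-39 + 2^-80
  assert(p.hi == 1.0 + std::ldexp(1.0, -39));
  assert(p.lo == std::ldexp(1.0, -80));
}
```

Because the FMA is spelled out in the source, the result does not depend on which contraction the optimizer happens to pick.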

Harbormaster failed remote builds in B48811: Diff 249622! · Mar 11 2020, 8:31 AM

lebedev.ri requested changes to this revision. · Mar 11 2020, 3:16 PM

This revision now requires changes to proceed. · Mar 11 2020, 3:16 PM

steven.zhang planned changes to this revision. · Mar 11 2020, 6:45 PM

In D75982#1916940, @spatel wrote:

I don't think this is the right way to solve the problem. From what I can tell, there is no guarantee that this target hook will always work the way you want. What if the source code got reassociated in IR, so that the operands you're expecting as A/B/C/D are swapped?

If the math libs require that some FMA operation is performed to maintain precision, then they should be written using fma() or equivalent in whatever source language the library is written in:
https://en.cppreference.com/w/cpp/numeric/math/fma

Once we are in LLVM IR, that calculation should be maintained using:
http://llvm.org/docs/LangRef.html#int-fma

See also:
http://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic

That said, I'm still not sure if this is all going to work as expected yet. Given the 'fast' flag, the compiler has the ability to do just about anything with FP calculations. But my guess is that we'll be ok in IR and codegen by using the fma intrinsic.

Makes sense. Thank you for the information. I will remove the hook and only do it when there are more uses.

Remove the hook.

Harbormaster failed remote builds in B48923: Diff 249821! · Mar 11 2020, 9:06 PM

lebedev.ri added inline comments. · Mar 12 2020, 1:09 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11925	I'm not sure why `Aggressive && isContractableFMUL(N0) && isContractableFMUL(N1)` check is here? That is already checked in the lambdas.

steven.zhang marked an inline comment as done. · Mar 12 2020, 1:50 AM

steven.zhang added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11925	We should respect the uses to decide which one to fold only when both operands (LHS/RHS) are FMULs. Therefore, we need the check here.

lebedev.ri added inline comments. · Mar 12 2020, 2:34 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11925	Err, I mean, why is the `Aggressive` check there?

steven.zhang marked an inline comment as done. · Mar 12 2020, 3:08 AM

steven.zhang added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11925	Ah, just to safe the compile time for those two isContractableFMUL calls. But it seems that they are cheap, as they only check some flags. I will remove the Aggressive check.

steven.zhang marked an inline comment as done. · Mar 12 2020, 3:09 AM

steven.zhang added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11925	oops, s/safe/save

Remove the Aggressive check.

Harbormaster failed remote builds in B48959: Diff 249878! · Mar 12 2020, 4:30 AM

This doesn't look unreasonable to me now.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11902–11931	I suggest landing this NFC refactoring first, to make the diff more understandable.
11927	// fold (fsub (fmul a, b), (fmul c, d)) -> (fma (fneg c), d, (fmul a, b))
11930	// fold (fsub (fmul a, b), (fmul c, d)) -> (fma a, b, (fneg (fmul c, d)))

steven.zhang marked an inline comment as done. · Mar 12 2020, 8:11 AM

steven.zhang added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11902–11931	OK, I assume that you are fine with the refactoring (adding two lambdas to do the folding), so I won't post another revision for review for this NFC change. Once that NFC patch lands, I will update this patch. If there is any concern, let me know. Thank you!

spatel added inline comments. · Mar 12 2020, 12:52 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

11926

It's not clear to me what we want to do in the case where both of the intermediate operands have other uses, and that difference is not visible in the existing tests. Would it be better to simplify this check to:

if (N1.hasOneUse())
  // fold (fsub (fmul a, b), (fmul c, d)) -> (fma (fneg c), d, (fmul a, b))
else
  // fold (fsub (fmul a, b), (fmul c, d)) -> (fma a, b, (fneg (fmul c, d)))

We should probably add tests like this either way:

define double @fma_multi_uses1(double %a, double %b, double %c, double %d, double* %p1, double* %p2, double* %p3) {
  %ab = fmul fast double %a, %b
  %cd = fmul fast double %c, %d
  store double %ab, double* %p1 ; extra use of %ab
  store double %ab, double* %p2 ; another extra use of %ab
  store double %cd, double* %p3 ; extra use of %cd
  %r = fsub fast double %ab, %cd
  ret double %r
}

define double @fma_multi_uses2(double %a, double %b, double %c, double %d, double* %p1, double* %p2, double* %p3) {
  %ab = fmul fast double %a, %b
  %cd = fmul fast double %c, %d
  store double %ab, double* %p1 ; extra use of %ab
  store double %cd, double* %p2 ; extra use of %cd
  store double %cd, double* %p3 ; another extra use of %cd 
  %r = fsub fast double %ab, %cd
  ret double %r
}

steven.zhang marked an inline comment as done. · Mar 12 2020, 9:29 PM

steven.zhang added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11926	Good point, but the answer is no. I think it makes sense to fold the fmul that has fewer uses, because its use count then reaches one or zero sooner than the other's, which exposes other opportunities. For example:

x = a*b - c*d;
y = e*f - c*d;
// there is more than one use of a*b later

Here a*b has at least 3 uses and c*d has 2. We should prefer to fold c*d for both x and y so that its def can be removed. That is why we want to fold the multiply with fewer uses rather than the one with a single use. Does that make sense? I have added tests for your cases and for the one I give above.
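The fewer-uses rule described in this comment can be sketched as a tiny standalone model. This is not the DAGCombiner code: the enum and function below are hypothetical names, and the tie-breaking (keep the pre-existing order, which folds the left multiply) is an assumption based on the diff once the target hook is removed.

```cpp
#include <cassert>

// Hypothetical model of the two contractions of (fsub (fmul a, b), (fmul c, d)).
enum class FoldChoice {
  FoldLHS, // fma(a, b, fneg(fmul c, d)): a*b is absorbed into the FMA
  FoldRHS  // fma(fneg c, d, fmul a, b): c*d is absorbed into the FMA
};

// Fold the multiply with strictly fewer uses; on a tie, fall back to
// folding the left multiply (assumed default order).
FoldChoice chooseFold(unsigned lhsUses, unsigned rhsUses) {
  return lhsUses > rhsUses ? FoldChoice::FoldRHS : FoldChoice::FoldLHS;
}

void demo_choose_fold() {
  // x = a*b - c*d with 3 uses of a*b and 2 uses of c*d: fold c*d.
  assert(chooseFold(3, 2) == FoldChoice::FoldRHS);
  // 2 uses of a*b and 3 uses of c*d: fold a*b.
  assert(chooseFold(2, 3) == FoldChoice::FoldLHS);
  // Equal use counts: keep the default.
  assert(chooseFold(2, 2) == FoldChoice::FoldLHS);
}
```

Under this model, the suggested fma_multi_uses1 test (three uses of %ab, two of %cd) would fold %cd, and fma_multi_uses2 (two uses of %ab, three of %cd) would fold %ab.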

Rebase the patch and add tests.

Harbormaster failed remote builds in B49094: Diff 250125! · Mar 12 2020, 11:19 PM

spatel added inline comments. · Mar 13 2020, 9:16 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11926	Yes, that makes sense. Please add the new tests with current codegen as a preliminary NFC commit, so we'll see the codegen diff (if any) resulting from this patch. I think with the extra tests + the NFC refactor separated out, this patch will be good.

Rebase the patch.

Harbormaster completed remote builds in B49266: Diff 250450. · Mar 15 2020, 8:57 PM

LGTM (but see if @lebedev.ri has any more comments)

No more comments from me, looks reasonable i suppose.

@lebedev.ri Would you please accept this revision if you don't have any further comments, since you requested changes to it? Thank you!

lebedev.ri resigned from this revision. · Mar 16 2020, 11:37 PM

This revision is now accepted and ready to land. · Mar 16 2020, 11:37 PM

Closed by commit rGd577193c0f74: [DAGCombine] Respect the uses when combine FMA for a*b+/-c*d (authored by steven.zhang). · Mar 17 2020, 9:04 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                             Size
llvm/include/llvm/CodeGen/TargetLowering.h       11 lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp    47 lines
llvm/lib/Target/PowerPC/PPCISelLowering.h        2 lines
llvm/lib/Target/PowerPC/PPCISelLowering.cpp      12 lines
llvm/test/CodeGen/PowerPC/fma-precision.ll       53 lines
llvm/test/CodeGen/PowerPC/recipest.ll            9 lines

Diff 249582

llvm/include/llvm/CodeGen/TargetLowering.h

@@ virtual bool isFMAFasterThanFMulAndFAdd(const MachineFunction &MF,
     return false;
   }

   /// IR version
   virtual bool isFMAFasterThanFMulAndFAdd(const Function &F, Type *) const {
     return false;
   }

+  /// Return true if we want to fold ab +/- cd as fma(c, d, a*b) or
+  /// fma(-c, d, ab), otherwise, it is folded as fma(a, b, cd) or
+  /// fma(a, b, -c*d). The result could be different between these two as
+  /// different result of folding ab against cd.
+  virtual bool shouldFMAFoldSecondOperand(const SDNode *N) const {
+    assert(N->getOpcode() == ISD::FADD || N->getOpcode() == ISD::FSUB);
+    assert(N->getOperand(0).getOpcode() == ISD::FMUL &&
+           N->getOperand(1).getOpcode() == ISD::FMUL);
+    return false;
+  }
+
   /// Returns true if the FADD or FSUB node passed could legally be combined with
   /// an fmul to form an ISD::FMAD.
   virtual bool isFMADLegalForFAddFSub(const SelectionDAG &DAG,
                                       const SDNode *N) const {
     assert(N->getOpcode() == ISD::FADD || N->getOpcode() == ISD::FSUB);
     return isOperationLegal(ISD::FMAD, N->getValueType(0));
   }

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {
   // Is the node an FMUL and contractable either due to global flags or
   // SDNodeFlags.
   auto isContractableFMUL = [AllowFusionGlobally](SDValue N) {
     if (N.getOpcode() != ISD::FMUL)
       return false;
     return AllowFusionGlobally || isContractable(N.getNode());
   };
   // If we have two choices trying to fold (fadd (fmul u, v), (fmul x, y)),
-  // prefer to fold the multiply with fewer uses.
+  // prefer to fold the multiply with fewer uses or target really want to.
   if (Aggressive && isContractableFMUL(N0) && isContractableFMUL(N1)) {
-    if (N0.getNode()->use_size() > N1.getNode()->use_size())
+    if (N0.getNode()->use_size() > N1.getNode()->use_size() ||
+        ((N0.getNode()->use_size() == N1.getNode()->use_size()) &&
+         TLI.shouldFMAFoldSecondOperand(N)))
       std::swap(N0, N1);
   }

   // fold (fadd (fmul x, y), z) -> (fma x, y, z)
   if (isContractableFMUL(N0) && (Aggressive || N0->hasOneUse())) {
     return DAG.getNode(PreferredFusedOpcode, SL, VT,
                        N0.getOperand(0), N0.getOperand(1), N1, Flags);
   }
@@ SDValue DAGCombiner::visitFSUBForFMACombine(SDNode *N) {
   // Is the node an FMUL and contractable either due to global flags or
   // SDNodeFlags.
   auto isContractableFMUL = [AllowFusionGlobally](SDValue N) {
     if (N.getOpcode() != ISD::FMUL)
       return false;
     return AllowFusionGlobally || isContractable(N.getNode());
   };

   // fold (fsub (fmul x, y), z) -> (fma x, y, (fneg z))
-  if (isContractableFMUL(N0) && (Aggressive || N0->hasOneUse())) {
-    return DAG.getNode(PreferredFusedOpcode, SL, VT,
-                       N0.getOperand(0), N0.getOperand(1),
-                       DAG.getNode(ISD::FNEG, SL, VT, N1), Flags);
-  }
+  auto tryToFoldXYSubZ = [&](SDValue XY, SDValue Z) {
+    if (isContractableFMUL(XY) && (Aggressive || XY->hasOneUse())) {
+      return DAG.getNode(PreferredFusedOpcode, SL, VT, XY.getOperand(0),
+                         XY.getOperand(1), DAG.getNode(ISD::FNEG, SL, VT, Z),
+                         Flags);
+    }
+    return SDValue();
+  };

   // fold (fsub x, (fmul y, z)) -> (fma (fneg y), z, x)
   // Note: Commutes FSUB operands.
-  if (isContractableFMUL(N1) && (Aggressive || N1->hasOneUse())) {
-    return DAG.getNode(PreferredFusedOpcode, SL, VT,
-                       DAG.getNode(ISD::FNEG, SL, VT,
-                                   N1.getOperand(0)),
-                       N1.getOperand(1), N0, Flags);
-  }
+  auto tryToFoldXSubYZ = [&](SDValue X, SDValue YZ) {
+    if (isContractableFMUL(YZ) && (Aggressive || YZ->hasOneUse())) {
+      return DAG.getNode(PreferredFusedOpcode, SL, VT,
+                         DAG.getNode(ISD::FNEG, SL, VT, YZ.getOperand(0)),
+                         YZ.getOperand(1), X, Flags);
+    }
+    return SDValue();
+  };
+
+  // If we have two choices trying to fold (fsub (fmul u, v), (fmul x, y)),
+  // prefer to fold the multiply with fewer uses or target really want to.
+  if (Aggressive && isContractableFMUL(N0) && isContractableFMUL(N1)) {
+    if (N0.getNode()->use_size() > N1.getNode()->use_size() ||
+        ((N0.getNode()->use_size() == N1.getNode()->use_size()) &&
+         TLI.shouldFMAFoldSecondOperand(N)))
+      if (SDValue V = tryToFoldXSubYZ(N0, N1))
+        return V;
+  }
+
+  if (SDValue V = tryToFoldXYSubZ(N0, N1))
+    return V;
+
+  if (SDValue V = tryToFoldXSubYZ(N0, N1))
+    return V;

   // fold (fsub (fneg (fmul, x, y)), z) -> (fma (fneg x), y, (fneg z))
   if (N0.getOpcode() == ISD::FNEG && isContractableFMUL(N0.getOperand(0)) &&
       (Aggressive || (N0->hasOneUse() && N0.getOperand(0).hasOneUse()))) {
     SDValue N00 = N0.getOperand(0).getOperand(0);
     SDValue N01 = N0.getOperand(0).getOperand(1);
     return DAG.getNode(PreferredFusedOpcode, SL, VT,
                        DAG.getNode(ISD::FNEG, SL, VT, N00), N01,

llvm/lib/Target/PowerPC/PPCISelLowering.h

@@ public:

   /// isFMAFasterThanFMulAndFAdd - Return true if an FMA operation is faster
   /// than a pair of fmul and fadd instructions. fmuladd intrinsics will be
   /// expanded to FMAs when this method returns true, otherwise fmuladd is
   /// expanded to fmul + fadd.
   bool isFMAFasterThanFMulAndFAdd(const MachineFunction &MF,
                                   EVT VT) const override;

+  bool shouldFMAFoldSecondOperand(const SDNode *N) const override;
+
   const MCPhysReg *getScratchRegisters(CallingConv::ID CC) const override;

   // Should we expand the build vector with shuffles?
   bool
   shouldExpandBuildVectorWithShuffles(EVT VT,
                                       unsigned DefinedValues) const override;

   /// createFastISel - This method returns a target-specific FastISel object,

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

 cl::desc("don't always align innermost loop to 32 bytes on ppc"), cl::Hidden);

 static cl::opt<bool> EnableQuadPrecision("enable-ppc-quad-precision",
 cl::desc("enable quad precision float support on ppc"), cl::Hidden);

 static cl::opt<bool> UseAbsoluteJumpTables("ppc-use-absolute-jumptables",
 cl::desc("use absolute jump tables on ppc"), cl::Hidden);

+static cl::opt<bool> FMAEvaFirstOperand(
+    "ppc-fma-eva-first-op",
+    cl::desc("fold ab+/-cd as fma(c, d, ab) or fma(-c, d, ab)"),
+    cl::Hidden);
+
 STATISTIC(NumTailCalls, "Number of tail calls");
 STATISTIC(NumSiblingCalls, "Number of sibling calls");

 static bool isNByteElemShuffleMask(ShuffleVectorSDNode *, unsigned, int);

 static SDValue widenVec(SelectionDAG &DAG, SDValue Vec, const SDLoc &dl);

 // FIXME: Remove this once the bug has been fixed!
@@ EVT PPCTargetLowering::getSetCCResultType(const DataLayout &DL, LLVMContext &C,
   return VT.changeVectorElementTypeToInteger();
 }

 bool PPCTargetLowering::enableAggressiveFMAFusion(EVT VT) const {
   assert(VT.isFloatingPoint() && "Non-floating-point FMA?");
   return true;
 }

+bool PPCTargetLowering::shouldFMAFoldSecondOperand(const SDNode *N) const {
+  assert(N->getOpcode() == ISD::FADD || N->getOpcode() == ISD::FSUB);
+  assert(N->getOperand(0).getOpcode() == ISD::FMUL &&
+         N->getOperand(1).getOpcode() == ISD::FMUL);
+  return FMAEvaFirstOperand;
+}
+
 //===----------------------------------------------------------------------===//
 // Node matching predicates, for use by the tblgen matching code.
 //===----------------------------------------------------------------------===//

 /// isFloatingPointZero - Return true if this is 0.0 or -0.0.
 static bool isFloatingPointZero(SDValue Op) {
   if (ConstantFPSDNode *CFP = dyn_cast<ConstantFPSDNode>(Op))
     return CFP->getValueAPF().isZero();

llvm/test/CodeGen/PowerPC/fma-precision.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -verify-machineinstrs -mcpu=pwr9 -mtriple=powerpc64le-linux-gnu | FileCheck %s
+; RUN: llc < %s -verify-machineinstrs -mcpu=pwr9 --ppc-fma-eva-first-op -mtriple=powerpc64le-linux-gnu | \
+; RUN: FileCheck %s -check-prefix=CHECK-EVA-FIRST-OP

 ; Verify that the fold of ab-cd respect the uses of a*b
 define double @fsub1(double %a, double %b, double %c, double %d) {
 ; CHECK-LABEL: fsub1:
 ; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: xsmuldp 3, 4, 3
 ; CHECK-NEXT: xsmuldp 0, 2, 1
-; CHECK-NEXT: xsmsubadp 3, 2, 1
-; CHECK-NEXT: xsmuldp 1, 0, 3
+; CHECK-NEXT: fmr 1, 0
+; CHECK-NEXT: xsnmsubadp 1, 4, 3
+; CHECK-NEXT: xsmuldp 1, 0, 1
 ; CHECK-NEXT: blr
+;
+; CHECK-EVA-FIRST-OP-LABEL: fsub1:
+; CHECK-EVA-FIRST-OP: # %bb.0: # %entry
+; CHECK-EVA-FIRST-OP-NEXT: xsmuldp 0, 2, 1
+; CHECK-EVA-FIRST-OP-NEXT: fmr 1, 0
+; CHECK-EVA-FIRST-OP-NEXT: xsnmsubadp 1, 4, 3
+; CHECK-EVA-FIRST-OP-NEXT: xsmuldp 1, 0, 1
+; CHECK-EVA-FIRST-OP-NEXT: blr
 entry:
   %mul = fmul fast double %b, %a
   %mul1 = fmul fast double %d, %c
   %sub = fsub fast double %mul, %mul1
   %mul3 = fmul fast double %mul, %sub
   ret double %mul3
 }

 ; Verify that the fold of ab-cd respect the uses of c*d
 define double @fsub2(double %a, double %b, double %c, double %d) {
 ; CHECK-LABEL: fsub2:
 ; CHECK: # %bb.0: # %entry
 ; CHECK-NEXT: xsmuldp 0, 4, 3
 ; CHECK-NEXT: fmr 3, 0
 ; CHECK-NEXT: xsmsubadp 3, 2, 1
 ; CHECK-NEXT: xsmuldp 1, 0, 3
 ; CHECK-NEXT: blr
+;
+; CHECK-EVA-FIRST-OP-LABEL: fsub2:
+; CHECK-EVA-FIRST-OP: # %bb.0: # %entry
+; CHECK-EVA-FIRST-OP-NEXT: xsmuldp 0, 4, 3
+; CHECK-EVA-FIRST-OP-NEXT: fmr 3, 0
+; CHECK-EVA-FIRST-OP-NEXT: xsmsubadp 3, 2, 1
+; CHECK-EVA-FIRST-OP-NEXT: xsmuldp 1, 0, 3
+; CHECK-EVA-FIRST-OP-NEXT: blr
 entry:
   %mul = fmul fast double %b, %a
   %mul1 = fmul fast double %d, %c
   %sub = fsub fast double %mul, %mul1
   %mul3 = fmul fast double %mul1, %sub
   ret double %mul3
 }

 ; Verify that the fold of ab-cd if there is no uses of ab and cd
 define double @fsub3(double %a, double %b, double %c, double %d) {
 ; CHECK-LABEL: fsub3:
 ; CHECK: # %bb.0: # %entry
 ; CHECK-NEXT: xsmuldp 0, 4, 3
 ; CHECK-NEXT: xsmsubadp 0, 2, 1
 ; CHECK-NEXT: fmr 1, 0
 ; CHECK-NEXT: blr
+;
+; CHECK-EVA-FIRST-OP-LABEL: fsub3:
+; CHECK-EVA-FIRST-OP: # %bb.0: # %entry
+; CHECK-EVA-FIRST-OP-NEXT: xsmuldp 1, 2, 1
+; CHECK-EVA-FIRST-OP-NEXT: xsnmsubadp 1, 4, 3
+; CHECK-EVA-FIRST-OP-NEXT: blr
 entry:
   %mul = fmul fast double %b, %a
   %mul1 = fmul fast double %d, %c
   %sub = fsub fast double %mul, %mul1
   ret double %sub
 }

 ; Verify that the fold of ab+cd respect the uses of a*b
 define double @fadd1(double %a, double %b, double %c, double %d) {
 ; CHECK-LABEL: fadd1:
 ; CHECK: # %bb.0: # %entry
 ; CHECK-NEXT: xsmuldp 0, 2, 1
 ; CHECK-NEXT: fmr 1, 0
 ; CHECK-NEXT: xsmaddadp 1, 4, 3
 ; CHECK-NEXT: xsmuldp 1, 0, 1
 ; CHECK-NEXT: blr
+;
+; CHECK-EVA-FIRST-OP-LABEL: fadd1:
+; CHECK-EVA-FIRST-OP: # %bb.0: # %entry
+; CHECK-EVA-FIRST-OP-NEXT: xsmuldp 0, 2, 1
+; CHECK-EVA-FIRST-OP-NEXT: fmr 1, 0
+; CHECK-EVA-FIRST-OP-NEXT: xsmaddadp 1, 4, 3
+; CHECK-EVA-FIRST-OP-NEXT: xsmuldp 1, 0, 1
+; CHECK-EVA-FIRST-OP-NEXT: blr
 entry:
   %mul = fmul fast double %b, %a
   %mul1 = fmul fast double %d, %c
   %add = fadd fast double %mul1, %mul
   %mul3 = fmul fast double %mul, %add
   ret double %mul3
 }

 ; Verify that the fold of ab+cd respect the uses of c*d
 define double @fadd2(double %a, double %b, double %c, double %d) {
	; CHECK-LABEL: fadd2:			; CHECK-LABEL: fadd2:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: xsmuldp 0, 4, 3			; CHECK-NEXT: xsmuldp 0, 4, 3
	; CHECK-NEXT: fmr 3, 0			; CHECK-NEXT: fmr 3, 0
	; CHECK-NEXT: xsmaddadp 3, 2, 1			; CHECK-NEXT: xsmaddadp 3, 2, 1
	; CHECK-NEXT: xsmuldp 1, 0, 3			; CHECK-NEXT: xsmuldp 1, 0, 3
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				;
				; CHECK-EVA-FIRST-OP-LABEL: fadd2:
				; CHECK-EVA-FIRST-OP: # %bb.0: # %entry
				; CHECK-EVA-FIRST-OP-NEXT: xsmuldp 0, 4, 3
				; CHECK-EVA-FIRST-OP-NEXT: fmr 3, 0
				; CHECK-EVA-FIRST-OP-NEXT: xsmaddadp 3, 2, 1
				; CHECK-EVA-FIRST-OP-NEXT: xsmuldp 1, 0, 3
				; CHECK-EVA-FIRST-OP-NEXT: blr
	entry:			entry:
	%mul = fmul fast double %b, %a			%mul = fmul fast double %b, %a
	%mul1 = fmul fast double %d, %c			%mul1 = fmul fast double %d, %c
	%add = fadd fast double %mul1, %mul			%add = fadd fast double %mul1, %mul
	%mul3 = fmul fast double %mul1, %add			%mul3 = fmul fast double %mul1, %add
	ret double %mul3			ret double %mul3
	}			}

	; Verify that the fold of ab+cd if there is no uses of ab and cd			; Verify that the fold of ab+cd if there is no uses of ab and cd
	define double @fadd3(double %a, double %b, double %c, double %d) {			define double @fadd3(double %a, double %b, double %c, double %d) {
	; CHECK-LABEL: fadd3:			; CHECK-LABEL: fadd3:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: xsmuldp 1, 2, 1			; CHECK-NEXT: xsmuldp 1, 2, 1
	; CHECK-NEXT: xsmaddadp 1, 4, 3			; CHECK-NEXT: xsmaddadp 1, 4, 3
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
				;
				; CHECK-EVA-FIRST-OP-LABEL: fadd3:
				; CHECK-EVA-FIRST-OP: # %bb.0: # %entry
				; CHECK-EVA-FIRST-OP-NEXT: xsmuldp 0, 4, 3
				; CHECK-EVA-FIRST-OP-NEXT: xsmaddadp 0, 2, 1
				; CHECK-EVA-FIRST-OP-NEXT: fmr 1, 0
				; CHECK-EVA-FIRST-OP-NEXT: blr
	entry:			entry:
	%mul = fmul fast double %b, %a			%mul = fmul fast double %b, %a
	%mul1 = fmul fast double %d, %c			%mul1 = fmul fast double %d, %c
	%add = fadd fast double %mul1, %mul			%add = fadd fast double %mul1, %mul
	ret double %add			ret double %add
	}			}
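
For readers following the fsub/fadd tests above, the two candidate folds of a*b-c*d can be sketched in plain C++ with `std::fma`. This is only an illustration of the arithmetic, not code from the patch: the helper names `fold_keep_ab`, `fold_keep_cd`, and `fsub1_sketch` are hypothetical, and the real decision is made on SelectionDAG nodes in DAGCombiner.cpp.

```cpp
#include <cmath>

// Hypothetical helpers (not part of the patch): the two ways a*b - c*d
// can be contracted into one multiply plus one fused multiply-add.

// Keeps a*b as a standalone multiply, so other users can reuse it.
double fold_keep_ab(double a, double b, double c, double d) {
  double ab = a * b;
  return std::fma(-c, d, ab);  // ab - c*d
}

// Keeps c*d as a standalone multiply, so other users can reuse it.
double fold_keep_cd(double a, double b, double c, double d) {
  double cd = c * d;
  return std::fma(a, b, -cd);  // a*b - cd
}

// Mirrors the shape of @fsub1: a*b has a second user, so keeping a*b
// as the standalone multiply needs two multiplies and one fma in total.
double fsub1_sketch(double a, double b, double c, double d) {
  double ab = a * b;
  double sub = std::fma(-c, d, ab);  // ab - c*d, reusing ab
  return ab * sub;
}
```

With inputs whose products are exactly representable, both folds return the same value. The difference is which product stays available as a separate value. If the other fold were used in `fsub1_sketch`, a*b would have to be computed a second time for the final multiply.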

llvm/test/CodeGen/PowerPC/recipest.ll

	[171 lines not shown]
	; CHECK-NEXT: frsqrtes 0, 1			; CHECK-NEXT: frsqrtes 0, 1
	; CHECK-NEXT: addis 3, 2, .LCPI10_0@toc@ha			; CHECK-NEXT: addis 3, 2, .LCPI10_0@toc@ha
	; CHECK-NEXT: addis 4, 2, .LCPI10_1@toc@ha			; CHECK-NEXT: addis 4, 2, .LCPI10_1@toc@ha
	; CHECK-NEXT: lfs 4, .LCPI10_0@toc@l(3)			; CHECK-NEXT: lfs 4, .LCPI10_0@toc@l(3)
	; CHECK-NEXT: lfs 5, .LCPI10_1@toc@l(4)			; CHECK-NEXT: lfs 5, .LCPI10_1@toc@l(4)
	; CHECK-NEXT: fmuls 1, 1, 0			; CHECK-NEXT: fmuls 1, 1, 0
	; CHECK-NEXT: fmadds 1, 1, 0, 4			; CHECK-NEXT: fmadds 1, 1, 0, 4
	; CHECK-NEXT: fmuls 0, 0, 5			; CHECK-NEXT: fmuls 0, 0, 5
	; CHECK-NEXT: fres 5, 2			; CHECK-NEXT: fmuls 0, 0, 1
				; CHECK-NEXT: fres 1, 2
	; CHECK-NEXT: fmuls 4, 0, 1			; CHECK-NEXT: fmuls 4, 0, 1
	; CHECK-NEXT: fmuls 4, 4, 5			; CHECK-NEXT: fnmsubs 0, 2, 4, 0
	; CHECK-NEXT: fmuls 2, 2, 4			; CHECK-NEXT: fmadds 0, 1, 0, 4
	; CHECK-NEXT: fmsubs 0, 0, 1, 2
	; CHECK-NEXT: fmadds 0, 5, 0, 4
	; CHECK-NEXT: fmuls 1, 3, 0			; CHECK-NEXT: fmuls 1, 3, 0
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	%x = call fast float @llvm.sqrt.f32(float %a)			%x = call fast float @llvm.sqrt.f32(float %a)
	%y = fmul fast float %x, %b			%y = fmul fast float %x, %b
	%z = fdiv fast float %c, %y			%z = fdiv fast float %c, %y
	ret float %z			ret float %z
	}			}

	[292 lines not shown]

Diff 249582