This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Respect the uses when combine FMA for a*b+/-c*d
ClosedPublic

Authored by steven.zhang on Mar 11 2020, 4:32 AM.

Details

Summary

If the DAG looks like this: a*b+c*d, it could be folded into fma(a, b, c*d) or fma(c, d, a*b). https://reviews.llvm.org/D11855 improved the a*b+c*d case by respecting the uses of a*b and c*d when making the choice.
But a*b-c*d can likewise be folded into fma(a, b, -(c*d)) or fma(-c, d, a*b). This patch respects the uses of a*b and c*d to make the better choice for that case too.
This is the motivating case:

define double @fsub1(double %a, double %b, double %c, double %d)  {
entry:
  %mul = fmul fast double %b, %a
  %mul1 = fmul fast double %d, %c
  %sub = fsub fast double %mul, %mul1
  %mul3 = fmul fast double %mul, %sub
  ret double %mul3
}
 define double @fsub1(double %a, double %b, double %c, double %d)  {
 ; CHECK-LABEL: fsub1:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xsmuldp 3, 4, 3
 ; CHECK-NEXT:    xsmuldp 0, 2, 1
-; CHECK-NEXT:    xsmsubadp 3, 2, 1
-; CHECK-NEXT:    xsmuldp 1, 0, 3
+; CHECK-NEXT:    fmr 1, 0
+; CHECK-NEXT:    xsnmsubadp 1, 4, 3
+; CHECK-NEXT:    xsmuldp 1, 0, 1

Diff Detail

Event Timeline

steven.zhang created this revision.Mar 11 2020, 4:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 11 2020, 4:33 AM
lebedev.ri added a subscriber: lebedev.ri.

If their uses are the same, add a target hook to let a platform such as PowerPC make the choice, since the two foldings have different precision depending on whether a*b or c*d is rounded first.
On PowerPC we have some precision-sensitive floating-point libraries that depend on the slight difference between rounding a*b and rounding c*d, so we need some way to control the behavior of this combine.

To me that sounds like strict floating point semantics should be used there?

I don't know whether LLVM has a suitable flag to express these semantics; if it does, it would be great to guard this behind that flag instead of a target hook. Do you have any suggestions?

To make the semantics clear, the precision difference comes from the following (please ignore this if it is already clear from the patch description):

// current implementation
x = a*b - c*d
-->
t = c*d; (rounding happens)
x = a*b - t  (fma(a,b, -t))

// another folding is:
x = a*b - c*d
-->
t = a*b (rounding happens)
x = t - c*d (fma(-c, d, t))

Which operand of the expression "lhs op rhs" gets rounded separately is not specified anywhere; currently, LLVM prefers to round the rhs first for this case.
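
To make the difference concrete, here is a small standalone C++ check (not from the review; the inputs are chosen so that a*b needs rounding while c*d is exact) showing that the two foldings can return different results:

#include <cmath>
#include <cstdio>

int main() {
  double a = 1.0 + 0x1p-27, b = a; // a*b = 1 + 2^-26 + 2^-54, needs rounding
  double c = 1.0, d = 1.0;         // c*d = 1 exactly, no rounding

  // current folding: round c*d first, keep a*b exact inside the FMA
  double x1 = std::fma(a, b, -(c * d));
  // alternative folding: round a*b first, keep c*d exact inside the FMA
  double x2 = std::fma(-c, d, a * b);

  std::printf("%a\n%a\n", x1, x2); // prints two different values
  return 0;
}

Both results are acceptable under fast-math; the point is only that the choice of which product is rounded separately is observable.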

NFC: update the code to make it easier to follow.

I don't think this is the right way to solve the problem. From what I can tell, there is no guarantee that this target hook will always work the way you want. What if the source code got reassociated in IR, so that the operands you're expecting as A/B/C/D are swapped?

If the math libs require that some FMA operation is performed to maintain precision, then they should be written using fma() or equivalent in whatever source language the library is written in:
https://en.cppreference.com/w/cpp/numeric/math/fma

Once we are in LLVM IR, that calculation should be maintained using:
http://llvm.org/docs/LangRef.html#int-fma

See also:
http://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic

That said, I'm still not sure if this is all going to work as expected yet. Given the 'fast' flag, the compiler has the ability to do just about anything with FP calculations. But my guess is that we'll be ok in IR and codegen by using the fma intrinsic.
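
As a source-level sketch of that suggestion (a hypothetical helper, not taken from any of the libraries in question), writing the fused operation explicitly preserves the single rounding regardless of what the combiner later decides:

#include <cmath>

// Hypothetical library helper: a*b must stay unrounded. std::fma maps to
// the fma() libm call, which LLVM can lower to the llvm.fma intrinsic, so
// the single rounding of a*b - round(c*d) is preserved by construction.
double diff_of_products(double a, double b, double c, double d) {
  return std::fma(a, b, -(c * d));
}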

lebedev.ri requested changes to this revision.Mar 11 2020, 3:16 PM
This revision now requires changes to proceed.Mar 11 2020, 3:16 PM
steven.zhang planned changes to this revision.Mar 11 2020, 6:45 PM

Makes sense. Thank you for the information. I will remove the hook and only make the choice when one operand has more uses than the other.

steven.zhang retitled this revision from [DAGCombine] Respect the uses when combine FMA for a*b+/-c*d and add target hook if there uses are the same to [DAGCombine] Respect the uses when combine FMA for a*b+/-c*d.
steven.zhang edited the summary of this revision. (Show Details)

Remove the hook.

lebedev.ri added inline comments.Mar 12 2020, 1:09 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11940

I'm not sure why the Aggressive && isContractableFMUL(N0) && isContractableFMUL(N1) check is here?
That is already checked in the lambdas.

steven.zhang marked an inline comment as done.Mar 12 2020, 1:50 AM
steven.zhang added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11940

Only when both operands (LHS/RHS) are FMUL should we look at the uses to decide which one to fold; therefore, we need the check here.

lebedev.ri added inline comments.Mar 12 2020, 2:34 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11940

Err, I mean, why is the Aggressive check there?

steven.zhang marked an inline comment as done.Mar 12 2020, 3:08 AM
steven.zhang added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11940

Ah, just to save compile time on those two isContractableFMUL calls. But it seems that is cheap, as they only check some flags. I will remove the Aggressive check.

Remove the Aggressive check.

This doesn't look unreasonable to me now.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11918–11956

I suggest landing this NFC refactoring first, to make the diff more understandable.

11942
// fold (fsub (fmul a, b), (fmul c, d)) -> (fma (fneg c), d, (fmul a, b))
11945
// fold (fsub (fmul a, b), (fmul c, d)) -> (fma a, b, (fneg (fmul c, d)))
steven.zhang marked an inline comment as done.Mar 12 2020, 8:11 AM
steven.zhang added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11918–11956

OK, I assume you are fine with the refactoring, so I won't post a separate review for this NFC (adding two lambdas to do the folding). Once that NFC patch lands, I will update this patch. If there is any concern, let me know. Thank you!

spatel added inline comments.Mar 12 2020, 12:52 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11941

It's not clear to me what we want to do in the case where both of the intermediate operands have other uses, and that difference is not visible in the existing tests. Would it be better to simplify this check to:

if (N1.hasOneUse())
  // fold (fsub (fmul a, b), (fmul c, d)) -> (fma (fneg c), d, (fmul a, b))
else
  // fold (fsub (fmul a, b), (fmul c, d)) -> (fma a, b, (fneg (fmul c, d)))

We should probably add tests like this either way:

define double @fma_multi_uses1(double %a, double %b, double %c, double %d, double* %p1, double* %p2, double* %p3) {
  %ab = fmul fast double %a, %b
  %cd = fmul fast double %c, %d
  store double %ab, double* %p1 ; extra use of %ab
  store double %ab, double* %p2 ; another extra use of %ab
  store double %cd, double* %p3 ; extra use of %cd
  %r = fsub fast double %ab, %cd
  ret double %r
}

define double @fma_multi_uses2(double %a, double %b, double %c, double %d, double* %p1, double* %p2, double* %p3) {
  %ab = fmul fast double %a, %b
  %cd = fmul fast double %c, %d
  store double %ab, double* %p1 ; extra use of %ab
  store double %cd, double* %p2 ; extra use of %cd
  store double %cd, double* %p3 ; another extra use of %cd 
  %r = fsub fast double %ab, %cd
  ret double %r
}
steven.zhang marked an inline comment as done.Mar 12 2020, 9:29 PM
steven.zhang added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11941

Good point, but the answer is no. I think folding the fmul that has fewer uses is the right direction, as its use count will reach one or zero sooner than the other one's, which exposes further opportunities. For example:

x = a*b - c*d;
y = e*f - c*d;
// there is more than one later use of "a*b"

We have at least 3 uses of "a*b" and 2 uses of "c*d". We should prefer to fold "c*d" for both "x" and "y" so that its def can be removed. That is why we want to fold the operand with fewer uses rather than just checking for a single use. Does that make sense?

I have added tests for your cases and for the one I gave above.
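
As an illustration of that selection, here is a DAGCombiner-style sketch (not the exact code in this patch; it assumes N is the fsub node with DAG, DL and VT in scope, and the tie case simply keeps the existing fold):

// N is (fsub (fmul a, b), (fmul c, d)); both operands are contractable FMULs.
SDValue N0 = N->getOperand(0); // fmul a, b
SDValue N1 = N->getOperand(1); // fmul c, d

// Absorb the product with fewer uses into the FMA: the more widely shared
// product stays as a standalone fmul, and the absorbed one loses a use here.
if (N1->use_size() < N0->use_size())
  // fold (fsub (fmul a, b), (fmul c, d)) -> (fma (fneg c), d, (fmul a, b))
  return DAG.getNode(ISD::FMA, DL, VT,
                     DAG.getNode(ISD::FNEG, DL, VT, N1.getOperand(0)),
                     N1.getOperand(1), N0);
// fold (fsub (fmul a, b), (fmul c, d)) -> (fma a, b, (fneg (fmul c, d)))
return DAG.getNode(ISD::FMA, DL, VT, N0.getOperand(0), N0.getOperand(1),
                   DAG.getNode(ISD::FNEG, DL, VT, N1));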

Rebase the patch and add tests.

spatel added inline comments.Mar 13 2020, 9:16 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11941

Yes, that makes sense. Please add the new tests with current codegen as a preliminary NFC commit, so we'll see the codegen diff (if any) resulting from this patch.

I think with the extra tests + the NFC refactor separated out, this patch will be good.

Rebase the patch.

spatel accepted this revision.Mar 16 2020, 8:06 AM

LGTM (but see if @lebedev.ri has any more comments)

No more comments from me, looks reasonable I suppose.

@lebedev.ri Would you please accept this revision if you have no further comments, since you requested changes on it? Thank you!

lebedev.ri resigned from this revision.Mar 16 2020, 11:37 PM
This revision is now accepted and ready to land.Mar 16 2020, 11:37 PM
This revision was automatically updated to reflect the committed changes.