This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Respect the uses when combine FMA for a*b+/-c*d
ClosedPublic

Authored by steven.zhang on Mar 11 2020, 4:32 AM.

Details

Summary

If the DAG looks like this: a*b+c*d, it could be folded into fma(a, b, c*d) or fma(c, d, a*b). https://reviews.llvm.org/D11855 improved the a*b+c*d case by respecting the uses of a*b and c*d when making the choice.
But a*b-c*d can likewise be folded into fma(a, b, -(c*d)) or fma(-c, d, a*b). This patch respects the uses of a*b and c*d to make the better choice for that case too.
This is the motivating case:

define double @fsub1(double %a, double %b, double %c, double %d)  {
entry:
  %mul = fmul fast double %b, %a
  %mul1 = fmul fast double %d, %c
  %sub = fsub fast double %mul, %mul1
  %mul3 = fmul fast double %mul, %sub
  ret double %mul3
}
 define double @fsub1(double %a, double %b, double %c, double %d)  {
 ; CHECK-LABEL: fsub1:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    xsmuldp 3, 4, 3
 ; CHECK-NEXT:    xsmuldp 0, 2, 1
-; CHECK-NEXT:    xsmsubadp 3, 2, 1
-; CHECK-NEXT:    xsmuldp 1, 0, 3
+; CHECK-NEXT:    fmr 1, 0
+; CHECK-NEXT:    xsnmsubadp 1, 4, 3
+; CHECK-NEXT:    xsmuldp 1, 0, 1

Diff Detail

Event Timeline

steven.zhang created this revision.Mar 11 2020, 4:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 11 2020, 4:33 AM
lebedev.ri added a subscriber: lebedev.ri.

If their uses are the same, add a target hook to let a platform such as PowerPC make the choice, since the two foldings have different precision depending on whether a*b or c*d is rounded first.
On PowerPC we have some precision-sensitive floating-point libraries that depend on the slight difference between rounding a*b and rounding c*d, so we need some way to control the behavior of this combine.

To me that sounds like strict floating point semantics should be used there?

I don't know whether LLVM has a suitable flag to express these semantics; if it does, it would be great to guard this behind that flag instead of a target hook. Do you have any suggestions?

To make the semantics clear, the precision difference comes from the following (please ignore this if it is already clear from the patch description):

// current implementation
x = a*b - c*d
-->
t = c*d; (rounding happens)
x = a*b - t  (fma(a,b, -t))

// another folding is:
x = a*b - c*d
-->
t = a*b (rounding happens)
x = t - c*d (fma(-c, d, t))

Which operand of the expression "lhs op rhs" gets rounded separately is not specified anywhere; currently, LLVM prefers to round the rhs first for this case.
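
To make the difference concrete, here is a small standalone C++ check (not from the review; the inputs are chosen so that a*b needs rounding while c*d is exact) showing that the two foldings can return different results:

#include <cmath>
#include <cstdio>

int main() {
  double a = 1.0 + 0x1p-27, b = a; // a*b = 1 + 2^-26 + 2^-54, needs rounding
  double c = 1.0, d = 1.0;         // c*d = 1 exactly, no rounding

  // current folding: round c*d first, keep a*b exact inside the FMA
  double x1 = std::fma(a, b, -(c * d));
  // alternative folding: round a*b first, keep c*d exact inside the FMA
  double x2 = std::fma(-c, d, a * b);

  std::printf("%a\n%a\n", x1, x2); // prints two different values
  return 0;
}

Both results are acceptable under fast-math; the point is only that the choice of which product is rounded separately is observable.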

NFC: update the code to make it easier to follow.

I don't think this is the right way to solve the problem. From what I can tell, there is no guarantee that this target hook will always work the way you want. What if the source code got reassociated in IR, so that the operands you're expecting as A/B/C/D are swapped?

If the math libs require that some FMA operation is performed to maintain precision, then they should be written using fma() or equivalent in whatever source language the library is written in:
https://en.cppreference.com/w/cpp/numeric/math/fma

Once we are in LLVM IR, that calculation should be maintained using:
http://llvm.org/docs/LangRef.html#int-fma

See also:
http://llvm.org/docs/LangRef.html#llvm-fmuladd-intrinsic

That said, I'm still not sure if this is all going to work as expected yet. Given the 'fast' flag, the compiler has the ability to do just about anything with FP calculations. But my guess is that we'll be ok in IR and codegen by using the fma intrinsic.
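
As a source-level sketch of that suggestion (a hypothetical helper, not taken from any of the libraries in question), writing the fused operation explicitly preserves the single rounding regardless of what the combiner later decides:

#include <cmath>

// Hypothetical library helper: a*b must stay unrounded. std::fma maps to
// the fma() libm call, which LLVM can lower to the llvm.fma intrinsic, so
// the single rounding of a*b - round(c*d) is preserved by construction.
double diff_of_products(double a, double b, double c, double d) {
  return std::fma(a, b, -(c * d));
}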

lebedev.ri requested changes to this revision.Mar 11 2020, 3:16 PM
This revision now requires changes to proceed.Mar 11 2020, 3:16 PM
steven.zhang planned changes to this revision.Mar 11 2020, 6:45 PM

Makes sense. Thank you for the information. I will remove the hook and only make the choice when one operand has more uses than the other.

steven.zhang retitled this revision from [DAGCombine] Respect the uses when combine FMA for a*b+/-c*d and add target hook if there uses are the same to [DAGCombine] Respect the uses when combine FMA for a*b+/-c*d.
steven.zhang edited the summary of this revision. (Show Details)

Remove the hook.

lebedev.ri added inline comments.Mar 12 2020, 1:09 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11940

I'm not sure why the Aggressive && isContractableFMUL(N0) && isContractableFMUL(N1) check is here?
That is already checked in the lambdas.

steven.zhang marked an inline comment as done.Mar 12 2020, 1:50 AM
steven.zhang added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11940

Only when both operands (LHS/RHS) are FMUL should we look at the uses to decide which one to fold; therefore, we need the check here.

lebedev.ri added inline comments.Mar 12 2020, 2:34 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11940

Err, I mean, why is the Aggressive check there?

steven.zhang marked an inline comment as done.Mar 12 2020, 3:08 AM
steven.zhang added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11940

Ah, just to save compile time on those two isContractableFMUL calls. But it seems that is cheap, as they only check some flags. I will remove the Aggressive check.

Remove the Aggressive check.

This doesn't look unreasonable to me now.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11918–11956

I suggest landing this NFC refactoring first, to make the diff more understandable.

11942
// fold (fsub (fmul a, b), (fmul c, d)) -> (fma (fneg c), d, (fmul a, b))
11945
// fold (fsub (fmul a, b), (fmul c, d)) -> (fma a, b, (fneg (fmul c, d)))
steven.zhang marked an inline comment as done.Mar 12 2020, 8:11 AM
steven.zhang added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11918–11956

OK, I assume you are fine with the refactoring, so I won't post a separate review for this NFC (adding two lambdas to do the folding). Once that NFC patch lands, I will update this patch. If there is any concern, let me know. Thank you!

spatel added inline comments.Mar 12 2020, 12:52 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11941

It's not clear to me what we want to do in the case where both of the intermediate operands have other uses, and that difference is not visible in the existing tests. Would it be better to simplify this check to:

if (N1.hasOneUse())
  // fold (fsub (fmul a, b), (fmul c, d)) -> (fma (fneg c), d, (fmul a, b))
else
  // fold (fsub (fmul a, b), (fmul c, d)) -> (fma a, b, (fneg (fmul c, d)))

We should probably add tests like this either way:

define double @fma_multi_uses1(double %a, double %b, double %c, double %d, double* %p1, double* %p2, double* %p3) {
  %ab = fmul fast double %a, %b
  %cd = fmul fast double %c, %d
  store double %ab, double* %p1 ; extra use of %ab
  store double %ab, double* %p2 ; another extra use of %ab
  store double %cd, double* %p3 ; extra use of %cd
  %r = fsub fast double %ab, %cd
  ret double %r
}

define double @fma_multi_uses2(double %a, double %b, double %c, double %d, double* %p1, double* %p2, double* %p3) {
  %ab = fmul fast double %a, %b
  %cd = fmul fast double %c, %d
  store double %ab, double* %p1 ; extra use of %ab
  store double %cd, double* %p2 ; extra use of %cd
  store double %cd, double* %p3 ; another extra use of %cd 
  %r = fsub fast double %ab, %cd
  ret double %r
}
steven.zhang marked an inline comment as done.Mar 12 2020, 9:29 PM
steven.zhang added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11941

Good point, but the answer is no. I think folding the fmul that has fewer uses is the right direction, as its use count will reach one or zero sooner than the other one's, which exposes further opportunities. For example:

x = a*b - c*d;
y = e*f - c*d;
// there is more than one later use of "a*b"

We have at least 3 uses of "a*b" and 2 uses of "c*d". We should prefer to fold "c*d" for both "x" and "y" so that its def can be removed. That is why we want to fold the operand with fewer uses rather than just checking for a single use. Does that make sense?

I have added tests for your cases and for the one I gave above.
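
As an illustration of that selection, here is a DAGCombiner-style sketch (not the exact code in this patch; it assumes N is the fsub node with DAG, DL and VT in scope, and the tie case simply keeps the existing fold):

// N is (fsub (fmul a, b), (fmul c, d)); both operands are contractable FMULs.
SDValue N0 = N->getOperand(0); // fmul a, b
SDValue N1 = N->getOperand(1); // fmul c, d

// Absorb the product with fewer uses into the FMA: the more widely shared
// product stays as a standalone fmul, and the absorbed one loses a use here.
if (N1->use_size() < N0->use_size())
  // fold (fsub (fmul a, b), (fmul c, d)) -> (fma (fneg c), d, (fmul a, b))
  return DAG.getNode(ISD::FMA, DL, VT,
                     DAG.getNode(ISD::FNEG, DL, VT, N1.getOperand(0)),
                     N1.getOperand(1), N0);
// fold (fsub (fmul a, b), (fmul c, d)) -> (fma a, b, (fneg (fmul c, d)))
return DAG.getNode(ISD::FMA, DL, VT, N0.getOperand(0), N0.getOperand(1),
                   DAG.getNode(ISD::FNEG, DL, VT, N1));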

Rebase the patch and add tests.

spatel added inline comments.Mar 13 2020, 9:16 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11941

Yes, that makes sense. Please add the new tests with current codegen as a preliminary NFC commit, so we'll see the codegen diff (if any) resulting from this patch.

I think with the extra tests + the NFC refactor separated out, this patch will be good.

Rebase the patch.

spatel accepted this revision.Mar 16 2020, 8:06 AM

LGTM (but see if @lebedev.ri has any more comments)

No more comments from me, looks reasonable I suppose.

@lebedev.ri Would you please accept this revision if you have no further comments, since you requested changes on it? Thank you!

lebedev.ri resigned from this revision.Mar 16 2020, 11:37 PM
This revision is now accepted and ready to land.Mar 16 2020, 11:37 PM
This revision was automatically updated to reflect the committed changes.