This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] tighten constraints for fma fold
ClosedPublic

Authored by spatel on Jun 24 2020, 1:29 PM.

Details

Summary

fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)

This is only allowed when "reassoc" is present on the fadd.

As discussed in D80801, this transform goes beyond what is allowed by "contract" FMF (-ffp-contract=fast). That is because we are fusing the trailing add of 'E' with a multiply, but without "reassoc", the code mandates that the products A*B and C*D are added together before adding in 'E'.

I've added this example to the LangRef to try to clarify the meaning of "contract". If that seems reasonable, we should probably do something similar for the clang docs because there does not appear to be any formal spec for the behavior of -ffp-contract=fast.

Diff Detail

Event Timeline

spatel created this revision.Jun 24 2020, 1:29 PM
Herald added a project: Restricted Project. · View Herald Transcript
qiucf added a subscriber: qiucf.Jul 1 2020, 8:23 PM
cameron.mcinally accepted this revision.Jul 6 2020, 10:34 AM

LGTM, but I encourage others to review too.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11991

Do we need to check Aggressive here? For a hypothetical target with 2 FMUL/FADD ports and 1 FMA port, assuming slow FMAs, this could be a performance loss.

It shouldn't be a problem for modern chips that I care about, so just picking nits.

This revision is now accepted and ready to land.Jul 6 2020, 10:34 AM
spatel marked 2 inline comments as done.Jul 6 2020, 10:55 AM
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11991

Removing the 'Aggressive' clause was the previous patch. :)
D80801

The reason for not requiring 'Aggressive' is that using FMA on this case is what we should assume is the best case for a default target that supports FMA. As discussed in the earlier patch, we know that this is difficult to get right for all code sequences/targets, so there is already an opt-out to bypass this in SDAG and use MachineCombiner instead. Potentially, we could also transform patterns like this after they have been fused to FMA. That would again be in MachineCombiner (where we have the detailed scheduler info).

Oops, missed that. Sorry.

Still LGTM.

spatel marked an inline comment as done.Jul 10 2020, 8:30 AM

Any other comments/feedback?

This revision was automatically updated to reflect the committed changes.

Hi, I see reassoc and contract are equivalent in many cases (at least in DAG combiner):

define double @foo(double %a, double %b, double %c) {
entry:
  %0 = fmul reassoc double %a, %b
  %1 = fadd reassoc double %0, %c
  ret double %1
}
; => no contract bit, it's still fused into fma
define double @foo2(double %a, double %b) {
entry:
  %0 = fmul double %a, 1.1
  %1 = call contract double @llvm.fma.f64(double %0, double 2.1, double %b)
  ret double %1
}
; => no reassoc set, (fma (fmul x, c1), c2, y) became (fma x, c1*c2, y)

declare double @llvm.fma.f64(double, double, double)

Is this expected, or we need to differentiate them in more places?

Hi, I see reassoc and contract are equivalent in many cases (at least in DAG combiner):

define double @foo(double %a, double %b, double %c) {
entry:
  %0 = fmul reassoc double %a, %b
  %1 = fadd reassoc double %0, %c
  ret double %1
}
; => no contract bit, it's still fused into fma
define double @foo2(double %a, double %b) {
entry:
  %0 = fmul double %a, 1.1
  %1 = call contract double @llvm.fma.f64(double %0, double 2.1, double %b)
  ret double %1
}
; => no reassoc set, (fma (fmul x, c1), c2, y) became (fma x, c1*c2, y)

declare double @llvm.fma.f64(double, double, double)

Is this expected, or we need to differentiate them in more places?

In the first example, I think of reassoc as including anything that contract could enable, so reassoc is a superset. If that's not the right model, we should adjust the documentation to allow the other interpretation. It does look like gcc makes some distinction between "unsafe" and FP-contraction, but it does not behave the way I was expecting:
https://godbolt.org/z/EfhPYT

The second example looks very similar to the bug we fixed with this patch: we are not allowed to reassociate the order of multiplications with contract alone, so that's a bug.
Let me know if I should post a patch for that or if you are already working on it.

Let me know if I should post a patch for that or if you are already working on it.

Yes. I posted D89527.

It does look like gcc makes some distinction between "unsafe" and FP-contraction, but it does not behave the way I was expecting: https://godbolt.org/z/EfhPYT

The way compiler handles the option might be somewhat confusing: https://stackoverflow.com/questions/43352510 Besides, since GCC doesn't support pragma FP_CONTRACT (#pragma STDC FP_CONTRACT ON) well, -ffp-contract=on is equivalent to -ffp-contract=off in GCC. Clang looks to handle it correctly.