Page MenuHomePhabricator

Refactor and enhance FMA combine

Authored by ohsallen on Mar 3 2015, 5:56 PM.



When FP_EXTEND is known to be free for some target (PPC for instance), it is possible to arrange FP_EXTEND nodes in order to do more FMA combining. In this patch, the code that handles that is moved into the new routines created by Matt Arsenault (namely performFaddFmulCombines and performFsubFmulCombines).

There are also new combining opportunities that are implemented, which take effect when aggressive FMA fusion is enabled and it is possible to arrange FP_EXTEND nodes. At the moment, those are particularly interesting for PPC.

Diff Detail

Event Timeline

ohsallen updated this revision to Diff 21160.Mar 3 2015, 5:56 PM
ohsallen retitled this revision from to Refactor and enhance FMA combine.
ohsallen updated this object.
ohsallen edited the test plan for this revision. (Show Details)
ohsallen added reviewers: hfinkel, arsenm.
ohsallen added a subscriber: Unknown Object (MLST).

These are very interesting transformation! See comments inline.



// fold (fadd x, (fpext (fmul y, z)), z)

-> isn't there an extra , z)


The existing function is already not document, but adding a new argument only exhibits how lacking it is.
Can you add a comment on top of this function. I'd rather see defined what is the meaning of at least the three first arguments.

It should also be specified if this function can be called outside of fast-math and what combination of parameter is allowed in this case.


I wonder if this is canonical?
Couldn't it be transformed to:

fneg (fadd (fpext (fmul, x, y)), z)

Same canonicalization question, seems redundant with the one before.

Also: shouldn't AddToWorklist() be called for all the created node (except the root node for the resulting expression)?

hfinkel added inline comments.Mar 3 2015, 7:11 PM

Also, I don't find 'ArrangeFPExt' an informative name. Why not just name it LookThroughFPExt?


Please add a comment here explaining that we can't look through FPExt nodes, and thus that parameter is false, because doing so effectively introduces extra precision that would be invalid for FMAD.

ohsallen updated this revision to Diff 21227.Mar 4 2015, 2:19 PM

Thanks for those useful comments! I incorporated them and also refactored the separate functions. I think the behavior regarding whether we should choose FMAD or FMA is clearer now. Tell me what you think.


The code seems better after refactoring, thanks :)

See inline comments.


This turns two "low precision" and one "high precision" operations into two "high precision" operation.
I'm not saying it is necessarily bad, but I'm not convinced it is always beneficial.
The other FMA combines don't have the same behavior.


The only change between this one and the one two level above is that the operand of the outer fadd are reversed. Is there an elegant way of not duplicating the code? A lambda maybe?


I come back with the canonicalization problem. Wouldn't it be more canonical to have:

fneg (fadd (fpext (fmul, x, y)), z)

In which case this combine would be useless.

ohsallen updated this revision to Diff 21495.Mar 9 2015, 9:46 AM

Use lambdas when it helps to avoid code duplication and when convenient.

I'll commit this patch this week if no one is particularly against it ;-)


In principle you're right, that might not be *always* beneficial. But in general, it should be, because even when "high precision" operations are twice more expensive than "low precision" one, the transformation does not worsen things. Right now this is only enabled for PPC, for which low and high precision operations have the same cost. Tell me if this is not acceptable.


This code is triggered when TLI.isFPExtFree() returns true. The FPEXT nodes are only generated for correctness, and are expected to be removed later by a MC pass. Thus the code generated here is more canonical in this context.

mehdi_amini added inline comments.Mar 23 2015, 1:48 PM

Well you can imagine having more than twice the throughput in f16 than f32 on some targets, and you can also imagine that 2 x f16 operations consume less power than one f32.
I'd rather have Owen's opinion on this.


I'm not sure you understood my comment. I'm talking about the input pattern, not the output pattern.

hfinkel edited edge metadata.Mar 23 2015, 1:56 PM

I'll commit this patch this week if no one is particularly against it ;-)

No, that's not how it works. You need to wait for an explicit okay. (but please do ping once per week if you're missing reviewer feedback).


I agree, I think this is okay as-is. However, please note this explicitly in a comment above the transform.


There is indeed a lot of very similar code here. If we could share this better in between patterns that would be much better.


Sure, but don't miss the point. If the form being matched in non canonical, but still fires, then the correct fix is to add a canonicalization in DAGCombine, and not add logic for it here.

Assuming transforming from one to the other is always correct (even in the face of NaNs, etc.), then I agree that the fneg(fadd(...)) forms should be preferred.

hfinkel added inline comments.Mar 23 2015, 2:14 PM

Sure, but I also don't want to have an unnecessary proliferation target hooks. It is reasonable to believe that this will be generally beneficial, and if we someday have a target that wishes to enable enableAggressiveFMAFusion but does not want this specific type of transformation, then we can add a new hook at that time.

FWIW, this is not just a throughput issue, but also a format-conversion issue (when we have vectors of float vs. vectors of double, the fpext implies register reorganization, and minimizing the number of such reorganizations is likely, I think, to be beneficial).

ohsallen updated this revision to Diff 22585.Mar 24 2015, 10:58 AM
ohsallen edited edge metadata.

Added a comment (FIXME) about transforming single-precision operations into double-precision ones.

Hi Mehdi,

I’d rather see the duplicated code (the one made obsolete by a correct canonicalization) removed from your patch (i.e. do not build technical debt), and a separate commits that implement the canonicalization part.

Makes perfect sense, thanks for your feedback. I was able to do the canonicalization you suggest by adding the following transforms:

(fsub (fneg A), B) -> (fneg (fadd A, B))
(fpext (fneg x)) -> (fneg (fpext x))

The problem is that I think those should be enabled only with -enable-unsafe-fp-math. So the two FMA combines that can be removed because of the canonicalization now only happen with -enable-unsafe-fp-math, whereas they used to work with -fp-contract=fast as well... Not sure what to do here.


Hi Olivier,

Tentatively CC: Steve who might have an input on the validity of the two transforms below without fast-math.

ohsallen updated this revision to Diff 23179.Apr 2 2015, 1:27 PM

Added comment about the canonicalization and the issue with orthogonal flags -fp-contract=fast and -enable-unsafe-fp-math. Thanks!

Ok for commit as it is?


mehdi_amini accepted this revision.Apr 7 2015, 8:30 AM
mehdi_amini added a reviewer: mehdi_amini.
This revision is now accepted and ready to land.Apr 7 2015, 8:30 AM
mehdi_amini closed this revision.Jul 7 2015, 6:19 PM

Landed a long time ago: r235344