This is an archive of the discontinued LLVM Phabricator instance.

GlobalISel: IRTranslate llvm.fmuladd.* intrinsic
ClosedPublic

Authored by volkan on Feb 8 2018, 1:50 PM.

Diff Detail

Event Timeline

volkan created this revision.Feb 8 2018, 1:50 PM
qcolombet added inline comments.Feb 9 2018, 11:27 AM
lib/CodeGen/GlobalISel/IRTranslator.cpp
765

So far we stayed away of "optimizations" in the IRTranslator.
I wonder if this part of the lowering should be part of a combiner instead.

In particular, the language reference says:
Fusion is not guaranteed, even if the target platform supports it.

What do you think?

lib/CodeGen/GlobalISel/IRTranslator.cpp
765

That's also what I said when I spoke with him in person. Unfortunately, currently there's no other place to put it (to avoid duplicating across all targets).

volkan added inline comments.Feb 9 2018, 12:51 PM
lib/CodeGen/GlobalISel/IRTranslator.cpp
765

Yes, fusion is not guaranteed but I think we still need to translate this to either one of the options. If IRTranslator doesn't do that, every target would have to handle this in somewhere else as the combiner is optional.

qcolombet added inline comments.Feb 9 2018, 4:27 PM
lib/CodeGen/GlobalISel/IRTranslator.cpp
765

Unfortunately, currently there's no other place to put it (to avoid duplicating across all targets).

Couldn't this be offered as a generic combine (I mean G_FMUL (G_FADD) => FMA)?

I know this is optional, but this would be shared and is also sort of an optimization.

fusion is not guaranteed but I think we still need to translate this to either one of the options.

Yes, what I am suggesting is that fmuladd could always be expanded into G_FMUL (G_FADD). Just remove the FMA part.

If IRTranslator doesn't do that, every target would have to handle this in somewhere else as the combiner is optional.

Just to make just we are talking about the same thing, I agree we need to *always* lower that intrinsic in IRTranslator, but G_FMUL(G_FADD) is sufficient for that part.

dsanders added inline comments.Feb 9 2018, 4:48 PM
lib/CodeGen/GlobalISel/IRTranslator.cpp
765

My initial thought was that this ought to be a combine. However, I think this intrinsic is intended as a way to say you're expecting either a fused-multiply-add or a multiply-add (some targets have both) but don't care whether the rounding step happens or not, you just want whatever is fastest. We could handle this as a combine but it would cost compile-time to find the optimization opportunity in the expanded code and if we go that route then the intrinsic doesn't really serve a purpose compared to separate mul/add LLVM-IR instructions anymore.

qcolombet added inline comments.Feb 9 2018, 5:49 PM
lib/CodeGen/GlobalISel/IRTranslator.cpp
765

Arguably the compiler knows better (critical path, resources usages, and whatnot) and that wouldn't be the first time that we silently drop an intrinsic in favor of regular instructions (we did that for pretty much all the intel shuffle instructions for instance IIRC). Thus, keeping this as an optimization makes sense to me.

That said, I also get your point and I am fine with either solutions. Given it looks like we have 1 vs. 3 votes, I guess we keep the fancy expansion here :).

Thoughts?

rovka added inline comments.Feb 12 2018, 1:34 AM
lib/CodeGen/GlobalISel/IRTranslator.cpp
765

FWIW, I'm with Quentin on this one. The LangRef says it's up to the code generator to decide:

The ‘llvm.fmuladd.*’ intrinsic functions represent multiply-add expressions that can be fused if the code generator determines that (a) the target instruction set has support for a fused operation, and (b) that the fused operation is more efficient than the equivalent, separate pair of mul and add instructions.

I'd also nominate the fsub -0.0, x -> fneg transform to be moved from the irtranslator into the combiner.

That's just my 2 cents, I don't mind terribly if the patch goes in as is (or maybe with a FIXME) and we revisit some other time.

dsanders added inline comments.Feb 12 2018, 9:39 AM
lib/CodeGen/GlobalISel/IRTranslator.cpp
765

I'm ok with it either way, but if we're going to defer it then we should probably translate it to an intrinsic and legalize it into the mul+add/fma form.

I'd also nominate the fsub -0.0, x -> fneg transform to be moved from the irtranslator into the combiner.

I agree. IIRC, a lot of targets like the fsub representation but I know there's a few that like fneg. We need to handle this somewhere where the target can decide.

volkan added inline comments.Feb 12 2018, 9:40 AM
lib/CodeGen/GlobalISel/IRTranslator.cpp
765

Just to make just we are talking about the same thing, I agree we need to *always* lower that intrinsic in IRTranslator, but G_FMUL(G_FADD) is sufficient for that part.

Oh, I thought you were suggesting translating this in the combiner pass. In that case, we can go for either one of the options and revisit this later as Diana suggested.

qcolombet accepted this revision.Feb 12 2018, 1:58 PM
qcolombet added inline comments.
lib/CodeGen/GlobalISel/IRTranslator.cpp
765

Agree. I leave the final decision to you.
At the very least, we want a FIXME here.

This revision is now accepted and ready to land.Feb 12 2018, 1:58 PM
volkan updated this revision to Diff 133962.Feb 12 2018, 4:48 PM

Added a TODO about moving a part of the translation to the combiner.

volkan marked an inline comment as done.Feb 12 2018, 4:48 PM
This revision was automatically updated to reflect the committed changes.