Details
Diff Detail
Event Timeline
lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
765 | So far we stayed away of "optimizations" in the IRTranslator. In particular, the language reference says: What do you think? |
lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
765 | That's also what I said when I spoke with him in person. Unfortunately, currently there's no other place to put it (to avoid duplicating across all targets). |
lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
765 | Yes, fusion is not guaranteed but I think we still need to translate this to either one of the options. If IRTranslator doesn't do that, every target would have to handle this in somewhere else as the combiner is optional. |
lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
765 |
Couldn't this be offered as a generic combine (I mean G_FMUL (G_FADD) => FMA)? I know this is optional, but this would be shared and is also sort of an optimization.
Yes, what I am suggesting is that fmuladd could always be expanded into G_FMUL (G_FADD). Just remove the FMA part.
Just to make just we are talking about the same thing, I agree we need to *always* lower that intrinsic in IRTranslator, but G_FMUL(G_FADD) is sufficient for that part. |
lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
765 | My initial thought was that this ought to be a combine. However, I think this intrinsic is intended as a way to say you're expecting either a fused-multiply-add or a multiply-add (some targets have both) but don't care whether the rounding step happens or not, you just want whatever is fastest. We could handle this as a combine but it would cost compile-time to find the optimization opportunity in the expanded code and if we go that route then the intrinsic doesn't really serve a purpose compared to separate mul/add LLVM-IR instructions anymore. |
lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
765 | Arguably the compiler knows better (critical path, resources usages, and whatnot) and that wouldn't be the first time that we silently drop an intrinsic in favor of regular instructions (we did that for pretty much all the intel shuffle instructions for instance IIRC). Thus, keeping this as an optimization makes sense to me. That said, I also get your point and I am fine with either solutions. Given it looks like we have 1 vs. 3 votes, I guess we keep the fancy expansion here :). Thoughts? |
lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
765 | FWIW, I'm with Quentin on this one. The LangRef says it's up to the code generator to decide:
I'd also nominate the fsub -0.0, x -> fneg transform to be moved from the irtranslator into the combiner. That's just my 2 cents, I don't mind terribly if the patch goes in as is (or maybe with a FIXME) and we revisit some other time. |
lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
765 | I'm ok with it either way, but if we're going to defer it then we should probably translate it to an intrinsic and legalize it into the mul+add/fma form.
I agree. IIRC, a lot of targets like the fsub representation but I know there's a few that like fneg. We need to handle this somewhere where the target can decide. |
lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
765 |
Oh, I thought you were suggesting translating this in the combiner pass. In that case, we can go for either one of the options and revisit this later as Diana suggested. |
lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
---|---|---|
765 | Agree. I leave the final decision to you. |
So far we stayed away of "optimizations" in the IRTranslator.
I wonder if this part of the lowering should be part of a combiner instead.
In particular, the language reference says:
Fusion is not guaranteed, even if the target platform supports it.
What do you think?