Page MenuHomePhabricator

Prevent hoisting fmul from THEN/ELSE to IF if there is fmsub/fmadd opportunity
ClosedPublic

Authored by hulx2000 on Jan 30 2015, 10:06 AM.

Details

Summary

For the following code:

if (y == 0)

{
y = -x * a;
b = a;
}

else

{
y -= x * a;
a = b;
}

HoistThenElseCodeToIf in SimplifyCFG.cpp will hoist the multiply outside the THEN/ELSE, stopping the compiler from forming fmsub later, this is not always a wise thing to do because target that support fused instructions can't take advantage of it, on some target, fmsub/fmadd is more efficient than (fmul +fadd/fsub), plus, by looking at the code generated by gcc compiler, I can see gcc doesn't do that kind of hoisting and leading to better performance on SPEC benchmark which I was analyzing.

This patch is to teach HoistThenElseCodeToIf to stop doing that kind of hoisting by using a target hook that checks for fmadd/fmsub opportunites , that improved spec2006/soplex by 2% consistently on cortex-a57 core, no significant gain for other spec2000/spec2006 benchmarks.

Diff Detail

Repository
rL LLVM

Event Timeline

hulx2000 retitled this revision from to Prevent hoisting fmul from THEN/ELSE to IF if there is fmsub/fmadd opportunity.
hulx2000 updated this object.
hulx2000 edited the test plan for this revision. (Show Details)
hulx2000 set the repository for this revision to rL LLVM.
hulx2000 updated this object.
hulx2000 added a subscriber: Unknown Object (MLST).
hfinkel added inline comments.
include/llvm/Analysis/TargetTransformInfo.h
339

This name is pretty long. How about just isProfitableToHoist?

lib/Target/AArch64/AArch64ISelLowering.cpp
6726

I think you can make this the default implementation. PPC will want essentially the same implementation.

6731

Space before Only.

6743

Space after VT (you should probably run clang-format on the entire patch)

hulx2000 removed rL LLVM as the repository for this revision.

make it default as Hal suggested, unit test remain under AArch64, whoever want to support other target please add test.

hfinkel accepted this revision.Feb 3 2015, 4:36 PM
hfinkel edited edge metadata.

Adjusting for my comment below, LGTM.

include/llvm/Target/TargetLowering.h
27

You don't want enableAggressiveFMAFusion(VT) here. enableAggressiveFMAFusion(VT) affects whether we combine to form an FMA even when the fmul has more than one use, combine FMAs with other nodes to form multiple FMAs, whether we combine them through fpext nodes, etc. -- but it does not affect fusion legality (and, thus, whether a mul will be combined with a fsub/fadd to from an FMA).

This revision is now accepted and ready to land.Feb 3 2015, 4:36 PM
hulx2000 edited edge metadata.

removed the check of enableAggressiveFMAFusion

mcrosier edited edge metadata.Feb 4 2015, 10:47 AM

Lawrence,
The patch does not apply cleanly to top of trunk. Please rebase and upload a new patch. Once complete, I'd be happy to commit the patch on your behalf.

Chad

hulx2000 edited edge metadata.Feb 19 2015, 11:08 AM
hulx2000 set the repository for this revision to rL LLVM.

rebased patch

mcrosier closed this revision.Feb 23 2015, 11:17 AM

Committed in r230241.

sebpop added a subscriber: sebpop.Apr 8 2016, 8:55 AM

IMO this patch should not have been committed.

lib/Target/AArch64/AArch64ISelLowering.cpp
6540

I think that this is only avoiding a bigger issue, that is currently the instruction selection happens on a single basic block.
A better fix would be a machine pass that pairs multiplies with add/sub to form more FMAs.

6561

If both the multiply and the add/sub it feeds into can be hoisted, this code will still prevent multiplies to be hoisted.

hulx2000 added inline comments.Apr 8 2016, 3:18 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
6540

Could you please explain what do you mean about avoiding a bigger issue?

6561

I do agree about that.

sebpop added inline comments.Apr 8 2016, 6:37 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
6540

Instruction selection happens on a single basic block, and it cannot form FMAs across BBs. The long term solution is to have an instruction selection that can see further than a BB. Until then probably we could have an MI pass that matches FMAs across BB boundaries.