This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] scale repeated FP divisor by splat factor
ClosedPublic

Authored by spatel on Apr 23 2019, 10:14 AM.

Details

Summary

If we have a vector FP division with a splatted divisor, we can use the existing transform that converts 'x/y' into 'x * (1.0/y)' to allow more conversions. This can then potentially be converted into a scalar FP division by existing combines (rL358984) as seen in the tests here.

That can be a potentially big perf difference if scalar fdiv has better timing (including avoiding possible frequency throttling for vector ops).

There's another diff here in the ordering of the transforms - I'm proposing to move the repeated divisor transform ahead of the reciprocal estimate transform because that seems more likely to produce the best results. For default x86, we don't turn fdiv f32 into an estimate because the estimate accuracy is too poor for most code. That's probably the right perf choice for current and future CPUs since divss throughput is down to the 3-4 cycle range (Skylake/Ryzen).

Diff Detail

Event Timeline

spatel created this revision.Apr 23 2019, 10:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 23 2019, 10:14 AM
RKSimon added inline comments.Apr 24 2019, 7:52 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11987

Does this move need to be a separate patch?

spatel marked an inline comment as done.Apr 24 2019, 8:36 AM
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11987

Yes, let me change that back. I noticed that potential diff while looking at the other part of this patch, so I thought it would be good to see the 2 changes together, but it should stand independently assuming it makes sense.

spatel updated this revision to Diff 196470.Apr 24 2019, 9:23 AM

Patch updated:
Remove reordering of reciprocal estimate and repeated divisor transforms. We still see a diff for the SSE target with an illegal type because we don't try to generate the reciprocal estimate sequence until the types are legal.

RKSimon added inline comments.Apr 24 2019, 9:55 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11909

We probably want to not use this when optsize is enabled - add a TODO?

spatel updated this revision to Diff 196481.Apr 24 2019, 10:09 AM
spatel marked an inline comment as done.

Patch updated:
Add a TODO comment about optimizing for size. That's an existing concern even for scalar code. But there may not be a clear answer for optsize because, for example, we may be able to hoist fdiv out of a loop by doing this transform.

This revision is now accepted and ready to land.Apr 24 2019, 12:02 PM
This revision was automatically updated to reflect the committed changes.