This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Enable division-by-constant optimization for wide types
AcceptedPublic

Authored by nhaehnle on Sep 22 2016, 5:07 AM.

Details

Summary

This relies on previous support for expanding MULH[US] / [US]MUL_LOHI. Instead
of doing division-by-constant only when those instructions are legal, targets
should now use isIntDivCheap to signal that they do not want this expansion.

This change allows 64-bit division-by-constant to use the more efficient
multiply and shift lowering on AMDGPU.

This also affects a lowering on SPARC in a way that may or may not be more
efficient, see the change in the corresponding test case for the effect. I'd
appreciate some feedback on that.

The vector case is not enabled yet even though it should be correct and will
likely allow better overall code generation eventually. Enabling it gives some
regressions in X86 tests, mostly due to what looks like insufficient
peep-holing when vNi64 multiplies are scalarized.

Diff Detail

Event Timeline

nhaehnle updated this revision to Diff 72153.Sep 22 2016, 5:07 AM
nhaehnle retitled this revision from to [SelectionDAG] Expand MULHU and enable division-by-constant for wide types.
nhaehnle updated this object.
nhaehnle added a subscriber: llvm-commits.

The reason for excluding the vector case is that it hits another problem with MULHU in some X86 vector div tests, and affects additional X86 test cases. I'd prefer to keep things simple for this change.

efriedma added inline comments.
test/CodeGen/SPARC/rem.ll
62

This is generating 8 multiply instructions; something is going wrong in your algorithm. (It should only take four multiply instructions to perform a double-width multiply.)

I looked a bit more closely, and the issue with the Sparc test is that the code uses isOperationLegalOrCustom(ISD::UMUL_LOHI, MVT::i32) to check - this is taken unchanged from expandMUL - but UMUL_LOHI is Expand. When I force my code in expandUMUL_LOHI to use the smaller size UMUL_LOHI, I get only 4 multiplies.

I don't see a clear picture of how multiplication legalization is supposed to work. Currently, MULHU can become UMUL_LOHI in LegalizeDAG::ExpandNode, and UMUL_LOHI can convert to a MUL in a wider type in the DAGCombiner, or to a MUL + MULHU, but only if one of the resulting ops can be combined further. Each of these steps only happen when the resulting operations are LegalOrCustom.

Furthermore, UMUL_LOHI is marked as Expand in the targets that I've looked at, but there isn't actually any code for it in LegalizeDAG::ExpandNode. It all seems a bit messy :(

In part I get the impression that the LegalizeAction just doesn't contain enough information. If a target sets UMUL_LOHI to Expand, should that be a sequence of multiplies of the half-sized integer type, or should it be a single multiply in a larger type? Perhaps Promote should be used to indicate the second option?

In part I get the impression that the LegalizeAction just doesn't contain enough information. If a target sets UMUL_LOHI to Expand, should that be a sequence of multiplies of the half-sized integer type, or should it be a single multiply in a larger type? Perhaps Promote should be used to indicate the second option?

It's even worse because at least AMDGPU would want to legalize UMUL_LOHI on MVT::i32 to MUL + MULHU. So there are at least three plausible ways of handling a non-legal UMUL_LOHI. Perhaps this should be made explicit with an enum that the TargetLowering can choose from.

Using Promote to indicate that a larger multiply should be used seems reasonable.

Not sure it really makes sense to say that there are three ways to perform the operation; UMUL_LOHI and MULHU are essentially the same operation, in the same way that DIV and DIVREM are the same operation.

nhaehnle updated this revision to Diff 72621.Sep 27 2016, 3:36 AM

Move the bulk of the multiplication changes to D24956.

nhaehnle retitled this revision from [SelectionDAG] Expand MULHU and enable division-by-constant for wide types to [SelectionDAG] Enable division-by-constant optimization for wide types.Sep 27 2016, 3:39 AM
nhaehnle updated this object.
nhaehnle added a reviewer: ast.

+ BPF test changes because the sdiv lowering fails later now.

ast added inline comments.Sep 27 2016, 8:04 AM
test/CodeGen/BPF/sdiv_error.ll
3 ↗(On Diff #72621)

'cannot select' is unreadable to C programmer comparing to 'Unsupported signed div'.
Usability of BPF is the hardest problem we're facing, so we don't want to lose those error messages. Hence similar to the SDIV error, can you please add the same error for SMUL_LOHI in BPFDAGToDAGISel::Select() ?
Also keeping the same hint "Please convert to unsigned div/mod".
Thanks

nhaehnle added inline comments.Sep 27 2016, 12:38 PM
test/CodeGen/BPF/sdiv_error.ll
3 ↗(On Diff #72621)

Hmm, relying on backend error messages for usability is maybe not the best idea...

Not sure how to change this. Perhaps adding an isSigned parameter to isIntDivCheap?

ast requested changes to this revision.Sep 27 2016, 1:40 PM
ast edited edge metadata.
ast added inline comments.
test/CodeGen/BPF/sdiv_error.ll
3 ↗(On Diff #72621)

it's not the matter of optimization. There is no sdiv instruction. So please keep the backend error.

This revision now requires changes to proceed.Sep 27 2016, 1:40 PM
nhaehnle updated this revision to Diff 72774.Sep 28 2016, 1:13 AM
nhaehnle edited edge metadata.
nhaehnle marked an inline comment as done.

I still think it's pretty misguided to rely on errors from the backend,
given that this would be trivial to catch in the frontend where you have
more context for useful error messages anyway, but whatever.

I'm going to take the isIntDivCheap route because yes, it does have to do
with optimizations: you're relying on no optimizations being applied to
sdiv.

ast accepted this revision.Sep 28 2016, 9:05 AM
ast edited edge metadata.

lgtm
there are different front-ends and we cannot control them all, so backend has to have end-user understandable errors, though it's not pretty.

This revision is now accepted and ready to land.Sep 28 2016, 9:05 AM
nhaehnle updated this revision to Diff 79195.Nov 24 2016, 1:34 AM
nhaehnle edited edge metadata.

Rebase on latest version of D24956.

What happened to this patch?

RKSimon added a subscriber: RKSimon.Jul 1 2018, 2:16 AM

@nhaehnle If you could rebase this and enable vector idiv to show the x86 regressions I may be able to help

nagisa added a subscriber: nagisa.Jun 5 2021, 5:10 AM