This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Lower UDIV+UREM to UDIV+MLS (and the same for SREM)
ClosedPublic

Authored by pbarrio on Sep 1 2016, 7:46 AM.

Details

Summary

This saves a library call to __aeabi_uidivmod. However, the
processor must feature hardware division in order to benefit from
the transformation.

Diff Detail

Repository
rL LLVM

Event Timeline

pbarrio updated this revision to Diff 70009.Sep 1 2016, 7:46 AM
pbarrio retitled this revision from to [ARM] Lower UDIV+UREM to UDIV+MLS (and the same for SREM).
pbarrio updated this object.
pbarrio added reviewers: rengolin, jmolloy, scott-0.
pbarrio added a subscriber: llvm-commits.
compnerd accepted this revision.Sep 1 2016, 7:55 AM
compnerd added a reviewer: compnerd.
compnerd added a subscriber: compnerd.

Nice!

lib/Target/ARM/ARMISelLowering.cpp
12109 ↗(On Diff #70009)

Why not inline this into the Div?

This revision is now accepted and ready to land.Sep 1 2016, 7:55 AM

This doesn't seem appropriate for -Oz (which happens to be set in the test your actually modifying).

rengolin edited edge metadata.Sep 1 2016, 12:19 PM

So, I think that's a worthy optimisation, but I worry about the multi-cycle that the selection dag takes to get there.

The validation mechanism keeps cycling between divs and mods, merging and extending them until, quite by accident, things fall into shape. For instance, another patch to add "rt_div" reports that library call being called three times.

One thing we don't do, for example, is to merge divs into divmods that only need the mods, so you'll end up with a call to idiv + idivmod. In here, you'll end up with DIV+DIV+MUL(MOD) with the two first divs being identical.

The only comfort is that pure mods need the div anyway, so in the simple case, it's ok.

cheers,
--renato

lib/Target/ARM/ARMISelLowering.cpp
12109 ↗(On Diff #70009)

Because the merge of divs+mods happens elsewhere. I believe divs are already covered by instructions where possible, but not mods. However, mods fallback here, so this is the right place, I think.

test/CodeGen/ARM/urem-opt-size.ll
43 ↗(On Diff #70009)

Please, add tests for signed/unsigned, mod and div+mod, div+div+mod and see if they merge (if not, leave a FIXME comment). Some of those tests will probably need to be in divmod-eabi.ll.

Also, please make sure to include regex for the registers, to make sure that the result from udiv gets passed correctly to the mul + add.

rengolin requested changes to this revision.Sep 1 2016, 12:20 PM
rengolin edited edge metadata.
This revision now requires changes to proceed.Sep 1 2016, 12:20 PM
jmolloy edited edge metadata.Sep 1 2016, 12:52 PM

Hi Tim,

Actually this optimization can be good for Oz; in fact it's our primary target. One udiv and one mls is 8 bytes, which OK is 4 bytes more than the call but doesn't require any argument setup/tear down.

Also, we can avoid linking in uidivmod at all, which in small programs is *significant* in terms of code size reduction.

James

Also, we can avoid linking in uidivmod at all, which in small programs is *significant* in terms of code size reduction.

That's true.

Division with all the speed optimisations and "code maintenance improvements" is a *very* large algorithm, and can incur into calls between themselves to save copy&paste. Just look at our compiler-rt implementation... :)

pbarrio updated this revision to Diff 70377.Sep 6 2016, 4:03 AM
pbarrio edited edge metadata.

Improved testing:

  • New signed remainder test.
  • New sdiv + srem test.
  • Registers are given as regular expressions.

All tests merge to udiv/sdiv + mls.

I am not sure about adding a div+div+mod test, though. I have checked that
two identical divs + remainder merge just fine. However, is that an IR that
we can typically get? I would expect the two divs to be merged before getting
to the back-end.

I would expect the two divs to be merged before getting to the back-end.

I think the important thing is that those two divs will be merged in SDAG before your code even runs, so is not possible to test.

James

Right, it makes sense. I didn't know that these sort of optimizations were also undertaken in the selection DAG.

Renato, thank you for the review; the testing looks better now. Could you have a look at the new patch when you have a few minutes? Let me know if you still see some room for improvement.

rengolin added inline comments.Sep 7 2016, 2:48 AM
test/CodeGen/ARM/urem-opt-size.ll
56 ↗(On Diff #70377)

Can you also add the other CHECKs so that we're clear on what's the expected behaviour on all tested archs?

pbarrio updated this revision to Diff 70517.Sep 7 2016, 3:49 AM
pbarrio edited edge metadata.

Additional checks added to the new tests.

rengolin accepted this revision.Sep 7 2016, 3:51 AM
rengolin edited edge metadata.

LGTM, thanks!

This revision is now accepted and ready to land.Sep 7 2016, 3:51 AM
This revision was automatically updated to reflect the committed changes.