This saves a library call to __aeabi_uidivmod. However, the
processor must feature hardware division in order to benefit from
the transformation.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Nice!
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
12109 ↗ | (On Diff #70009) | Why not inline this into the Div? |
This doesn't seem appropriate for -Oz (which happens to be set in the test your actually modifying).
So, I think that's a worthy optimisation, but I worry about the multi-cycle that the selection dag takes to get there.
The validation mechanism keeps cycling between divs and mods, merging and extending them until, quite by accident, things fall into shape. For instance, another patch to add "rt_div" reports that library call being called three times.
One thing we don't do, for example, is to merge divs into divmods that only need the mods, so you'll end up with a call to idiv + idivmod. In here, you'll end up with DIV+DIV+MUL(MOD) with the two first divs being identical.
The only comfort is that pure mods need the div anyway, so in the simple case, it's ok.
cheers,
--renato
lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
12109 ↗ | (On Diff #70009) | Because the merge of divs+mods happens elsewhere. I believe divs are already covered by instructions where possible, but not mods. However, mods fallback here, so this is the right place, I think. |
test/CodeGen/ARM/urem-opt-size.ll | ||
43 ↗ | (On Diff #70009) | Please, add tests for signed/unsigned, mod and div+mod, div+div+mod and see if they merge (if not, leave a FIXME comment). Some of those tests will probably need to be in divmod-eabi.ll. Also, please make sure to include regex for the registers, to make sure that the result from udiv gets passed correctly to the mul + add. |
Hi Tim,
Actually this optimization can be good for Oz; in fact it's our primary target. One udiv and one mls is 8 bytes, which OK is 4 bytes more than the call but doesn't require any argument setup/tear down.
Also, we can avoid linking in uidivmod at all, which in small programs is *significant* in terms of code size reduction.
James
That's true.
Division with all the speed optimisations and "code maintenance improvements" is a *very* large algorithm, and can incur into calls between themselves to save copy&paste. Just look at our compiler-rt implementation... :)
Improved testing:
- New signed remainder test.
- New sdiv + srem test.
- Registers are given as regular expressions.
All tests merge to udiv/sdiv + mls.
I am not sure about adding a div+div+mod test, though. I have checked that
two identical divs + remainder merge just fine. However, is that an IR that
we can typically get? I would expect the two divs to be merged before getting
to the back-end.
I would expect the two divs to be merged before getting to the back-end.
I think the important thing is that those two divs will be merged in SDAG before your code even runs, so is not possible to test.
James
Right, it makes sense. I didn't know that these sort of optimizations were also undertaken in the selection DAG.
Renato, thank you for the review; the testing looks better now. Could you have a look at the new patch when you have a few minutes? Let me know if you still see some room for improvement.
test/CodeGen/ARM/urem-opt-size.ll | ||
---|---|---|
56 ↗ | (On Diff #70377) | Can you also add the other CHECKs so that we're clear on what's the expected behaviour on all tested archs? |