This is an archive of the discontinued LLVM Phabricator instance.

[AVR] Improve inline rotate/shift expansions
ClosedPublic

Authored by aykevl on Aug 23 2020, 5:27 AM.

Details

Summary

These expansions were rather inefficient and were done with more code
than necessary. This change optimizes them to use expansions more
similar to GCC. The code size is the same (when optimizing for code
size) but somehow LLVM reorders blocks in a non-optimal way. Still, this
should be an improvement with a reduction in code size of around 0.12%
(when building compiler-rt).


I made this patch to get more familiar with these inline expansions, in the hope that I can also do the other expansions inline (such as 32-bit shifts).

Diff Detail

Event Timeline

aykevl created this revision.Aug 23 2020, 5:27 AM
aykevl requested review of this revision.Aug 23 2020, 5:27 AM

I have tested this locally with my compiler-rt test setup. All tests still pass with this patch applied while the binary size is slightly reduced (when optimizing for size).

llvm/test/CodeGen/AVR/rot.ll
16

Note: this doesn't decrease binary size because LLVM duplicates the loop check (dec and br). When optimizing for size, this would be reduced by one instruction, such as in shift_i8_i8_size of shift.ll.

dylanmckay accepted this revision.Aug 26 2020, 8:59 AM

Great patch

I made this patch to get more familiar with these inline expansions, in the hope that I can also do the other expansions inline (such as 32-bit shifts).

That would be nice!

This revision is now accepted and ready to land.Aug 26 2020, 8:59 AM
This revision was landed with ongoing or failed builds.Oct 31 2020, 3:16 PM
This revision was automatically updated to reflect the committed changes.

Great patch

I made this patch to get more familiar with these inline expansions, in the hope that I can also do the other expansions inline (such as 32-bit shifts).

That would be nice!

I have tried several things and thought a lot about this, but I can't come up with a good way to handle this. The only thing I can think of is introduce new pseudo 32-bit and 64-bit shift instructions, but that just seems really ugly.
The problem is that I see no way of converting these bigger than 16 bit integers to 16-bit (or 8-bit) open coded instruction sequences before type legalization. Reading https://llvm.org/docs/CodeGenerator.html, it appears that these larger than 16 bit integers are all converted to builtin calls before any custom lowering can be done.

I suspect the real solution is to switch to GlobalISel, which apparently is much more flexible and should allow inserting a custom pass before type legalization.