This is an archive of the discontinued LLVM Phabricator instance.

[AVR] Optimize 8-bit logic left/right shifts
ClosedPublic

Authored by benshi001 on Oct 8 2020, 7:26 AM.

Diff Detail

Event Timeline

benshi001 created this revision.Oct 8 2020, 7:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 8 2020, 7:26 AM
benshi001 requested review of this revision.Oct 8 2020, 7:26 AM

The original AVR backend TD is also wrong, that it maps the bswap IR to AVR's swap instruction.

Since the IR bswap is byte level operation, which requires the minimal data type to be at least 16-bit.

But AVR's swap performs on half-byte level.

benshi001 updated this revision to Diff 298275.Oct 14 2020, 7:02 PM
benshi001 edited the summary of this revision. (Show Details)

The original AVR backend TD is also wrong, that it maps the bswap IR to AVR's swap instruction.

Since the IR bswap is byte level operation, which requires the minimal data type to be at least 16-bit.

But AVR's swap performs on half-byte level.

I agree it looks wrong, but I can't get it to produce invalid code: https://godbolt.org/z/3YTnzx

Also, while this is an improvement I wonder whether there is a more systematic way to do this? This optimizes a very specific code pattern but it doesn't seem easy to extend it to other shifts and rotates, such as ((char)x) << 3 for example. The thing I have in mind works like this:

  • 8 bit shifts are all implemented with similar code that uses shifts, swaps, and ands etc to get the correct shift amount
  • 16 bit shifts can be built on top of that, either shifting two bytes individually and ORing the results together, or for larger (>= 8 bit) shifts swap the bytes and use an all-zero or all-ones byte for the other.

Or perhaps there are better ways to do this.

If you play around with godbolt.org (https://godbolt.org/z/foW7qc) you can see that it manages to produce very short code for nearly all shifts, apparently falling back to loops for hard cases. It even changes its strategy to inlining the entire shift (no loop) when using -O2.

benshi001 added a comment.EditedOct 31 2020, 10:54 PM

Also, while this is an improvement I wonder whether there is a more systematic way to do this? This optimizes a very specific code pattern but it doesn't seem easy to extend it to other shifts and rotates, such as ((char)x) << 3 for example. The thing I have in mind works like this:

The optimizations for shifts are a bit complex, and I would like to do it in smaller sperated patches. This way makes them easy to review.

  • 8 bit shifts are all implemented with similar code that uses shifts, swaps, and ands etc to get the correct shift amount

It is hard for 8-bit shift with shiftAmount = 3.

  • 16 bit shifts can be built on top of that, either shifting two bytes individually and ORing the results together, or for larger (>= 8 bit) shifts swap the bytes and use an all-zero or all-ones byte for the other.

In current patch, I would like to only optimize 8-bit shifts and leave 16-bit shifts in other patches.

Or perhaps there are better ways to do this.

If you play around with godbolt.org (https://godbolt.org/z/foW7qc) you can see that it manages to produce very short code for nearly all shifts, apparently falling back to loops for hard cases. It even changes its strategy to inlining the entire shift (no loop) when using -O2.

For 8-bit shifts, my patch performs the same as avr-gcc, except for shiftAmount = 7. I will improve it soon.

benshi001 updated this revision to Diff 302242.Nov 2 2020, 4:38 AM
benshi001 retitled this revision from [AVR] Optimize logic left/right shift to [AVR] Optimize 8-bit logic left/right shifts.
benshi001 edited the summary of this revision. (Show Details)

I have uploaded a new revision with more tests.

Now llvm-avr generates the same asm for logic left/right shifts when ShiftAmount = 1,2,3,4,5,6, except for 7.

This is shown in the test llvm/test/CodeGen/AVR/shift.ll.

Shall we commit this first? Then I will go on with the specific case ShiftAmount=7 in another patch.

Now llvm-avr generates the same asm for 8-bit shifts as AVR-GCC does, when ShiftAmount = 1,2,3,4,5,6, 7.

dylanmckay requested changes to this revision.Nov 18 2020, 3:15 AM

Nice patch, cheers.

Just a small comment around a latent FIXME, then it's good to approve.

llvm/lib/Target/AVR/AVRISelLowering.cpp
353

Now llvm-avr generates the same asm for 8-bit shifts as AVR-GCC does, when ShiftAmount = 1,2,3,4,5,6, 7.

This suggests that either this TODO comment is now unnecessary, or there is another optimization that could be implemented specifically for 7-bit shifts that AVR-GCC does not implement.

Remove the TODO comment, or, if you feel it is important, add a sentence describing the TODO

This revision now requires changes to proceed.Nov 18 2020, 3:15 AM
benshi001 updated this revision to Diff 306067.Nov 18 2020, 5:23 AM
benshi001 marked an inline comment as done.
benshi001 added inline comments.
llvm/lib/Target/AVR/AVRISelLowering.cpp
353

The TODO is removed.

benshi001 marked an inline comment as done.Dec 23 2020, 5:37 PM

ping ...

dylanmckay accepted this revision.Jan 23 2021, 1:03 AM
This revision is now accepted and ready to land.Jan 23 2021, 1:03 AM
This revision was automatically updated to reflect the committed changes.