Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

[AVR] Refactor 8-bit & 16-bit shifts

Authored by benshi001 on Mar 10 2021, 5:16 AM.

Diff Detail

Event Timeline

benshi001 created this revision.Mar 10 2021, 5:16 AM
benshi001 requested review of this revision.Mar 10 2021, 5:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2021, 5:16 AM

As @aykevl suggested in,

there could be further optimizations for 8-bit and 16-bit shifts, besides current llvm's optimization.

If we continue current llvm's way, more AVR specific SDNode would be introduced, and introducing new SDNodes for each shift amount seems stupid.

And this patch removes those stupid SDNodes and introducing 6 new ones, which have an extra immediate shift amount operand.

This way make it easy to add optimization for other immediate shift amount.

benshi001 updated this revision to Diff 329635.Mar 10 2021, 5:53 AM

Other optimization such as ashr x, 6 and ashr x, 15 can be easily added in future patches along this way.

benshi001 added a comment.EditedMar 10 2021, 6:29 AM

As shown in test case,

this patch generates optimized code for most cases as avr-gcc does (though there are slight differences in instruction sequence).


  1. futher optimization for 8-bit ashr x, 6
  2. remove the redundant lsl <lower byte> in 16-bit shl when shiftAmount>8;
  3. remove the redundant lsr <higher byte> in 16-bit lshr when shiftAmount>8;
  4. remove the redundant asr <higher byte> in 16-bit ashr when shiftAmount>8;
benshi001 added a comment.EditedApr 5 2021, 8:32 PM

Some key points in my design,

  1. The reason for (outs LD8:$rd) but (ins GPR8:$src) in the new pseudo instructions, is that they rely on ANDI which requries R16-R31.
  1. The reasons LSLW/LSRW/ASRW are not treated as LSLWN(1)/LSRWN(1)/ASRWN(1), are that

    2.1 LSLW/LSRW/ASRW are important fallbacks when shiftAmount = 2, 3

    2.2 LSLW/LSRW/ASRW are assisting methods for LSLWN(x)/LSRWN(x)/ASRWN(x) when shiftAmount = 5,6,9,10,13,14.

    They are so important that need to keep indpendent.
dylanmckay accepted this revision.May 30 2021, 4:47 AM

Very nice design and simplification, looks good to me. Thank you for the link to Ayke's very nice blog post, as well as the detailed summary about your design and the tradeoffs within. It was also helpful that you kept the refactor and the optimizations themselves in separate patches, making this a non-functional change.

This revision is now accepted and ready to land.May 30 2021, 4:47 AM
This revision was automatically updated to reflect the committed changes.

This looks indeed a lot cleaner! I'm glad the blog post was useful :)

Ideally all these instructions would be expanded before register allocation (as described in my blog post), but sadly that doesn't seem to be possible with the current design of the AVR backend. Maybe it becomes possible in the future with GlobalISel.