Page MenuHomePhabricator

[AVR] Optimize 16-bit int shift
ClosedPublic

Authored by benshi001 on Oct 24 2020, 2:42 AM.

Details

Summary

Logic left and right shifts get improved with shift amount 4 ~ 15.
Arithmetic right shift gets improved with shift amount 8-15.

Diff Detail

Event Timeline

benshi001 created this revision.Oct 24 2020, 2:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 24 2020, 2:42 AM
benshi001 requested review of this revision.Oct 24 2020, 2:42 AM
benshi001 updated this revision to Diff 300473.Oct 24 2020, 4:11 AM

See my comment D89047#2366709.
I think this optimization would be implemented better in SelectionDAG instead of using pseudo-instructions. That would also make it possible to optimize things like (short)x << 12.

See my comment D89047#2366709.
I think this optimization would be implemented better in SelectionDAG instead of using pseudo-instructions. That would also make it possible to optimize things like (short)x << 12.

Thanks. I would try to improve my patch by

  1. introducing AVRISD:LSL4/SLR4/ASR4/LSL12/SLR12/ASR12,
  2. avoid using pseudo instructions.

The above ways can optimize most cases, except for shiftAmount = 3 and shiftAmount = 15. However I would like to handle these corner cases in another patch.

See my comment D89047#2366709.
I think this optimization would be implemented better in SelectionDAG instead of using pseudo-instructions. That would also make it possible to optimize things like (short)x << 12.

Could you please explain more about be implemented better in SelectionDAG instead of using pseudo-instructions ? I can not figure out a way using SelectionDAG. Because I think SelectionDAG requires data flow dependency and looks like a tree structure. But The mov/clr pair seems can not be put into a tree structure.

benshi001 updated this revision to Diff 303392.Nov 6 2020, 3:02 AM
benshi001 updated this revision to Diff 303394.Nov 6 2020, 3:05 AM
benshi001 edited the summary of this revision. (Show Details)

Logic left and right shifts get improved with shift amount 4 ~ 15.
Arithmetic right shift gets improved with shift amount 8-15.

Though they are not in the best form as avr-gcc generates, current llvm code gets big improvement against itself.

I will continue to optimize 16-bit int shifts in future patches.

dylanmckay accepted this revision.Jan 27 2021, 3:36 AM

Looks good to me. Indeed, it is sad this could not be implemented at a higher level but even as is it is a great optimization.

Thanks for the patch!

This revision is now accepted and ready to land.Jan 27 2021, 3:36 AM
This revision was automatically updated to reflect the committed changes.

I've been testing the llvm main branch and I found that this change introduces a miscompilation. I have not yet investigated what exactly is going wrong.

I found at least one code sample that is miscompiled.

Code:

unsigned lsl(int a, int b, int c, int d, int e, int n) {
    return n << 4;
}

Compiles to:

lsl:                                    ; @lsl
        push    r14
        push    r15
        swap    r15
        swap    r14
        andi    r15, 240
        eor     r15, r14
        andi    r14, 240
        eor     r15, r14
        movw    r24, r14
        pop     r15
        pop     r14
        ret

However, andi r15, 240 is not valid. According to the ISA specification, the register operand of andi must be in the range r16-r31.

Compiler Explorer: https://godbolt.org/z/qb9PWP

Thanks for reporting. I have submited a patch to fix that bug.
https://reviews.llvm.org/D96590

You are appreciated to review it.

I found at least one code sample that is miscompiled.

Code:

unsigned lsl(int a, int b, int c, int d, int e, int n) {
    return n << 4;
}

Compiles to:

lsl:                                    ; @lsl
        push    r14
        push    r15
        swap    r15
        swap    r14
        andi    r15, 240
        eor     r15, r14
        andi    r14, 240
        eor     r15, r14
        movw    r24, r14
        pop     r15
        pop     r14
        ret

However, andi r15, 240 is not valid. According to the ISA specification, the register operand of andi must be in the range r16-r31.

Compiler Explorer: https://godbolt.org/z/qb9PWP