Page MenuHomePhabricator

[AVR] Optimize 16-bit shifts
AbandonedPublic

Authored by benshi001 on Feb 11 2021, 7:26 AM.

Details

Reviewers
dylanmckay
aykevl
Summary

Remove redundant RORs when shift amount >= 8.

Diff Detail

Event Timeline

benshi001 created this revision.Feb 11 2021, 7:26 AM
benshi001 requested review of this revision.Feb 11 2021, 7:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 11 2021, 7:26 AM
benshi001 edited the summary of this revision. (Show Details)Feb 11 2021, 7:27 AM

For int >> 9, the following instructions are genreated

mov r24, r25
clr r25
lsr r25
ror r24

However the ror is unecessary, it can be simpilfied to

mov r24, r25
clr r25
lsr r24
benshi001 updated this revision to Diff 323016.Feb 11 2021, 7:48 AM

I should have explained this a while ago, sorry for neglecting this.

See my comment D89047#2366709.
I think this optimization would be implemented better in SelectionDAG instead of using pseudo-instructions. That would also make it possible to optimize things like (short)x << 12.

Could you please explain more about be implemented better in SelectionDAG instead of using pseudo-instructions ? I can not figure out a way using SelectionDAG. Because I think SelectionDAG requires data flow dependency and looks like a tree structure. But The mov/clr pair seems can not be put into a tree structure.

Sorry I meant in the MachineIR phase. I think the ideal way to expand bit shift operations is as follows:

  1. Lower lsl/lshr/ashr operations in SelectionDAG to pseudo instructions that have a constant immediate operand. You will only need six pseudo instructions this way: for lsl/lshr/ashr and for both 8 bit and 16 bit versions. These pseudo instructions should have the usesCustomInserter flag set.
  2. Expand these instructions in AVRTargetLowering::EmitInstrWithCustomInserter. There are already a few existing expansions in there.

The big advantage is that you can do shifts before register allocation and I think it is also easier to structure the code. Also, this means you don't need as many instructions to expand in the pseudo instruction expansion pass.

Recently I have been thinking a lot about constant shifts on AVR and I've written a blog post here: https://aykevl.nl/2021/02/avr-bitshift
I think you will find it useful.

To be clear: I really appreciate your work on the AVR backend. There are many things that need to be improved, such as these bit shift operations. But I think there is a different and possibly better way to implement these expansions.

I should have explained this a while ago, sorry for neglecting this.

See my comment D89047#2366709.
I think this optimization would be implemented better in SelectionDAG instead of using pseudo-instructions. That would also make it possible to optimize things like (short)x << 12.

Could you please explain more about be implemented better in SelectionDAG instead of using pseudo-instructions ? I can not figure out a way using SelectionDAG. Because I think SelectionDAG requires data flow dependency and looks like a tree structure. But The mov/clr pair seems can not be put into a tree structure.

Sorry I meant in the MachineIR phase. I think the ideal way to expand bit shift operations is as follows:

  1. Lower lsl/lshr/ashr operations in SelectionDAG to pseudo instructions that have a constant immediate operand. You will only need six pseudo instructions this way: for lsl/lshr/ashr and for both 8 bit and 16 bit versions. These pseudo instructions should have the usesCustomInserter flag set.
  2. Expand these instructions in AVRTargetLowering::EmitInstrWithCustomInserter. There are already a few existing expansions in there.

The big advantage is that you can do shifts before register allocation and I think it is also easier to structure the code. Also, this means you don't need as many instructions to expand in the pseudo instruction expansion pass.

Recently I have been thinking a lot about constant shifts on AVR and I've written a blog post here: https://aykevl.nl/2021/02/avr-bitshift
I think you will find it useful.

To be clear: I really appreciate your work on the AVR backend. There are many things that need to be improved, such as these bit shift operations. But I think there is a different and possibly better way to implement these expansions.

Thanks for your explanation. I will study your way later.

I should have explained this a while ago, sorry for neglecting this.

See my comment D89047#2366709.
I think this optimization would be implemented better in SelectionDAG instead of using pseudo-instructions. That would also make it possible to optimize things like (short)x << 12.

Could you please explain more about be implemented better in SelectionDAG instead of using pseudo-instructions ? I can not figure out a way using SelectionDAG. Because I think SelectionDAG requires data flow dependency and looks like a tree structure. But The mov/clr pair seems can not be put into a tree structure.

Sorry I meant in the MachineIR phase. I think the ideal way to expand bit shift operations is as follows:

  1. Lower lsl/lshr/ashr operations in SelectionDAG to pseudo instructions that have a constant immediate operand. You will only need six pseudo instructions this way: for lsl/lshr/ashr and for both 8 bit and 16 bit versions. These pseudo instructions should have the usesCustomInserter flag set.
  2. Expand these instructions in AVRTargetLowering::EmitInstrWithCustomInserter. There are already a few existing expansions in there.

The big advantage is that you can do shifts before register allocation and I think it is also easier to structure the code. Also, this means you don't need as many instructions to expand in the pseudo instruction expansion pass.

Recently I have been thinking a lot about constant shifts on AVR and I've written a blog post here: https://aykevl.nl/2021/02/avr-bitshift
I think you will find it useful.

To be clear: I really appreciate your work on the AVR backend. There are many things that need to be improved, such as these bit shift operations. But I think there is a different and possibly better way to implement these expansions.

Thanks. I have submitted a new patch with a compromised solution against your suggestion. Your suggestion introducing 6 pseudo instructions with an extra shift amount operand is really good. https://reviews.llvm.org/D98335

But I still like to do the pseudo expansion in the old way, which requires the least change.

benshi001 abandoned this revision.Mar 10 2021, 5:38 AM