Details
- Reviewers
dylanmckay aykevl - Commits
- rG25531a1d9657: [AVR] Optimize 8-bit logic left/right shifts
Diff Detail
Event Timeline
The original AVR backend TD is also wrong, that it maps the bswap IR to AVR's swap instruction.
Since the IR bswap is byte level operation, which requires the minimal data type to be at least 16-bit.
But AVR's swap performs on half-byte level.
The original AVR backend TD is also wrong, that it maps the bswap IR to AVR's swap instruction.
Since the IR bswap is byte level operation, which requires the minimal data type to be at least 16-bit.
But AVR's swap performs on half-byte level.
I agree it looks wrong, but I can't get it to produce invalid code: https://godbolt.org/z/3YTnzx
Also, while this is an improvement I wonder whether there is a more systematic way to do this? This optimizes a very specific code pattern but it doesn't seem easy to extend it to other shifts and rotates, such as ((char)x) << 3 for example. The thing I have in mind works like this:
- 8 bit shifts are all implemented with similar code that uses shifts, swaps, and ands etc to get the correct shift amount
- 16 bit shifts can be built on top of that, either shifting two bytes individually and ORing the results together, or for larger (>= 8 bit) shifts swap the bytes and use an all-zero or all-ones byte for the other.
Or perhaps there are better ways to do this.
If you play around with godbolt.org (https://godbolt.org/z/foW7qc) you can see that it manages to produce very short code for nearly all shifts, apparently falling back to loops for hard cases. It even changes its strategy to inlining the entire shift (no loop) when using -O2.
Also, while this is an improvement I wonder whether there is a more systematic way to do this? This optimizes a very specific code pattern but it doesn't seem easy to extend it to other shifts and rotates, such as ((char)x) << 3 for example. The thing I have in mind works like this:
The optimizations for shifts are a bit complex, and I would like to do it in smaller sperated patches. This way makes them easy to review.
- 8 bit shifts are all implemented with similar code that uses shifts, swaps, and ands etc to get the correct shift amount
It is hard for 8-bit shift with shiftAmount = 3.
- 16 bit shifts can be built on top of that, either shifting two bytes individually and ORing the results together, or for larger (>= 8 bit) shifts swap the bytes and use an all-zero or all-ones byte for the other.
In current patch, I would like to only optimize 8-bit shifts and leave 16-bit shifts in other patches.
Or perhaps there are better ways to do this.
If you play around with godbolt.org (https://godbolt.org/z/foW7qc) you can see that it manages to produce very short code for nearly all shifts, apparently falling back to loops for hard cases. It even changes its strategy to inlining the entire shift (no loop) when using -O2.
For 8-bit shifts, my patch performs the same as avr-gcc, except for shiftAmount = 7. I will improve it soon.
I have uploaded a new revision with more tests.
Now llvm-avr generates the same asm for logic left/right shifts when ShiftAmount = 1,2,3,4,5,6, except for 7.
This is shown in the test llvm/test/CodeGen/AVR/shift.ll.
Shall we commit this first? Then I will go on with the specific case ShiftAmount=7 in another patch.
Now llvm-avr generates the same asm for 8-bit shifts as AVR-GCC does, when ShiftAmount = 1,2,3,4,5,6, 7.
Nice patch, cheers.
Just a small comment around a latent FIXME, then it's good to approve.
llvm/lib/Target/AVR/AVRISelLowering.cpp | ||
---|---|---|
353 |
This suggests that either this TODO comment is now unnecessary, or there is another optimization that could be implemented specifically for 7-bit shifts that AVR-GCC does not implement. Remove the TODO comment, or, if you feel it is important, add a sentence describing the TODO |
llvm/lib/Target/AVR/AVRISelLowering.cpp | ||
---|---|---|
353 | The TODO is removed. |
This suggests that either this TODO comment is now unnecessary, or there is another optimization that could be implemented specifically for 7-bit shifts that AVR-GCC does not implement.
Remove the TODO comment, or, if you feel it is important, add a sentence describing the TODO