there could be further optimizations for 8-bit and 16-bit shifts, besides current llvm's optimization.
If we continue current llvm's way, more AVR specific SDNode would be introduced, and introducing new SDNodes for each shift amount seems stupid.
And this patch removes those stupid SDNodes and introducing 6 new ones, which have an extra immediate shift amount operand.
This way make it easy to add optimization for other immediate shift amount.
As shown in test case https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/AVR/shift.ll,
this patch generates optimized code for most cases as avr-gcc does (though there are slight differences in instruction sequence).
- futher optimization for 8-bit ashr x, 6
- remove the redundant lsl <lower byte> in 16-bit shl when shiftAmount>8;
- remove the redundant lsr <higher byte> in 16-bit lshr when shiftAmount>8;
- remove the redundant asr <higher byte> in 16-bit ashr when shiftAmount>8;
Some key points in my design,
- The reason for (outs LD8:$rd) but (ins GPR8:$src) in the new pseudo instructions, is that they rely on ANDI which requries R16-R31.
- The reasons LSLW/LSRW/ASRW are not treated as LSLWN(1)/LSRWN(1)/ASRWN(1), are that
2.1 LSLW/LSRW/ASRW are important fallbacks when shiftAmount = 2, 3
2.2 LSLW/LSRW/ASRW are assisting methods for LSLWN(x)/LSRWN(x)/ASRWN(x) when shiftAmount = 5,6,9,10,13,14.
They are so important that need to keep indpendent.
Very nice design and simplification, looks good to me. Thank you for the link to Ayke's very nice blog post, as well as the detailed summary about your design and the tradeoffs within. It was also helpful that you kept the refactor and the optimizations themselves in separate patches, making this a non-functional change.
This looks indeed a lot cleaner! I'm glad the blog post was useful :)
Ideally all these instructions would be expanded before register allocation (as described in my blog post), but sadly that doesn't seem to be possible with the current design of the AVR backend. Maybe it becomes possible in the future with GlobalISel.