When not optimizing for minimum size (-Oz) we custom lower wide shifts
(SHL_PARTS, SRA_PARTS, SRL_PARTS) instead of expanding to a libcall. Although
conceptually independent from it, this patch builds upon the D59355 optimization
of SELECT sequences to obtain the optimized code exhibited in the updated tests.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Thanks for this Luis. This needs coverage for the optForMinSize checks. I'd recommend extending shifts.ll with minsize versions of the tests, e.g.
define i64 @lshr64_minsize(i64 %a, i64 %b) minsize nounwind { ; RV32I-LABEL: lshr64_minsize: ; RV32I: # %bb.0: ; RV32I-NEXT: addi sp, sp, -16 ; RV32I-NEXT: sw ra, 12(sp) ; RV32I-NEXT: call __lshrdi3 ; RV32I-NEXT: lw ra, 12(sp) ; RV32I-NEXT: addi sp, sp, 16 ; RV32I-NEXT: ret %1 = lshr i64 %a, %b ret i64 %1 }
Updates the patch to:
- Add a test covering the optForMinSize option
- Add a test that shifts a 128-bit value
- Add checks for RV64 (relevant for the 128-bit test)
- Fix the materialization of the -XLEN constant in RV64
I think it would be worth rounding out the tests a bit, especially as it should just require a trivial amount of copy and paste and editing to do so:
- ashr64_minsize
- shl64_minsize
- lshr128
- ashr128
lib/Target/RISCV/RISCVISelLowering.cpp | ||
---|---|---|
557 | I'm sure there's an off-by-one error in my reasoning here, but aren't we already safe to assume that XLEN-Shamt can fit in the bits available for srl? Since we checked for if Shamt-XLEN < 0, the only case where XLEN-Shamt couldn't fit would be where XLEN-Shamt == XLEN IE when Shamt-XLEN == 0, which is handled by the basic case below. I'm assuming this was the reasoning for the XLEN-1 stuff and the (Lo >>u 1) here, if not can someone explain it to me? | |
560 | Nitpick: shouldn't this read Hi = Lo << (Shamt-XLEN) |
lib/Target/RISCV/RISCVISelLowering.cpp | ||
---|---|---|
557 | I see at least part of my error. It should read 'IE when Shamt == 0`. Either way I'm not seeing how it should come up? |
lib/Target/RISCV/RISCVISelLowering.cpp | ||
---|---|---|
557 | Okay sorry for the noise, I see why this is an issue now. Shamt == 0 is indeed handled by this case. Could we have some comment to that effect in this function? And likewise for lowerShiftRightParts? |
lib/Target/RISCV/RISCVISelLowering.cpp | ||
---|---|---|
557 | I think the comment "// Shamt < XLEN" documents reasonably well that this branch also covers Shamt == 0, since "0 < XLEN". If I misunderstood your comment please clarify. | |
560 | The (Shamt-XLEN) should have been in the "else" branch, not the "then" branch. These kind of mistakes were easy to make because the assembly has the "basic blocks" in the reverse order (I can't fix that in the pseudo-code without inverting the condition, which would be even more confusing). I had caught most of them, but that one slipped through. Good catch! I'll fix that together with the remaining issues. |
- Fixed a bug in the pseudo-code for lowerShiftLeftParts;
- Added more tests, which now cover all the 64-bit shift variants, with and without the minsize attribute, plus all of the 128-bit shift variants.
This looks good to me. The GCC torture suite is happy with this code too, which serves as an extra (but non-exhaustive) sanity check. It's not immediately obvious to me that checking for OptForMinSize is better than OptForSize, but the precise criteria for when something should be done at Oz vs Os don't seem to be well specified and checking OptForMinSize is consistent with other backends, so the right choice for now.
I'm sure there's an off-by-one error in my reasoning here, but aren't we already safe to assume that XLEN-Shamt can fit in the bits available for srl? Since we checked for if Shamt-XLEN < 0, the only case where XLEN-Shamt couldn't fit would be where XLEN-Shamt == XLEN IE when Shamt-XLEN == 0, which is handled by the basic case below.
I'm assuming this was the reasoning for the XLEN-1 stuff and the (Lo >>u 1) here, if not can someone explain it to me?