This patch prevents generating a spurious zero extension of a
sign-extended load when the only use of the sign-extended value is a
comparison that tests its sign bit.
The compiler now generates a zero-extended load directly and compares
the sign bit of the original, unextended load instead of the
sign-extended one.
The output code for some of the tests, before and after the patch,
looks as follows:
BEFORE:                  | AFTER:
f_i32_i8:                | f_i32_i8:
  ldrsb w9, [x0]         |   ldrb w8, [x0]
  and w8, w9, #0xff      |   tbnz w8, #7, .LBB0_2
  tbnz w9, #31, .LBB0_2  |   add w0, w8, w8
  add w0, w8, w8         |   ret
  ret                    | .LBB0_2:
.LBB0_2:                 |   mul w0, w8, w8
  mul w0, w8, w8         |   ret
  ret                    |
                         |
g_i32_i16:               | g_i32_i16:
  ldrsh w8, [x0]         |   ldrh w0, [x0]
  and w0, w8, #0xffff    |   tbnz w0, #15, .LBB3_2
  tbnz w8, #31, .LBB3_2  |   ret
  ret                    | .LBB3_2:
.LBB3_2:                 |   lsl w0, w0, #1
  lsl w0, w0, #1         |   ret
  ret                    |
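For reference, the following C sketch illustrates the kind of source that
produces the f_i32_i8 pattern above (the function name is reused only for
illustration; the actual tests are the equivalent LLVM IR, so treat this as
an approximation). The arithmetic uses only the zero-extended value, while
the signed value is used only in a sign-bit test:

  /* Sketch: the sign-extended value s is used only to test the sign bit,
     while the arithmetic uses the zero-extended value u. In the listing
     above this produced both a sign-extending load (ldrsb) and an explicit
     zero extension (and #0xff) before the patch, and a single
     zero-extending load (ldrb) plus a test of bit 7 (tbnz) after it. */
  int f_i32_i8(signed char *p) {
    int s = *p;                    /* sign-extending byte load */
    unsigned u = (unsigned char)s; /* zero extension of the same loaded value */
    if (s < 0)                     /* only the sign bit of s is tested */
      return u * u;
    return u + u;
  }

The g_i32_i16 case is analogous, with short/unsigned short in place of the
byte types.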
Notes:
There is no code-size degradation in the tests modified in
llvm/test/CodeGen/ARM/select-imm.ll.
In particular, the THUMB1 test there has gone through the
following improvement:
BEFORE             | AFTER
t9:                | t9:
  .fnstart         |   .fnstart
  .save {r4, lr}   |   .save {r4, lr}
  push {r4, lr}    |   push {r4, lr}
  ldrb r4, [r0]    |   ldrb r4, [r0]
  movs r0, #1      |   movs r0, #1
  bl f             |   bl f
  sxtb r1, r4      |   cmp r4, r4
  uxtb r0, r1      |   bne .LBB0_3
  cmp r0, r0       |   sxtb r0, r4
  bne .LBB8_3      |   adds r0, r0, #1
  adds r1, r1, #1  |   mov r1, r4
.LBB0_2:           |
  mov r2, r0       | .LBB8_2:
.LBB8_2:           |   adds r0, r0, #1
  adds r1, r1, #1  |   adds r1, r1, #1
  adds r2, r2, #1  |   uxtb r2, r1
  uxtb r3, r2      |   cmp r2, r4
  cmp r3, r0       |   blt .LBB8_2
.LBB0_3:           |
  blt .LBB8_2      | .LBB8_3:
.LBB8_3:           |   pop {r4, pc}
  pop {r4, pc}     |
Shouldn't this be a less-than comparison as opposed to exact equality? For example, suppose the bitmask is equal to one,