This is an archive of the discontinued LLVM Phabricator instance.

[llvm][AArch64] Prevent spurious zero extension.
AbandonedPublic

Authored by fpetrogalli on Oct 26 2020, 8:39 AM.

Details

Summary

This patch prevents generating a spurious zero extension of a sign-extended
load when the only use of the signed value is a comparison that tests
the sign bit of the sign-extended value.

Now the compiler generates a zero-extended load directly and compares
the sign bit of the original unextended load instead of the sign-extended
one.
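
For illustration, here is a hypothetical C++ reduction of the pattern (not
taken from the test files; the function name is chosen to match the labels
below):

// One zero-extended use of the loaded byte, plus a comparison that
// only tests its sign bit.
unsigned f_i32_i8(const signed char *p) {
  signed char v = *p;             // sign-extending load (ldrsb)
  unsigned u = (unsigned char)v;  // zero-extend-in-reg (and ..., #0xff)
  if (v < 0)                      // sign-bit test (tbnz)
    return u * u;
  return u + u;
}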

The output code of (some of) the tests before and after the patch
looks as follows.

BEFORE:                          |  AFTER:
f_i32_i8:                        |  f_i32_i8:
        ldrsb   w9, [x0]         |      ldrb    w8, [x0]
        and     w8, w9, #0xff    |      tbnz    w8, #7, .LBB0_2
        tbnz    w9, #31, .LBB0_2 |      add     w0, w8, w8
        add     w0, w8, w8       |      ret
        ret                      |  .LBB0_2:
.LBB0_2:                         |      mul     w0, w8, w8
        mul     w0, w8, w8       |      ret
        ret                      |
                                 |
g_i32_i16:                       |  g_i32_i16:
        ldrsh   w8, [x0]         |      ldrh    w0, [x0]
        and     w0, w8, #0xffff  |      tbnz    w0, #15, .LBB3_2
        tbnz    w8, #31, .LBB3_2 |      ret
        ret                      |  .LBB3_2:
.LBB3_2:                         |      lsl     w0, w0, #1
        lsl     w0, w0, #1       |      ret
        ret                      |

Notes:

There is no code-size degradation in the tests modified in
llvm/test/CodeGen/ARM/select-imm.ll.

In particular, the THUMB1 test there has gone through the
following improvement:

BEFORE                        |  AFTER
t9:                           |  t9:
        .fnstart              |          .fnstart
        .save   {r4, lr}      |          .save   {r4, lr}
        push    {r4, lr}      |          push    {r4, lr}
        ldrb    r4, [r0]      |          ldrb    r4, [r0]
        movs    r0, #1        |          movs    r0, #1
        bl      f             |          bl      f
        sxtb    r1, r4        |          cmp     r4, r4
        uxtb    r0, r1        |          bne     .LBB8_3
        cmp     r0, r0        |          sxtb    r0, r4
        bne     .LBB8_3       |          adds    r0, r0, #1
        adds    r1, r1, #1    |          mov     r1, r4
        mov     r2, r0        |  .LBB8_2:
.LBB8_2:                      |          adds    r0, r0, #1
        adds    r1, r1, #1    |          adds    r1, r1, #1
        adds    r2, r2, #1    |          uxtb    r2, r1
        uxtb    r3, r2        |          cmp     r2, r4
        cmp     r3, r0        |          blt     .LBB8_2
        blt     .LBB8_2       |  .LBB8_3:
.LBB8_3:                      |          pop     {r4, pc}
        pop     {r4, pc}      |

Diff Detail

Event Timeline

fpetrogalli created this revision.Oct 26 2020, 8:39 AM
fpetrogalli requested review of this revision.Oct 26 2020, 8:39 AM
fpetrogalli edited the summary of this revision. Oct 26 2020, 8:43 AM
samparker added inline comments.Oct 27 2020, 7:44 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
1171 ↗(On Diff #300686)

There's so much existing logic around extensions and load widths that I'm struggling to believe that this is really needed... Maybe DAGCombiner::isAndLoadExtLoad can help instead?
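
For reference, the helper mentioned here is a private member of DAGCombiner
that decides whether an (and (load x), mask) can be expressed as a narrower
extending load, reporting the narrow type via ExtVT. Its signature in trees
of this vintage is roughly the following (worth checking against the exact
revision):

bool DAGCombiner::isAndLoadExtLoad(ConstantSDNode *AndC, LoadSDNode *LoadN,
                                   EVT LoadResultTy, EVT &ExtVT);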

I have removed the method SelectionDAG::isZeroExtendInReg in favour of using the machinery already available in DAGCombiner.

fpetrogalli marked an inline comment as done.Oct 28 2020, 3:36 AM
fpetrogalli added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
1171 ↗(On Diff #300686)

Thank you for pointing this out!

samparker added inline comments.Oct 28 2020, 6:50 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11386

No need for this-> now.

11400

Why do we only care about those cases? Couldn't this generally help mixed types too?

11404

We already know that at least one use is a SIGN_EXTEND_INREG node, so we shouldn't need to check again. Also, are UseOne and UseTwo guaranteed to be ordered the way you're expecting here? Maybe just iterate through all the uses looking for IsZeroExtInReg?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11368

s/that realize/implementing/ or "that realizes" or "that implements".

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5915 ↗(On Diff #301218)

s/getSizeInBits().getFixedSize()/getFixedSizeInBits()/?

(Applies above too)

5913 ↗(On Diff #300686)

I realise this problem is inherited from the existing code. "Mask" seems a confusing name given that it's a bit position. Perhaps "QueryBit", "TestBit", "BitToTest" or similar might be a better name?

(Applies above too)

llvm/test/CodeGen/AArch64/zext-and-signed-compare.ll
3

Could do with a comment to explain the purpose of the test.

fpetrogalli marked 6 inline comments as done.Oct 28 2020, 8:27 AM
fpetrogalli added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11400

I have removed this extra check. I originally added it with the intention of writing a test case along the following lines (sketched IR):

%x = load i8, i8* %p
%A = zext i8 %x to i64
; ... use %A
%cmp = icmp sgt i8 %x, -1

But the output code with and without this extra check doesn't differ at all, so I think this check is not necessary.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5913 ↗(On Diff #300686)

I agree with you. I have gone for: s/Mask/SignBitPos/

fpetrogalli retitled this revision from [llvm][AArch64] Prevent spurious zero extention. to [llvm][AArch64] Prevent spurious zero extension..Oct 28 2020, 8:27 AM
fpetrogalli marked 2 inline comments as done.

Address all review comments but one (still working on the version of UsesDifferInSignExtension that doesn't care about the order of the uses).

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11368
fpetrogalli edited the summary of this revision. Oct 29 2020, 9:15 AM
fpetrogalli edited the summary of this revision.

I have addressed the comment from @samparker about making the check in
UsesDifferInSignExtension independent of the order of the uses. The
code now uses llvm::any_of on all uses.
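
A minimal sketch of that shape, assuming 2020-era SelectionDAG APIs (the
helper name and the exact mask test are invented for illustration and are
not the patch's actual UsesDifferInSignExtension):

#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// True if any use of N is a zero-extension-in-register, i.e. an AND whose
// constant mask keeps exactly the low bits of the loaded type MemVT.
static bool anyUseIsZeroExtInReg(SDNode *N, EVT MemVT) {
  return any_of(N->uses(), [&](SDNode *Use) {
    if (Use->getOpcode() != ISD::AND)
      return false;
    auto *MaskC = dyn_cast<ConstantSDNode>(Use->getOperand(1));
    return MaskC &&
           MaskC->getAPIntValue() ==
               APInt::getLowBitsSet(Use->getValueType(0).getSizeInBits(),
                                    MemVT.getSizeInBits());
  });
}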

The request from Sam produced changes in the ARM backend, which
required introducing an extra combine for AND to make sure no
code-size regressions were introduced. The details of the changes are
explained in the summary.

fpetrogalli marked an inline comment as done.Oct 29 2020, 9:22 AM
fpetrogalli marked 2 inline comments as done.
  1. fixed typo
  2. described tests

s/Mask/SignBitPos/ in the code I changed. Reverted the change in a place where Mask was accidentally renamed.

fpetrogalli added inline comments.Oct 29 2020, 11:13 AM
llvm/test/CodeGen/ARM/select-imm.ll
221–238

For reference, this test has changed as follows:

;       OLD                     NEW
;       .save   {r4, lr}        .save   {r4, lr}
;       push    {r4, lr}        push    {r4, lr}
;       ldrsb   r4, [r0]        ldrb    r4, [r0]
;       mov     r0, #1          mov     r0, #1
;       bl      f               bl      f
;       uxtb    r0, r4          cmp     r4, r4
;       cmp     r0, r0          popne   {r4, pc}
;       popne   {r4, pc}     .LBB0_1:
; .LBB0_1:                      sxtb    r0, r4
;       add     r1, r4, #1      add     r0, r0, #1
;       mov     r2, r0          mov     r1, r4
; .LBB0_2:                   .LBB0_2:
;       add     r2, r2, #1      add     r1, r1, #1
;       add     r1, r1, #1      add     r0, r0, #1
;       uxtb    r3, r2          uxtb    r2, r1
;       cmp     r3, r0          cmp     r2, r4
;       blt     .LBB0_2         blt     .LBB0_2
;       pop     {r4, pc}        pop     {r4, pc}
240–241

For reference, this output has changed as follows:

; OLD                            NEW
;         .save   {r4, lr}               .save   {r4, lr}
;         push    {r4, lr}               push    {r4, lr}
;         movs    r2, #0                 ldrb    r4, [r0]
;         rsbs    r1, r2, #0             movs    r0, #1
;         adcs    r1, r2                 bl      f
;         ldrb    r4, [r0]               cmp     r4, r4
;         mov     r0, r1                 bne     .LBB0_3
;         bl      f                      sxtb    r0, r4
;         sxtb    r1, r4                 adds    r0, r0, #1
;         uxtb    r0, r1                 mov     r1, r4
;         cmp     r0, r0         .LBB0_2:
;         bne     .LBB0_3                adds    r0, r0, #1
;         adds    r1, r1, #1             adds    r1, r1, #1
;         mov     r2, r0                 uxtb    r2, r1
; .LBB0_2:                               cmp     r2, r4
;         adds    r1, r1, #1             blt     .LBB0_2
;         adds    r2, r2, #1     .LBB0_3:
;         uxtb    r3, r2                 pop     {r4, pc}
;         cmp     r3, r0
;         blt     .LBB0_2
; .LBB0_3:
;         pop     {r4, pc}
260–261

For reference, this has changed as follows:

; OLD                         NEW
;         .save   {r4, lr}            .save   {r4, lr}
;         push    {r4, lr}            push    {r4, lr}
;         ldrsb.w r4, [r0]            ldrb    r4, [r0]
;         movs    r0, #1              movs    r0, #1
;         bl      f                   bl      f
;         uxtb    r0, r4              cmp     r4, r4
;         cmp     r0, r0              it      ne
;         it      ne                  popne   {r4, pc}
;         popne   {r4, pc}    .LBB0_1:
; .LBB0_1:                            sxtb    r0, r4
;         adds    r1, r4, #1          adds    r0, #1
;         mov     r2, r0              mov     r1, r4
; .LBB0_2:                    .LBB0_2:
;         adds    r2, #1              adds    r1, #1
;         adds    r1, #1              adds    r0, #1
;         uxtb    r3, r2              uxtb    r2, r1
;         cmp     r3, r0              cmp     r2, r4
;         blt     .LBB0_2             blt     .LBB0_2
;         pop     {r4, pc}            pop     {r4, pc}

This looks like it should be at least two separate changes; I think the AArch64ISelLowering change should have some impact on its own.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5748

Shouldn't this be a less-than comparison as opposed to exact equality? For example, suppose the bitmask is equal to one,

Updating the patch after splitting it into three patches. I'll link the remaining ones here after publishing them.

fpetrogalli marked an inline comment as done.Nov 2 2020, 8:00 AM
fpetrogalli added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5748

(Note that this inline comment now applies to D90605.)

@efriedma - I did that and introduced the extra truncate that is needed before the zero extend, with BitMaskVT set according to the mask:

EVT BitMaskVT;
if (IsAndZeroExtMask(N0, N1, BitMaskVT))
  // Truncate to the width implied by the mask, then zero extend back.
  return DAG.getNode(ISD::ZERO_EXTEND, SDLoc(N), VT,
                     DAG.getNode(ISD::TRUNCATE, SDLoc(N), BitMaskVT,
                                 N0.getOperand(0)));

This generates a loop in DAGCombine, because the zero extend + truncate combination is rewritten back into the same AND + bitmask it is trying to replace.
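
The cycle is presumably closed by the existing fold in DAGCombiner::visitZERO_EXTEND:

// fold (zext (truncate x)) -> (and x, mask)

so the truncate + zero extend pair created above is immediately turned back into the very AND-with-mask pattern the new combine tries to remove.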

Do you think it is worth investigating more?

fpetrogalli planned changes to this revision.Nov 17 2020, 5:22 AM
fpetrogalli marked an inline comment as done.

Parking this for the moment, as we need to refine the heuristic to decide when to prevent the spurious zero extension.

Thank you all for the reviews!

peterwaller-arm resigned from this revision.Mar 31 2021, 2:20 AM
fpetrogalli abandoned this revision.Jan 17 2023, 3:12 PM
Herald added a project: Restricted Project. Jan 17 2023, 3:12 PM
Herald added a subscriber: StephenFan.