This is an archive of the discontinued LLVM Phabricator instance.

[llvm][AArch64] Prevent spurious zero extension.
AbandonedPublic

Authored by fpetrogalli on Oct 26 2020, 8:39 AM.

Details

Summary

This patch prevents generating a spurious zero extension of a sign-extended
load when the only use of the signed value is a comparison that tests
the sign bit of the sign-extended value.

Now the compiler generates a zero-extended load directly and compares
the sign bit of the original unextended load instead of the sign-extended
one.
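
For illustration, here is a hypothetical C++ reduction of the pattern (not
taken from the test files; the function name is chosen to match the labels
below):

// One zero-extended use of the loaded byte, plus a comparison that
// only tests its sign bit.
unsigned f_i32_i8(const signed char *p) {
  signed char v = *p;             // sign-extending load (ldrsb)
  unsigned u = (unsigned char)v;  // zero-extend-in-reg (and ..., #0xff)
  if (v < 0)                      // sign-bit test (tbnz)
    return u * u;
  return u + u;
}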

The output code of (some of) the tests before and after the patch
looks as follows.

BEFORE:                          |  AFTER:
f_i32_i8:                        |  f_i32_i8:
        ldrsb   w9, [x0]         |      ldrb    w8, [x0]
        and     w8, w9, #0xff    |      tbnz    w8, #7, .LBB0_2
        tbnz    w9, #31, .LBB0_2 |      add     w0, w8, w8
        add     w0, w8, w8       |      ret
        ret                      |  .LBB0_2:
.LBB0_2:                         |      mul     w0, w8, w8
        mul     w0, w8, w8       |      ret
        ret                      |
                                 |
g_i32_i16:                       |  g_i32_i16:
        ldrsh   w8, [x0]         |      ldrh    w0, [x0]
        and     w0, w8, #0xffff  |      tbnz    w0, #15, .LBB3_2
        tbnz    w8, #31, .LBB3_2 |      ret
        ret                      |  .LBB3_2:
.LBB3_2:                         |      lsl     w0, w0, #1
        lsl     w0, w0, #1       |      ret
        ret                      |

Notes:

There is no code-size degradation in the tests modified in
llvm/test/CodeGen/ARM/select-imm.ll.

In particular, the THUMB1 test there has gone through the
following improvement:

BEFORE                        |  AFTER
t9:                           |  t9:
        .fnstart              |          .fnstart
        .save   {r4, lr}      |          .save   {r4, lr}
        push    {r4, lr}      |          push    {r4, lr}
        ldrb    r4, [r0]      |          ldrb    r4, [r0]
        movs    r0, #1        |          movs    r0, #1
        bl      f             |          bl      f
        sxtb    r1, r4        |          cmp     r4, r4
        uxtb    r0, r1        |          bne     .LBB8_3
        cmp     r0, r0        |          sxtb    r0, r4
        bne     .LBB8_3       |          adds    r0, r0, #1
        adds    r1, r1, #1    |          mov     r1, r4
        mov     r2, r0        |  .LBB8_2:
.LBB8_2:                      |          adds    r0, r0, #1
        adds    r1, r1, #1    |          adds    r1, r1, #1
        adds    r2, r2, #1    |          uxtb    r2, r1
        uxtb    r3, r2        |          cmp     r2, r4
        cmp     r3, r0        |          blt     .LBB8_2
        blt     .LBB8_2       |  .LBB8_3:
.LBB8_3:                      |          pop     {r4, pc}
        pop     {r4, pc}      |

Diff Detail

Event Timeline

fpetrogalli created this revision.Oct 26 2020, 8:39 AM
fpetrogalli requested review of this revision.Oct 26 2020, 8:39 AM
fpetrogalli edited the summary of this revision. Oct 26 2020, 8:43 AM
samparker added inline comments.Oct 27 2020, 7:44 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
1171 ↗(On Diff #300686)

There's so much existing logic around extensions and load widths that I'm struggling to believe that this is really needed... Maybe DAGCombiner::isAndLoadExtLoad can help instead?
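
For reference, the helper mentioned here is a private member of DAGCombiner
that decides whether an (and (load x), mask) can be expressed as a narrower
extending load, reporting the narrow type via ExtVT. Its signature in trees
of this vintage is roughly the following (worth checking against the exact
revision):

bool DAGCombiner::isAndLoadExtLoad(ConstantSDNode *AndC, LoadSDNode *LoadN,
                                   EVT LoadResultTy, EVT &ExtVT);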

I have removed the method SelectionDAG::isZeroExtendInReg in favour of using the machinery already available in DAGCombiner.

fpetrogalli marked an inline comment as done.Oct 28 2020, 3:36 AM
fpetrogalli added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
1171 ↗(On Diff #300686)

Thank you for pointing this out!

samparker added inline comments.Oct 28 2020, 6:50 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11386

No need for this-> now.

11400

Why do we only care about those cases? Couldn't this generally help mixed types too?

11404

We already know that at least one use is a SIGN_EXTEND_INREG node, so we shouldn't need to check again. Also, are UseOne and UseTwo guaranteed to be ordered the way you're expecting here? Maybe just iterate through all the uses looking for IsZeroExtInReg?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11368

s/that realize/implementing/ or "that realizes" or "that implements".

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5915 ↗(On Diff #301218)

s/getSizeInBits().getFixedSize()/getFixedSizeInBits()/?

(Applies above too)

5913 ↗(On Diff #300686)

I realise this problem is inherited from the existing code. "Mask" seems a confusing name given that it's a bit position. Perhaps "QueryBit", "TestBit", "BitToTest" or similar might be a better name?

(Applies above too)

llvm/test/CodeGen/AArch64/zext-and-signed-compare.ll
3

Could do with a comment to explain the purpose of the test.

fpetrogalli marked 6 inline comments as done.Oct 28 2020, 8:27 AM
fpetrogalli added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11400

I have removed this extra check. I originally added it with the intention of writing a test case along the following lines (sketched IR):

%x = load i8, i8* %p
%A = zext i8 %x to i64
; ... use %A
%cmp = icmp sgt i8 %x, -1

But the output code with and without this extra check doesn't differ at all, so I think this check is not necessary.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
5913 ↗(On Diff #300686)

I agree with you. I have gone for: s/Mask/SignBitPos/

fpetrogalli retitled this revision from [llvm][AArch64] Prevent spurious zero extention. to [llvm][AArch64] Prevent spurious zero extension..Oct 28 2020, 8:27 AM
fpetrogalli marked 2 inline comments as done.

Address all review comments but one (still working on the version of UsesDifferInSignExtension that doesn't care about the order of the uses).

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11368
fpetrogalli edited the summary of this revision. Oct 29 2020, 9:15 AM
fpetrogalli edited the summary of this revision.

I have addressed the comment from @samparker about making the check in
UsesDifferInSignExtension independent of the order of the uses. The
code now uses llvm::any_of on all uses.
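
A minimal sketch of that shape, assuming 2020-era SelectionDAG APIs (the
helper name and the exact mask test are invented for illustration and are
not the patch's actual UsesDifferInSignExtension):

#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/SelectionDAGNodes.h"
using namespace llvm;

// True if any use of N is a zero-extension-in-register, i.e. an AND whose
// constant mask keeps exactly the low bits of the loaded type MemVT.
static bool anyUseIsZeroExtInReg(SDNode *N, EVT MemVT) {
  return any_of(N->uses(), [&](SDNode *Use) {
    if (Use->getOpcode() != ISD::AND)
      return false;
    auto *MaskC = dyn_cast<ConstantSDNode>(Use->getOperand(1));
    return MaskC &&
           MaskC->getAPIntValue() ==
               APInt::getLowBitsSet(Use->getValueType(0).getSizeInBits(),
                                    MemVT.getSizeInBits());
  });
}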

The request from Sam produced changes in the ARM backend, which
required introducing an extra combine for AND to make sure no
code-size regressions were introduced. The details of the changes are
explained in the summary.

fpetrogalli marked an inline comment as done.Oct 29 2020, 9:22 AM
fpetrogalli marked 2 inline comments as done.
  1. fixed typo
  2. described tests

s/Mask/SignBitPos/ in the code I changed. Reverted the change in a place where Mask was accidentally renamed.

fpetrogalli added inline comments.Oct 29 2020, 11:13 AM
llvm/test/CodeGen/ARM/select-imm.ll
221–238

For reference, this test has changed as follows:

;       OLD                     NEW
;       .save   {r4, lr}        .save   {r4, lr}
;       push    {r4, lr}        push    {r4, lr}
;       ldrsb   r4, [r0]        ldrb    r4, [r0]
;       mov     r0, #1          mov     r0, #1
;       bl      f               bl      f
;       uxtb    r0, r4          cmp     r4, r4
;       cmp     r0, r0          popne   {r4, pc}
;       popne   {r4, pc}     .LBB0_1:
; .LBB0_1:                      sxtb    r0, r4
;       add     r1, r4, #1      add     r0, r0, #1
;       mov     r2, r0          mov     r1, r4
; .LBB0_2:                   .LBB0_2:
;       add     r2, r2, #1      add     r1, r1, #1
;       add     r1, r1, #1      add     r0, r0, #1
;       uxtb    r3, r2          uxtb    r2, r1
;       cmp     r3, r0          cmp     r2, r4
;       blt     .LBB0_2         blt     .LBB0_2
;       pop     {r4, pc}        pop     {r4, pc}
240–241

For reference, this output has changed as follows:

; OLD                            NEW
;         .save   {r4, lr}               .save   {r4, lr}
;         push    {r4, lr}               push    {r4, lr}
;         movs    r2, #0                 ldrb    r4, [r0]
;         rsbs    r1, r2, #0             movs    r0, #1
;         adcs    r1, r2                 bl      f
;         ldrb    r4, [r0]               cmp     r4, r4
;         mov     r0, r1                 bne     .LBB0_3
;         bl      f                      sxtb    r0, r4
;         sxtb    r1, r4                 adds    r0, r0, #1
;         uxtb    r0, r1                 mov     r1, r4
;         cmp     r0, r0         .LBB0_2:
;         bne     .LBB0_3                adds    r0, r0, #1
;         adds    r1, r1, #1             adds    r1, r1, #1
;         mov     r2, r0                 uxtb    r2, r1
; .LBB0_2:                               cmp     r2, r4
;         adds    r1, r1, #1             blt     .LBB0_2
;         adds    r2, r2, #1     .LBB0_3:
;         uxtb    r3, r2                 pop     {r4, pc}
;         cmp     r3, r0
;         blt     .LBB0_2
; .LBB0_3:
;         pop     {r4, pc}
260–261

For reference, this has changed as follows:

; OLD                         NEW
;         .save   {r4, lr}            .save   {r4, lr}
;         push    {r4, lr}            push    {r4, lr}
;         ldrsb.w r4, [r0]            ldrb    r4, [r0]
;         movs    r0, #1              movs    r0, #1
;         bl      f                   bl      f
;         uxtb    r0, r4              cmp     r4, r4
;         cmp     r0, r0              it      ne
;         it      ne                  popne   {r4, pc}
;         popne   {r4, pc}    .LBB0_1:
; .LBB0_1:                            sxtb    r0, r4
;         adds    r1, r4, #1          adds    r0, #1
;         mov     r2, r0              mov     r1, r4
; .LBB0_2:                    .LBB0_2:
;         adds    r2, #1              adds    r1, #1
;         adds    r1, #1              adds    r0, #1
;         uxtb    r3, r2              uxtb    r2, r1
;         cmp     r3, r0              cmp     r2, r4
;         blt     .LBB0_2             blt     .LBB0_2
;         pop     {r4, pc}            pop     {r4, pc}

This looks like it should be at least two separate changes; I think the AArch64ISelLowering change should have some impact on its own.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5748

Shouldn't this be a less-than comparison as opposed to exact equality? For example, suppose the bitmask is equal to one,

Updating the patch after splitting it into three patches. I'll link the remaining ones here after publishing them.

fpetrogalli marked an inline comment as done.Nov 2 2020, 8:00 AM
fpetrogalli added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5748

(Note that this inline comment now applies to D90605.)

@efriedma - I did that and introduced the extra truncate that is needed before the zero extend, with BitMaskVT set according to the mask:

EVT BitMaskVT;
if (IsAndZeroExtMask(N0, N1, BitMaskVT))
  // Truncate to the width implied by the mask, then zero extend back.
  return DAG.getNode(ISD::ZERO_EXTEND, SDLoc(N), VT,
                     DAG.getNode(ISD::TRUNCATE, SDLoc(N), BitMaskVT,
                                 N0.getOperand(0)));

This generates a loop in DAGCombine, because the zero extend + truncate combination is rewritten back into the same AND + bitmask it is trying to replace.
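
The cycle is presumably closed by the existing fold in DAGCombiner::visitZERO_EXTEND:

// fold (zext (truncate x)) -> (and x, mask)

so the truncate + zero extend pair created above is immediately turned back into the very AND-with-mask pattern the new combine tries to remove.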

Do you think it is worth investigating more?

fpetrogalli planned changes to this revision.Nov 17 2020, 5:22 AM
fpetrogalli marked an inline comment as done.

Parking this for the moment, as we need to refine the heuristic to decide when to prevent the spurious zero extension.

Thank you all for the reviews!

peterwaller-arm resigned from this revision.Mar 31 2021, 2:20 AM
fpetrogalli abandoned this revision.Jan 17 2023, 3:12 PM
Herald added a project: Restricted Project. Jan 17 2023, 3:12 PM
Herald added a subscriber: StephenFan.