This patch prevents generating a spurious zero extension of a
sign-extended load when the only use of the sign-extended value is a
comparison that tests its sign bit.
The compiler now generates a zero-extended load directly and compares
the sign bit of the original, unextended load instead of the
sign-extended one.
The output code for some of the tests, before and after the patch,
looks as follows:
BEFORE:                  | AFTER:
f_i32_i8:                | f_i32_i8:
  ldrsb w9, [x0]         |   ldrb w8, [x0]
  and w8, w9, #0xff      |   tbnz w8, #7, .LBB0_2
  tbnz w9, #31, .LBB0_2  |   add w0, w8, w8
  add w0, w8, w8         |   ret
  ret                    | .LBB0_2:
.LBB0_2:                 |   mul w0, w8, w8
  mul w0, w8, w8         |   ret
  ret                    |
                         |
g_i32_i16:               | g_i32_i16:
  ldrsh w8, [x0]         |   ldrh w0, [x0]
  and w0, w8, #0xffff    |   tbnz w0, #15, .LBB3_2
  tbnz w8, #31, .LBB3_2  |   ret
  ret                    | .LBB3_2:
.LBB3_2:                 |   lsl w0, w0, #1
  lsl w0, w0, #1         |   ret
  ret                    |
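For reference, the following C sketch illustrates the kind of source that
produces the f_i32_i8 pattern above (the function name is reused only for
illustration; the actual tests are the equivalent LLVM IR, so treat this as
an approximation). The arithmetic uses only the zero-extended value, while
the signed value is used only in a sign-bit test:

  /* Sketch: the sign-extended value s is used only to test the sign bit,
     while the arithmetic uses the zero-extended value u. In the listing
     above this produced both a sign-extending load (ldrsb) and an explicit
     zero extension (and #0xff) before the patch, and a single
     zero-extending load (ldrb) plus a test of bit 7 (tbnz) after it. */
  int f_i32_i8(signed char *p) {
    int s = *p;                    /* sign-extending byte load */
    unsigned u = (unsigned char)s; /* zero extension of the same loaded value */
    if (s < 0)                     /* only the sign bit of s is tested */
      return u * u;
    return u + u;
  }

The g_i32_i16 case is analogous, with short/unsigned short in place of the
byte types.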
Notes:
There is no code-size degradation in the tests modified in
llvm/test/CodeGen/ARM/select-imm.ll.
In particular, the THUMB1 test there has gone through the
following improvement:
BEFORE             | AFTER
t9:                | t9:
  .fnstart         |   .fnstart
  .save {r4, lr}   |   .save {r4, lr}
  push {r4, lr}    |   push {r4, lr}
  ldrb r4, [r0]    |   ldrb r4, [r0]
  movs r0, #1      |   movs r0, #1
  bl f             |   bl f
  sxtb r1, r4      |   cmp r4, r4
  uxtb r0, r1      |   bne .LBB0_3
  cmp r0, r0       |   sxtb r0, r4
  bne .LBB8_3      |   adds r0, r0, #1
  adds r1, r1, #1  |   mov r1, r4
.LBB0_2:           |
  mov r2, r0       | .LBB8_2:
.LBB8_2:           |   adds r0, r0, #1
  adds r1, r1, #1  |   adds r1, r1, #1
  adds r2, r2, #1  |   uxtb r2, r1
  uxtb r3, r2      |   cmp r2, r4
  cmp r3, r0       |   blt .LBB8_2
.LBB0_3:           |
  blt .LBB8_2      | .LBB8_3:
.LBB8_3:           |   pop {r4, pc}
  pop {r4, pc}     |
Shouldn't this be a less-than comparison as opposed to exact equality? For example, suppose the bitmask is equal to one,