This is an archive of the discontinued LLVM Phabricator instance.

It's "simpler", in some sense, but it's more complicated to lower because it's using the carry output of a uaddo in a non-addcarry operation, which is legal, but more expensive than using it in an addcarry, particularly on ARM. I think the missing combine is that we should be able to turn the subtraction into some form of addcarry/subcarry.

In D62266#1512613, @efriedma wrote:

So the DAG is simpler, but that particular case is worse for the backend.

It's "simpler", in some sense, but it's more complicated to lower because it's using the carry output of a uaddo in a non-addcarry operation, which is legal, but more expensive than using it in an addcarry, particularly on ARM. I think the missing combine is that we should be able to turn the subtraction into some form of addcarry/subcarry.

Hmm, that does not seem to be all.
I've added

bool HasSUBCARRY = TLI.isOperationLegalOrCustom(ISD::SUBCARRY, VT);
bool HasADDCARRY = TLI.isOperationLegalOrCustom(ISD::ADDCARRY, VT);
if (HasSUBCARRY || HasADDCARRY) {
  unsigned ANYCARRY = HasSUBCARRY ? ISD::SUBCARRY : ISD::ADDCARRY;
  // (sub X, Carry) -> (subcarry/addcarry X, 0, Carry)
  if (SDValue Carry = getAsCarry(TLI, N1)) {
    return DAG.getNode(ANYCARRY, DL, DAG.getVTList(VT, Carry.getValueType()),
                       N0, DAG.getConstant(0, DL, VT), Carry);
  }

  if (HasSUBCARRY) {
    // (sub Carry, X) -> (subcarry 0, X, Carry)
    if (SDValue Carry = getAsCarry(TLI, N0)) {
      return DAG.getNode(ISD::SUBCARRY, DL,
                         DAG.getVTList(VT, Carry.getValueType()),
                         DAG.getConstant(0, DL, VT), N1, Carry);
    }
  }
}

at the end of DAGCombiner::visitSUB(), and get this

SelectionDAG has 17 nodes:
  t0: ch = EntryToken
          t6: i32,ch = CopyFromReg t0, Register:i32 %2
            t4: i32,ch = CopyFromReg t0, Register:i32 %1
            t2: i32,ch = CopyFromReg t0, Register:i32 %0
          t30: i32,i32 = uaddo t4, t2
        t31: i32,i32 = subcarry Constant:i32<0>, t6, t30:1
      t39: i32 = and t31, Constant:i32<65535>
    t40: ch = br_cc t0, seteq:ch, t39, Constant:i32<65535>, BasicBlock:ch<if.end 0x55706fb58828>
  t19: ch = br t40, BasicBlock:ch<for.cond.preheader 0x55706fb58698>

which is then resulting in

SelectionDAG has 22 nodes:
  t0: ch = EntryToken
            t6: i32,ch = CopyFromReg t0, Register:i32 %2
                    t4: i32,ch = CopyFromReg t0, Register:i32 %1
                    t2: i32,ch = CopyFromReg t0, Register:i32 %0
                  t51: i32,i32 = ARMISD::ADDC t4, t2
                t52: i32,i32 = ARMISD::ADDE Constant:i32<0>, Constant:i32<0>, t51:1
              t45: i32 = sub Constant:i32<1>, t52
            t46: i32,i32 = ARMISD::SUBC t45, Constant:i32<1>
          t47: i32,i32 = ARMISD::SUBE Constant:i32<0>, t6, t46:1
        t39: i32 = and t47, Constant:i32<65535>
      t41: glue = ARMISD::CMPZ t39, Constant:i32<65535>
    t43: ch = ARMISD::BRCOND t0, BasicBlock:ch<if.end 0x55706fb58828>, Constant:i32<0>, Register:i32 $cpsr, t41
  t19: ch = br t43, BasicBlock:ch<for.cond.preheader 0x55706fb58698>

So some other ARM-specific combine is missing, too?
Or am i totally misreading this?

uaddo+subcarry doesn't really work the way you'd want it to on ARM... I think you have to invert the carry bit, and there isn't any convenient way to do that. (Note the extra "SUB" between the ARMISD::ADDE and the ARMISD::SUBC.) So I think to get the optimal code here you have to produce neg+addcarry instead of subcarry.

In D62266#1512853, @efriedma wrote:

uaddo+subcarry doesn't really work the way you'd want it to on ARM... I think you have to invert the carry bit, and there isn't any convenient way to do that. (Note the extra "SUB" between the ARMISD::ADDE and the ARMISD::SUBC.) So I think to get the optimal code here you have to produce neg+addcarry instead of subcarry.

Hmm. I'm sorry, that did not make the problem any clearer to me.
Sadly, i have literally never encountered 'carry' before, so this makes very little sense to me.
Are you saying that the following transform should be performed: (sub Carry, X) -> (addcarry (sub 0, X), 0, !Carry)?
That regresses the test even further..

lebedev.ri added a child revision: D62294: [DAGCombine] (x - C) - y -> (x - y) - C fold.May 23 2019, 2:10 AM

Are you saying that the following transform should be performed: (sub Carry, X) -> (addcarry (sub 0, X), 0, !Carry)?

The correct transform is (sub Carry, X) -> (add Carry, (sub 0, X)) -> (addcarry (sub 0, X), 0, Carry), no "!".
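As a sanity check of the rewrite chain above, here is a minimal sketch that models only the value result of the nodes on 32-bit wrapping integers. The helper names (`sub`, `addcarry`, `foldHolds`, `checkFoldOnSamples`) are hypothetical stand-ins for illustration, not SelectionDAG APIs, and the carry-out result of `addcarry` is deliberately ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical scalar model of (sub a, b): wraps modulo 2^32.
static uint32_t sub(uint32_t a, uint32_t b) { return a - b; }

// Hypothetical scalar model of the value result of (addcarry a, b, carry).
static uint32_t addcarry(uint32_t a, uint32_t b, bool carry) {
  return a + b + (carry ? 1u : 0u);
}

// True if (sub Carry, X) and (addcarry (sub 0, X), 0, Carry) agree.
static bool foldHolds(uint32_t x, bool carry) {
  uint32_t before = sub(carry ? 1u : 0u, x);
  uint32_t after = addcarry(sub(0u, x), 0u, carry);
  return before == after;
}

// Spot-check a few boundary values; no inversion of the carry is needed.
static void checkFoldOnSamples() {
  for (uint32_t x : {0u, 1u, 2u, 65535u, 0x7fffffffu, 0x80000000u, 0xffffffffu})
    for (bool carry : {false, true})
      assert(foldHolds(x, carry));
}
```

Calling `checkFoldOnSamples()` exercises both carry values; the identity holds for every input because all three forms compute `Carry - X` modulo 2^32.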

Sorry, I wasn't really clear; the carry bit doesn't need to be inverted in target-independent code. It's just that on ARM, "subs"/"sbcs" produce a carry bit which has the opposite meaning of the "overflow" bit from usubo, and "sbc(s)" consumes a carry bit that's inverted from the carry bit operand of subcarry. If the result of a usubo is consumed by a subcarry, the two inversions cancel; otherwise, we need to transfer the carry bit to a GPR, invert it, then transfer it back to the flags register. (Maybe this could be optimized a bit in certain cases, but it doesn't really optimize in general.)
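The flag convention described above can be modelled in a few lines. This is only a sketch under stated assumptions: `usubo`/`subcarry` model the generic nodes with a borrow flag, while `armSUBS`/`armSBCS` model the ARM instructions as computing `a + ~b + C`, so that C set means "no borrow". All function and struct names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

struct Flagged {
  uint32_t value;
  bool flag;
};

// Generic nodes: the flag is a borrow (true means the result wrapped).
static Flagged usubo(uint32_t a, uint32_t b) { return {a - b, a < b}; }
static Flagged subcarry(uint32_t a, uint32_t b, bool borrowIn) {
  uint64_t rhs = uint64_t(b) + (borrowIn ? 1u : 0u);
  return {uint32_t(a - b - (borrowIn ? 1u : 0u)), uint64_t(a) < rhs};
}

// ARM-style model: the C flag is the carry out of a + ~b + C,
// so after a subtraction C == true means "no borrow".
static Flagged armSUBS(uint32_t a, uint32_t b) { return {a - b, a >= b}; }
static Flagged armSBCS(uint32_t a, uint32_t b, bool c) {
  uint64_t wide = uint64_t(a) + uint64_t(uint32_t(~b)) + (c ? 1u : 0u);
  return {uint32_t(wide), (wide >> 32) != 0};
}

// usubo feeding subcarry maps directly onto SUBS feeding SBCS:
// the two inversions cancel and only the final flag reads inverted.
static bool chainMatches(uint32_t aLo, uint32_t aHi, uint32_t bLo,
                         uint32_t bHi) {
  Flagged lo = usubo(aLo, bLo);
  Flagged hi = subcarry(aHi, bHi, lo.flag);
  Flagged armLo = armSUBS(aLo, bLo);
  Flagged armHi = armSBCS(aHi, bHi, armLo.flag);
  return lo.value == armLo.value && hi.value == armHi.value &&
         hi.flag == !armHi.flag;
}

// A borrow-in that did not come from SUBS/SBCS (for example the carry of a
// uaddo) must be inverted before SBCS can consume it.
static bool mixedNeedsInversion(uint32_t a, uint32_t b, bool borrowIn) {
  Flagged generic = subcarry(a, b, borrowIn);
  Flagged arm = armSBCS(a, b, !borrowIn);
  return generic.value == arm.value && generic.flag == !arm.flag;
}
```

In this model the explicit `!borrowIn` in `mixedNeedsInversion` corresponds to the extra "sub" seen between ARMISD::ADDE and ARMISD::SUBC in the DAG dump earlier in the thread.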

lebedev.ri mentioned this in D62392: [DAGCombine][ARM] (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry) fold.May 24 2019, 7:37 AM

lebedev.ri added a child revision: D62392: [DAGCombine][ARM] (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry) fold.

lebedev.ri removed a child revision: D62294: [DAGCombine] (x - C) - y -> (x - y) - C fold.

In D62266#1514711, @efriedma wrote:

Are you saying that the following transform should be performed: (sub Carry, X) -> (addcarry (sub 0, X), 0, !Carry)?

The correct transform is (sub Carry, X) -> (add Carry, (sub 0, X)) -> (addcarry (sub 0, X), 0, Carry), no "!".

Oh right, i think i may have mistyped here. D62392

Sorry, I wasn't really clear; the carry bit doesn't need to be inverted in target-independent code. It's just that on ARM, "subs"/"sbcs" produce a carry bit which has the opposite meaning of the "overflow" bit from usubo, and "sbc(s)" consumes a carry bit that's inverted from the carry bit operand of subcarry. If the result of a usubo is consumed by a subcarry, the two inversions cancel; otherwise, we need to transfer the carry bit to a GPR, invert it, then transfer it back to the flags register. (Maybe this could be optimized a bit in certain cases, but it doesn't really optimize in general.)

Thanks! That helps improve half of the cases (although they are still regressed as compared to the LHS of this diff).

lebedev.ri added a comment.May 24 2019, 7:42 AM

This comment was removed by lebedev.ri.

efriedma added inline comments.May 24 2019, 3:03 PM

test/CodeGen/ARM/addsubcarry-promotion.ll
10	It looks like this test didn't get rebased correctly?
33	The big difference that's making the new code worse, now, is that somehow the compare `x==0` is getting transformed to something more like `x==-1`... and I guess something isn't handling that well. That's probably something the ARM backend should be handling in target-specific code, though; feel free to just file a bug for the missed optimization on `void a(int s, void f()) { if ((short)s==-1)f(); }`

lebedev.ri marked 2 inline comments as done.May 24 2019, 3:23 PM

lebedev.ri added inline comments.

test/CodeGen/ARM/addsubcarry-promotion.ll

No, D62392 is a child revision of this.

I understand how we get here. In IR terms, this is just two folds:
https://rise4fun.com/Alive/fTpN
https://godbolt.org/z/6RLVTQ

And thus we end with:

Optimized type-legalized selection DAG: %bb.0 'fn1:entry'
SelectionDAG has 20 nodes:
  t0: ch = EntryToken
              t6: i32,ch = CopyFromReg t0, Register:i32 %2
            t37: i32 = sub Constant:i32<0>, t6
          t39: i32 = sign_extend_inreg t37, ValueType:ch:i16
            t4: i32,ch = CopyFromReg t0, Register:i32 %1
            t2: i32,ch = CopyFromReg t0, Register:i32 %0
          t36: i32,i32 = uaddo t4, t2
        t47: i32,i32 = addcarry t39, Constant:i32<0>, t36:1
      t46: i32 = and t47, Constant:i32<65535>
    t50: ch = br_cc t0, seteq:ch, t46, Constant:i32<65535>, BasicBlock:ch<if.end 0x5618ef707f18>
  t19: ch = br t50, BasicBlock:ch<for.cond.preheader 0x5618ef707d88>

I wonder, maybe this can still be undone in DAGCombine.
The important bits are:

    t47: i32,i32 = addcarry t39, Constant:i32<0>, t36:1
  t46: i32 = and t47, Constant:i32<65535>
t50: ch = br_cc t0, seteq:ch, t46, Constant:i32<65535>, BasicBlock:ch<if.end 0x5618ef707f18>

I wonder if the problem can be fixed by transforming this back into:

  t47: i32,i32 = addcarry t39, Constant:i32<65535>, t36:1
t50: ch = br_cc t0, seteq:ch, t47, Constant:i32<0>, BasicBlock:ch<if.end 0x5618ef707f18>

lebedev.ri marked an inline comment as done.May 24 2019, 3:31 PM

lebedev.ri added inline comments.

test/CodeGen/ARM/addsubcarry-promotion.ll
33	Or more specifically, https://rise4fun.com/Alive/slGX This would be an ok transform here because the second argument of `addcarry` is 0 here anyway.

lebedev.ri marked an inline comment as done.May 24 2019, 3:42 PM

lebedev.ri added inline comments.

test/CodeGen/ARM/addsubcarry-promotion.ll
33	Or without hardcoded constants: https://rise4fun.com/Alive/B2k Not sure if it's worthwhile doing this if it requires creating whole new 'add', but here we should be able to do it.

lebedev.ri mentioned this in D62450: [DAGCombine][ARM] x ==/!= c -> (x - c) ==/!= 0 iff '-c' can be folded into the x node..May 25 2019, 10:23 AM

lebedev.ri marked 2 inline comments as done.May 25 2019, 10:28 AM

lebedev.ri added inline comments.

test/CodeGen/ARM/addsubcarry-promotion.ll
33	Done, D62450. I think that fixes (and improves upon?) the last regression, feel free to review these three patches :)
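For context, the arithmetic behind the fold named in the D62450 title (x ==/!= c -> (x - c) ==/!= 0) can be sketched as follows, for the case where x is an addcarry whose second operand is 0, so that '-c' can be placed into that operand. This is a hypothetical scalar model of the value result only (the helper names are made up for illustration); it does not model the intervening `and` with 65535 from the DAG dump, nor the actual DAGCombiner code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of the value result of (addcarry a, b, carry).
static uint32_t addcarry(uint32_t a, uint32_t b, bool carry) {
  return a + b + (carry ? 1u : 0u);
}

// Before: (addcarry a, 0, carry) == c
static bool compareBefore(uint32_t a, bool carry, uint32_t c) {
  return addcarry(a, 0u, carry) == c;
}

// After: (addcarry a, -c, carry) == 0, with -c folded into the zero operand.
static bool compareAfter(uint32_t a, bool carry, uint32_t c) {
  return addcarry(a, 0u - c, carry) == 0u;
}

// Both forms must give the same answer for the same inputs.
static bool foldAgrees(uint32_t a, bool carry, uint32_t c) {
  return compareBefore(a, carry, c) == compareAfter(a, carry, c);
}
```

The two comparisons agree for all inputs because subtracting c on both sides is exact modulo 2^32; whether the rewritten form is cheaper is a target question that this sketch does not address.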

lebedev.ri edited the summary of this revision.May 25 2019, 10:29 AM

lebedev.ri added a parent revision: D62294: [DAGCombine] (x - C) - y -> (x - y) - C fold.May 28 2019, 10:21 AM

lebedev.ri removed a parent revision: D62263: [DAGCombine][X86][AArch64][AMDGPU] (x - y) + -1 -> add (xor y, -1), x fold.

lebedev.ri marked an inline comment as done.May 28 2019, 3:10 PM

lebedev.ri added inline comments.

test/CodeGen/ARM/addsubcarry-promotion.ll
33	ping @efriedma :)

Hello. I found this code that seems to loop forever:

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv6m-arm-none-eabi"

@a = dso_local local_unnamed_addr global i32 0, align 4
@b = dso_local local_unnamed_addr global i32 0, align 4

; Function Attrs: minsize nounwind optsize
define dso_local i32 @c() local_unnamed_addr #0 {
entry:
  %0 = load i32, i32* @a, align 4
  %sub = sub nsw i32 2000, %0
  %call = tail call i32 bitcast (i32 (...)* @d to i32 (i32)*)(i32 %sub) #2
  %1 = load i32, i32* @b, align 4
  %cmp = icmp sgt i32 %1, 1999
  br i1 %cmp, label %if.then, label %if.end

if.then:                                          ; preds = %entry
  %call1 = tail call i32 bitcast (i32 (...)* @e to i32 ()*)() #2
  br label %if.end

if.end:                                           ; preds = %if.then, %entry
  ret i32 undef
}

declare dso_local i32 @d(...) local_unnamed_addr #1
declare dso_local i32 @e(...) local_unnamed_addr #1

In D62266#1521995, @dmgreen wrote:

Hello. I found this code that seems to loop forever:

*This* code? As in *this* patch, correct?

target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "thumbv6m-arm-none-eabi"

@a = dso_local local_unnamed_addr global i32 0, align 4
@b = dso_local local_unnamed_addr global i32 0, align 4

; Function Attrs: minsize nounwind optsize
define dso_local i32 @c() local_unnamed_addr #0 {
entry:
  %0 = load i32, i32* @a, align 4
  %sub = sub nsw i32 2000, %0
  %call = tail call i32 bitcast (i32 (...)* @d to i32 (i32)*)(i32 %sub) #2
  %1 = load i32, i32* @b, align 4
  %cmp = icmp sgt i32 %1, 1999
  br i1 %cmp, label %if.then, label %if.end

if.then:                                          ; preds = %entry
  %call1 = tail call i32 bitcast (i32 (...)* @e to i32 ()*)() #2
  br label %if.end

if.end:                                           ; preds = %if.then, %entry
  ret i32 undef
}

declare dso_local i32 @d(...) local_unnamed_addr #1
declare dso_local i32 @e(...) local_unnamed_addr #1

Thanks for the reproducer.
There may be one more endless loop (unrelated to this one) in D62257 in test-suite; i did not reduce it yet (i reverted that patch).

*This* code? As in *this* patch, correct?

Hello. I meant this .ll file seems to loop forever. With this patch or one of the three others (D62257, D62392, D62450). I only tested them together, so it may be the same as the problem from the test-suite.

In D62266#1522025, @dmgreen wrote:

*This* code? As in *this* patch, correct?

Hello. I meant this .ll file seems to loop forever. With this patch or one of the three others (D62257, D62392, D62450). I only tested them together, so it may be the same as the problem from the test-suite.

Confirmed, thank you for the reproducer.
This is a different hang from the one i have seen with D62257.

lebedev.ri removed a parent revision: D62294: [DAGCombine] (x - C) - y -> (x - y) - C fold.May 30 2019, 1:23 PM

lebedev.ri added a parent revision: D62257: [DAGCombiner][X86][AArch64] (x - C) + y -> (x + y) - C fold.

Diffusion mentioned this in rL362156: [DAGCombine] Limit 'hoist add/sub binop w/ constant op' to non-opaque consts.May 30 2019, 2:10 PM

lebedev.ri mentioned this in rG46511d75b5bf: [DAGCombine] Limit 'hoist add/sub binop w/ constant op' to non-opaque consts.May 30 2019, 2:10 PM

Diffusion mentioned this in rL362159: [NFC][ARM] Add a test that potentially causes endless combine loop with D62266.May 30 2019, 2:38 PM

lebedev.ri mentioned this in rG31f193984839: [NFC][ARM] Add a test that potentially causes endless combine loop with D62266.May 30 2019, 2:42 PM

ping @efriedma.

Rebased, fixed endless combine loop (by not hoisting if constant is opaque, since we don't constant-fold those).

lebedev.ri mentioned this in D62294: [DAGCombine] (x - C) - y -> (x - y) - C fold.May 31 2019, 8:09 AM

LGTM. I'm okay with working on adding more test coverage and fixes for the ADDCARRY cases in a followup.

This revision is now accepted and ready to land.May 31 2019, 2:18 PM

Diffusion mentioned this in rL362295: [NFC][Codegen] shift-amount-mod.ll: drop innermost operation.Jun 1 2019, 4:05 AM

lebedev.ri mentioned this in rG1aaa23c0fc5f: [NFC][Codegen] shift-amount-mod.ll: drop innermost operation.Jun 1 2019, 4:06 AM

Rebased, NFC.

lebedev.ri added a child revision: D62774: [DAGCombine][X86][AArch64][MIPS][LANAI] (C - x) - y -> C - (x + y) fold (PR41952).Jun 1 2019, 5:50 AM

Closed by commit rL362487: [DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold (authored by lebedevri).Jun 4 2019, 4:03 AM

This revision was automatically updated to reflect the committed changes.
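The committed fold, (C - x) + y -> (y - x) + C, is a plain reassociation under wrapping arithmetic. A minimal sketch that checks it on 32-bit values follows; the helper names are hypothetical and model only the scalar case, not the one-use or non-opaque-constant conditions from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// (C - x) + y, the shape before the fold.
static uint32_t beforeFold(uint32_t c, uint32_t x, uint32_t y) {
  return (c - x) + y;
}

// (y - x) + C, the shape after the fold; the constant ends up outermost.
static uint32_t afterFold(uint32_t c, uint32_t x, uint32_t y) {
  return (y - x) + c;
}

// Spot-check boundary values, including ones that wrap around.
static void checkHoistOnSamples() {
  for (uint32_t c : {0u, 32u, 0xffffffffu})
    for (uint32_t x : {0u, 1u, 0x80000000u, 0xffffffffu})
      for (uint32_t y : {0u, 7u, 0x7fffffffu, 0xffffffffu})
        assert(beforeFold(c, x, y) == afterFold(c, x, y));
}
```

With the constant outermost, a backend may be able to use it as an immediate, as in the `add w0, w8, #32` lines of the AArch64 sink-addsub-of-const.ll checks in the diff.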

lebedev.ri mentioned this in rGc00f3182243d: [DAGCombine][ARM][X86] (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry) fold.Sep 18 2019, 1:52 PM

Diffusion mentioned this in rL372259: [DAGCombine][ARM][X86] (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry) fold.Sep 18 2019, 1:52 PM

lebedev.ri mentioned this in rG4334892e7b07: [DAGCombine][ARM] x ==/!= c -> (x - c) ==/!= 0 iff '-c' can be folded into….Oct 22 2019, 12:58 PM

hans mentioned this in rG684ebc605e0b: Revert 4334892e7b "[DAGCombine][ARM] x ==/!= c -> (x - c) ==/!= 0 iff '-c'….Oct 23 2019, 11:00 AM

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	7 lines
test/CodeGen/AArch64/shift-amount-mod.ll	8 lines
test/CodeGen/AArch64/sink-addsub-of-const.ll	22 lines
test/CodeGen/ARM/addsubcarry-promotion.ll	69 lines
test/CodeGen/X86/shift-amount-mod.ll	20 lines
test/CodeGen/X86/sink-addsub-of-const.ll	68 lines

Diff 202313

lib/CodeGen/SelectionDAG/DAGCombiner.cpp


SDValue DAGCombiner::visitADDLikeCommutative(SDValue N0, SDValue N1,
// Hoist one-use subtraction by non-opaque constant:		// Hoist one-use subtraction by non-opaque constant:
// (x - C) + y -> (x + y) - C		// (x - C) + y -> (x + y) - C
// This is necessary because SUB(X,C) -> ADD(X,-C) doesn't work for vectors.		// This is necessary because SUB(X,C) -> ADD(X,-C) doesn't work for vectors.
if (N0.hasOneUse() && N0.getOpcode() == ISD::SUB &&		if (N0.hasOneUse() && N0.getOpcode() == ISD::SUB &&
isConstantOrConstantVector(N0.getOperand(1), /*NoOpaques=*/true)) {		isConstantOrConstantVector(N0.getOperand(1), /*NoOpaques=*/true)) {
SDValue Add = DAG.getNode(ISD::ADD, DL, VT, N0.getOperand(0), N1);		SDValue Add = DAG.getNode(ISD::ADD, DL, VT, N0.getOperand(0), N1);
return DAG.getNode(ISD::SUB, DL, VT, Add, N0.getOperand(1));		return DAG.getNode(ISD::SUB, DL, VT, Add, N0.getOperand(1));
}		}
		// Hoist one-use subtraction from non-opaque constant:
		// (C - x) + y -> (y - x) + C
		if (N0.hasOneUse() && N0.getOpcode() == ISD::SUB &&
		isConstantOrConstantVector(N0.getOperand(0), /*NoOpaques=*/true)) {
		SDValue Sub = DAG.getNode(ISD::SUB, DL, VT, N1, N0.getOperand(1));
		return DAG.getNode(ISD::ADD, DL, VT, Sub, N0.getOperand(0));
		}

// If the target's bool is represented as 0/1, prefer to make this 'sub 0/1'		// If the target's bool is represented as 0/1, prefer to make this 'sub 0/1'
// rather than 'add 0/-1' (the zext should get folded).		// rather than 'add 0/-1' (the zext should get folded).
// add (sext i1 Y), X --> sub X, (zext i1 Y)		// add (sext i1 Y), X --> sub X, (zext i1 Y)
if (N0.getOpcode() == ISD::SIGN_EXTEND &&		if (N0.getOpcode() == ISD::SIGN_EXTEND &&
N0.getOperand(0).getScalarValueSizeInBits() == 1 &&		N0.getOperand(0).getScalarValueSizeInBits() == 1 &&
TLI.getBooleanContents(VT) == TargetLowering::ZeroOrOneBooleanContent) {		TLI.getBooleanContents(VT) == TargetLowering::ZeroOrOneBooleanContent) {
SDValue ZExt = DAG.getNode(ISD::ZERO_EXTEND, DL, VT, N0.getOperand(0));		SDValue ZExt = DAG.getNode(ISD::ZERO_EXTEND, DL, VT, N0.getOperand(0));

test/CodeGen/AArch64/shift-amount-mod.ll


	;==============================================================================;			;==============================================================================;
	; add to negated shift amount			; add to negated shift amount
	;			;

	define i32 @reg32_lshr_by_add_to_negated(i32 %val, i32 %a, i32 %b) nounwind {			define i32 @reg32_lshr_by_add_to_negated(i32 %val, i32 %a, i32 %b) nounwind {
	; CHECK-LABEL: reg32_lshr_by_add_to_negated:			; CHECK-LABEL: reg32_lshr_by_add_to_negated:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: mov w8, #32			; CHECK-NEXT: sub w8, w2, w1
	; CHECK-NEXT: sub w8, w8, w1
	; CHECK-NEXT: add w8, w8, w2
	; CHECK-NEXT: lsr w0, w0, w8			; CHECK-NEXT: lsr w0, w0, w8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%nega = sub i32 32, %a			%nega = sub i32 32, %a
	%negasubb = add i32 %nega, %b			%negasubb = add i32 %nega, %b
	%shifted = lshr i32 %val, %negasubb			%shifted = lshr i32 %val, %negasubb
	ret i32 %shifted			ret i32 %shifted
	}			}
	define i64 @reg64_lshr_by_add_to_negated(i64 %val, i64 %a, i64 %b) nounwind {			define i64 @reg64_lshr_by_add_to_negated(i64 %val, i64 %a, i64 %b) nounwind {
	; CHECK-LABEL: reg64_lshr_by_add_to_negated:			; CHECK-LABEL: reg64_lshr_by_add_to_negated:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: mov w8, #64			; CHECK-NEXT: sub x8, x2, x1
	; CHECK-NEXT: sub x8, x8, x1
	; CHECK-NEXT: add x8, x8, x2
	; CHECK-NEXT: lsr x0, x0, x8			; CHECK-NEXT: lsr x0, x0, x8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%nega = sub i64 64, %a			%nega = sub i64 64, %a
	%negasubb = add i64 %nega, %b			%negasubb = add i64 %nega, %b
	%shifted = lshr i64 %val, %negasubb			%shifted = lshr i64 %val, %negasubb
	ret i64 %shifted			ret i64 %shifted
	}			}


test/CodeGen/AArch64/sink-addsub-of-const.ll


; add (sub C, %x), %y		; add (sub C, %x), %y
; Outer 'add' is commutative - 2 variants.		; Outer 'add' is commutative - 2 variants.

define i32 @sink_sub_from_const_to_add0(i32 %a, i32 %b, i32 %c) {		define i32 @sink_sub_from_const_to_add0(i32 %a, i32 %b, i32 %c) {
; CHECK-LABEL: sink_sub_from_const_to_add0:		; CHECK-LABEL: sink_sub_from_const_to_add0:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: add w8, w0, w1		; CHECK-NEXT: add w8, w0, w1
; CHECK-NEXT: mov w9, #32		; CHECK-NEXT: sub w8, w2, w8
; CHECK-NEXT: sub w8, w9, w8		; CHECK-NEXT: add w0, w8, #32 // =32
; CHECK-NEXT: add w0, w8, w2
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%t0 = add i32 %a, %b		%t0 = add i32 %a, %b
%t1 = sub i32 32, %t0		%t1 = sub i32 32, %t0
%r = add i32 %t1, %c		%r = add i32 %t1, %c
ret i32 %r		ret i32 %r
}		}
define i32 @sink_sub_from_const_to_add1(i32 %a, i32 %b, i32 %c) {		define i32 @sink_sub_from_const_to_add1(i32 %a, i32 %b, i32 %c) {
; CHECK-LABEL: sink_sub_from_const_to_add1:		; CHECK-LABEL: sink_sub_from_const_to_add1:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: add w8, w0, w1		; CHECK-NEXT: add w8, w0, w1
; CHECK-NEXT: mov w9, #32		; CHECK-NEXT: sub w8, w2, w8
; CHECK-NEXT: sub w8, w9, w8		; CHECK-NEXT: add w0, w8, #32 // =32
; CHECK-NEXT: add w0, w2, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%t0 = add i32 %a, %b		%t0 = add i32 %a, %b
%t1 = sub i32 32, %t0		%t1 = sub i32 32, %t0
%r = add i32 %c, %t1		%r = add i32 %c, %t1
ret i32 %r		ret i32 %r
}		}

; sub (add %x, C), %y		; sub (add %x, C), %y
; CHECK-NEXT: ret
%t1 = sub i32 %t0, 32		%t1 = sub i32 %t0, 32
%r = sub i32 %t1, %c		%r = sub i32 %t1, %c
ret i32 %r		ret i32 %r
}		}
define i32 @sink_sub_of_const_to_sub2(i32 %a, i32 %b, i32 %c) {		define i32 @sink_sub_of_const_to_sub2(i32 %a, i32 %b, i32 %c) {
; CHECK-LABEL: sink_sub_of_const_to_sub2:		; CHECK-LABEL: sink_sub_of_const_to_sub2:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: sub w8, w1, w0		; CHECK-NEXT: sub w8, w1, w0
; CHECK-NEXT: add w8, w8, w2		; CHECK-NEXT: add w8, w2, w8
; CHECK-NEXT: add w0, w8, #32 // =32		; CHECK-NEXT: add w0, w8, #32 // =32
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%t0 = sub i32 %a, %b		%t0 = sub i32 %a, %b
%t1 = sub i32 %t0, 32		%t1 = sub i32 %t0, 32
%r = sub i32 %c, %t1		%r = sub i32 %c, %t1
ret i32 %r		ret i32 %r
}		}

; Outer 'add' is commutative - 2 variants.		; Outer 'add' is commutative - 2 variants.

define <4 x i32> @vec_sink_sub_from_const_to_add0(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {		define <4 x i32> @vec_sink_sub_from_const_to_add0(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
; CHECK-LABEL: vec_sink_sub_from_const_to_add0:		; CHECK-LABEL: vec_sink_sub_from_const_to_add0:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: adrp x8, .LCPI16_0		; CHECK-NEXT: adrp x8, .LCPI16_0
; CHECK-NEXT: ldr q3, [x8, :lo12:.LCPI16_0]		; CHECK-NEXT: ldr q3, [x8, :lo12:.LCPI16_0]
; CHECK-NEXT: add v0.4s, v0.4s, v1.4s		; CHECK-NEXT: add v0.4s, v0.4s, v1.4s
; CHECK-NEXT: sub v0.4s, v3.4s, v0.4s		; CHECK-NEXT: sub v0.4s, v2.4s, v0.4s
; CHECK-NEXT: add v0.4s, v0.4s, v2.4s		; CHECK-NEXT: add v0.4s, v0.4s, v3.4s
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%t0 = add <4 x i32> %a, %b		%t0 = add <4 x i32> %a, %b
%t1 = sub <4 x i32> <i32 42, i32 24, i32 undef, i32 46>, %t0		%t1 = sub <4 x i32> <i32 42, i32 24, i32 undef, i32 46>, %t0
%r = add <4 x i32> %t1, %c		%r = add <4 x i32> %t1, %c
ret <4 x i32> %r		ret <4 x i32> %r
}		}
define <4 x i32> @vec_sink_sub_from_const_to_add1(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {		define <4 x i32> @vec_sink_sub_from_const_to_add1(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
; CHECK-LABEL: vec_sink_sub_from_const_to_add1:		; CHECK-LABEL: vec_sink_sub_from_const_to_add1:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: adrp x8, .LCPI17_0		; CHECK-NEXT: adrp x8, .LCPI17_0
; CHECK-NEXT: ldr q3, [x8, :lo12:.LCPI17_0]		; CHECK-NEXT: ldr q3, [x8, :lo12:.LCPI17_0]
; CHECK-NEXT: add v0.4s, v0.4s, v1.4s		; CHECK-NEXT: add v0.4s, v0.4s, v1.4s
; CHECK-NEXT: sub v0.4s, v3.4s, v0.4s		; CHECK-NEXT: sub v0.4s, v2.4s, v0.4s
; CHECK-NEXT: add v0.4s, v2.4s, v0.4s		; CHECK-NEXT: add v0.4s, v0.4s, v3.4s
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%t0 = add <4 x i32> %a, %b		%t0 = add <4 x i32> %a, %b
%t1 = sub <4 x i32> <i32 42, i32 24, i32 undef, i32 46>, %t0		%t1 = sub <4 x i32> <i32 42, i32 24, i32 undef, i32 46>, %t0
%r = add <4 x i32> %c, %t1		%r = add <4 x i32> %c, %t1
ret <4 x i32> %r		ret <4 x i32> %r
}		}

; sub (add %x, C), %y		; sub (add %x, C), %y
; CHECK-NEXT: ret
ret <4 x i32> %r		ret <4 x i32> %r
}		}
define <4 x i32> @vec_sink_sub_of_const_to_sub2(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {		define <4 x i32> @vec_sink_sub_of_const_to_sub2(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
; CHECK-LABEL: vec_sink_sub_of_const_to_sub2:		; CHECK-LABEL: vec_sink_sub_of_const_to_sub2:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: adrp x8, .LCPI21_0		; CHECK-NEXT: adrp x8, .LCPI21_0
; CHECK-NEXT: ldr q3, [x8, :lo12:.LCPI21_0]		; CHECK-NEXT: ldr q3, [x8, :lo12:.LCPI21_0]
; CHECK-NEXT: sub v0.4s, v1.4s, v0.4s		; CHECK-NEXT: sub v0.4s, v1.4s, v0.4s
; CHECK-NEXT: add v0.4s, v0.4s, v2.4s		; CHECK-NEXT: add v0.4s, v2.4s, v0.4s
; CHECK-NEXT: add v0.4s, v0.4s, v3.4s		; CHECK-NEXT: add v0.4s, v0.4s, v3.4s
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%t0 = sub <4 x i32> %a, %b		%t0 = sub <4 x i32> %a, %b
%t1 = sub <4 x i32> %t0, <i32 42, i32 24, i32 undef, i32 46>		%t1 = sub <4 x i32> %t0, <i32 42, i32 24, i32 undef, i32 46>
%r = sub <4 x i32> %c, %t1		%r = sub <4 x i32> %c, %t1
ret <4 x i32> %r		ret <4 x i32> %r
}		}


test/CodeGen/ARM/addsubcarry-promotion.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -O2 -mtriple armv7a < %s \| FileCheck --check-prefixes=ARM,ARMV7A %s			; RUN: llc -O2 -mtriple armv7a < %s \| FileCheck --check-prefixes=ARM,ARMV7A %s

	; RUN: llc -O2 -mtriple thumbv6m < %s \| FileCheck --check-prefixes=THUMB1,THUMBV6M %s			; RUN: llc -O2 -mtriple thumbv6m < %s \| FileCheck --check-prefixes=THUMB1,THUMBV6M %s
	; RUN: llc -O2 -mtriple thumbv8m.base < %s \| FileCheck --check-prefixes=THUMB1,THUMBV8M-BASE %s			; RUN: llc -O2 -mtriple thumbv8m.base < %s \| FileCheck --check-prefixes=THUMB1,THUMBV8M-BASE %s

	; RUN: llc -O2 -mtriple thumbv7a < %s \| FileCheck --check-prefixes=THUMB,THUMBV7A %s			; RUN: llc -O2 -mtriple thumbv7a < %s \| FileCheck --check-prefixes=THUMB,THUMBV7A %s
	; RUN: llc -O2 -mtriple thumbv8m.main < %s \| FileCheck --check-prefixes=THUMB,THUMBV8M-MAIN %s			; RUN: llc -O2 -mtriple thumbv8m.main < %s \| FileCheck --check-prefixes=THUMB,THUMBV8M-MAIN %s

	define void @fn1(i32 %a, i32 %b, i32 %c) local_unnamed_addr #0 {			define void @fn1(i32 %a, i32 %b, i32 %c) local_unnamed_addr #0 {
	; ARM-LABEL: fn1:			; ARM-LABEL: fn1:
	; ARM: @ %bb.0: @ %entry			; ARM: @ %bb.0: @ %entry
	; ARM-NEXT: rsb r2, r2, #1
	; ARM-NEXT: adds r0, r1, r0			; ARM-NEXT: adds r0, r1, r0
				; ARM-NEXT: mov r3, #0
				; ARM-NEXT: adc r0, r3, #0
	; ARM-NEXT: movw r1, #65535			; ARM-NEXT: movw r1, #65535
	; ARM-NEXT: sxth r2, r2			; ARM-NEXT: sub r0, r0, r2
	; ARM-NEXT: adc r0, r2, #0			; ARM-NEXT: uxth r0, r0
	; ARM-NEXT: tst r0, r1			; ARM-NEXT: cmp r0, r1
	; ARM-NEXT: bxeq lr			; ARM-NEXT: bxeq lr
	; ARM-NEXT: .LBB0_1: @ %for.cond			; ARM-NEXT: .LBB0_1: @ %for.cond
	; ARM-NEXT: @ =>This Inner Loop Header: Depth=1			; ARM-NEXT: @ =>This Inner Loop Header: Depth=1
	; ARM-NEXT: b .LBB0_1			; ARM-NEXT: b .LBB0_1
	;			;
	; THUMB1-LABEL: fn1:			; THUMBV6M-LABEL: fn1:
	; THUMB1: @ %bb.0: @ %entry			; THUMBV6M: @ %bb.0: @ %entry
	; THUMB1-NEXT: movs r3, #1			; THUMBV6M-NEXT: movs r3, #0
	; THUMB1-NEXT: subs r2, r3, r2			; THUMBV6M-NEXT: adds r0, r1, r0
	; THUMB1-NEXT: sxth r2, r2			; THUMBV6M-NEXT: adcs r3, r3
	; THUMB1-NEXT: movs r3, #0			; THUMBV6M-NEXT: subs r0, r3, r2
	; THUMB1-NEXT: adds r0, r1, r0			; THUMBV6M-NEXT: uxth r0, r0
	; THUMB1-NEXT: adcs r3, r2			; THUMBV6M-NEXT: ldr r1, .LCPI0_0
	; THUMB1-NEXT: lsls r0, r3, #16			; THUMBV6M-NEXT: cmp r0, r1
	; THUMB1-NEXT: beq .LBB0_2			; THUMBV6M-NEXT: beq .LBB0_2
	; THUMB1-NEXT: .LBB0_1: @ %for.cond			; THUMBV6M-NEXT: .LBB0_1: @ %for.cond
	; THUMB1-NEXT: @ =>This Inner Loop Header: Depth=1			; THUMBV6M-NEXT: @ =>This Inner Loop Header: Depth=1
	; THUMB1-NEXT: b .LBB0_1			; THUMBV6M-NEXT: b .LBB0_1
	; THUMB1-NEXT: .LBB0_2: @ %if.end			; THUMBV6M-NEXT: .LBB0_2: @ %if.end
	; THUMB1-NEXT: bx lr			; THUMBV6M-NEXT: bx lr
				; THUMBV6M-NEXT: .p2align 2
				; THUMBV6M-NEXT: @ %bb.3:
				; THUMBV6M-NEXT: .LCPI0_0:
				; THUMBV6M-NEXT: .long 65535 @ 0xffff
				;
				; THUMBV8M-BASE-LABEL: fn1:
				; THUMBV8M-BASE: @ %bb.0: @ %entry
				; THUMBV8M-BASE-NEXT: movs r3, #0
				; THUMBV8M-BASE-NEXT: adds r0, r1, r0
				; THUMBV8M-BASE-NEXT: adcs r3, r3
				; THUMBV8M-BASE-NEXT: subs r0, r3, r2
				; THUMBV8M-BASE-NEXT: uxth r0, r0
				; THUMBV8M-BASE-NEXT: movw r1, #65535
				; THUMBV8M-BASE-NEXT: cmp r0, r1
				; THUMBV8M-BASE-NEXT: beq .LBB0_2
				; THUMBV8M-BASE-NEXT: .LBB0_1: @ %for.cond
				; THUMBV8M-BASE-NEXT: @ =>This Inner Loop Header: Depth=1
				; THUMBV8M-BASE-NEXT: b .LBB0_1
				; THUMBV8M-BASE-NEXT: .LBB0_2: @ %if.end
				; THUMBV8M-BASE-NEXT: bx lr
	;			;
	; THUMB-LABEL: fn1:			; THUMB-LABEL: fn1:
	; THUMB: @ %bb.0: @ %entry			; THUMB: @ %bb.0: @ %entry
	; THUMB-NEXT: rsb.w r2, r2, #1
	; THUMB-NEXT: adds r0, r0, r1			; THUMB-NEXT: adds r0, r0, r1
	; THUMB-NEXT: sxth r2, r2			; THUMB-NEXT: mov.w r3, #0
	; THUMB-NEXT: adc r0, r2, #0			; THUMB-NEXT: adc r0, r3, #0
	; THUMB-NEXT: lsls r0, r0, #16			; THUMB-NEXT: movw r1, #65535
				; THUMB-NEXT: subs r0, r0, r2
				; THUMB-NEXT: uxth r0, r0
				; THUMB-NEXT: cmp r0, r1
	; THUMB-NEXT: it eq			; THUMB-NEXT: it eq
	; THUMB-NEXT: bxeq lr			; THUMB-NEXT: bxeq lr
	; THUMB-NEXT: .LBB0_1: @ %for.cond			; THUMB-NEXT: .LBB0_1: @ %for.cond
	; THUMB-NEXT: @ =>This Inner Loop Header: Depth=1			; THUMB-NEXT: @ =>This Inner Loop Header: Depth=1
	; THUMB-NEXT: b .LBB0_1			; THUMB-NEXT: b .LBB0_1
	entry:			entry:
	%add = add i32 %b, %a			%add = add i32 %b, %a
	%cmp = icmp ult i32 %add, %b			%cmp = icmp ult i32 %add, %b
	[16 lines not shown]
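
The two visible IR lines of `fn1` (`%add = add i32 %b, %a` followed by `%cmp = icmp ult i32 %add, %b`) are the usual unsigned-overflow idiom, which the DAG dumps on this page show as a `uaddo` node whose second result is the carry. A minimal C++ sketch of that pair of results, assuming 32-bit wrapping arithmetic; `AddWithCarry` and `uaddo32` are hypothetical names for illustration, not LLVM APIs:

```cpp
#include <cstdint>

// Hypothetical model of the two results of a 32-bit uaddo node.
struct AddWithCarry {
  uint32_t Sum;  // wrapped sum, corresponds to %add
  bool Carry;    // carry-out, corresponds to %cmp
};

AddWithCarry uaddo32(uint32_t A, uint32_t B) {
  uint32_t Sum = B + A;  // add i32 %b, %a (wraps modulo 2^32)
  bool Carry = Sum < B;  // icmp ult i32 %add, %b: true exactly when the add wrapped
  return {Sum, Carry};
}
```

The carry is just a 0/1 value here; the discussion above is about how expensive it is for a backend to consume that value outside of an add-with-carry instruction.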

test/CodeGen/X86/shift-amount-mod.ll

	[855 lines not shown]
	;==============================================================================;			;==============================================================================;
	; add to negated shift amount			; add to negated shift amount
	;			;

	define i32 @reg32_lshr_by_add_to_negated(i32 %val, i32 %a, i32 %b) nounwind {			define i32 @reg32_lshr_by_add_to_negated(i32 %val, i32 %a, i32 %b) nounwind {
	; X32-LABEL: reg32_lshr_by_add_to_negated:			; X32-LABEL: reg32_lshr_by_add_to_negated:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-NEXT: movl $32, %ecx			; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X32-NEXT: subl {{[0-9]+}}(%esp), %ecx			; X32-NEXT: subl {{[0-9]+}}(%esp), %ecx
	; X32-NEXT: addl {{[0-9]+}}(%esp), %ecx
	; X32-NEXT: # kill: def $cl killed $cl killed $ecx			; X32-NEXT: # kill: def $cl killed $cl killed $ecx
	; X32-NEXT: shrl %cl, %eax			; X32-NEXT: shrl %cl, %eax
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: reg32_lshr_by_add_to_negated:			; X64-LABEL: reg32_lshr_by_add_to_negated:
	; X64: # %bb.0:			; X64: # %bb.0:
				; X64-NEXT: movl %edx, %ecx
	; X64-NEXT: movl %edi, %eax			; X64-NEXT: movl %edi, %eax
	; X64-NEXT: movl $32, %ecx
	; X64-NEXT: subl %esi, %ecx			; X64-NEXT: subl %esi, %ecx
	; X64-NEXT: addl %edx, %ecx
	; X64-NEXT: # kill: def $cl killed $cl killed $ecx			; X64-NEXT: # kill: def $cl killed $cl killed $ecx
	; X64-NEXT: shrl %cl, %eax			; X64-NEXT: shrl %cl, %eax
	; X64-NEXT: retq			; X64-NEXT: retq
	%nega = sub i32 32, %a			%nega = sub i32 32, %a
	%negasubb = add i32 %nega, %b			%negasubb = add i32 %nega, %b
	%shifted = lshr i32 %val, %negasubb			%shifted = lshr i32 %val, %negasubb
	ret i32 %shifted			ret i32 %shifted
	}			}
	define i64 @reg64_lshr_by_add_to_negated(i64 %val, i64 %a, i64 %b) nounwind {			define i64 @reg64_lshr_by_add_to_negated(i64 %val, i64 %a, i64 %b) nounwind {
	; X32-LABEL: reg64_lshr_by_add_to_negated:			; X32-LABEL: reg64_lshr_by_add_to_negated:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: pushl %esi			; X32-NEXT: pushl %esi
	; X32-NEXT: movl {{[0-9]+}}(%esp), %eax			; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
	; X32-NEXT: movl {{[0-9]+}}(%esp), %esi			; X32-NEXT: movl {{[0-9]+}}(%esp), %esi
	; X32-NEXT: movl $64, %ecx			; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X32-NEXT: subl {{[0-9]+}}(%esp), %ecx			; X32-NEXT: subl {{[0-9]+}}(%esp), %ecx
	; X32-NEXT: addl {{[0-9]+}}(%esp), %ecx			; X32-NEXT: addb $64, %cl
	; X32-NEXT: movl %esi, %edx			; X32-NEXT: movl %esi, %edx
	; X32-NEXT: shrl %cl, %edx			; X32-NEXT: shrl %cl, %edx
	; X32-NEXT: shrdl %cl, %esi, %eax			; X32-NEXT: shrdl %cl, %esi, %eax
	; X32-NEXT: testb $32, %cl			; X32-NEXT: testb $32, %cl
	; X32-NEXT: je .LBB29_2			; X32-NEXT: je .LBB29_2
	; X32-NEXT: # %bb.1:			; X32-NEXT: # %bb.1:
	; X32-NEXT: movl %edx, %eax			; X32-NEXT: movl %edx, %eax
	; X32-NEXT: xorl %edx, %edx			; X32-NEXT: xorl %edx, %edx
	; X32-NEXT: .LBB29_2:			; X32-NEXT: .LBB29_2:
	; X32-NEXT: popl %esi			; X32-NEXT: popl %esi
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: reg64_lshr_by_add_to_negated:			; X64-LABEL: reg64_lshr_by_add_to_negated:
	; X64: # %bb.0:			; X64: # %bb.0:
				; X64-NEXT: movq %rdx, %rcx
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: movl $64, %ecx
	; X64-NEXT: subl %esi, %ecx			; X64-NEXT: subl %esi, %ecx
	; X64-NEXT: addl %edx, %ecx			; X64-NEXT: # kill: def $cl killed $cl killed $rcx
	; X64-NEXT: # kill: def $cl killed $cl killed $ecx
	; X64-NEXT: shrq %cl, %rax			; X64-NEXT: shrq %cl, %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	%nega = sub i64 64, %a			%nega = sub i64 64, %a
	%negasubb = add i64 %nega, %b			%negasubb = add i64 %nega, %b
	%shifted = lshr i64 %val, %negasubb			%shifted = lshr i64 %val, %negasubb
	ret i64 %shifted			ret i64 %shifted
	}			}
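
One way to read the X64 changes in the two functions above: once the constant is moved to the outside, `(32 - a) + b` becomes `(b - a) + 32`, and the trailing `+ 32` no longer affects a 32-bit shift because x86 `shrl` uses only the low 5 bits of the count in `%cl`. The sketch below models that masking in C++; `shrl`, `lshrBefore`, and `lshrAfter` are hypothetical helper names, and it assumes the hardware masking behaviour rather than LLVM IR semantics (in IR, an out-of-range shift amount yields poison).

```cpp
#include <cstdint>

// Models x86 `shrl %cl, %eax`: only the low 5 bits of the count are used.
uint32_t shrl(uint32_t Val, uint32_t Count) { return Val >> (Count & 31u); }

// Shift amount as written in the test: (32 - a) + b.
uint32_t lshrBefore(uint32_t Val, uint32_t A, uint32_t B) {
  return shrl(Val, (32u - A) + B);
}

// Shift amount after sinking the constant and dropping the "+ 32": b - a.
uint32_t lshrAfter(uint32_t Val, uint32_t A, uint32_t B) {
  return shrl(Val, B - A);
}
```

Both helpers agree for every input because the two counts differ by exactly 32, which vanishes under the 5-bit mask; the 64-bit case is analogous with a 6-bit mask.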

	[352 lines not shown]
	; X32-NEXT: movl %edx, %eax			; X32-NEXT: movl %edx, %eax
	; X32-NEXT: xorl %edx, %edx			; X32-NEXT: xorl %edx, %edx
	; X32-NEXT: .LBB41_2:			; X32-NEXT: .LBB41_2:
	; X32-NEXT: popl %esi			; X32-NEXT: popl %esi
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: reg64_lshr_by_negated_unfolded_add_b:			; X64-LABEL: reg64_lshr_by_negated_unfolded_add_b:
	; X64: # %bb.0:			; X64: # %bb.0:
				; X64-NEXT: movq %rdx, %rcx
	; X64-NEXT: movq %rdi, %rax			; X64-NEXT: movq %rdi, %rax
	; X64-NEXT: movl $64, %ecx
	; X64-NEXT: subl %esi, %ecx			; X64-NEXT: subl %esi, %ecx
	; X64-NEXT: addl %edx, %ecx			; X64-NEXT: # kill: def $cl killed $cl killed $rcx
	; X64-NEXT: # kill: def $cl killed $cl killed $ecx
	; X64-NEXT: shrq %cl, %rax			; X64-NEXT: shrq %cl, %rax
	; X64-NEXT: retq			; X64-NEXT: retq
	%nega = sub i64 0, %a			%nega = sub i64 0, %a
	%negaaddbitwidth = add i64 %nega, 64			%negaaddbitwidth = add i64 %nega, 64
	%negaaddbitwidthaddb = add i64 %negaaddbitwidth, %b			%negaaddbitwidthaddb = add i64 %negaaddbitwidth, %b
	%shifted = lshr i64 %val, %negaaddbitwidthaddb			%shifted = lshr i64 %val, %negaaddbitwidthaddb
	ret i64 %shifted			ret i64 %shifted
	}			}
	[264 lines not shown]

test/CodeGen/X86/sink-addsub-of-const.ll

[100 lines not shown]
}		}

; add (sub C, %x), %y		; add (sub C, %x), %y
; Outer 'add' is commutative - 2 variants.		; Outer 'add' is commutative - 2 variants.

define i32 @sink_sub_from_const_to_add0(i32 %a, i32 %b, i32 %c) {		define i32 @sink_sub_from_const_to_add0(i32 %a, i32 %b, i32 %c) {
; X32-LABEL: sink_sub_from_const_to_add0:		; X32-LABEL: sink_sub_from_const_to_add0:
; X32: # %bb.0:		; X32: # %bb.0:
		; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X32-NEXT: addl {{[0-9]+}}(%esp), %ecx		; X32-NEXT: addl {{[0-9]+}}(%esp), %ecx
; X32-NEXT: movl $32, %eax
; X32-NEXT: subl %ecx, %eax		; X32-NEXT: subl %ecx, %eax
; X32-NEXT: addl {{[0-9]+}}(%esp), %eax		; X32-NEXT: addl $32, %eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: sink_sub_from_const_to_add0:		; X64-LABEL: sink_sub_from_const_to_add0:
; X64: # %bb.0:		; X64: # %bb.0:
		; X64-NEXT: # kill: def $edx killed $edx def $rdx
; X64-NEXT: addl %esi, %edi		; X64-NEXT: addl %esi, %edi
; X64-NEXT: movl $32, %eax		; X64-NEXT: subl %edi, %edx
; X64-NEXT: subl %edi, %eax		; X64-NEXT: leal 32(%rdx), %eax
; X64-NEXT: addl %edx, %eax
; X64-NEXT: retq		; X64-NEXT: retq
%t0 = add i32 %a, %b		%t0 = add i32 %a, %b
%t1 = sub i32 32, %t0		%t1 = sub i32 32, %t0
%r = add i32 %t1, %c		%r = add i32 %t1, %c
ret i32 %r		ret i32 %r
}		}
define i32 @sink_sub_from_const_to_add1(i32 %a, i32 %b, i32 %c) {		define i32 @sink_sub_from_const_to_add1(i32 %a, i32 %b, i32 %c) {
; X32-LABEL: sink_sub_from_const_to_add1:		; X32-LABEL: sink_sub_from_const_to_add1:
; X32: # %bb.0:		; X32: # %bb.0:
		; X32-NEXT: movl {{[0-9]+}}(%esp), %eax
; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X32-NEXT: addl {{[0-9]+}}(%esp), %ecx		; X32-NEXT: addl {{[0-9]+}}(%esp), %ecx
; X32-NEXT: movl $32, %eax
; X32-NEXT: subl %ecx, %eax		; X32-NEXT: subl %ecx, %eax
; X32-NEXT: addl {{[0-9]+}}(%esp), %eax		; X32-NEXT: addl $32, %eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: sink_sub_from_const_to_add1:		; X64-LABEL: sink_sub_from_const_to_add1:
; X64: # %bb.0:		; X64: # %bb.0:
		; X64-NEXT: # kill: def $edx killed $edx def $rdx
; X64-NEXT: addl %esi, %edi		; X64-NEXT: addl %esi, %edi
; X64-NEXT: movl $32, %eax		; X64-NEXT: subl %edi, %edx
; X64-NEXT: subl %edi, %eax		; X64-NEXT: leal 32(%rdx), %eax
; X64-NEXT: addl %edx, %eax
; X64-NEXT: retq		; X64-NEXT: retq
%t0 = add i32 %a, %b		%t0 = add i32 %a, %b
%t1 = sub i32 32, %t0		%t1 = sub i32 32, %t0
%r = add i32 %c, %t1		%r = add i32 %c, %t1
ret i32 %r		ret i32 %r
}		}
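
The fold exercised by these two tests, `(C - x) + y -> (y - x) + C`, relies only on wrapping integer arithmetic, so it holds even when intermediate results overflow. A small C++ sketch of the identity with `C = 32` as in the tests; `beforeFold` and `afterFold` are hypothetical names, and `X` stands for the `%t0 = add i32 %a, %b` value:

```cpp
#include <cstdint>

constexpr uint32_t C = 32;  // the constant used in sink_sub_from_const_to_add0/1

// Original form: add (sub C, %x), %y
uint32_t beforeFold(uint32_t X, uint32_t Y) { return (C - X) + Y; }

// Folded form: add (sub %y, %x), C
uint32_t afterFold(uint32_t X, uint32_t Y) { return (Y - X) + C; }
```

With the constant on the outside, the X64 output above can fold it into the final `leal 32(%rdx), %eax` instead of materializing it with `movl $32, %eax` first.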

; sub (add %x, C), %y		; sub (add %x, C), %y
[77 lines not shown]
; X32-NEXT: addl $32, %eax		; X32-NEXT: addl $32, %eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: sink_sub_of_const_to_sub2:		; X64-LABEL: sink_sub_of_const_to_sub2:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: # kill: def $edx killed $edx def $rdx		; X64-NEXT: # kill: def $edx killed $edx def $rdx
; X64-NEXT: # kill: def $esi killed $esi def $rsi		; X64-NEXT: # kill: def $esi killed $esi def $rsi
; X64-NEXT: subl %edi, %esi		; X64-NEXT: subl %edi, %esi
; X64-NEXT: leal 32(%rsi,%rdx), %eax		; X64-NEXT: leal 32(%rdx,%rsi), %eax
; X64-NEXT: retq		; X64-NEXT: retq
%t0 = sub i32 %a, %b		%t0 = sub i32 %a, %b
%t1 = sub i32 %t0, 32		%t1 = sub i32 %t0, 32
%r = sub i32 %c, %t1		%r = sub i32 %c, %t1
ret i32 %r		ret i32 %r
}		}

; sub (sub C, %x), %y		; sub (sub C, %x), %y
[130 lines not shown]
		; X64-NEXT: retq
%r = add <4 x i32> %c, %t1		%r = add <4 x i32> %c, %t1
ret <4 x i32> %r		ret <4 x i32> %r
}		}

; add (sub C, %x), %y		; add (sub C, %x), %y
; Outer 'add' is commutative - 2 variants.		; Outer 'add' is commutative - 2 variants.

define <4 x i32> @vec_sink_sub_from_const_to_add0(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {		define <4 x i32> @vec_sink_sub_from_const_to_add0(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
; ALL-LABEL: vec_sink_sub_from_const_to_add0:		; X32-LABEL: vec_sink_sub_from_const_to_add0:
; ALL: # %bb.0:		; X32: # %bb.0:
; ALL-NEXT: paddd %xmm1, %xmm0		; X32-NEXT: paddd %xmm1, %xmm0
; ALL-NEXT: movdqa {{.*#+}} xmm1 = <42,24,u,46>		; X32-NEXT: psubd %xmm0, %xmm2
; ALL-NEXT: psubd %xmm0, %xmm1		; X32-NEXT: paddd {{\.LCPI.*}}, %xmm2
; ALL-NEXT: paddd %xmm2, %xmm1		; X32-NEXT: movdqa %xmm2, %xmm0
; ALL-NEXT: movdqa %xmm1, %xmm0		; X32-NEXT: retl
; ALL-NEXT: ret{{[l\|q]}}		;
		; X64-LABEL: vec_sink_sub_from_const_to_add0:
		; X64: # %bb.0:
		; X64-NEXT: paddd %xmm1, %xmm0
		; X64-NEXT: psubd %xmm0, %xmm2
		; X64-NEXT: paddd {{.*}}(%rip), %xmm2
		; X64-NEXT: movdqa %xmm2, %xmm0
		; X64-NEXT: retq
%t0 = add <4 x i32> %a, %b		%t0 = add <4 x i32> %a, %b
%t1 = sub <4 x i32> <i32 42, i32 24, i32 undef, i32 46>, %t0		%t1 = sub <4 x i32> <i32 42, i32 24, i32 undef, i32 46>, %t0
%r = add <4 x i32> %t1, %c		%r = add <4 x i32> %t1, %c
ret <4 x i32> %r		ret <4 x i32> %r
}		}
define <4 x i32> @vec_sink_sub_from_const_to_add1(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {		define <4 x i32> @vec_sink_sub_from_const_to_add1(<4 x i32> %a, <4 x i32> %b, <4 x i32> %c) {
; ALL-LABEL: vec_sink_sub_from_const_to_add1:		; X32-LABEL: vec_sink_sub_from_const_to_add1:
; ALL: # %bb.0:		; X32: # %bb.0:
; ALL-NEXT: paddd %xmm1, %xmm0		; X32-NEXT: paddd %xmm1, %xmm0
; ALL-NEXT: movdqa {{.*#+}} xmm1 = <42,24,u,46>		; X32-NEXT: psubd %xmm0, %xmm2
; ALL-NEXT: psubd %xmm0, %xmm1		; X32-NEXT: paddd {{\.LCPI.*}}, %xmm2
; ALL-NEXT: paddd %xmm2, %xmm1		; X32-NEXT: movdqa %xmm2, %xmm0
; ALL-NEXT: movdqa %xmm1, %xmm0		; X32-NEXT: retl
; ALL-NEXT: ret{{[l\|q]}}		;
		; X64-LABEL: vec_sink_sub_from_const_to_add1:
		; X64: # %bb.0:
		; X64-NEXT: paddd %xmm1, %xmm0
		; X64-NEXT: psubd %xmm0, %xmm2
		; X64-NEXT: paddd {{.*}}(%rip), %xmm2
		; X64-NEXT: movdqa %xmm2, %xmm0
		; X64-NEXT: retq
%t0 = add <4 x i32> %a, %b		%t0 = add <4 x i32> %a, %b
%t1 = sub <4 x i32> <i32 42, i32 24, i32 undef, i32 46>, %t0		%t1 = sub <4 x i32> <i32 42, i32 24, i32 undef, i32 46>, %t0
%r = add <4 x i32> %c, %t1		%r = add <4 x i32> %c, %t1
ret <4 x i32> %r		ret <4 x i32> %r
}		}

; sub (add %x, C), %y		; sub (add %x, C), %y
; sub %y, (add %x, C)		; sub %y, (add %x, C)
[129 lines not shown]

[DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold
Closed, Public

Revision Contents

Diff 202313

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

test/CodeGen/AArch64/shift-amount-mod.ll

test/CodeGen/AArch64/sink-addsub-of-const.ll

test/CodeGen/ARM/addsubcarry-promotion.ll

test/CodeGen/X86/shift-amount-mod.ll

test/CodeGen/X86/sink-addsub-of-const.ll