This is an archive of the discontinued LLVM Phabricator instance.

Differential D62392

[DAGCombine][ARM] (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry) fold
ClosedPublic

Authored by lebedev.ri on May 24 2019, 7:37 AM.

Details

Reviewers

efriedma
deadalnix

Commits

rGc00f3182243d: [DAGCombine][ARM][X86] (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry) fold
rL372259: [DAGCombine][ARM][X86] (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry) fold

Summary

DAGCombiner::visitADDLikeCommutative() already has a sibling fold: (add X, Carry) -> (addcarry X, 0, Carry)
This fold, as suggested by @efriedma, helps recover from some of the regressions of D62266
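
The fold relies on a simple modular-arithmetic identity: Carry - X equals (0 - X) + 0 + Carry. A minimal sketch of that identity follows, assuming 32-bit values. The `addcarry` helper is a hypothetical scalar model of the ADDCARRY node written for illustration, not LLVM code, and the carry-out it returns is ignored here.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical scalar model of an add-with-carry node:
// returns {sum, carry-out} for a + b + carry_in on 32-bit unsigned values.
std::pair<uint32_t, bool> addcarry(uint32_t a, uint32_t b, bool carry_in) {
  uint64_t wide = uint64_t(a) + uint64_t(b) + (carry_in ? 1u : 0u);
  return {static_cast<uint32_t>(wide), (wide >> 32) != 0};
}

// Original pattern: (sub Carry, X), with the carry bit zero-extended.
uint32_t sub_carry_x(bool carry, uint32_t x) {
  return static_cast<uint32_t>(carry) - x;  // wraps modulo 2^32
}

// Folded pattern: (addcarry (sub 0, X), 0, Carry), value result only.
uint32_t folded(bool carry, uint32_t x) {
  uint32_t neg_x = 0u - x;                  // (sub 0, X)
  return addcarry(neg_x, 0u, carry).first;  // add 0 plus the carry bit
}
```

Both functions should agree for every input, for example `sub_carry_x(true, 5)` and `folded(true, 5)`, because unsigned subtraction and addition both wrap modulo 2^32.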

Diff Detail

Repository: rL LLVM

Event Timeline

lebedev.ri created this revision. May 24 2019, 7:37 AM

Herald added subscribers: kristof.beyls, javed.absar. May 24 2019, 7:37 AM

lebedev.ri added a parent revision: D62266: [DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold. May 24 2019, 7:37 AM

lebedev.ri added a child revision: D62294: [DAGCombine] (x - C) - y -> (x - y) - C fold.

lebedev.ri mentioned this in D62266: [DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold. May 24 2019, 7:42 AM

It looks like we don't have great test coverage here. In general, we probably need to restrict getAsCarry depending on whether the intended use is an addition or subtraction, at least for ARM.

test/CodeGen/ARM/addsubcarry-promotion.ll
Line 28 (on Diff #201239): The redundant sxth here is unfortunate, but probably orthogonal. Maybe a missing SimplifyDemandedBits case?

In D62392#1516829, @efriedma wrote:

It looks like we don't have great test coverage here.

Yes, indeed, that is the only affected test in the entire check-llvm-codegen.

In D62392#1516829, @efriedma wrote:

In general, we probably need to restrict getAsCarry depending on whether the intended use is an addition or subtraction, at least for ARM.

I'm sorry, but I'm again not sure what that means (:
Restrict how?

I mean, we basically need two methods: getAsAddCarry (carry produced by UADDO/ADDCARRY) and getAsSubCarry (carry produced by USUBO/SUBCARRY), depending on how you're planning to use the carry bit, so we don't end up with the same sort of mismatch we ran into before on ARM.
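
One way to see why the two kinds of carry might need separate accessors is sketched below. It assumes the relevant difference is the flag convention: on ARM, an ADDS sets the C flag to the carry-out, while a SUBS sets C to "no borrow", the inverse of the USUBO-style borrow bit. All names here (`CarryKind`, `Flagged`, `is_add_carry`, `is_sub_carry`) are hypothetical and for illustration only; they are not LLVM APIs, and this may not be the exact mismatch the comment refers to.

```cpp
#include <cassert>
#include <cstdint>

// Which kind of node produced the flag (hypothetical, not an LLVM type).
enum class CarryKind { Add, Sub };

struct Flagged {
  uint32_t value;
  bool flag;       // carry-out for Add, borrow-out for Sub
  CarryKind kind;
};

// Model of an unsigned add with overflow: flag is the carry-out.
Flagged uaddo(uint32_t a, uint32_t b) {
  uint32_t sum = a + b;
  return {sum, sum < a, CarryKind::Add};
}

// Model of an unsigned subtract with overflow: flag is the borrow-out.
Flagged usubo(uint32_t a, uint32_t b) {
  return {a - b, a < b, CarryKind::Sub};
}

// ARM C flag after the corresponding ADDS/SUBS: the carry-out for an add,
// but the inverted borrow for a subtract.
bool arm_c_flag(const Flagged &f) {
  return f.kind == CarryKind::Add ? f.flag : !f.flag;
}

// Sketch of the proposed split: a consumer asks for the kind it can use.
bool is_add_carry(const Flagged &f) { return f.kind == CarryKind::Add; }
bool is_sub_carry(const Flagged &f) { return f.kind == CarryKind::Sub; }
```

Under this model, feeding a borrow bit into an add-with-carry (or the reverse) would consume a flag with the wrong polarity, which is the kind of confusion that separate accessors would rule out.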

lebedev.ri added a child revision: D62450: [DAGCombine][ARM] x ==/!= c -> (x - c) ==/!= 0 iff '-c' can be folded into the x node. May 25 2019, 10:23 AM

lebedev.ri removed a child revision: D62294: [DAGCombine] (x - C) - y -> (x - y) - C fold.

In D62392#1516927, @efriedma wrote:

I mean, we basically need two methods: getAsAddCarry (carry produced by UADDO/ADDCARRY) and getAsSubCarry (carry produced by USUBO/SUBCARRY), depending on how you're planning to use the carry bit, so we don't end up with the same sort of mismatch we ran into before on ARM.

Thank you for the explanation.

Is that required to be part of the follow-up fixes for D62266?
It sounds related, but not exactly specific to the regressions there.
I don't know much about these 'carry' ops, so unless it's required to
fix those regressions, it might not be best for me to look into that.

Rebased, ping @efriedma.

If you don't want to continue working on this, that's okay, I think. D62450 looks like it's the more interesting fix.

In D62392#1525650, @efriedma wrote:

If you don't want to continue working on this, that's okay, I think.

I'm just really unfamiliar with carry-nodes (and with ARM),
so it is not obvious to me what needs fixing, and in what way,
if there are no existing tests that regress due to my change.

In D62392#1525650, @efriedma wrote:

D62450 looks like it's the more interesting fix.

These two are intertwined.
D62450 folds the constant from a setcc into the parent addcarry/subcarry.
But without this patch there is no addcarry/subcarry to fold into...
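
To illustrate the dependency, here is a minimal sketch of the kind of rewrite named in the D62450 title, x ==/!= c -> (x - c) ==/!= 0, assuming x is itself an addition with a constant operand so that '-c' can be merged into it. The function names and the 16-bit width are assumptions chosen for illustration; the real patch operates on DAG nodes, not on scalars.

```cpp
#include <cassert>
#include <cstdint>

// Original comparison: x == c, where x = y + k (wrapping at 16 bits).
bool cmp_original(uint16_t y, uint16_t k, uint16_t c) {
  uint16_t x = static_cast<uint16_t>(y + k);
  return x == c;
}

// Rewritten comparison: (x - c) == 0, with -c merged into the constant
// operand of the node that computes x.
bool cmp_folded(uint16_t y, uint16_t k, uint16_t c) {
  uint16_t merged = static_cast<uint16_t>(k - c);  // k + (-c), wrapping
  return static_cast<uint16_t>(y + merged) == 0;
}
```

The rewrite only pays off when there is a node with a constant operand to absorb '-c'. The addcarry produced by this patch, with its zero operand, would be such a node.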

Oh, I see...

I'll see if I can find some time to add some more testcases to the tree.

In D62392#1525658, @efriedma wrote:

Oh, I see...

I'll see if I can find some time to add some more testcases to the tree.

Yes, please do, that would be helpful.

Diffusion mentioned this in rL362487: [DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold. Jun 4 2019, 4:03 AM

lebedev.ri mentioned this in rGbe6ce7b3f225: [DAGCombine][X86][AArch64][ARM] (C - x) + y -> (y - x) + C fold. Jun 4 2019, 4:08 AM

lebedev.ri added a reviewer: deadalnix. Jul 2 2019, 4:30 PM

Maybe it is worth adding some platform-dependent check to actually make sure turning the carry into a scalar is expensive? Or is it reasonable to assume that it is expensive on all platforms?

In any case, it is more likely than not that this will optimize better down the road anyway, so if such platforms exist, we may want to delegate the cleanup to platform-specific transforms.

I think it would be beneficial to have an X86 test case for this pattern.

In D62392#1587798, @deadalnix wrote:

Maybe it is worth adding some platform-dependent check to actually make sure turning the carry into a scalar is expensive? Or is it reasonable to assume that it is expensive on all platforms?

I honestly don't know the answers to these questions; carry nodes are not my strong suit.
If you want (given your recent carry patches), you can totally take this patch over.

In any case, it is more likely than not that this will optimize better down the road anyway, so if such platforms exist, we may want to delegate the cleanup to platform-specific transforms.

I think it would be beneficial to have an X86 test case for this pattern.

@lebedev.ri I think these are reasonable assumptions. Just add a test case for X86 and this is good to go as far as I'm concerned.

Ping?

In D62392#1592596, @deadalnix wrote:

@lebedev.ri I think these are reasonable assumptions. Just add a test case for X86 and this is good to go as far as I'm concerned.

Added x86 test and landed.

lebedev.ri mentioned this in rGa042aa1d829b: [CodeGen][X86][NFC] Tests for (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry)…. Sep 18 2019, 1:51 PM

This revision was not accepted when it landed; it landed in state Needs Review. Sep 18 2019, 1:51 PM

Closed by commit rL372259: [DAGCombine][ARM][X86] (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry) fold (authored by lebedevri).

This revision was automatically updated to reflect the committed changes.

Diffusion mentioned this in rL372258: [CodeGen][X86][NFC] Tests for (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry)….

Revision Contents

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (12 lines)
llvm/trunk/test/CodeGen/ARM/addsubcarry-promotion.ll (26 lines)
llvm/trunk/test/CodeGen/X86/subcarry.ll (6 lines)

Diff 220745

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

   if (!LegalOperations && N1.getOpcode() == ISD::SRL && N1.hasOneUse()) {
     ConstantSDNode *ShAmtC = isConstOrConstSplat(ShAmt);
     if (ShAmtC &&
         ShAmtC->getAPIntValue() == (N1.getScalarValueSizeInBits() - 1)) {
       SDValue SRA = DAG.getNode(ISD::SRA, DL, VT, N1.getOperand(0), ShAmt);
       return DAG.getNode(ISD::ADD, DL, VT, N0, SRA);
     }
   }
 
+  if (TLI.isOperationLegalOrCustom(ISD::ADDCARRY, VT)) {
+    // (sub Carry, X) -> (addcarry (sub 0, X), 0, Carry)
+    if (SDValue Carry = getAsCarry(TLI, N0)) {
+      SDValue X = N1;
+      SDValue Zero = DAG.getConstant(0, DL, VT);
+      SDValue NegX = DAG.getNode(ISD::SUB, DL, VT, Zero, X);
+      return DAG.getNode(ISD::ADDCARRY, DL,
+                         DAG.getVTList(VT, Carry.getValueType()), NegX, Zero,
+                         Carry);
+    }
+  }
+
   return SDValue();
 }
 
 SDValue DAGCombiner::visitSUBSAT(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
   EVT VT = N0.getValueType();
   SDLoc DL(N);

llvm/trunk/test/CodeGen/ARM/addsubcarry-promotion.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -O2 -mtriple armv7a < %s | FileCheck --check-prefixes=ARM,ARMV7A %s
 
 ; RUN: llc -O2 -mtriple thumbv6m < %s | FileCheck --check-prefixes=THUMB1,THUMBV6M %s
 ; RUN: llc -O2 -mtriple thumbv8m.base < %s | FileCheck --check-prefixes=THUMB1,THUMBV8M-BASE %s
 
 ; RUN: llc -O2 -mtriple thumbv7a < %s | FileCheck --check-prefixes=THUMB,THUMBV7A %s
 ; RUN: llc -O2 -mtriple thumbv8m.main < %s | FileCheck --check-prefixes=THUMB,THUMBV8M-MAIN %s
 
 define void @fn1(i32 %a, i32 %b, i32 %c) local_unnamed_addr #0 {
 ; ARM-LABEL: fn1:
 ; ARM: @ %bb.0: @ %entry
+; ARM-NEXT: rsb r2, r2, #0
 ; ARM-NEXT: adds r0, r1, r0
-; ARM-NEXT: mov r3, #0
-; ARM-NEXT: adc r0, r3, #0
 ; ARM-NEXT: movw r1, #65535
-; ARM-NEXT: sub r0, r0, r2
+; ARM-NEXT: sxth r2, r2
+; ARM-NEXT: adc r0, r2, #0
 ; ARM-NEXT: uxth r0, r0
 ; ARM-NEXT: cmp r0, r1
 ; ARM-NEXT: bxeq lr
 ; ARM-NEXT: .LBB0_1: @ %for.cond
 ; ARM-NEXT: @ =>This Inner Loop Header: Depth=1
 ; ARM-NEXT: b .LBB0_1
 ;
 ; THUMBV6M-LABEL: fn1:
 ; THUMBV6M: @ %bb.0: @ %entry
+; THUMBV6M-NEXT: rsbs r2, r2, #0
+; THUMBV6M-NEXT: sxth r2, r2
 ; THUMBV6M-NEXT: movs r3, #0
 ; THUMBV6M-NEXT: adds r0, r1, r0
-; THUMBV6M-NEXT: adcs r3, r3
-; THUMBV6M-NEXT: subs r0, r3, r2
-; THUMBV6M-NEXT: uxth r0, r0
+; THUMBV6M-NEXT: adcs r3, r2
+; THUMBV6M-NEXT: uxth r0, r3
 ; THUMBV6M-NEXT: ldr r1, .LCPI0_0
 ; THUMBV6M-NEXT: cmp r0, r1
 ; THUMBV6M-NEXT: beq .LBB0_2
 ; THUMBV6M-NEXT: .LBB0_1: @ %for.cond
 ; THUMBV6M-NEXT: @ =>This Inner Loop Header: Depth=1
 ; THUMBV6M-NEXT: b .LBB0_1
 ; THUMBV6M-NEXT: .LBB0_2: @ %if.end
 ; THUMBV6M-NEXT: bx lr
 ; THUMBV6M-NEXT: .p2align 2
 ; THUMBV6M-NEXT: @ %bb.3:
 ; THUMBV6M-NEXT: .LCPI0_0:
 ; THUMBV6M-NEXT: .long 65535 @ 0xffff
 ;
 ; THUMBV8M-BASE-LABEL: fn1:
 ; THUMBV8M-BASE: @ %bb.0: @ %entry
+; THUMBV8M-BASE-NEXT: rsbs r2, r2, #0
+; THUMBV8M-BASE-NEXT: sxth r2, r2
 ; THUMBV8M-BASE-NEXT: movs r3, #0
 ; THUMBV8M-BASE-NEXT: adds r0, r1, r0
-; THUMBV8M-BASE-NEXT: adcs r3, r3
-; THUMBV8M-BASE-NEXT: subs r0, r3, r2
-; THUMBV8M-BASE-NEXT: uxth r0, r0
+; THUMBV8M-BASE-NEXT: adcs r3, r2
+; THUMBV8M-BASE-NEXT: uxth r0, r3
 ; THUMBV8M-BASE-NEXT: movw r1, #65535
 ; THUMBV8M-BASE-NEXT: cmp r0, r1
 ; THUMBV8M-BASE-NEXT: beq .LBB0_2
 ; THUMBV8M-BASE-NEXT: .LBB0_1: @ %for.cond
 ; THUMBV8M-BASE-NEXT: @ =>This Inner Loop Header: Depth=1
 ; THUMBV8M-BASE-NEXT: b .LBB0_1
 ; THUMBV8M-BASE-NEXT: .LBB0_2: @ %if.end
 ; THUMBV8M-BASE-NEXT: bx lr
 ;
 ; THUMB-LABEL: fn1:
 ; THUMB: @ %bb.0: @ %entry
+; THUMB-NEXT: rsbs r2, r2, #0
 ; THUMB-NEXT: adds r0, r0, r1
-; THUMB-NEXT: mov.w r3, #0
-; THUMB-NEXT: adc r0, r3, #0
 ; THUMB-NEXT: movw r1, #65535
-; THUMB-NEXT: subs r0, r0, r2
+; THUMB-NEXT: sxth r2, r2
+; THUMB-NEXT: adc r0, r2, #0
 ; THUMB-NEXT: uxth r0, r0
 ; THUMB-NEXT: cmp r0, r1
 ; THUMB-NEXT: it eq
 ; THUMB-NEXT: bxeq lr
 ; THUMB-NEXT: .LBB0_1: @ %for.cond
 ; THUMB-NEXT: @ =>This Inner Loop Header: Depth=1
 ; THUMB-NEXT: b .LBB0_1
 entry:

llvm/trunk/test/CodeGen/X86/subcarry.ll

 }
 
 declare {i64, i1} @llvm.uadd.with.overflow(i64, i64)
 declare {i64, i1} @llvm.usub.with.overflow(i64, i64)
 
 define i64 @sub_from_carry(i64 %x, i64 %y, i64* %valout, i64 %z) {
 ; CHECK-LABEL: sub_from_carry:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: xorl %eax, %eax
+; CHECK-NEXT: movq %rcx, %rax
+; CHECK-NEXT: negq %rax
 ; CHECK-NEXT: addq %rsi, %rdi
-; CHECK-NEXT: setb %al
 ; CHECK-NEXT: movq %rdi, (%rdx)
-; CHECK-NEXT: subq %rcx, %rax
+; CHECK-NEXT: adcq $0, %rax
 ; CHECK-NEXT: retq
   %agg = call {i64, i1} @llvm.uadd.with.overflow(i64 %x, i64 %y)
   %val = extractvalue {i64, i1} %agg, 0
   %ov = extractvalue {i64, i1} %agg, 1
   store i64 %val, i64* %valout, align 4
   %carry = zext i1 %ov to i64
   %res = sub i64 %carry, %z
   ret i64 %res
 }
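
For readers less familiar with the IR, the test above corresponds roughly to the following source-level pattern. This is a hand-written C++ approximation for illustration, not the source the test was generated from. The overflow bit is computed portably as `val < x` rather than through a compiler intrinsic.

```cpp
#include <cassert>
#include <cstdint>

// Approximate C++ counterpart of @sub_from_carry in the X86 test.
uint64_t sub_from_carry(uint64_t x, uint64_t y, uint64_t *valout, uint64_t z) {
  uint64_t val = x + y;         // wraps modulo 2^64
  bool ov = val < x;            // unsigned overflow of x + y
  *valout = val;                // store the sum
  uint64_t carry = ov ? 1 : 0;  // zext i1 %ov to i64
  return carry - z;             // the (sub Carry, X) pattern
}
```

With the fold, the carry from the addition no longer has to be materialized in a register (the old `xorl`/`setb` pair); it is consumed directly by `adcq` after `z` is negated.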