Not sure which way to go here - a 'not' is considered a less complex op than arbitrary xor/add, and 'not' might be better for value tracking and subsequent instcombines. But the adds are reassociable/commutable.

This might warrant a DAGCombiner patch to avoid regressions before we make the IR decision.

x86 scalar looks better with add+inc (sub-of-not shown first, add+inc second):

movl	%edi, %eax
notl	%esi
subl	%esi, %eax

vs.

leal	1(%rsi,%rdi), %eax

But AArch64 and PowerPC64LE appear to benefit from the 'not' with vector code:

mvn	v1.16b, v1.16b
sub	v0.4s, v0.4s, v1.4s

vs.

add	v0.4s, v1.4s, v0.4s
movi	v1.4s, #1
add	v0.4s, v0.4s, v1.4s

And PPC:

xxlnor 35, 35, 35
vsubuwm 2, 2, 3

vs.

vspltisw 4, 1
vadduwm 2, 3, 2
vadduwm 2, 2, 4

If we do go in this direction, do we want to increment first to reduce the burden on -reassociate? Assuming it will do:
(x + y) + 1 --> (x + 1) + y
...we might as well produce that directly?

cc'ing Eli in case he sees any other motivations to consider.

Yes, we should probably teach DAGCombine to choose the right form for each target/type.

It seems reasonable to prefer x+(y+1) over x-(-1-y), for reassociation etc. It's possible there could be some bad interaction at the interface between logical and arithmetic operations, which makes us miss some important optimization, but that doesn't seem likely to me.
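
For reference, the equivalence behind this preference can be sketched in plain C++, with 32-bit wraparound arithmetic standing in for i32. The helper names below are hypothetical and are not part of the patch; they only illustrate that ~y equals -1 - y, so x - ~y, x - (-1 - y), and x + (y + 1) should all compute the same value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; none of these exist in LLVM.
// uint32_t arithmetic wraps modulo 2^32, which mirrors a plain i32 add/sub
// without nsw/nuw flags.

// Sub-of-not form: x - ~y.
constexpr std::uint32_t sub_of_not(std::uint32_t x, std::uint32_t y) {
  return x - ~y;
}

// The same value with the 'not' spelled arithmetically: ~y == -1 - y.
constexpr std::uint32_t sub_of_neg(std::uint32_t x, std::uint32_t y) {
  return x - (0xFFFFFFFFu - y);
}

// Inc-of-add form: x + (y + 1).
constexpr std::uint32_t inc_of_add(std::uint32_t x, std::uint32_t y) {
  return x + (y + 1u);
}

// Compile-time spot checks, including a case that wraps around.
static_assert(sub_of_not(5u, 3u) == inc_of_add(5u, 3u), "forms must agree");
static_assert(sub_of_neg(5u, 3u) == inc_of_add(5u, 3u), "forms must agree");
static_assert(sub_of_not(0xFFFFFFFFu, 0u) == inc_of_add(0xFFFFFFFFu, 0u),
              "forms must agree on wraparound");

// Runtime variant for values that are not known at compile time.
inline void check_forms(std::uint32_t x, std::uint32_t y) {
  assert(sub_of_not(x, y) == inc_of_add(x, y));
  assert(sub_of_neg(x, y) == inc_of_add(x, y));
}
```

This only speaks to value equivalence; whether the 'not' form or the add form optimizes better downstream is the open question discussed here.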

In D63992#1565213, @efriedma wrote:

Yes, we should probably teach DAGCombine to choose the right form for each target/type.

It seems reasonable to prefer x+(y+1) over x-(-1-y), for reassociation etc. It's possible there could be some bad interaction at the interface between logical and arithmetic operations, which makes us miss some important optimization, but that doesn't seem likely to me.

Great, thank you for the feedback!
I will look into backend stuff; if there are no other concerns here please feel free to accept,
I'm not going to land until after the backend stuff is done.

Produce this:

(x + 1) + y

rather than:

(x + y) + 1

Mimic what -reassociate would produce.
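
The two orders are interchangeable as far as the computed value goes; only the shape seen by later passes differs. A brute-force sketch over all 8-bit inputs (uint8_t standing in for i8; the function names are made up for this note and do not come from the patch) checks that (x + 1) + y, (x + y) + 1, and the original y - ~x agree on every input pair:

```cpp
#include <cassert>
#include <cstdint>

// Made-up helper names; uint8_t stands in for i8 and wraps modulo 256.
// The casts undo C++ integer promotion so every step stays 8 bits wide.

// (x + y) + 1: increment last.
inline std::uint8_t inc_last(std::uint8_t x, std::uint8_t y) {
  const auto sum = static_cast<std::uint8_t>(x + y);
  return static_cast<std::uint8_t>(sum + 1);
}

// (x + 1) + y: increment first.
inline std::uint8_t inc_first(std::uint8_t x, std::uint8_t y) {
  const auto inc = static_cast<std::uint8_t>(x + 1);
  return static_cast<std::uint8_t>(inc + y);
}

// y - ~x: the original sub-of-not pattern.
inline std::uint8_t sub_of_not8(std::uint8_t x, std::uint8_t y) {
  const auto not_x = static_cast<std::uint8_t>(~x);
  return static_cast<std::uint8_t>(y - not_x);
}

// Counts the (x, y) pairs on which the three forms disagree.
inline int count_mismatches() {
  int bad = 0;
  for (int i = 0; i < 256; ++i) {
    for (int j = 0; j < 256; ++j) {
      const auto x = static_cast<std::uint8_t>(i);
      const auto y = static_cast<std::uint8_t>(j);
      const std::uint8_t ref = sub_of_not8(x, y);
      if (inc_last(x, y) != ref || inc_first(x, y) != ref)
        ++bad;
    }
  }
  return bad;
}

// Aborts (in builds with assertions enabled) if any pair disagrees.
inline void verify_all_i8() { assert(count_mismatches() == 0); }
```

Calling verify_all_i8() from a test driver is enough to exercise all 65536 pairs.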

LGTM

This revision is now accepted and ready to land. (Jul 1 2019, 1:37 PM)

Thank you for the review.

@RKSimon encountered https://bugs.llvm.org/show_bug.cgi?id=42486 while trying to write tests, PTAL.

Backend part: https://reviews.llvm.org/D64090

This revision was not accepted when it landed; it landed in state Changes Planned. (Jul 3 2019, 2:43 AM)

Closed by commit rL365011: [InstCombine] Y - ~X --> X + Y + 1 fold (PR42457) (authored by lebedevri).

This revision was automatically updated to reflect the committed changes.

Diffusion mentioned this in rL365010: [Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457).

lebedev.ri mentioned this in rGc4b83a6054bb: [Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457). (Jul 3 2019, 2:43 AM)

lebedev.ri mentioned this in D68408: [InstCombine] Negator - sink sinkable negations. (Oct 3 2019, 10:28 AM)

lebedev.ri mentioned this in rG352fef3f11f5: [InstCombine] Negator - sink sinkable negations. (Apr 21 2020, 12:27 PM)

Diff 207263

lib/Transforms/InstCombine/InstCombineAddSub.cpp

@@ Instruction *InstCombiner::visitSub(BinaryOperator &I) {
[...]
   // (X + -1) - Y --> ~Y + X
   if (match(Op0, m_OneUse(m_Add(m_Value(X), m_AllOnes()))))
     return BinaryOperator::CreateAdd(Builder.CreateNot(Op1), X);

   // Y - (X + 1) --> ~X + Y
   if (match(Op1, m_OneUse(m_Add(m_Value(X), m_One()))))
     return BinaryOperator::CreateAdd(Builder.CreateNot(X), Op0);

+  // Y - ~X == X + Y + 1
+  if (match(Op1, m_OneUse(m_Not(m_Value(X))))) {
+    return BinaryOperator::CreateAdd(Builder.CreateAdd(Op0, X),
+                                     ConstantInt::get(I.getType(), 1));
+  }
+
   if (Constant *C = dyn_cast<Constant>(Op0)) {
     bool IsNegate = match(C, m_ZeroInt());
     Value *X;
     if (match(Op1, m_ZExt(m_Value(X))) && X->getType()->isIntOrIntVectorTy(1)) {
       // 0 - (zext bool) --> sext bool
       // C - (zext bool) --> bool ? C - 1 : C
       if (IsNegate)
         return CastInst::CreateSExtOrBitCast(X, I.getType());
[...]

test/Transforms/InstCombine/fold-sub-of-not-to-inc-of-add.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -instcombine -S < %s | FileCheck %s

 ; Given:
 ; sub %y, (xor %x, -1)
 ; Transform it to:
 ; add (add %x, %y), 1

 ;------------------------------------------------------------------------------;
 ; Scalar tests
 ;------------------------------------------------------------------------------;

 define i32 @p0_scalar(i32 %x, i32 %y) {
 ; CHECK-LABEL: @p0_scalar(
-; CHECK-NEXT: [[T0:%.*]] = xor i32 [[X:%.*]], -1
-; CHECK-NEXT: [[T1:%.*]] = sub i32 [[Y:%.*]], [[T0]]
+; CHECK-NEXT: [[TMP1:%.*]] = add i32 [[Y:%.*]], [[X:%.*]]
+; CHECK-NEXT: [[T1:%.*]] = add i32 [[TMP1]], 1
 ; CHECK-NEXT: ret i32 [[T1]]
 ;
   %t0 = xor i32 %x, -1
   %t1 = sub i32 %y, %t0
   ret i32 %t1
 }

 ;------------------------------------------------------------------------------;
 ; Vector tests
 ;------------------------------------------------------------------------------;

 define <4 x i32> @p1_vector_splat(<4 x i32> %x, <4 x i32> %y) {
 ; CHECK-LABEL: @p1_vector_splat(
-; CHECK-NEXT: [[T0:%.*]] = xor <4 x i32> [[X:%.*]], <i32 -1, i32 -1, i32 -1, i32 -1>
-; CHECK-NEXT: [[T1:%.*]] = sub <4 x i32> [[Y:%.*]], [[T0]]
+; CHECK-NEXT: [[TMP1:%.*]] = add <4 x i32> [[Y:%.*]], [[X:%.*]]
+; CHECK-NEXT: [[T1:%.*]] = add <4 x i32> [[TMP1]], <i32 1, i32 1, i32 1, i32 1>
 ; CHECK-NEXT: ret <4 x i32> [[T1]]
 ;
   %t0 = xor <4 x i32> %x, <i32 -1, i32 -1, i32 -1, i32 -1>
   %t1 = sub <4 x i32> %y, %t0
   ret <4 x i32> %t1
 }

 define <4 x i32> @p2_vector_undef(<4 x i32> %x, <4 x i32> %y) {
 ; CHECK-LABEL: @p2_vector_undef(
-; CHECK-NEXT: [[T0:%.*]] = xor <4 x i32> [[X:%.*]], <i32 -1, i32 -1, i32 undef, i32 -1>
-; CHECK-NEXT: [[T1:%.*]] = sub <4 x i32> [[Y:%.*]], [[T0]]
+; CHECK-NEXT: [[TMP1:%.*]] = add <4 x i32> [[Y:%.*]], [[X:%.*]]
+; CHECK-NEXT: [[T1:%.*]] = add <4 x i32> [[TMP1]], <i32 1, i32 1, i32 1, i32 1>
 ; CHECK-NEXT: ret <4 x i32> [[T1]]
 ;
   %t0 = xor <4 x i32> %x, <i32 -1, i32 -1, i32 undef, i32 -1>
   %t1 = sub <4 x i32> %y, %t0
   ret <4 x i32> %t1
 }

 ;------------------------------------------------------------------------------;
[...]

test/Transforms/InstCombine/sub-minmax.ll

[...]
 }


 define i32 @max_na_bi_minux_na_use(i32 %A, i32 %Bi) {
 ; CHECK-LABEL: @max_na_bi_minux_na_use(
 ; CHECK-NEXT: [[TMP1:%.*]] = icmp ugt i32 [[A:%.*]], -32
 ; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 [[A]], i32 -32
 ; CHECK-NEXT: [[L1:%.*]] = xor i32 [[TMP2]], -1
-; CHECK-NEXT: [[X:%.*]] = sub i32 [[A]], [[TMP2]]
+; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[L1]], [[A]]
+; CHECK-NEXT: [[X:%.*]] = add i32 [[TMP3]], 1
 ; CHECK-NEXT: call void @use32(i32 [[L1]])
 ; CHECK-NEXT: ret i32 [[X]]
 ;
   %not = xor i32 %A, -1
   %l0 = icmp ult i32 %not, 31
   %l1 = select i1 %l0, i32 %not, i32 31
   %x = sub i32 %l1, %not
   call void @use32(i32 %l1)
[...]
 ;
   ret i8 %minxy
 }

 define i8 @umin_not_sub_rev(i8 %x, i8 %y) {
 ; CHECK-LABEL: @umin_not_sub_rev(
 ; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i8 [[Y:%.*]], [[X:%.*]]
 ; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i8 [[X]], i8 [[Y]]
 ; CHECK-NEXT: [[MINXY:%.*]] = xor i8 [[TMP2]], -1
-; CHECK-NEXT: [[SUBX:%.*]] = sub i8 [[X]], [[TMP2]]
+; CHECK-NEXT: [[TMP3:%.*]] = add i8 [[MINXY]], [[X]]
+; CHECK-NEXT: [[SUBX:%.*]] = add i8 [[TMP3]], 1
 ; CHECK-NEXT: [[SUBY:%.*]] = sub i8 [[Y]], [[TMP2]]
 ; CHECK-NEXT: call void @use8(i8 [[SUBX]])
 ; CHECK-NEXT: call void @use8(i8 [[SUBY]])
 ; CHECK-NEXT: ret i8 [[MINXY]]
 ;
   %nx = xor i8 %x, -1
   %ny = xor i8 %y, -1
   %cmpxy = icmp ult i8 %nx, %ny
[...]

This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Y - ~X --> X + Y + 1 fold (PR42457)
ClosedPublic
