This is an archive of the discontinued LLVM Phabricator instance.


Differential D59071

[Transform] Improve saddo with mixed signs
Abandoned · Public

Authored by dlrobertson on Mar 6 2019, 7:09 PM.


Details

Reviewers

nikic
spatel

Summary

Improve folding of the sadd.with.overflow intrinsic with add nsw
when given constants of mixed signs.
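The case split described in the summary can be checked exhaustively at a small bit width. The following is a minimal sketch, not code from the patch: it models i8 arithmetic with plain `int`, and the helper names `addOverflowsI8` and `mixedSignFoldHoldsForI8` are hypothetical. It assumes that inputs for which the inner `add nsw` overflows are poison and can be skipped, and it computes magnitudes in a wider type, so |-128| is 128 instead of wrapping.

```cpp
#include <cstdint>

// True when a + b does not fit in a signed 8-bit integer.
// Inputs are expected to be small enough that the int addition cannot wrap.
static bool addOverflowsI8(int a, int b) {
  int sum = a + b;
  return sum < -128 || sum > 127;
}

// Brute-force check of the fold for every i8 value of X, C0 and C1.
// Returns true when the folded form always reports the same overflow bit
// as the original `saddo (X +nsw C0), C1`.
bool mixedSignFoldHoldsForI8() {
  for (int c0 = -128; c0 <= 127; ++c0) {
    for (int c1 = -128; c1 <= 127; ++c1) {
      if (addOverflowsI8(c0, c1))
        continue; // the fold only applies when C0 + C1 fits

      bool sameSign = (c0 < 0) == (c1 < 0);
      int mag0 = c0 < 0 ? -c0 : c0; // true magnitude, computed in int
      int mag1 = c1 < 0 ? -c1 : c1;
      // Same sign, or opposite signs with |C1| > |C0|: keep a single saddo.
      // Otherwise: plain nsw add with the overflow bit set to false.
      bool keepsSaddo = sameSign || mag0 < mag1;

      for (int x = -128; x <= 127; ++x) {
        if (addOverflowsI8(x, c0))
          continue; // the nsw add would be poison for this X

        bool before = addOverflowsI8(x + c0, c1);
        bool after = keepsSaddo ? addOverflowsI8(x, c0 + c1) : false;
        if (before != after)
          return false;
      }
    }
  }
  return true;
}
```

Calling `mixedSignFoldHoldsForI8()` walks all 2^24 combinations. The result bit only needs checking for the overflow flag, since the wrapped sum is the same in both forms.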


Event Timeline

dlrobertson created this revision. Mar 6 2019, 7:09 PM

Herald added a project: Restricted Project. Mar 6 2019, 7:09 PM

Herald added subscribers: llvm-commits, hiraditya.

nikic added inline comments. Mar 7 2019, 12:17 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
Line 2060: The use of abs here makes me uncomfortable... abs(INT_MIN) is INT_MIN again. So if we take `saddo(X +nsw INT_MIN, 1)`, then the check here is `abs(INT_MIN) s< abs(1)` which is `INT_MIN s< 1`, which is true. While this will ultimately give a conservatively correct result, you'd prefer this case to take the other branch.
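The wrap-around described in this comment can be reproduced with fixed-width integers. This is an illustrative sketch only: `wrappingAbs` and `magnitude` are hypothetical helpers, not LLVM APIs. `wrappingAbs` mimics a two's-complement abs that stays in the signed type, and `magnitude` shows one possible alternative, comparing absolute values as unsigned numbers so that |INT_MIN| is representable.

```cpp
#include <cstdint>
#include <limits>

// Two's-complement abs that stays in the signed type. The negation is done
// in unsigned arithmetic to avoid undefined behavior; converting the result
// back to int32_t wraps (guaranteed since C++20, and the behavior of
// mainstream compilers before that).
inline int32_t wrappingAbs(int32_t V) {
  uint32_t U = static_cast<uint32_t>(V);
  return V < 0 ? static_cast<int32_t>(0u - U) : V;
}

// Absolute value as an unsigned number, so |INT_MIN| == 2^31 fits.
inline uint32_t magnitude(int32_t V) {
  uint32_t U = static_cast<uint32_t>(V);
  return V < 0 ? 0u - U : U;
}
```

With these helpers, `wrappingAbs(INT_MIN) < wrappingAbs(1)` is true, which is the surprising outcome the comment points at, while `magnitude(INT_MIN) < magnitude(1)` is false.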

Generally I feel that the optimization we want to go for here is a more general one. We already detect always/never overflow conditions based on known bits. This is a variation that detects never overflow based on the constant range of the input, for one specific case. We have generic code in InstructionSimplify to determine the ConstantRange of binary operators and intrinsics (setLimitsForBinOp and setLimitsForIntrinsic) for cases where this can be done cheaply (inspecting only one instruction).

I'm thinking that we should generalize that code to take any Instruction, move it as an exported API into ValueTracking and then do a generic ConstantRange based always/never overflow check here. Does that sound reasonable @spatel?

In D59071#1421411, @nikic wrote:

> Generally I feel that the optimization we want to go for here is a more general one. We already detect always/never overflow conditions based on known bits. This is a variation that detects never overflow based on the constant range of the input, for one specific case. We have generic code in InstructionSimplify to determine the ConstantRange of binary operators and intrinsics (setLimitsForBinOp and setLimitsForIntrinsic) for cases where this can be done cheaply (inspecting only one instruction).
>
> I'm thinking that we should generalize that code to take any Instruction, move it as an exported API into ValueTracking and then do a generic ConstantRange based always/never overflow check here. Does that sound reasonable @spatel?

Yes, if we can use the same/similar code for more general transforms, it makes sense to move it to ValueTracking.

In D59071#1421411, @nikic wrote:

> I'm thinking that we should generalize that code to take any Instruction, move it as an exported API into ValueTracking and then do a generic ConstantRange based always/never overflow check here.

So do you mean update the computeKnownBitsFromOperator for sadd_with_overflow to also take into account the overflow flag?

nikic mentioned this in rL355781: [ValueTracking] Move constant range computation into ValueTracking; NFC. Mar 9 2019, 1:17 PM

nikic mentioned this in rG490975979bee: [ValueTracking] Move constant range computation into ValueTracking; NFC.

Moved constant range calculation into ValueTracking with rL355781.

In D59071#1422370, @dlrobertson wrote:

> So do you mean update the computeKnownBitsFromOperator for sadd_with_overflow to also take into account the overflow flag?

Based on the previous commit, you can call computeConstantRange() on the LHS, and then check whether adding the RHS constant to that range can ever overflow. Assuming a range of Lo <= X <= Hi and a constant C: if C >= 0, an overflow cannot occur if Hi <= SignedMax - C and always occurs if Lo > SignedMax - C. If C < 0, an overflow cannot occur if Lo >= SignedMin - C and always occurs if Hi < SignedMin - C.
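The conditions above can be sketched on plain 32-bit integers. This is only an illustration under stated assumptions: it does not use ConstantRange or APInt, the range is an inclusive, non-wrapping [Lo, Hi], and the names `SignedAddResult` and `signedAddOverflow` are hypothetical stand-ins for whatever the real helper would return.

```cpp
#include <cstdint>
#include <limits>

enum class SignedAddResult { NeverOverflows, AlwaysOverflows, MayOverflow };

// Classifies X + C for every X with Lo <= X <= Hi (signed, inclusive bounds).
inline SignedAddResult signedAddOverflow(int32_t Lo, int32_t Hi, int32_t C) {
  constexpr int32_t SignedMax = std::numeric_limits<int32_t>::max();
  constexpr int32_t SignedMin = std::numeric_limits<int32_t>::min();
  if (C >= 0) {
    // SignedMax - C cannot itself overflow because C >= 0.
    if (Hi <= SignedMax - C)
      return SignedAddResult::NeverOverflows;
    if (Lo > SignedMax - C)
      return SignedAddResult::AlwaysOverflows;
  } else {
    // SignedMin - C cannot itself overflow because C < 0.
    if (Lo >= SignedMin - C)
      return SignedAddResult::NeverOverflows;
    if (Hi < SignedMin - C)
      return SignedAddResult::AlwaysOverflows;
  }
  return SignedAddResult::MayOverflow;
}
```

For example, `signedAddOverflow(-10, 10, -5)` is classified as never overflowing, while a range that straddles the threshold, such as `signedAddOverflow(0, INT32_MAX, 1)`, falls through to `MayOverflow`.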

To go one step further, rather than limiting this optimization to just the with.overflow intrinsics, this could be done in computeOverflowForSignedAdd (and friends), as this would also benefit saturated math intrinsics, as well as nsw/nuw inference on normal adds. (It might make sense to start with computeOverflowForUnsignedAdd, as the logic for unsigned is simpler.)
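To illustrate why the unsigned case is simpler, here is the analogous check for an unsigned add of a constant. As before, this is a hedged sketch on plain 32-bit integers and not the logic of computeOverflowForUnsignedAdd itself; `UnsignedAddResult` and `unsignedAddOverflow` are hypothetical names, and the range is assumed to be an inclusive, non-wrapping [Lo, Hi].

```cpp
#include <cstdint>
#include <limits>

enum class UnsignedAddResult { NeverOverflows, AlwaysOverflows, MayOverflow };

// Classifies X + C for every X with Lo <= X <= Hi (unsigned, inclusive).
// Only the upper limit matters, so no case split on the sign of C is needed.
inline UnsignedAddResult unsignedAddOverflow(uint32_t Lo, uint32_t Hi,
                                             uint32_t C) {
  // Largest X for which X + C still fits; the subtraction cannot wrap.
  const uint32_t Limit = std::numeric_limits<uint32_t>::max() - C;
  if (Hi <= Limit)
    return UnsignedAddResult::NeverOverflows;
  if (Lo > Limit)
    return UnsignedAddResult::AlwaysOverflows;
  return UnsignedAddResult::MayOverflow;
}
```

Compared with the signed conditions, there is a single threshold and no second branch for negative constants.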

nikic mentioned this in D59193: [ConstantRange] Add overflow check helpers. Mar 10 2019, 1:41 PM

I've created D59193 to implement just the raw overflow checking logic based on ConstantRanges (without using it in ValueTracking).

rkruppe added a subscriber: rkruppe. Mar 10 2019, 3:41 PM

D59386 implements the full optimization for the unsigned case, and D59450 is preparation for the signed add case, though there's a few more followups necessary there (for signed sub, for extension to use computeConstantRange, and for handling AlwaysOverflows).

In D59071#1431908, @nikic wrote:

> D59386 implements the full optimization for the unsigned case, and D59450 is preparation for the signed add case, though there's a few more followups necessary there (for signed sub, for extension to use computeConstantRange, and for handling AlwaysOverflows).

Awesome! I'll move on to uaddo

nikic mentioned this in D60420: [ValueTracking] Use computeConstantRange() for signed add overflow determination. Apr 8 2019, 1:09 PM

nikic mentioned this in rL358014: [ValueTracking] Use computeConstantRange() in signed add overflow determination. Apr 9 2019, 9:11 AM

nikic mentioned this in rG10edd2b79d04: [ValueTracking] Use computeConstantRange() in signed add overflow determination.

Revision Contents

Path                                                     Size
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp     33 lines
llvm/test/Transforms/InstCombine/sadd-with-overflow.ll   19 lines

Diff 189641

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

@@ ... @@ case Intrinsic::fshr: {
     break;
   }
   case Intrinsic::sadd_with_overflow: {
     if (Instruction *I = canonicalizeConstantArg0ToArg1(CI))
       return I;
     if (Instruction *I = foldIntrinsicWithOverflowCommon(II))
       return I;
 
-    // Given 2 constant operands whose sum does not overflow:
-    // saddo (X +nsw C0), C1 -> saddo X, C0 + C1
+    // Given 2 constant operands whose sum does not overflow the intrinsic
+    // may be folded.
+    //
+    // Cases for `saddo (X +nsw C0), C1 -> saddo X, C0 + C1`:
+    //
+    // - Same sign can be folded to a single `saddo` call.
+    // - Opposite signs with `|C1| > |C0|` can be folded to a single `saddo`
+    // call.
+    //
+    // Cases for `saddo (X +nsw C0), C1 -> X +nsw (C0 + C1), false`:
+    //
+    // - Opposite signs with `|C1| <= |C0|` can be folded to a single `+nsw`
+    // call with the overflow flag set to false.
     Value *X;
     const APInt *C0, *C1;
     Value *Arg0 = II->getArgOperand(0);
     Value *Arg1 = II->getArgOperand(1);
     if (match(Arg0, m_NSWAdd(m_Value(X), m_APInt(C0))) &&
         match(Arg1, m_APInt(C1))) {
       bool Overflow;
       APInt NewC = C1->sadd_ov(*C0, Overflow);
-      if (!Overflow)
-        return replaceInstUsesWith(
-            *II, Builder.CreateBinaryIntrinsic(
-                     Intrinsic::sadd_with_overflow, X,
-                     ConstantInt::get(Arg1->getType(), NewC)));
+      if (!Overflow) {
+        if (C0->isNegative() == C1->isNegative() || C0->abs().slt(C1->abs())) {
+          return replaceInstUsesWith(
+              *II, Builder.CreateBinaryIntrinsic(
+                       Intrinsic::sadd_with_overflow, X,
+                       ConstantInt::get(Arg1->getType(), NewC)));
+        } else {
+          return CreateOverflowTuple(
+              II,
+              Builder.CreateNSWAdd(X, ConstantInt::get(Arg1->getType(), NewC)),
+              Builder.getFalse());
+        }
+      }
     }
 
     break;
   }
   case Intrinsic::uadd_with_overflow:
   case Intrinsic::umul_with_overflow:
   case Intrinsic::smul_with_overflow:
     if (Instruction *I = canonicalizeConstantArg0ToArg1(CI))

llvm/test/Transforms/InstCombine/sadd-with-overflow.ll

@@ ... @@
 ; CHECK-NEXT: [[TMP2:%.*]] = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 [[TMP0:%.*]], i32 20)
 ; CHECK-NEXT: ret { i32, i1 } [[TMP2]]
 ;
   %2 = add nsw i32 %0, 7
   %3 = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %2, i32 13)
   ret { i32, i1 } %3
 }
 
-define { i32, i1 } @fold_mixed_signs(i32) {
-; CHECK-LABEL: @fold_mixed_signs(
-; CHECK-NEXT: [[TMP2:%.*]] = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 [[TMP0:%.*]], i32 6)
-; CHECK-NEXT: ret { i32, i1 } [[TMP2]]
+define { i32, i1 } @fold_mixed_signs_first_high(i32) {
+; CHECK-LABEL: @fold_mixed_signs_first_high(
+; CHECK-NEXT: [[TMP2:%.*]] = add nsw i32 [[TMP0:%.*]], 6
+; CHECK-NEXT: [[TMP3:%.*]] = insertvalue { i32, i1 } { i32 undef, i1 false }, i32 [[TMP2]], 0
+; CHECK-NEXT: ret { i32, i1 } [[TMP3]]
 ;
   %2 = add nsw i32 %0, 13
   %3 = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %2, i32 -7)
   ret { i32, i1 } %3
 }
 
+define { i32, i1 } @fold_mixed_signs_second_high(i32) {
+; CHECK-LABEL: @fold_mixed_signs_second_high(
+; CHECK-NEXT: [[TMP2:%.*]] = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 [[TMP0:%.*]], i32 -6)
+; CHECK-NEXT: ret { i32, i1 } [[TMP2]]
+;
+  %2 = add nsw i32 %0, 7
+  %3 = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %2, i32 -13)
+  ret { i32, i1 } %3
+}
+
 define { i8, i1 } @fold_on_constant_add_no_overflow(i8) {
 ; CHECK-LABEL: @fold_on_constant_add_no_overflow(
 ; CHECK-NEXT: [[TMP2:%.*]] = call { i8, i1 } @llvm.sadd.with.overflow.i8(i8 [[TMP0:%.*]], i8 127)
 ; CHECK-NEXT: ret { i8, i1 } [[TMP2]]
 ;
   %2 = add nsw i8 %0, 100
   %3 = tail call { i8, i1 } @llvm.sadd.with.overflow.i8(i8 %2, i8 27)
   ret { i8, i1 } %3