This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Start canonicalizing to uadd.sat and usub.sat
ClosedPublic

Authored by nikic on Mar 2 2019, 4:52 AM.

Details

Reviewers

spatel
RKSimon

Commits

rG7462303e0680: [InstCombine] Use uadd.sat and usub.sat for canonicalization
rL357103: [InstCombine] Use uadd.sat and usub.sat for canonicalization

Summary

At this point our uadd.sat and usub.sat support should be better than expanded IR, so start canonicalizing to these intrinsics. I'm porting the already existing patterns, though we'll also want to convert the currently canonical patterns in addition to that.
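As a reader's aid (not part of the patch), here is a minimal sketch of why the expanded select patterns can be replaced by the intrinsics. It models `uadd.sat` and `usub.sat` on 8-bit values with widened arithmetic and compares them against two of the raw patterns this revision matches. All function names are made up for illustration, and the 8-bit width is an assumption chosen so the check can be exhaustive.

```cpp
#include <cassert>
#include <cstdint>

// Reference model of llvm.uadd.sat on i8: add in a wider type, then clamp.
uint8_t uadd_sat_ref(uint8_t x, uint8_t y) {
  unsigned wide = static_cast<unsigned>(x) + static_cast<unsigned>(y);
  return static_cast<uint8_t>(wide > 0xFFu ? 0xFFu : wide);
}

// Reference model of llvm.usub.sat on i8: subtract in a wider type, clamp at 0.
uint8_t usub_sat_ref(uint8_t a, uint8_t b) {
  int wide = static_cast<int>(a) - static_cast<int>(b);
  return static_cast<uint8_t>(wide < 0 ? 0 : wide);
}

// Expanded pattern: (~x u< y) ? -1 : (x + y)
uint8_t uadd_sat_select(uint8_t x, uint8_t y) {
  uint8_t notx = static_cast<uint8_t>(~x);
  return notx < y ? 0xFF : static_cast<uint8_t>(x + y);
}

// Expanded pattern: (a u> b) ? a - b : 0
uint8_t usub_sat_select(uint8_t a, uint8_t b) {
  return a > b ? static_cast<uint8_t>(a - b) : 0;
}

// Exhaustively compare the patterns with the reference models.
bool patterns_match_reference() {
  for (unsigned i = 0; i < 256; ++i) {
    for (unsigned j = 0; j < 256; ++j) {
      uint8_t x = static_cast<uint8_t>(i), y = static_cast<uint8_t>(j);
      if (uadd_sat_select(x, y) != uadd_sat_ref(x, y)) return false;
      if (usub_sat_select(x, y) != usub_sat_ref(x, y)) return false;
    }
  }
  return true;
}
```

Calling `patterns_match_reference()` walks all 65,536 input pairs, so it is a complete check for this width only; it says nothing about codegen quality, which is the separate concern discussed in this review.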

Event Timeline

nikic created this revision. Mar 2 2019, 4:52 AM

Herald added a project: Restricted Project. Mar 2 2019, 4:52 AM

Herald added subscribers: llvm-commits, jdoerfert.

Now that it's a single instruction, what about one-use checks there?

In D58872#1416090, @lebedev.ri wrote:

Now that it's a single instruction, what about one-use checks there?

I've left them in place to avoid redundant calculations if the intrinsic needs to be expanded in the backend. If that's not a concern I can drop them (apart from the case that inserts an extra sub).
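The case that needs an extra instruction is presumably the negated variant, `(a > b) ? b - a : 0`, which the patch rewrites to a negation of `usub.sat(a, b)`. A small sketch of that identity on 8-bit values follows; it is an illustration only, with hypothetical function names, and is not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Model of llvm.usub.sat on i8.
uint8_t usub_sat(uint8_t a, uint8_t b) {
  int wide = static_cast<int>(a) - static_cast<int>(b);
  return static_cast<uint8_t>(wide < 0 ? 0 : wide);
}

// Raw pattern with the subtraction operands swapped: (a u> b) ? b - a : 0
uint8_t negated_select(uint8_t a, uint8_t b) {
  return a > b ? static_cast<uint8_t>(b - a) : 0;
}

// Rewritten form: 0 - usub.sat(a, b), i.e. the intrinsic plus one extra sub.
uint8_t negated_intrinsic(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>(0 - usub_sat(a, b));
}

// Exhaustive check over all 8-bit input pairs.
bool negated_forms_agree() {
  for (unsigned i = 0; i < 256; ++i)
    for (unsigned j = 0; j < 256; ++j) {
      uint8_t a = static_cast<uint8_t>(i), b = static_cast<uint8_t>(j);
      if (negated_select(a, b) != negated_intrinsic(a, b)) return false;
    }
  return true;
}
```

Since the rewritten form has two instructions instead of one, replacing a multi-use subtraction would add work rather than remove it, which is one reason to keep a one-use check for this variant.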

Use CreateBinaryIntrinsic instead of CreateCall.

There are 2 concerns here:

1. Is the IR-level analysis for the intrinsics as good/better than the raw IR? The description says yes, so I assume we've filled all the known analysis, value tracking, vectorization, and combining gaps.
2. Is codegen as good/better than the raw IR? I.e., for all known targets/types, are we DAG combining/lowering the intrinsics optimally?

I want to make sure about #2 because there was some potential fallout from aggressively forming the similar usub overflow intrinsic in CGP in D57789, so we ended up adding a target hook (cc @dmgreen).

@spatel We're not lowering optimally, e.g. not using UQADD and friends on AArch64. They're not going to be used with expanded IR either though, only via target intrinsics. I can take a look at improving that first.

In D58872#1416441, @nikic wrote:

@spatel We're not lowering optimally, e.g. not using UQADD and friends on AArch64. They're not going to be used with expanded IR either though, only via target intrinsics. I can take a look at improving that first.

Sounds good. I think this is the right transform in IR, but if there's anything we can do in advance to ensure that codegen doesn't regress, that's always best.

I don't think it will be affected with this patch yet, but I'm looking at 1 more x86 or general vector codegen improvement for min/max in PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613
...and then I think we can also convert min/max patterns to uadd.sat in IR.

Hello. This one looks OK from the tests I just ran. A few things moved, but only by a small amount, and usually in the right direction.

I think the "with.overflow" intrinsics were causing more problems for us. These saturations, at least in the tests I've run, seem to be handled OK.

test/Transforms/InstCombine/unsigned_saturated_sub.ll
Line 131: Do these comments need updating?

spatel mentioned this in rL357012: [InstCombine] form uaddsat from add+umin (PR14613). Mar 26 2019, 10:48 AM

spatel mentioned this in rG81e8d76f5b63: [InstCombine] form uaddsat from add+umin (PR14613).

I converted umin-based patterns for uaddsat here:
rL357012
That should give us an x86 improvement in PR14613.
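For context, one umin-based way to write an unsigned saturating add is `umin(x, ~y) + y`. The sketch below checks that identity exhaustively on 8-bit values. Whether this is the exact operand order matched in rL357012 is an assumption on my part, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Model of llvm.uadd.sat on i8.
uint8_t uadd_sat(uint8_t x, uint8_t y) {
  unsigned wide = static_cast<unsigned>(x) + static_cast<unsigned>(y);
  return static_cast<uint8_t>(wide > 0xFFu ? 0xFFu : wide);
}

// umin-based form: umin(x, ~y) + y.
// If x <= ~y the sum x + y cannot wrap; otherwise ~y + y is all-ones.
uint8_t uadd_sat_umin(uint8_t x, uint8_t y) {
  uint8_t noty = static_cast<uint8_t>(~y);
  uint8_t clamped = std::min(x, noty);
  return static_cast<uint8_t>(clamped + y);
}

// Exhaustive check over all 8-bit input pairs.
bool umin_form_agrees() {
  for (unsigned i = 0; i < 256; ++i)
    for (unsigned j = 0; j < 256; ++j) {
      uint8_t x = static_cast<uint8_t>(i), y = static_cast<uint8_t>(j);
      if (uadd_sat_umin(x, y) != uadd_sat(x, y)) return false;
    }
  return true;
}
```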

I'm not sure if we are holding this patch up for any other changes, but if not and if the above change doesn't cause trouble, we should push ahead with this patch.

test/Transforms/InstCombine/unsigned_saturated_sub.ll
Lines 4–7: Remove/update this comment.

Update comments.

Herald added a subscriber: hiraditya. Mar 26 2019, 12:04 PM

In D58872#1443251, @spatel wrote:

I'm not sure if we are holding this patch up for any other changes, but if not and if the above change doesn't cause trouble, we should push ahead with this patch.

I originally wanted to do https://bugs.llvm.org/show_bug.cgi?id=41023 before this, but had to give up on it.

LGTM

This revision is now accepted and ready to land. Mar 27 2019, 5:59 AM

Closed by commit rL357103: [InstCombine] Use uadd.sat and usub.sat for canonicalization (authored by nikic). Mar 27 2019, 10:56 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                     Size
lib/Transforms/InstCombine/InstCombineSelect.cpp         49 lines
test/Transforms/InstCombine/saturating-add-sub.ll        108 lines
test/Transforms/InstCombine/unsigned_saturated_sub.ll    64 lines

Diff 189075

lib/Transforms/InstCombine/InstCombineSelect.cpp

... 616 lines not shown ...	if (C2Log > C1Log) {
V = Builder.CreateZExtOrTrunc(V, Y->getType());		V = Builder.CreateZExtOrTrunc(V, Y->getType());

if (NeedXor)		if (NeedXor)
V = Builder.CreateXor(V, *C2);		V = Builder.CreateXor(V, *C2);

return Builder.CreateOr(V, Y);		return Builder.CreateOr(V, Y);
}		}

/// Transform patterns such as: (a > b) ? a - b : 0		/// Transform patterns such as (a > b) ? a - b : 0 into usub.sat(a, b).
/// into: ((a > b) ? a : b) - b)
/// This produces a canonical max pattern that is more easily recognized by the
/// backend and converted into saturated subtraction instructions if those
/// exist.
/// There are 8 commuted/swapped variants of this pattern.		/// There are 8 commuted/swapped variants of this pattern.
/// TODO: Also support a - UMIN(a,b) patterns.		/// TODO: Also support a - UMIN(a,b) patterns.
static Value *canonicalizeSaturatedSubtract(const ICmpInst *ICI,		static Value *canonicalizeSaturatedSubtract(const ICmpInst *ICI,
const Value *TrueVal,		const Value *TrueVal,
const Value *FalseVal,		const Value *FalseVal,
InstCombiner::BuilderTy &Builder) {		InstCombiner::BuilderTy &Builder) {
ICmpInst::Predicate Pred = ICI->getPredicate();		ICmpInst::Predicate Pred = ICI->getPredicate();
if (!ICmpInst::isUnsigned(Pred))		if (!ICmpInst::isUnsigned(Pred))
... 25 lines not shown ...	static Value *canonicalizeSaturatedSubtract(const ICmpInst *ICI,
else if (!match(TrueVal, m_Sub(m_Specific(A), m_Specific(B))))		else if (!match(TrueVal, m_Sub(m_Specific(A), m_Specific(B))))
return nullptr;		return nullptr;

// If sub is used anywhere else, we wouldn't be able to eliminate it		// If sub is used anywhere else, we wouldn't be able to eliminate it
// afterwards.		// afterwards.
if (!TrueVal->hasOneUse())		if (!TrueVal->hasOneUse())
return nullptr;		return nullptr;

// All checks passed, convert to canonical unsigned saturated subtraction		// (a > b) ? a - b : 0 -> usub.sat(a, b)
// form: sub(max()).		// (a > b) ? b - a : 0 -> -usub.sat(a, b)
// (a > b) ? a - b : 0 -> ((a > b) ? a : b) - b)		Value *Result = Builder.CreateBinaryIntrinsic(Intrinsic::usub_sat, A, B);
Value *Max = Builder.CreateSelect(Builder.CreateICmp(Pred, A, B), A, B);		if (IsNegative)
return IsNegative ? Builder.CreateSub(B, Max) : Builder.CreateSub(Max, B);		Result = Builder.CreateNeg(Result);
		return Result;
}		}

static Value *canonicalizeSaturatedAdd(ICmpInst *Cmp, Value *TVal, Value *FVal,		static Value *canonicalizeSaturatedAdd(ICmpInst *Cmp, Value *TVal, Value *FVal,
InstCombiner::BuilderTy &Builder) {		InstCombiner::BuilderTy &Builder) {
if (!Cmp->hasOneUse())		if (!Cmp->hasOneUse())
return nullptr;		return nullptr;

// Match unsigned saturated add with constant.		// Match unsigned saturated add with constant.
Value *Cmp0 = Cmp->getOperand(0);		Value *Cmp0 = Cmp->getOperand(0);
Value *Cmp1 = Cmp->getOperand(1);		Value *Cmp1 = Cmp->getOperand(1);
ICmpInst::Predicate Pred = Cmp->getPredicate();		ICmpInst::Predicate Pred = Cmp->getPredicate();
Value *X;		Value *X;
const APInt *C, *CmpC;		const APInt *C, *CmpC;
if (Pred == ICmpInst::ICMP_ULT &&		if (Pred == ICmpInst::ICMP_ULT &&
match(TVal, m_Add(m_Value(X), m_APInt(C))) && X == Cmp0 &&		match(TVal, m_Add(m_Value(X), m_APInt(C))) && X == Cmp0 &&
match(FVal, m_AllOnes()) && match(Cmp1, m_APInt(CmpC)) && *CmpC == ~*C) {		match(FVal, m_AllOnes()) && match(Cmp1, m_APInt(CmpC)) && *CmpC == ~*C) {
// Commute compare predicate and select operands:		// (X u< ~C) ? (X + C) : -1 --> uadd.sat(X, C)
// (X u< ~C) ? (X + C) : -1 --> (X u> ~C) ? -1 : (X + C)		return Builder.CreateBinaryIntrinsic(
Value *NewCmp = Builder.CreateICmp(ICmpInst::ICMP_UGT, X, Cmp1);		Intrinsic::uadd_sat, X, ConstantInt::get(X->getType(), *C));
return Builder.CreateSelect(NewCmp, FVal, TVal);
}		}

// Match unsigned saturated add of 2 variables with an unnecessary 'not'.		// Match unsigned saturated add of 2 variables with an unnecessary 'not'.
// There are 8 commuted variants.		// There are 8 commuted variants.
// Canonicalize -1 (saturated result) to true value of the select.		// Canonicalize -1 (saturated result) to true value of the select. Just
		// swapping the compare operands is legal, because the selected value is the
		// same in case of equality, so we can interchange u< and u<=.
if (match(FVal, m_AllOnes())) {		if (match(FVal, m_AllOnes())) {
std::swap(TVal, FVal);		std::swap(TVal, FVal);
std::swap(Cmp0, Cmp1);		std::swap(Cmp0, Cmp1);
}		}
if (!match(TVal, m_AllOnes()))		if (!match(TVal, m_AllOnes()))
return nullptr;		return nullptr;

// Canonicalize predicate to 'ULT'.		// Canonicalize predicate to 'ULT'.
if (Pred == ICmpInst::ICMP_UGT) {		if (Pred == ICmpInst::ICMP_UGT) {
Pred = ICmpInst::ICMP_ULT;		Pred = ICmpInst::ICMP_ULT;
std::swap(Cmp0, Cmp1);		std::swap(Cmp0, Cmp1);
}		}
if (Pred != ICmpInst::ICMP_ULT)		if (Pred != ICmpInst::ICMP_ULT)
return nullptr;		return nullptr;

// Match unsigned saturated add of 2 variables with an unnecessary 'not'.		// Match unsigned saturated add of 2 variables with an unnecessary 'not'.
Value *Y;		Value *Y;
if (match(Cmp0, m_Not(m_Value(X))) &&		if (match(Cmp0, m_Not(m_Value(X))) &&
match(FVal, m_c_Add(m_Specific(X), m_Value(Y))) && Y == Cmp1) {		match(FVal, m_c_Add(m_Specific(X), m_Value(Y))) && Y == Cmp1) {
// Change the comparison to use the sum (false value of the select). That is		// (~X u< Y) ? -1 : (X + Y) --> uadd.sat(X, Y)
// a canonical pattern match form for uadd.with.overflow and eliminates a		// (~X u< Y) ? -1 : (Y + X) --> uadd.sat(X, Y)
// use of the 'not' op:		return Builder.CreateBinaryIntrinsic(Intrinsic::uadd_sat, X, Y);
// (~X u< Y) ? -1 : (X + Y) --> ((X + Y) u< Y) ? -1 : (X + Y)
// (~X u< Y) ? -1 : (Y + X) --> ((Y + X) u< Y) ? -1 : (Y + X)
Value *NewCmp = Builder.CreateICmp(ICmpInst::ICMP_ULT, FVal, Y);
return Builder.CreateSelect(NewCmp, TVal, FVal);
}		}
// The 'not' op may be included in the sum but not the compare.		// The 'not' op may be included in the sum but not the compare.
X = Cmp0;		X = Cmp0;
Y = Cmp1;		Y = Cmp1;
if (match(FVal, m_c_Add(m_Not(m_Specific(X)), m_Specific(Y)))) {		if (match(FVal, m_c_Add(m_Not(m_Specific(X)), m_Specific(Y)))) {
// Change the comparison to use the sum (false value of the select). That is		// (X u< Y) ? -1 : (~X + Y) --> uadd.sat(~X, Y)
// a canonical pattern match form for uadd.with.overflow:		// (X u< Y) ? -1 : (Y + ~X) --> uadd.sat(Y, ~X)
// (X u< Y) ? -1 : (~X + Y) --> ((~X + Y) u< Y) ? -1 : (~X + Y)		BinaryOperator *BO = cast<BinaryOperator>(FVal);
// (X u< Y) ? -1 : (Y + ~X) --> ((Y + ~X) u< Y) ? -1 : (Y + ~X)		return Builder.CreateBinaryIntrinsic(
Value *NewCmp = Builder.CreateICmp(ICmpInst::ICMP_ULT, FVal, Y);		Intrinsic::uadd_sat, BO->getOperand(0), BO->getOperand(1));
return Builder.CreateSelect(NewCmp, TVal, FVal);
}		}

return nullptr;		return nullptr;
}		}

/// Attempt to fold a cttz/ctlz followed by a icmp plus select into a single		/// Attempt to fold a cttz/ctlz followed by a icmp plus select into a single
/// call to cttz/ctlz with flag 'is_zero_undef' cleared.		/// call to cttz/ctlz with flag 'is_zero_undef' cleared.
///		///
... 1,402 lines not shown ...
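The new comment in canonicalizeSaturatedAdd says the compare operands can simply be swapped because the selected value is the same in case of equality. The sketch below is a reader's illustration of that claim, along with the "not in the sum" and constant variants, checked exhaustively on 8-bit values. It is not code from the patch; the function names and the 8-bit width are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Model of llvm.uadd.sat on i8.
uint8_t uadd_sat(uint8_t x, uint8_t y) {
  unsigned wide = static_cast<unsigned>(x) + static_cast<unsigned>(y);
  return static_cast<uint8_t>(wide > 0xFFu ? 0xFFu : wide);
}

// (~x u< y) ? -1 : (x + y), with the predicate optionally relaxed to u<=.
// When ~x == y the sum x + y is already all-ones, so both give the same value.
uint8_t not_in_cmp(uint8_t x, uint8_t y, bool use_ule) {
  uint8_t notx = static_cast<uint8_t>(~x);
  bool saturate = use_ule ? (notx <= y) : (notx < y);
  return saturate ? 0xFF : static_cast<uint8_t>(x + y);
}

// (x u< y) ? -1 : (~x + y), expected to equal uadd.sat(~x, y).
uint8_t not_in_sum(uint8_t x, uint8_t y) {
  uint8_t notx = static_cast<uint8_t>(~x);
  return x < y ? 0xFF : static_cast<uint8_t>(notx + y);
}

// (x u< ~c) ? (x + c) : -1, expected to equal uadd.sat(x, c).
uint8_t with_constant(uint8_t x, uint8_t c) {
  uint8_t notc = static_cast<uint8_t>(~c);
  return x < notc ? static_cast<uint8_t>(x + c) : 0xFF;
}

// Exhaustive check of all three variants against the model.
bool uadd_sat_variants_agree() {
  for (unsigned i = 0; i < 256; ++i)
    for (unsigned j = 0; j < 256; ++j) {
      uint8_t x = static_cast<uint8_t>(i), y = static_cast<uint8_t>(j);
      uint8_t notx = static_cast<uint8_t>(~x);
      if (not_in_cmp(x, y, false) != uadd_sat(x, y)) return false;
      if (not_in_cmp(x, y, true) != uadd_sat(x, y)) return false;
      if (not_in_sum(x, y) != uadd_sat(notx, y)) return false;
      if (with_constant(x, y) != uadd_sat(x, y)) return false;
    }
  return true;
}
```

The equality case is the only place where `u<` and `u<=` pick different select arms, and there both arms hold the all-ones value, which is why the patch can swap compare operands without adjusting the predicate.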

test/Transforms/InstCombine/saturating-add-sub.ll

... 715 lines not shown ...	;
%r = call <2 x i8> @llvm.ssub.sat.v2i8(<2 x i8> %a_neg, <2 x i8> <i8 10, i8 20>)		%r = call <2 x i8> @llvm.ssub.sat.v2i8(<2 x i8> %a_neg, <2 x i8> <i8 10, i8 20>)
ret <2 x i8> %r		ret <2 x i8> %r
}		}

; Raw IR tests		; Raw IR tests

define i32 @uadd_sat(i32 %x, i32 %y) {		define i32 @uadd_sat(i32 %x, i32 %y) {
; CHECK-LABEL: @uadd_sat(		; CHECK-LABEL: @uadd_sat(
; CHECK-NEXT: [[A:%.*]] = add i32 [[Y:%.*]], [[X:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.uadd.sat.i32(i32 [[X:%.*]], i32 [[Y:%.*]])
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[A]], [[Y]]		; CHECK-NEXT: ret i32 [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 -1, i32 [[A]]
; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%notx = xor i32 %x, -1		%notx = xor i32 %x, -1
%a = add i32 %y, %x		%a = add i32 %y, %x
%c = icmp ult i32 %notx, %y		%c = icmp ult i32 %notx, %y
%r = select i1 %c, i32 -1, i32 %a		%r = select i1 %c, i32 -1, i32 %a
ret i32 %r		ret i32 %r
}		}

define i32 @uadd_sat_commute_add(i32 %xp, i32 %y) {		define i32 @uadd_sat_commute_add(i32 %xp, i32 %y) {
; CHECK-LABEL: @uadd_sat_commute_add(		; CHECK-LABEL: @uadd_sat_commute_add(
; CHECK-NEXT: [[X:%.*]] = urem i32 42, [[XP:%.*]]		; CHECK-NEXT: [[X:%.*]] = urem i32 42, [[XP:%.*]]
; CHECK-NEXT: [[A:%.*]] = add i32 [[X]], [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.uadd.sat.i32(i32 [[X]], i32 [[Y:%.*]])
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[A]], [[Y]]		; CHECK-NEXT: ret i32 [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 -1, i32 [[A]]
; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%x = urem i32 42, %xp ; thwart complexity-based-canonicalization		%x = urem i32 42, %xp ; thwart complexity-based-canonicalization
%notx = xor i32 %x, -1		%notx = xor i32 %x, -1
%a = add i32 %x, %y		%a = add i32 %x, %y
%c = icmp ult i32 %notx, %y		%c = icmp ult i32 %notx, %y
%r = select i1 %c, i32 -1, i32 %a		%r = select i1 %c, i32 -1, i32 %a
ret i32 %r		ret i32 %r
}		}

define i32 @uadd_sat_ugt(i32 %x, i32 %yp) {		define i32 @uadd_sat_ugt(i32 %x, i32 %yp) {
; CHECK-LABEL: @uadd_sat_ugt(		; CHECK-LABEL: @uadd_sat_ugt(
; CHECK-NEXT: [[Y:%.*]] = sdiv i32 [[YP:%.*]], 2442		; CHECK-NEXT: [[Y:%.*]] = sdiv i32 [[YP:%.*]], 2442
; CHECK-NEXT: [[A:%.*]] = add i32 [[Y]], [[X:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.uadd.sat.i32(i32 [[X:%.*]], i32 [[Y]])
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[A]], [[Y]]		; CHECK-NEXT: ret i32 [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 -1, i32 [[A]]
; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%y = sdiv i32 %yp, 2442 ; thwart complexity-based-canonicalization		%y = sdiv i32 %yp, 2442 ; thwart complexity-based-canonicalization
%notx = xor i32 %x, -1		%notx = xor i32 %x, -1
%a = add i32 %y, %x		%a = add i32 %y, %x
%c = icmp ugt i32 %y, %notx		%c = icmp ugt i32 %y, %notx
%r = select i1 %c, i32 -1, i32 %a		%r = select i1 %c, i32 -1, i32 %a
ret i32 %r		ret i32 %r
}		}

define <2 x i32> @uadd_sat_ugt_commute_add(<2 x i32> %xp, <2 x i32> %yp) {		define <2 x i32> @uadd_sat_ugt_commute_add(<2 x i32> %xp, <2 x i32> %yp) {
; CHECK-LABEL: @uadd_sat_ugt_commute_add(		; CHECK-LABEL: @uadd_sat_ugt_commute_add(
; CHECK-NEXT: [[Y:%.*]] = sdiv <2 x i32> [[YP:%.*]], <i32 2442, i32 4242>		; CHECK-NEXT: [[Y:%.*]] = sdiv <2 x i32> [[YP:%.*]], <i32 2442, i32 4242>
; CHECK-NEXT: [[X:%.*]] = srem <2 x i32> <i32 42, i32 43>, [[XP:%.*]]		; CHECK-NEXT: [[X:%.*]] = srem <2 x i32> <i32 42, i32 43>, [[XP:%.*]]
; CHECK-NEXT: [[A:%.*]] = add <2 x i32> [[X]], [[Y]]		; CHECK-NEXT: [[TMP1:%.*]] = call <2 x i32> @llvm.uadd.sat.v2i32(<2 x i32> [[X]], <2 x i32> [[Y]])
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult <2 x i32> [[A]], [[Y]]		; CHECK-NEXT: ret <2 x i32> [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select <2 x i1> [[TMP1]], <2 x i32> <i32 -1, i32 -1>, <2 x i32> [[A]]
; CHECK-NEXT: ret <2 x i32> [[TMP2]]
;		;
%y = sdiv <2 x i32> %yp, <i32 2442, i32 4242> ; thwart complexity-based-canonicalization		%y = sdiv <2 x i32> %yp, <i32 2442, i32 4242> ; thwart complexity-based-canonicalization
%x = srem <2 x i32> <i32 42, i32 43>, %xp ; thwart complexity-based-canonicalization		%x = srem <2 x i32> <i32 42, i32 43>, %xp ; thwart complexity-based-canonicalization
%notx = xor <2 x i32> %x, <i32 -1, i32 -1>		%notx = xor <2 x i32> %x, <i32 -1, i32 -1>
%a = add <2 x i32> %x, %y		%a = add <2 x i32> %x, %y
%c = icmp ugt <2 x i32> %y, %notx		%c = icmp ugt <2 x i32> %y, %notx
%r = select <2 x i1> %c, <2 x i32> <i32 -1, i32 -1>, <2 x i32> %a		%r = select <2 x i1> %c, <2 x i32> <i32 -1, i32 -1>, <2 x i32> %a
ret <2 x i32> %r		ret <2 x i32> %r
}		}

define i32 @uadd_sat_commute_select(i32 %x, i32 %yp) {		define i32 @uadd_sat_commute_select(i32 %x, i32 %yp) {
; CHECK-LABEL: @uadd_sat_commute_select(		; CHECK-LABEL: @uadd_sat_commute_select(
; CHECK-NEXT: [[Y:%.*]] = sdiv i32 [[YP:%.*]], 2442		; CHECK-NEXT: [[Y:%.*]] = sdiv i32 [[YP:%.*]], 2442
; CHECK-NEXT: [[A:%.*]] = add i32 [[Y]], [[X:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.uadd.sat.i32(i32 [[X:%.*]], i32 [[Y]])
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[A]], [[Y]]		; CHECK-NEXT: ret i32 [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 -1, i32 [[A]]
; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%y = sdiv i32 %yp, 2442 ; thwart complexity-based-canonicalization		%y = sdiv i32 %yp, 2442 ; thwart complexity-based-canonicalization
%notx = xor i32 %x, -1		%notx = xor i32 %x, -1
%a = add i32 %y, %x		%a = add i32 %y, %x
%c = icmp ult i32 %y, %notx		%c = icmp ult i32 %y, %notx
%r = select i1 %c, i32 %a, i32 -1		%r = select i1 %c, i32 %a, i32 -1
ret i32 %r		ret i32 %r
}		}

define i32 @uadd_sat_commute_select_commute_add(i32 %xp, i32 %yp) {		define i32 @uadd_sat_commute_select_commute_add(i32 %xp, i32 %yp) {
; CHECK-LABEL: @uadd_sat_commute_select_commute_add(		; CHECK-LABEL: @uadd_sat_commute_select_commute_add(
; CHECK-NEXT: [[X:%.*]] = urem i32 42, [[XP:%.*]]		; CHECK-NEXT: [[X:%.*]] = urem i32 42, [[XP:%.*]]
; CHECK-NEXT: [[Y:%.*]] = sdiv i32 [[YP:%.*]], 2442		; CHECK-NEXT: [[Y:%.*]] = sdiv i32 [[YP:%.*]], 2442
; CHECK-NEXT: [[A:%.*]] = add nsw i32 [[X]], [[Y]]		; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.uadd.sat.i32(i32 [[X]], i32 [[Y]])
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[A]], [[Y]]		; CHECK-NEXT: ret i32 [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 -1, i32 [[A]]
; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%x = urem i32 42, %xp ; thwart complexity-based-canonicalization		%x = urem i32 42, %xp ; thwart complexity-based-canonicalization
%y = sdiv i32 %yp, 2442 ; thwart complexity-based-canonicalization		%y = sdiv i32 %yp, 2442 ; thwart complexity-based-canonicalization
%notx = xor i32 %x, -1		%notx = xor i32 %x, -1
%a = add i32 %x, %y		%a = add i32 %x, %y
%c = icmp ult i32 %y, %notx		%c = icmp ult i32 %y, %notx
%r = select i1 %c, i32 %a, i32 -1		%r = select i1 %c, i32 %a, i32 -1
ret i32 %r		ret i32 %r
}		}

define <2 x i32> @uadd_sat_commute_select_ugt(<2 x i32> %x, <2 x i32> %y) {		define <2 x i32> @uadd_sat_commute_select_ugt(<2 x i32> %x, <2 x i32> %y) {
; CHECK-LABEL: @uadd_sat_commute_select_ugt(		; CHECK-LABEL: @uadd_sat_commute_select_ugt(
; CHECK-NEXT: [[A:%.*]] = add <2 x i32> [[Y:%.*]], [[X:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call <2 x i32> @llvm.uadd.sat.v2i32(<2 x i32> [[X:%.*]], <2 x i32> [[Y:%.*]])
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult <2 x i32> [[A]], [[Y]]		; CHECK-NEXT: ret <2 x i32> [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select <2 x i1> [[TMP1]], <2 x i32> <i32 -1, i32 -1>, <2 x i32> [[A]]
; CHECK-NEXT: ret <2 x i32> [[TMP2]]
;		;
%notx = xor <2 x i32> %x, <i32 -1, i32 -1>		%notx = xor <2 x i32> %x, <i32 -1, i32 -1>
%a = add <2 x i32> %y, %x		%a = add <2 x i32> %y, %x
%c = icmp ugt <2 x i32> %notx, %y		%c = icmp ugt <2 x i32> %notx, %y
%r = select <2 x i1> %c, <2 x i32> %a, <2 x i32> <i32 -1, i32 -1>		%r = select <2 x i1> %c, <2 x i32> %a, <2 x i32> <i32 -1, i32 -1>
ret <2 x i32> %r		ret <2 x i32> %r
}		}

define i32 @uadd_sat_commute_select_ugt_commute_add(i32 %xp, i32 %y) {		define i32 @uadd_sat_commute_select_ugt_commute_add(i32 %xp, i32 %y) {
; CHECK-LABEL: @uadd_sat_commute_select_ugt_commute_add(		; CHECK-LABEL: @uadd_sat_commute_select_ugt_commute_add(
; CHECK-NEXT: [[X:%.*]] = srem i32 42, [[XP:%.*]]		; CHECK-NEXT: [[X:%.*]] = srem i32 42, [[XP:%.*]]
; CHECK-NEXT: [[A:%.*]] = add i32 [[X]], [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.uadd.sat.i32(i32 [[X]], i32 [[Y:%.*]])
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[A]], [[Y]]		; CHECK-NEXT: ret i32 [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 -1, i32 [[A]]
; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%x = srem i32 42, %xp ; thwart complexity-based-canonicalization		%x = srem i32 42, %xp ; thwart complexity-based-canonicalization
%notx = xor i32 %x, -1		%notx = xor i32 %x, -1
%a = add i32 %x, %y		%a = add i32 %x, %y
%c = icmp ugt i32 %notx, %y		%c = icmp ugt i32 %notx, %y
%r = select i1 %c, i32 %a, i32 -1		%r = select i1 %c, i32 %a, i32 -1
ret i32 %r		ret i32 %r
}		}
... 28 lines not shown ...	;
ret i32 %r		ret i32 %r
}		}

; The add may include a 'not' op rather than the cmp.		; The add may include a 'not' op rather than the cmp.

define i32 @uadd_sat_not(i32 %x, i32 %y) {		define i32 @uadd_sat_not(i32 %x, i32 %y) {
; CHECK-LABEL: @uadd_sat_not(		; CHECK-LABEL: @uadd_sat_not(
; CHECK-NEXT: [[NOTX:%.*]] = xor i32 [[X:%.*]], -1		; CHECK-NEXT: [[NOTX:%.*]] = xor i32 [[X:%.*]], -1
; CHECK-NEXT: [[A:%.*]] = add i32 [[NOTX]], [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.uadd.sat.i32(i32 [[NOTX]], i32 [[Y:%.*]])
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[A]], [[Y]]		; CHECK-NEXT: ret i32 [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 -1, i32 [[A]]
; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%notx = xor i32 %x, -1		%notx = xor i32 %x, -1
%a = add i32 %notx, %y		%a = add i32 %notx, %y
%c = icmp ult i32 %x, %y		%c = icmp ult i32 %x, %y
%r = select i1 %c, i32 -1, i32 %a		%r = select i1 %c, i32 -1, i32 %a
ret i32 %r		ret i32 %r
}		}

define i32 @uadd_sat_not_commute_add(i32 %xp, i32 %yp) {		define i32 @uadd_sat_not_commute_add(i32 %xp, i32 %yp) {
; CHECK-LABEL: @uadd_sat_not_commute_add(		; CHECK-LABEL: @uadd_sat_not_commute_add(
; CHECK-NEXT: [[X:%.*]] = srem i32 42, [[XP:%.*]]		; CHECK-NEXT: [[X:%.*]] = srem i32 42, [[XP:%.*]]
; CHECK-NEXT: [[Y:%.*]] = urem i32 42, [[YP:%.*]]		; CHECK-NEXT: [[Y:%.*]] = urem i32 42, [[YP:%.*]]
; CHECK-NEXT: [[NOTX:%.*]] = xor i32 [[X]], -1		; CHECK-NEXT: [[NOTX:%.*]] = xor i32 [[X]], -1
; CHECK-NEXT: [[A:%.*]] = add nsw i32 [[Y]], [[NOTX]]		; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.uadd.sat.i32(i32 [[Y]], i32 [[NOTX]])
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[A]], [[Y]]		; CHECK-NEXT: ret i32 [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 -1, i32 [[A]]
; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%x = srem i32 42, %xp ; thwart complexity-based-canonicalization		%x = srem i32 42, %xp ; thwart complexity-based-canonicalization
%y = urem i32 42, %yp ; thwart complexity-based-canonicalization		%y = urem i32 42, %yp ; thwart complexity-based-canonicalization
%notx = xor i32 %x, -1		%notx = xor i32 %x, -1
%a = add i32 %y, %notx		%a = add i32 %y, %notx
%c = icmp ult i32 %x, %y		%c = icmp ult i32 %x, %y
%r = select i1 %c, i32 -1, i32 %a		%r = select i1 %c, i32 -1, i32 %a
ret i32 %r		ret i32 %r
}		}

define i32 @uadd_sat_not_ugt(i32 %x, i32 %y) {		define i32 @uadd_sat_not_ugt(i32 %x, i32 %y) {
; CHECK-LABEL: @uadd_sat_not_ugt(		; CHECK-LABEL: @uadd_sat_not_ugt(
; CHECK-NEXT: [[NOTX:%.*]] = xor i32 [[X:%.*]], -1		; CHECK-NEXT: [[NOTX:%.*]] = xor i32 [[X:%.*]], -1
; CHECK-NEXT: [[A:%.*]] = add i32 [[NOTX]], [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.uadd.sat.i32(i32 [[NOTX]], i32 [[Y:%.*]])
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[A]], [[Y]]		; CHECK-NEXT: ret i32 [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 -1, i32 [[A]]
; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%notx = xor i32 %x, -1		%notx = xor i32 %x, -1
%a = add i32 %notx, %y		%a = add i32 %notx, %y
%c = icmp ugt i32 %y, %x		%c = icmp ugt i32 %y, %x
%r = select i1 %c, i32 -1, i32 %a		%r = select i1 %c, i32 -1, i32 %a
ret i32 %r		ret i32 %r
}		}

define <2 x i32> @uadd_sat_not_ugt_commute_add(<2 x i32> %x, <2 x i32> %yp) {		define <2 x i32> @uadd_sat_not_ugt_commute_add(<2 x i32> %x, <2 x i32> %yp) {
; CHECK-LABEL: @uadd_sat_not_ugt_commute_add(		; CHECK-LABEL: @uadd_sat_not_ugt_commute_add(
; CHECK-NEXT: [[Y:%.*]] = sdiv <2 x i32> [[YP:%.*]], <i32 2442, i32 4242>		; CHECK-NEXT: [[Y:%.*]] = sdiv <2 x i32> [[YP:%.*]], <i32 2442, i32 4242>
; CHECK-NEXT: [[NOTX:%.*]] = xor <2 x i32> [[X:%.*]], <i32 -1, i32 -1>		; CHECK-NEXT: [[NOTX:%.*]] = xor <2 x i32> [[X:%.*]], <i32 -1, i32 -1>
; CHECK-NEXT: [[A:%.*]] = add <2 x i32> [[Y]], [[NOTX]]		; CHECK-NEXT: [[TMP1:%.*]] = call <2 x i32> @llvm.uadd.sat.v2i32(<2 x i32> [[Y]], <2 x i32> [[NOTX]])
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult <2 x i32> [[A]], [[Y]]		; CHECK-NEXT: ret <2 x i32> [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select <2 x i1> [[TMP1]], <2 x i32> <i32 -1, i32 -1>, <2 x i32> [[A]]
; CHECK-NEXT: ret <2 x i32> [[TMP2]]
;		;
%y = sdiv <2 x i32> %yp, <i32 2442, i32 4242> ; thwart complexity-based-canonicalization		%y = sdiv <2 x i32> %yp, <i32 2442, i32 4242> ; thwart complexity-based-canonicalization
%notx = xor <2 x i32> %x, <i32 -1, i32 -1>		%notx = xor <2 x i32> %x, <i32 -1, i32 -1>
%a = add <2 x i32> %y, %notx		%a = add <2 x i32> %y, %notx
%c = icmp ugt <2 x i32> %y, %x		%c = icmp ugt <2 x i32> %y, %x
%r = select <2 x i1> %c, <2 x i32> <i32 -1, i32 -1>, <2 x i32> %a		%r = select <2 x i1> %c, <2 x i32> <i32 -1, i32 -1>, <2 x i32> %a
ret <2 x i32> %r		ret <2 x i32> %r
}		}

define i32 @uadd_sat_not_commute_select(i32 %x, i32 %y) {		define i32 @uadd_sat_not_commute_select(i32 %x, i32 %y) {
; CHECK-LABEL: @uadd_sat_not_commute_select(		; CHECK-LABEL: @uadd_sat_not_commute_select(
; CHECK-NEXT: [[NOTX:%.*]] = xor i32 [[X:%.*]], -1		; CHECK-NEXT: [[NOTX:%.*]] = xor i32 [[X:%.*]], -1
; CHECK-NEXT: [[A:%.*]] = add i32 [[NOTX]], [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.uadd.sat.i32(i32 [[NOTX]], i32 [[Y:%.*]])
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[A]], [[Y]]		; CHECK-NEXT: ret i32 [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 -1, i32 [[A]]
; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%notx = xor i32 %x, -1		%notx = xor i32 %x, -1
%a = add i32 %notx, %y		%a = add i32 %notx, %y
%c = icmp ult i32 %y, %x		%c = icmp ult i32 %y, %x
%r = select i1 %c, i32 %a, i32 -1		%r = select i1 %c, i32 %a, i32 -1
ret i32 %r		ret i32 %r
}		}

define i32 @uadd_sat_not_commute_select_commute_add(i32 %x, i32 %yp) {		define i32 @uadd_sat_not_commute_select_commute_add(i32 %x, i32 %yp) {
; CHECK-LABEL: @uadd_sat_not_commute_select_commute_add(		; CHECK-LABEL: @uadd_sat_not_commute_select_commute_add(
; CHECK-NEXT: [[Y:%.*]] = sdiv i32 42, [[YP:%.*]]		; CHECK-NEXT: [[Y:%.*]] = sdiv i32 42, [[YP:%.*]]
; CHECK-NEXT: [[NOTX:%.*]] = xor i32 [[X:%.*]], -1		; CHECK-NEXT: [[NOTX:%.*]] = xor i32 [[X:%.*]], -1
; CHECK-NEXT: [[A:%.*]] = add i32 [[Y]], [[NOTX]]		; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.uadd.sat.i32(i32 [[Y]], i32 [[NOTX]])
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[A]], [[Y]]		; CHECK-NEXT: ret i32 [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 -1, i32 [[A]]
; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%y = sdiv i32 42, %yp ; thwart complexity-based-canonicalization		%y = sdiv i32 42, %yp ; thwart complexity-based-canonicalization
%notx = xor i32 %x, -1		%notx = xor i32 %x, -1
%a = add i32 %y, %notx		%a = add i32 %y, %notx
%c = icmp ult i32 %y, %x		%c = icmp ult i32 %y, %x
%r = select i1 %c, i32 %a, i32 -1		%r = select i1 %c, i32 %a, i32 -1
ret i32 %r		ret i32 %r
}		}

define <2 x i32> @uadd_sat_not_commute_select_ugt(<2 x i32> %xp, <2 x i32> %yp) {		define <2 x i32> @uadd_sat_not_commute_select_ugt(<2 x i32> %xp, <2 x i32> %yp) {
; CHECK-LABEL: @uadd_sat_not_commute_select_ugt(		; CHECK-LABEL: @uadd_sat_not_commute_select_ugt(
; CHECK-NEXT: [[X:%.*]] = urem <2 x i32> <i32 42, i32 -42>, [[XP:%.*]]		; CHECK-NEXT: [[X:%.*]] = urem <2 x i32> <i32 42, i32 -42>, [[XP:%.*]]
; CHECK-NEXT: [[Y:%.*]] = srem <2 x i32> <i32 12, i32 412>, [[YP:%.*]]		; CHECK-NEXT: [[Y:%.*]] = srem <2 x i32> <i32 12, i32 412>, [[YP:%.*]]
; CHECK-NEXT: [[NOTX:%.*]] = xor <2 x i32> [[X]], <i32 -1, i32 -1>		; CHECK-NEXT: [[NOTX:%.*]] = xor <2 x i32> [[X]], <i32 -1, i32 -1>
; CHECK-NEXT: [[A:%.*]] = add <2 x i32> [[Y]], [[NOTX]]		; CHECK-NEXT: [[TMP1:%.*]] = call <2 x i32> @llvm.uadd.sat.v2i32(<2 x i32> [[Y]], <2 x i32> [[NOTX]])
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult <2 x i32> [[A]], [[Y]]		; CHECK-NEXT: ret <2 x i32> [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select <2 x i1> [[TMP1]], <2 x i32> <i32 -1, i32 -1>, <2 x i32> [[A]]
; CHECK-NEXT: ret <2 x i32> [[TMP2]]
;		;
%x = urem <2 x i32> <i32 42, i32 -42>, %xp ; thwart complexity-based-canonicalization		%x = urem <2 x i32> <i32 42, i32 -42>, %xp ; thwart complexity-based-canonicalization
%y = srem <2 x i32> <i32 12, i32 412>, %yp ; thwart complexity-based-canonicalization		%y = srem <2 x i32> <i32 12, i32 412>, %yp ; thwart complexity-based-canonicalization
%notx = xor <2 x i32> %x, <i32 -1, i32 -1>		%notx = xor <2 x i32> %x, <i32 -1, i32 -1>
%a = add <2 x i32> %y, %notx		%a = add <2 x i32> %y, %notx
%c = icmp ugt <2 x i32> %x, %y		%c = icmp ugt <2 x i32> %x, %y
%r = select <2 x i1> %c, <2 x i32> %a, <2 x i32> <i32 -1, i32 -1>		%r = select <2 x i1> %c, <2 x i32> %a, <2 x i32> <i32 -1, i32 -1>
ret <2 x i32> %r		ret <2 x i32> %r
}		}

define i32 @uadd_sat_not_commute_select_ugt_commute_add(i32 %x, i32 %y) {		define i32 @uadd_sat_not_commute_select_ugt_commute_add(i32 %x, i32 %y) {
; CHECK-LABEL: @uadd_sat_not_commute_select_ugt_commute_add(		; CHECK-LABEL: @uadd_sat_not_commute_select_ugt_commute_add(
; CHECK-NEXT: [[NOTX:%.*]] = xor i32 [[X:%.*]], -1		; CHECK-NEXT: [[NOTX:%.*]] = xor i32 [[X:%.*]], -1
; CHECK-NEXT: [[A:%.*]] = add i32 [[NOTX]], [[Y:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.uadd.sat.i32(i32 [[NOTX]], i32 [[Y:%.*]])
; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i32 [[A]], [[Y]]		; CHECK-NEXT: ret i32 [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 -1, i32 [[A]]
; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%notx = xor i32 %x, -1		%notx = xor i32 %x, -1
%a = add i32 %notx, %y		%a = add i32 %notx, %y
%c = icmp ugt i32 %x, %y		%c = icmp ugt i32 %x, %y
%r = select i1 %c, i32 %a, i32 -1		%r = select i1 %c, i32 %a, i32 -1
ret i32 %r		ret i32 %r
}		}

define i32 @uadd_sat_constant(i32 %x) {		define i32 @uadd_sat_constant(i32 %x) {
; CHECK-LABEL: @uadd_sat_constant(		; CHECK-LABEL: @uadd_sat_constant(
; CHECK-NEXT: [[A:%.*]] = add i32 [[X:%.*]], 42		; CHECK-NEXT: [[A:%.*]] = add i32 [[X:%.*]], 42
; CHECK-NEXT: [[C:%.*]] = icmp ugt i32 [[X]], -43		; CHECK-NEXT: [[C:%.*]] = icmp ugt i32 [[X]], -43
; CHECK-NEXT: [[R:%.*]] = select i1 [[C]], i32 -1, i32 [[A]]		; CHECK-NEXT: [[R:%.*]] = select i1 [[C]], i32 -1, i32 [[A]]
; CHECK-NEXT: ret i32 [[R]]		; CHECK-NEXT: ret i32 [[R]]
;		;
%a = add i32 %x, 42		%a = add i32 %x, 42
%c = icmp ugt i32 %x, -43		%c = icmp ugt i32 %x, -43
%r = select i1 %c, i32 -1, i32 %a		%r = select i1 %c, i32 -1, i32 %a
ret i32 %r		ret i32 %r
}		}

define i32 @uadd_sat_constant_commute(i32 %x) {		define i32 @uadd_sat_constant_commute(i32 %x) {
; CHECK-LABEL: @uadd_sat_constant_commute(		; CHECK-LABEL: @uadd_sat_constant_commute(
; CHECK-NEXT: [[A:%.*]] = add i32 [[X:%.*]], 42		; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.uadd.sat.i32(i32 [[X:%.*]], i32 42)
; CHECK-NEXT: [[TMP1:%.*]] = icmp ugt i32 [[X]], -43		; CHECK-NEXT: ret i32 [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i32 -1, i32 [[A]]
; CHECK-NEXT: ret i32 [[TMP2]]
;		;
%a = add i32 %x, 42		%a = add i32 %x, 42
%c = icmp ult i32 %x, -43		%c = icmp ult i32 %x, -43
%r = select i1 %c, i32 %a, i32 -1		%r = select i1 %c, i32 %a, i32 -1
ret i32 %r		ret i32 %r
}		}

define <4 x i32> @uadd_sat_constant_vec(<4 x i32> %x) {		define <4 x i32> @uadd_sat_constant_vec(<4 x i32> %x) {
; CHECK-LABEL: @uadd_sat_constant_vec(		; CHECK-LABEL: @uadd_sat_constant_vec(
; CHECK-NEXT: [[A:%.*]] = add <4 x i32> [[X:%.*]], <i32 42, i32 42, i32 42, i32 42>		; CHECK-NEXT: [[A:%.*]] = add <4 x i32> [[X:%.*]], <i32 42, i32 42, i32 42, i32 42>
; CHECK-NEXT: [[C:%.*]] = icmp ugt <4 x i32> [[X]], <i32 -43, i32 -43, i32 -43, i32 -43>		; CHECK-NEXT: [[C:%.*]] = icmp ugt <4 x i32> [[X]], <i32 -43, i32 -43, i32 -43, i32 -43>
; CHECK-NEXT: [[R:%.*]] = select <4 x i1> [[C]], <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, <4 x i32> [[A]]		; CHECK-NEXT: [[R:%.*]] = select <4 x i1> [[C]], <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, <4 x i32> [[A]]
; CHECK-NEXT: ret <4 x i32> [[R]]		; CHECK-NEXT: ret <4 x i32> [[R]]
;		;
%a = add <4 x i32> %x, <i32 42, i32 42, i32 42, i32 42>		%a = add <4 x i32> %x, <i32 42, i32 42, i32 42, i32 42>
%c = icmp ugt <4 x i32> %x, <i32 -43, i32 -43, i32 -43, i32 -43>		%c = icmp ugt <4 x i32> %x, <i32 -43, i32 -43, i32 -43, i32 -43>
%r = select <4 x i1> %c, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, <4 x i32> %a		%r = select <4 x i1> %c, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, <4 x i32> %a
ret <4 x i32> %r		ret <4 x i32> %r
}		}

define <4 x i32> @uadd_sat_constant_vec_commute(<4 x i32> %x) {		define <4 x i32> @uadd_sat_constant_vec_commute(<4 x i32> %x) {
; CHECK-LABEL: @uadd_sat_constant_vec_commute(		; CHECK-LABEL: @uadd_sat_constant_vec_commute(
; CHECK-NEXT: [[A:%.*]] = add <4 x i32> [[X:%.*]], <i32 42, i32 42, i32 42, i32 42>		; CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.uadd.sat.v4i32(<4 x i32> [[X:%.*]], <4 x i32> <i32 42, i32 42, i32 42, i32 42>)
; CHECK-NEXT: [[TMP1:%.*]] = icmp ugt <4 x i32> [[X]], <i32 -43, i32 -43, i32 -43, i32 -43>		; CHECK-NEXT: ret <4 x i32> [[TMP1]]
; CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, <4 x i32> [[A]]
; CHECK-NEXT: ret <4 x i32> [[TMP2]]
;		;
%a = add <4 x i32> %x, <i32 42, i32 42, i32 42, i32 42>		%a = add <4 x i32> %x, <i32 42, i32 42, i32 42, i32 42>
%c = icmp ult <4 x i32> %x, <i32 -43, i32 -43, i32 -43, i32 -43>		%c = icmp ult <4 x i32> %x, <i32 -43, i32 -43, i32 -43, i32 -43>
%r = select <4 x i1> %c, <4 x i32> %a, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>		%r = select <4 x i1> %c, <4 x i32> %a, <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
ret <4 x i32> %r		ret <4 x i32> %r
}		}
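
Note on the two `_commute` tests above: the fold to `llvm.uadd.sat` can be checked exhaustively at a small bit width. What follows is a minimal Python sketch, not part of the patch. The helper names `uadd_sat`, `select_ult_form`, and `uadd_forms_agree` are hypothetical, and an 8-bit width stands in for i32 on the assumption that the identity does not depend on the width.

```python
def uadd_sat(x, y, bits):
    """Unsigned saturating add: clamp x + y to the largest value of the width."""
    mask = (1 << bits) - 1
    return min(x + y, mask)


def select_ult_form(x, c, bits):
    """Models: %a = add %x, C; %c = icmp ult %x, ~C; select %c, %a, -1."""
    mask = (1 << bits) - 1
    a = (x + c) & mask      # wrapping add, like a plain IR `add`
    not_c = ~c & mask       # ~42 is -43, the compare constant in the tests
    return a if x < not_c else mask


def uadd_forms_agree(bits):
    """Compare both forms for every x and every constant C at this width."""
    return all(
        uadd_sat(x, c, bits) == select_ult_form(x, c, bits)
        for x in range(1 << bits)
        for c in range(1 << bits)
    )


if __name__ == "__main__":
    print(uadd_forms_agree(8))
```

The compare constant is the bitwise complement of the added constant, which is why `-43` pairs with `42` in these tests.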

define <4 x i32> @uadd_sat_constant_vec_commute_undefs(<4 x i32> %x) {		define <4 x i32> @uadd_sat_constant_vec_commute_undefs(<4 x i32> %x) {
[... 12 lines not shown ...]

test/Transforms/InstCombine/unsigned_saturated_sub.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -instcombine -S < %s | FileCheck %s			; RUN: opt -instcombine -S < %s | FileCheck %s

	; Transforms for unsigned saturated subtraction idioms are tested here.			; Transforms for unsigned saturated subtraction idioms are tested here.
	; In all cases, we want to form a canonical min/max op (the compare and			; In all cases, we want to form a canonical min/max op (the compare and
	; select operands are the same), so that is recognized by the backend.			; select operands are the same), so that is recognized by the backend.
	; The backend recognition is tested in test/CodeGen/X86/psubus.ll.			; The backend recognition is tested in test/CodeGen/X86/psubus.ll.
				spatel: Remove/update this comment.

	declare void @use(i64)			declare void @use(i64)

	; (a > b) ? a - b : 0 -> ((a > b) ? a : b) - b)			; (a > b) ? a - b : 0 -> ((a > b) ? a : b) - b)

	define i64 @max_sub_ugt(i64 %a, i64 %b) {			define i64 @max_sub_ugt(i64 %a, i64 %b) {
	; CHECK-LABEL: @max_sub_ugt(			; CHECK-LABEL: @max_sub_ugt(
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ugt i64 [[A:%.*]], [[B:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.usub.sat.i64(i64 [[A:%.*]], i64 [[B:%.*]])
	; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i64 [[A]], i64 [[B]]			; CHECK-NEXT: ret i64 [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = sub i64 [[TMP2]], [[B]]
	; CHECK-NEXT: ret i64 [[TMP3]]
	;			;
	%cmp = icmp ugt i64 %a, %b			%cmp = icmp ugt i64 %a, %b
	%sub = sub i64 %a, %b			%sub = sub i64 %a, %b
	%sel = select i1 %cmp, i64 %sub ,i64 0			%sel = select i1 %cmp, i64 %sub ,i64 0
	ret i64 %sel			ret i64 %sel
	}			}

	; (a >= b) ? a - b : 0 -> ((a >= b) ? a : b) - b)			; (a >= b) ? a - b : 0 -> ((a >= b) ? a : b) - b)

	define i64 @max_sub_uge(i64 %a, i64 %b) {			define i64 @max_sub_uge(i64 %a, i64 %b) {
	; CHECK-LABEL: @max_sub_uge(			; CHECK-LABEL: @max_sub_uge(
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i64 [[A:%.*]], [[B:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.usub.sat.i64(i64 [[A:%.*]], i64 [[B:%.*]])
	; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i64 [[B]], i64 [[A]]			; CHECK-NEXT: ret i64 [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = sub i64 [[TMP2]], [[B]]
	; CHECK-NEXT: ret i64 [[TMP3]]
	;			;
	%cmp = icmp uge i64 %a, %b			%cmp = icmp uge i64 %a, %b
	%sub = sub i64 %a, %b			%sub = sub i64 %a, %b
	%sel = select i1 %cmp, i64 %sub ,i64 0			%sel = select i1 %cmp, i64 %sub ,i64 0
	ret i64 %sel			ret i64 %sel
	}			}

	; Again, with vectors:			; Again, with vectors:
	; (a > b) ? a - b : 0 -> ((a > b) ? a : b) - b)			; (a > b) ? a - b : 0 -> ((a > b) ? a : b) - b)

	define <4 x i32> @max_sub_ugt_vec(<4 x i32> %a, <4 x i32> %b) {			define <4 x i32> @max_sub_ugt_vec(<4 x i32> %a, <4 x i32> %b) {
	; CHECK-LABEL: @max_sub_ugt_vec(			; CHECK-LABEL: @max_sub_ugt_vec(
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ugt <4 x i32> [[A:%.*]], [[B:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.usub.sat.v4i32(<4 x i32> [[A:%.*]], <4 x i32> [[B:%.*]])
	; CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[A]], <4 x i32> [[B]]			; CHECK-NEXT: ret <4 x i32> [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = sub <4 x i32> [[TMP2]], [[B]]
	; CHECK-NEXT: ret <4 x i32> [[TMP3]]
	;			;
	%cmp = icmp ugt <4 x i32> %a, %b			%cmp = icmp ugt <4 x i32> %a, %b
	%sub = sub <4 x i32> %a, %b			%sub = sub <4 x i32> %a, %b
	%sel = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> zeroinitializer			%sel = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> zeroinitializer
	ret <4 x i32> %sel			ret <4 x i32> %sel
	}			}

	; Use extra ops to thwart icmp swapping canonicalization.			; Use extra ops to thwart icmp swapping canonicalization.
	; (b < a) ? a - b : 0 -> ((a > b) ? a : b) - b)			; (b < a) ? a - b : 0 -> ((a > b) ? a : b) - b)

	define i64 @max_sub_ult(i64 %a, i64 %b) {			define i64 @max_sub_ult(i64 %a, i64 %b) {
	; CHECK-LABEL: @max_sub_ult(			; CHECK-LABEL: @max_sub_ult(
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i64 [[B:%.*]], [[A:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.usub.sat.i64(i64 [[A:%.*]], i64 [[B:%.*]])
	; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i64 [[A]], i64 [[B]]
	; CHECK-NEXT: [[TMP3:%.*]] = sub i64 [[TMP2]], [[B]]
	; CHECK-NEXT: [[EXTRASUB:%.*]] = sub i64 [[B]], [[A]]			; CHECK-NEXT: [[EXTRASUB:%.*]] = sub i64 [[B]], [[A]]
	; CHECK-NEXT: call void @use(i64 [[EXTRASUB]])			; CHECK-NEXT: call void @use(i64 [[EXTRASUB]])
	; CHECK-NEXT: ret i64 [[TMP3]]			; CHECK-NEXT: ret i64 [[TMP1]]
	;			;
	%cmp = icmp ult i64 %b, %a			%cmp = icmp ult i64 %b, %a
	%sub = sub i64 %a, %b			%sub = sub i64 %a, %b
	%sel = select i1 %cmp, i64 %sub ,i64 0			%sel = select i1 %cmp, i64 %sub ,i64 0
	%extrasub = sub i64 %b, %a			%extrasub = sub i64 %b, %a
	call void @use(i64 %extrasub)			call void @use(i64 %extrasub)
	ret i64 %sel			ret i64 %sel
	}			}

	; (b > a) ? 0 : a - b -> ((a > b) ? a : b) - b)			; (b > a) ? 0 : a - b -> ((a > b) ? a : b) - b)

	define i64 @max_sub_ugt_sel_swapped(i64 %a, i64 %b) {			define i64 @max_sub_ugt_sel_swapped(i64 %a, i64 %b) {
	; CHECK-LABEL: @max_sub_ugt_sel_swapped(			; CHECK-LABEL: @max_sub_ugt_sel_swapped(
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ugt i64 [[B:%.*]], [[A:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.usub.sat.i64(i64 [[A:%.*]], i64 [[B:%.*]])
	; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i64 [[B]], i64 [[A]]
	; CHECK-NEXT: [[TMP3:%.*]] = sub i64 [[TMP2]], [[B]]
	; CHECK-NEXT: [[EXTRASUB:%.*]] = sub i64 [[B]], [[A]]			; CHECK-NEXT: [[EXTRASUB:%.*]] = sub i64 [[B]], [[A]]
	; CHECK-NEXT: call void @use(i64 [[EXTRASUB]])			; CHECK-NEXT: call void @use(i64 [[EXTRASUB]])
	; CHECK-NEXT: ret i64 [[TMP3]]			; CHECK-NEXT: ret i64 [[TMP1]]
	;			;
	%cmp = icmp ugt i64 %b, %a			%cmp = icmp ugt i64 %b, %a
	%sub = sub i64 %a, %b			%sub = sub i64 %a, %b
	%sel = select i1 %cmp, i64 0 ,i64 %sub			%sel = select i1 %cmp, i64 0 ,i64 %sub
	%extrasub = sub i64 %b, %a			%extrasub = sub i64 %b, %a
	call void @use(i64 %extrasub)			call void @use(i64 %extrasub)
	ret i64 %sel			ret i64 %sel
	}			}

	; (a < b) ? 0 : a - b -> ((a > b) ? a : b) - b)			; (a < b) ? 0 : a - b -> ((a > b) ? a : b) - b)

	define i64 @max_sub_ult_sel_swapped(i64 %a, i64 %b) {			define i64 @max_sub_ult_sel_swapped(i64 %a, i64 %b) {
	; CHECK-LABEL: @max_sub_ult_sel_swapped(			; CHECK-LABEL: @max_sub_ult_sel_swapped(
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i64 [[A:%.*]], [[B:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.usub.sat.i64(i64 [[A:%.*]], i64 [[B:%.*]])
	; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i64 [[B]], i64 [[A]]			; CHECK-NEXT: ret i64 [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = sub i64 [[TMP2]], [[B]]
	; CHECK-NEXT: ret i64 [[TMP3]]
	;			;
	%cmp = icmp ult i64 %a, %b			%cmp = icmp ult i64 %a, %b
	%sub = sub i64 %a, %b			%sub = sub i64 %a, %b
	%sel = select i1 %cmp, i64 0 ,i64 %sub			%sel = select i1 %cmp, i64 0 ,i64 %sub
	ret i64 %sel			ret i64 %sel
	}			}
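
Note on the `max_sub_*` tests above: the left column checks the old `max(a, b) - b` canonical form and the right column checks a single `llvm.usub.sat` call. The sketch below is a minimal Python check that both agree with the original compare-and-select idiom. It is not part of the patch; the function names are hypothetical, and 8 bits stand in for i64 on the assumption that the identity is width-independent.

```python
def usub_sat(a, b):
    """Unsigned saturating subtract: a - b, clamped at zero."""
    return max(a - b, 0)


def select_form(a, b, bits):
    """Models the test input: (a > b) ? a - b : 0, with a wrapping sub."""
    mask = (1 << bits) - 1
    return (a - b) & mask if a > b else 0


def max_form(a, b, bits):
    """Models the old canonical output: ((a > b) ? a : b) - b."""
    mask = (1 << bits) - 1
    return (max(a, b) - b) & mask


def usub_forms_agree(bits):
    """Check all three forms for every pair of values at this width."""
    return all(
        usub_sat(a, b) == select_form(a, b, bits) == max_form(a, b, bits)
        for a in range(1 << bits)
        for b in range(1 << bits)
    )


if __name__ == "__main__":
    print(usub_forms_agree(8))
```

The `uge` and swapped-select variants differ from `select_form` only at `a == b`, where every form yields zero.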

	; ((a > b) ? b - a : 0) -> (b - ((a > b) ? a : b))			; ((a > b) ? b - a : 0) -> (b - ((a > b) ? a : b))

	define i64 @neg_max_sub_ugt(i64 %a, i64 %b) {			define i64 @neg_max_sub_ugt(i64 %a, i64 %b) {
	; CHECK-LABEL: @neg_max_sub_ugt(			; CHECK-LABEL: @neg_max_sub_ugt(
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ugt i64 [[A:%.*]], [[B:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.usub.sat.i64(i64 [[A:%.*]], i64 [[B:%.*]])
	; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i64 [[A]], i64 [[B]]			; CHECK-NEXT: [[TMP2:%.*]] = sub i64 0, [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = sub i64 [[B]], [[TMP2]]
	; CHECK-NEXT: [[EXTRASUB:%.*]] = sub i64 [[A]], [[B]]			; CHECK-NEXT: [[EXTRASUB:%.*]] = sub i64 [[A]], [[B]]
	; CHECK-NEXT: call void @use(i64 [[EXTRASUB]])			; CHECK-NEXT: call void @use(i64 [[EXTRASUB]])
	; CHECK-NEXT: ret i64 [[TMP3]]			; CHECK-NEXT: ret i64 [[TMP2]]
	;			;
	%cmp = icmp ugt i64 %a, %b			%cmp = icmp ugt i64 %a, %b
	%sub = sub i64 %b, %a			%sub = sub i64 %b, %a
	%sel = select i1 %cmp, i64 %sub ,i64 0			%sel = select i1 %cmp, i64 %sub ,i64 0
	%extrasub = sub i64 %a, %b			%extrasub = sub i64 %a, %b
	call void @use(i64 %extrasub)			call void @use(i64 %extrasub)
	ret i64 %sel			ret i64 %sel
	}			}

	; ((b < a) ? b - a : 0) -> - ((a > b) ? a : b) - b)			; ((b < a) ? b - a : 0) -> - ((a > b) ? a : b) - b)

	define i64 @neg_max_sub_ult(i64 %a, i64 %b) {			define i64 @neg_max_sub_ult(i64 %a, i64 %b) {
	; CHECK-LABEL: @neg_max_sub_ult(			; CHECK-LABEL: @neg_max_sub_ult(
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ugt i64 [[A:%.*]], [[B:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.usub.sat.i64(i64 [[A:%.*]], i64 [[B:%.*]])
	; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i64 [[A]], i64 [[B]]			; CHECK-NEXT: [[TMP2:%.*]] = sub i64 0, [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = sub i64 [[B]], [[TMP2]]			; CHECK-NEXT: ret i64 [[TMP2]]
	; CHECK-NEXT: ret i64 [[TMP3]]
	;			;
	%cmp = icmp ult i64 %b, %a			%cmp = icmp ult i64 %b, %a
	%sub = sub i64 %b, %a			%sub = sub i64 %b, %a
	%sel = select i1 %cmp, i64 %sub ,i64 0			%sel = select i1 %cmp, i64 %sub ,i64 0
	ret i64 %sel			ret i64 %sel
	}			}

	; ((b > a) ? 0 : b - a) -> - ((a > b) ? a : b) - b)			; ((b > a) ? 0 : b - a) -> - ((a > b) ? a : b) - b)
				dmgreen: Do these comments need updating?

	define i64 @neg_max_sub_ugt_sel_swapped(i64 %a, i64 %b) {			define i64 @neg_max_sub_ugt_sel_swapped(i64 %a, i64 %b) {
	; CHECK-LABEL: @neg_max_sub_ugt_sel_swapped(			; CHECK-LABEL: @neg_max_sub_ugt_sel_swapped(
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i64 [[A:%.*]], [[B:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.usub.sat.i64(i64 [[A:%.*]], i64 [[B:%.*]])
	; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i64 [[B]], i64 [[A]]			; CHECK-NEXT: [[TMP2:%.*]] = sub i64 0, [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = sub i64 [[B]], [[TMP2]]			; CHECK-NEXT: ret i64 [[TMP2]]
	; CHECK-NEXT: ret i64 [[TMP3]]
	;			;
	%cmp = icmp ugt i64 %b, %a			%cmp = icmp ugt i64 %b, %a
	%sub = sub i64 %b, %a			%sub = sub i64 %b, %a
	%sel = select i1 %cmp, i64 0 ,i64 %sub			%sel = select i1 %cmp, i64 0 ,i64 %sub
	ret i64 %sel			ret i64 %sel
	}			}

	; ((a < b) ? 0 : b - a) -> - ((a > b) ? a : b) - b)			; ((a < b) ? 0 : b - a) -> - ((a > b) ? a : b) - b)

	define i64 @neg_max_sub_ult_sel_swapped(i64 %a, i64 %b) {			define i64 @neg_max_sub_ult_sel_swapped(i64 %a, i64 %b) {
	; CHECK-LABEL: @neg_max_sub_ult_sel_swapped(			; CHECK-LABEL: @neg_max_sub_ult_sel_swapped(
	; CHECK-NEXT: [[TMP1:%.*]] = icmp ult i64 [[A:%.*]], [[B:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = call i64 @llvm.usub.sat.i64(i64 [[A:%.*]], i64 [[B:%.*]])
	; CHECK-NEXT: [[TMP2:%.*]] = select i1 [[TMP1]], i64 [[B]], i64 [[A]]			; CHECK-NEXT: [[TMP2:%.*]] = sub i64 0, [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = sub i64 [[B]], [[TMP2]]
	; CHECK-NEXT: [[EXTRASUB:%.*]] = sub i64 [[A]], [[B]]			; CHECK-NEXT: [[EXTRASUB:%.*]] = sub i64 [[A]], [[B]]
	; CHECK-NEXT: call void @use(i64 [[EXTRASUB]])			; CHECK-NEXT: call void @use(i64 [[EXTRASUB]])
	; CHECK-NEXT: ret i64 [[TMP3]]			; CHECK-NEXT: ret i64 [[TMP2]]
	;			;
	%cmp = icmp ult i64 %a, %b			%cmp = icmp ult i64 %a, %b
	%sub = sub i64 %b, %a			%sub = sub i64 %b, %a
	%sel = select i1 %cmp, i64 0 ,i64 %sub			%sel = select i1 %cmp, i64 0 ,i64 %sub
	%extrasub = sub i64 %a, %b			%extrasub = sub i64 %a, %b
	call void @use(i64 %extrasub)			call void @use(i64 %extrasub)
	ret i64 %sel			ret i64 %sel
	}			}
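
Note on the `neg_max_sub_*` tests above: the right column emits `llvm.usub.sat(a, b)` followed by `sub i64 0, ...`, which is the extra `sub` mentioned in the discussion. A minimal Python sketch of why the negation is needed follows. It is not part of the patch; the names are hypothetical and 8 bits stand in for i64, assuming the identity is width-independent.

```python
def usub_sat(a, b):
    """Unsigned saturating subtract: a - b, clamped at zero."""
    return max(a - b, 0)


def neg_select_form(a, b, bits):
    """Models the test input: (a > b) ? b - a : 0, with a wrapping sub."""
    mask = (1 << bits) - 1
    return (b - a) & mask if a > b else 0


def neg_usub_sat_form(a, b, bits):
    """Models the new output: 0 - usub.sat(a, b), wrapping at the width."""
    mask = (1 << bits) - 1
    return (0 - usub_sat(a, b)) & mask


def negated_forms_agree(bits):
    """Check the two forms for every pair of values at this width."""
    return all(
        neg_select_form(a, b, bits) == neg_usub_sat_form(a, b, bits)
        for a in range(1 << bits)
        for b in range(1 << bits)
    )


if __name__ == "__main__":
    print(negated_forms_agree(8))
```

When `a > b`, `b - a` wraps to the two's-complement negation of `a - b`; otherwise both forms are zero, so the saturated difference only needs its sign flipped.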