This is an archive of the discontinued LLVM Phabricator instance.

[ValueTracking] Teach computeNumSignBits to understand min/max clamp patterns with constant/splat values
ClosedPublic

Authored by craig.topper on Aug 22 2018, 10:59 AM.

Details

Reviewers

spatel
RKSimon
majnemer
efriedma

Commits

rGbec15b651646: [ValueTracking] Teach computeNumSignBits to understand min/max clamp patterns…
rL340480: [ValueTracking] Teach computeNumSignBits to understand min/max clamp patterns…

Summary

If we have a min/max pair we can do a better job of counting sign bits if we look at them together. This is similar to what is done in the SelectionDAG version of computeNumSignBits for ISD::SMAX/SMIN.
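The reasoning can be sketched outside LLVM: any value clamped to [CLow, CHigh] has at least as many sign bits as the bound with fewer sign bits. The following is a minimal standalone C++ sketch on 16-bit integers, not the patch itself. The names numSignBits, clampSigned, clampSignBits and boundHolds are hypothetical helpers invented for this illustration; numSignBits follows the same counting convention as APInt::getNumSignBits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of leading bits equal to the sign bit, counting the sign bit itself.
static unsigned numSignBits(int16_t V) {
  uint16_t U = static_cast<uint16_t>(V);
  unsigned Sign = (U >> 15) & 1u;
  unsigned N = 1;
  for (int Bit = 14; Bit >= 0 && ((U >> Bit) & 1u) == Sign; --Bit)
    ++N;
  return N;
}

// Models smax(smin(X, CHigh), CLow); the caller must ensure CLow <= CHigh.
static int16_t clampSigned(int16_t X, int16_t CLow, int16_t CHigh) {
  return std::max(std::min(X, CHigh), CLow);
}

// Lower bound on the sign bits of any clamped value, mirroring
// std::min(CLow->getNumSignBits(), CHigh->getNumSignBits()).
static unsigned clampSignBits(int16_t CLow, int16_t CHigh) {
  return std::min(numSignBits(CLow), numSignBits(CHigh));
}

// Exhaustively checks the bound for one clamp range over all 16-bit inputs.
static bool boundHolds(int16_t CLow, int16_t CHigh) {
  unsigned Bound = clampSignBits(CLow, CHigh);
  for (int I = INT16_MIN; I <= INT16_MAX; ++I) {
    int16_t Clamped = clampSigned(static_cast<int16_t>(I), CLow, CHigh);
    if (numSignBits(Clamped) < Bound)
      return false;
  }
  return true;
}
```

For the 12-bit range used in the tests of this revision, clampSignBits(-2048, 2047) evaluates to 5, since both bounds have five leading copies of their sign bit in i16.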

I'm not entirely sure if we can assume the constant is canonicalized to the RHS. So let me know if I need to handle that.

I'm still calling computeNumSignBits on the other input. I don't think SelectionDAG does that. Should the depth limit be bumped by 2 since we went through 2 nodes?

Diff Detail

Repository: rL LLVM

Event Timeline

craig.topper created this revision. Aug 22 2018, 10:59 AM

craig.topper added a reviewer: efriedma. Aug 22 2018, 11:37 AM

I'm not entirely sure if we can assume the constant is canonicalized to the RHS.

Instcombine does this in canonicalizeMinMaxWithConstant if the compare has one use. Probably good enough.

Should the depth limit be bumped by 2 since we went through 2 nodes?

Bumping by 1 should be fine given the pattern-matching is cheap; the point of the depth limit is just to prevent non-linear compile-time, anyway.

Can you factor out the clamp matching into a separate helper?

lib/Analysis/ValueTracking.cpp
2385 ↗	(On Diff #162002)	CLow and CHigh have to be constants so you can perform this "sle" check?

craig.topper added inline comments. Aug 22 2018, 12:50 PM

lib/Analysis/ValueTracking.cpp
2385 ↗	(On Diff #162002)	Maybe we could use computeKnownBits to prove an ordering of Low/High in some other cases using leading zeros/ones?

efriedma added inline comments. Aug 22 2018, 1:29 PM

lib/Analysis/ValueTracking.cpp
2385 ↗	(On Diff #162002)	Yes, that would work... but probably best to leave that out for now, until someone finds a case where it matters.

I'm still calling computeNumSignBits on the other input. I don't think SelectionDAG does that.

Thinking about it a bit more, there's probably a good reason for that: if the number of sign bits for the variable input matters, either the SMIN or the SMAX is a no-op.
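One way to read this observation: if the input is known to have more sign bits than min(signbits(CLow), signbits(CHigh)), then for every such input either the smin or the smax leaves its operand unchanged. The brute-force check below over 8-bit values is an illustrative sketch under that reading and is not part of the review or the patch. numSignBits8, oneOpIsNoOp and claimHoldsForAllRanges are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of leading bits equal to the sign bit, counting the sign bit itself.
static unsigned numSignBits8(int8_t V) {
  uint8_t U = static_cast<uint8_t>(V);
  unsigned Sign = (U >> 7) & 1u;
  unsigned N = 1;
  for (int Bit = 6; Bit >= 0 && ((U >> Bit) & 1u) == Sign; --Bit)
    ++N;
  return N;
}

// For all X with at least K sign bits, is smin(X, CHigh) always X, or is
// smax(smin(X, CHigh), CLow) always equal to smin(X, CHigh)?
static bool oneOpIsNoOp(int8_t CLow, int8_t CHigh, unsigned K) {
  bool MinIsNoOp = true, MaxIsNoOp = true;
  for (int I = INT8_MIN; I <= INT8_MAX; ++I) {
    int8_t X = static_cast<int8_t>(I);
    if (numSignBits8(X) < K)
      continue; // X is outside the set of inputs with K known sign bits.
    int8_t Min = std::min(X, CHigh);
    int8_t Max = std::max(Min, CLow);
    MinIsNoOp = MinIsNoOp && (Min == X);
    MaxIsNoOp = MaxIsNoOp && (Max == Min);
  }
  return MinIsNoOp || MaxIsNoOp;
}

// Checks every range CLow <= CHigh and every K above the clamp-derived bound.
static bool claimHoldsForAllRanges() {
  for (int Lo = INT8_MIN; Lo <= INT8_MAX; ++Lo)
    for (int Hi = Lo; Hi <= INT8_MAX; ++Hi) {
      int8_t CLow = static_cast<int8_t>(Lo);
      int8_t CHigh = static_cast<int8_t>(Hi);
      unsigned Bound = std::min(numSignBits8(CLow), numSignBits8(CHigh));
      for (unsigned K = Bound + 1; K <= 8; ++K)
        if (!oneOpIsNoOp(CLow, CHigh, K))
          return false;
    }
  return true;
}
```

When nothing is known about the input (K = 1), both operations can change the value, which is the case where the constant bounds alone determine the answer.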

How tricky would uniform/non-uniform vector support be?

Add a helper function

In D51112#1209885, @efriedma wrote:

I'm still calling computeNumSignBits on the other input. I don't think SelectionDAG does that.

Thinking about it a bit more, there's probably a good reason for that: if the number of sign bits for the variable input matters, either the SMIN or the SMAX is a no-op.

That's true, but are we capable of making that optimization today?

In D51112#1209937, @craig.topper wrote:

In D51112#1209885, @efriedma wrote:

I'm still calling computeNumSignBits on the other input. I don't think SelectionDAG does that.

Thinking about it a bit more, there's probably a good reason for that: if the number of sign bits for the variable input matters, either the SMIN or the SMAX is a no-op.

That's true, but are we capable of making that optimization today?

Not sure off the top of my head, but it's easy to implement.

Remove the recursion for clamps

LGTM

This revision is now accepted and ready to land. Aug 22 2018, 3:56 PM

Closed by commit rL340480: [ValueTracking] Teach computeNumSignBits to understand min/max clamp patterns… (authored by ctopper). Aug 22 2018, 4:28 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                        Size
llvm/trunk/lib/Analysis/ValueTracking.cpp                   38 lines
llvm/trunk/test/Transforms/InstCombine/max_known_bits.ll    78 lines

Diff 162103

llvm/trunk/lib/Analysis/ValueTracking.cpp

(2,203 lines not shown)
 /// for all of the elements in the vector.
 bool MaskedValueIsZero(const Value *V, const APInt &Mask, unsigned Depth,
                        const Query &Q) {
   KnownBits Known(Mask.getBitWidth());
   computeKnownBits(V, Known, Depth, Q);
   return Mask.isSubsetOf(Known.Zero);
 }

+// Match a signed min+max clamp pattern like smax(smin(In, CHigh), CLow).
+// Returns the input and lower/upper bounds.
+static bool isSignedMinMaxClamp(const Value *Select, const Value *&In,
+                                const APInt *&CLow, const APInt *&CHigh) {
+  assert(isa<SelectInst>(Select) && "Input should be a SelectInst!");
+
+  const Value *LHS, *RHS, *LHS2, *RHS2;
+  SelectPatternFlavor SPF = matchSelectPattern(Select, LHS, RHS).Flavor;
+  if (SPF != SPF_SMAX && SPF != SPF_SMIN)
+    return false;
+
+  if (!match(RHS, m_APInt(CLow)))
+    return false;
+
+  SelectPatternFlavor SPF2 = matchSelectPattern(LHS, LHS2, RHS2).Flavor;
+  if (getInverseMinMaxFlavor(SPF) != SPF2)
+    return false;
+
+  if (!match(RHS2, m_APInt(CHigh)))
+    return false;
+
+  if (SPF == SPF_SMIN)
+    std::swap(CLow, CHigh);
+
+  In = LHS2;
+  return CLow->sle(*CHigh);
+}
+
 /// For vector constants, loop over the elements and find the constant with the
 /// minimum number of sign bits. Return 0 if the value is not a vector constant
 /// or if any element was not analyzed; otherwise, return the count for the
 /// element with the minimum number of sign bits.
 static unsigned computeNumSignBitsVectorConstant(const Value *V,
                                                  unsigned TyBits) {
   const auto *CV = dyn_cast<Constant>(V);
   if (!CV || !CV->getType()->isVectorTy())
(145 lines not shown)
     if (Tmp != 1) {
       Tmp2 = ComputeNumSignBits(U->getOperand(1), Depth + 1, Q);
       FirstAnswer = std::min(Tmp, Tmp2);
       // We computed what we know about the sign bits as our first
       // answer. Now proceed to the generic code that uses
       // computeKnownBits, and pick whichever answer is better.
     }
     break;

-  case Instruction::Select:
+  case Instruction::Select: {
+    // If we have a clamp pattern, we know that the number of sign bits will be
+    // the minimum of the clamp min/max range.
+    const Value *X;
+    const APInt *CLow, *CHigh;
+    if (isSignedMinMaxClamp(U, X, CLow, CHigh))
+      return std::min(CLow->getNumSignBits(), CHigh->getNumSignBits());
+
     Tmp = ComputeNumSignBits(U->getOperand(1), Depth + 1, Q);
     if (Tmp == 1) break;
     Tmp2 = ComputeNumSignBits(U->getOperand(2), Depth + 1, Q);
     return std::min(Tmp, Tmp2);
+  }

   case Instruction::Add:
     // Add can have at most one carry bit. Thus we know that the output
     // is, at worst, one more bit than the inputs.
     Tmp = ComputeNumSignBits(U->getOperand(0), Depth + 1, Q);
     if (Tmp == 1) break;

     // Special case decrementing a value (ADD X, -1):
(2,835 lines not shown)

llvm/trunk/test/Transforms/InstCombine/max_known_bits.ll

(11 lines not shown)
 ;
   %t2 = zext i16 %t1 to i32
   %t3 = icmp ult i32 %t2, 255
   %t4 = select i1 %t3, i32 %t2, i32 255
   %t5 = trunc i32 %t4 to i16
   %t6 = and i16 %t5, 255
   ret i16 %t6
 }

+; This contains a min/max pair to clamp a value to 12 bits.
+; By analyzing the clamp pattern, we can tell the add doesn't have signed overflow.
+define i16 @min_max_clamp(i16 %x) {
+; CHECK-LABEL: @min_max_clamp(
+; CHECK-NEXT: [[A:%.*]] = icmp sgt i16 [[X:%.*]], -2048
+; CHECK-NEXT: [[B:%.*]] = select i1 [[A]], i16 [[X]], i16 -2048
+; CHECK-NEXT: [[C:%.*]] = icmp slt i16 [[B]], 2047
+; CHECK-NEXT: [[D:%.*]] = select i1 [[C]], i16 [[B]], i16 2047
+; CHECK-NEXT: [[E:%.*]] = add nsw i16 [[D]], 1
+; CHECK-NEXT: ret i16 [[E]]
+;
+  %a = icmp sgt i16 %x, -2048
+  %b = select i1 %a, i16 %x, i16 -2048
+  %c = icmp slt i16 %b, 2047
+  %d = select i1 %c, i16 %b, i16 2047
+  %e = add i16 %d, 1
+  ret i16 %e
+}
+
+; Same as above with min/max reversed.
+define i16 @min_max_clamp_2(i16 %x) {
+; CHECK-LABEL: @min_max_clamp_2(
+; CHECK-NEXT: [[A:%.*]] = icmp slt i16 [[X:%.*]], 2047
+; CHECK-NEXT: [[B:%.*]] = select i1 [[A]], i16 [[X]], i16 2047
+; CHECK-NEXT: [[C:%.*]] = icmp sgt i16 [[B]], -2048
+; CHECK-NEXT: [[D:%.*]] = select i1 [[C]], i16 [[B]], i16 -2048
+; CHECK-NEXT: [[E:%.*]] = add nsw i16 [[D]], 1
+; CHECK-NEXT: ret i16 [[E]]
+;
+  %a = icmp slt i16 %x, 2047
+  %b = select i1 %a, i16 %x, i16 2047
+  %c = icmp sgt i16 %b, -2048
+  %d = select i1 %c, i16 %b, i16 -2048
+  %e = add i16 %d, 1
+  ret i16 %e
+}
+
+; This contains a min/max pair to clamp a value to 12 bits.
+; By analyzing the clamp pattern, we can tell that the second add doesn't
+; overflow the original type and can be moved before the extend.
+define i32 @min_max_clamp_3(i16 %x) {
+; CHECK-LABEL: @min_max_clamp_3(
+; CHECK-NEXT: [[A:%.*]] = icmp sgt i16 [[X:%.*]], -2048
+; CHECK-NEXT: [[B:%.*]] = select i1 [[A]], i16 [[X]], i16 -2048
+; CHECK-NEXT: [[C:%.*]] = icmp slt i16 [[B]], 2047
+; CHECK-NEXT: [[D:%.*]] = select i1 [[C]], i16 [[B]], i16 2047
+; CHECK-NEXT: [[G:%.*]] = sext i16 [[D]] to i32
+; CHECK-NEXT: ret i32 [[G]]
+;
+  %a = icmp sgt i16 %x, -2048
+  %b = select i1 %a, i16 %x, i16 -2048
+  %c = icmp slt i16 %b, 2047
+  %d = select i1 %c, i16 %b, i16 2047
+  %e = add i16 %d, 1
+  %f = sext i16 %e to i32
+  %g = add i32 %f, -1
+  ret i32 %g
+}
+
+; Same as above with min/max order reversed
+define i32 @min_max_clamp_4(i16 %x) {
+; CHECK-LABEL: @min_max_clamp_4(
+; CHECK-NEXT: [[A:%.*]] = icmp slt i16 [[X:%.*]], 2047
+; CHECK-NEXT: [[B:%.*]] = select i1 [[A]], i16 [[X]], i16 2047
+; CHECK-NEXT: [[C:%.*]] = icmp sgt i16 [[B]], -2048
+; CHECK-NEXT: [[D:%.*]] = select i1 [[C]], i16 [[B]], i16 -2048
+; CHECK-NEXT: [[G:%.*]] = sext i16 [[D]] to i32
+; CHECK-NEXT: ret i32 [[G]]
+;
+  %a = icmp slt i16 %x, 2047
+  %b = select i1 %a, i16 %x, i16 2047
+  %c = icmp sgt i16 %b, -2048
+  %d = select i1 %c, i16 %b, i16 -2048
+  %e = add i16 %d, 1
+  %f = sext i16 %e to i32
+  %g = add i32 %f, -1
+  ret i32 %g
+}
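
The arithmetic behind these tests can be checked exhaustively in plain C++. The sketch below is an illustration, not LLVM code: it models the i16 clamp to [-2048, 2047] and confirms that adding 1 never leaves the i16 range (so the add may be marked nsw) and that sext(d + 1) - 1 equals sext(d), which is the fold expected in min_max_clamp_3 and min_max_clamp_4. The helper names clamp12, addOneNeverOverflows and foldsToSext are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Models the select-based clamp from the tests: smin(smax(X, -2048), 2047).
static int16_t clamp12(int16_t X) {
  return std::min<int16_t>(std::max<int16_t>(X, -2048), 2047);
}

// True if D + 1 is representable in i16 for every clamped D.
static bool addOneNeverOverflows() {
  for (int I = INT16_MIN; I <= INT16_MAX; ++I) {
    int Wide = static_cast<int>(clamp12(static_cast<int16_t>(I))) + 1;
    if (Wide < INT16_MIN || Wide > INT16_MAX)
      return false;
  }
  return true;
}

// True if sext(add i16 D, 1) + (-1) equals sext(D) for every clamped D.
static bool foldsToSext() {
  for (int I = INT16_MIN; I <= INT16_MAX; ++I) {
    int16_t D = clamp12(static_cast<int16_t>(I));
    int16_t E = static_cast<int16_t>(D + 1); // 16-bit add; D + 1 <= 2048.
    int32_t G = static_cast<int32_t>(E) - 1; // sext, then add -1.
    if (G != static_cast<int32_t>(D))
      return false;
  }
  return true;
}
```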