This is an archive of the discontinued LLVM Phabricator instance.

[ValueTracking] Teach computeNumSignBits to understand min/max clamp patterns with constant/splat values
ClosedPublic

Authored by craig.topper on Aug 22 2018, 10:59 AM.

Details

Reviewers

spatel
RKSimon
majnemer
efriedma

Commits

rGbec15b651646: [ValueTracking] Teach computeNumSignBits to understand min/max clamp patterns…
rL340480: [ValueTracking] Teach computeNumSignBits to understand min/max clamp patterns…

Summary

If we have a min/max pair we can do a better job of counting sign bits if we look at them together. This is similar to what is done in the SelectionDAG version of computeNumSignBits for ISD::SMAX/SMIN.
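As a rough illustration of why looking at the pair together helps, here is a minimal standalone sketch (not LLVM code). The names `numSignBits16` and `clampSignBits16` are hypothetical, the width is fixed at 16 bits to match the tests in this revision, and sign bits are counted the way `APInt::getNumSignBits` counts them (the sign bit itself is included).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of leading bits equal to the sign bit,
// counting the sign bit itself, for a 16-bit value.
inline unsigned numSignBits16(int16_t v) {
  uint16_t bits = static_cast<uint16_t>(v);
  unsigned sign = bits >> 15;
  unsigned count = 1;
  for (int i = 14; i >= 0 && ((bits >> i) & 1u) == sign; --i)
    ++count;
  return count;
}

// Hypothetical helper: a lower bound on the sign bits of
// smin(smax(x, lo), hi) that holds for every x, derived from the two
// constants alone.
inline unsigned clampSignBits16(int16_t lo, int16_t hi) {
  assert(lo <= hi && "clamp bounds must be ordered");
  return std::min(numSignBits16(lo), numSignBits16(hi));
}
```

With the bounds used in the tests, `clampSignBits16(-2048, 2047)` evaluates to 5: a value clamped to 12 bits leaves 16 - 12 + 1 = 5 sign bits in an i16, whatever the unclamped input was.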

I'm not entirely sure if we can assume the constant is canonicalized to the RHS. So let me know if I need to handle that.

I'm still calling computeNumSignBits on the other input. I don't think SelectionDAG does that. Should the depth limit be bumped by 2 since we went through 2 nodes?

Event Timeline

craig.topper created this revision. Aug 22 2018, 10:59 AM

craig.topper added a reviewer: efriedma. Aug 22 2018, 11:37 AM

I'm not entirely sure if we can assume the constant is canonicalized to the RHS.

Instcombine does this in canonicalizeMinMaxWithConstant if the compare has one use. Probably good enough.

Should the depth limit be bumped by 2 since we went through 2 nodes?

Bumping by 1 should be fine given the pattern-matching is cheap; the point of the depth limit is just to prevent non-linear compile-time, anyway.

Can you factor out the clamp matching into a separate helper?

lib/Analysis/ValueTracking.cpp
2385	CLow and CHigh have to be constants so you can perform this "sle" check?

craig.topper added inline comments. Aug 22 2018, 12:50 PM

lib/Analysis/ValueTracking.cpp
2385	Maybe we could use computeKnownBits to prove an ordering of Low/High in some other cases using leading zeros/ones?

efriedma added inline comments. Aug 22 2018, 1:29 PM

lib/Analysis/ValueTracking.cpp
2385	Yes, that would work... but probably best to leave that out for now, until someone finds a case where it matters.

I'm still calling computeNumSignBits on the other input. I don't think SelectionDAG does that.

Thinking about it a bit more, there's probably a good reason for that: if the number of sign bits for the variable input matters, either the SMIN or the SMAX is a no-op.
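The argument above can be checked by brute force. The following is a minimal standalone sketch, not LLVM code: `numSignBits16` and `oneSideIsNoOp` are hypothetical names, the width is fixed at 16 bits, and "the variable input matters" is read here as "the input has strictly more sign bits than the two constants guarantee".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: leading bits equal to the sign bit, sign bit included.
inline unsigned numSignBits16(int16_t v) {
  uint16_t bits = static_cast<uint16_t>(v);
  unsigned sign = bits >> 15;
  unsigned count = 1;
  for (int i = 14; i >= 0 && ((bits >> i) & 1u) == sign; --i)
    ++count;
  return count;
}

// Returns true if, for every 16-bit x whose sign-bit count exceeds what the
// constants alone guarantee, at least one of smax(x, lo) and the following
// smin(..., hi) returns its variable operand unchanged.
inline bool oneSideIsNoOp(int16_t lo, int16_t hi) {
  assert(lo <= hi && "clamp bounds must be ordered");
  unsigned rangeBits = std::min(numSignBits16(lo), numSignBits16(hi));
  for (int v = INT16_MIN; v <= INT16_MAX; ++v) {
    int16_t x = static_cast<int16_t>(v);
    if (numSignBits16(x) <= rangeBits)
      continue; // the constants already give the answer for this x
    int16_t afterMax = std::max(x, lo);
    int16_t afterMin = std::min(afterMax, hi);
    bool maxIsNoOp = (afterMax == x);
    bool minIsNoOp = (afterMin == afterMax);
    if (!maxIsNoOp && !minIsNoOp)
      return false;
  }
  return true;
}
```

For the bounds used in the tests, `oneSideIsNoOp(-2048, 2047)` returns true, and so does an asymmetric pair such as `oneSideIsNoOp(0, 255)`. This only exercises specific bounds; it is not a proof for arbitrary widths.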

How tricky would uniform/non-uniform vector support be?

Add a helper function

In D51112#1209885, @efriedma wrote:

I'm still calling computeNumSignBits on the other input. I don't think SelectionDAG does that.

Thinking about it a bit more, there's probably a good reason for that: if the number of sign bits for the variable input matters, either the SMIN or the SMAX is a no-op.

That's true, but are we capable of making that optimization today?

In D51112#1209937, @craig.topper wrote:

That's true, but are we capable of making that optimization today?

Not sure off the top of my head, but it's easy to implement.

Remove the recursion for clamps

LGTM

This revision is now accepted and ready to land. Aug 22 2018, 3:56 PM

Closed by commit rL340480: [ValueTracking] Teach computeNumSignBits to understand min/max clamp patterns… (authored by ctopper). Aug 22 2018, 4:28 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                            Size
lib/Analysis/ValueTracking.cpp                  22 lines
test/Transforms/InstCombine/max_known_bits.ll   78 lines

Diff 162002

lib/Analysis/ValueTracking.cpp

     if (Tmp != 1) {
       Tmp2 = ComputeNumSignBits(U->getOperand(1), Depth + 1, Q);
       FirstAnswer = std::min(Tmp, Tmp2);
       // We computed what we know about the sign bits as our first
       // answer. Now proceed to the generic code that uses
       // computeKnownBits, and pick whichever answer is better.
     }
     break;

-  case Instruction::Select:
+  case Instruction::Select: {
+    // If we have a clamp pattern, we know that the number of sign bits will be
+    // the minimum of the clamp min/max range.
+    const Value *LHS, *RHS, *LHS2, *RHS2;
+    const APInt *CLow, *CHigh;
+    SelectPatternFlavor SPF = matchSelectPattern(U, LHS, RHS).Flavor;
+    if ((SPF == SPF_SMAX || SPF == SPF_SMIN) && match(RHS, m_APInt(CLow))) {
+      SelectPatternFlavor SPF2 = matchSelectPattern(LHS, LHS2, RHS2).Flavor;
+      if (getInverseMinMaxFlavor(SPF) == SPF2 && match(RHS2, m_APInt(CHigh))) {
+        if (SPF == SPF_SMIN)
+          std::swap(CLow, CHigh);
+
+        if (CLow->sle(*CHigh)) {
+          Tmp = ComputeNumSignBits(LHS2, Depth + 1, Q);
+          return std::max(Tmp, std::min(CLow->getNumSignBits(),
+                                        CHigh->getNumSignBits()));
+        }
+      }
+    }
+
     Tmp = ComputeNumSignBits(U->getOperand(1), Depth + 1, Q);
     if (Tmp == 1) break;
     Tmp2 = ComputeNumSignBits(U->getOperand(2), Depth + 1, Q);
     return std::min(Tmp, Tmp2);
+  }

   case Instruction::Add:
     // Add can have at most one carry bit. Thus we know that the output
     // is, at worst, one more bit than the inputs.
     Tmp = ComputeNumSignBits(U->getOperand(0), Depth + 1, Q);
     if (Tmp == 1) break;

     // Special case decrementing a value (ADD X, -1):
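The constant handling in the Select case (bind the outer constant first, swap when the outer operation is a signed min, then require low <= high) can be modelled outside LLVM. This is a minimal sketch under stated assumptions: `Flavor` and `matchClamp` are hypothetical stand-ins for the select-pattern flavors and the matching logic, and plain `long` values stand in for the `APInt` constants. It is not the helper that was eventually committed.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the two signed select-pattern flavors.
enum class Flavor { SMin, SMax };

// Models matching outer(inner(x, innerC), outerC) as a clamp.
// On success, writes the ordered bounds to low/high and returns true.
inline bool matchClamp(Flavor outer, long outerC, Flavor inner, long innerC,
                       long &low, long &high) {
  // A clamp needs one signed min and one signed max.
  if (outer == inner)
    return false;

  // Start by treating the outer constant as the low bound, which is right
  // when the outer operation is a signed max.
  long lo = outerC, hi = innerC;
  if (outer == Flavor::SMin)
    std::swap(lo, hi); // the outer smin constant is really the high bound

  // Reject pairs that do not describe a non-empty range.
  if (lo > hi)
    return false;

  low = lo;
  high = hi;
  return true;
}
```

Both orderings exercised by the tests, smin(smax(x, -2048), 2047) and smax(smin(x, 2047), -2048), produce the bounds -2048 and 2047 in this model, while a pair such as smax(smin(x, -2048), 2047) is rejected by the ordering check.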

test/Transforms/InstCombine/max_known_bits.ll

 ;
   %t2 = zext i16 %t1 to i32
   %t3 = icmp ult i32 %t2, 255
   %t4 = select i1 %t3, i32 %t2, i32 255
   %t5 = trunc i32 %t4 to i16
   %t6 = and i16 %t5, 255
   ret i16 %t6
 }

+; This contains a min/max pair to clamp a value to 12 bits.
+; By analyzing the clamp pattern, we can tell the add doesn't have signed overflow.
+define i16 @min_max_clamp(i16 %x) {
+; CHECK-LABEL: @min_max_clamp(
+; CHECK-NEXT:    [[A:%.*]] = icmp sgt i16 [[X:%.*]], -2048
+; CHECK-NEXT:    [[B:%.*]] = select i1 [[A]], i16 [[X]], i16 -2048
+; CHECK-NEXT:    [[C:%.*]] = icmp slt i16 [[B]], 2047
+; CHECK-NEXT:    [[D:%.*]] = select i1 [[C]], i16 [[B]], i16 2047
+; CHECK-NEXT:    [[E:%.*]] = add nsw i16 [[D]], 1
+; CHECK-NEXT:    ret i16 [[E]]
+;
+  %a = icmp sgt i16 %x, -2048
+  %b = select i1 %a, i16 %x, i16 -2048
+  %c = icmp slt i16 %b, 2047
+  %d = select i1 %c, i16 %b, i16 2047
+  %e = add i16 %d, 1
+  ret i16 %e
+}
+
+; Same as above with min/max reversed.
+define i16 @min_max_clamp_2(i16 %x) {
+; CHECK-LABEL: @min_max_clamp_2(
+; CHECK-NEXT:    [[A:%.*]] = icmp slt i16 [[X:%.*]], 2047
+; CHECK-NEXT:    [[B:%.*]] = select i1 [[A]], i16 [[X]], i16 2047
+; CHECK-NEXT:    [[C:%.*]] = icmp sgt i16 [[B]], -2048
+; CHECK-NEXT:    [[D:%.*]] = select i1 [[C]], i16 [[B]], i16 -2048
+; CHECK-NEXT:    [[E:%.*]] = add nsw i16 [[D]], 1
+; CHECK-NEXT:    ret i16 [[E]]
+;
+  %a = icmp slt i16 %x, 2047
+  %b = select i1 %a, i16 %x, i16 2047
+  %c = icmp sgt i16 %b, -2048
+  %d = select i1 %c, i16 %b, i16 -2048
+  %e = add i16 %d, 1
+  ret i16 %e
+}
+
+; This contains a min/max pair to clamp a value to 12 bits.
+; By analyzing the clamp pattern, we can tell that the second add doesn't
+; overflow the original type and can be moved before the extend.
+define i32 @min_max_clamp_3(i16 %x) {
+; CHECK-LABEL: @min_max_clamp_3(
+; CHECK-NEXT:    [[A:%.*]] = icmp sgt i16 [[X:%.*]], -2048
+; CHECK-NEXT:    [[B:%.*]] = select i1 [[A]], i16 [[X]], i16 -2048
+; CHECK-NEXT:    [[C:%.*]] = icmp slt i16 [[B]], 2047
+; CHECK-NEXT:    [[D:%.*]] = select i1 [[C]], i16 [[B]], i16 2047
+; CHECK-NEXT:    [[G:%.*]] = sext i16 [[D]] to i32
+; CHECK-NEXT:    ret i32 [[G]]
+;
+  %a = icmp sgt i16 %x, -2048
+  %b = select i1 %a, i16 %x, i16 -2048
+  %c = icmp slt i16 %b, 2047
+  %d = select i1 %c, i16 %b, i16 2047
+  %e = add i16 %d, 1
+  %f = sext i16 %e to i32
+  %g = add i32 %f, -1
+  ret i32 %g
+}
+
+; Same as above with min/max order reversed
+define i32 @min_max_clamp_4(i16 %x) {
+; CHECK-LABEL: @min_max_clamp_4(
+; CHECK-NEXT:    [[A:%.*]] = icmp slt i16 [[X:%.*]], 2047
+; CHECK-NEXT:    [[B:%.*]] = select i1 [[A]], i16 [[X]], i16 2047
+; CHECK-NEXT:    [[C:%.*]] = icmp sgt i16 [[B]], -2048
+; CHECK-NEXT:    [[D:%.*]] = select i1 [[C]], i16 [[B]], i16 -2048
+; CHECK-NEXT:    [[G:%.*]] = sext i16 [[D]] to i32
+; CHECK-NEXT:    ret i32 [[G]]
+;
+  %a = icmp slt i16 %x, 2047
+  %b = select i1 %a, i16 %x, i16 2047
+  %c = icmp sgt i16 %b, -2048
+  %d = select i1 %c, i16 %b, i16 -2048
+  %e = add i16 %d, 1
+  %f = sext i16 %e to i32
+  %g = add i32 %f, -1
+  ret i32 %g
+}
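
The arithmetic facts these tests rely on can be checked exhaustively for i16. The following is a minimal standalone sketch, not part of the revision: `clamp12`, `clampedIncrementNeverWraps`, and `addsCancelAfterSext` are hypothetical names that model the IR above with ordinary integer types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Models the select pair in @min_max_clamp: smax with -2048, then smin
// with 2047.
inline int16_t clamp12(int16_t x) {
  return std::min<int16_t>(std::max<int16_t>(x, -2048), 2047);
}

// Checks that "add i16 %d, 1" never wraps in the signed sense, which is
// what the "nsw" flag in the CHECK lines asserts.
inline bool clampedIncrementNeverWraps() {
  for (int v = INT16_MIN; v <= INT16_MAX; ++v) {
    int32_t wide = static_cast<int32_t>(clamp12(static_cast<int16_t>(v))) + 1;
    if (wide < INT16_MIN || wide > INT16_MAX)
      return false;
  }
  return true;
}

// Checks the fold in @min_max_clamp_3: sext(d + 1) + (-1) equals sext(d)
// for every clamped d, so both adds can be dropped.
inline bool addsCancelAfterSext() {
  for (int v = INT16_MIN; v <= INT16_MAX; ++v) {
    int16_t d = clamp12(static_cast<int16_t>(v));
    int16_t e = static_cast<int16_t>(d + 1);     // %e = add i16 %d, 1
    int32_t g = static_cast<int32_t>(e) + (-1);  // %f = sext; %g = add i32 %f, -1
    if (g != static_cast<int32_t>(d))
      return false;
  }
  return true;
}
```

Both functions return true. The largest clamped value is 2047, so the increment tops out at 2048, far from the i16 limit of 32767.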