This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Avoid signed integer overflow
ClosedPublic

Authored by mssimpso on Aug 19 2016, 12:02 PM.

Details

Reviewers

dsanders
vkalintiris
mkuper

Commits

rGdf2ab917ad82: [SLP] Avoid signed integer overflow
rL279562: [SLP] Avoid signed integer overflow

Summary

The test case included with r279125 exposed an existing signed integer overflow. Since getTreeCost can return INT_MAX, we have to be careful to avoid undefined behavior when summing it with other costs, such as getReductionCost.

This patch removes the possibility of assigning a cost of INT_MAX. Since we were previously using INT_MAX as an indicator for "should not vectorize", we now explicitly check this condition with "canVectorizeTree" before computing a cost.

This patch adds a run-line to the test case used for r279125 that ensures we don't vectorize it. Previously, this line would vectorize the test case because of an undefined cost.
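
As a rough illustration of the pattern described in the summary (not the committed code), the sketch below contrasts a sentinel cost with an explicit check made before any cost is computed. The names Tree, legacyTreeCost, and tryTotalCost are hypothetical stand-ins and do not appear in SLPVectorizer.cpp.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for the vectorizer's tree; not an LLVM type.
struct Tree {
  bool TinyAndNotFullyVectorizable; // the "should not vectorize" condition
  int Cost;                         // cost of the tree when it is usable
};

// Old pattern: INT_MAX doubles as a "do not vectorize" sentinel. Adding any
// positive reduction cost to this return value would overflow a signed int,
// which is undefined behavior.
int legacyTreeCost(const Tree &T) {
  return T.TinyAndNotFullyVectorizable ? INT_MAX : T.Cost;
}

// New pattern: answer the yes/no question first and compute a cost only when
// the answer allows it, so no sentinel ever takes part in the addition.
// Returns false when the tree should not be vectorized; Cost is left untouched.
bool tryTotalCost(const Tree &T, int ReductionCost, int &Cost) {
  if (T.TinyAndNotFullyVectorizable)
    return false;
  Cost = T.Cost + ReductionCost;
  return true;
}
```

With this shape, a caller compares the cost against its threshold only when tryTotalCost returns true, so the decision never depends on the result of an overflowing addition.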

Event Timeline

mssimpso updated this revision to Diff 68718. Aug 19 2016, 12:02 PM

mssimpso retitled this revision to [SLP] Avoid signed integer overflow.

mssimpso updated this object.

mssimpso added a reviewer: mkuper.

mssimpso added subscribers: hans, llvm-commits, kcc.

Herald added subscribers: mzolotukhin, mcrosier. Aug 19 2016, 12:02 PM

Using SaturatingAdd here seems right, but:

a) I'd prefer it if the test did check something beyond "don't crash".
b) It would be better if one of the language lawyers looked at the SaturatingAdd implementation. In particular, I'm not sure casting an out-of-range unsigned to signed is well-defined.

In D23723#521236, @mkuper wrote:

Using SaturatingAdd here seems right, but:

a) I'd prefer it if the test did check something beyond "don't crash".

Yeah, I agree. The difficulty now is that the total cost will be computed as INT_MAX (correctly), which will prevent vectorization in the first place. This really obfuscates the original problem. INT_MAX is set because of the depth < 3 check. The best thing to do might be to add a new command line option specifying the depth at which we declare the tree completely unprofitable to vectorize (instead of hard-coding it to 3). We can then raise the threshold in one run-line to avoid INT_MAX and re-enable vectorization. Another run-line can use the default threshold and ensure the cost doesn't wrap.

b) It would be better if one of the language lawyers looked at the SaturatingAdd implementation. In particular, I'm not sure casting an out-of-range unsigned to signed is well-defined.

Sure, I'll add more folks to the review.

mssimpso added a reviewer: dsanders. Aug 19 2016, 3:58 PM

Daniel,

Would you mind taking a look at the signed version of SaturatingAdd I added when you have a chance? I think you added the existing unsigned version. Thanks!

compnerd added a subscriber: compnerd. Aug 19 2016, 5:11 PM

mssimpso mentioned this in D23410: [SLP] Initialize VectorizedValue when gathering. Aug 20 2016, 8:33 AM

Hi Michael,

According to @gberry, the signed SaturatingAdd implementation I added isn't well-defined, as you suspected. I started rewriting it, but I think a better solution might be to avoid INT_MAX altogether. We're essentially using INT_MAX to indicate "do not vectorize". I think it makes more sense to put the INT_MAX logic in a function that returns a bool.

I've updated the patch to do this. I've also kept the existing test case intact, and added a new run-line checking for no vectorization. The test was being vectorized by chance due to the undefined behavior.
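
For reference, one way a signed saturating add can avoid both signed overflow and the unsigned-to-signed cast questioned above is to compare against the type's limits before adding. Before C++20, converting an out-of-range unsigned value to a signed type gives an implementation-defined result, which is why the cast-based version drew scrutiny. The sketch below is only an illustration under that approach; saturatingAddSigned is a hypothetical name and is neither the rewrite mentioned here nor anything that was committed.

```cpp
#include <cassert>
#include <limits>
#include <type_traits>

// Hypothetical helper: add two signed integers and clamp to the range of T.
// Every intermediate value stays within range, so there is no signed overflow
// and no out-of-range conversion.
template <typename T>
typename std::enable_if<std::is_integral<T>::value && std::is_signed<T>::value,
                        T>::type
saturatingAddSigned(T X, T Y, bool *ResultOverflowed = nullptr) {
  bool Dummy;
  bool &Overflowed = ResultOverflowed ? *ResultOverflowed : Dummy;
  const T Max = std::numeric_limits<T>::max();
  const T Min = std::numeric_limits<T>::min();

  // Y > 0: Max - Y cannot overflow, and X + Y > Max exactly when X > Max - Y.
  if (Y > 0 && X > Max - Y) {
    Overflowed = true;
    return Max;
  }
  // Y < 0: Min - Y cannot overflow, and X + Y < Min exactly when X < Min - Y.
  if (Y < 0 && X < Min - Y) {
    Overflowed = true;
    return Min;
  }
  Overflowed = false;
  return static_cast<T>(X + Y);
}
```

The pre-checks cost two comparisons per call but keep the arithmetic inside the representable range of T, which is the property the cast-based version could not guarantee portably.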

Herald added a reviewer: vkalintiris. Aug 22 2016, 2:37 PM

This LGTM except for the name. :-)

I think the name shouldVectorizeTree() implies that if true, we will vectorize - while in fact, we still need to pass the cost check.
I'd say either (a) rename it to something more explicit (something that references the fact we're checking whether the tree is tiny), or (b) have a single function to perform both the "tininess" check and the cost check.

This revision is now accepted and ready to land. Aug 22 2016, 3:59 PM

Thanks, Michael! I'll rewrite the function as isTreeTinyAndNotFullyVectorizable.

Closed by commit rL279562: [SLP] Avoid signed integer overflow (authored by mssimpso). Aug 23 2016, 1:57 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                   Size
include/llvm/Support/MathExtras.h                      22 lines
lib/Transforms/Vectorize/SLPVectorizer.cpp             3 lines
test/Transforms/SLPVectorizer/AArch64/gather-root.ll   107 lines

Diff 68718

include/llvm/Support/MathExtras.h

SaturatingAdd(T X, T Y, bool *ResultOverflowed = nullptr) {
T Z = X + Y;		T Z = X + Y;
Overflowed = (Z < X || Z < Y);		Overflowed = (Z < X || Z < Y);
if (Overflowed)		if (Overflowed)
return std::numeric_limits<T>::max();		return std::numeric_limits<T>::max();
else		else
return Z;		return Z;
}		}

		/// Add two signed integers, X and Y, of type T. Clamp the result to the
		/// maximum or minimum representable value of T on overflow. ResultOverflowed
		/// indicates if the result is larger than the maximum representable value of
		/// type T or smaller than the minimum representable value of type T.
		template <typename T>
		typename std::enable_if<std::is_signed<T>::value, T>::type
		SaturatingAdd(T X, T Y, bool *ResultOverflowed = nullptr) {
		typedef typename std::make_unsigned<T>::type U;
		bool Dummy;
		bool &Overflowed = ResultOverflowed ? *ResultOverflowed : Dummy;
		T Z = (U)X + (U)Y;
		bool OverflowedNegative = (X < 0 && Y < 0 && Z >= 0);
		bool OverflowedPositive = (X >= 0 && Y >= 0 && Z < 0);
		Overflowed = OverflowedNegative || OverflowedPositive;
		if (OverflowedNegative)
		return std::numeric_limits<T>::min();
		else if (OverflowedPositive)
		return std::numeric_limits<T>::max();
		else
		return Z;
		}

/// Multiply two unsigned integers, X and Y, of type T. Clamp the result to the		/// Multiply two unsigned integers, X and Y, of type T. Clamp the result to the
/// maximum representable value of T on overflow. ResultOverflowed indicates if		/// maximum representable value of T on overflow. ResultOverflowed indicates if
/// the result is larger than the maximum representable value of type T.		/// the result is larger than the maximum representable value of type T.
template <typename T>		template <typename T>
typename std::enable_if<std::is_unsigned<T>::value, T>::type		typename std::enable_if<std::is_unsigned<T>::value, T>::type
SaturatingMultiply(T X, T Y, bool *ResultOverflowed = nullptr) {		SaturatingMultiply(T X, T Y, bool *ResultOverflowed = nullptr) {
bool Dummy;		bool Dummy;
bool &Overflowed = ResultOverflowed ? *ResultOverflowed : Dummy;		bool &Overflowed = ResultOverflowed ? *ResultOverflowed : Dummy;

lib/Transforms/Vectorize/SLPVectorizer.cpp

for (; i < NumReducedVals - ReduxWidth + 1; i += ReduxWidth) {
V.buildTree(VL, ReductionOps);		V.buildTree(VL, ReductionOps);
if (V.shouldReorder()) {		if (V.shouldReorder()) {
SmallVector<Value *, 8> Reversed(VL.rbegin(), VL.rend());		SmallVector<Value *, 8> Reversed(VL.rbegin(), VL.rend());
V.buildTree(Reversed, ReductionOps);		V.buildTree(Reversed, ReductionOps);
}		}
V.computeMinimumValueSizes();		V.computeMinimumValueSizes();

// Estimate cost.		// Estimate cost.
int Cost = V.getTreeCost() + getReductionCost(TTI, ReducedVals[i]);		int Cost =
		SaturatingAdd(V.getTreeCost(), getReductionCost(TTI, ReducedVals[i]));
if (Cost >= -SLPCostThreshold)		if (Cost >= -SLPCostThreshold)
break;		break;

DEBUG(dbgs() << "SLP: Vectorizing horizontal reduction at cost:" << Cost		DEBUG(dbgs() << "SLP: Vectorizing horizontal reduction at cost:" << Cost
<< ". (HorRdx)\n");		<< ". (HorRdx)\n");

// Vectorize a tree.		// Vectorize a tree.
DebugLoc Loc = cast<Instruction>(ReducedVals[i])->getDebugLoc();		DebugLoc Loc = cast<Instruction>(ReducedVals[i])->getDebugLoc();

test/Transforms/SLPVectorizer/AArch64/gather-root.ll

	; RUN: opt < %s -slp-vectorizer -S | FileCheck %s --check-prefix=DEFAULT			; REQUIRES: asserts
	; RUN: opt < %s -slp-recursion-max-depth=0 -slp-vectorizer -S | FileCheck %s --check-prefix=GATHER			; RUN: opt < %s -slp-recursion-max-depth=0 -slp-vectorizer -S

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64--linux-gnu"			target triple = "aarch64--linux-gnu"

	@a = common global [80 x i8] zeroinitializer, align 16			define void @PR28330() {

	; DEFAULT-LABEL: @PR28330(
	; DEFAULT: %tmp17 = phi i32 [ %tmp34, %for.body ], [ 0, %entry ]
	; DEFAULT: %tmp18 = phi i32 [ %tmp35, %for.body ], [ %n, %entry ]
	; DEFAULT: %[[S0:.+]] = select <8 x i1> %1, <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>
	; DEFAULT: %[[R0:.+]] = shufflevector <8 x i32> %[[S0]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; DEFAULT: %[[R1:.+]] = add <8 x i32> %[[S0]], %[[R0]]
	; DEFAULT: %[[R2:.+]] = shufflevector <8 x i32> %[[R1]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; DEFAULT: %[[R3:.+]] = add <8 x i32> %[[R1]], %[[R2]]
	; DEFAULT: %[[R4:.+]] = shufflevector <8 x i32> %[[R3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; DEFAULT: %[[R5:.+]] = add <8 x i32> %[[R3]], %[[R4]]
	; DEFAULT: %[[R6:.+]] = extractelement <8 x i32> %[[R5]], i32 0
	; DEFAULT: %tmp34 = add i32 %[[R6]], %tmp17
	;
	; GATHER-LABEL: @PR28330(
	; GATHER: %tmp17 = phi i32 [ %tmp34, %for.body ], [ 0, %entry ]
	; GATHER: %tmp18 = phi i32 [ %tmp35, %for.body ], [ %n, %entry ]
	; GATHER: %tmp19 = select i1 %tmp1, i32 -720, i32 -80
	; GATHER: %tmp21 = select i1 %tmp3, i32 -720, i32 -80
	; GATHER: %tmp23 = select i1 %tmp5, i32 -720, i32 -80
	; GATHER: %tmp25 = select i1 %tmp7, i32 -720, i32 -80
	; GATHER: %tmp27 = select i1 %tmp9, i32 -720, i32 -80
	; GATHER: %tmp29 = select i1 %tmp11, i32 -720, i32 -80
	; GATHER: %tmp31 = select i1 %tmp13, i32 -720, i32 -80
	; GATHER: %tmp33 = select i1 %tmp15, i32 -720, i32 -80
	; GATHER: %[[I0:.+]] = insertelement <8 x i32> undef, i32 %tmp19, i32 0
	; GATHER: %[[I1:.+]] = insertelement <8 x i32> %[[I0]], i32 %tmp21, i32 1
	; GATHER: %[[I2:.+]] = insertelement <8 x i32> %[[I1]], i32 %tmp23, i32 2
	; GATHER: %[[I3:.+]] = insertelement <8 x i32> %[[I2]], i32 %tmp25, i32 3
	; GATHER: %[[I4:.+]] = insertelement <8 x i32> %[[I3]], i32 %tmp27, i32 4
	; GATHER: %[[I5:.+]] = insertelement <8 x i32> %[[I4]], i32 %tmp29, i32 5
	; GATHER: %[[I6:.+]] = insertelement <8 x i32> %[[I5]], i32 %tmp31, i32 6
	; GATHER: %[[I7:.+]] = insertelement <8 x i32> %[[I6]], i32 %tmp33, i32 7
	; GATHER: %[[R0:.+]] = shufflevector <8 x i32> %[[I7]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
	; GATHER: %[[R1:.+]] = add <8 x i32> %[[I7]], %[[R0]]
	; GATHER: %[[R2:.+]] = shufflevector <8 x i32> %[[R1]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; GATHER: %[[R3:.+]] = add <8 x i32> %[[R1]], %[[R2]]
	; GATHER: %[[R4:.+]] = shufflevector <8 x i32> %[[R3]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; GATHER: %[[R5:.+]] = add <8 x i32> %[[R3]], %[[R4]]
	; GATHER: %[[R6:.+]] = extractelement <8 x i32> %[[R5]], i32 0
	; GATHER: %tmp34 = add i32 %[[R6]], %tmp17

	define void @PR28330(i32 %n) {
	entry:			entry:
	%tmp0 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1
	%tmp1 = icmp eq i8 %tmp0, 0
	%tmp2 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2
	%tmp3 = icmp eq i8 %tmp2, 0
	%tmp4 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1
	%tmp5 = icmp eq i8 %tmp4, 0
	%tmp6 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 4), align 4
	%tmp7 = icmp eq i8 %tmp6, 0
	%tmp8 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 5), align 1
	%tmp9 = icmp eq i8 %tmp8, 0
	%tmp10 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 6), align 2
	%tmp11 = icmp eq i8 %tmp10, 0
	%tmp12 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 7), align 1
	%tmp13 = icmp eq i8 %tmp12, 0
	%tmp14 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 8), align 8
	%tmp15 = icmp eq i8 %tmp14, 0
	br label %for.body			br label %for.body

	for.body:			for.body:
	%tmp17 = phi i32 [ %tmp34, %for.body ], [ 0, %entry ]			%s.047 = phi i32 [ 0, %entry ], [ %add82, %for.body ]
	%tmp18 = phi i32 [ %tmp35, %for.body ], [ %n, %entry ]			%sub5.sub = select i1 undef, i32 undef, i32 undef
	%tmp19 = select i1 %tmp1, i32 -720, i32 -80			%add = add nsw i32 %sub5.sub, %s.047
	%tmp20 = add i32 %tmp17, %tmp19			%v.1 = select i1 undef, i32 undef, i32 undef
	%tmp21 = select i1 %tmp3, i32 -720, i32 -80			%add16 = add nsw i32 %add, %v.1
	%tmp22 = add i32 %tmp20, %tmp21			%sub25.sub21 = select i1 undef, i32 undef, i32 undef
	%tmp23 = select i1 %tmp5, i32 -720, i32 -80			%add27 = add nsw i32 %add16, %sub25.sub21
	%tmp24 = add i32 %tmp22, %tmp23			%v.3 = select i1 undef, i32 undef, i32 undef
	%tmp25 = select i1 %tmp7, i32 -720, i32 -80			%add38 = add nsw i32 %add27, %v.3
	%tmp26 = add i32 %tmp24, %tmp25			%sub47.sub43 = select i1 undef, i32 undef, i32 undef
	%tmp27 = select i1 %tmp9, i32 -720, i32 -80			%add49 = add nsw i32 %add38, %sub47.sub43
	%tmp28 = add i32 %tmp26, %tmp27			%v.5 = select i1 undef, i32 undef, i32 undef
	%tmp29 = select i1 %tmp11, i32 -720, i32 -80			%add60 = add nsw i32 %add49, %v.5
	%tmp30 = add i32 %tmp28, %tmp29			%sub69.sub65 = select i1 undef, i32 undef, i32 undef
	%tmp31 = select i1 %tmp13, i32 -720, i32 -80			%add71 = add nsw i32 %add60, %sub69.sub65
	%tmp32 = add i32 %tmp30, %tmp31			%v.7 = select i1 undef, i32 undef, i32 undef
	%tmp33 = select i1 %tmp15, i32 -720, i32 -80			%add82 = add nsw i32 %add71, %v.7
	%tmp34 = add i32 %tmp32, %tmp33			br label %for.body
	%tmp35 = add nsw i32 %tmp18, -1
	%tmp36 = icmp eq i32 %tmp35, 0
	br i1 %tmp36, label %for.end, label %for.body

	for.end:
	ret void
	}			}