This is an archive of the discontinued LLVM Phabricator instance.

Strength reduce intrinsics with overflow into regular arithmetic operations if possible.
ClosedPublic

Authored by eeckstein on Dec 2 2014, 5:08 AM.

Details

Reviewers

reames
gottesmm
bkramer

Summary

Some intrinsics, like s/uadd.with.overflow and umul.with.overflow, are already strength reduced.
This change adds other arithmetic intrinsics: s/usub.with.overflow, smul.with.overflow.
It completes the work on PR20194.
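
The effect of this strength reduction can be illustrated at the source level. The following is a minimal C++ sketch, not part of the patch: it assumes a GCC or Clang toolchain for `__builtin_sub_overflow`, and the helper names `sextSubNeverOverflows` and `maskedUsub` are hypothetical. It mirrors the operand shapes of the ssubtest_nsw and usubtest_nuw tests in this revision, where the overflow flag is provably false and the checked operation can become a plain `sub nsw` or `sub nuw`.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors ssubtest_nsw: both operands are 8-bit values sign-extended to 32
// bits, so the checked 32-bit subtraction can never report overflow.
// __builtin_sub_overflow is a GCC/Clang builtin (toolchain assumption).
bool sextSubNeverOverflows() {
  for (int a = INT8_MIN; a <= INT8_MAX; ++a) {
    for (int b = INT8_MIN; b <= INT8_MAX; ++b) {
      int32_t wideA = a;
      int32_t wideB = b;
      int32_t checked = 0;
      if (__builtin_sub_overflow(wideA, wideB, &checked))
        return false; // would disprove the claim
      if (checked != wideA - wideB)
        return false; // checked and plain results must agree
    }
  }
  return true;
}

// Mirrors usubtest_nuw: the left operand has its top bit forced on and the
// right operand has its top bit cleared, so left >= 2^31 > right and the
// unsigned subtraction cannot wrap.
uint32_t maskedUsub(uint32_t a, uint32_t b) {
  uint32_t lhs = a | 0x80000000u;
  uint32_t rhs = b & 0x7fffffffu;
  uint32_t diff = 0;
  bool wrapped = __builtin_sub_overflow(lhs, rhs, &diff);
  assert(!wrapped && "lhs >= 2^31 > rhs, so no unsigned wrap is possible");
  (void)wrapped;
  return diff;
}
```

In both helpers the overflow flag is a compile-time-provable constant, which is the situation in which the intrinsic call can be replaced by a regular arithmetic instruction plus a constant `false` flag.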

Besides that, I also did a refactoring: I extracted the creation of the resulting struct into a separate function, CreateOverflowResult().

Event Timeline

eeckstein updated this revision to Diff 16805. Dec 2 2014, 5:08 AM

eeckstein retitled this revision to "Strength reduce intrinsics with overflow into regular arithmetic operations if possible."

eeckstein updated this object.

eeckstein edited the test plan for this revision.

eeckstein added a reviewer: bkramer.

eeckstein added a subscriber: Unknown Object (MLST).

In the future, please split your changes into the smallest chunks you can. In this case, you've got three pieces:

The refactoring to draw out the factory function - LGTM
The signed and unsigned add cases with existing functions - LGTM
The signed multiplication case with a new helper function - Needs a second opinion. I don't spot anything obviously wrong, but I don't trust my own reasoning around overflow. I suspect you could also use this helper function in the normal signed mul case as well.

If you want to split it for submission and only hold the last piece, feel free.

lib/Transforms/InstCombine/InstCombine.h
337	Nit: CreateOverflowResultTuple or CreateOverflowTuple. Extremely minor; feel free to ignore if you disagree.
lib/Transforms/InstCombine/InstCombineAddSub.cpp
1017	Brief comment here would be good. You appear to be using the fact that a N bit number multiplied by a M bit number can yield a maximum of an N+M+1 bit number right?
lib/Transforms/InstCombine/InstCombineCalls.cpp
372	General comment: Doing refactorings in separate changes from semantic changes makes it much easier and faster to review. Strongly, strongly preferred.
439	This change is clearly fine. As a general point, it feels like this code could be factored to share a lot more of the logic from the normal arithmetic cases. However, if you do that, it should be a separate change.
494	This change is the one I'm not entirely sure of - due to the implementation of the helper function - and would like a second opinion on.

Everything except for the function WillNotOverflowSignedMul looks great. I also agree with Philip that the refactoring should go in as a separate patch. The first part of WillNotOverflowSignedMul looks correct. The second part (SignBits == BitWidth + 1) is more complex, and I would prefer that someone else take a look. I believe that Michael Gottesman is a good candidate for reviewing this part of the code.
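
The sign-bit rule under discussion can be tried outside of LLVM. The following is a minimal standalone C++ sketch, not the InstCombine code: `numSignBits`, `fitsSigned`, `willNotOverflowSignedMul`, and `ruleIsSound` are hypothetical helpers, and sign bits are counted exactly here, whereas ComputeNumSignBits may return a smaller number. It applies the two checks from the patch (SignBits > BitWidth + 1, and SignBits == BitWidth + 1 with at least one non-negative operand) and compares them against the exact product.

```cpp
#include <cassert>
#include <cstdint>

// Number of leading bits equal to the sign bit (sign bit included) when v is
// viewed as a width-bit two's complement value. Hypothetical helper; exact,
// unlike LLVM's ComputeNumSignBits, which returns a lower bound.
unsigned numSignBits(int32_t v, unsigned width) {
  assert(width >= 2 && width <= 16 && "sketch only supports small widths");
  uint32_t bits = static_cast<uint32_t>(v);
  uint32_t sign = (bits >> (width - 1)) & 1u;
  unsigned count = 1;
  for (int bit = static_cast<int>(width) - 2;
       bit >= 0 && ((bits >> bit) & 1u) == sign; --bit)
    ++count;
  return count;
}

// True if value is representable as a width-bit signed integer.
bool fitsSigned(int64_t value, unsigned width) {
  int64_t lo = -(int64_t{1} << (width - 1));
  int64_t hi = (int64_t{1} << (width - 1)) - 1;
  return value >= lo && value <= hi;
}

// The two checks from the patch, restated on plain integers.
bool willNotOverflowSignedMul(int32_t lhs, int32_t rhs, unsigned width) {
  unsigned signBits = numSignBits(lhs, width) + numSignBits(rhs, width);
  if (signBits > width + 1)
    return true; // easy case: enough sign bits
  if (signBits == width + 1)
    return lhs >= 0 || rhs >= 0; // overflow needs two negative operands
  return false; // not proven
}

// Exhaustively checks that a "true" answer never coincides with overflow.
bool ruleIsSound(unsigned width) {
  int32_t lo = -(int32_t{1} << (width - 1));
  int32_t hi = (int32_t{1} << (width - 1)) - 1;
  for (int32_t a = lo; a <= hi; ++a)
    for (int32_t b = lo; b <= hi; ++b)
      if (willNotOverflowSignedMul(a, b, width) &&
          !fitsSigned(int64_t{a} * int64_t{b}, width))
        return false;
  return true;
}
```

For the i16 example quoted in the patch comment, `numSignBits(-256, 16)` is 8 and `numSignBits(-128, 16)` is 9, so the sum is 17 and both operands are negative; the sketch therefore declines to prove anything, which matches the true product 0x8000 not fitting in a signed i16.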

I did a split of the change:
I committed the refactoring in r224006.

This updated diff is the remaining part: the strength reduction for sub/mul.

I would comment this a little differently. I think it is good to have the Hacker's Delight mention, but IIRC LLVM has some specific rules about this. I would just ask on the list or IRC. The actual implementation looks fine to me (it is exactly the same as in Hacker's Delight).

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1016	I would put the comment below about Hacker's Delight up here and add a quick high-level explanation of what you are doing.
1017	Make sure to mention that underestimating the number of sign bits gives you a more conservative answer, so the fact that we are underestimating cannot cause us to get the wrong answer.
1021	You should add why it is interesting that the result is n+m significant bits. Otherwise the line does not add anything.
1029	There are two ambiguous cases. I would say what they actually are. Also, you should mention that we are not handling the second case. I would also specify that you are only handling the part of the first ambiguous case that can be known easily from the information at hand (i.e., there is a low likelihood that we will have enough information to check the product, but we can check the sign).
1038	I would remove this if you do what I suggested with the comment above.
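
The two ambiguous cases mentioned above can be made concrete by brute force on 8-bit values. This is a minimal C++ sketch under stated assumptions: `Tally`, `numSignBits8`, and `tallyBySignBitSum` are hypothetical names, and sign bits are counted exactly rather than estimated. It tallies, for a given sum of sign bits, how many i8 operand pairs overflow and how many do not.

```cpp
#include <cassert>
#include <cstdint>

struct Tally {
  unsigned overflowing = 0; // pairs whose exact product does not fit in i8
  unsigned safe = 0;        // pairs whose exact product fits in i8
};

// Exact count of leading bits equal to the sign bit for an 8-bit value.
unsigned numSignBits8(int value) {
  assert(value >= INT8_MIN && value <= INT8_MAX);
  unsigned bits = static_cast<unsigned>(value) & 0xffu;
  unsigned sign = bits >> 7;
  unsigned count = 1;
  for (int bit = 6; bit >= 0 && ((bits >> bit) & 1u) == sign; --bit)
    ++count;
  return count;
}

// Tally all i8 pairs whose sign-bit counts sum to exactly `total`.
Tally tallyBySignBitSum(unsigned total) {
  Tally tally;
  for (int a = INT8_MIN; a <= INT8_MAX; ++a) {
    for (int b = INT8_MIN; b <= INT8_MAX; ++b) {
      if (numSignBits8(a) + numSignBits8(b) != total)
        continue;
      int product = a * b;
      if (product < INT8_MIN || product > INT8_MAX)
        ++tally.overflowing;
      else
        ++tally.safe;
    }
  }
  return tally;
}
```

With BitWidth = 8, a sum of 10 never overflows, a sum of 9 (BitWidth + 1) overflows only for pairs of negative operands such as -128 * -1 or -64 * -2, and a sum of 8 (BitWidth) contains both overflowing and non-overflowing pairs, for example 15 * 15 versus 8 * 8, which is why that case cannot be decided from sign information alone.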

eeckstein updated this revision to Diff 17272. Dec 15 2014, 2:55 AM

eeckstein edited edge metadata.

Michael,

I updated the comments.

I think it is good to have the Hacker's Delight mention, but IIRC LLVM has some specific rules about this.

I changed it to the style of other references in the llvm sources.

LGTM

This revision is now accepted and ready to land. Dec 16 2014, 4:21 PM

committed in r224417

Revision Contents

Path	Size
lib/Transforms/InstCombine/InstCombine.h	1 line
lib/Transforms/InstCombine/InstCombineAddSub.cpp	45 lines
lib/Transforms/InstCombine/InstCombineCalls.cpp	15 lines
test/Transforms/InstCombine/intrinsics.ll	107 lines

Diff 17272

lib/Transforms/InstCombine/InstCombine.h

Context not available.
	bool WillNotOverflowUnsignedAdd(Value *LHS, Value *RHS, Instruction *CxtI);	bool WillNotOverflowUnsignedAdd(Value *LHS, Value *RHS, Instruction *CxtI);
	bool WillNotOverflowSignedSub(Value *LHS, Value *RHS, Instruction *CxtI);	bool WillNotOverflowSignedSub(Value *LHS, Value *RHS, Instruction *CxtI);
	bool WillNotOverflowUnsignedSub(Value *LHS, Value *RHS, Instruction *CxtI);	bool WillNotOverflowUnsignedSub(Value *LHS, Value *RHS, Instruction *CxtI);
		bool WillNotOverflowSignedMul(Value *LHS, Value *RHS, Instruction *CxtI);
	Value *EmitGEPOffset(User *GEP);	Value *EmitGEPOffset(User *GEP);
	Instruction *scalarizePHI(ExtractElementInst &EI, PHINode *PN);	Instruction *scalarizePHI(ExtractElementInst &EI, PHINode *PN);
	Value *EvaluateInDifferentElementOrder(Value *V, ArrayRef<int> Mask);	Value *EvaluateInDifferentElementOrder(Value *V, ArrayRef<int> Mask);
Context not available.

lib/Transforms/InstCombine/InstCombineAddSub.cpp

Context not available.
	return false;	return false;
	}	}

		/// \brief Return true if we can prove that:
		///    (mul LHS, RHS)  === (mul nsw LHS, RHS)
		bool InstCombiner::WillNotOverflowSignedMul(Value *LHS, Value *RHS,
		                                            Instruction *CxtI) {
		  if (IntegerType *IT = dyn_cast<IntegerType>(LHS->getType())) {

		    // Multiplying n * m significant bits yields a result of n + m significant
		    // bits. If the total number of significant bits does not exceed the
		    // result bit width (minus 1), there is no overflow.
		    // This means if we have enough leading sign bits in the operands
		    // we can guarantee that the result does not overflow.
		    // Ref: "Hacker's Delight" by Henry Warren
		    unsigned BitWidth = IT->getBitWidth();

		    // Note that underestimating the number of sign bits gives a more
		    // conservative answer.
		    unsigned SignBits = ComputeNumSignBits(LHS, 0, CxtI) +
		                        ComputeNumSignBits(RHS, 0, CxtI);

		    // First handle the easy case: if we have enough sign bits there's
		    // definitely no overflow.
		    if (SignBits > BitWidth + 1)
		      return true;

		    // There are two ambiguous cases where there can be no overflow:
		    //   SignBits == BitWidth + 1    and
		    //   SignBits == BitWidth
		    // The second case is difficult to check, therefore we only handle the
		    // first case.
		    if (SignBits == BitWidth + 1) {
		      // It overflows only when both arguments are negative and the true
		      // product is exactly the minimum negative number.
		      // E.g. mul i16 with 17 sign bits: 0xff00 * 0xff80 = 0x8000
		      // For simplicity we just check if at least one side is not negative.
		      bool LHSNonNegative, LHSNegative;
		      bool RHSNonNegative, RHSNegative;
		      ComputeSignBit(LHS, LHSNonNegative, LHSNegative, DL, 0, AT, CxtI, DT);
		      ComputeSignBit(RHS, RHSNonNegative, RHSNegative, DL, 0, AT, CxtI, DT);
		      if (LHSNonNegative || RHSNonNegative)
		        return true;
		    }
		  }
		  return false;
		}

	// Checks if any operand is negative and we can convert add to sub.	// Checks if any operand is negative and we can convert add to sub.
	// This function checks for following negative patterns	// This function checks for following negative patterns
	// ADD(XOR(OR(Z, NOT(C)), C)), 1) == NEG(AND(Z, C))	// ADD(XOR(OR(Z, NOT(C)), C)), 1) == NEG(AND(Z, C))
Context not available.

lib/Transforms/InstCombine/InstCombineCalls.cpp

Context not available.
	return CreateOverflowTuple(II, LHS, false, /*ReUseName*/false);	return CreateOverflowTuple(II, LHS, false, /*ReUseName*/false);
	}	}
	}	}
		if (II->getIntrinsicID() == Intrinsic::ssub_with_overflow) {
		  if (WillNotOverflowSignedSub(LHS, RHS, II)) {
		    return CreateOverflowTuple(II, Builder->CreateNSWSub(LHS, RHS), false);
		  }
		} else {
		  if (WillNotOverflowUnsignedSub(LHS, RHS, II)) {
		    return CreateOverflowTuple(II, Builder->CreateNUWSub(LHS, RHS), false);
		  }
		}
	break;	break;
	}	}
	case Intrinsic::umul_with_overflow: {	case Intrinsic::umul_with_overflow: {
Context not available.
	/*ReUseName*/false);	/*ReUseName*/false);
	}	}
	}	}
		if (II->getIntrinsicID() == Intrinsic::smul_with_overflow) {
		  Value *LHS = II->getArgOperand(0), *RHS = II->getArgOperand(1);
		  if (WillNotOverflowSignedMul(LHS, RHS, II)) {
		    return CreateOverflowTuple(II, Builder->CreateNSWMul(LHS, RHS), false);
		  }
		}
	break;	break;
	case Intrinsic::minnum:	case Intrinsic::minnum:
	case Intrinsic::maxnum: {	case Intrinsic::maxnum: {
Context not available.

test/Transforms/InstCombine/intrinsics.ll

	; RUN: opt -instcombine -S < %s | FileCheck %s	; RUN: opt -instcombine -S < %s | FileCheck %s

	%overflow.result = type {i8, i1}	%overflow.result = type {i8, i1}
		%ov.result.32 = type { i32, i1 }

	declare %overflow.result @llvm.uadd.with.overflow.i8(i8, i8)
	declare { i32, i1 } @llvm.sadd.with.overflow.i32(i32, i32)	declare %overflow.result @llvm.uadd.with.overflow.i8(i8, i8) nounwind readnone
	declare %overflow.result @llvm.umul.with.overflow.i8(i8, i8)	declare %overflow.result @llvm.umul.with.overflow.i8(i8, i8) nounwind readnone
		declare %ov.result.32 @llvm.sadd.with.overflow.i32(i32, i32) nounwind readnone
		declare %ov.result.32 @llvm.uadd.with.overflow.i32(i32, i32) nounwind readnone
		declare %ov.result.32 @llvm.ssub.with.overflow.i32(i32, i32) nounwind readnone
		declare %ov.result.32 @llvm.usub.with.overflow.i32(i32, i32) nounwind readnone
		declare %ov.result.32 @llvm.smul.with.overflow.i32(i32, i32) nounwind readnone
		declare %ov.result.32 @llvm.umul.with.overflow.i32(i32, i32) nounwind readnone
	declare double @llvm.powi.f64(double, i32) nounwind readonly	declare double @llvm.powi.f64(double, i32) nounwind readonly
	declare i32 @llvm.cttz.i32(i32, i1) nounwind readnone	declare i32 @llvm.cttz.i32(i32, i1) nounwind readnone
	declare i32 @llvm.ctlz.i32(i32, i1) nounwind readnone	declare i32 @llvm.ctlz.i32(i32, i1) nounwind readnone
Context not available.
	}	}

	; PR20194	; PR20194
	define { i32, i1 } @saddtest1(i8 %a, i8 %b) {	define %ov.result.32 @saddtest_nsw(i8 %a, i8 %b) {
	%A = sext i8 %a to i32	%A = sext i8 %a to i32
	%B = sext i8 %b to i32	%B = sext i8 %b to i32
	%x = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %A, i32 %B)	%x = call %ov.result.32 @llvm.sadd.with.overflow.i32(i32 %A, i32 %B)
	ret { i32, i1 } %x	ret %ov.result.32 %x
	; CHECK-LABEL: @saddtest1	; CHECK-LABEL: @saddtest_nsw
	; CHECK: %x = add nsw i32 %A, %B	; CHECK: %x = add nsw i32 %A, %B
	; CHECK-NEXT: %1 = insertvalue { i32, i1 } { i32 undef, i1 false }, i32 %x, 0	; CHECK-NEXT: %1 = insertvalue %ov.result.32 { i32 undef, i1 false }, i32 %x, 0
	; CHECK-NEXT: ret { i32, i1 } %1	; CHECK-NEXT: ret %ov.result.32 %1
	}	}

		define %ov.result.32 @uaddtest_nuw(i32 %a, i32 %b) {
		%A = and i32 %a, 2147483647
		%B = and i32 %b, 2147483647
		%x = call %ov.result.32 @llvm.uadd.with.overflow.i32(i32 %A, i32 %B)
		ret %ov.result.32 %x
		; CHECK-LABEL: @uaddtest_nuw
		; CHECK: %x = add nuw i32 %A, %B
		; CHECK-NEXT: %1 = insertvalue %ov.result.32 { i32 undef, i1 false }, i32 %x, 0
		; CHECK-NEXT: ret %ov.result.32 %1
		}

		define %ov.result.32 @ssubtest_nsw(i8 %a, i8 %b) {
		%A = sext i8 %a to i32
		%B = sext i8 %b to i32
		%x = call %ov.result.32 @llvm.ssub.with.overflow.i32(i32 %A, i32 %B)
		ret %ov.result.32 %x
		; CHECK-LABEL: @ssubtest_nsw
		; CHECK: %x = sub nsw i32 %A, %B
		; CHECK-NEXT: %1 = insertvalue %ov.result.32 { i32 undef, i1 false }, i32 %x, 0
		; CHECK-NEXT: ret %ov.result.32 %1
		}

		define %ov.result.32 @usubtest_nuw(i32 %a, i32 %b) {
		%A = or i32 %a, 2147483648
		%B = and i32 %b, 2147483647
		%x = call %ov.result.32 @llvm.usub.with.overflow.i32(i32 %A, i32 %B)
		ret %ov.result.32 %x
		; CHECK-LABEL: @usubtest_nuw
		; CHECK: %x = sub nuw i32 %A, %B
		; CHECK-NEXT: %1 = insertvalue %ov.result.32 { i32 undef, i1 false }, i32 %x, 0
		; CHECK-NEXT: ret %ov.result.32 %1
		}

		define %ov.result.32 @smultest1_nsw(i32 %a, i32 %b) {
		%A = and i32 %a, 4095 ; 0xfff
		%B = and i32 %b, 524287; 0x7ffff
		%x = call %ov.result.32 @llvm.smul.with.overflow.i32(i32 %A, i32 %B)
		ret %ov.result.32 %x
		; CHECK-LABEL: @smultest1_nsw
		; CHECK: %x = mul nsw i32 %A, %B
		; CHECK-NEXT: %1 = insertvalue %ov.result.32 { i32 undef, i1 false }, i32 %x, 0
		; CHECK-NEXT: ret %ov.result.32 %1
		}

		define %ov.result.32 @smultest2_nsw(i32 %a, i32 %b) {
		%A = ashr i32 %a, 16
		%B = ashr i32 %b, 16
		%x = call %ov.result.32 @llvm.smul.with.overflow.i32(i32 %A, i32 %B)
		ret %ov.result.32 %x
		; CHECK-LABEL: @smultest2_nsw
		; CHECK: %x = mul nsw i32 %A, %B
		; CHECK-NEXT: %1 = insertvalue %ov.result.32 { i32 undef, i1 false }, i32 %x, 0
		; CHECK-NEXT: ret %ov.result.32 %1
		}

		define %ov.result.32 @smultest3_sw(i32 %a, i32 %b) {
		%A = ashr i32 %a, 16
		%B = ashr i32 %b, 15
		%x = call %ov.result.32 @llvm.smul.with.overflow.i32(i32 %A, i32 %B)
		ret %ov.result.32 %x
		; CHECK-LABEL: @smultest3_sw
		; CHECK: %x = call %ov.result.32 @llvm.smul.with.overflow.i32(i32 %A, i32 %B)
		; CHECK-NEXT: ret %ov.result.32 %x
		}

		define %ov.result.32 @umultest_nuw(i32 %a, i32 %b) {
		%A = and i32 %a, 65535 ; 0xffff
		%B = and i32 %b, 65535 ; 0xffff
		%x = call %ov.result.32 @llvm.umul.with.overflow.i32(i32 %A, i32 %B)
		ret %ov.result.32 %x
		; CHECK-LABEL: @umultest_nuw
		; CHECK: %x = mul nuw i32 %A, %B
		; CHECK-NEXT: %1 = insertvalue %ov.result.32 { i32 undef, i1 false }, i32 %x, 0
		; CHECK-NEXT: ret %ov.result.32 %1
		}





	define i8 @umultest1(i8 %A, i1* %overflowPtr) {	define i8 @umultest1(i8 %A, i1* %overflowPtr) {
	%x = call %overflow.result @llvm.umul.with.overflow.i8(i8 0, i8 %A)	%x = call %overflow.result @llvm.umul.with.overflow.i8(i8 0, i8 %A)
	%y = extractvalue %overflow.result %x, 0	%y = extractvalue %overflow.result %x, 0
Context not available.
	; CHECK-NEXT: ret i8 %A	; CHECK-NEXT: ret i8 %A
	}	}

	%ov.result.32 = type { i32, i1 }
	declare %ov.result.32 @llvm.umul.with.overflow.i32(i32, i32) nounwind readnone

	define i32 @umultest3(i32 %n) nounwind {	define i32 @umultest3(i32 %n) nounwind {
	%shr = lshr i32 %n, 2	%shr = lshr i32 %n, 2
	%mul = call %ov.result.32 @llvm.umul.with.overflow.i32(i32 %shr, i32 3)	%mul = call %ov.result.32 @llvm.umul.with.overflow.i32(i32 %shr, i32 3)
Context not available.