This is an archive of the discontinued LLVM Phabricator instance.

[BypassSlowDivision] Improve our handling of divisions by constants
ClosedPublic

Authored by sanjoy on Sep 25 2017, 6:04 PM.


Details

Reviewers

Commits

rGaa92cae14e99: [BypassSlowDivision] Improve our handling of divisions by constants
rGeda7a86d42ff: [BypassSlowDivision] Improve our handling of divisions by constants
rL319677: [BypassSlowDivision] Improve our handling of divisions by constants
rL314253: [BypassSlowDivision] Improve our handling of divisions by constants

Summary

Don't bail out on constant divisors for divisions that can be narrowed without
introducing control flow. This gives us a 32-bit multiply instead of an
emulated 64-bit multiply in the generated PTX assembly.
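
The narrowing described above preserves the result whenever both operands are known to fit in 32 bits. Below is a minimal C++ sketch of that equivalence, assuming a 64-bit slow type and a 32-bit bypass type (as in the NVPTX test in this revision). The names `wideDiv` and `narrowedDiv` are hypothetical and do not appear in the patch.

```cpp
#include <cstdint>

// Hypothetical helper: the original wide division on a zero-extended value,
// mirroring `zext i32 %a to i64` followed by `udiv i64`.
std::uint64_t wideDiv(std::uint32_t a, std::uint64_t divisor) {
  std::uint64_t aZext = a;
  return aZext / divisor;
}

// Hypothetical helper: the narrowed form (trunc, 32-bit udiv, zext).
// Assumption: divisor is nonzero and fits in 32 bits; otherwise the
// truncation below would change its value.
std::uint64_t narrowedDiv(std::uint32_t a, std::uint64_t divisor) {
  std::uint64_t aZext = a;
  std::uint32_t truncDividend = static_cast<std::uint32_t>(aZext);
  std::uint32_t truncDivisor = static_cast<std::uint32_t>(divisor);
  std::uint32_t truncDiv = truncDividend / truncDivisor;
  return static_cast<std::uint64_t>(truncDiv);
}
```

No branch is needed in `narrowedDiv` because the operand ranges are known statically, which is the case this revision stops bailing out on.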

Diff Detail

Build Status

Buildable 10607
Build 10607: arc lint + arc unit

Event Timeline

sanjoy created this revision. Sep 25 2017, 6:04 PM

Herald added subscribers: mcrosier, jholewinski. Sep 25 2017, 6:04 PM

jlebar added inline comments. Sep 25 2017, 11:17 PM

lib/Transforms/Utils/BypassSlowDivision.cpp
359	The divisor also stays a constant in the other two cases, so is this really the thing that makes us want to do this transformation but not the other ones when the divisor is a constant?

sanjoy added inline comments. Sep 26 2017, 10:08 AM

lib/Transforms/Utils/BypassSlowDivision.cpp
359	The way I worked this out in my head is that, for the optimization to be worth it, the perf improvement from doing a shorter op must be more than the perf regression due to control flow. When the op is divide the assumption is that this tradeoff is worth it, but when the op is multiply (division by constant) this tradeoff is not worth it. However, if we can narrow the op (divide or multiply) without any control flow then there is no tradeoff -- we should always narrow the op.
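
The tradeoff in the comment above can be sketched as a small decision function. This is only an illustration of the reasoning, not the code in `BypassSlowDivision.cpp`: the enum, the function name, and the boolean parameters are hypothetical, and the sketch leaves out the early exits for operands that are likely long.

```cpp
// Hypothetical summary of the three outcomes discussed in this review.
enum class Action {
  NarrowInPlace,         // no control flow needed, always a win
  LeaveForDAGCombiner,   // constant divisor becomes a multiply later
  GuardWithRuntimeCheck  // narrow op behind a branch
};

// Hypothetical decision function; the parameters stand in for the
// range analysis results and the isa<ConstantInt>(Divisor) check.
Action chooseAction(bool dividendShort, bool divisorShort,
                    bool divisorIsConstant) {
  // Both operands known short: narrowing costs nothing extra, so do it
  // even when the divisor is a constant.
  if (dividendShort && divisorShort)
    return Action::NarrowInPlace;
  // Any remaining strategy introduces control flow. For a constant
  // divisor the op is really a multiply, and the branch may cost more
  // than the narrower multiply saves.
  if (divisorIsConstant)
    return Action::LeaveForDAGCombiner;
  return Action::GuardWithRuntimeCheck;
}
```

The order of the checks is the point: the constant-divisor bailout moves from before the known-short case to after it.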

update comments

jlebar accepted this revision. Sep 26 2017, 2:48 PM

This revision is now accepted and ready to land. Sep 26 2017, 2:48 PM

Harbormaster completed remote builds in B10586: Diff 116626. Sep 26 2017, 3:31 PM

Harbormaster completed remote builds in B10607: Diff 116714.

Closed by commit rL314253: [BypassSlowDivision] Improve our handling of divisions by constants (authored by sanjoy). Sep 26 2017, 3:35 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                       Size
lib/Transforms/Utils/BypassSlowDivision.cpp                20 lines
test/Transforms/CodeGenPrepare/NVPTX/bypass-slow-div.ll    77 lines

Diff 116714

lib/Transforms/Utils/BypassSlowDivision.cpp

 }

 /// Substitutes the div/rem instruction with code that checks the value of the
 /// operands and uses a shorter-faster div/rem instruction when possible.
 Optional<QuotRemPair> FastDivInsertionTask::insertFastDivAndRem() {
   Value *Dividend = SlowDivOrRem->getOperand(0);
   Value *Divisor = SlowDivOrRem->getOperand(1);

-  if (isa<ConstantInt>(Divisor)) {
-    // Keep division by a constant for DAGCombiner.
-    return None;
-  }
-
   VisitedSetTy SetL;
   ValueRange DividendRange = getValueRange(Dividend, SetL);
   if (DividendRange == VALRNG_LIKELY_LONG)
     return None;

   VisitedSetTy SetR;
   ValueRange DivisorRange = getValueRange(Divisor, SetR);
   if (DivisorRange == VALRNG_LIKELY_LONG)
     return None;

   bool DividendShort = (DividendRange == VALRNG_KNOWN_SHORT);
   bool DivisorShort = (DivisorRange == VALRNG_KNOWN_SHORT);

   if (DividendShort && DivisorShort) {
     // If both operands are known to be short then just replace the long
-    // division with a short one in-place.
+    // division with a short one in-place. Since we're not introducing control
+    // flow in this case, narrowing the division is always a win, even if the
+    // divisor is a constant (and will later get replaced by a multiplication).

     IRBuilder<> Builder(SlowDivOrRem);
     Value *TruncDividend = Builder.CreateTrunc(Dividend, BypassType);
     Value *TruncDivisor = Builder.CreateTrunc(Divisor, BypassType);
     Value *TruncDiv = Builder.CreateUDiv(TruncDividend, TruncDivisor);
     Value *TruncRem = Builder.CreateURem(TruncDividend, TruncDivisor);
     Value *ExtDiv = Builder.CreateZExt(TruncDiv, getSlowType());
     Value *ExtRem = Builder.CreateZExt(TruncRem, getSlowType());
     return QuotRemPair(ExtDiv, ExtRem);
-  } else if (DividendShort && !isSignedOp()) {
+  }
+
+  if (isa<ConstantInt>(Divisor)) {
+    // If the divisor is not a constant, DAGCombiner will convert it to a
+    // multiplication by a magic constant. It isn't clear if it is worth
+    // introducing control flow to get a narrower multiply.
+    return None;
+  }
+
+  if (DividendShort && !isSignedOp()) {
     // If the division is unsigned and Dividend is known to be short, then
     // either
     // 1) Divisor is less or equal to Dividend, and the result can be computed
     // with a short division.
     // 2) Divisor is greater than Dividend. In this case, no division is needed
     // at all: The quotient is 0 and the remainder is equal to Dividend.
     //
     // So instead of checking at runtime whether Divisor fits into BypassType,
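
The two cases listed in the source comment above, for an unsigned division with a short dividend, can be sketched in C++ as follows. The rest of that comment is elided on this page, so this only illustrates the two cases as stated and is not the IR the pass emits. The function name `divRemShortDividend` is hypothetical, and a 32-bit bypass type with a 64-bit slow type is assumed.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical sketch: returns {quotient, remainder} for an unsigned
// division whose dividend is known to fit in 32 bits.
// Assumption: divisor is nonzero, as for any division.
std::pair<std::uint64_t, std::uint64_t>
divRemShortDividend(std::uint32_t dividend, std::uint64_t divisor) {
  // Case 2: the divisor exceeds the dividend, so no division is needed.
  if (divisor > dividend)
    return {std::uint64_t{0}, std::uint64_t{dividend}};
  // Case 1: divisor <= dividend < 2^32, so the divisor also fits in
  // 32 bits and a short division gives the full result.
  std::uint32_t shortDivisor = static_cast<std::uint32_t>(divisor);
  return {std::uint64_t{dividend / shortDivisor},
          std::uint64_t{dividend % shortDivisor}};
}
```

The comparison is a runtime branch, which is why this path stays behind the constant-divisor bailout in the new code.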

test/Transforms/CodeGenPrepare/NVPTX/bypass-slow-div.ll

 define void @rem_only(i64 %a, i64 %b, i64* %retptr) {
 ; CHECK: urem i32
 ; CHECK-NOT: div
 ; CHECK: rem i64
 ; CHECK-NOT: div
   %d = srem i64 %a, %b
   store i64 %d, i64* %retptr
   ret void
 }
+
+; CHECK-LABEL: @udiv_by_constant(
+define i64 @udiv_by_constant(i32 %a) {
+; CHECK-NEXT: [[A_ZEXT:%.*]] = zext i32 [[A:%.*]] to i64
+; CHECK-NEXT: [[TMP1:%.*]] = trunc i64 [[A_ZEXT]] to i32
+; CHECK-NEXT: [[TMP2:%.*]] = udiv i32 [[TMP1]], 50
+; CHECK-NEXT: [[TMP3:%.*]] = zext i32 [[TMP2]] to i64
+; CHECK-NEXT: ret i64 [[TMP3]]
+
+  %a.zext = zext i32 %a to i64
+  %wide.div = udiv i64 %a.zext, 50
+  ret i64 %wide.div
+}
+
+; CHECK-LABEL: @urem_by_constant(
+define i64 @urem_by_constant(i32 %a) {
+; CHECK-NEXT: [[A_ZEXT:%.*]] = zext i32 [[A:%.*]] to i64
+; CHECK-NEXT: [[TMP1:%.*]] = trunc i64 [[A_ZEXT]] to i32
+; CHECK-NEXT: [[TMP2:%.*]] = urem i32 [[TMP1]], 50
+; CHECK-NEXT: [[TMP3:%.*]] = zext i32 [[TMP2]] to i64
+; CHECK-NEXT: ret i64 [[TMP3]]
+
+  %a.zext = zext i32 %a to i64
+  %wide.div = urem i64 %a.zext, 50
+  ret i64 %wide.div
+}
+
+; Negative test: instead of emitting a runtime check on %a, we prefer to let the
+; DAGCombiner transform this division by constant into a multiplication (with a
+; "magic constant").
+;
+; CHECK-LABEL: @udiv_by_constant_negative_0(
+define i64 @udiv_by_constant_negative_0(i64 %a) {
+; CHECK-NEXT: [[WIDE_DIV:%.*]] = udiv i64 [[A:%.*]], 50
+; CHECK-NEXT: ret i64 [[WIDE_DIV]]
+
+  %wide.div = udiv i64 %a, 50
+  ret i64 %wide.div
+}
+
+; Negative test: while we know the dividend is short, the divisor isn't. This
+; test is here for completeness, but instcombine will optimize this to return 0.
+;
+; CHECK-LABEL: @udiv_by_constant_negative_1(
+define i64 @udiv_by_constant_negative_1(i32 %a) {
+; CHECK-NEXT: [[A_ZEXT:%.*]] = zext i32 [[A:%.*]] to i64
+; CHECK-NEXT: [[WIDE_DIV:%.*]] = udiv i64 [[A_ZEXT]], 8589934592
+; CHECK-NEXT: ret i64 [[WIDE_DIV]]
+
+  %a.zext = zext i32 %a to i64
+  %wide.div = udiv i64 %a.zext, 8589934592 ;; == 1 << 33
+  ret i64 %wide.div
+}
+
+; URem version of udiv_by_constant_negative_0
+;
+; CHECK-LABEL: @urem_by_constant_negative_0(
+define i64 @urem_by_constant_negative_0(i64 %a) {
+; CHECK-NEXT: [[WIDE_DIV:%.*]] = urem i64 [[A:%.*]], 50
+; CHECK-NEXT: ret i64 [[WIDE_DIV]]
+
+  %wide.div = urem i64 %a, 50
+  ret i64 %wide.div
+}
+
+; URem version of udiv_by_constant_negative_1
+;
+; CHECK-LABEL: @urem_by_constant_negative_1(
+define i64 @urem_by_constant_negative_1(i32 %a) {
+; CHECK-NEXT: [[A_ZEXT:%.*]] = zext i32 [[A:%.*]] to i64
+; CHECK-NEXT: [[WIDE_DIV:%.*]] = urem i64 [[A_ZEXT]], 8589934592
+; CHECK-NEXT: ret i64 [[WIDE_DIV]]
+
+  %a.zext = zext i32 %a to i64
+  %wide.div = urem i64 %a.zext, 8589934592 ;; == 1 << 33
+  ret i64 %wide.div
+}
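
The tests above refer to turning a division by a constant into a multiplication by a "magic constant". As a rough illustration of what that looks like for the narrowed `udiv i32 ..., 50`, here is a C++ sketch. The constant is worked out by hand for this example and is not taken from the patch or from DAGCombiner's output, which may choose a different but equivalent sequence. The name `udiv50ViaMultiply` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical sketch: unsigned 32-bit division by 50 as multiply + shift.
// magic = ceil(2^36 / 50) = 0x51EB851F, and magic * 50 = 2^36 + 14.
// Since 14 <= 2^(36 - 32) = 16, the rounding error never reaches the
// quotient for any 32-bit x, so the result equals x / 50 exactly.
std::uint32_t udiv50ViaMultiply(std::uint32_t x) {
  const std::uint64_t magic = 0x51EB851FULL;
  // Only a 32x32 -> 64-bit product is needed; it cannot overflow
  // because both factors are below 2^32.
  std::uint64_t product = static_cast<std::uint64_t>(x) * magic;
  return static_cast<std::uint32_t>(product >> 36);
}
```

The same trick on a 64-bit dividend would need the upper bits of a 64x64-bit product, which is presumably the emulated 64-bit multiply the summary says this change avoids.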