This is an archive of the discontinued LLVM Phabricator instance.

Differential D31182

[InstCombine] fadd double (sitofp x), y check that the promotion is valid
ClosedPublic

Authored by apilipenko on Mar 21 2017, 5:00 AM.

Details

Reviewers

spatel
andrew.w.kaylor
scanon
jmolloy
hfinkel

Commits

rG134d94f9a339: [InstCombine] fadd double (sitofp x), y check that the promotion is valid
rL301018: [InstCombine] fadd double (sitofp x), y check that the promotion is valid

Summary

When doing these transformations, check that the result of the integer addition is representable in the FP type.
(fadd double (sitofp x), fpcst) --> (sitofp (add int x, intcst))
(fadd double (sitofp x), (sitofp y)) --> (sitofp (add int x, y))

This is a fix for https://bugs.llvm.org//show_bug.cgi?id=27036
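
To see why the check matters, consider an integer type that is wider than the significand of the FP type. The following is a standalone sketch in plain C++ (not LLVM code; the function names `viaFPAdd` and `viaIntAdd` are made up for illustration). It compares the original FP computation with the transformed one for `float`, assuming IEEE-754 single precision and round-to-nearest-even:

```cpp
#include <cstdint>

// Original form: fadd float (sitofp x), 1.0
// The volatile stores force each intermediate to be rounded to float.
float viaFPAdd(std::int32_t x) {
  volatile float fx = static_cast<float>(x); // may round: float has a 24-bit significand
  volatile float sum = fx + 1.0f;            // may round again
  return sum;
}

// Transformed form: sitofp (add int x, 1)
// The caller must ensure that x + 1 does not overflow.
float viaIntAdd(std::int32_t x) {
  return static_cast<float>(x + 1);
}
```

Under those assumptions, x = 16777217 (2^24 + 1) makes the two forms disagree: the FP path rounds twice and yields 16777216.0f, while the integer path yields 16777218.0f. With `double` and an i32 input both paths agree, because every i32 value fits in the 53-bit significand.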

Diff Detail

Event Timeline

apilipenko created this revision. Mar 21 2017, 5:00 AM

boris.ulasevich added a subscriber: boris.ulasevich. Mar 21 2017, 6:58 AM

boris.ulasevich added inline comments.

test/Transforms/InstCombine/add-sitofp.ll
22	Why do we need lshr here? %n + 1 = ((%a >>> 24) & %b) + 1 has the range 1..256, which fits comfortably in the 23-bit single-precision mantissa, so the optimisation can be applied here, can't it? Does that make sense to you? Am I missing something?

Before getting to any details about the patch, we need to address the question raised in PR27036: why are we doing this transform in InstCombine at all? The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?

In D31182#706365, @spatel wrote:

Before getting to any details about the patch, we need to address the question raised in PR27036: why are we doing this transform in InstCombine at all? The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?

I would say no. For example, on AMDGPU an f32 add is exactly as fast as an i32 add, but the i32 add has an additional carry output constraint to deal with.

The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?

No. There exist GPU architectures where a 32b integer addition is a more expensive operation than a 32b FP add.

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1401	s/mantissa/significand/ (https://en.wikipedia.org/wiki/Significand#Use_of_.22mantissa.22) Also, `semanticsPrecision` returns the number of bits in the significand (which includes the implicit "integral" bit when present). There's no "plus one for the sign bit".
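
The point about `semanticsPrecision` can be illustrated outside LLVM. The sketch below is a hypothetical stand-in for the legality check, not the patch itself: `isValidPromotion` is an illustrative name, and `std::numeric_limits<T>::digits` is used in place of `APFloat::semanticsPrecision` on the assumption that `float` and `double` are IEEE-754 binary32 and binary64. For those types `digits` already counts the implicit leading bit, so nothing is added for the sign:

```cpp
#include <limits>

// Accept the integer type only if its full bit width fits in the significand
// of the FP type. This mirrors the conservative type-based check discussed
// in the review; it does not look at the actual value range.
template <typename FP>
constexpr bool isValidPromotion(unsigned intBitWidth) {
  return intBitWidth <= static_cast<unsigned>(std::numeric_limits<FP>::digits);
}

static_assert(std::numeric_limits<float>::digits == 24, "IEEE single assumed");
static_assert(std::numeric_limits<double>::digits == 53, "IEEE double assumed");
static_assert(isValidPromotion<double>(32), "i32 -> double is accepted");
static_assert(!isValidPromotion<float>(32), "i32 -> float is rejected");
static_assert(!isValidPromotion<double>(64), "i64 -> double is rejected");
```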

scanon added inline comments. Mar 21 2017, 9:38 AM

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1429	Aren't we missing a check that RHSIntVal can be exactly represented in the floating-point type here? I.e. `if (RHSIntVal->getType()->getIntegerBitWidth() <= MaxRepresentableBits)`, mirroring the check for LHS above?

andrew.w.kaylor added a subscriber: andrew.w.kaylor. Mar 21 2017, 11:28 AM

andrew.w.kaylor added inline comments. Mar 21 2017, 1:00 PM

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1404	This check seems more conservative than we'd want it to be. For instance, in the example in the comments we know that 'x & 1234' can be represented as a 12-bit signed integer, but if x is i32 this code will return 32 for LHSIntVal->getType()->getIntegerBitWidth(). This directly relates to Boris' comment on the test case.
1410	The MaxRepresentableBits calculation and comparison won't be necessary if neither this condition nor the condition at line 1424 is true. I think the code should be organized to check these dyn_cast conditions first, particularly if you are going to do something more complicated with the number of bits required.
1413	This condition will block either transformation and so it could be checked at line 1395 and save us a bit of work sometimes.
test/Transforms/InstCombine/add-sitofp.ll
27	Can you add a negative test for the '(fadd double (sitofp x), (sitofp y)) --> (sitofp (add int x, y))' case also?
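
The less conservative check suggested for line 1404 could look roughly like the sketch below. It is an assumption-laden illustration, not LLVM's known-bits API: `bitsNeeded` and `fitsInSignificand` are hypothetical helpers, and the upper bound on the result is supplied by hand rather than computed by an analysis:

```cpp
#include <cstdint>

// Number of bits needed to represent v as an unsigned value (0 needs 0 bits).
constexpr unsigned bitsNeeded(std::uint32_t v) {
  unsigned n = 0;
  while (v != 0) {
    ++n;
    v >>= 1;
  }
  return n;
}

// maxValue is an upper bound on the non-negative result of the integer add;
// significandBits is the precision of the FP type (24 for float).
constexpr bool fitsInSignificand(std::uint32_t maxValue, unsigned significandBits) {
  return bitsNeeded(maxValue) <= significandBits;
}

// 'x & 1234' needs 11 magnitude bits, i.e. 12 bits as a signed integer.
static_assert(bitsNeeded(1234) == 11, "1234 < 2^11");
// (x & 1234) + 4 is at most 1238 and fits float's 24-bit significand.
static_assert(fitsInSignificand(1234 + 4, 24), "fits in float");
// ((a >> 24) & b) + 1 is at most 256 for a logical shift of a 32-bit value.
static_assert(fitsInSignificand(255 + 1, 24), "fits in float");
// A full 31-bit magnitude does not fit in float.
static_assert(!fitsInSignificand(0x7FFFFFFFu, 24), "does not fit in float");
```

A value-range check of this kind would accept both the `x & 1234` example and the lshr-based test case, whereas the type-based check rejects them whenever the FP type is `float`.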

andrew.w.kaylor added inline comments. Mar 21 2017, 1:06 PM

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1414	It looks like getFPToSI and getSIToFP will end up using the APFloat class. Wouldn't it be better to just use APFloat here and call APFloat::isInteger()?
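
The two ways of testing the constant can be contrasted in plain C++. This is only a sketch of the idea: `roundTripsThroughInt32` and `isIntegral` are hypothetical names, `double` and `std::int32_t` stand in for the FP and integer types, and neither function calls into APFloat:

```cpp
#include <cmath>
#include <cstdint>

// Round-trip test in the spirit of getFPToSI followed by getSIToFP:
// convert the constant to i32 and back, and accept only if nothing changed.
bool roundTripsThroughInt32(double c, std::int32_t &out) {
  // Guard the cast: converting NaN or an out-of-range double to an integer
  // is undefined behaviour in C++.
  if (!(c >= -2147483648.0 && c <= 2147483647.0))
    return false;
  out = static_cast<std::int32_t>(c); // truncates toward zero
  return static_cast<double>(out) == c;
}

// Direct test in the spirit of an isInteger()-style query: is the value
// integral? On its own this says nothing about whether it fits in i32.
bool isIntegral(double c) {
  return std::isfinite(c) && std::trunc(c) == c;
}
```

The difference shows up for a constant such as 1e10: it is integral, but it does not survive the round trip through i32, so a direct integrality test would still need a separate range check.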

In D31182#706365, @spatel wrote:

Before getting to any details about the patch, we need to address the question raised in PR27036: why are we doing this transform in InstCombine at all? The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?

It's difficult to tell. Even if FP addition is cheaper, the second combine also saves us one sitofp conversion. That should also be taken into account.

These combines are buggy and there is a simple fix. Since I don't have a good understanding of the profitability of these combines, I decided to go ahead with the fix rather than, for example, removing them.

In D31182#707598, @apilipenko wrote:

In D31182#706365, @spatel wrote:

Before getting to any details about the patch, we need to address the question raised in PR27036: why are we doing this transform in InstCombine at all? The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?

It's difficult to tell. Even if FP addition is cheaper, the second combine also saves us one sitofp conversion. That should also be taken into account.

These combines are buggy and there is a simple fix. Since I don't have a good understanding of the profitability of these combines, I decided to go ahead with the fix rather than, for example, removing them.

Agreed - solving the miscompiling is the most important thing. Since we can do that quickly with a small patch, let's go ahead. We can decide what is canonical form as a next step, but please add a 'TODO' comment about that.

This example leads into the same gray area we have with several instcombine folds: it's not clear if the value-track-ability is enough to justify the transform, whether a backend should be responsible for fixing something that is not optimal for all targets, if the removal of an IR instruction overrides either of those...

Address review comments.

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1404	Given that we are uncertain whether we need this transform at all (see the discussion above), I'd prefer not to make it too smart without a good reason. So, unless we have a clear reason why we want this code to be more aggressive, I'd prefer to keep this fix as simple as it is now. For a more aggressive version of this transform we should rely on the Float2Int pass and make it smarter if needed.
1413	The second transform looks for either RHSConv or LHSConv to have one use, so it doesn't always block the second transform.
1414	It sounds like a separate refactoring to the existing code.
1429	We should be fine as is. I assume that the LHS and RHS of the addition have the same type (there could be an assert), and there is also a check below that the integer types are the same.
test/Transforms/InstCombine/add-sitofp.ll
22	I just duplicated the example above. I agree it is confusing in this context. I'll simplify the negative test case to make it clearer that the result of the operation is not guaranteed to fit into the float type.

andrew.w.kaylor added inline comments. Mar 23 2017, 3:32 PM

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1404	That seems reasonable, but can you add a comment indicating that it's being conservative and that the example transformation might not happen?
1413	I see. I missed that.
1414	Yes, that's true.

Ping.

Update: I didn't see that there was a request to add a comment. Will do and update the patch.

This addresses my concerns. Gentle ping to the other reviewers.

See inline for some nits.

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1393–1394	Is that clang-formatted? Please upload the version with the TODO/FIXME comments.
test/Transforms/InstCombine/add-sitofp.ll
23	"highes" -> "highest"; same typo is repeated below. Also, I appreciate changing this test file to use FileCheck, but could you run utils/update_test_checks.py instead of hand-editing the CHECK lines? That makes the assertions stronger and easier to update (we're pretty sure we'll want to update these one way or another).

Add a comment about the overly conservative legality check and a test demonstrating it.

apilipenko marked 3 inline comments as done. Apr 13 2017, 6:07 AM

The code changes look good to me, but I'm not sure if the tests cover the case that Andy requested, so I'll let him comment.

@andrew.w.kaylor - ping, do you have any comments about the latest revision?

I'm happy with this. Thanks for the improvements!

This revision is now accepted and ready to land. Apr 21 2017, 10:32 AM

Closed by commit rL301018: [InstCombine] fadd double (sitofp x), y check that the promotion is valid (authored by apilipenko). Apr 21 2017, 11:58 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Transforms/InstCombine/InstCombineAddSub.cpp	60 lines
test/Transforms/InstCombine/add-sitofp.ll	85 lines

Diff 95113

lib/Transforms/InstCombine/InstCombineAddSub.cpp

   if (Value *V = dyn_castFNegVal(RHS)) {
     ...
     RI->copyFastMathFlags(&I);
     return RI;
   }
 
   // Check for (fadd double (sitofp x), y), see if we can merge this into an
   // integer add followed by a promotion.
   if (SIToFPInst *LHSConv = dyn_cast<SIToFPInst>(LHS)) {
     Value *LHSIntVal = LHSConv->getOperand(0);
+    Type *FPType = LHSConv->getType();
+
+    // TODO: This check is overly conservative. In many cases known bits
+    // analysis can tell us that the result of the addition has less significant
+    // bits than the integer type can hold.
+    auto IsValidPromotion = [](Type *FTy, Type *ITy) {
+      // Do we have enough bits in the significand to represent the result of
+      // the integer addition?
+      unsigned MaxRepresentableBits =
+          APFloat::semanticsPrecision(FTy->getFltSemantics());
+      return ITy->getIntegerBitWidth() <= MaxRepresentableBits;
+    };
 
     // (fadd double (sitofp x), fpcst) --> (sitofp (add int x, intcst))
     // ... if the constant fits in the integer value. This is useful for things
     // like (double)(x & 1234) + 4.0 -> (double)((X & 1234)+4) which no longer
     // requires a constant pool load, and generally allows the add to be better
     // instcombined.
-    if (ConstantFP *CFP = dyn_cast<ConstantFP>(RHS)) {
+    if (ConstantFP *CFP = dyn_cast<ConstantFP>(RHS))
+      if (IsValidPromotion(FPType, LHSIntVal->getType())) {
         Constant *CI =
             ConstantExpr::getFPToSI(CFP, LHSIntVal->getType());
         if (LHSConv->hasOneUse() &&
             ConstantExpr::getSIToFP(CI, I.getType()) == CFP &&
             WillNotOverflowSignedAdd(LHSIntVal, CI, I)) {
           // Insert the new integer add.
           Value *NewAdd = Builder->CreateNSWAdd(LHSIntVal,
                                                 CI, "addconv");
           return new SIToFPInst(NewAdd, I.getType());
         }
       }
 
     // (fadd double (sitofp x), (sitofp y)) --> (sitofp (add int x, y))
     if (SIToFPInst *RHSConv = dyn_cast<SIToFPInst>(RHS)) {
       Value *RHSIntVal = RHSConv->getOperand(0);
+      // It's enough to check LHS types only because we require int types to
+      // be the same for this transform.
+      if (IsValidPromotion(FPType, LHSIntVal->getType())) {
         // Only do this if x/y have the same type, if at least one of them has a
         // single use (so we don't increase the number of int->fp conversions),
         // and if the integer add will not overflow.
         if (LHSIntVal->getType() == RHSIntVal->getType() &&
             (LHSConv->hasOneUse() || RHSConv->hasOneUse()) &&
             WillNotOverflowSignedAdd(LHSIntVal, RHSIntVal, I)) {
           // Insert the new integer add.
           Value *NewAdd = Builder->CreateNSWAdd(LHSIntVal,
                                                 RHSIntVal, "addconv");
           return new SIToFPInst(NewAdd, I.getType());
         }
+      }
     }
   }
 
   // select C, 0, B + select C, A, 0 -> select C, A, B
   {
     Value *A1, *B1, *C1, *A2, *B2, *C2;
     if (match(LHS, m_Select(m_Value(C1), m_Value(A1), m_Value(B1))) &&
         match(RHS, m_Select(m_Value(C2), m_Value(A2), m_Value(B2)))) {
       if (C1 == C2) {
         Constant *Z1=nullptr, *Z2=nullptr;
   ...

test/Transforms/InstCombine/add-sitofp.ll

 ...
 ; CHECK-NEXT: ret double [[P]]
 ;
   %m = lshr i32 %a, 24
   %n = and i32 %m, %b
   %o = sitofp i32 %n to double
   %p = fadd double %o, 1.0
   ret double %p
 }
+
+define double @test(i32 %a) {
+; CHECK-LABEL: @test(
+; CHECK-NEXT: [[A_AND:%.*]] = and i32 [[A:%.*]], 1073741823
+; CHECK-NEXT: [[ADDCONV:%.*]] = add nuw nsw i32 [[A_AND]], 1
+; CHECK-NEXT: [[RES:%.*]] = sitofp i32 [[ADDCONV]] to double
+; CHECK-NEXT: ret double [[RES]]
+;
+  ; Drop two highest bits to guarantee that %a + 1 doesn't overflow
+  %a_and = and i32 %a, 1073741823
+  %a_and_fp = sitofp i32 %a_and to double
+  %res = fadd double %a_and_fp, 1.0
+  ret double %res
+}
+
+define float @test_neg(i32 %a) {
+; CHECK-LABEL: @test_neg(
+; CHECK-NEXT: [[A_AND:%.*]] = and i32 [[A:%.*]], 1073741823
+; CHECK-NEXT: [[A_AND_FP:%.*]] = sitofp i32 [[A_AND]] to float
+; CHECK-NEXT: [[RES:%.*]] = fadd float [[A_AND_FP]], 1.000000e+00
+; CHECK-NEXT: ret float [[RES]]
+;
+  ; Drop two highest bits to guarantee that %a + 1 doesn't overflow
+  %a_and = and i32 %a, 1073741823
+  %a_and_fp = sitofp i32 %a_and to float
+  %res = fadd float %a_and_fp, 1.0
+  ret float %res
+}
+
+define double @test_2(i32 %a, i32 %b) {
+; CHECK-LABEL: @test_2(
+; CHECK-NEXT: [[A_AND:%.*]] = and i32 [[A:%.*]], 1073741823
+; CHECK-NEXT: [[B_AND:%.*]] = and i32 [[B:%.*]], 1073741823
+; CHECK-NEXT: [[ADDCONV:%.*]] = add nuw nsw i32 [[A_AND]], [[B_AND]]
+; CHECK-NEXT: [[RES:%.*]] = sitofp i32 [[ADDCONV]] to double
+; CHECK-NEXT: ret double [[RES]]
+;
+  ; Drop two highest bits to guarantee that %a + %b doesn't overflow
+  %a_and = and i32 %a, 1073741823
+  %b_and = and i32 %b, 1073741823
+
+  %a_and_fp = sitofp i32 %a_and to double
+  %b_and_fp = sitofp i32 %b_and to double
+
+  %res = fadd double %a_and_fp, %b_and_fp
+  ret double %res
+}
+
+define float @test_2_neg(i32 %a, i32 %b) {
+; CHECK-LABEL: @test_2_neg(
+; CHECK-NEXT: [[A_AND:%.*]] = and i32 [[A:%.*]], 1073741823
+; CHECK-NEXT: [[B_AND:%.*]] = and i32 [[B:%.*]], 1073741823
+; CHECK-NEXT: [[A_AND_FP:%.*]] = sitofp i32 [[A_AND]] to float
+; CHECK-NEXT: [[B_AND_FP:%.*]] = sitofp i32 [[B_AND]] to float
+; CHECK-NEXT: [[RES:%.*]] = fadd float [[A_AND_FP]], [[B_AND_FP]]
+; CHECK-NEXT: ret float [[RES]]
+;
+  ; Drop two highest bits to guarantee that %a + %b doesn't overflow
+  %a_and = and i32 %a, 1073741823
+  %b_and = and i32 %b, 1073741823
+
+  %a_and_fp = sitofp i32 %a_and to float
+  %b_and_fp = sitofp i32 %b_and to float
+
+  %res = fadd float %a_and_fp, %b_and_fp
+  ret float %res
+}
+
+; This test demonstrates overly conservative legality check. The float addition
+; can be replaced with the integer addition because the result of the operation
+; can be represented in float, but we don't do that now.
+define float @test_3(i32 %a, i32 %b) {
+; CHECK-LABEL: @test_3(
+; CHECK-NEXT: [[M:%.*]] = lshr i32 [[A:%.*]], 24
+; CHECK-NEXT: [[N:%.*]] = and i32 [[M]], [[B:%.*]]
+; CHECK-NEXT: [[O:%.*]] = sitofp i32 [[N]] to float
+; CHECK-NEXT: [[P:%.*]] = fadd float [[O]], 1.000000e+00
+; CHECK-NEXT: ret float [[P]]
+;
+  %m = lshr i32 %a, 24
+  %n = and i32 %m, %b
+  %o = sitofp i32 %n to float
+  %p = fadd float %o, 1.0
+  ret float %p
+}
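
The mask 1073741823 used throughout these tests is 2^30 - 1. The standalone sketch below (plain C++; `kMask`, `kMaxMasked` and `maskedAddFits` are illustrative names, not part of the patch) works through the arithmetic behind the "Drop two highest bits" comments: with both operands below 2^30, neither `%a_and + 1` nor `%a_and + %b_and` can overflow a signed 32-bit integer.

```cpp
#include <cstdint>
#include <limits>

constexpr std::int32_t kMask = 1073741823; // 2^30 - 1, the constant used in the tests

// Largest value a masked operand can take, widened so the sums below are exact.
constexpr std::int64_t kMaxMasked = kMask;

static_assert(kMask == (std::int32_t{1} << 30) - 1,
              "the mask clears the two highest bits");
static_assert(kMaxMasked + 1 <= std::numeric_limits<std::int32_t>::max(),
              "a_and + 1 cannot overflow i32");
static_assert(kMaxMasked + kMaxMasked <= std::numeric_limits<std::int32_t>::max(),
              "a_and + b_and cannot overflow i32");

// Runtime form of the same argument, computed in 64 bits so the check itself
// cannot overflow.
constexpr bool maskedAddFits(std::int32_t a, std::int32_t b) {
  std::int64_t sum = std::int64_t{a & kMask} + std::int64_t{b & kMask};
  return sum >= 0 && sum <= std::numeric_limits<std::int32_t>::max();
}
```

The masked sum can still be as large as 2^31 - 2, which needs 31 bits. That exceeds the 24-bit significand of `float`, so the `float` variants stay as FP adds, while the `double` variants are converted.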