This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/trunk/lib/Transforms/InstCombine/InstCombineAddSub.cpp
- llvm/trunk/test/Transforms/InstCombine/add-sitofp.ll

Differential D31182

[InstCombine] fadd double (sitofp x), y check that the promotion is valid
Closed, Public

Authored by apilipenko on Mar 21 2017, 5:00 AM.


Details

Reviewers

spatel
andrew.w.kaylor
scanon
jmolloy
hfinkel

Commits

rG134d94f9a339: [InstCombine] fadd double (sitofp x), y check that the promotion is valid
rL301018: [InstCombine] fadd double (sitofp x), y check that the promotion is valid

Summary

Before doing these transformations, check that the result of the integer addition is representable in the FP type:
(fadd double (sitofp x), fpcst) --> (sitofp (add int x, intcst))
(fadd double (sitofp x), (sitofp y)) --> (sitofp (add int x, y))

This is a fix for https://bugs.llvm.org//show_bug.cgi?id=27036
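To see why the check matters, here is a minimal standalone C++ sketch (not LLVM code) that models the constant fold for `i32` and `float`. It assumes IEEE-754 binary32/binary64 arithmetic with round-to-nearest-even and no excess precision (for example, SSE on x86-64); the function names are illustrative only. For x = 16777217 (2^24 + 1), the original form rounds twice (once in `sitofp`, once in `fadd`) and yields 16777216, while the folded integer add rounds once and yields 16777218. With `double`, every `i32` and every non-overflowing `i32` sum is exact, so both forms agree.

```cpp
#include <cassert>
#include <cstdint>

// Models the original IR: (fadd float (sitofp i32 x), c).
// The conversion and the addition each round to float.
float originalFAdd(int32_t x, float c) {
  float fx = static_cast<float>(x);  // sitofp: may round when |x| > 2^24
  float sum = fx + c;                // fadd: may round again
  return sum;
}

// Models the folded IR: (sitofp (add nsw i32 x, ci)), where ci is the
// integer value of c. Only one rounding step happens, at the end.
float foldedIntAdd(int32_t x, int32_t ci) {
  int32_t sum = x + ci;              // caller must ensure no signed overflow
  return static_cast<float>(sum);
}

// The same pair of models for double: a 53-bit significand holds every
// i32 value exactly, so no intermediate rounding can occur.
double originalFAddDouble(int32_t x, double c) {
  return static_cast<double>(x) + c;
}

double foldedIntAddDouble(int32_t x, int32_t ci) {
  return static_cast<double>(x + ci);
}
```

Small values (well below 2^24) give identical results either way, which is why the miscompile only shows up for large inputs.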

Diff Detail

Repository: rL LLVM

Event Timeline

apilipenko created this revision. Mar 21 2017, 5:00 AM

boris.ulasevich added a subscriber: boris.ulasevich. Mar 21 2017, 6:58 AM

boris.ulasevich added inline comments.

test/Transforms/InstCombine/add-sitofp.ll
16 ↗	(On Diff #92458)	Why do we need lshr here? %n + 1 = ((%a >>> 24) & %b) + 1 takes values in the range 1..256, which fits easily into the 23-bit single-precision mantissa, so the optimization could be applied here, couldn't it? Does that make sense to you? Am I missing something?
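Boris's range argument can be made concrete with a small known-bits-style check. The C++ below is a hypothetical standalone sketch, not LLVM's KnownBits API: `maybeOnes` stands for the bits of a value that are not known to be zero, and both operands are assumed to be known non-negative. Under those assumptions it accepts the `((%a >>> 24) & %b) + 1` case for `float` (the sum needs at most 9 bits) while still rejecting the 30-bit masked values (`and i32 %a, 1073741823`) used in the committed `float` tests. The patch itself deliberately keeps the simpler type-width check; this only illustrates the reasoning.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to hold the largest value whose set bits all lie
// within `maybeOnes` (a stand-in for "bits not known to be zero").
unsigned bitLength(uint32_t maybeOnes) {
  unsigned n = 0;
  while (maybeOnes != 0) {
    maybeOnes >>= 1;
    ++n;
  }
  return n;
}

// Conservative test for adding two values known to be non-negative:
// a < 2^ka and b < 2^kb imply a + b < 2^(max(ka, kb) + 1), and every
// integer below 2^p is exact in a p-bit significand.
bool sumFitsInSignificand(uint32_t maybeOnesA, uint32_t maybeOnesB,
                          unsigned significandBits) {
  unsigned ka = bitLength(maybeOnesA);
  unsigned kb = bitLength(maybeOnesB);
  unsigned sumBits = (ka > kb ? ka : kb) + 1;
  return sumBits <= significandBits;
}
```

For example, `sumFitsInSignificand(0xFF, 1, 24)` models the `(%a >>> 24) & %b` plus 1 case against a `float` significand.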

Before getting to any details about the patch, we need to address the question raised in PR27036: why are we doing this transform in InstCombine at all? The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?

In D31182#706365, @spatel wrote:

> Before getting to any details about the patch, we need to address the question raised in PR27036: why are we doing this transform in InstCombine at all? The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?

I would say no. For example, on AMDGPU an f32 add is exactly as fast as an i32 add, but the i32 add has an additional carry-output constraint to deal with.

> The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?

No. There exist GPU architectures where a 32-bit integer addition is more expensive than a 32-bit FP add.

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1401 ↗	(On Diff #92458)	s/mantissa/significand/ (https://en.wikipedia.org/wiki/Significand#Use_of_.22mantissa.22) Also, `semanticsPrecision` returns the number of bits in the significand (which includes the implicit "integral" bit when present). There's no "plus one for the sign bit".
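For the IEEE formats, the precision that `APFloat::semanticsPrecision` reports matches `std::numeric_limits<T>::digits` in standard C++ (24 for `float`, 53 for `double`); both count the implicit leading bit and neither adds anything for the sign. The sketch below assumes an IEEE-754 host and restates the patch's check with plain bit counts; `isValidPromotion` is a hypothetical stand-in for the patch's `IsValidPromotion` lambda, which takes `Type *` arguments instead.

```cpp
#include <cassert>
#include <limits>

// Significand precision including the implicit leading bit
// (IEEE binary32 and binary64 assumed).
constexpr int kFloatPrecision = std::numeric_limits<float>::digits;    // 24
constexpr int kDoublePrecision = std::numeric_limits<double>::digits;  // 53

// Same shape as the patch's check: the integer type's bit width must not
// exceed the significand precision, so every value of that type
// (and every non-overflowing sum) converts exactly.
bool isValidPromotion(int significandBits, int intBits) {
  return intBits <= significandBits;
}
```

With these numbers, `i16` promotes to `float` and `i32` promotes to `double`, but `i32` to `float` and `i64` to `double` are rejected.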

scanon added inline comments. Mar 21 2017, 9:38 AM

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1429 ↗	(On Diff #92458)	Aren't we missing a check that RHSIntVal can be represented exactly in the floating-point type here? I.e. `if (RHSIntVal->getType()->getIntegerBitWidth() <= MaxRepresentableBits)`, mirroring the check for LHS above?

andrew.w.kaylor added a subscriber: andrew.w.kaylor. Mar 21 2017, 11:28 AM

andrew.w.kaylor added inline comments. Mar 21 2017, 1:00 PM

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1404 ↗	(On Diff #92458)	This check seems more conservative than we'd want it to be. For instance, in the example in the comments we know that 'x & 1234' can be represented as a 12-bit signed integer, but if x is i32 this code will return 32 for LHSIntVal->getType()->getIntegerBitWidth(). This directly relates to Boris' comment on the test case.
1410 ↗	(On Diff #92458)	The MaxRepresentableBits calculation and comparison won't be necessary if neither this condition nor the condition at line 1424 is true. I think the code should be organized to check these dyn_cast conditions first, particularly if you are going to do something more complicated with the number of bits required.
1413 ↗	(On Diff #92458)	This condition will block either transformation and so it could be checked at line 1395 and save us a bit of work sometimes.
test/Transforms/InstCombine/add-sitofp.ll
21 ↗	(On Diff #92458)	Can you add a negative test for the '(fadd double (sitofp x), (sitofp y)) --> (sitofp (add int x, y))' case also?

andrew.w.kaylor added inline comments. Mar 21 2017, 1:06 PM

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1414 ↗	(On Diff #92458)	It looks like getFPToSI and getSIToFP will end up using the APFloat class. Wouldn't it be better to just use APFloat here and call APFloat::isInteger()?
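The existing code decides whether the constant is an integer by converting it with `getFPToSI` and checking that `getSIToFP` gives back the same constant. The hypothetical C++17 helper below models that round trip on host `double` values for an `i32` target. One thing the round trip checks that a bare integrality test does not is that the value is in range for the integer type, so an `isInteger()`-based version would presumably still need a separate range check.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>

// Returns the int32 value of c when c converts to int32 and back to the
// same double, the property the patch checks with
// ConstantExpr::getFPToSI followed by ConstantExpr::getSIToFP.
std::optional<int32_t> exactInt32(double c) {
  // An out-of-range or NaN conversion is undefined behavior in C++
  // (and the LLVM LangRef makes fptosi return poison), so reject first.
  if (!(c >= -2147483648.0 && c < 2147483648.0))
    return std::nullopt;
  int32_t i = static_cast<int32_t>(c);  // truncates toward zero
  double back = static_cast<double>(i);
  // Reject fractional values, and -0.0 (which converts back to +0.0,
  // a different ConstantFP in LLVM).
  if (back != c || std::signbit(back) != std::signbit(c))
    return std::nullopt;
  return i;
}
```

For instance, 4.0 yields 4, while 4.5 and 1e10 are rejected.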

In D31182#706365, @spatel wrote:

> Before getting to any details about the patch, we need to address the question raised in PR27036: why are we doing this transform in InstCombine at all? The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?

It's difficult to tell. Even if FP addition is cheaper, the second combine also saves one sitofp conversion, which should also be taken into account.

These combines are buggy and there is a simple fix. Since I don't have a good understanding of the profitability of these combines, I decided to go ahead with the fix rather than, for example, removing them.

In D31182#707598, @apilipenko wrote:

> In D31182#706365, @spatel wrote:
>
>> Before getting to any details about the patch, we need to address the question raised in PR27036: why are we doing this transform in InstCombine at all? The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?
>
> It's difficult to tell. Even if FP addition is cheaper, the second combine also saves one sitofp conversion, which should also be taken into account.
>
> These combines are buggy and there is a simple fix. Since I don't have a good understanding of the profitability of these combines, I decided to go ahead with the fix rather than, for example, removing them.

Agreed - solving the miscompiling is the most important thing. Since we can do that quickly with a small patch, let's go ahead. We can decide what is canonical form as a next step, but please add a 'TODO' comment about that.

This example leads into the same gray area we have with several instcombine folds: it's not clear if the value-track-ability is enough to justify the transform, whether a backend should be responsible for fixing something that is not optimal for all targets, if the removal of an IR instruction overrides either of those...

Address review comments.

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1404 ↗	(On Diff #92458)	Given that we are uncertain whether we need this transform at all (see the discussion above), I'd prefer not to make it too smart without a good reason. Unless there is a clear reason to make this code more aggressive, I'd prefer to keep the fix as simple as it is now. For a more aggressive version of this transform, we should rely on the Float2Int pass and make it smarter if needed.
1413 ↗	(On Diff #92458)	The second transform only requires one of RHSConv or LHSConv to have a single use, so this condition doesn't always block it.
1414 ↗	(On Diff #92458)	That sounds like a separate refactoring of the existing code.
1429 ↗	(On Diff #92458)	We should be fine as is. I assume that the LHS and RHS of the addition have the same type (there could be an assert), and there is also a check below that the integer types are the same.
test/Transforms/InstCombine/add-sitofp.ll
16 ↗	(On Diff #92458)	I just duplicated the example above. I agree it is confusing in this context. I'll simplify the negative test case to make it clearer that the result of the operation is not guaranteed to fit into the float type.

andrew.w.kaylor added inline comments. Mar 23 2017, 3:32 PM

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1404 ↗	(On Diff #92458)	That seems reasonable, but can you add a comment indicating that it's being conservative and that the example transformation might not happen?
1413 ↗	(On Diff #92458)	I see. I missed that.
1414 ↗	(On Diff #92458)	Yes, that's true.

Ping.

Update: I didn't see that there was a request to add a comment. I will add it and update the patch.

This addresses my concerns. Gentle ping to the other reviewers.

See inline for some nits.

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1402–1403 ↗	(On Diff #92843)	Is that clang-formatted? Please upload the version with the TODO/FIXME comments.
test/Transforms/InstCombine/add-sitofp.ll
17 ↗	(On Diff #92843)	"highes" -> "highest"; same typo is repeated below. Also, I appreciate changing this test file to use FileCheck, but could you run utils/update_test_checks.py instead of hand-editing the CHECK lines? That makes the assertions stronger and easier to update (we're pretty sure we'll want to update these one way or another).

Add a comment about the overly conservative legality check, and add a test demonstrating it.

apilipenko marked 3 inline comments as done. Apr 13 2017, 6:07 AM

The code changes look good to me, but I'm not sure if the tests cover the case that Andy requested, so I'll let him comment.

@andrew.w.kaylor - ping, do you have any comments about the latest revision?

I'm happy with this. Thanks for the improvements!

This revision is now accepted and ready to land. Apr 21 2017, 10:32 AM

Closed by commit rL301018: [InstCombine] fadd double (sitofp x), y check that the promotion is valid (authored by apilipenko). Apr 21 2017, 11:58 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                          Size
llvm/trunk/lib/Transforms/InstCombine/InstCombineAddSub.cpp   60 lines
llvm/trunk/test/Transforms/InstCombine/add-sitofp.ll          85 lines

Diff 96197

llvm/trunk/lib/Transforms/InstCombine/InstCombineAddSub.cpp

@@ 1,379 lines not shown @@ if (Value *V = dyn_castFNegVal(RHS)) {
       RI->copyFastMathFlags(&I);
       return RI;
     }
 
   // Check for (fadd double (sitofp x), y), see if we can merge this into an
   // integer add followed by a promotion.
   if (SIToFPInst *LHSConv = dyn_cast<SIToFPInst>(LHS)) {
     Value *LHSIntVal = LHSConv->getOperand(0);
+    Type *FPType = LHSConv->getType();
+
+    // TODO: This check is overly conservative. In many cases known bits
+    // analysis can tell us that the result of the addition has less significant
+    // bits than the integer type can hold.
+    auto IsValidPromotion = [](Type *FTy, Type *ITy) {
+      // Do we have enough bits in the significand to represent the result of
+      // the integer addition?
+      unsigned MaxRepresentableBits =
+          APFloat::semanticsPrecision(FTy->getFltSemantics());
+      return ITy->getIntegerBitWidth() <= MaxRepresentableBits;
+    };
 
     // (fadd double (sitofp x), fpcst) --> (sitofp (add int x, intcst))
     // ... if the constant fits in the integer value. This is useful for things
     // like (double)(x & 1234) + 4.0 -> (double)((X & 1234)+4) which no longer
     // requires a constant pool load, and generally allows the add to be better
     // instcombined.
-    if (ConstantFP *CFP = dyn_cast<ConstantFP>(RHS)) {
+    if (ConstantFP *CFP = dyn_cast<ConstantFP>(RHS))
+      if (IsValidPromotion(FPType, LHSIntVal->getType())) {
         Constant *CI =
           ConstantExpr::getFPToSI(CFP, LHSIntVal->getType());
         if (LHSConv->hasOneUse() &&
             ConstantExpr::getSIToFP(CI, I.getType()) == CFP &&
             WillNotOverflowSignedAdd(LHSIntVal, CI, I)) {
           // Insert the new integer add.
           Value *NewAdd = Builder->CreateNSWAdd(LHSIntVal,
                                                 CI, "addconv");
           return new SIToFPInst(NewAdd, I.getType());
         }
       }
 
     // (fadd double (sitofp x), (sitofp y)) --> (sitofp (add int x, y))
     if (SIToFPInst *RHSConv = dyn_cast<SIToFPInst>(RHS)) {
       Value *RHSIntVal = RHSConv->getOperand(0);
+      // It's enough to check LHS types only because we require int types to
+      // be the same for this transform.
+      if (IsValidPromotion(FPType, LHSIntVal->getType())) {
         // Only do this if x/y have the same type, if at least one of them has a
         // single use (so we don't increase the number of int->fp conversions),
         // and if the integer add will not overflow.
         if (LHSIntVal->getType() == RHSIntVal->getType() &&
             (LHSConv->hasOneUse() || RHSConv->hasOneUse()) &&
             WillNotOverflowSignedAdd(LHSIntVal, RHSIntVal, I)) {
           // Insert the new integer add.
           Value *NewAdd = Builder->CreateNSWAdd(LHSIntVal,
                                                 RHSIntVal, "addconv");
           return new SIToFPInst(NewAdd, I.getType());
         }
       }
     }
+  }
 
   // select C, 0, B + select C, A, 0 -> select C, A, B
   {
     Value *A1, *B1, *C1, *A2, *B2, *C2;
     if (match(LHS, m_Select(m_Value(C1), m_Value(A1), m_Value(B1))) &&
         match(RHS, m_Select(m_Value(C2), m_Value(A2), m_Value(B2)))) {
       if (C1 == C2) {
         Constant *Z1=nullptr, *Z2=nullptr;
@@ 355 lines not shown @@
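Beyond the significand check, both folds still require `WillNotOverflowSignedAdd` and emit `add nsw`. The reason is visible even for `double`, where every `i32` is exact: the original `fadd` produces the exact sum, while a 32-bit integer add that overflows wraps around. The helpers below are hypothetical and only check concrete values; LLVM's `WillNotOverflowSignedAdd` instead reasons statically about IR values whose run-time contents are unknown.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns x + y only when the exact sum fits in int32, i.e. when an
// `add nsw i32` would not overflow.
std::optional<int32_t> addNoSignedWrap(int32_t x, int32_t y) {
  int64_t wide = static_cast<int64_t>(x) + static_cast<int64_t>(y);
  if (wide < INT32_MIN || wide > INT32_MAX)
    return std::nullopt;
  return static_cast<int32_t>(wide);
}

// What a wrapping 32-bit add produces instead. Unsigned arithmetic wraps
// modulo 2^32; converting back to int32_t is two's complement on
// mainstream compilers (and guaranteed since C++20).
int32_t wrappingAdd(int32_t x, int32_t y) {
  uint32_t sum = static_cast<uint32_t>(x) + static_cast<uint32_t>(y);
  return static_cast<int32_t>(sum);
}
```

With x = INT32_MAX and y = 1, the `double` addition gives 2147483648.0, whereas a wrapping `i32` add gives INT32_MIN, so folding without the no-overflow proof would change the result.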

llvm/trunk/test/Transforms/InstCombine/add-sitofp.ll

@@ 9 lines not shown @@
 ; CHECK-NEXT: ret double [[P]]
 ;
   %m = lshr i32 %a, 24
   %n = and i32 %m, %b
   %o = sitofp i32 %n to double
   %p = fadd double %o, 1.0
   ret double %p
 }
+
+define double @test(i32 %a) {
+; CHECK-LABEL: @test(
+; CHECK-NEXT: [[A_AND:%.*]] = and i32 [[A:%.*]], 1073741823
+; CHECK-NEXT: [[ADDCONV:%.*]] = add nuw nsw i32 [[A_AND]], 1
+; CHECK-NEXT: [[RES:%.*]] = sitofp i32 [[ADDCONV]] to double
+; CHECK-NEXT: ret double [[RES]]
+;
+  ; Drop two highest bits to guarantee that %a + 1 doesn't overflow
+  %a_and = and i32 %a, 1073741823
+  %a_and_fp = sitofp i32 %a_and to double
+  %res = fadd double %a_and_fp, 1.0
+  ret double %res
+}
+
+define float @test_neg(i32 %a) {
+; CHECK-LABEL: @test_neg(
+; CHECK-NEXT: [[A_AND:%.*]] = and i32 [[A:%.*]], 1073741823
+; CHECK-NEXT: [[A_AND_FP:%.*]] = sitofp i32 [[A_AND]] to float
+; CHECK-NEXT: [[RES:%.*]] = fadd float [[A_AND_FP]], 1.000000e+00
+; CHECK-NEXT: ret float [[RES]]
+;
+  ; Drop two highest bits to guarantee that %a + 1 doesn't overflow
+  %a_and = and i32 %a, 1073741823
+  %a_and_fp = sitofp i32 %a_and to float
+  %res = fadd float %a_and_fp, 1.0
+  ret float %res
+}
+
+define double @test_2(i32 %a, i32 %b) {
+; CHECK-LABEL: @test_2(
+; CHECK-NEXT: [[A_AND:%.*]] = and i32 [[A:%.*]], 1073741823
+; CHECK-NEXT: [[B_AND:%.*]] = and i32 [[B:%.*]], 1073741823
+; CHECK-NEXT: [[ADDCONV:%.*]] = add nuw nsw i32 [[A_AND]], [[B_AND]]
+; CHECK-NEXT: [[RES:%.*]] = sitofp i32 [[ADDCONV]] to double
+; CHECK-NEXT: ret double [[RES]]
+;
+  ; Drop two highest bits to guarantee that %a + %b doesn't overflow
+  %a_and = and i32 %a, 1073741823
+  %b_and = and i32 %b, 1073741823
+
+  %a_and_fp = sitofp i32 %a_and to double
+  %b_and_fp = sitofp i32 %b_and to double
+
+  %res = fadd double %a_and_fp, %b_and_fp
+  ret double %res
+}
+
+define float @test_2_neg(i32 %a, i32 %b) {
+; CHECK-LABEL: @test_2_neg(
+; CHECK-NEXT: [[A_AND:%.*]] = and i32 [[A:%.*]], 1073741823
+; CHECK-NEXT: [[B_AND:%.*]] = and i32 [[B:%.*]], 1073741823
+; CHECK-NEXT: [[A_AND_FP:%.*]] = sitofp i32 [[A_AND]] to float
+; CHECK-NEXT: [[B_AND_FP:%.*]] = sitofp i32 [[B_AND]] to float
+; CHECK-NEXT: [[RES:%.*]] = fadd float [[A_AND_FP]], [[B_AND_FP]]
+; CHECK-NEXT: ret float [[RES]]
+;
+  ; Drop two highest bits to guarantee that %a + %b doesn't overflow
+  %a_and = and i32 %a, 1073741823
+  %b_and = and i32 %b, 1073741823
+
+  %a_and_fp = sitofp i32 %a_and to float
+  %b_and_fp = sitofp i32 %b_and to float
+
+  %res = fadd float %a_and_fp, %b_and_fp
+  ret float %res
+}
+
+; This test demonstrates overly conservative legality check. The float addition
+; can be replaced with the integer addition because the result of the operation
+; can be represented in float, but we don't do that now.
+define float @test_3(i32 %a, i32 %b) {
+; CHECK-LABEL: @test_3(
+; CHECK-NEXT: [[M:%.*]] = lshr i32 [[A:%.*]], 24
+; CHECK-NEXT: [[N:%.*]] = and i32 [[M]], [[B:%.*]]
+; CHECK-NEXT: [[O:%.*]] = sitofp i32 [[N]] to float
+; CHECK-NEXT: [[P:%.*]] = fadd float [[O]], 1.000000e+00
+; CHECK-NEXT: ret float [[P]]
+;
+  %m = lshr i32 %a, 24
+  %n = and i32 %m, %b
+  %o = sitofp i32 %n to float
+  %p = fadd float %o, 1.0
+  ret float %p
+}