This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
38
InstCombineCasts.cpp
-
test/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
5
sitofp.ll

Differential D19178

Broaden FoldItoFPtoI to try and establish whether the integer value fits into the float type
Needs ReviewPublic

Authored by aarzee on Apr 15 2016, 1:53 PM.

Download Raw Diff

Details

Reviewers

majnemer
scanon
escha

Summary

Previously, FoldItoFPtoI would only check if the width of the integer type fits into the width of the mantissa. This change will now let it fold if it can establish that the actual value of the integer fits within the floating point type (considering mantissa and exponent).

Was originally D19177, but was notified on IRC that since I didn't add llvm-commits as a subscriber, it wouldn't show up on that list.

Diff Detail

Event Timeline

aarzee updated this revision to Diff 53948.Apr 15 2016, 1:53 PM

aarzee retitled this revision from to Broaden FoldItoFPtoI to try and establish whether the integer value fits into the float type.

aarzee updated this object.

aarzee added reviewers: majnemer, scanon, escha.

aarzee mentioned this in D19177: Broaden FoldItoFPtoI to try and establish whether the integer value fits into the float type.

aarzee added a subscriber: llvm-commits.

Wouldn't it make sense to add a test also for case where it is NOT safe to fold ?

In D19178#402921, @deadalnix wrote:

Wouldn't it make sense to add a test also for case where it is NOT safe to fold ?

That should be covered by the existing tests, namely 13, 17, and 19.

aarzee updated this revision to Diff 54047.Apr 18 2016, 5:23 AM

Fix typo in test (CHECK-NEST -> CHECK-NEXT)

Any feedback on this?

reames requested changes to this revision.Apr 21 2016, 7:58 AM

reames added a reviewer: reames.

reames added a subscriber: reames.

reames added inline comments.

lib/Transforms/InstCombine/InstCombineCasts.cpp
1426	The second part of this comment doesn't make any sense to me. I can understand why an integer value of less than mantissa size (actual bits uses) could be converted lossly to and from a floating point type. Why does the exponent come into this? Are you trying to reason about integer values larger than the mantissa with precise representations in floating point? If so, could I ask that you split this into two patches: Handle just the mantissa case. Extend it to the full complexity. Doing it that way would make it far more obvious what's going on. I would personally be comfortable signing off on the former, but would not be on the later. Also, extracting out a helper function called something like "hasPreciseFPRepresentation" would make this code far more clear. (If I'm actually understanding it correctly at all!)
test/Transforms/InstCombine/sitofp.ll
191	FYI, you're going to need a lot more test cases to convince me this is correct. Even for the split patch.

This revision now requires changes to proceed.Apr 21 2016, 7:58 AM

aarzee added inline comments.Apr 21 2016, 8:23 AM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1426	Yes, that's what I'm doing. The exponent comes into play because (for example) `(2 ** 24 - 1) << 10` has 23 significant bits, which fits into the mantissa, but has 9 leading zeroes, which is too large for the exponent (8 bits). Could you be more specific about the splitting of the patch? I'm not sure what you mean by 'just the mantissa case' - if you mean the integer fits into the mantissa without any exponent, that's what FoldItoFPtoI already does, but based on bit-width of the integer type, not the integer value.

Made the code shorter and more concise, added some comments, added a new test. For some reason I'm having an issue building LLVM so I wasn't able to personally verify that this works as intended / at all.

Highest possible trailing zeros was off by one (exponent is either unsigned or biased, with non-negative values ranging from 0 to 2^(x - 1), where x is the bit width of the exponent).

reames added inline comments.Apr 21 2016, 9:26 AM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1426	I was suggesting that you first post a patch which implements this optimization for integers which can be represented in floating point without scaling (i.e. without using the exponent for anything non-trivial). Once that's in, the scaling part of the change becomes far easier to reason about in isolation. To be clear, I am not going to sign off on the change in its current form. I don't understand floating point deeply enough to reason it in this way. You're welcome to find an alternate reviewer, but if you want a LGTM from me, splitting it is going to be required.

Made the change suggested by @reames to only consider the mantissa. Scaling and exponent is no longer considered. Should be more easily verifiable.

FWIW I still believe that Diff 2 is (at least close to) correct, and I'd still like to pursue implementing it.

I've attached a file containing the latest version of the full proposed patch as well as a Python 3 script to test and verify whether the optimization is applied correctly. It requires a bit of editing, as shown in the top and bottom of the script. It's supposed to write any failed generated IR to exN.ll (where N is some number), but I'm not sure whether it actually does. Python's multiprocessing module is finicky. Note that you must use a floating point type that is supported by the platform you're on (I tried 'half' on x86-64 and got weird results because it was legalized to a float).

patch3.txt92 KBDownload

floattestgen.py4 KBDownload

Comments inline.

lib/Transforms/InstCombine/InstCombineCasts.cpp
1466	This isn't what you want. In particular, this doesn't produce a useful answer for any (known) positive number since you'll immediately exit the loop. A better approach here is to look for a prefix which is known to be either zero or one. (i.e. 0000 or 1111, not 0101.) For a signed number, you can simply drop those bits entirely since the sign is encoded separately. For an unsigned, you can only drop a zero prefix. This can be written easily using countNumberOfLeadingX
test/Transforms/InstCombine/sitofp.ll
192	Absolutely minimum testing you're going to need: sitofp positive number in and out of range sitofp negative number in and out of range uitofp number in and out of range All in the form of LIT tests.

This revision now requires changes to proceed.May 9 2016, 7:07 PM

I've decided to go ahead and submit the full change (considering exponent as well as mantissa) and I've added a few more tests as @reames suggested.

mcrosier added a subscriber: mcrosier.May 10 2016, 1:11 PM

mcrosier added inline comments.

lib/Transforms/InstCombine/InstCombineCasts.cpp
1454	How about: bool Safe = ActualSize <= MantissaWidth; // Compute known bits to see if the integer value fits into the fp type. if (!Safe) { ... } if (Safe) { ... Note: Local variables should be capitalized.
1476	Please capitalize Only and add a period. http://llvm.org/docs/CodingStandards.html#commenting

Replace for loops with count{Leading,Trailing}{Zeros,Ones}.

Added the changes suggested by @mcrosier.

Same as last diff except that I didn't rush through without actually reading the comment.

Previous advice is being actively ignored and outside of my expertise. If anyone else wants to review feel free, I don't have the time to cycle on this.

scanon added inline comments.May 12 2016, 8:33 AM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1426	A few things: This: and that the number of trailing zeros fits into the exponent is really confused. That's not how it [floating-point encoding] works at all. A correct algorithm for determining if an integer n can be exactly represented is (pseudocode): m = absoluteValue(n) i = mostSignificantNonzeroBit(m) j = leastSignificantNonzeroBit(m) if (i - j < numberOfSignificandBits) { // significand is representable, check exponent if (i <= greatestFiniteExponent) { // number can be exactly represented } } // number cannot be exactly represented In particular, if we look at your example, `(2 24 - 1) << 10`, this number absolutely is exactly representable in single-precision. We have: m = 224-1 << 10 i = 33 j = 10 i-j = 23 < numberOfSignificandBits = 24 i = 33 <= greatestFiniteExponent = 127. Please don't use the term "mantissa". It is frequently used, even in the LLVM sources, but it is subtly wrong (https://en.wikipedia.org/wiki/Significand#Use_of_.22mantissa.22). The correct term is "significand"; any new written code should use that instead.

aarzee added inline comments.May 22 2016, 4:18 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1426	Shouldn't it be `j <= greatestFiniteExponent`? (I'm clearly not an expert here, so I just wanted to check).

scanon added inline comments.May 23 2016, 7:38 AM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1426	No, it should be `i`. Normal binary floating-point numbers have the form +/- significand * 2^exponent, where significand is a number in [1,2). Thus, the MSB a representable integer is always 2^exponent.

Updated variable names and improved correctness.

eli.friedman added a subscriber: eli.friedman.May 29 2016, 10:31 AM

Question re: usage of 'mantissa' vs 'significand' - should getFPMantissaWidth be renamed?

Ping.

Needs more comments in the code, and more tests. In particular, you have zero tests of values with trailing zeros.

lib/Transforms/InstCombine/InstCombineCasts.cpp
1467	Why are you subtracting one here? The use of std::max here is suspicious; you should be treating UUUUUUUU and UUUUUUU0 differently.
1490	I suspect what you're doing here is wrong, but it's hard to follow without any comments.
1497	What are you trying to check here? (Maybe you're looking for APFloat::semanticsMaxExponent?)

Added additional comments, adjustments, and fixes for code clarity. Added exponent-related tests.

Thanks very much for the feedback, @eli.friedman. Hopefully I've addressed most of your concerns. Regarding the tests: could you think of cases that the current tests don't cover?

This is taking so many iterations to get to a reasonable place that I think you need some other approach to testing; you clearly aren't finding the interesting cases. Have you considered some sort of test generator, which randomly generates known one and known zero masks, then exhaustively checks whether the end result is consistent?

Indeed I have (although the tests generated are exhaustive, not random), and I attached a script to do so earlier in the thread: https://reviews.llvm.org/F1891038

eli.friedman added inline comments.Jul 24 2016, 3:16 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1497	If a value has unknown sign, isn't KnownZero.countLeadingOnes() always going to zero?
1508	Thinking about it a bit more, we don't actually care if the exponent overflows; the end result would be undef, so it doesn't matter.

aarzee added inline comments.Jul 24 2016, 4:11 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1497	I believe that's correct, yes. Thanks for catching that. This means: The absolute value of an integer with unknown sign and bitwidth X has a maximum bitwidth X; We know that we already checked whether the width of the integer fits into the significand earlier in the function, and if we reached this point in the function that comparison was false; So if we reach this point in the function with unknown sign, we can immediately return nullptr. We can add this line after the definitions of KnownNegative and PossiblyNegative, and remove the respective else blocks later in the function: if (PossiblyNegative && !KnownNegative) // If the sign of the number is unknown, the bit range will be the width of the integer type, // and we already checked whether that will fit into the fp type earlier in the function, so // we can exit early. return nullptr; I'll do this in the next revision.

aarzee added inline comments.Jul 24 2016, 4:39 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1508	Is this explicitly defined somewhere?

eli.friedman added inline comments.Jul 24 2016, 4:43 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1497	I think you can still optimize based on trailing zero bits even if the sign isn't known.
1508	http://llvm.org/docs/LangRef.html#fptosi-to-instruction . "If the value cannot fit in ty2, the results are undefined."

aarzee added inline comments.Jul 24 2016, 5:41 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp

1497

Again, you're right. I wasn't thinking about the least significant possibly set bit. How about this:

// If the number is known to be negative, the highest possibly set bit
// is the lowest bit not known to be one.
// Otherwise, it's the highest bit not known to be zero.

if (KnownNegative)
  MostSignificantPossiblySetBit = int(KnownOne.countTrailingOnes()); 
else
  MostSignificantPossiblySetBit = int(BitWidth - KnownZero.countLeadingOnes()) - 1;

1508

FWIW that's for fptosi, not sitofp, but under sitofp I see this: "If the value cannot fit in the floating point value, the results are undefined."
Thank you. I'll remove that check.

eli.friedman added inline comments.Jul 24 2016, 6:29 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1497	I assume you meant to use countLeadingOnes for both? Looks roughly right.

aarzee added inline comments.Jul 25 2016, 10:38 AM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1497	Nope. Lowest bit not known to be one is KnownOne.countTrailingOnes, and highest bit not known to be zero is BitWidth - KnownZero.countLeadingOnes() - 1.

eli.friedman added inline comments.Jul 25 2016, 10:52 AM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1497	What do you need the lowest bit not known to be one for? If the input is known to be an even number, that will always be zero. (If your testing script didn't catch this, you need a more thorough testing script.)

aarzee added inline comments.Jul 25 2016, 11:29 AM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1497	You're right again. This is pretty intellectually trying for me, so apologies for those errors. The problem with the testing script is that it is currently exhaustive and runs in exponential time, so running it with floats and 32-bit integers is impractical, and I have no hardware that supports half-precision.

eli.friedman added inline comments.Jul 25 2016, 12:17 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1497	Can you use some sort of software half-precision floating-point library? numpy has a "float16" type which should support the relevant conversion operations.

I would post the latest diff but the place I'm staying at has no Internet currently.

aarzee added inline comments.Jul 27 2016, 5:07 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1497	I've tried it and it's still impractical at ~43 million tests.

eli.friedman added inline comments.Jul 27 2016, 6:08 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1497	Brute-forcing 43 million tests in an hour on a single-core 2GHz desktop computer comes out to around 200,000 cycles per test. If you're trying to run `opt -instcombine` 43 million times, that might be a problem, but you should be able to easily fit into a 200,000 cycle budget if you copy-paste the computation from this patch into a C++ test harness. (That wouldn't be a complete end-to-end test, but it would be enough to show the tricky bits work correctly.)

aarzee added inline comments.Jul 29 2016, 12:05 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1497	I managed to pull out the relevant code and call it from Python, and it is significantly faster. So now running 43m tests shouldn't be an issue. And indeed, the lack of a specific condition for unknown sign led to issues. So I added a condition that if the sign is unknown, the highest possibly set bit is the bit to the right of the sign. Small caveat here: if the sign bit is unknown and the rest of the bits are all known zero, the highest possibly set bit will be one to the right of the sign bit, while the lowest possibly set bit is the sign bit, leading to a -1 range. It's not necessary to correct this to 1, because 1 will always fit into any significand, and -1 is less than any significand.

Additional fixes and comments.

I have verified the accuracy of this version of the code with a testing script that I will attach here.

Attached are a Python test script (best run in PyPy) and C++ testing program that work together to exhaustively verify this code for a given floating point format. See the comments in either file.

floattest.cpp2 KBDownload

ftg2.py2 KBDownload

Overall, this is looking much better; simple is good. :)

lib/Transforms/InstCombine/InstCombineCasts.cpp
1483	Leftover comment?
1507	Doesn't the + 1 here cancel out the -1 in the setting of MostSignificantPossiblySetBit? That would make the math a bit more straightforward.
test/Transforms/InstCombine/sitofp.ll
272	Did you find any interesting new cases worth having a separate regression test?

aarzee added inline comments.Jul 30 2016, 5:51 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1483	I got rid of that in my local copy but I didn't update my diff. Thanks for catching that.
1507	The reasoning behind this is that I wanted the values for both least and most significant possibly set bit to be zero indexed; I thought it might be confusing if one was and the other wasn't.
test/Transforms/InstCombine/sitofp.ll
272	Yes, I will add some tests with unknown sign.

Additional comments and tests for unknown sign.

eli.friedman added inline comments.Jul 30 2016, 7:53 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1507	Are you sure subtracting one actually makes the most significant bit zero-indexed? Normally, finding the distance between two array indexes is just simple subtraction; the extra "+1" looks weird. It's also kind of suspicious that your "zero-based" indexing can return a negative number (as an extreme case, if a 16-bit integer is all ones, your code says the most significant possibly set bit has index -1).
test/Transforms/InstCombine/sitofp.ll
278	Maybe also add a corresponding test which doesn't fold?

aarzee added inline comments.Jul 31 2016, 2:35 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1507	I'm going to take a look at the actual behavior, I think the least significant possibly set bit might be one-indexed; in any case something is behaving differently than I intended. WRT the all-ones case, if a number has no unknown bits then it is by nature a constant, and would be handled by ConstProp, right?

eli.friedman added inline comments.Jul 31 2016, 3:03 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1507	Yes, in the all-ones case it would get turned into a constant; not sure off the top of my head if that's guaranteed to happen before this code runs, though.

aarzee added inline comments.Jul 31 2016, 3:35 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1507	After checking I remembered the reason for the +1: what I'm calling BitRange is actually the width; for example, the width between the 1s in 0b101 would be 3. Maybe width isn't the best word. Whatever the case, this is the reason.

eli.friedman added inline comments.Jul 31 2016, 3:50 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1507	Oh, hmm... not how I would think of it. b\| 1 \| 0 \| 1 \| 3 2 1 0 high low So the high index is 3, the low index is zero, the significant bits are the bits inside the range, and the length of the range is the high index minus the low index. This is how iterators and ranges in C++ work.

aarzee added inline comments.Jul 31 2016, 4:41 PM

lib/Transforms/InstCombine/InstCombineCasts.cpp
1507	Right, I hadn't thought of it that way; that is how ranges are expressed in most languages I work with. I'll make the change.

Modified most and least significant possibly set bit values to act more like range bounds; modified and added unknown sign tests.

Ping.

I'd like to know if it's possible for a value to be known to be -1 without being a constant; currently the code would not act correctly if this were the case because I made the assumption that it would be a constant and that the code wouldn't be run.

Revision Contents

Path

Size

lib/

Transforms/

InstCombine/

InstCombineCasts.cpp

73 lines

test/

Transforms/

InstCombine/

sitofp.ll

100 lines

Diff 56780

lib/Transforms/InstCombine/InstCombineCasts.cpp

Show First 20 Lines • Show All 475 Lines • ▼ Show 20 Lines
}		}

Instruction *InstCombiner::visitTrunc(TruncInst &CI) {		Instruction *InstCombiner::visitTrunc(TruncInst &CI) {
if (Instruction *Result = commonCastTransforms(CI))		if (Instruction *Result = commonCastTransforms(CI))
return Result;		return Result;

// Test if the trunc is the user of a select which is part of a		// Test if the trunc is the user of a select which is part of a
// minimum or maximum operation. If so, don't do any more simplification.		// minimum or maximum operation. If so, don't do any more simplification.
// Even simplifying demanded bits can break the canonical form of a		// Even simplifying demanded bits can break the canonical form of a
// min/max.		// min/max.
Value LHS, RHS;		Value LHS, RHS;
if (SelectInst *SI = dyn_cast<SelectInst>(CI.getOperand(0)))		if (SelectInst *SI = dyn_cast<SelectInst>(CI.getOperand(0)))
if (matchSelectPattern(SI, LHS, RHS).Flavor != SPF_UNKNOWN)		if (matchSelectPattern(SI, LHS, RHS).Flavor != SPF_UNKNOWN)
return nullptr;		return nullptr;

// See if we can simplify any instructions used by the input whose sole		// See if we can simplify any instructions used by the input whose sole
// purpose is to compute bits we don't care about.		// purpose is to compute bits we don't care about.
if (SimplifyDemandedInstructionBits(CI))		if (SimplifyDemandedInstructionBits(CI))
return &CI;		return &CI;

Value *Src = CI.getOperand(0);		Value *Src = CI.getOperand(0);
Type DestTy = CI.getType(), SrcTy = Src->getType();		Type DestTy = CI.getType(), SrcTy = Src->getType();

▲ Show 20 Lines • Show All 629 Lines • ▼ Show 20 Lines	Instruction *InstCombiner::visitSExt(SExtInst &CI) {
// purpose is to compute bits we don't care about.		// purpose is to compute bits we don't care about.
if (SimplifyDemandedInstructionBits(CI))		if (SimplifyDemandedInstructionBits(CI))
return &CI;		return &CI;

Value *Src = CI.getOperand(0);		Value *Src = CI.getOperand(0);
Type SrcTy = Src->getType(), DestTy = CI.getType();		Type SrcTy = Src->getType(), DestTy = CI.getType();

// If we know that the value being extended is positive, we can use a zext		// If we know that the value being extended is positive, we can use a zext
// instead.		// instead.
bool KnownZero, KnownOne;		bool KnownZero, KnownOne;
ComputeSignBit(Src, KnownZero, KnownOne, 0, &CI);		ComputeSignBit(Src, KnownZero, KnownOne, 0, &CI);
if (KnownZero) {		if (KnownZero) {
Value *ZExt = Builder->CreateZExt(Src, DestTy);		Value *ZExt = Builder->CreateZExt(Src, DestTy);
return replaceInstUsesWith(CI, ZExt);		return replaceInstUsesWith(CI, ZExt);
}		}

// Attempt to extend the entire input expression tree to the destination		// Attempt to extend the entire input expression tree to the destination
▲ Show 20 Lines • Show All 271 Lines • ▼ Show 20 Lines
Instruction *InstCombiner::visitFPExt(CastInst &CI) {		Instruction *InstCombiner::visitFPExt(CastInst &CI) {
return commonCastTransforms(CI);		return commonCastTransforms(CI);
}		}

// fpto{s/u}i({u/s}itofp(X)) --> X or zext(X) or sext(X) or trunc(X)		// fpto{s/u}i({u/s}itofp(X)) --> X or zext(X) or sext(X) or trunc(X)
// This is safe if the intermediate type has enough bits in its mantissa to		// This is safe if the intermediate type has enough bits in its mantissa to
// accurately represent all values of X. For example, this won't work with		// accurately represent all values of X. For example, this won't work with
// i64 -> float -> i64.		// i64 -> float -> i64.
		// However, this is also safe if we can establish the range between most and
		// least significant set bits of abs(X) fits into the mantissa, and that the
		// number of trailing zeros fits into the exponent.
		reamesUnsubmitted Not Done Reply Inline Actions The second part of this comment doesn't make any sense to me. I can understand why an integer value of less than mantissa size (actual bits uses) could be converted lossly to and from a floating point type. Why does the exponent come into this? Are you trying to reason about integer values larger than the mantissa with precise representations in floating point? If so, could I ask that you split this into two patches: Handle just the mantissa case. Extend it to the full complexity. Doing it that way would make it far more obvious what's going on. I would personally be comfortable signing off on the former, but would not be on the later. Also, extracting out a helper function called something like "hasPreciseFPRepresentation" would make this code far more clear. (If I'm actually understanding it correctly at all!) reames: The second part of this comment doesn't make any sense to me. I can understand why an integer…
		aarzeeAuthorUnsubmitted Not Done Reply Inline Actions Yes, that's what I'm doing. The exponent comes into play because (for example) `(2 24 - 1) << 10` has 23 significant bits, which fits into the mantissa, but has 9 leading zeroes, which is too large for the exponent (8 bits). Could you be more specific about the splitting of the patch? I'm not sure what you mean by 'just the mantissa case' - if you mean the integer fits into the mantissa without any exponent, that's what FoldItoFPtoI already does, but based on bit-width of the integer type, not the integer value. aarzee: Yes, that's what I'm doing. The exponent comes into play because (for example) `(2 24 - 1)…
		reamesUnsubmitted Not Done Reply Inline Actions I was suggesting that you first post a patch which implements this optimization for integers which can be represented in floating point without scaling (i.e. without using the exponent for anything non-trivial). Once that's in, the scaling part of the change becomes far easier to reason about in isolation. To be clear, I am not going to sign off on the change in its current form. I don't understand floating point deeply enough to reason it in this way. You're welcome to find an alternate reviewer, but if you want a LGTM from me, splitting it is going to be required. reames: I was suggesting that you first post a patch which implements this optimization for integers…
		scanonUnsubmitted Not Done Reply Inline Actions A few things: This: and that the number of trailing zeros fits into the exponent is really confused. That's not how it [floating-point encoding] works at all. A correct algorithm for determining if an integer n can be exactly represented is (pseudocode): m = absoluteValue(n) i = mostSignificantNonzeroBit(m) j = leastSignificantNonzeroBit(m) if (i - j < numberOfSignificandBits) { // significand is representable, check exponent if (i <= greatestFiniteExponent) { // number can be exactly represented } } // number cannot be exactly represented In particular, if we look at your example, `(2 24 - 1) << 10`, this number absolutely is exactly representable in single-precision. We have: m = 224-1 << 10 i = 33 j = 10 i-j = 23 < numberOfSignificandBits = 24 i = 33 <= greatestFiniteExponent = 127. Please don't use the term "mantissa". It is frequently used, even in the LLVM sources, but it is subtly wrong (https://en.wikipedia.org/wiki/Significand#Use_of_.22mantissa.22). The correct term is "significand"; any new written code should use that instead. scanon: A few things: 1. This: > and that the number of trailing zeros fits into the exponent is…
		aarzeeAuthorUnsubmitted Not Done Reply Inline Actions Shouldn't it be `j <= greatestFiniteExponent`? (I'm clearly not an expert here, so I just wanted to check). aarzee: Shouldn't it be `j <= greatestFiniteExponent`? (I'm clearly not an expert here, so I just…
		scanonUnsubmitted Not Done Reply Inline Actions No, it should be `i`. Normal binary floating-point numbers have the form +/- significand * 2^exponent, where significand is a number in [1,2). Thus, the MSB a representable integer is always 2^exponent. scanon: No, it should be `i`. Normal binary floating-point numbers have the form +/- significand *…
Instruction *InstCombiner::FoldItoFPtoI(Instruction &FI) {		Instruction *InstCombiner::FoldItoFPtoI(Instruction &FI) {
if (!isa<UIToFPInst>(FI.getOperand(0)) && !isa<SIToFPInst>(FI.getOperand(0)))		if (!isa<UIToFPInst>(FI.getOperand(0)) && !isa<SIToFPInst>(FI.getOperand(0)))
return nullptr;		return nullptr;
Instruction *OpI = cast<Instruction>(FI.getOperand(0));		Instruction *OpI = cast<Instruction>(FI.getOperand(0));

Value *SrcI = OpI->getOperand(0);		Value *SrcI = OpI->getOperand(0);
Type *FITy = FI.getType();		Type *FITy = FI.getType();
Type *OpITy = OpI->getType();		Type *OpITy = OpI->getType();
Show All 9 Lines	Instruction *InstCombiner::FoldItoFPtoI(Instruction &FI) {
// of the input range and output range.		// of the input range and output range.

// This means this is also safe for a signed input and unsigned output, since		// This means this is also safe for a signed input and unsigned output, since
// a negative input would lead to undefined behavior.		// a negative input would lead to undefined behavior.
int InputSize = (int)SrcTy->getScalarSizeInBits() - IsInputSigned;		int InputSize = (int)SrcTy->getScalarSizeInBits() - IsInputSigned;
int OutputSize = (int)FITy->getScalarSizeInBits() - IsOutputSigned;		int OutputSize = (int)FITy->getScalarSizeInBits() - IsOutputSigned;
int ActualSize = std::min(InputSize, OutputSize);		int ActualSize = std::min(InputSize, OutputSize);

if (ActualSize <= OpITy->getFPMantissaWidth()) {		int MantissaWidth = OpITy->getFPMantissaWidth();

		bool safe;
		mcrosierUnsubmitted Not Done Reply Inline Actions How about: bool Safe = ActualSize <= MantissaWidth; // Compute known bits to see if the integer value fits into the fp type. if (!Safe) { ... } if (Safe) { ... Note: Local variables should be capitalized. mcrosier: How about: bool Safe = ActualSize <= MantissaWidth; // Compute known bits to see if the…

		if (ActualSize <= MantissaWidth) {
		safe = true;
		// No need to compute known bits.
		} else {
		// Now try and see if the integer value fits into the fp type.
		int LeastSignificantPossiblySetBit = 0;
		int MostSignificantPossiblySetBit = 0;
		int HighestNotKnownZero = 0;
		uint32_t BitWidth = SrcTy->getScalarSizeInBits();
		APInt KnownZero(BitWidth, 0), KnownOne(BitWidth, 0);
		computeKnownBits(SrcI, KnownZero, KnownOne, 0, &FI);
		reamesUnsubmitted Not Done Reply Inline Actions This isn't what you want. In particular, this doesn't produce a useful answer for any (known) positive number since you'll immediately exit the loop. A better approach here is to look for a prefix which is known to be either zero or one. (i.e. 0000 or 1111, not 0101.) For a signed number, you can simply drop those bits entirely since the sign is encoded separately. For an unsigned, you can only drop a zero prefix. This can be written easily using countNumberOfLeadingX reames: This isn't what you want. In particular, this doesn't produce a useful answer for any (known)…
		bool PossiblyNegative = IsInputSigned && !KnownZero[BitWidth - 1];
		eli.friedmanUnsubmitted Not Done Reply Inline Actions Why are you subtracting one here? The use of std::max here is suspicious; you should be treating UUUUUUUU and UUUUUUU0 differently. eli.friedman: Why are you subtracting one here? The use of std::max here is suspicious; you should be…
		bool KnownNegative = PossiblyNegative && KnownOne[BitWidth - 1];
		for (unsigned i = 0; i < BitWidth; ++i) {
		if (!KnownZero[i]) {
		LeastSignificantPossiblySetBit = i;
		break;
		}
		}
		if (!KnownNegative) {
		// only necessary if it's not known negative (possibly positive)
		mcrosierUnsubmitted Not Done Reply Inline Actions Please capitalize Only and add a period. http://llvm.org/docs/CodingStandards.html#commenting mcrosier: Please capitalize Only and add a period. http://llvm.org/docs/CodingStandards.html#commenting
		for (unsigned i = 1, j; i < BitWidth + 1; ++i) {
		j = BitWidth - i;
		if (!KnownZero[j]) {
		HighestNotKnownZero = j;
		break;
		}
		}
		eli.friedmanUnsubmitted Not Done Reply Inline Actions Leftover comment? eli.friedman: Leftover comment?
		aarzeeAuthorUnsubmitted Not Done Reply Inline Actions I got rid of that in my local copy but I didn't update my diff. Thanks for catching that. aarzee: I got rid of that in my local copy but I didn't update my diff. Thanks for catching that.
		}
		if (!PossiblyNegative) {
		MostSignificantPossiblySetBit = HighestNotKnownZero;
		} else {
		if (KnownOne.countTrailingZeros() == BitWidth - 1) {
		MostSignificantPossiblySetBit = BitWidth - 1;
		} else {
		eli.friedmanUnsubmitted Not Done Reply Inline Actions I suspect what you're doing here is wrong, but it's hard to follow without any comments. eli.friedman: I suspect what you're doing here is wrong, but it's hard to follow without any comments.
		int LowestNotKnownOne = -1;
		for (unsigned i = 0; i < BitWidth; ++i) {
		if (!KnownOne[i]) {
		LowestNotKnownOne = i;
		break;
		}
		}
		eli.friedmanUnsubmitted Not Done Reply Inline Actions What are you trying to check here? (Maybe you're looking for APFloat::semanticsMaxExponent?) eli.friedman: What are you trying to check here? (Maybe you're looking for APFloat::semanticsMaxExponent?)
		eli.friedmanUnsubmitted Not Done Reply Inline Actions If a value has unknown sign, isn't KnownZero.countLeadingOnes() always going to zero? eli.friedman: If a value has unknown sign, isn't KnownZero.countLeadingOnes() always going to zero?
		aarzeeAuthorUnsubmitted Not Done Reply Inline Actions I believe that's correct, yes. Thanks for catching that. This means: The absolute value of an integer with unknown sign and bitwidth X has a maximum bitwidth X; We know that we already checked whether the width of the integer fits into the significand earlier in the function, and if we reached this point in the function that comparison was false; So if we reach this point in the function with unknown sign, we can immediately return nullptr. We can add this line after the definitions of KnownNegative and PossiblyNegative, and remove the respective else blocks later in the function: if (PossiblyNegative && !KnownNegative) // If the sign of the number is unknown, the bit range will be the width of the integer type, // and we already checked whether that will fit into the fp type earlier in the function, so // we can exit early. return nullptr; I'll do this in the next revision. aarzee: I believe that's correct, yes. Thanks for catching that. This means: - The absolute value of…
		eli.friedmanUnsubmitted Not Done Reply Inline Actions I think you can still optimize based on trailing zero bits even if the sign isn't known. eli.friedman: I think you can still optimize based on trailing zero bits even if the sign isn't known.
		aarzeeAuthorUnsubmitted Not Done Reply Inline Actions Again, you're right. I wasn't thinking about the least significant possibly set bit. How about this: // If the number is known to be negative, the highest possibly set bit // is the lowest bit not known to be one. // Otherwise, it's the highest bit not known to be zero. if (KnownNegative) MostSignificantPossiblySetBit = int(KnownOne.countTrailingOnes()); else MostSignificantPossiblySetBit = int(BitWidth - KnownZero.countLeadingOnes()) - 1; aarzee: Again, you're right. I wasn't thinking about the least significant possibly set bit. How about…
		eli.friedmanUnsubmitted Not Done Reply Inline Actions I assume you meant to use countLeadingOnes for both? Looks roughly right. eli.friedman: I assume you meant to use countLeadingOnes for both? Looks roughly right.
		aarzeeAuthorUnsubmitted Not Done Reply Inline Actions Nope. Lowest bit not known to be one is KnownOne.countTrailingOnes, and highest bit not known to be zero is BitWidth - KnownZero.countLeadingOnes() - 1. aarzee: Nope. Lowest bit not known to be one is KnownOne.countTrailingOnes, and highest bit not known…
		eli.friedmanUnsubmitted Not Done Reply Inline Actions What do you need the lowest bit not known to be one for? If the input is known to be an even number, that will always be zero. (If your testing script didn't catch this, you need a more thorough testing script.) eli.friedman: What do you need the lowest bit not known to be one for? If the input is known to be an even…
		aarzeeAuthorUnsubmitted Not Done Reply Inline Actions You're right again. This is pretty intellectually trying for me, so apologies for those errors. The problem with the testing script is that it is currently exhaustive and runs in exponential time, so running it with floats and 32-bit integers is impractical, and I have no hardware that supports half-precision. aarzee: You're right again. This is pretty intellectually trying for me, so apologies for those errors.
		eli.friedmanUnsubmitted Not Done Reply Inline Actions Can you use some sort of software half-precision floating-point library? numpy has a "float16" type which should support the relevant conversion operations. eli.friedman: Can you use some sort of software half-precision floating-point library? numpy has a "float16"…
		aarzeeAuthorUnsubmitted Not Done Reply Inline Actions I've tried it and it's still impractical at ~43 million tests. aarzee: I've tried it and it's still impractical at ~43 million tests.
		eli.friedmanUnsubmitted Not Done Reply Inline Actions Brute-forcing 43 million tests in an hour on a single-core 2GHz desktop computer comes out to around 200,000 cycles per test. If you're trying to run `opt -instcombine` 43 million times, that might be a problem, but you should be able to easily fit into a 200,000 cycle budget if you copy-paste the computation from this patch into a C++ test harness. (That wouldn't be a complete end-to-end test, but it would be enough to show the tricky bits work correctly.) eli.friedman: Brute-forcing 43 million tests in an hour on a single-core 2GHz desktop computer comes out to…
		aarzeeAuthorUnsubmitted Not Done Reply Inline Actions I managed to pull out the relevant code and call it from Python, and it is significantly faster. So now running 43m tests shouldn't be an issue. And indeed, the lack of a specific condition for unknown sign led to issues. So I added a condition that if the sign is unknown, the highest possibly set bit is the bit to the right of the sign. Small caveat here: if the sign bit is unknown and the rest of the bits are all known zero, the highest possibly set bit will be one to the right of the sign bit, while the lowest possibly set bit is the sign bit, leading to a -1 range. It's not necessary to correct this to 1, because 1 will always fit into any significand, and -1 is less than any significand. aarzee: I managed to pull out the relevant code and call it from Python, and it is significantly faster.
		if (LowestNotKnownOne != int(BitWidth - 2))
		++LowestNotKnownOne;
		if (KnownNegative) {
		MostSignificantPossiblySetBit = LowestNotKnownOne;
		} else {
		MostSignificantPossiblySetBit = std::max(HighestNotKnownZero,
		LowestNotKnownOne);
		}
		}
		}
		eli.friedmanUnsubmitted Not Done Reply Inline Actions Doesn't the + 1 here cancel out the -1 in the setting of MostSignificantPossiblySetBit? That would make the math a bit more straightforward. eli.friedman: Doesn't the + 1 here cancel out the -1 in the setting of MostSignificantPossiblySetBit? That…
		aarzeeAuthorUnsubmitted Not Done Reply Inline Actions The reasoning behind this is that I wanted the values for both least and most significant possibly set bit to be zero indexed; I thought it might be confusing if one was and the other wasn't. aarzee: The reasoning behind this is that I wanted the values for both least and most significant…
		eli.friedmanUnsubmitted Not Done Reply Inline Actions Are you sure subtracting one actually makes the most significant bit zero-indexed? Normally, finding the distance between two array indexes is just simple subtraction; the extra "+1" looks weird. It's also kind of suspicious that your "zero-based" indexing can return a negative number (as an extreme case, if a 16-bit integer is all ones, your code says the most significant possibly set bit has index -1). eli.friedman: Are you sure subtracting one actually makes the most significant bit zero-indexed? Normally…
		aarzeeAuthorUnsubmitted Not Done Reply Inline Actions I'm going to take a look at the actual behavior, I think the least significant possibly set bit might be one-indexed; in any case something is behaving differently than I intended. WRT the all-ones case, if a number has no unknown bits then it is by nature a constant, and would be handled by ConstProp, right? aarzee: I'm going to take a look at the actual behavior, I think the least significant possibly set bit…
		eli.friedmanUnsubmitted Not Done Reply Inline Actions Yes, in the all-ones case it would get turned into a constant; not sure off the top of my head if that's guaranteed to happen before this code runs, though. eli.friedman: Yes, in the all-ones case it would get turned into a constant; not sure off the top of my head…
		aarzeeAuthorUnsubmitted Not Done Reply Inline Actions After checking I remembered the reason for the +1: what I'm calling BitRange is actually the width; for example, the width between the 1s in 0b101 would be 3. Maybe width isn't the best word. Whatever the case, this is the reason. aarzee: After checking I remembered the reason for the +1: what I'm calling BitRange is actually the…
		eli.friedmanUnsubmitted Not Done Reply Inline Actions Oh, hmm... not how I would think of it. b\| 1 \| 0 \| 1 \| 3 2 1 0 high low So the high index is 3, the low index is zero, the significant bits are the bits inside the range, and the length of the range is the high index minus the low index. This is how iterators and ranges in C++ work. eli.friedman: Oh, hmm... not how I would think of it. b\| 1 \| 0 \| 1 \| 3 2 1 0…
		aarzeeAuthorUnsubmitted Not Done Reply Inline Actions Right, I hadn't thought of it that way; that is how ranges are expressed in most languages I work with. I'll make the change. aarzee: Right, I hadn't thought of it that way; that is how ranges are expressed in most languages I…
		int BitRange = MostSignificantPossiblySetBit -
		eli.friedmanUnsubmitted Not Done Reply Inline Actions Thinking about it a bit more, we don't actually care if the exponent overflows; the end result would be undef, so it doesn't matter. eli.friedman: Thinking about it a bit more, we don't actually care if the exponent overflows; the end result…
		aarzeeAuthorUnsubmitted Not Done Reply Inline Actions Is this explicitly defined somewhere? aarzee: Is this explicitly defined somewhere?
		eli.friedmanUnsubmitted Not Done Reply Inline Actions http://llvm.org/docs/LangRef.html#fptosi-to-instruction . "If the value cannot fit in ty2, the results are undefined." eli.friedman: http://llvm.org/docs/LangRef.html#fptosi-to-instruction . "If the value cannot fit in ty2, the…
		aarzeeAuthorUnsubmitted Not Done Reply Inline Actions FWIW that's for fptosi, not sitofp, but under sitofp I see this: "If the value cannot fit in the floating point value, the results are undefined." Thank you. I'll remove that check. aarzee: FWIW that's for fptosi, not sitofp, but under sitofp I see this: "If the value cannot fit in…
		LeastSignificantPossiblySetBit + 1;
		safe = BitRange <= MantissaWidth && MostSignificantPossiblySetBit <=
		(int)OpITy->getScalarSizeInBits();
		}

		if (safe) {
if (FITy->getScalarSizeInBits() > SrcTy->getScalarSizeInBits()) {		if (FITy->getScalarSizeInBits() > SrcTy->getScalarSizeInBits()) {
if (IsInputSigned && IsOutputSigned)		if (IsInputSigned && IsOutputSigned)
return new SExtInst(SrcI, FITy);		return new SExtInst(SrcI, FITy);
return new ZExtInst(SrcI, FITy);		return new ZExtInst(SrcI, FITy);
}		}
if (FITy->getScalarSizeInBits() < SrcTy->getScalarSizeInBits())		if (FITy->getScalarSizeInBits() < SrcTy->getScalarSizeInBits())
return new TruncInst(SrcI, FITy);		return new TruncInst(SrcI, FITy);
if (SrcTy == FITy)		if (SrcTy == FITy)
▲ Show 20 Lines • Show All 586 Lines • Show Last 20 Lines

test/Transforms/InstCombine/sitofp.ll

	Show First 20 Lines • Show All 176 Lines • ▼ Show 20 Lines
	; CHECK: sitofp			; CHECK: sitofp
	; CHECK-NEXT: fptosi			; CHECK-NEXT: fptosi
	define i55 @test19(i64 %A) nounwind {			define i55 @test19(i64 %A) nounwind {
	%B = sitofp i64 %A to double			%B = sitofp i64 %A to double
	%C = fptosi double %B to i55			%C = fptosi double %B to i55
	ret i55 %C			ret i55 %C
	}			}

				; This should fold because even though the bit width of the integer
				; is greater than that of the mantissa, we can establish that the
				; actual value of the integer will fit into the mantissa.
				; (Bits of %B: 00000000UUUUUUUUUUUUUUUUUUUUUUUU, U = unknown)
				; CHECK-LABEL: test20
				; CHECK: and
				; CHECK-NEXT: ret i32
				reamesUnsubmitted Not Done Reply Inline Actions FYI, you're going to need a lot more test cases to convince me this is correct. Even for the split patch. reames: FYI, you're going to need a lot more test cases to convince me this is correct. Even for the…
				define i32 @test20(i32 %A) nounwind {
				reamesUnsubmitted Not Done Reply Inline Actions Absolutely minimum testing you're going to need: sitofp positive number in and out of range sitofp negative number in and out of range uitofp number in and out of range All in the form of LIT tests. reames: Absolutely minimum testing you're going to need: sitofp positive number in and out of range…
				%B = and i32 %A, 16777215
				%C = sitofp i32 %B to float
				%D = fptosi float %C to i32
				ret i32 %D
				}

				; Same as last test, but unsigned.
				; CHECK-LABEL: test21
				; CHECK: and
				; CHECK-NEXT: ret i32
				define i32 @test21(i32 %A) nounwind {
				%B = and i32 %A, 16777215
				%C = uitofp i32 %B to float
				%D = fptoui float %C to i32
				ret i32 %D
				}

				; This should fold, because even though the effective exponent is
				; 2**24, which does not fit into float's exponent, we can just
				; shift the mantissa 1 bit to the left to compensate.
				; (Bits of %B: 0000000U000000000000000000000000, U = unknown)
				; CHECK-LABEL: test22
				; CHECK: and
				; CHECK-NEXT: ret i32
				define i32 @test22(i32 %A) nounwind {
				%B = and i32 %A, 16777216
				%C = sitofp i32 %B to float
				%D = fptosi float %C to i32
				ret i32 %D
				}

				; Same as last test, but unsigned.
				; CHECK-LABEL: test23
				; CHECK: and
				; CHECK-NEXT: ret i32
				define i32 @test23(i32 %A) nounwind {
				%B = and i32 %A, 16777216
				%C = uitofp i32 %B to float
				%D = fptoui float %C to i32
				ret i32 %D
				}

				; This should not fold, because the integer's value does not fit
				; into the mantissa.
				; (Bits of %B: 0000000UUUUUUUUUUUUUUUUUUUUUUUUU, U = unknown)
				; CHECK-LABEL: test24
				; CHECK: sitofp
				; CHECK-NEXT: fptosi
				define i32 @test24(i32 %A) nounwind {
				%B = and i32 %A, 33554431
				%C = sitofp i32 %B to float
				%D = fptosi float %C to i32
				ret i32 %D
				}

				; Same as last test, but unsigned.
				; CHECK-LABEL: test25
				; CHECK: uitofp
				; CHECK-NEXT: fptoui
				define i32 @test25(i32 %A) nounwind {
				%B = and i32 %A, 33554431
				%C = uitofp i32 %B to float
				%D = fptoui float %C to i32
				ret i32 %D
				}

				; This should fold because due to the sign being set, the highest
				; possibly set bit of the absolute value is the bit to the left of
				; the highest not known one bit (in this case, position 0 + 1 = 1).
				; (Bits of %B: 1111111111111111111111111111111U, U = unknown)
				; CHECK-LABEL: test26
				; CHECK: or
				; CHECK-NEXT: ret i32
				define i32 @test26(i32 %A) nounwind {
				%B = or i32 %A, -2
				%C = sitofp i32 %B to float
				%D = fptosi float %C to i32
				ret i32 %D
				}

				eli.friedmanUnsubmitted Not Done Reply Inline Actions Did you find any interesting new cases worth having a separate regression test? eli.friedman: Did you find any interesting new cases worth having a separate regression test?
				aarzeeAuthorUnsubmitted Not Done Reply Inline Actions Yes, I will add some tests with unknown sign. aarzee: Yes, I will add some tests with unknown sign.
				; This should not fold because of the same property.
				; (Bits of %B: 10000000UUUUUUUUUUUUUUUUUUUUUUUU, U = unknown)
				; CHECK-LABEL: test27
				; CHECK: sitofp
				; CHECK-NEXT: fptosi
				define i32 @test27(i32 %A) nounwind {
				eli.friedmanUnsubmitted Not Done Reply Inline Actions Maybe also add a corresponding test which doesn't fold? eli.friedman: Maybe also add a corresponding test which doesn't fold?
				%B = and i32 %A, 16777215
				%C = or i32 -2147483648, %B
				%D = sitofp i32 %C to float
				%E = fptosi float %D to i32
				ret i32 %E
				}

This is an archive of the discontinued LLVM Phabricator instance.

Broaden FoldItoFPtoI to try and establish whether the integer value fits into the float typeNeeds ReviewPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 56780

lib/Transforms/InstCombine/InstCombineCasts.cpp

test/Transforms/InstCombine/sitofp.ll

Broaden FoldItoFPtoI to try and establish whether the integer value fits into the float type
Needs ReviewPublic