Page MenuHomePhabricator

Broaden FoldItoFPtoI to try and establish whether the integer value fits into the float type
Needs ReviewPublic

Authored by aarzee on Apr 15 2016, 1:53 PM.

Details

Summary

Previously, FoldItoFPtoI would only check if the width of the integer type fits into the width of the mantissa. This change will now let it fold if it can establish that the actual value of the integer fits within the floating point type (considering mantissa and exponent).

Was originally D19177, but was notified on IRC that since I didn't add llvm-commits as a subscriber, it wouldn't show up on that list.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Fix typo in test (CHECK-NEST -> CHECK-NEXT)

Any feedback on this?

reames requested changes to this revision.Apr 21 2016, 7:58 AM
reames added a reviewer: reames.
reames added a subscriber: reames.
reames added inline comments.
lib/Transforms/InstCombine/InstCombineCasts.cpp
1425

The second part of this comment doesn't make any sense to me. I can understand why an integer value of less than mantissa size (actual bits uses) could be converted lossly to and from a floating point type. Why does the exponent come into this? Are you trying to reason about integer values larger than the mantissa with precise representations in floating point?

If so, could I ask that you split this into two patches:

  1. Handle just the mantissa case.
  2. Extend it to the full complexity.

Doing it that way would make it far more obvious what's going on. I would personally be comfortable signing off on the former, but would not be on the later.

Also, extracting out a helper function called something like "hasPreciseFPRepresentation" would make this code far more clear. (If I'm actually understanding it correctly at all!)

test/Transforms/InstCombine/sitofp.ll
191

FYI, you're going to need a lot more test cases to convince me this is correct. Even for the split patch.

This revision now requires changes to proceed.Apr 21 2016, 7:58 AM
aarzee added inline comments.Apr 21 2016, 8:23 AM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1425

Yes, that's what I'm doing.

The exponent comes into play because (for example) (2 ** 24 - 1) << 10 has 23 significant bits, which fits into the mantissa, but has 9 leading zeroes, which is too large for the exponent (8 bits).

Could you be more specific about the splitting of the patch? I'm not sure what you mean by 'just the mantissa case' - if you mean the integer fits into the mantissa without any exponent, that's what FoldItoFPtoI already does, but based on bit-width of the integer type, not the integer value.

aarzee updated this revision to Diff 54515.Apr 21 2016, 8:55 AM
aarzee edited edge metadata.

Made the code shorter and more concise, added some comments, added a new test. For some reason I'm having an issue building LLVM so I wasn't able to personally verify that this works as intended / at all.

aarzee updated this revision to Diff 54516.Apr 21 2016, 9:01 AM
aarzee edited edge metadata.

Highest possible trailing zeros was off by one (exponent is either unsigned or biased, with non-negative values ranging from 0 to 2^(x - 1), where x is the bit width of the exponent).

reames added inline comments.Apr 21 2016, 9:26 AM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1425

I was suggesting that you first post a patch which implements this optimization for integers which can be represented in floating point without scaling (i.e. without using the exponent for anything non-trivial). Once that's in, the scaling part of the change becomes far easier to reason about in isolation.

To be clear, I am not going to sign off on the change in its current form. I don't understand floating point deeply enough to reason it in this way. You're welcome to find an alternate reviewer, but if you want a LGTM from me, splitting it is going to be required.

aarzee updated this revision to Diff 55987.May 3 2016, 7:04 AM
aarzee updated this object.

Made the change suggested by @reames to only consider the mantissa. Scaling and exponent is no longer considered. Should be more easily verifiable.

aarzee added a comment.May 3 2016, 8:21 AM

FWIW I still believe that Diff 2 is (at least close to) correct, and I'd still like to pursue implementing it.

I've attached a file containing the latest version of the full proposed patch as well as a Python 3 script to test and verify whether the optimization is applied correctly. It requires a bit of editing, as shown in the top and bottom of the script. It's supposed to write any failed generated IR to exN.ll (where N is some number), but I'm not sure whether it actually does. Python's multiprocessing module is finicky. Note that you must use a floating point type that is supported by the platform you're on (I tried 'half' on x86-64 and got weird results because it was legalized to a float).

reames requested changes to this revision.May 9 2016, 7:07 PM
reames edited edge metadata.

Comments inline.

lib/Transforms/InstCombine/InstCombineCasts.cpp
1466

This isn't what you want. In particular, this doesn't produce a useful answer for any (known) positive number since you'll immediately exit the loop.

A better approach here is to look for a prefix which is known to be either zero or one. (i.e. 0000 or 1111, not 0101.) For a signed number, you can simply drop those bits entirely since the sign is encoded separately. For an unsigned, you can only drop a zero prefix. This can be written easily using countNumberOfLeadingX

test/Transforms/InstCombine/sitofp.ll
192

Absolutely minimum testing you're going to need:
sitofp positive number in and out of range
sitofp negative number in and out of range
uitofp number in and out of range

All in the form of LIT tests.

This revision now requires changes to proceed.May 9 2016, 7:07 PM
aarzee updated this revision to Diff 56780.May 10 2016, 12:29 PM
aarzee updated this object.
aarzee edited edge metadata.

I've decided to go ahead and submit the full change (considering exponent as well as mantissa) and I've added a few more tests as @reames suggested.

mcrosier added inline comments.
lib/Transforms/InstCombine/InstCombineCasts.cpp
1454

How about:

bool Safe = ActualSize <= MantissaWidth;

// Compute known bits to see if the integer value fits into the fp type.
if (!Safe) {
  ...
}

if (Safe) {
  ...

Note: Local variables should be capitalized.

1476

Please capitalize Only and add a period.

http://llvm.org/docs/CodingStandards.html#commenting

aarzee updated this revision to Diff 56801.May 10 2016, 1:44 PM
aarzee edited edge metadata.

Replace for loops with count{Leading,Trailing}{Zeros,Ones}.

aarzee updated this revision to Diff 56804.May 10 2016, 1:46 PM

Added the changes suggested by @mcrosier.

aarzee updated this revision to Diff 56806.May 10 2016, 1:49 PM

Same as last diff except that I didn't rush through without actually reading the comment.

reames resigned from this revision.May 11 2016, 5:08 PM
reames removed a reviewer: reames.

Previous advice is being actively ignored and outside of my expertise. If anyone else wants to review feel free, I don't have the time to cycle on this.

scanon added inline comments.May 12 2016, 8:33 AM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1425

A few things:

  1. This:

and that the number of trailing zeros fits into the exponent

is really confused. That's not how it [floating-point encoding] works at all. A correct algorithm for determining if an integer n can be exactly represented is (pseudocode):

m = absoluteValue(n)
i = mostSignificantNonzeroBit(m)
j = leastSignificantNonzeroBit(m)
if (i - j < numberOfSignificandBits) {
  // significand is representable, check exponent
  if (i <= greatestFiniteExponent) {
    // number can be exactly represented
  }
}
// number cannot be exactly represented

In particular, if we look at your example, (2 ** 24 - 1) << 10, this number absolutely is exactly representable in single-precision. We have:

m = 2**24-1 << 10
i = 33
j = 10
i-j = 23 < numberOfSignificandBits = 24
i = 33 <= greatestFiniteExponent = 127.
  1. Please don't use the term "mantissa". It is frequently used, even in the LLVM sources, but it is subtly wrong (https://en.wikipedia.org/wiki/Significand#Use_of_.22mantissa.22). The correct term is "significand"; any new written code should use that instead.
aarzee added inline comments.May 22 2016, 4:18 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1425

Shouldn't it be j <= greatestFiniteExponent? (I'm clearly not an expert here, so I just wanted to check).

scanon added inline comments.May 23 2016, 7:38 AM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1425

No, it should be i. Normal binary floating-point numbers have the form +/- significand * 2^exponent, where significand is a number in [1,2). Thus, the MSB a representable integer is always 2^exponent.

aarzee updated this revision to Diff 58879.May 27 2016, 8:58 PM

Updated variable names and improved correctness.

Question re: usage of 'mantissa' vs 'significand' - should getFPMantissaWidth be renamed?

Needs more comments in the code, and more tests. In particular, you have zero tests of values with trailing zeros.

lib/Transforms/InstCombine/InstCombineCasts.cpp
1467

Why are you subtracting one here? The use of std::max here is suspicious; you should be treating UUUUUUUU and UUUUUUU0 differently.

1490

I suspect what you're doing here is wrong, but it's hard to follow without any comments.

1497

What are you trying to check here? (Maybe you're looking for APFloat::semanticsMaxExponent?)

aarzee updated this revision to Diff 65262.Jul 23 2016, 5:28 PM

Added additional comments, adjustments, and fixes for code clarity. Added exponent-related tests.

Thanks very much for the feedback, @eli.friedman. Hopefully I've addressed most of your concerns. Regarding the tests: could you think of cases that the current tests don't cover?

This is taking so many iterations to get to a reasonable place that I think you need some other approach to testing; you clearly aren't finding the interesting cases. Have you considered some sort of test generator, which randomly generates known one and known zero masks, then exhaustively checks whether the end result is consistent?

Indeed I have (although the tests generated are exhaustive, not random), and I attached a script to do so earlier in the thread: https://reviews.llvm.org/F1891038

eli.friedman added inline comments.Jul 24 2016, 3:16 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1497

If a value has unknown sign, isn't KnownZero.countLeadingOnes() always going to zero?

1508

Thinking about it a bit more, we don't actually care if the exponent overflows; the end result would be undef, so it doesn't matter.

aarzee added inline comments.Jul 24 2016, 4:11 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1497

I believe that's correct, yes. Thanks for catching that.

This means:

  • The absolute value of an integer with unknown sign and bitwidth X has a maximum bitwidth X;
  • We know that we already checked whether the width of the integer fits into the significand earlier in the function, and if we reached this point in the function that comparison was false;
  • So if we reach this point in the function with unknown sign, we can immediately return nullptr.

We can add this line after the definitions of KnownNegative and PossiblyNegative, and remove the respective else blocks later in the function:

if (PossiblyNegative && !KnownNegative)
  // If the sign of the number is unknown, the bit range will be the width of the integer type,
  // and we already checked whether that will fit into the fp type earlier in the function, so
  // we can exit early.
  return nullptr;

I'll do this in the next revision.

aarzee added inline comments.Jul 24 2016, 4:39 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1508

Is this explicitly defined somewhere?

eli.friedman added inline comments.Jul 24 2016, 4:43 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1497

I think you can still optimize based on trailing zero bits even if the sign isn't known.

1508

http://llvm.org/docs/LangRef.html#fptosi-to-instruction . "If the value cannot fit in ty2, the results are undefined."

aarzee added inline comments.Jul 24 2016, 5:41 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1497

Again, you're right. I wasn't thinking about the least significant possibly set bit. How about this:

// If the number is known to be negative, the highest possibly set bit
// is the lowest bit not known to be one.
// Otherwise, it's the highest bit not known to be zero.

if (KnownNegative)
  MostSignificantPossiblySetBit = int(KnownOne.countTrailingOnes()); 
else
  MostSignificantPossiblySetBit = int(BitWidth - KnownZero.countLeadingOnes()) - 1;
1508

FWIW that's for fptosi, not sitofp, but under sitofp I see this: "If the value cannot fit in the floating point value, the results are undefined."
Thank you. I'll remove that check.

eli.friedman added inline comments.Jul 24 2016, 6:29 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1497

I assume you meant to use countLeadingOnes for both? Looks roughly right.

aarzee added inline comments.Jul 25 2016, 10:38 AM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1497

Nope. Lowest bit not known to be one is KnownOne.countTrailingOnes, and highest bit not known to be zero is BitWidth - KnownZero.countLeadingOnes() - 1.

eli.friedman added inline comments.Jul 25 2016, 10:52 AM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1497

What do you need the lowest bit not known to be one for? If the input is known to be an even number, that will always be zero. (If your testing script didn't catch this, you need a more thorough testing script.)

aarzee added inline comments.Jul 25 2016, 11:29 AM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1497

You're right again. This is pretty intellectually trying for me, so apologies for those errors.

The problem with the testing script is that it is currently exhaustive and runs in exponential time, so running it with floats and 32-bit integers is impractical, and I have no hardware that supports half-precision.

eli.friedman added inline comments.Jul 25 2016, 12:17 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1497

Can you use some sort of software half-precision floating-point library? numpy has a "float16" type which should support the relevant conversion operations.

I would post the latest diff but the place I'm staying at has no Internet currently.

aarzee added inline comments.Jul 27 2016, 5:07 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1497

I've tried it and it's still impractical at ~43 million tests.

eli.friedman added inline comments.Jul 27 2016, 6:08 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1497

Brute-forcing 43 million tests in an hour on a single-core 2GHz desktop computer comes out to around 200,000 cycles per test. If you're trying to run opt -instcombine 43 million times, that might be a problem, but you should be able to easily fit into a 200,000 cycle budget if you copy-paste the computation from this patch into a C++ test harness. (That wouldn't be a complete end-to-end test, but it would be enough to show the tricky bits work correctly.)

aarzee added inline comments.Jul 29 2016, 12:05 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1497

I managed to pull out the relevant code and call it from Python, and it is significantly faster. So now running 43m tests shouldn't be an issue. And indeed, the lack of a specific condition for unknown sign led to issues. So I added a condition that if the sign is unknown, the highest possibly set bit is the bit to the right of the sign. Small caveat here: if the sign bit is unknown and the rest of the bits are all known zero, the highest possibly set bit will be one to the right of the sign bit, while the lowest possibly set bit is the sign bit, leading to a -1 range. It's not necessary to correct this to 1, because 1 will always fit into any significand, and -1 is less than any significand.

aarzee updated this revision to Diff 66232.Jul 30 2016, 5:17 PM

Additional fixes and comments.

I have verified the accuracy of this version of the code with a testing script that I will attach here.

Attached are a Python test script (best run in PyPy) and C++ testing program that work together to exhaustively verify this code for a given floating point format. See the comments in either file.

Overall, this is looking much better; simple is good. :)

lib/Transforms/InstCombine/InstCombineCasts.cpp
1483

Leftover comment?

1507

Doesn't the + 1 here cancel out the -1 in the setting of MostSignificantPossiblySetBit? That would make the math a bit more straightforward.

test/Transforms/InstCombine/sitofp.ll
272

Did you find any interesting new cases worth having a separate regression test?

aarzee added inline comments.Jul 30 2016, 5:51 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1483

I got rid of that in my local copy but I didn't update my diff. Thanks for catching that.

1507

The reasoning behind this is that I wanted the values for both least and most significant possibly set bit to be zero indexed; I thought it might be confusing if one was and the other wasn't.

test/Transforms/InstCombine/sitofp.ll
272

Yes, I will add some tests with unknown sign.

aarzee updated this revision to Diff 66236.Jul 30 2016, 7:06 PM

Additional comments and tests for unknown sign.

eli.friedman added inline comments.Jul 30 2016, 7:53 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1507

Are you sure subtracting one actually makes the most significant bit zero-indexed? Normally, finding the distance between two array indexes is just simple subtraction; the extra "+1" looks weird. It's also kind of suspicious that your "zero-based" indexing can return a negative number (as an extreme case, if a 16-bit integer is all ones, your code says the most significant possibly set bit has index -1).

test/Transforms/InstCombine/sitofp.ll
278

Maybe also add a corresponding test which doesn't fold?

aarzee added inline comments.Jul 31 2016, 2:35 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1507

I'm going to take a look at the actual behavior, I think the least significant possibly set bit might be one-indexed; in any case something is behaving differently than I intended.

WRT the all-ones case, if a number has no unknown bits then it is by nature a constant, and would be handled by ConstProp, right?

eli.friedman added inline comments.Jul 31 2016, 3:03 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1507

Yes, in the all-ones case it would get turned into a constant; not sure off the top of my head if that's guaranteed to happen before this code runs, though.

aarzee added inline comments.Jul 31 2016, 3:35 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1507

After checking I remembered the reason for the +1: what I'm calling BitRange is actually the width; for example, the width between the 1s in 0b101 would be 3. Maybe width isn't the best word. Whatever the case, this is the reason.

eli.friedman added inline comments.Jul 31 2016, 3:50 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1507

Oh, hmm... not how I would think of it.

b|  1  |  0  |  1  |
 3     2     1     0
 high              low

So the high index is 3, the low index is zero, the significant bits are the bits inside the range, and the length of the range is the high index minus the low index. This is how iterators and ranges in C++ work.

aarzee added inline comments.Jul 31 2016, 4:41 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
1507

Right, I hadn't thought of it that way; that is how ranges are expressed in most languages I work with. I'll make the change.

aarzee updated this revision to Diff 66257.Jul 31 2016, 4:51 PM

Modified most and least significant possibly set bit values to act more like range bounds; modified and added unknown sign tests.

Ping.

I'd like to know if it's possible for a value to be known to be -1 without being a constant; currently the code would not act correctly if this were the case because I made the assumption that it would be a constant and that the code wouldn't be run.