Page MenuHomePhabricator

Saturating float to int casts.
Needs ReviewPublic

Authored by ebevhan on Nov 20 2018, 3:49 AM.

Details

Summary

This is the first part split off from D54696.


This patch adds support for the fptoui.sat and fptosi.sat intrinsics, which provide basically the same functionality as the existing fptoui and fptosi instructions, but will saturate (or return 0 for NaN) on values unrepresentable in the target type, instead of returning poison. Related mailing list discussion can be found at: https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ.

The intrinsics have overloaded source and result type and support vector operands:

i32 @llvm.fptoui.sat.i32.f32(float %f)
i100 @llvm.fptoui.sat.i100.f64(double %f)
<4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f)
// etc

On the SelectionDAG layer two new ISD opcodes are added, FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands and one result. The second operand is an integer constant specifying the scalar saturation width. The idea here is that initially the second operand and the scalar width of the result type are the same, but they may change during type legalization. For example:

i19 @llvm.fptsi.sat.i19.f32(float %f)
// builds
i19 fp_to_sint_sat f, 19
// type legalizes (through integer result promotion)
i32 fp_to_sint_sat f, 19

I went for this approach, because saturated conversion does not compose well. There is no good way of "adjusting" a saturating conversion to i32 into one to i19 short of saturating twice. Specifying the saturation width separately allows directly saturating to the correct width.

There are two baseline expansions for the fp_to_xint_sat opcodes. If the integer bounds can be exactly represented in the float type and fminnum/fmaxnum are legal, we can expand to something like:

f = fmaxnum f, FP(MIN)
f = fminnum f, FP(MAX)
i = fptoxi f
i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN

If the bounds cannot be exactly represented, we expand to something like this instead:

i = fptoxi f
i = select f ult FP(MIN), MIN, i
i = select f ogt FP(MAX), MAX, i
i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN

It should be noted that this expansion assumes a non-trapping fptoxi.

Initial tests are for AArch64, x86_64 and ARM. This exercises all of the scalar and vector legalization. ARM is included to test float softening.

Original patch by @nikic.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
nikic marked 3 inline comments as done.Nov 3 2019, 3:05 AM

Rebase.

aykevl added a comment.EditedDec 3 2019, 11:36 AM

For what it's worth, RISC-V implements semantics close to but not exactly like ARM, with NaN being treated as if it was +Infinity.
Source: https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf#page=63

(Also, I need these intrinsics as well for a Go compiler).

@RKSimon @scanon @nikic

These intrinsics are necessary to implement Nontrapping Float To Int Conversions, which have now been merged with the mainline WebAssembly spec. Without these intrinsics, a significant amount of code is required to get them to function (using Rust's implementation as an example, amount of code varies by language). Since the WebAssembly standard now requires these operations, I would like to know the status of this PR and how close it is to being merged.

ebevhan added a subscriber: ebevhan.Aug 7 2020, 7:41 AM

These intrinsics would likely also be useful for implementing Embedded-C's floating point to fixed-point conversion in Clang.

nikic abandoned this revision.Aug 7 2020, 2:41 PM

I've tried to get this in for a long time, but I don't think there's enough interest in this functionality. Rust ended up doing the saturation itself, even though that produces code that is much worse than a custom lowered intrinsic. At some point you have to cut your losses :)

aykevl added a comment.Aug 7 2020, 6:03 PM

That's unfortunate. I was looking forward to this to use in TinyGo. For TinyGo it doesn't matter what the exact behavior is, just that it is something sensible (and not UB). Saturating float operations would be great.

Also, wouldn't Rust start using this once it becomes available in LLVM? I remember from the GitHub issue that performance was a concern.

It seems like there are quite a few people who would like to use this intrinsic, though. The only lack of interest seems to be in reviewing.

I was planning on submitting something similar for the fixed-point support, but now that I know this patch exists, I feel like helping this move forward is the better option.

It seems like there are quite a few people who would like to use this intrinsic, though. The only lack of interest seems to be in reviewing.
I was planning on submitting something similar for the fixed-point support, but now that I know this patch exists, I feel like helping this move forward is the better option.

That seems like the right way to proceed (commandeer/reopen) - there was definitely interest from the early reviewer comments, but maybe they're tied up with other stuff, so just add more people to the list.
Personally, I didn't look at this closely because I had higher priority patches and I thought other reviewers would continue. I don't think there are any fundamental blockers. Sometimes you just have to keep pinging. :)

It seems like there are quite a few people who would like to use this intrinsic, though. The only lack of interest seems to be in reviewing.
I was planning on submitting something similar for the fixed-point support, but now that I know this patch exists, I feel like helping this move forward is the better option.

That seems like the right way to proceed (commandeer/reopen) - there was definitely interest from the early reviewer comments, but maybe they're tied up with other stuff, so just add more people to the list.
Personally, I didn't look at this closely because I had higher priority patches and I thought other reviewers would continue. I don't think there are any fundamental blockers. Sometimes you just have to keep pinging. :)

Well, I suspect that the GIsel people will want a GIsel lowering of the intrinsics to match the ISelDAG ones, but other than that the patch looks good from what I can see.

I will probably commandeer this soon and try for another push at getting it in if @nikic doesn't mind.

Ka-Ka added a subscriber: Ka-Ka.Aug 12 2020, 3:25 AM
ebevhan commandeered this revision.Aug 14 2020, 8:50 AM
ebevhan added a reviewer: nikic.

Thank you for picking this up. Some pointers from my side:

  • I would recommend against including GlobalISel support in the initial implementation. Unless trivial, SDAG and GlobalISel changes should never be made in the same patch, as the reviewers for these parts of the codebase are essentially disjoint.
  • You may want to replace the VT specifying the saturation width with a simple constant integer operand. This is the approach that the fixed-point ISD opcodes went with, and it should make legalization a bit simpler.
  • I originally restricted this patch to a minimum viable implementation, in the hope that it would make review easier and get this landed quickly. Given how things turned out, it might make sense to put the soft float and vector legalization (which are part of D54696) back into this patch, so that legalization support is complete and this can stand on its own.
  • For soft float legalization, I would switch from libcall legalization to expanding and recursively legalizing. Adding so many new compiler-rt functions is an unnecessary burden.
  • Finally, something worth mentioning is that the legalization implemented here is not compatible with trapping fptoi (at least one of the expansions isn't). I don't believe trapping fptoi's are actually legal per langref, but I've also seen people adjust x86 fptoi lowering to work with trapping fptoi at some point, so I'm a bit confused on what the state here is.

Finally, something worth mentioning is that the legalization implemented here is not compatible with trapping fptoi (at least one of the expansions isn't). I don't believe trapping fptoi's are actually legal per langref, but I've also seen people adjust x86 fptoi lowering to work with trapping fptoi at some point, so I'm a bit confused on what the state here is.

The fptosi instruction doesn't have side-effects. llvm.experimental.constrained.fptosi can raise floating-point exceptions. I'm not sure anyone has looked at actually trapping.

Finally, something worth mentioning is that the legalization implemented here is not compatible with trapping fptoi (at least one of the expansions isn't). I don't believe trapping fptoi's are actually legal per langref, but I've also seen people adjust x86 fptoi lowering to work with trapping fptoi at some point, so I'm a bit confused on what the state here is.

The fptosi instruction doesn't have side-effects. llvm.experimental.constrained.fptosi can raise floating-point exceptions. I'm not sure anyone has looked at actually trapping.

Thanks for the clarification. The changes I vaguely remembered here were apparently D53794 and D67105, which were indeed related to FPEs. I presume it is not a problem if saturating fptoi causes FPEs as part of normal operation?

Finally, something worth mentioning is that the legalization implemented here is not compatible with trapping fptoi (at least one of the expansions isn't). I don't believe trapping fptoi's are actually legal per langref, but I've also seen people adjust x86 fptoi lowering to work with trapping fptoi at some point, so I'm a bit confused on what the state here is.

The fptosi instruction doesn't have side-effects. llvm.experimental.constrained.fptosi can raise floating-point exceptions. I'm not sure anyone has looked at actually trapping.

Thanks for the clarification. The changes I vaguely remembered here were apparently D53794 and D67105, which were indeed related to FPEs. I presume it is not a problem if saturating fptoi causes FPEs as part of normal operation?

Right, you can ignore the possibility of floating-point exceptions. (We might at some point want to implement llvm.experimental.constrained.fptosi.sat, but that would be a separate intrinsic.)

ebevhan reclaimed this revision.Aug 17 2020, 1:00 AM

Thank you for picking this up. Some pointers from my side:

  • I would recommend against including GlobalISel support in the initial implementation. Unless trivial, SDAG and GlobalISel changes should never be made in the same patch, as the reviewers for these parts of the codebase are essentially disjoint.

All right.

  • You may want to replace the VT specifying the saturation width with a simple constant integer operand. This is the approach that the fixed-point ISD opcodes went with, and it should make legalization a bit simpler.

Well, the value on the fixed point nodes isn't for saturation, and I think there are actually a few instances in which the fixed point nodes (such as division) could have used a saturation width for simplifying legalization. I didn't add it, though. It's very convenient for these nodes.

  • I originally restricted this patch to a minimum viable implementation, in the hope that it would make review easier and get this landed quickly. Given how things turned out, it might make sense to put the soft float and vector legalization (which are part of D54696) back into this patch, so that legalization support is complete and this can stand on its own.
  • For soft float legalization, I would switch from libcall legalization to expanding and recursively legalizing. Adding so many new compiler-rt functions is an unnecessary burden.

Hm, okay. I actually did originally fold both of those into this patch but then took it out since it wasn't originally part of it.

I agree on the libcalls. I was also planning on doing the expansion directly instead of going for libcalls. In most cases it shouldn't be that much worse than if we had a direct libcall. The worst case is 4 libcalls, but I think that on targets that do not support floating point, it should be expected to be slow anyway.

ebevhan updated this revision to Diff 285971.Aug 17 2020, 5:13 AM
ebevhan edited the summary of this revision. (Show Details)

Rebased and addressed comments.

  • Included all vector, result expansion and float softening in the patch. The latter two are done by simply performing the standard expansion. In the worst case, this can result in up to 4 libcalls, but I think this is preferable to adding a massive number of new ones.
  • Replaced the VT operand with an integer constant. For vectors, this is the scalar saturation width.
ebevhan added inline comments.Aug 17 2020, 5:39 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8047

I also added this to solve issues in expanding f16 FP_TO_XINT nodes to libcalls. Libcalls with a source of f16 don't seem to exist, so we can't emit such nodes if we will end up making libcalls of them.

Maybe this should check operation and type legality, though.

ebevhan updated this revision to Diff 285990.Aug 17 2020, 5:59 AM

Removed stray merge mistake.

nikic added inline comments.Aug 17 2020, 12:55 PM
llvm/include/llvm/IR/Intrinsics.td
1264

These should also be IntrWillReturn, which has been added in the meantime.

ebevhan updated this revision to Diff 286254.Aug 18 2020, 5:04 AM

Added IntrWillReturn.

ebevhan updated this revision to Diff 286279.Aug 18 2020, 7:17 AM
ebevhan retitled this revision from Saturating float to int casts: Basics [1/n] to Saturating float to int casts..
ebevhan edited the summary of this revision. (Show Details)

Updated summary on the second operand of the node.

ebevhan updated this revision to Diff 286286.Aug 18 2020, 8:23 AM

Fixed minor formatting nits.

Adding some more reviewers to get some more eyes on this.

As @spatel mentioned this is probably pretty close to being finished, so it would be great to get this final stretch over and done with.

RKSimon added inline comments.Aug 18 2020, 9:30 AM
llvm/docs/LangRef.rst
16149

Add (this includes negative infinity) ?

ebevhan updated this revision to Diff 286505.Aug 19 2020, 1:47 AM

Addressed comment.

Missing globalisel support

Missing globalisel support

I did mention GIsel before I commandeered the patch, but @nikic thought it was better to do GIsel in a separate step rather than integrate it into this patch. Is that acceptable?

I have yet to start on the GIsel implementation, however, so I don't really have anything to show just yet.

I can't competently review LLVM code-gen patches, but I can review intrinsic design. If the intended use case here is fixed point, should there be a scale parameter rather than expecting the caller to scale the input? Or is scaling the input basically the expected implementation on all hardware? Scaling the input can be lossy if we overflow to infinity, so that would be making a tacit assumption here that if that happens, the original value must've been outside the representation range of the integer. For float (and bfloat), the exponent goes up to 127, so you would need to be converting to a pretty large fixed-point format to have problems with lossy overflow. For half, the exponent only goes up to 16, so converting to a 32-bit fixed-point format with a scale could overflow, and you probably won't be able to use these intrinsics.

Please take care of the clang-format lints.

There are some missing optimizations here. It's probably worth teaching LegalizeVectorOps to expand the conversion into vector ops. And for conversions that correspond to a native instruction, there's probably some more efficient code sequence. But those can be dealt with in followups.

Making the saturation type an argument seemed a little strange at first glance, but it seems to work out reasonably well. Maybe could change the argument to be a type, like ISD::SIGN_EXTEND_INREG.

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2684

This can't be the only place that uses this loop; can it be refactored?

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
167

Do we need a corresponding ScalarizeVecOp_FP_TO_XINT_SAT?

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8047

In theory, legalization should ensure we generate appropriate libcalls.

Maybe that doesn't actually happen at the moment, though.

I can't competently review LLVM code-gen patches, but I can review intrinsic design. If the intended use case here is fixed point, should there be a scale parameter rather than expecting the caller to scale the input? Or is scaling the input basically the expected implementation on all hardware? Scaling the input can be lossy if we overflow to infinity, so that would be making a tacit assumption here that if that happens, the original value must've been outside the representation range of the integer. For float (and bfloat), the exponent goes up to 127, so you would need to be converting to a pretty large fixed-point format to have problems with lossy overflow. For half, the exponent only goes up to 16, so converting to a 32-bit fixed-point format with a scale could overflow, and you probably won't be able to use these intrinsics.

This is a good point that I hadn't considered. I left a comment in D86632.

I was originally planning on adding such an intrinsic (with a scaling factor) but dropped the idea when I found this patch. You are right in that the small exponent of half precision is problematic, though. Not sure what to do about that.

Please take care of the clang-format lints.

I presume it's fine to only fix the ones that don't conform to some existing unconventional code layout?

There are some missing optimizations here. It's probably worth teaching LegalizeVectorOps to expand the conversion into vector ops. And for conversions that correspond to a native instruction, there's probably some more efficient code sequence. But those can be dealt with in followups.

For improved native support, I've pulled out D86079 and D86078 from the original patch.

By expand into vector ops, you mean to avoid breaking up the vector operation into scalar operations and then legalizing those, and instead expanding the vector operation directly to the corresponding operations? What would be the criteria for doing this rather than splitting it up first?

Making the saturation type an argument seemed a little strange at first glance, but it seems to work out reasonably well. Maybe could change the argument to be a type, like ISD::SIGN_EXTEND_INREG.

It was originally a type operand similar to SIGN_EXTEND_INREG, but the suggestion was to replace it with a constant operand instead to make things simpler. The only thing we care about is the saturation bit width.

It might have been fine to make it a scalar type operand even for vector types, though.

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2684

There are similar loops in some of the other PromoteLegal* functions. The loops in those functions could probably be refactored - they look compatible - but this loop isn't quite the same since the signed nodes can't be used to implement the unsigned ones.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8047

There's some conditions in the libcall emission which do cause promotion of the source before emission, but it doesn't happen in every case (even though no libcalls from half->int seem to exist in the first place).

I was originally planning on adding such an intrinsic (with a scaling factor) but dropped the idea when I found this patch. You are right in that the small exponent of half precision is problematic, though. Not sure what to do about that.

It'd be a problem with float and bfloat, too, if we ever have to support any 128-bit fixed-point formats. I think planning around this is probably the better approach.

At least for the IEEE formats, you should be able to just destructure the bit-pattern of the float, right? Normalize and do some extends and shifts.

At least for the IEEE formats, you should be able to just destructure the bit-pattern of the float, right? Normalize and do some extends and shifts.

If we're going down that path, maybe simpler to just mess with the float scaling factor. If a float is close to the largest finite float, in IEEE formats, the fractional part must be zero. So we can do the float-to-int conversion on the unscaled float, and scale the resulting integer. Probably faster than trying to explicitly destructure a float.

At least for the IEEE formats, you should be able to just destructure the bit-pattern of the float, right? Normalize and do some extends and shifts.

If we're going down that path, maybe simpler to just mess with the float scaling factor. If a float is close to the largest finite float, in IEEE formats, the fractional part must be zero. So we can do the float-to-int conversion on the unscaled float, and scale the resulting integer. Probably faster than trying to explicitly destructure a float.

Ah, good point.

I presume it's fine to only fix the ones that don't conform to some existing unconventional code layout?

Yes.

By expand into vector ops, you mean to avoid breaking up the vector operation into scalar operations and then legalizing those, and instead expanding the vector operation directly to the corresponding operations? What would be the criteria for doing this rather than splitting it up first?

Yes. Probably sufficient to guard this with a check that the vector operations in question are legal.

It was originally a type operand similar to SIGN_EXTEND_INREG, but the suggestion was to replace it with a constant operand instead to make things simpler. The only thing we care about is the saturation bit width.

Not a big deal either way.

It might have been fine to make it a scalar type operand even for vector types, though.

This is how SIGN_EXTEND_INREG works, I think?

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2684

Okay.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8047

Maybe we have the relevant code in type legalization, but not LegalizeDAG, or something like that?

ebevhan updated this revision to Diff 288960.Mon, Aug 31, 9:08 AM

Addressed comments.

At least for the IEEE formats, you should be able to just destructure the bit-pattern of the float, right? Normalize and do some extends and shifts.

If we're going down that path, maybe simpler to just mess with the float scaling factor. If a float is close to the largest finite float, in IEEE formats, the fractional part must be zero. So we can do the float-to-int conversion on the unscaled float, and scale the resulting integer. Probably faster than trying to explicitly destructure a float.

I'm not sure how this would work. The float does not have to be close to its finite limit in order to contain a valid and representable fixed-point value.

I'm trying to determine if the only criteria for not being able to use a fmul+fptoint-pattern for fixed-point conversion is whether or not the fixed-point scaling factor fits in the exponent. It should be possible to decompose the operation into a bitcast+mask+some shifting in the case where we can't use fmul+fptoint.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1356

I'm unsure if I've done the right thing here. We don't even get to this function for any of the test cases.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8047

I think the spot that I had issues with was during integer result expansion, actually. ExpandIntRes_FP_TO_SINT.

If the integer result type must be expanded, but the source type is f16 (and legal), it will try to emit a libcall which does not exist. It was probably f16 -> i128 in my case.

There are cases for promotion and softening in the function, but those only take effect if the float operand must be promoted or softened, which it doesn't have to be if the type is legal.

ebevhan updated this revision to Diff 290262.Mon, Sep 7, 5:42 AM

Rebased.

I don't know why the premerge tests are failing. I don't see those failures locally, so it's rather hard to debug it, especially since the premerge build lacks proper debug info.

It's probably related to the vector expansion I added, but I don't see why it would be failing on Phab and not on my machine.

ebevhan updated this revision to Diff 290676.Wed, Sep 9, 2:19 AM

Fixed failing test cases.

ebevhan updated this revision to Diff 290714.Wed, Sep 9, 4:54 AM

Removed vector expansion.

Doing this makes the improved expansion for AArch64 in D86078 quite bad.

ebevhan updated this revision to Diff 291573.Mon, Sep 14, 7:35 AM

Fix clang-format warnings.

Any more feedback on this?