This is an archive of the discontinued LLVM Phabricator instance.

Saturating float to int casts.
ClosedPublic

Authored by bjope on Nov 20 2018, 3:49 AM.

Details

Summary

This is the first part split off from D54696.


This patch adds support for the fptoui.sat and fptosi.sat intrinsics, which provide basically the same functionality as the existing fptoui and fptosi instructions, but will saturate (or return 0 for NaN) on values unrepresentable in the target type, instead of returning poison. Related mailing list discussion can be found at: https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ.

The intrinsics have overloaded source and result type and support vector operands:

i32 @llvm.fptoui.sat.i32.f32(float %f)
i100 @llvm.fptoui.sat.i100.f64(double %f)
<4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f)
// etc

On the SelectionDAG layer two new ISD opcodes are added, FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands and one result. The second operand is an integer constant specifying the scalar saturation width. The idea here is that initially the second operand and the scalar width of the result type are the same, but they may change during type legalization. For example:

i19 @llvm.fptsi.sat.i19.f32(float %f)
// builds
i19 fp_to_sint_sat f, 19
// type legalizes (through integer result promotion)
i32 fp_to_sint_sat f, 19

I went for this approach, because saturated conversion does not compose well. There is no good way of "adjusting" a saturating conversion to i32 into one to i19 short of saturating twice. Specifying the saturation width separately allows directly saturating to the correct width.

There are two baseline expansions for the fp_to_xint_sat opcodes. If the integer bounds can be exactly represented in the float type and fminnum/fmaxnum are legal, we can expand to something like:

f = fmaxnum f, FP(MIN)
f = fminnum f, FP(MAX)
i = fptoxi f
i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN

If the bounds cannot be exactly represented, we expand to something like this instead:

i = fptoxi f
i = select f ult FP(MIN), MIN, i
i = select f ogt FP(MAX), MAX, i
i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN

It should be noted that this expansion assumes a non-trapping fptoxi.

Initial tests are for AArch64, x86_64 and ARM. This exercises all of the scalar and vector legalization. ARM is included to test float softening.

Original patch by @nikic.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
nikic added a comment.Nov 20 2018, 5:52 AM

One thing I'm uncertain about here are the operands of the FP_TO_XINT_SAT opcode. There are basically three ways I could see this being represented, with a post-legalization saturation to v4i19:

  1. v4i32 = fp_to_xint_sat f, VT:v4i19
  2. v4i32 = fp_to_xint_sat f, VT:i19
  3. v4i32 = fp_to_xint_sat f, 19

That is, either the second operand is the type to which we saturate (possibly a vector), the scalarization thereof, or the scalar bitwidth.

The current implementation is the first and is modeled after what sign_extend_inreg does. The disadvantage is that the argument requires special handling, in particular during vector legalizations. It might be possible to reuse a bit more code in that area if it is stored as the scalarized type. On the other hand that feels somewhat asymmetric.

The last variant is probably not great because it does not make it obvious that the saturation width must be static.

t.p.northover added inline comments.
docs/LangRef.rst
13105 ↗(On Diff #174743)

What's going on here? All the other intrinsics I know specify the return type first in the prototype. I didn't even think it was something you could override.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2524 ↗(On Diff #174743)

Maybe put the normal assertion in here?

assert(NewOutTy.isInteger() && "Ran out of possibilities!");
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
5686 ↗(On Diff #174743)

I was expecting this tag to be the scalar type in the vector situation. Does something get neater this way round?

The current implementation is the first and is modeled after what sign_extend_inreg does.

That's pretty persuasive. Though possibly something we could fix in sign-extend if enough people think the scalar is better.

nikic updated this revision to Diff 174769.Nov 20 2018, 7:01 AM
nikic edited the summary of this revision. (Show Details)

Put result type first in the type suffix of the intrinsics. Add an assertion.

nikic marked 2 inline comments as done.Nov 20 2018, 7:09 AM
nikic added inline comments.
docs/LangRef.rst
13105 ↗(On Diff #174743)

You're absolutely right, I've flipped the order now, here and in tests. It looks like LLVM is very forgiving of incorrect intrinsic name suffixes and just normalizes them to be correct.

nikic updated this revision to Diff 175763.Nov 28 2018, 1:56 PM
nikic marked 2 inline comments as done.

Rebase and fix some formatting.

nikic added a comment.Dec 4 2018, 6:27 AM

Ping. Would appreciate some feedback, especially regarding the structure of the SelectionDAG nodes.

nikic updated this revision to Diff 177839.Dec 12 2018, 4:19 AM
nikic edited the summary of this revision. (Show Details)

Add X86 tests.

I think the consistency argument is good, so I'm in favour of the structure you chose now. My main issue at the moment is the tests: hard-coding register usage leads to very fragile tests.

It might be better to split the test files into fptosi-sat-*.ll and fptoui-sat-*.ll

test/CodeGen/X86/fptoi-sat-scalar.ll
2 ↗(On Diff #177839)

Please add i686 coverage:

; RUN: llc < %s -mtriple=i686-linux | FileCheck %s --check-prefix=X86,X86-X87
; RUN: llc < %s -mtriple=i686-linux -mattr=+sse2 | FileCheck %s --check-prefix=X86,X86-SSE
; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s --check-prefix=X64
nikic updated this revision to Diff 178205.Dec 14 2018, 2:13 AM

Add i686 tests, split test files for fptoui and fptosi.

I had to comment out a few more tests on the X86 side, because they would need libcall legalizations on i686, which are not implemented in this patch.

nikic added a comment.Dec 21 2018, 7:33 AM

I think the consistency argument is good, so I'm in favour of the structure you chose now. My main issue at the moment is the tests: hard-coding register usage leads to very fragile tests.

Is there any way to avoid hardcoded register usage in an automated manner? With vector coverage, this is probably going to need about 15k lines of tests just for two platforms, and they're going to change quite a bit until codegen is finalized. I don't think it's productive to write and rewrite those by hand.

I think the consistency argument is good, so I'm in favour of the structure you chose now. My main issue at the moment is the tests: hard-coding register usage leads to very fragile tests.

Is there any way to avoid hardcoded register usage in an automated manner?

From previous discussions, the hardcoded/spelled-out register names are intentional.

With vector coverage, this is probably going to need about 15k lines of tests just for two platforms, and they're going to change quite a bit until codegen is finalized. I don't think it's productive to write and rewrite those by hand.

+1, update_llc_test_checks.py is the standard tool for the task, and is the only sane way to have meaningful, reasonable test coverage without going insane from having to write CHECK lines by hand.

arsenm added a subscriber: arsenm.Feb 8 2019, 11:15 AM
arsenm added inline comments.
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
961–963 ↗(On Diff #179284)

I'm pretty sure this will warn. There's no point in doing this anyway since you'll get the appropriate error if you just leave this out

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2358 ↗(On Diff #179284)

Ditto

nikic updated this revision to Diff 186017.Feb 8 2019, 11:46 AM

Remove unimplemented legalization stubs.

nikic updated this revision to Diff 186541.Feb 12 2019, 2:03 PM
nikic marked 2 inline comments as done.

Add nounwind to avoid CFI.

aykevl added a subscriber: aykevl.Apr 12 2019, 6:00 AM
lzutao added a subscriber: lzutao.Jun 26 2019, 8:39 PM

Adding more potential reviewers

Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2019, 12:00 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
fhahn added a subscriber: fhahn.Jun 28 2019, 11:37 AM
scanon requested changes to this revision.Jul 22 2019, 1:27 PM
scanon added a subscriber: scanon.

Reviewers: what do we need to get this across the finish line?

llvm/docs/LangRef.rst
16251

typo: "the same"

16302

Ditto.

llvm/include/llvm/CodeGen/ISDOpcodes.h
526

Can we rewrite this to be a bit more clear? I find something like the following easier to understand:

If the FP value is NaN, the result is 0. Otherwise it is clamped to the range representable by the type of operand 1, and the fractional part is discarded.

This is partially a matter of taste, of course.

This revision now requires changes to proceed.Jul 22 2019, 1:27 PM
nikic updated this revision to Diff 227604.Nov 3 2019, 3:05 AM
nikic marked 3 inline comments as done.

Rebase.

aykevl added a comment.EditedDec 3 2019, 11:36 AM

For what it's worth, RISC-V implements semantics close to but not exactly like ARM, with NaN being treated as if it was +Infinity.
Source: https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf#page=63

(Also, I need these intrinsics as well for a Go compiler).

@RKSimon @scanon @nikic

These intrinsics are necessary to implement Nontrapping Float To Int Conversions, which have now been merged with the mainline WebAssembly spec. Without these intrinsics, a significant amount of code is required to get them to function (using Rust's implementation as an example, amount of code varies by language). Since the WebAssembly standard now requires these operations, I would like to know the status of this PR and how close it is to being merged.

ebevhan added a subscriber: ebevhan.Aug 7 2020, 7:41 AM

These intrinsics would likely also be useful for implementing Embedded-C's floating point to fixed-point conversion in Clang.

nikic abandoned this revision.Aug 7 2020, 2:41 PM

I've tried to get this in for a long time, but I don't think there's enough interest in this functionality. Rust ended up doing the saturation itself, even though that produces code that is much worse than a custom lowered intrinsic. At some point you have to cut your losses :)

aykevl added a comment.Aug 7 2020, 6:03 PM

That's unfortunate. I was looking forward to this to use in TinyGo. For TinyGo it doesn't matter what the exact behavior is, just that it is something sensible (and not UB). Saturating float operations would be great.

Also, wouldn't Rust start using this once it becomes available in LLVM? I remember from the GitHub issue that performance was a concern.

It seems like there are quite a few people who would like to use this intrinsic, though. The only lack of interest seems to be in reviewing.

I was planning on submitting something similar for the fixed-point support, but now that I know this patch exists, I feel like helping this move forward is the better option.

It seems like there are quite a few people who would like to use this intrinsic, though. The only lack of interest seems to be in reviewing.
I was planning on submitting something similar for the fixed-point support, but now that I know this patch exists, I feel like helping this move forward is the better option.

That seems like the right way to proceed (commandeer/reopen) - there was definitely interest from the early reviewer comments, but maybe they're tied up with other stuff, so just add more people to the list.
Personally, I didn't look at this closely because I had higher priority patches and I thought other reviewers would continue. I don't think there are any fundamental blockers. Sometimes you just have to keep pinging. :)

It seems like there are quite a few people who would like to use this intrinsic, though. The only lack of interest seems to be in reviewing.
I was planning on submitting something similar for the fixed-point support, but now that I know this patch exists, I feel like helping this move forward is the better option.

That seems like the right way to proceed (commandeer/reopen) - there was definitely interest from the early reviewer comments, but maybe they're tied up with other stuff, so just add more people to the list.
Personally, I didn't look at this closely because I had higher priority patches and I thought other reviewers would continue. I don't think there are any fundamental blockers. Sometimes you just have to keep pinging. :)

Well, I suspect that the GIsel people will want a GIsel lowering of the intrinsics to match the ISelDAG ones, but other than that the patch looks good from what I can see.

I will probably commandeer this soon and try for another push at getting it in if @nikic doesn't mind.

Ka-Ka added a subscriber: Ka-Ka.Aug 12 2020, 3:25 AM
ebevhan commandeered this revision.Aug 14 2020, 8:50 AM
ebevhan added a reviewer: nikic.

Thank you for picking this up. Some pointers from my side:

  • I would recommend against including GlobalISel support in the initial implementation. Unless trivial, SDAG and GlobalISel changes should never be made in the same patch, as the reviewers for these parts of the codebase are essentially disjoint.
  • You may want to replace the VT specifying the saturation width with a simple constant integer operand. This is the approach that the fixed-point ISD opcodes went with, and it should make legalization a bit simpler.
  • I originally restricted this patch to a minimum viable implementation, in the hope that it would make review easier and get this landed quickly. Given how things turned out, it might make sense to put the soft float and vector legalization (which are part of D54696) back into this patch, so that legalization support is complete and this can stand on its own.
  • For soft float legalization, I would switch from libcall legalization to expanding and recursively legalizing. Adding so many new compiler-rt functions is an unnecessary burden.
  • Finally, something worth mentioning is that the legalization implemented here is not compatible with trapping fptoi (at least one of the expansions isn't). I don't believe trapping fptoi's are actually legal per langref, but I've also seen people adjust x86 fptoi lowering to work with trapping fptoi at some point, so I'm a bit confused on what the state here is.

Finally, something worth mentioning is that the legalization implemented here is not compatible with trapping fptoi (at least one of the expansions isn't). I don't believe trapping fptoi's are actually legal per langref, but I've also seen people adjust x86 fptoi lowering to work with trapping fptoi at some point, so I'm a bit confused on what the state here is.

The fptosi instruction doesn't have side-effects. llvm.experimental.constrained.fptosi can raise floating-point exceptions. I'm not sure anyone has looked at actually trapping.

Finally, something worth mentioning is that the legalization implemented here is not compatible with trapping fptoi (at least one of the expansions isn't). I don't believe trapping fptoi's are actually legal per langref, but I've also seen people adjust x86 fptoi lowering to work with trapping fptoi at some point, so I'm a bit confused on what the state here is.

The fptosi instruction doesn't have side-effects. llvm.experimental.constrained.fptosi can raise floating-point exceptions. I'm not sure anyone has looked at actually trapping.

Thanks for the clarification. The changes I vaguely remembered here were apparently D53794 and D67105, which were indeed related to FPEs. I presume it is not a problem if saturating fptoi causes FPEs as part of normal operation?

Finally, something worth mentioning is that the legalization implemented here is not compatible with trapping fptoi (at least one of the expansions isn't). I don't believe trapping fptoi's are actually legal per langref, but I've also seen people adjust x86 fptoi lowering to work with trapping fptoi at some point, so I'm a bit confused on what the state here is.

The fptosi instruction doesn't have side-effects. llvm.experimental.constrained.fptosi can raise floating-point exceptions. I'm not sure anyone has looked at actually trapping.

Thanks for the clarification. The changes I vaguely remembered here were apparently D53794 and D67105, which were indeed related to FPEs. I presume it is not a problem if saturating fptoi causes FPEs as part of normal operation?

Right, you can ignore the possibility of floating-point exceptions. (We might at some point want to implement llvm.experimental.constrained.fptosi.sat, but that would be a separate intrinsic.)

ebevhan reclaimed this revision.Aug 17 2020, 1:00 AM

Thank you for picking this up. Some pointers from my side:

  • I would recommend against including GlobalISel support in the initial implementation. Unless trivial, SDAG and GlobalISel changes should never be made in the same patch, as the reviewers for these parts of the codebase are essentially disjoint.

All right.

  • You may want to replace the VT specifying the saturation width with a simple constant integer operand. This is the approach that the fixed-point ISD opcodes went with, and it should make legalization a bit simpler.

Well, the value on the fixed point nodes isn't for saturation, and I think there are actually a few instances in which the fixed point nodes (such as division) could have used a saturation width for simplifying legalization. I didn't add it, though. It's very convenient for these nodes.

  • I originally restricted this patch to a minimum viable implementation, in the hope that it would make review easier and get this landed quickly. Given how things turned out, it might make sense to put the soft float and vector legalization (which are part of D54696) back into this patch, so that legalization support is complete and this can stand on its own.
  • For soft float legalization, I would switch from libcall legalization to expanding and recursively legalizing. Adding so many new compiler-rt functions is an unnecessary burden.

Hm, okay. I actually did originally fold both of those into this patch but then took it out since it wasn't originally part of it.

I agree on the libcalls. I was also planning on doing the expansion directly instead of going for libcalls. In most cases it shouldn't be that much worse than if we had a direct libcall. The worst case is 4 libcalls, but I think that on targets that do not support floating point, it should be expected to be slow anyway.

ebevhan updated this revision to Diff 285971.Aug 17 2020, 5:13 AM
ebevhan edited the summary of this revision. (Show Details)

Rebased and addressed comments.

  • Included all vector, result expansion and float softening in the patch. The latter two are done by simply performing the standard expansion. In the worst case, this can result in up to 4 libcalls, but I think this is preferable to adding a massive number of new ones.
  • Replaced the VT operand with an integer constant. For vectors, this is the scalar saturation width.
ebevhan added inline comments.Aug 17 2020, 5:39 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8104

I also added this to solve issues in expanding f16 FP_TO_XINT nodes to libcalls. Libcalls with a source of f16 don't seem to exist, so we can't emit such nodes if we will end up making libcalls of them.

Maybe this should check operation and type legality, though.

ebevhan updated this revision to Diff 285990.Aug 17 2020, 5:59 AM

Removed stray merge mistake.

nikic added inline comments.Aug 17 2020, 12:55 PM
llvm/include/llvm/IR/Intrinsics.td
1274

These should also be IntrWillReturn, which has been added in the meantime.

ebevhan updated this revision to Diff 286254.Aug 18 2020, 5:04 AM

Added IntrWillReturn.

ebevhan updated this revision to Diff 286279.Aug 18 2020, 7:17 AM
ebevhan retitled this revision from Saturating float to int casts: Basics [1/n] to Saturating float to int casts..
ebevhan edited the summary of this revision. (Show Details)

Updated summary on the second operand of the node.

ebevhan updated this revision to Diff 286286.Aug 18 2020, 8:23 AM

Fixed minor formatting nits.

Adding some more reviewers to get some more eyes on this.

As @spatel mentioned this is probably pretty close to being finished, so it would be great to get this final stretch over and done with.

RKSimon added inline comments.Aug 18 2020, 9:30 AM
llvm/docs/LangRef.rst
16259

Add (this includes negative infinity) ?

ebevhan updated this revision to Diff 286505.Aug 19 2020, 1:47 AM

Addressed comment.

Missing globalisel support

Missing globalisel support

I did mention GIsel before I commandeered the patch, but @nikic thought it was better to do GIsel in a separate step rather than integrate it into this patch. Is that acceptable?

I have yet to start on the GIsel implementation, however, so I don't really have anything to show just yet.

I can't competently review LLVM code-gen patches, but I can review intrinsic design. If the intended use case here is fixed point, should there be a scale parameter rather than expecting the caller to scale the input? Or is scaling the input basically the expected implementation on all hardware? Scaling the input can be lossy if we overflow to infinity, so that would be making a tacit assumption here that if that happens, the original value must've been outside the representation range of the integer. For float (and bfloat), the exponent goes up to 127, so you would need to be converting to a pretty large fixed-point format to have problems with lossy overflow. For half, the exponent only goes up to 16, so converting to a 32-bit fixed-point format with a scale could overflow, and you probably won't be able to use these intrinsics.

Please take care of the clang-format lints.

There are some missing optimizations here. It's probably worth teaching LegalizeVectorOps to expand the conversion into vector ops. And for conversions that correspond to a native instruction, there's probably some more efficient code sequence. But those can be dealt with in followups.

Making the saturation type an argument seemed a little strange at first glance, but it seems to work out reasonably well. Maybe could change the argument to be a type, like ISD::SIGN_EXTEND_INREG.

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2716

This can't be the only place that uses this loop; can it be refactored?

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
167

Do we need a corresponding ScalarizeVecOp_FP_TO_XINT_SAT?

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8104

In theory, legalization should ensure we generate appropriate libcalls.

Maybe that doesn't actually happen at the moment, though.

I can't competently review LLVM code-gen patches, but I can review intrinsic design. If the intended use case here is fixed point, should there be a scale parameter rather than expecting the caller to scale the input? Or is scaling the input basically the expected implementation on all hardware? Scaling the input can be lossy if we overflow to infinity, so that would be making a tacit assumption here that if that happens, the original value must've been outside the representation range of the integer. For float (and bfloat), the exponent goes up to 127, so you would need to be converting to a pretty large fixed-point format to have problems with lossy overflow. For half, the exponent only goes up to 16, so converting to a 32-bit fixed-point format with a scale could overflow, and you probably won't be able to use these intrinsics.

This is a good point that I hadn't considered. I left a comment in D86632.

I was originally planning on adding such an intrinsic (with a scaling factor) but dropped the idea when I found this patch. You are right in that the small exponent of half precision is problematic, though. Not sure what to do about that.

Please take care of the clang-format lints.

I presume it's fine to only fix the ones that don't conform to some existing unconventional code layout?

There are some missing optimizations here. It's probably worth teaching LegalizeVectorOps to expand the conversion into vector ops. And for conversions that correspond to a native instruction, there's probably some more efficient code sequence. But those can be dealt with in followups.

For improved native support, I've pulled out D86079 and D86078 from the original patch.

By expand into vector ops, you mean to avoid breaking up the vector operation into scalar operations and then legalizing those, and instead expanding the vector operation directly to the corresponding operations? What would be the criteria for doing this rather than splitting it up first?

Making the saturation type an argument seemed a little strange at first glance, but it seems to work out reasonably well. Maybe could change the argument to be a type, like ISD::SIGN_EXTEND_INREG.

It was originally a type operand similar to SIGN_EXTEND_INREG, but the suggestion was to replace it with a constant operand instead to make things simpler. The only thing we care about is the saturation bit width.

It might have been fine to make it a scalar type operand even for vector types, though.

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2716

There are similar loops in some of the other PromoteLegal* functions. The loops in those functions could probably be refactored - they look compatible - but this loop isn't quite the same since the signed nodes can't be used to implement the unsigned ones.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8104

There's some conditions in the libcall emission which do cause promotion of the source before emission, but it doesn't happen in every case (even though no libcalls from half->int seem to exist in the first place).

I was originally planning on adding such an intrinsic (with a scaling factor) but dropped the idea when I found this patch. You are right in that the small exponent of half precision is problematic, though. Not sure what to do about that.

It'd be a problem with float and bfloat, too, if we ever have to support any 128-bit fixed-point formats. I think planning around this is probably the better approach.

At least for the IEEE formats, you should be able to just destructure the bit-pattern of the float, right? Normalize and do some extends and shifts.

At least for the IEEE formats, you should be able to just destructure the bit-pattern of the float, right? Normalize and do some extends and shifts.

If we're going down that path, maybe simpler to just mess with the float scaling factor. If a float is close to the largest finite float, in IEEE formats, the fractional part must be zero. So we can do the float-to-int conversion on the unscaled float, and scale the resulting integer. Probably faster than trying to explicitly destructure a float.

At least for the IEEE formats, you should be able to just destructure the bit-pattern of the float, right? Normalize and do some extends and shifts.

If we're going down that path, maybe simpler to just mess with the float scaling factor. If a float is close to the largest finite float, in IEEE formats, the fractional part must be zero. So we can do the float-to-int conversion on the unscaled float, and scale the resulting integer. Probably faster than trying to explicitly destructure a float.

Ah, good point.

I presume it's fine to only fix the ones that don't conform to some existing unconventional code layout?

Yes.

By expand into vector ops, you mean to avoid breaking up the vector operation into scalar operations and then legalizing those, and instead expanding the vector operation directly to the corresponding operations? What would be the criteria for doing this rather than splitting it up first?

Yes. Probably sufficient to guard this with a check that the vector operations in question are legal.

It was originally a type operand similar to SIGN_EXTEND_INREG, but the suggestion was to replace it with a constant operand instead to make things simpler. The only thing we care about is the saturation bit width.

Not a big deal either way.

It might have been fine to make it a scalar type operand even for vector types, though.

This is how SIGN_EXTEND_INREG works, I think?

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2716

Okay.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8104

Maybe we have the relevant code in type legalization, but not LegalizeDAG, or something like that?

ebevhan updated this revision to Diff 288960.Aug 31 2020, 9:08 AM

Addressed comments.

At least for the IEEE formats, you should be able to just destructure the bit-pattern of the float, right? Normalize and do some extends and shifts.

If we're going down that path, maybe simpler to just mess with the float scaling factor. If a float is close to the largest finite float, in IEEE formats, the fractional part must be zero. So we can do the float-to-int conversion on the unscaled float, and scale the resulting integer. Probably faster than trying to explicitly destructure a float.

I'm not sure how this would work. The float does not have to be close to its finite limit in order to contain a valid and representable fixed-point value.

I'm trying to determine if the only criteria for not being able to use a fmul+fptoint-pattern for fixed-point conversion is whether or not the fixed-point scaling factor fits in the exponent. It should be possible to decompose the operation into a bitcast+mask+some shifting in the case where we can't use fmul+fptoint.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1360

I'm unsure if I've done the right thing here. We don't even get to this function for any of the test cases.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8104

I think the spot that I had issues with was during integer result expansion, actually. ExpandIntRes_FP_TO_SINT.

If the integer result type must be expanded, but the source type is f16 (and legal), it will try to emit a libcall which does not exist. It was probably f16 -> i128 in my case.

There are cases for promotion and softening in the function, but those only take effect if the float operand must be promoted or softened, which it doesn't have to be if the type is legal.

ebevhan updated this revision to Diff 290262.Sep 7 2020, 5:42 AM

Rebased.

I don't know why the premerge tests are failing. I don't see those failures locally, so it's rather hard to debug it, especially since the premerge build lacks proper debug info.

It's probably related to the vector expansion I added, but I don't see why it would be failing on Phab and not on my machine.

ebevhan updated this revision to Diff 290676.Sep 9 2020, 2:19 AM

Fixed failing test cases.

ebevhan updated this revision to Diff 290714.Sep 9 2020, 4:54 AM

Removed vector expansion.

Doing this makes the improved expansion for AArch64 in D86078 quite bad.

ebevhan updated this revision to Diff 291573.Sep 14 2020, 7:35 AM

Fix clang-format warnings.

Any more feedback on this?

Another gentle ping.

Design seems reasonable to me, but I can't review LLVM CodeGen patches.

It doesn't seem like anyone has any more immediate feedback. @efriedma, have all of your review comments been adequately addressed?

Are there any outstanding issues left or is this perhaps good enough to be submitted?

RKSimon added inline comments.Oct 27 2020, 3:07 AM
llvm/docs/LangRef.rst
16263

representable unsigned integer

16313

representable signed integer

16316

representable signed integer

llvm/include/llvm/CodeGen/ISDOpcodes.h
747

Maybe explicitly say that the op1 width may be smaller or equal (but not larger) to the result width

bjope updated this revision to Diff 301861.Oct 30 2020, 5:12 AM

Address review comments from RKSimon.

bjope commandeered this revision.Oct 30 2020, 5:16 AM
bjope added a reviewer: ebevhan.

Big thanks to @ebevhan for doing lots of work moving forward with the fixed-point number (embedded-c) support.

Although, Bevin currently got some other assignments, so I'll try to help out by commandeering this patch.

bjope marked 5 inline comments as done.Oct 30 2020, 5:17 AM
bjope updated this revision to Diff 304630.Nov 11 2020, 1:16 PM

Add back test cases (they were accidentally removed when commandeering this patch and rebasing earlier).

Btw, I think all review comments have been addressed. So it would be nice to know what do to next, e.g. to be able to move forward with the rest of the patches in the stack depending on this one.

Another little ping.

rkruppe removed a subscriber: rkruppe.Nov 26 2020, 5:31 AM
bjope added a comment.Dec 14 2020, 4:19 AM

Gentle ping again.

I've basically just inherited this from @ebevhan , who continued the initial work done by @nikic since there were some interest in these intrinsics on llvm-dev.

Afaik this patch alone is just addition of new intrinsics and simple lowering of those. So apart from adding a bunch of code and tests it shouldn't impact the existing codegen anyway. This patch is however currently a blocker for continuing the work with Embedded-C support when it comes to some fixed<->floating point conversions. So it would be nice to understand what I need to do to move forward with this patch. Any more concerns? Is it good to go? Do we need to find other ways to move forward with the Embedded-C support (such as doing lots of expansion in clang instead of using intrinsics, although that seems wrong if there were other potential use cases for these intrinsics)?

I'm not entirely sure what you're still waiting for, really. It's a big patch with a lot of diffuse responsibilities, but you've gotten sign-off from individual people across at least most of it. Do you feel like there's something significant that hasn't been reviewed?

I'm not entirely sure what you're still waiting for, really. It's a big patch with a lot of diffuse responsibilities, but you've gotten sign-off from individual people across at least most of it. Do you feel like there's something significant that hasn't been reviewed?

Well, I've just assumed that we should wait for an LGTM after the last fixups of earlier review comments.

But as you say, there haven't been any objections to adding the intrinsics, and there are no outstanding complaints. So maybe it is safe to assume that no one will object if I simply land this.

This revision was not accepted when it landed; it landed in state Needs Review.Dec 18 2020, 2:10 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
nikic added a comment.Jan 10 2021, 4:44 AM

@bjope There are two more revisions based on this one, D86079 implements improved X86 lowerings and is already accepted, and D86078 implements improved AArch64 lowerings. Do you plan to land these as well?

bjope added a comment.Jan 11 2021, 6:27 AM

@bjope There are two more revisions based on this one, D86079 implements improved X86 lowerings and is already accepted, and D86078 implements improved AArch64 lowerings. Do you plan to land these as well?

My plan has been to move forward with D86632 (which is ready to land and makes use of the new intrisics for conversions between fixed point and floating point).
And then also to land the patch that improves lowering for X86 (D86079) as it has been accepted.

I haven't looked closely at the patch that improves lowering for AArch64 (D86078). I need to figure out if there are any review comments that haven't been addressed yet. If it ends up being within my comfort zone I'll try to move forward with that patch as well :-) I'll at least make sure there is some kind of status update to that patch as I don't think we can't rely on Bevin picking it up again soon.