Page MenuHomePhabricator

Saturating float to int casts: Basics [1/n]
Needs ReviewPublic

Authored by nikic on Nov 20 2018, 3:49 AM.

Details

Summary

This is the first part split off from D54696.


This patch adds support for the fptoui.sat and fptosi.sat intrinsics, which provide basically the same functionality as the existing fptoui and fptosi instructions, but will saturate (or return 0 for NaN) on values unrepresentable in the target type, instead of returning poison. Related mailing list discussion can be found at: https://groups.google.com/d/msg/llvm-dev/cgDFaBmCnDQ/CZAIMj4IBAAJ.

The intrinsics have overloaded source and result type and support vector operands:

i32 @llvm.fptoui.sat.i32.f32(float %f)
i100 @llvm.fptoui.sat.i100.f64(double %f)
<4 x i32> @llvm.fptoui.sat.v4i32.v4f16(half %f)
// etc

On the SelectionDAG layer two new ISD opcodes are added, FP_TO_UINT_SAT and FP_TO_SINT_SAT. These opcodes have two operands and one result. The second operand is a value type operand specifying the saturation width. The idea here is that initially the second operand and the result type are the same, but they may change during type legalization. For example:

i19 @llvm.fptsi.sat.i19.f32(float %f)
// builds
i19 fp_to_sint_sat f, VT:i19
// type legalizes (through integer result promotion)
i32 fp_to_sint_sat f, VT:i19

I went for this approach, because saturated conversion does no compose well. There is no good way of "adjusting" a saturating conversion to i32 into one to i19 short of saturating twice. Specifying the saturation width separately allows directly saturating to the correct width.

There are two baseline expansions for the fp_to_xint_sat opcodes. If the integer bounds can be exactly represented in the float type and fminnum/fmaxnum are legal, we can expand to something like:

f = fmaxnum f, FP(MIN)
f = fminnum f, FP(MAX)
i = fptoxi f
i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN

If the bounds cannot be exactly represented, we expand to something like this instead:

i = fptoxi f
i = select f ult FP(MIN), MIN, i
i = select f ogt FP(MAX), MAX, i
i = select f uo f, 0, i # unnecessary if unsigned as 0 = MIN

It should be noted that this expansion assumes a non-trapping fptoxi.

This patch includes only basic legalizations (promoting float operands and integer results), and has asserting dummy methods to mark all the other parts that need to be implemented. We essentially have to implement the full suit of legalizations, because the fp_to_xint_sat instructions fall in the unfortunate category of having different result and operand type (so one does not legalize the other) and also having an additional value type argument that requires special handling.

Initial tests are for AArch64 and x86_64, for the scalar case and skipping some parts that need libcalls. This will be expanded in later patches, when the necessary legalizations are implemnented. See D54696 for a preview of where this is going.

Diff Detail

Event Timeline

nikic created this revision.Nov 20 2018, 3:49 AM

One thing I'm uncertain about here are the operands of the FP_TO_XINT_SAT opcode. There are basically three ways I could see this being represented, with a post-legalization saturation to v4i19:

  1. v4i32 = fp_to_xint_sat f, VT:v4i19
  2. v4i32 = fp_to_xint_sat f, VT:i19
  3. v4i32 = fp_to_xint_sat f, 19

That is, either the second operand is the type to which we saturate (possibly a vector), the scalarization thereof, or the scalar bitwidth.

The current implementation is the first and is modeled after what sign_extend_inreg does. The disadvantage is that the argument requires special handling, in particular during vector legalizations. It might be possible to reuse a bit more code in that area if it is stored as the scalarized type. On the other hand that feels somewhat asymmetric.

The last variant is probably not great because it does not make it obvious that the saturation width must be static.

t.p.northover added inline comments.
docs/LangRef.rst
13105 ↗(On Diff #174743)

What's going on here? All the other intrinsics I know specify the return type first in the prototype. I didn't even think it was something you could override.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2524 ↗(On Diff #174743)

Maybe put the normal assertion in here?

assert(NewOutTy.isInteger() && "Ran out of possibilities!");
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
5686 ↗(On Diff #174743)

I was expecting this tag to be the scalar type in the vector situation. Does something get neater this way round?

The current implementation is the first and is modeled after what sign_extend_inreg does.

That's pretty persuasive. Though possibly something we could fix in sign-extend if enough people think the scalar is better.

nikic updated this revision to Diff 174769.Nov 20 2018, 7:01 AM
nikic edited the summary of this revision. (Show Details)

Put result type first in the type suffix of the intrinsics. Add an assertion.

nikic marked 2 inline comments as done.Nov 20 2018, 7:09 AM
nikic added inline comments.
docs/LangRef.rst
13105 ↗(On Diff #174743)

You're absolutely right, I've flipped the order now, here and in tests. It looks like LLVM is very forgiving of incorrect intrinsic name suffixes and just normalizes them to be correct.

nikic updated this revision to Diff 175763.Nov 28 2018, 1:56 PM
nikic marked 2 inline comments as done.

Rebase and fix some formatting.

nikic added a comment.Dec 4 2018, 6:27 AM

Ping. Would appreciate some feedback, especially regarding the structure of the SelectionDAG nodes.

nikic updated this revision to Diff 177839.Dec 12 2018, 4:19 AM
nikic edited the summary of this revision. (Show Details)

Add X86 tests.

I think the consistency argument is good, so I'm in favour of the structure you chose now. My main issue at the moment is the tests: hard-coding register usage leads to very fragile tests.

It might be better to split the test files into fptosi-sat-*.ll and fptoui-sat-*.ll

test/CodeGen/X86/fptoi-sat-scalar.ll
2 ↗(On Diff #177839)

Please add i686 coverage:

; RUN: llc < %s -mtriple=i686-linux | FileCheck %s --check-prefix=X86,X86-X87
; RUN: llc < %s -mtriple=i686-linux -mattr=+sse2 | FileCheck %s --check-prefix=X86,X86-SSE
; RUN: llc < %s -mtriple=x86_64-linux | FileCheck %s --check-prefix=X64
nikic updated this revision to Diff 178205.Dec 14 2018, 2:13 AM

Add i686 tests, split test files for fptoui and fptosi.

I had to comment out a few more tests on the X86 side, because they would need libcall legalizations on i686, which are not implemented in this patch.

nikic added a comment.Dec 21 2018, 7:33 AM

I think the consistency argument is good, so I'm in favour of the structure you chose now. My main issue at the moment is the tests: hard-coding register usage leads to very fragile tests.

Is there any way to avoid hardcoded register usage in an automated manner? With vector coverage, this is probably going to need about 15k lines of tests just for two platforms, and they're going to change quite a bit until codegen is finalized. I don't think it's productive to write and rewrite those by hand.

I think the consistency argument is good, so I'm in favour of the structure you chose now. My main issue at the moment is the tests: hard-coding register usage leads to very fragile tests.

Is there any way to avoid hardcoded register usage in an automated manner?

From previous discussions, the hardcoded/spelled-out register names are intentional.

With vector coverage, this is probably going to need about 15k lines of tests just for two platforms, and they're going to change quite a bit until codegen is finalized. I don't think it's productive to write and rewrite those by hand.

+1, update_llc_test_checks.py is the standard tool for the task, and is the only sane way to have meaningful, reasonable test coverage without going insane from having to write CHECK lines by hand.

arsenm added a subscriber: arsenm.Feb 8 2019, 11:15 AM
arsenm added inline comments.
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
961–963 ↗(On Diff #179284)

I'm pretty sure this will warn. There's no point in doing this anyway since you'll get the appropriate error if you just leave this out

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2358 ↗(On Diff #179284)

Ditto

nikic updated this revision to Diff 186017.Feb 8 2019, 11:46 AM

Remove unimplemented legalization stubs.

nikic updated this revision to Diff 186541.Feb 12 2019, 2:03 PM
nikic marked 2 inline comments as done.

Add nounwind to avoid CFI.

aykevl added a subscriber: aykevl.Apr 12 2019, 6:00 AM
lzutao added a subscriber: lzutao.Wed, Jun 26, 8:39 PM

Adding more potential reviewers

Herald added a project: Restricted Project. · View Herald TranscriptThu, Jun 27, 12:00 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
fhahn added a subscriber: fhahn.Fri, Jun 28, 11:37 AM