Page MenuHomePhabricator

Add constrained int->FP intrinsics
Needs ReviewPublic

Authored by kpn on Oct 21 2019, 12:07 PM.

Details

Summary

This adds constrained intrinsics for the signed and unsigned conversions of integers to floating-point.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
uweigand added inline comments.Fri, Nov 15, 10:02 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2439

This is actually the more problematic case, because this may round the wrong way depending on the current rounding mode.

For example, if the input is a large i64 value that does not exactly fit into the result type, and the rounding mode is "towards zero", then the result should be the FP value immediately smaller than the exact result. But since we're actually doing a signed conversion, the input will be interpreted as a negative value (of large absolute value), and rounded "towards zero", i.e. to the FP value immediately larger than the exact result of converting the negative value. After adding back the bias, we'll then have the FP value immediately larger than the exact result of an (unsigned) conversion of the original value, which is incorrect.

It actually looks like this this has nothing to do with constrained FP semantics, this code simply gives the wrong result even for regular FP operations ...

(GCC avoid this by either converting to a larger intermediate FP type first, if possible, or else avoiding the rounding issue by ensuring only positive values are passed to the signed-to-FP intermediate conversion.)

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1246

This algorithm however looks fine to me. The SINT_TO_FP operations always operate on positive values, and even more so, since they only use a half-word, the result is guaranteed to fit exactly into the target FP type, so there is never any rounding or exception. (These then don't really need to be strict operations.) The FMUL only increments the exponent, so again there is no rounding or exception.

The FADD at the end rounds correctly, and raises the correct exceptions, so this looks all good.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2629

This confuses me again. It seems this may generate a SINT_TO_FP -> HalfVT -> FP_ROUND -> OutVT chain, which introduces a potential double rounding that can lead to incorrect results even disregarding any constrained FP semantics ...

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
6153

This also looks correct to me. The STRICT_SINT_TO_FP will round correctly in any rounding mode, if my understanding of the algorithm is correct, and it will also raise the inexact exception if appropriate. The STRICT_FADD is just a multiply by two, which does not depend on rounding and cannot raise any exceptions given the input (so it might as well be just a plain FADD).

But what confuses me a bit again is why we have two algorithms for UINT_TO_FP expansion: a correct one here, and the incorrect one above in ExpandLegalINT_TO_FP? Under what circumstances will we ever end up using the incorrect one?

6205

Same comment as above.

craig.topper added inline comments.Tue, Nov 26, 12:23 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2976–2977

Extra indentation

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
1118

I think the input chain should be an argument here if we're going to make this a reusable function. Op could theoretically be a node with 2 data results and a chain which would make this wrong. It should probably also return std::pair of the two results so the caller doesn't have to assume where the chain is for nodes that were created inside.

craig.topper added inline comments.Tue, Nov 26, 3:15 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2629

For the only case we have tests for i64->f16. I think any integer value large enough to cause rounding when converted to f32 would be too large to represent at all in f16. Since f16's max exponent of 15 is less than the length of f32's mantissa.

uweigand added inline comments.Wed, Nov 27, 7:02 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2629

Ah yes, you're right. So this should be fine with non-strict semantics.

And for strict semantics, we should also be fine. The int->f32 conversion can only raise an inexact exception, and this only in cases where the int->f16 conversion should raise an inexact. The f32->f16 conversion due to construction of the input also can only raise an inexact exception, and again only in cases where we should have one. Conversely, in every case where we should have an inexact exception, one (or both) of the intermediate steps will raise it. (I think there may be cases where we get two, but that should be fine even for strict semantics.)

uweigand added inline comments.Wed, Nov 27, 7:58 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2629

Following up on myself: of course the f32->f16 conversion can also overflow, but again only in cases (and in fact in exactly in those cases) where the original int->f16 conversion should have overflowed. So again this should be fine.

Sorry again for the confusion, I'm not really used to thinking about f16 :-)

craig.topper added inline comments.Wed, Nov 27, 8:08 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2629

We should probably fix the code only do this for f16. I think we could construct a vXi128->vXf32 test that would go through this and I think it would be wrong.

craig.topper added inline comments.Wed, Nov 27, 9:25 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2629

Nevermind, I think we don't use the algorithm if the input type will be scalarized. I expect vXi128 will be scalarized on all targets.

kpn updated this revision to Diff 231300.Wed, Nov 27, 11:05 AM

Update for review comments. Use MERGE_VALUES node. Eliminate hacks to avoid x86 custom lowering. This means more x86 target support.

I have not addressed the algorithm issues pointed out by Ulrich. So far I'm trying to just get the existing algorithms converted for strict use. Would it make sense to hold off on algorithm changes until another patch?

craig.topper added inline comments.Wed, Nov 27, 11:32 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
996

Line the arguments up with the l ine above. Same for the two below

18550–18551

Please add a FIXME

18737

getValue

18738

getValue

18759

You can DAG.getMergeValues which takes less arguments

18804

Use Sub.getValue(1)

18806

Don't we need to propagate the chain result too?

18826

Pleas add a FIXME here

19059

You can just use Fild.getValue(1)

19084

Use getValue

19092

Use getValue

kpn marked an inline comment as done.Wed, Nov 27, 12:04 PM
kpn added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
18806

When given a strict FP node LowerOperation() will require that the resulting new node has the chain in the same place in the values. So, no, I don't think so unless we want to thread the std::pair all the way back up.

craig.topper added inline comments.Wed, Nov 27, 12:35 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
18806

Oh this works because the node is either a STRICT_FP_EXTEND or STRICT_FP_ROUND so the node in first and second is the same. But we should probably use a MERGE_VALUES here to not rely on that.

craig.topper added inline comments.Wed, Nov 27, 1:26 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7359

This only works if N2 is result 1 of N1.

In D69275#1762098, @kpn wrote:

I have not addressed the algorithm issues pointed out by Ulrich. So far I'm trying to just get the existing algorithms converted for strict use. Would it make sense to hold off on algorithm changes until another patch?

For me this would be fine as a first step, but we should at least add FIXME statements to those algorithms where we already know there are strict semantics problems.

uweigand added inline comments.Fri, Nov 29, 7:36 AM
llvm/include/llvm/CodeGen/TargetLowering.h
4125–4126

Should document Chain here.

Or maybe it would be better to use MERGE_VALUES to return both components if appropriate?

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
175–176

If we pass Node, then all other arguments are really redundant, aren't they?

176–181

Hmm, wouldn't we also need to update this routine? Or can we say that promotion is not appropriate for handling strict semantics anyway?

2975

Why do we have to do the Replace... thing here instead of just appending both to Results (like is done for other nodes with chain)?

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
327–328

This should now also use ValVT, I think.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9189

Hmm. There's an UnrollVectorOp_StrictFP in LegalizeVectorTypes.cpp. If we change this routine to also handle some strict operations, maybe we should go all the way and just have merge the routines completely?

llvm/lib/IR/Verifier.cpp
4809

I believe the NumOperands check is now redunant with common tests handled above.

kpn marked an inline comment as done.Mon, Dec 2, 6:34 AM
kpn added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7359

I don't understand. N2 is a constant that is either 0 or 1. What will happen if it is discarded here?

This code was lifted straight out of getNode() somewhere around line 5194. Without it the X86 target dies trying to lower a rounding of f64 to f64. This happens because getStrictFPExtendOrRound() returns a round when input and output are the same size. This mirrors the non-strict getFPExtendOrRound().

craig.topper added inline comments.Mon, Dec 2, 9:12 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7359

Oops you’re right about N2. But there is an issue with the chain. Which I think is what I was thinking N2 was when I wrote that. If the input chain didn’t come from N1 this is broken.

I don’t see any precedent for returning a MERGE_VALUES from getNode. I think we need to fix the caller of getStrictFPExtendOrRound to only call when necessary.

craig.topper commandeered this revision.Fri, Dec 6, 12:24 AM
craig.topper edited reviewers, added: kpn; removed: craig.topper.

Commandeering to update the X86 code and some other fixes/cleanup.

craig.topper marked an inline comment as done.

-Improve some of the X86 code.
-Add Promote support. Use it for i8/i16 on X86.
-Remove changes to UnrollVectorOp which seemed to be unexercised
-Some cleanup in ExpandLegalINT_TO_FP
-Drop the changes to getNode. We can't fold NOP conversions here and asserts were recently added in another patch.

Upload with context

Change the signature of SelectionDAGLegalize::ExpandLegalINT_TO_FP to allow it to update the Results vector directly.

kpn marked 6 inline comments as done.Fri, Dec 6, 5:04 AM

I'm still working on this ticket daily! I'm trying to merge the two vector unrolling functions like Ulrich suggested. But I ran into problems that lead me to think we may have a serious issue lurking that we'll need to fix. That's what I've been working on: trying to understand the issue and see if it needs further investigation.

If you are in a hurry then you could have sent me an email and I would have uploaded the diffs I've got without further investigation.

I'm leaving attached the comments on my work that I've been adding but haven't submitted until now.

llvm/include/llvm/CodeGen/TargetLowering.h
4125–4126

Documentation. Check. I'll have that in my next round.

I don't remember why the prototype for expandFP_TO_UINT() got reformatted. But this code is already in the tree. Having expandUINT_TO_FP() be consistent with the existing tree seems like a good idea?

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
175–176

Seems that way. I'll give it a try.

176–181

I don't have a test for it so I didn't change it.

Can we guarantee the result would be rounded back down? Seems like promotion would be invalid without that guarantee.

2975

I've had a lot of trouble with this, but at the moment I'm unable to reproduce any issue here. Let's simplify and see how it goes.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
327–328

Looks like it.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7359

Done. I've also added an assert to document that getStrictFPExtendOrRound() wants the lengths to be different.

In D69275#1772541, @kpn wrote:

I'm still working on this ticket daily! I'm trying to merge the two vector unrolling functions like Ulrich suggested. But I ran into problems that lead me to think we may have a serious issue lurking that we'll need to fix. That's what I've been working on: trying to understand the issue and see if it needs further investigation.

If you are in a hurry then you could have sent me an email and I would have uploaded the diffs I've got without further investigation.

I'm leaving attached the comments on my work that I've been adding but haven't submitted until now.

Sorry. I should have sent a mail. If you want to upload over my last patch, I can merge my X86 changes into it.

What’s the serious issue you’ve encountered?

Restore @kpn 's last diff.

kpn added a comment.Mon, Dec 9, 5:00 AM

The calls DAG.UnrollVectorOp() can't be used in place of DAGTypeLegalizer::ReplaceValueWith() because then the replaced node doesn't get properly deleted. This then results in an assertion failure later because the replaced node wasn't legal. And having SelectionDAG call into DAGTypeLegalizer doesn't seem like a winning idea, at least not without some plumbing. So today I'm going to punt on merging those functions together and just post what I've got. After breakfast sometime.

kpn commandeered this revision.Tue, Dec 10, 6:04 AM
kpn edited reviewers, added: craig.topper; removed: kpn.

Prep to update patch.

kpn updated this revision to Diff 233072.Tue, Dec 10, 6:06 AM

Update for review comments.

I've looked over the common code changes again, and they now look good to me. I'll leave it to Craig to review the X86 changes ...

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
176–181

Right. In general promotion is not appropriate for strict semantics because you don't get the right exceptions (for overflow etc.).

2407–2425

NewChain seems superfluous?

kpn updated this revision to Diff 233144.Tue, Dec 10, 10:40 AM

Eliminate write-only variable.

kpn marked an inline comment as done.Tue, Dec 10, 10:41 AM
kpn added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2407–2425

Right, fixed now. Thanks!

craig.topper added inline comments.Tue, Dec 10, 10:41 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
176–181

Promotion is definitely bad for fp->int, but is there really an issue for int->fp? We're just going to use a bigger int for the input. If the small int was going to overflow, it should still overflow when its extended.

uweigand added inline comments.Tue, Dec 10, 10:57 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
176–181

Ah, you're right. This should be fine.

@kpn: If you're looking for a test case, this would most likely involve something like a i16->f32 conversion.

craig.topper added inline comments.Tue, Dec 10, 11:27 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1160–1163

Can a strict node get here? The call site in vectorLegalizer::Expand only checks for the non-strict node.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9189

When are these changes needed? The STRICT_UINT_TO_FP handling in LegalizeVectorOps always scalarizes and goes through ExpandStrictFPOp.

craig.topper added inline comments.Tue, Dec 10, 11:44 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2407

Can we merge the 2 strict fp ifs here? The only thing we do between them is declare a new variable.

2409

Why is this not just DestVT != Sub.getValueType()

llvm/lib/Target/X86/X86ISelLowering.cpp
18744

We probably do need a strict version of FHADD, but until we have that we should just go to the shuffle + STRICT_FADD code below rather than silently dropping the chain.

craig.topper added inline comments.Tue, Dec 10, 12:48 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
19026–19032

This looks out of date. I recently changed this to return a std::pair of Result and Chain so it was obvious that the chain result was intended as an output. So we need a merge values here for strict fp now. This should be reflected in https://reviews.llvm.org/D71130

kpn marked 6 inline comments as done.Wed, Dec 11, 1:02 PM

Huh. I forgot to submit my comments earlier.

Anyway, I've merged in many changes from D71130, I've made a few changes as requested in LegalizeDAG.cpp, and I added the missing call to ExpandUINT_TO_FLOAT(). Now I'm seeing X86::VMULPDYrr die when running the vector-constrained-fp-intrinsics.ll test on the v4i32 type. It seems that in InstrEmitter.cpp it dies at the assertion at line 838 because NumOperands is 3. That's as far as I've gotten today.

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
176–181

Agreed. I'm working on LegalizeDAG.cpp right now. I notice X86 already has Promote for i8 and i16.

2407

Done.

2409

Eh, I figured it'd be more clear this way. I'll change it.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1160–1163

Ah, I must have lost the change to call here, possibly with D69887 going in.

llvm/lib/Target/X86/X86ISelLowering.cpp
18744

Change lifted from D71130.

19026–19032

Yup, I just needed to rebase.

kpn updated this revision to Diff 233440.Wed, Dec 11, 1:04 PM

Address review comments, with changes to LegalizeDAG.cpp and LegalizeVectorOps.cpp.

One comment unaddressed still.

craig.topper added inline comments.Wed, Dec 11, 3:36 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1209

Src here should be Op.getOperand(0) I think. I think that's causing your MULPD crash.

1362

Can this code be replaced with DAG.UnrollVectorOp now? Or should we use this code for strict instead of DAG.UnrollVectorOp for ExpandUINT_TO_FLOAT?

kpn marked an inline comment as done.Thu, Dec 12, 5:53 AM
kpn added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1209

That was it. Thanks!