Page MenuHomePhabricator

Add constrained int->FP intrinsics
ClosedPublic

Authored by kpn on Oct 21 2019, 12:07 PM.

Details

Summary

This adds constrained intrinsics for the signed and unsigned conversions of integers to floating-point.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
craig.topper added inline comments.Nov 27 2019, 11:32 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
996

Line the arguments up with the l ine above. Same for the two below

18550–18551

Please add a FIXME

18737

getValue

18738

getValue

18759

You can DAG.getMergeValues which takes less arguments

18804

Use Sub.getValue(1)

18806

Don't we need to propagate the chain result too?

18826

Pleas add a FIXME here

19092

Use getValue

kpn marked an inline comment as done.Nov 27 2019, 12:04 PM
kpn added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
18806

When given a strict FP node LowerOperation() will require that the resulting new node has the chain in the same place in the values. So, no, I don't think so unless we want to thread the std::pair all the way back up.

craig.topper added inline comments.Nov 27 2019, 12:35 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
18806

Oh this works because the node is either a STRICT_FP_EXTEND or STRICT_FP_ROUND so the node in first and second is the same. But we should probably use a MERGE_VALUES here to not rely on that.

craig.topper added inline comments.Nov 27 2019, 1:26 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7374

This only works if N2 is result 1 of N1.

In D69275#1762098, @kpn wrote:

I have not addressed the algorithm issues pointed out by Ulrich. So far I'm trying to just get the existing algorithms converted for strict use. Would it make sense to hold off on algorithm changes until another patch?

For me this would be fine as a first step, but we should at least add FIXME statements to those algorithms where we already know there are strict semantics problems.

uweigand added inline comments.Nov 29 2019, 7:36 AM
llvm/include/llvm/CodeGen/TargetLowering.h
4125–4126

Should document Chain here.

Or maybe it would be better to use MERGE_VALUES to return both components if appropriate?

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
175–176

If we pass Node, then all other arguments are really redundant, aren't they?

176–178

Hmm, wouldn't we also need to update this routine? Or can we say that promotion is not appropriate for handling strict semantics anyway?

2964

Why do we have to do the Replace... thing here instead of just appending both to Results (like is done for other nodes with chain)?

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
327–328

This should now also use ValVT, I think.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9202

Hmm. There's an UnrollVectorOp_StrictFP in LegalizeVectorTypes.cpp. If we change this routine to also handle some strict operations, maybe we should go all the way and just have merge the routines completely?

llvm/lib/IR/Verifier.cpp
4810

I believe the NumOperands check is now redunant with common tests handled above.

kpn marked an inline comment as done.Dec 2 2019, 6:34 AM
kpn added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7374

I don't understand. N2 is a constant that is either 0 or 1. What will happen if it is discarded here?

This code was lifted straight out of getNode() somewhere around line 5194. Without it the X86 target dies trying to lower a rounding of f64 to f64. This happens because getStrictFPExtendOrRound() returns a round when input and output are the same size. This mirrors the non-strict getFPExtendOrRound().

craig.topper added inline comments.Dec 2 2019, 9:12 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7374

Oops you’re right about N2. But there is an issue with the chain. Which I think is what I was thinking N2 was when I wrote that. If the input chain didn’t come from N1 this is broken.

I don’t see any precedent for returning a MERGE_VALUES from getNode. I think we need to fix the caller of getStrictFPExtendOrRound to only call when necessary.

craig.topper commandeered this revision.Dec 6 2019, 12:24 AM
craig.topper edited reviewers, added: kpn; removed: craig.topper.

Commandeering to update the X86 code and some other fixes/cleanup.

craig.topper marked an inline comment as done.

-Improve some of the X86 code.
-Add Promote support. Use it for i8/i16 on X86.
-Remove changes to UnrollVectorOp which seemed to be unexercised
-Some cleanup in ExpandLegalINT_TO_FP
-Drop the changes to getNode. We can't fold NOP conversions here and asserts were recently added in another patch.

Upload with context

Change the signature of SelectionDAGLegalize::ExpandLegalINT_TO_FP to allow it to update the Results vector directly.

kpn marked 6 inline comments as done.Dec 6 2019, 5:04 AM

I'm still working on this ticket daily! I'm trying to merge the two vector unrolling functions like Ulrich suggested. But I ran into problems that lead me to think we may have a serious issue lurking that we'll need to fix. That's what I've been working on: trying to understand the issue and see if it needs further investigation.

If you are in a hurry then you could have sent me an email and I would have uploaded the diffs I've got without further investigation.

I'm leaving attached the comments on my work that I've been adding but haven't submitted until now.

llvm/include/llvm/CodeGen/TargetLowering.h
4125–4126

Documentation. Check. I'll have that in my next round.

I don't remember why the prototype for expandFP_TO_UINT() got reformatted. But this code is already in the tree. Having expandUINT_TO_FP() be consistent with the existing tree seems like a good idea?

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
175–176

Seems that way. I'll give it a try.

176–178

I don't have a test for it so I didn't change it.

Can we guarantee the result would be rounded back down? Seems like promotion would be invalid without that guarantee.

2964

I've had a lot of trouble with this, but at the moment I'm unable to reproduce any issue here. Let's simplify and see how it goes.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
327–328

Looks like it.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7374

Done. I've also added an assert to document that getStrictFPExtendOrRound() wants the lengths to be different.

In D69275#1772541, @kpn wrote:

I'm still working on this ticket daily! I'm trying to merge the two vector unrolling functions like Ulrich suggested. But I ran into problems that lead me to think we may have a serious issue lurking that we'll need to fix. That's what I've been working on: trying to understand the issue and see if it needs further investigation.

If you are in a hurry then you could have sent me an email and I would have uploaded the diffs I've got without further investigation.

I'm leaving attached the comments on my work that I've been adding but haven't submitted until now.

Sorry. I should have sent a mail. If you want to upload over my last patch, I can merge my X86 changes into it.

What’s the serious issue you’ve encountered?

Restore @kpn 's last diff.

kpn added a comment.Dec 9 2019, 5:00 AM

The calls DAG.UnrollVectorOp() can't be used in place of DAGTypeLegalizer::ReplaceValueWith() because then the replaced node doesn't get properly deleted. This then results in an assertion failure later because the replaced node wasn't legal. And having SelectionDAG call into DAGTypeLegalizer doesn't seem like a winning idea, at least not without some plumbing. So today I'm going to punt on merging those functions together and just post what I've got. After breakfast sometime.

kpn commandeered this revision.Dec 10 2019, 6:04 AM
kpn edited reviewers, added: craig.topper; removed: kpn.

Prep to update patch.

kpn updated this revision to Diff 233072.Dec 10 2019, 6:06 AM

Update for review comments.

I've looked over the common code changes again, and they now look good to me. I'll leave it to Craig to review the X86 changes ...

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
176–178

Right. In general promotion is not appropriate for strict semantics because you don't get the right exceptions (for overflow etc.).

2396–2414

NewChain seems superfluous?

kpn updated this revision to Diff 233144.Dec 10 2019, 10:40 AM

Eliminate write-only variable.

kpn marked an inline comment as done.Dec 10 2019, 10:41 AM
kpn added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2396–2414

Right, fixed now. Thanks!

craig.topper added inline comments.Dec 10 2019, 10:41 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
176–178

Promotion is definitely bad for fp->int, but is there really an issue for int->fp? We're just going to use a bigger int for the input. If the small int was going to overflow, it should still overflow when its extended.

uweigand added inline comments.Dec 10 2019, 10:57 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
176–178

Ah, you're right. This should be fine.

@kpn: If you're looking for a test case, this would most likely involve something like a i16->f32 conversion.

craig.topper added inline comments.Dec 10 2019, 11:27 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1160–1163

Can a strict node get here? The call site in vectorLegalizer::Expand only checks for the non-strict node.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9202

When are these changes needed? The STRICT_UINT_TO_FP handling in LegalizeVectorOps always scalarizes and goes through ExpandStrictFPOp.

craig.topper added inline comments.Dec 10 2019, 11:44 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2396

Can we merge the 2 strict fp ifs here? The only thing we do between them is declare a new variable.

2398

Why is this not just DestVT != Sub.getValueType()

llvm/lib/Target/X86/X86ISelLowering.cpp
18744

We probably do need a strict version of FHADD, but until we have that we should just go to the shuffle + STRICT_FADD code below rather than silently dropping the chain.

craig.topper added inline comments.Dec 10 2019, 12:48 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
19026–19032

This looks out of date. I recently changed this to return a std::pair of Result and Chain so it was obvious that the chain result was intended as an output. So we need a merge values here for strict fp now. This should be reflected in https://reviews.llvm.org/D71130

kpn marked 6 inline comments as done.Dec 11 2019, 1:02 PM

Huh. I forgot to submit my comments earlier.

Anyway, I've merged in many changes from D71130, I've made a few changes as requested in LegalizeDAG.cpp, and I added the missing call to ExpandUINT_TO_FLOAT(). Now I'm seeing X86::VMULPDYrr die when running the vector-constrained-fp-intrinsics.ll test on the v4i32 type. It seems that in InstrEmitter.cpp it dies at the assertion at line 838 because NumOperands is 3. That's as far as I've gotten today.

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
176–178

Agreed. I'm working on LegalizeDAG.cpp right now. I notice X86 already has Promote for i8 and i16.

2396

Done.

2398

Eh, I figured it'd be more clear this way. I'll change it.

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1160–1163

Ah, I must have lost the change to call here, possibly with D69887 going in.

llvm/lib/Target/X86/X86ISelLowering.cpp
18744

Change lifted from D71130.

19026–19032

Yup, I just needed to rebase.

kpn updated this revision to Diff 233440.Dec 11 2019, 1:04 PM

Address review comments, with changes to LegalizeDAG.cpp and LegalizeVectorOps.cpp.

One comment unaddressed still.

craig.topper added inline comments.Dec 11 2019, 3:36 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1209

Src here should be Op.getOperand(0) I think. I think that's causing your MULPD crash.

1359–1364

Can this code be replaced with DAG.UnrollVectorOp now? Or should we use this code for strict instead of DAG.UnrollVectorOp for ExpandUINT_TO_FLOAT?

kpn marked an inline comment as done.Dec 12 2019, 5:53 AM
kpn added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1209

That was it. Thanks!

kpn marked an inline comment as done.Dec 13 2019, 11:23 AM
kpn added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9202

VectorLegalizer::ExpandUINT_TO_FLOAT() uses UnrollVectorOp() if the required instructions are not available. I have a change where it gets called from the top of ExpandStrictFPOp() and instead returns an empty value if it would have called UnrollVectorOp(). I think that may eliminate the need to modify UnrollVectorOp() since it will then fall through into the ExpandUINT_TO_FLOAT() logic instead. If it works it'll generate better code, and eliminating complexity would be nice.

kpn updated this revision to Diff 234104.Dec 16 2019, 10:42 AM

Address review comments. I've eliminated the need to call SelectionDAG::UnrollVectorOp() and therefore eliminated the changes to that function.

kpn marked an inline comment as done.Dec 16 2019, 10:46 AM
kpn added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1359–1364

I addressed this by eliminating the need to call DAG.UnrollVectorOp(). That function is now out of the discussion. The tests show that this way here usually results in shorter sequences of instructions as well.

General question for you and @uweigand that I realized today. Do we need to set the FPExcept bit in the flags for new nodes when we expand/promote operations?

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1182

Isn't SDValue(nullptr, 0) equivalent to SDValue() and more consistent with other code?

1361

Can we just declare SDValue Res in the if condition so we don't need double parentheses?

kpn updated this revision to Diff 234122.Dec 16 2019, 12:53 PM

Address review comments.

This revision is now accepted and ready to land.Dec 16 2019, 12:56 PM

General question for you and @uweigand that I realized today. Do we need to set the FPExcept bit in the flags for new nodes when we expand/promote operations?

Ah yes, we need to do that.

This reminds me of a general concern: for all other flag bits, omitting the flag is conservatively safe, it just may impact performance. But for FPExcept, omitting the flag impacts correctness. Maybe we should go ahead and invert the sense of the flag (i.e. use a FPNoExcept flag instead of FPExcept) after all ...

This revision was automatically updated to reflect the committed changes.