This adds constrained intrinsics for the signed and unsigned conversions of integers to floating-point.
This is actually the more problematic case, because this may round the wrong way depending on the current rounding mode.
For example, if the input is a large i64 value that does not exactly fit into the result type, and the rounding mode is "towards zero", then the result should be the FP value immediately smaller than the exact result. But since we're actually doing a signed conversion, the input will be interpreted as a negative value (of large absolute value), and rounded "towards zero", i.e. to the FP value immediately larger than the exact result of converting the negative value. After adding back the bias, we'll then have the FP value immediately larger than the exact result of an (unsigned) conversion of the original value, which is incorrect.
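The failure mode can be sketched in Python, whose floats are IEEE doubles. The helper rtz below is a made-up demo that emulates round-toward-zero on top of Python's round-to-nearest using math.nextafter, and 2**63 + 1025 is just one example input that triggers the off-by-one-ulp result:

```python
import math

def rtz(x):
    """Round the integer x to a double under round-toward-zero (demo helper)."""
    f = float(x)                 # Python rounds to nearest...
    if abs(f) > abs(x):          # ...so step back toward zero if we overshot
        f = math.nextafter(f, 0.0)
    return f

v = 2**63 + 1025                 # u64 input with the sign bit set
correct = rtz(v)                 # toward zero: 2**63, the double just below v

s = v - 2**64                    # the same bits misread as a (negative) i64
f = rtz(s)                       # toward zero for a negative value rounds *up*
biased = rtz(int(f) + 2**64)     # add back the 2**64 bias, again toward zero

assert correct == float(2**63)
assert biased == float(2**63 + 2048)   # one ulp *above* the correct result
```

The bias addition cannot undo the upward rounding of the signed conversion, so the final value lands one ulp too high.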
It actually looks like this has nothing to do with constrained FP semantics; this code simply gives the wrong result even for regular FP operations ...
(GCC avoids this by either converting to a larger intermediate FP type first, if possible, or else avoiding the rounding issue by ensuring only positive values are passed to the signed-to-FP intermediate conversion.)
This algorithm, however, looks fine to me. The SINT_TO_FP operations always operate on positive values; moreover, since they only use a half-word, the result is guaranteed to fit exactly into the target FP type, so there is never any rounding or exception. (These then don't really need to be strict operations.) The FMUL only increments the exponent, so again there is no rounding or exception.
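As a sanity check, the half-word decomposition can be modeled in Python, whose floats are IEEE doubles (the function name u64_to_f64 is made up for the demo):

```python
def u64_to_f64(v):
    """Model of the expansion: two exact half-word converts, one rounding add."""
    lo = v & 0xFFFFFFFF            # low 32 bits: positive, fits a double exactly
    hi = v >> 32                   # high 32 bits: also positive and exact
    scaled = float(hi) * 2.0**32   # only increments the exponent, still exact
    return scaled + float(lo)      # the single step that can round / trap

# matches a direct correctly-rounded conversion for any u64 value:
for v in (0, 1, 2**32 - 1, 2**63 + 1025, 2**64 - 1):
    assert u64_to_f64(v) == float(v)
```

Because every step before the final add is exact, the expansion performs exactly one rounding, which is what a single correctly-rounded conversion does.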
The FADD at the end rounds correctly, and raises the correct exceptions, so this looks all good.
This confuses me again. It seems this may generate a SINT_TO_FP -> HalfVT -> FP_ROUND -> OutVT chain, which introduces a potential double rounding that can lead to incorrect results even disregarding any constrained FP semantics ...
This also looks correct to me. The STRICT_SINT_TO_FP will round correctly in any rounding mode, if my understanding of the algorithm is correct, and it will also raise the inexact exception if appropriate. The STRICT_FADD is just a multiply by two, which does not depend on rounding and cannot raise any exceptions given the input (so it might as well be just a plain FADD).
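For reference, the halve-and-double trick can also be modeled in Python (u64_to_f64_big is a made-up name; it covers inputs with bit 63 set, where a direct signed convert would misread the value). The OR of the shifted-out bit keeps it "sticky", which is what makes the single rounding in the conversion come out right:

```python
def u64_to_f64_big(v):
    """Model of the expansion for u64 values with the sign bit set."""
    halved = (v >> 1) | (v & 1)  # halve, keeping the lost low bit sticky
    f = float(halved)            # now fits in 63 bits: signed convert is safe
    return f + f                 # doubling only bumps the exponent: exact

for v in (2**63, 2**63 + 1024, 2**63 + 1025, 2**64 - 1):
    assert u64_to_f64_big(v) == float(v)

# without the sticky bit, a double rounding sneaks in:
v = 2**63 + 1025
assert float(v >> 1) * 2 != float(v)
```

The last assertion shows why the sticky bit matters: a plain shift loses the low bit, the halved value then ties-to-even the wrong way, and doubling propagates the error.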
But what confuses me a bit again is why we have two algorithms for UINT_TO_FP expansion: a correct one here, and the incorrect one above in ExpandLegalINT_TO_FP? Under what circumstances will we ever end up using the incorrect one?
Same comment as above.
I think the input chain should be an argument here if we're going to make this a reusable function. Op could theoretically be a node with 2 data results and a chain, which would make this wrong. It should probably also return a std::pair of the two results so the caller doesn't have to assume where the chain is for nodes that were created inside.
The only case we have tests for is i64->f16. I think any integer value large enough to cause rounding when converted to f32 would be too large to represent at all in f16, since f16's max exponent of 15 is less than the length of f32's mantissa.
Ah yes, you're right. So this should be fine with non-strict semantics.
And for strict semantics, we should also be fine. The int->f32 conversion can only raise an inexact exception, and only in cases where the int->f16 conversion should raise an inexact. The f32->f16 conversion, due to the construction of the input, can also only raise an inexact exception, and again only in cases where we should have one. Conversely, in every case where we should have an inexact exception, one (or both) of the intermediate steps will raise it. (I think there may be cases where we get two, but that should be fine even for strict semantics.)
Following up on myself: of course the f32->f16 conversion can also overflow, but again only in cases (and in fact exactly in those cases) where the original int->f16 conversion should have overflowed. So again this should be fine.
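This can be spot-checked in Python, which can pack IEEE half and single precision via the struct module (the helpers to_f16/to_f32 are demo-only; to_f16 maps overflow to +inf, matching default FP semantics):

```python
import math
import struct

def to_f16(x):
    """Round x to IEEE binary16, mapping overflow to +inf (demo helper)."""
    try:
        return struct.unpack('<e', struct.pack('<e', x))[0]
    except OverflowError:
        return math.inf

def to_f32(x):
    """Round x to IEEE binary32 (demo helper)."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

# f16 overflows long before f32 ever needs to round an integer:
assert to_f16(65519.0) == 65504.0      # largest int still rounding to finite f16
assert to_f16(65520.0) == math.inf     # f16 overflow threshold
assert 65520 < 2**24                   # every int below 2**24 is exact in f32

# so the int -> f32 -> f16 chain gives the same result as a direct conversion:
for v in (12345, 65504, 65519, 65520, 10**6, 2**62):
    assert to_f16(to_f32(float(v))) == to_f16(float(v))
```

Any integer in f16's finite range is well below 2**24, so the f32 leg is always exact there, and for larger integers both routes overflow to infinity together.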
Sorry again for the confusion, I'm not really used to thinking about f16 :-)
We should probably fix the code to only do this for f16. I think we could construct a vXi128->vXf32 test that would go through this, and I think it would be wrong.
Update for review comments. Use MERGE_VALUES node. Eliminate hacks to avoid x86 custom lowering. This means more x86 target support.
I have not addressed the algorithm issues pointed out by Ulrich. So far I'm trying to just get the existing algorithms converted for strict use. Would it make sense to hold off on algorithm changes until another patch?
Line the arguments up with the line above. Same for the two below.
Please add a FIXME
You can use DAG.getMergeValues(), which takes fewer arguments.
Don't we need to propagate the chain result too?
Please add a FIXME here.
You can just use Fild.getValue(1)
When given a strict FP node LowerOperation() will require that the resulting new node has the chain in the same place in the values. So, no, I don't think so unless we want to thread the std::pair all the way back up.
Oh, this works because the node is either a STRICT_FP_EXTEND or a STRICT_FP_ROUND, so the node in first and second is the same. But we should probably use a MERGE_VALUES here so that we don't rely on that.
Should document Chain here.
Or maybe it would be better to use MERGE_VALUES to return both components if appropriate?
If we pass Node, then all other arguments are really redundant, aren't they?
Hmm, wouldn't we also need to update this routine? Or can we say that promotion is not appropriate for handling strict semantics anyway?
Why do we have to do the Replace... thing here instead of just appending both to Results (like is done for other nodes with chain)?
This should now also use ValVT, I think.
Hmm. There's an UnrollVectorOp_StrictFP in LegalizeVectorTypes.cpp. If we change this routine to also handle some strict operations, maybe we should go all the way and just merge the routines completely?
I believe the NumOperands check is now redundant with the common tests handled above.
I don't understand. N2 is a constant that is either 0 or 1. What will happen if it is discarded here?
This code was lifted straight out of getNode() somewhere around line 5194. Without it the X86 target dies trying to lower a rounding of f64 to f64. This happens because getStrictFPExtendOrRound() returns a round when input and output are the same size. This mirrors the non-strict getFPExtendOrRound().
Oops, you're right about N2. But there is an issue with the chain, which I think is what I had in mind when I wrote that comment about N2. If the input chain didn't come from N1, this is broken.
I don’t see any precedent for returning a MERGE_VALUES from getNode. I think we need to fix the caller of getStrictFPExtendOrRound to only call when necessary.
-Improve some of the X86 code.
-Add Promote support. Use it for i8/i16 on X86.
-Remove changes to UnrollVectorOp, which seemed to be unexercised.
-Some cleanup in ExpandLegalINT_TO_FP.
-Drop the changes to getNode. We can't fold NOP conversions here and asserts were recently added in another patch.
I'm still working on this ticket daily! I'm trying to merge the two vector unrolling functions like Ulrich suggested. But I ran into problems that lead me to think we may have a serious issue lurking that we'll need to fix. That's what I've been working on: trying to understand the issue and see if it needs further investigation.
If you were in a hurry, you could have sent me an email and I would have uploaded the diffs I've got without further investigation.
I'm leaving attached the comments on my work that I've been adding but haven't submitted until now.
Documentation. Check. I'll have that in my next round.
I don't remember why the prototype for expandFP_TO_UINT() got reformatted. But this code is already in the tree. Having expandUINT_TO_FP() be consistent with the existing tree seems like a good idea?
Seems that way. I'll give it a try.
I don't have a test for it so I didn't change it.
Can we guarantee the result would be rounded back down? Seems like promotion would be invalid without that guarantee.
I've had a lot of trouble with this, but at the moment I'm unable to reproduce any issue here. Let's simplify and see how it goes.
Looks like it.
Done. I've also added an assert to document that getStrictFPExtendOrRound() wants the lengths to be different.
The calls to DAG.UnrollVectorOp() can't be used in place of DAGTypeLegalizer::ReplaceValueWith(), because then the replaced node doesn't get properly deleted. This then results in an assertion failure later, because the replaced node wasn't legal. And having SelectionDAG call into DAGTypeLegalizer doesn't seem like a winning idea, at least not without some plumbing. So today I'm going to punt on merging those functions together and just post what I've got. After breakfast sometime.
I've looked over the common code changes again, and they now look good to me. I'll leave it to Craig to review the X86 changes ...
Right. In general promotion is not appropriate for strict semantics because you don't get the right exceptions (for overflow etc.).
NewChain seems superfluous?
Promotion is definitely bad for fp->int, but is there really an issue for int->fp? We're just going to use a bigger int for the input. If the small int was going to overflow, it should still overflow when it's extended.
Can a strict node get here? The call site in VectorLegalizer::Expand only checks for the non-strict node.
When are these changes needed? The STRICT_UINT_TO_FP handling in LegalizeVectorOps always scalarizes and goes through ExpandStrictFPOp.
Can we merge the two strict FP ifs here? The only thing we do between them is declare a new variable.
Why is this not just DestVT != Sub.getValueType()?
We probably do need a strict version of FHADD, but until we have that we should just go to the shuffle + STRICT_FADD code below rather than silently dropping the chain.
This looks out of date. I recently changed this to return a std::pair of Result and Chain so it was obvious that the chain result was intended as an output. So we need a merge values here for strict fp now. This should be reflected in https://reviews.llvm.org/D71130
Huh. I forgot to submit my comments earlier.
Anyway, I've merged in many changes from D71130, I've made a few changes as requested in LegalizeDAG.cpp, and I added the missing call to ExpandUINT_TO_FLOAT(). Now I'm seeing X86::VMULPDYrr die when running the vector-constrained-fp-intrinsics.ll test on the v4i32 type. It seems that in InstrEmitter.cpp it dies at the assertion at line 838 because NumOperands is 3. That's as far as I've gotten today.
Agreed. I'm working on LegalizeDAG.cpp right now. I notice X86 already has Promote for i8 and i16.
Eh, I figured it'd be more clear this way. I'll change it.
Ah, I must have lost the change to the call here, possibly with D69887 going in.
Change lifted from D71130.
Yup, I just needed to rebase.