This is an archive of the discontinued LLVM Phabricator instance.

Differential D43515

More math intrinsics for conservative math handling
Abandoned · Public

Authored by kpn on Feb 20 2018, 9:44 AM.

Details

Reviewers

andrew.w.kaylor
craig.topper
hfinkel
mehdi_amini
aemerson
javed.absar
kbarton

Summary

This builds on D27028 and D32319's work on constrained math intrinsics.

Quoting from D32319: "The purpose of the constrained intrinsics is to force the optimizer to respect the restrictions that will be necessary to support things like the STDC FENV_ACCESS ON pragma without interfering with optimizations when these restrictions are not needed."

There are more patches coming, but I wanted to start with just a handful here.

Diff Detail

Event Timeline

kpn created this revision.Feb 20 2018, 9:44 AM

Herald added a subscriber: llvm-commits.Feb 20 2018, 9:44 AM

At the request of my employer's legal department:
Copyright © 2018 SAS Institute Inc., Cary, NC, USA. All Rights Reserved.

andrew.w.kaylor added inline comments.Feb 20 2018, 11:52 AM

docs/LangRef.rst
15159	I think you need to say more about the semantics. The LLVM IR fptoui and fptosi are specifically documented as rounding to zero. I don't think we want that with the constrained intrinsics so we need to very specifically document how they will be different from the standard instructions.
15206	This intrinsic is a bit troublesome. The llvm.round intrinsic says that it "returns the same values as the libm round functions would, and handles error conditions in the same way." The libm round function, in turn, is documented as rounding to the nearest integer (and away from zero in halfway cases) regardless of the current rounding mode. So what do we want the constrained form of the intrinsic to do? I think it needs to ignore the rounding mode. I'm not sure about exception behavior. If it doesn't respect exception behavior then we probably don't want to have the constrained form of this intrinsic at all.
15242	This seems to be leaking SelectionDAG implementation details into the IR space. How is this used?
15283	This is a replacement for fpext, right? I think you should say that somewhere.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
955	STRICT_FP_TO_SINT needs to do something more than this. The default lowering of FP_TO_SINT is known to raise spurious FE_INEXACT exceptions because it involves speculative execution.
1136–1176	Since this gets unique handling why isn't it just a separate case from the others?
1141	The style/formatting is wrong here. I think you need curly braces around your else-clause and the "else" itself needs to be on the same line as the curly brace above it.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6944	Why are you not attaching these nodes to the chain?

kpn added inline comments.Feb 21 2018, 12:29 PM

docs/LangRef.rst
15159	What do we want these intrinsics to do that is different from the normal IR fptoui and fptosi? I don't think any of the constrained intrinsics _today_ are doing anything with the rounding and exception metadata. That makes it hard for me to say much about it in documentation, today. Well, unless I misunderstood the current code. That's totally possible.
15206	We'll still need the constrained intrinsic to avoid getting reordered. How about if I rename the intrinsic to be fptrunc instead of round? Then the rounding would be explicit.
15242	It seems to exist for the MVT::ppcf128 type. A quick grep doesn't show any other users. Test coverage is lacking. I added a test using the intrinsic, but I had to mark it expected fail since I couldn't get it to work. To avoid the risk of bugs getting introduced later I went ahead and implemented the intrinsic. Would it be better to not have the intrinsic and to instead have a pass that replaces the non-STRICT SDNode with a STRICT version? That would avoid said leaking into the IR space. It would, however, mean that llvm would have an opinion on when STRICT nodes should be used. I'm not sure that's a good thing.
15283	I think I picked names for the intrinsics that matched the SelectionDAG node enum. Perhaps it would be better to match the bitcode language names? In which case this intrinsic would be "fpext" instead of "extend". Either way that's a good idea for the documentation to at least mention fpext.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
955	I have seen FP_TO_SINT cause traps when it shouldn't. Would having that default lowering use the chain in the STRICT_ case solve that issue?
1136–1176	Good point. And making it a separate case also takes care of the formatting issue in the else block.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6944	Because they need to match what the default lowering is expecting. Otherwise a variety of failures happen.

andrew.w.kaylor added inline comments.Feb 21 2018, 6:17 PM

docs/LangRef.rst
15159	You're right, none of the other intrinsics are doing anything specific with the rounding mode. The purpose of the intrinsics is to prevent the optimizer from doing things that would introduce compile-time rounding. However, the general assumption of the intrinsics is that whatever you set the rounding mode to at runtime is the rounding behavior you will get. I'd want to consult a front end expert as to exactly what should be happening. A quick glance at the C99 standard tells me that when a floating point number is converted to an integer it is truncated toward zero. I think that means the same thing as the LLVM language reference claim that fptoui and fptosi round the number toward zero. So I think that we don't want the runtime rounding mode to change the behavior of these intrinsics, and so we should document it as such.
15206	No, fptrunc does something else, right? My concern is that if this is preserving the behavior of the round library function (and I think we need to) then the rounding mode argument isn't relevant, so maybe it should be omitted in this case. The key thing is to be explicit about its behavior in the documentation and then make sure the implementation actually does what we say it will.
15242	I think it's better not to have the intrinsic for now. I don't understand what the SD node is doing well enough to say much more, but it looks to me like the SD node shouldn't be there either. It's a target-specific hack that leaked into the target-independent code if my understanding is correct.
15283	These intrinsics should be driven by what the front end needs. If no front end is generating an equivalent now then we don't want an intrinsic. So, yes, please match the bitcode language names.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
955	No, I don't think the chain will fix this. We need to implement strict lowering that does something different.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6944	There is code that removes the chain when the strict node is mutated to a non-strict node. That should be preventing lowering problems. I believe the chain was necessary to prevent re-ordering prior to final instruction selection.
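To make the distinction in the 15159 and 15206 comments above concrete, here is a minimal standalone C++ sketch. It is not LLVM code, and the helper names are made up for illustration. It assumes a platform where `fesetround` accepts `FE_UPWARD` and `FE_DOWNWARD`, and it uses `volatile` to keep the compiler from folding the operations at compile time. A cast truncates toward zero and `std::round` rounds halfway cases away from zero, both regardless of the dynamic rounding mode, while `std::nearbyint` follows whatever mode is set at run time.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical helper: a float-to-integer cast under a given rounding mode.
// C and C++ casts truncate toward zero, like the IR fptosi instruction.
long castToLong(double x, int mode) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  volatile double v = x;                  // defeat compile-time folding
  volatile long r = static_cast<long>(v); // computed while `mode` is active
  std::fesetround(saved);
  return r;
}

// Hypothetical helper: libm round under a given rounding mode.
// Rounds to nearest, halfway cases away from zero; the mode is ignored.
double libmRound(double x, int mode) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  volatile double v = x;
  volatile double r = std::round(v);
  std::fesetround(saved);
  return r;
}

// Hypothetical helper: nearbyint honors the dynamic rounding mode.
double modeAwareRound(double x, int mode) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  volatile double v = x;
  volatile double r = std::nearbyint(v);
  std::fesetround(saved);
  return r;
}
```

With these helpers, `castToLong(2.7, FE_UPWARD)` still yields 2, whereas `modeAwareRound(2.2, FE_UPWARD)` yields 3.0.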

kpn added inline comments.Feb 26 2018, 12:24 PM

docs/LangRef.rst
15159	Our front end guy here confirms truncation. I've updated my working copy to state that fptoui and fptosi truncate.
15206	Well, if I'm reading SelectionDAGBuilder::visitFPTrunc() correctly then fptrunc gets turned into ISD::FP_ROUND. So, no, renaming this intrinsic to be fptrunc would not be wrong. Contrast this with llvm.round getting turned into ISD::FROUND. I think we need constrained intrinsics for both, so this one in this patch should be renamed after fptrunc. I haven't touched the ISD::FROUND node yet, and that's what llvm.round gets turned into from the looks of SelectionDAGBuilder::visitIntrinsicCall(). That could be a later patch. I agree that the rounding mode should be ignored and shouldn't be in the intrinsic's metadata. WDYT?
15242	Done. It will be removed in the next diff.
15283	Will do.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
955	Is there something we can do at this level to fix this? If there is then I'm all for it, but if there isn't then we should probably still put the intrinsic in. We'll need it eventually, and currently none of the constrained intrinsics solve the complete optimization problem. So I wonder if this intrinsic is really all that different from the other experimental constrained intrinsics. If a backend models FP side effects then wouldn't the existing default lowering work correctly?
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6944	Mutation happens too late in some cases. If it happened early enough then there wouldn't be any need for the strict intrinsics to be mentioned in SelectionDAGLegalize::ExpandNode(). Since it doesn't we can't use the chain. Should that mutation happen earlier?

andrew.w.kaylor added inline comments.Feb 28 2018, 11:25 AM

docs/LangRef.rst
15206	You shouldn't be thinking in terms of the SelectionDAG at this level. ISD::FP_ROUND is very poorly named and does not do the same thing that llvm.round does. ISD::FP_ROUND converts a floating point number to a smaller type, but not necessarily an integer. That's fptrunc. I suppose you are right that we do need a constrained version of fptrunc. I'm a bit concerned by the things that the LLVM language definition says are undefined. Those are the cases that will be of most interest for the constrained case and we should document the expected behavior, but I think we need to consider why the current spec says the result is undefined. In any event llvm.round, which is what I thought you were replacing here, is something completely different.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
955	To be honest I'm not entirely sure how to fix this in the current selection DAG model. The issue is that we need to introduce a branch to fix the problem, but by the time we're selecting instructions it's too late to do that. I think it needs to be addressed when we're building the DAG.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6944	We need the mutation to be put off as long as possible. I think it should be possible to unlink the chain in ExpandNode if necessary. Depending on what's happening there, we might even want the expanded node to make use of the chain.
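As an illustration of the fptrunc versus llvm.round point in the 15206 comment above, the following standalone C++ sketch (not LLVM code; helper names are hypothetical) contrasts narrowing a double to a float with rounding to an integral value. It assumes `fesetround` supports the directed modes on the host. Narrowing depends on the dynamic rounding mode and can raise FE_INEXACT, while `std::round` produces an integral value in the same type.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical helper: double -> float narrowing (what fptrunc does)
// performed while the given rounding mode is active.
float narrowToFloat(double x, int mode) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  volatile double v = x;                    // defeat compile-time folding
  volatile float r = static_cast<float>(v); // rounded per the active mode
  std::fesetround(saved);
  return r;
}

// Hypothetical helper: reports whether narrowing x raised FE_INEXACT.
bool narrowingRaisesInexact(double x) {
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile double v = x;
  volatile float r = static_cast<float>(v);
  (void)r;
  return std::fetestexcept(FE_INEXACT) != 0;
}

// For contrast: rounding to an integral value keeps the type and ignores
// the rounding mode (ties go away from zero).
double roundToIntegral(double x) { return std::round(x); }
```

For example, 0.1 is not representable as a float, so `narrowToFloat(0.1, FE_UPWARD)` and `narrowToFloat(0.1, FE_DOWNWARD)` return adjacent but different floats.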

kpn added inline comments.Mar 6 2018, 6:13 AM

docs/LangRef.rst
15283	Say, now that I think about this, what does rounding have to do with extending a FP value? Shouldn't this intrinsic just plain not have rounding metadata?

andrew.w.kaylor added inline comments.Mar 6 2018, 9:14 AM

docs/LangRef.rst
15283	That's a reasonable point. The same issue came up with frem. The rounding mode doesn't apply to frem, but I kept the argument just so that it could be handled internally the same way as the other constrained intrinsics and then documented in the language reference that the rounding mode has no effect. I'm not completely convinced that was a good decision on my part. If you want to look at what would have to change to support constrained intrinsics with no rounding mode argument, I would likely support that. I don't think it would be much extra handling. There are just some things where the class that is used to represent the intrinsics (ConstrainedFPIntrinsic) would need to be aware of the possibility that this argument is omitted.
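The observation that rounding does not apply to fpext or frem can be checked with a small standalone C++ sketch (not LLVM code; helper names are hypothetical, and it assumes `fesetround` supports the directed modes). Widening a float to a double is always exact, and `std::fmod` returns an exact remainder, so neither result changes with the rounding mode.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical helper: float -> double widening (what fpext does)
// performed while the given rounding mode is active.
double widenToDouble(float x, int mode) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  volatile float v = x;                       // defeat compile-time folding
  volatile double r = static_cast<double>(v); // exact for every float value
  std::fesetround(saved);
  return r;
}

// Hypothetical helper: reports whether widening x raised FE_INEXACT.
bool wideningRaisesInexact(float x) {
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile float v = x;
  volatile double r = static_cast<double>(v);
  (void)r;
  return std::fetestexcept(FE_INEXACT) != 0;
}

// Hypothetical helper: fmod (the libm analogue of frem) under a given mode.
double fmodUnder(double x, double y, int mode) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  volatile double a = x;
  volatile double b = y;
  volatile double r = std::fmod(a, b);
  std::fesetround(saved);
  return r;
}
```

This supports either design choice discussed above: keeping the argument and documenting that it has no effect, or omitting it.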

This new diff changes the default lowering of STRICT_FP_TO_UINT so that it does not allow speculative execution. I've also eliminated the rounding metadata from instructions when it wasn't needed.
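For readers unfamiliar with the speculation problem, here is a standalone C++ sketch of a commonly used way to build an unsigned conversion from a signed one. It is an assumption that this resembles the default expansion being discussed; it is not LLVM's code, and all names are hypothetical. The select form computes both candidate results and picks one, so the unused path can still set exception flags. The branch form executes only the path that applies.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdint>

constexpr double kTwo63 = 9223372036854775808.0;          // 2^63, exact
constexpr std::uint64_t kSignBit = std::uint64_t{1} << 63;

// Branch form: only the applicable path runs. Precondition: 0 <= x < 2^64.
std::uint64_t fptouiBranch(double x) {
  assert(x >= 0.0 && x < 2.0 * kTwo63);
  volatile double v = x; // volatile reads keep each path inside its branch
  if (v < kTwo63) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(v));
  }
  const std::int64_t low = static_cast<std::int64_t>(v - kTwo63);
  return static_cast<std::uint64_t>(low) ^ kSignBit;
}

// Select form: both paths are computed, then one result is chosen.
// Restricted to 0 <= x < 2^63 here, because an out-of-range cast is
// undefined behavior in C++.
std::uint64_t fptouiSelect(double x) {
  assert(x >= 0.0 && x < kTwo63);
  volatile double v = x;
  const double d = v;
  volatile double shifted = d - kTwo63; // speculative; inexact for small d
  const std::int64_t small = static_cast<std::int64_t>(d);
  const std::int64_t large = static_cast<std::int64_t>(shifted);
  return d < kTwo63 ? static_cast<std::uint64_t>(small)
                    : static_cast<std::uint64_t>(large) ^ kSignBit;
}

// Reports whether running `convert` on x raised FE_INEXACT.
template <typename Convert>
bool raisesInexact(Convert convert, double x) {
  std::feclearexcept(FE_ALL_EXCEPT);
  volatile std::uint64_t result = convert(x);
  (void)result;
  return std::fetestexcept(FE_INEXACT) != 0;
}
```

Converting 3.0 is exact, so the branch form leaves FE_INEXACT clear, while the select form sets it because the unused subtraction `3.0 - 2^63` is inexact.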

Herald added a reviewer: javed.absar.May 1 2018, 7:32 AM

Herald added a subscriber: mgorny.

Missed one place in the documentation that needed updating.

At some point we should create a document that describes the entire flow of FP instructions through the instruction selection process. To be honest I don't remember how it all works, and that makes it difficult to review changes like this. It would also be nice to verify that we all have the same understanding of how it works. I don't mean to volunteer you to produce the entire document, but would you mind giving me a rough outline? I'm still concerned about the case that is not chained.

docs/LangRef.rst
15201	You need to do something more here to document the difference between the return type and the argument type. Also in fpext below.
lib/CodeGen/StrictFP.cpp
11 ↗	(On Diff #144722)	Can you say more here about what these transformations are? It's clear that you intend this as a generic pass that currently does one thing but might have others added later. That's good, but I'd like to see the possibilities described here as they are implemented.
74 ↗	(On Diff #144722)	Per the LLVM Programmer's Manual (http://llvm.org/docs/ProgrammersManual.html#iterating-over-the-instruction-in-a-function) you should be using an inst_iterator here.
136 ↗	(On Diff #144722)	Could you add an example here of what the resulting IR will look like? It would make the code a lot easier to follow.
144 ↗	(On Diff #144722)	I think you can use "IntDst->getBitWidth()" here. Also, APInt::getSignedMinValue() does this same thing and is a bit more self-documenting.
150 ↗	(On Diff #144722)	I believe conversion of a NaN to an integer should raise "INVALID" (which the fcmp will) and then the result is undefined, but the 'true' case does less so I think ULT is preferable.
151 ↗	(On Diff #144722)	We are going to need a constrained version of fcmp, and when we have it you should use it here. When the IRBuilder supports constrained floating point modes, it would be nice to use that here but I guess you can't do that yet, so maybe just a comment saying we should later?
153 ↗	(On Diff #144722)	This is an odd name. How about "within.sint.range"? In any case, I think '.' is more common than '_' as a name space holder, probably because the name will automatically get '.<n>' appended if it's a duplicate.
201 ↗	(On Diff #144722)	This description is wrong.
lib/CodeGen/TargetPassConfig.cpp
566 ↗	(On Diff #144722)	This doesn't seem like the right place to do this. Should it be happening much later, like around CodeGenPrepare?
lib/IR/Verifier.cpp
4746	Since you've broken this out into a switch statement, can you separate the unary and ternary ops and give them each the appropriate assert? I think that would be much more readable than this compound check (which I realize was my creation).
4762	The default should probably always be an error.
4772	How about this? int RoundingIdx = (HasExceptionMD ? NumOperands - 2 : NumOperands - 1); On the other hand, are we ever going to have intrinsics that have a rounding mode but not exception behavior?
test/CodeGen/X86/fp-intrinsics.ll
281	Can you check the entire expanded IR pattern here? It might be worth having a separate test that verifies the StrictFP pass in isolation.
356	I believe your decorated intrinsic names are incorrect in all of these cases. The name needs to specify both return type and argument type. Take a look at what opt produces if you give it these names as inputs.
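On the ULT suggestion in the StrictFP.cpp comments above (lines 144 to 153 of that file): the StrictFP.cpp code itself is not shown on this page, so the following standalone C++ sketch only illustrates the difference between an ordered and an unordered less-than comparison, plus the signed minimum bound as a floating-point value. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Ordered less-than (like fcmp olt): true only if neither operand is NaN
// and x < bound.
bool orderedLessThan(double x, double bound) { return x < bound; }

// Unordered-or-less-than (like fcmp ult): true if either operand is NaN
// or x < bound.
bool unorderedOrLessThan(double x, double bound) { return !(x >= bound); }

// Smallest value of a signed integer of the given width, as a double:
// -2^(bits - 1). This mirrors what a signed-minimum query on an
// arbitrary-width integer would return, converted to floating point.
double signedMinAsDouble(int bits) { return -std::ldexp(1.0, bits - 1); }
```

With a NaN input the two predicates disagree, so the choice between them decides which side of the range check a NaN takes.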

cameron.mcinally added a subscriber: cameron.mcinally.May 24 2018, 7:48 AM

kpn updated this revision to Diff 152072.Jun 20 2018, 6:05 AM

kpn marked 12 inline comments as done.

kpn added inline comments.

lib/CodeGen/StrictFP.cpp
144 ↗	(On Diff #144722)	error: no member named 'getBitWidth' in 'llvm::Value'.

kpn added inline comments.Jun 20 2018, 6:06 AM

docs/LangRef.rst
15201	I hope this is what you mean. I've changed it to show that the result is a different type. I've followed the naming scheme used elsewhere in this document.
lib/IR/Verifier.cpp
4772	Done. I don't know if we'll have a rounding mode but not exceptions. I doubt it, but I can't say for certain.

craig.topper added inline comments.Jun 20 2018, 11:52 PM

lib/CodeGen/StrictFP.cpp
76 ↗	(On Diff #152072)	I believe you should be getting TM by doing this. auto *TPC = getAnalysisIfAvailable<TargetPassConfig>(); if (!TPC) return false; auto &TM = TPC->getTM<TargetMachine>(); There is only one other pass that takes TargetMachine in its constructor. The others use what I've put above. So I believe that is the preferred way.
84 ↗	(On Diff #152072)	There is little reason to pass Context around. Value has a getContext as does Type. So you can get the context easily whenever you need it.
99 ↗	(On Diff #152072)	This cast is unnecessary. IntrinsicInst is a subclass of Value
101 ↗	(On Diff #152072)	This should use TLI->getValueType.
166 ↗	(On Diff #152072)	Why do you need a SmallVector here? Why can't you just call getArgOperand?
167 ↗	(On Diff #152072)	This cast is unnecessary.
171 ↗	(On Diff #152072)	What if the intrinsic uses a vector type?

uweigand added a subscriber: uweigand.Jun 25 2018, 6:30 AM

kpn marked 6 inline comments as done.Jul 26 2018, 9:15 AM

kpn added inline comments.

lib/CodeGen/StrictFP.cpp
171 ↗	(On Diff #152072)	It would have been caught by the IR verifier. A vector would have been rejected there.

kpn updated this revision to Diff 157504.Jul 26 2018, 9:16 AM

craig.topper added inline comments.Jul 26 2018, 9:56 AM

lib/CodeGen/StrictFP.cpp
171 ↗	(On Diff #152072)	I don't see where the IR verifier rejects vectors. I just see that it checks that element counts are equal. And why should it reject vectors? We need to support fptosi/fptoui for vectors.

kpn added inline comments.Jul 26 2018, 9:59 AM

lib/CodeGen/StrictFP.cpp
171 ↗	(On Diff #152072)	It doesn't reject them, now. This latest patch added support for vectors. My inexperience with Phabricator lost the comment that said that.

Delete a line accidentally left in.

Rebase. Ping.

In D43515#1013383, @kpn wrote:

At the request of my employer's legal department:
Copyright © 2018 SAS Institute Inc., Cary, NC, USA. All Rights Reserved.

If it's not just a remark, but is supposed to have some legal/whatever meaning,
I'm not sure some comment in some review is the correct direction.

docs/LangRef.rst
15128	This probably has insufficient amount of `^`. Might want to actually test-build the docs.

Adding new constrained intrinsics and adding the pass should be separate patches I think. Changing the syntax of frem should be another patch.

docs/LangRef.rst
15062	This change should be in a separate patch. There's too much going on in this patch and this is easy to overlook.

In D43515#1229127, @craig.topper wrote:

Adding new constrained intrinsics and adding the pass should be separate patches I think. Changing the syntax of frem should be another patch.

Will do.

Split out changes as requested. This diff is just the four new intrinsics. The fptoui pass and the change to frem will be later.

I've also corrected some documentation issues in this iteration.

Ping

Herald added a subscriber: arphaman.Oct 3 2018, 11:44 AM

craig.topper added inline comments.Oct 3 2018, 5:22 PM

docs/LangRef.rst
15209	This reads funny. I think it should maybe be "result of truncating a floating point"
15243	This also reads funny
include/llvm/CodeGen/ISDOpcodes.h
590	These need comments. Does STRICT_FP_ROUND have the TRUNC argument that FP_ROUND has?

kpn added inline comments.Oct 4 2018, 6:42 AM

docs/LangRef.rst
15209	How about if I just copy the text used by the normal fptrunc instruction?
15243	Same as fptrunc. I could just copy the text from the fpext instruction?
include/llvm/CodeGen/ISDOpcodes.h
590	I could rearrange and lump them in with the non-strict versions of each. Then I'd just need one extra line restating that the STRICT_ versions prevent optimizations. Yes, STRICT_FP_ROUND does have the TRUNC argument that FP_ROUND has, but it is currently always zero. Fixing this requires rerouting this one strict node to go through the same codepath as the non-strict node. That would make it different from all the other constrained nodes. Should I go ahead and make that change? I _think_ it is safe if the TRUNC argument really does work like it is documented. I also just noticed that I need to put back STRICT_FP_TO_UINT in at least one place. It should be everywhere STRICT_FP_TO_SINT is handled _except_ in the default lowering.

craig.topper added inline comments.Oct 4 2018, 10:51 AM

docs/LangRef.rst
15209	Sure
15243	Sure
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7767	This doesn't copy the second argument to FP_ROUND over does it?
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6948–6953	Is this adding a second argument to STRICT_FP_EXTEND as well? I don't think the non-strict FP_EXTEND has two arguments.

kpn added inline comments.Oct 4 2018, 10:53 AM

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7767	No. It should.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6948–6953	Agreed. I'll fix it.

cameron.mcinally added inline comments.Oct 4 2018, 11:43 AM

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7743	There are a lot of comments on this, so I may have missed something. Take with a grain of salt... I don't think these are correct. These can trap so can't be speculatively executed. They would need a chain.

kpn added inline comments.Oct 4 2018, 11:58 AM

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7743	I couldn't figure out how to have these be chained but have the non-strict continue to not be chained. Too many things fell over if they didn't match.

cameron.mcinally added inline comments.Oct 5 2018, 8:24 AM

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7743	I'm not an expert with this code, so cc @andrew.w.kaylor. This doesn't seem like the right direction though. Maybe these unchained operations should be left to a different patch until a proper solution is found.

kpn mentioned this in D53216: [FPEnv] Add constrained intrinsics for MAXNUM and MINNUM.Oct 18 2018, 11:24 AM

kpn mentioned this in D53411: [FPEnv] Add constrained CEIL/FLOOR/ROUND/TRUNC intrinsics.Oct 19 2018, 5:56 AM

kpn marked 5 inline comments as done.Nov 1 2018, 9:08 AM

Address review comments.

Add use of the chain to these four new SDNode types.

cameron.mcinally added inline comments.Nov 5 2018, 11:51 AM

test/CodeGen/X86/fp-intrinsics.ll
293	Same here as the vector version below. Do we want the truncating convert? That was surprising to me.
356	Do we want unsigned convert tests too? fptoui? I see that there are SystemZ tests to cover them, so maybe that's sufficient? Just pointing this out so others can see.
test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
4026	This surprised me. Should this be the truncating convert? Or should it be vcvtsd2si?
5586	Same question as the scalar versions. Do we want <4 x float> to <4 x i32>/etc casts?
test/Feature/fp-intrinsics.ll
311	I haven't been following along closely, so please forgive if this was already discussed... Should we have float->i32 casts too? Also double->i64?
313	Should 'fpext.f64' have a float argument instead of a double?

kpn added inline comments.Nov 5 2018, 12:36 PM

test/CodeGen/X86/fp-intrinsics.ll
356	The SystemZ tests target hardware new enough to lower to a single instruction. The tests for fptoui on x86 use the default lowering, but the default lowering is disallowed since it does speculative execution and traps. I had support for fixing that in this patch, but was asked to split it out into another patch. So there's no constrained fptoui test here using the default lowering in this patch.
test/Feature/fp-intrinsics.ll
313	Probably, yes. Will fix.

kpn added inline comments.Nov 6 2018, 10:29 AM

test/CodeGen/X86/fp-intrinsics.ll
293	The strict intrinsic results in the same instruction as the regular fptosi instruction. Having them be the same means the mutation from strict to non-strict is working correctly. And, yes, rounding towards zero is correct. That's why there's no rounding metadata.
test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
4026	Yes, it should be a truncating conversion.
5586	On the odd chance that it may tickle the vector legalizer I'll go ahead and add tests with float. Same answer as the scalar tests: Rounding towards zero is correct.
test/Feature/fp-intrinsics.ll
311	This is an opt test. I'm not sure we'd benefit from placing those extra tests here.

Rebase.

Minor test changes.

cameron.mcinally added inline comments.Nov 28 2018, 10:59 AM

include/llvm/CodeGen/ISDOpcodes.h
590	This ordering does not mesh with what is already in place. The STRICT_XXX opcodes are currently clustered together, where these new opcodes are grouped with their non-strict counterparts. I'm not opposed to grouping the corresponding non-strict and strict opcodes, but that would probably be better left to a separate patch. I think it makes sense to keep everything uniform until a final decision is made, for clarity's sake.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2850	This doesn't seem correct. Shouldn't Node->getOperand(0) be the argument for FP_ROUND and the Chain for STRICT_FP_ROUND? Same for FP_EXTEND too. Also, does using a truncating store provide us with the same trapping behavior as an explicit trunc instruction? I don't know off the top of my head, but it may be different. To be fair, a user probably won't care too much about optimizing away a trap, but the purists might.
2909–2916	This seems incorrect, unless I missed something. The first operand to FP_TO_SINT should be the argument, but the first operand to STRICT_FP_TO_SINT should be the Chain. It does not look like expandFP_TO_SINT accounts for that.
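The operand-index concern raised in the 2850 and 2909–2916 comments can be pictured with a toy model. This is a deliberately simplified stand-in and not LLVM's SDNode API; every name below is hypothetical. It only shows why code shared between strict and non-strict nodes has to pick the value operand by node kind, since a strict node carries the chain as its first operand.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a DAG node; not LLVM's SDNode.
struct ToyNode {
  bool isStrict;                     // true for STRICT_* style nodes
  std::vector<std::string> operands; // strict nodes hold the chain first
};

// Index of the floating-point argument: operand 0 is the chain on strict
// nodes, so the argument moves to operand 1.
std::size_t valueOperandIndex(const ToyNode &node) {
  return node.isStrict ? 1 : 0;
}

// Fetches the floating-point argument regardless of node kind.
const std::string &valueOperand(const ToyNode &node) {
  const std::size_t idx = valueOperandIndex(node);
  assert(idx < node.operands.size() && "node is missing its value operand");
  return node.operands[idx];
}
```

An expansion that always reads operand 0 would pick up the chain on a strict node, which is the mismatch the comments point out.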

kpn mentioned this in D55897: Add constrained fptrunc and fpext intrinsics.Dec 19 2018, 12:20 PM

kpn mentioned this in D59830: [FPEnv] Make constrained FP IR verification more flexible..Mar 26 2019, 11:33 AM

kpn mentioned this in D59833: [FPEnv] New document for adding new constrained FP intrinsics.Mar 26 2019, 11:58 AM

kpn mentioned this in rL357065: The IR verifier currently supports the constrained floating point intrinsics,.Mar 27 2019, 6:30 AM

kpn mentioned this in rG4f3cdc6555ca: The IR verifier currently supports the constrained floating point intrinsics….

pengfei added a subscriber: pengfei.May 4 2019, 11:05 PM

kpn added a subscriber: kbarton.May 10 2019, 12:32 PM

A few minor comments about commoning asserts and using dyn_cast instead of cast<>.
Aside from that, I think this looks good. That said, I'm by no means an expert in this area so don't feel I'm qualified to give a final approval to commit.

lib/IR/Verifier.cpp
4759	Could you use the isFPOrFPVectorTy here to check both conditions, and then remove the assert in the else below?
4760	Can you use a dyn_cast here instead? if (auto *OperandT = dyn_cast<VectorType>(Operand->getType())) { do vector stuff }
4775	Similarly, could you use isIntOrIntVectorTy here and remove the assert in the if/else below?
4799	same comment about commoning the asserts here.
4843	It looks like getArgOperand expects an unsigned. Is there a specific reason you are using an int here? If it's possible that RoundingIdx is negative, then you should probably add an assert here before passing it to getArgOperand.

The only thing left in this ticket to commit is the fptosi and fptoui changes. I'm working on incorporating review comments from D55897 into this ticket and fixing other bugs that I'm coming across. I'll hopefully update this ticket next week.

lib/IR/Verifier.cpp
4760	Yes, that's much more concise.
4799	The fptrunc and fpext changes were split out into another ticket and updated there. The committed code is more concise, I suspect in the way you intended.

kpn added a subscriber: ajwock.May 21 2019, 6:26 AM

Address review comments from here and from D55897.

This patch is now down to just handling fptosi and fptoui.

kpn added a reviewer: kbarton.May 30 2019, 11:47 AM

kpn mentioned this in D53157: Teach the IRBuilder about constrained fadd and friends.May 31 2019, 9:52 AM

Ping.

Ping

How would you feel about rebooting this as a new patch? There's a lot of irrelevant history here, and I feel like I'm missing some context as I review it.

In general, I see that you're down to just implementing the fptosi and fptoui cases. I'm concerned about what happens in the fptoui case. It's mentioned in a few of the earlier comments that the default expansion of this opcode introduces speculative exceptions, and if that's being handled in the latest implementation I haven't read it closely enough to see what's going on. If it is being handled, I'd expect to see a comment block somewhere explaining what's being done.

In D43515#1547200, @andrew.w.kaylor wrote:

How would you feel about rebooting this as a new patch? There's a lot of irrelevant history here, and I feel like I'm missing some context as I review it.

I can do that.

In general, I see that you're down to just implementing the fptosi and fptoui cases. I'm concerned about what happens in the fptoui case. It's mentioned in a few of the earlier comments that the default expansion of this opcode introduces speculative exceptions, and if that's being handled in the latest implementation I haven't read it closely enough to see what's going on. If it is being handled, I'd expect to see a comment block somewhere explaining what's being done.

I believe the speculative exceptions should be largely fixed by r348251, committed by rksimon. I changed the code to continue to use strict nodes when expanding a strict node. Since the strict nodes are chained, does that not solve the part of the problem not solved by r348251? Is the existing comment not enough, or do I need to copy some of the commit message into code comments?

In D43515#1548845, @kpn wrote:

I believe the speculative exceptions should be largely fixed by r348251, committed by rksimon. I changed the code to continue to use strict nodes when expanding a strict node. Since the strict nodes are chained, does that not solve the part of the problem not solved by r348251? Is the existing comment not enough, or do I need to copy some of the commit message into code comments?

Sorry, I wasn't aware of Simon's change. That definitely simplifies what needs to be done here, and, yes, the existing comment is sufficient.

Replaced by D63782.

Revision Contents

Path	Size
docs/LangRef.rst	66 lines
include/llvm/CodeGen/ISDOpcodes.h	8 lines
include/llvm/CodeGen/SelectionDAGNodes.h	2 lines
include/llvm/CodeGen/TargetLowering.h	2 lines
include/llvm/IR/IntrinsicInst.h	2 lines
include/llvm/IR/Intrinsics.td	11 lines
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp	16 lines
lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp	20 lines
lib/CodeGen/SelectionDAG/LegalizeTypes.h	1 line
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp	4 lines
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp	31 lines
lib/CodeGen/SelectionDAG/SelectionDAG.cpp	4 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	8 lines
lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp	2 lines
lib/CodeGen/SelectionDAG/TargetLowering.cpp	51 lines
lib/IR/IntrinsicInst.cpp	2 lines
lib/IR/Verifier.cpp	29 lines
test/CodeGen/X86/fp-intrinsics.ll	27 lines
test/CodeGen/X86/vector-constrained-fp-intrinsics.ll	862 lines
test/Feature/fp-intrinsics.ll	25 lines

Diff 201965

docs/LangRef.rst

	(15,053 lines not shown)

	Syntax:			Syntax:
	"""""""			"""""""

	::			::

	declare <type>			declare <type>
	@llvm.experimental.constrained.frem(<type> <op1>, <type> <op2>,			@llvm.experimental.constrained.frem(<type> <op1>, <type> <op2>,
	metadata <rounding mode>,			metadata <rounding mode>,
	metadata <exception behavior>)			metadata <exception behavior>)

	Overview:			Overview:
	"""""""""			"""""""""

	The '``llvm.experimental.constrained.frem``' intrinsic returns the remainder			The '``llvm.experimental.constrained.frem``' intrinsic returns the remainder
	from the division of its two operands.			from the division of its two operands.

	[48 lines hidden]

	Semantics:			Semantics:
	""""""""""			""""""""""

	The result produced is the product of the first two operands added to the third			The result produced is the product of the first two operands added to the third
	operand computed with infinite precision, and then rounded to the target			operand computed with infinite precision, and then rounded to the target
	precision.			precision.

				'``llvm.experimental.constrained.fptoui``' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				lebedev.ri: This probably has insufficient amount of `^`. Might want to actually test-build the docs.

				Syntax:
				"""""""

				::

				declare <ty2>
				@llvm.experimental.constrained.fptoui(<type> <value>,
				metadata <exception behavior>)

				Overview:
				"""""""""

				The '``llvm.experimental.constrained.fptoui``' intrinsic converts a
				floating-point ``value`` to its unsigned integer equivalent of type ``ty2``.

				Arguments:
				""""""""""

				The first argument to the '``llvm.experimental.constrained.fptoui``'
				intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
				<t_vector>` of floating point values.

				The second argument specifies the exception behavior as described above.

				Semantics:
				""""""""""

				The result produced is an unsigned integer converted from the floating
				point operand. The value is truncated, so it is rounded towards zero.

				andrew.w.kaylor: I think you need to say more about the semantics. The LLVM IR fptoui and fptosi are specifically documented as rounding to zero. I don't think we want that with the constrained intrinsics so we need to very specifically document how they will be different from the standard instructions.
				kpn: What do we want these intrinsics to do that is different from the normal IR fptoui and fptosi? I don't think any of the constrained intrinsics _today_ are doing anything with the rounding and exception metadata. That makes it hard for me to say much about it in documentation, today. Well, unless I misunderstood the current code. That's totally possible.
				andrew.w.kaylor: You're right, none of the other intrinsics are doing anything specific with the rounding mode. The purpose of the intrinsics is to prevent the optimizer from doing things that would introduce compile-time rounding. However, the general assumption of the intrinsics is that whatever you set the rounding mode to at runtime is the rounding behavior you will get. I'd want to consult a front end expert as to exactly what should be happening. A quick glance at the C99 standard tells me that when a floating point number is converted to an integer it is truncated toward zero. I think that means the same thing as the LLVM language reference claim that fptoui and fptosi round the number toward zero. So I think that we don't want the runtime rounding mode to change the behavior of these intrinsics, and so we should document it as such.
				kpn: Our front end guy here confirms truncation. I've updated my working copy to state that fptoui and fptosi truncate.
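				Editorial illustration of the point settled in this thread: in C and C++ a floating-point to integer conversion discards the fractional part no matter which dynamic rounding mode is active. The sketch below is not part of the patch; `convert_under_mode` is a hypothetical helper name, and the code assumes the target supports the `FE_UPWARD` and `FE_DOWNWARD` modes.

```cpp
#include <cfenv>

// Hypothetical helper, not part of the patch: convert `x` to int while the
// dynamic rounding mode is temporarily set to `mode`, then restore the mode.
int convert_under_mode(double x, int mode) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  // volatile keeps the compiler from folding the conversion at compile time.
  volatile double v = x;
  const int result = static_cast<int>(v);
  std::fesetround(saved);
  return result;
}
```

				Calling `convert_under_mode(2.7, FE_UPWARD)` and `convert_under_mode(-2.7, FE_DOWNWARD)` gives 2 and -2 respectively: the value is truncated toward zero in both cases, which is the behavior the documentation for fptoui and fptosi is meant to state.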
				'``llvm.experimental.constrained.fptosi``' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				Syntax:
				"""""""

				::

				declare <ty2>
				@llvm.experimental.constrained.fptosi(<type> <value>,
				metadata <exception behavior>)

				Overview:
				"""""""""

				The '``llvm.experimental.constrained.fptosi``' intrinsic converts
				:ref:`floating-point <t_floating>` ``value`` to type ``ty2``.

				Arguments:
				""""""""""

				The first argument to the '``llvm.experimental.constrained.fptosi``'
				intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
				<t_vector>` of floating point values.

				The second argument specifies the exception behavior as described above.

				Semantics:
				""""""""""

				The result produced is a signed integer converted from the floating
				point operand. The value is truncated, so it is rounded towards zero.

	'``llvm.experimental.constrained.fptrunc``' Intrinsic			'``llvm.experimental.constrained.fptrunc``' Intrinsic
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Syntax:			Syntax:
	"""""""			"""""""

	::			::

	declare <ty2>			declare <ty2>
				andrew.w.kaylor: You need to do something more here to document the difference between the return type and the argument type. Also in fpext below.
				kpn: I hope this is what you mean. I've changed it to show that the result is a different type. I've followed the naming scheme used elsewhere in this document.
	@llvm.experimental.constrained.fptrunc(<type> <value>,			@llvm.experimental.constrained.fptrunc(<type> <value>,
	metadata <rounding mode>,			metadata <rounding mode>,
	metadata <exception behavior>)			metadata <exception behavior>)

	Overview:			Overview:
				andrew.w.kaylor: This intrinsic is a bit troublesome. The llvm.round intrinsic says that it "returns the same values as the libm round functions would, and handles error conditions in the same way." The libm round function, in turn, is documented as rounding to the nearest integer (and away from zero in halfway cases) regardless of the current rounding mode. So what do we want the constrained form of the intrinsic to do? I think it needs to ignore the rounding mode. I'm not sure about exception behavior. If it doesn't respect exception behavior then we probably don't want to have the constrained form of this intrinsic at all.
				kpn: We'll still need the constrained intrinsic to avoid getting reordered. How about if I rename the intrinsic to be fptrunc instead of round? Then the rounding would be explicit.
				andrew.w.kaylor: No, fptrunc does something else, right? My concern is that if this is preserving the behavior of the round library function (and I think we need to) then the rounding mode argument isn't relevant, so maybe it should be omitted in this case. The key thing is to be explicit about its behavior in the documentation and then make sure the implementation actually does what we say it will.
				kpn: Well, if I'm reading SelectionDAGBuilder::visitFPTrunc() correctly then fptrunc gets turned into ISD::FP_ROUND. So, no, renaming this intrinsic to be fptrunc would not be wrong. Contrast this with llvm.round getting turned into ISD::FROUND. I think we need constrained intrinsics for both, so this one in this patch should be renamed after fptrunc. I haven't touched the ISD::FROUND node yet, and that's what llvm.round gets turned into from the looks of SelectionDAGBuilder::visitIntrinsicCall(). That could be a later patch. I agree that the rounding mode should be ignored and shouldn't be in the intrinsic's metadata. WDYT?
				andrew.w.kaylor: You shouldn't be thinking in terms of the SelectionDAG at this level. ISD::FP_ROUND is very poorly named and does not do the same thing that llvm.round does. ISD::FP_ROUND converts a floating point number to a smaller type, but not necessarily an integer. That's fptrunc. I suppose you are right that we do need a constrained version of fptrunc. I'm a bit concerned by the things that the LLVM language definition says are undefined. Those are the cases that will be of most interest for the constrained case and we should document the expected behavior, but I think we need to consider why the current spec says the result is undefined. In any event llvm.round, which is what I thought you were replacing here, is something completely different.
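				Editorial illustration of the distinction drawn in this thread, using C++ analogues rather than LLVM code. `round_to_integer` stands in for llvm.round (the libm `round` behavior quoted above), and `narrow_to_float` stands in for fptrunc, which the thread says is what ISD::FP_ROUND implements. Both function names are hypothetical.

```cpp
#include <cmath>

// Analogue of llvm.round: nearest integer value, halfway cases away from
// zero, still returned as a floating-point number of the same type.
double round_to_integer(double x) { return std::round(x); }

// Analogue of fptrunc: convert to a smaller floating-point type. The result
// loses precision but does not have to be an integer value.
float narrow_to_float(double x) { return static_cast<float>(x); }
```

				For example, `round_to_integer(2.5)` yields 3.0 and `round_to_integer(-2.5)` yields -3.0, while `narrow_to_float(0.1)` yields the float nearest to 0.1, which is neither an integer nor equal to the double 0.1. The two operations share a misleadingly similar name only at the SelectionDAG level.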
	"""""""""			"""""""""

	The '``llvm.experimental.constrained.fptrunc``' intrinsic truncates ``value``			The '``llvm.experimental.constrained.fptrunc``' intrinsic truncates ``value``
				craig.topper: This reads funny. I think it should maybe be "result of truncating a floating point"
				kpn: How about if I just copy the text used by the normal fptrunc instruction?
				craig.topper: Sure
	to type ``ty2``.			to type ``ty2``.

	Arguments:			Arguments:
	""""""""""			""""""""""

	The first argument to the '``llvm.experimental.constrained.fptrunc``'			The first argument to the '``llvm.experimental.constrained.fptrunc``'
	intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector			intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
	<t_vector>` of floating point values. This argument must be larger in size			<t_vector>` of floating point values. This argument must be larger in size
	[16 lines hidden]

	::			::

	declare <ty2>			declare <ty2>
	@llvm.experimental.constrained.fpext(<type> <value>,			@llvm.experimental.constrained.fpext(<type> <value>,
	metadata <exception behavior>)			metadata <exception behavior>)

	Overview:			Overview:
	"""""""""			"""""""""
				andrew.w.kaylor: This seems to be leaking SelectionDAG implementation details into the IR space. How is this used?
				kpn: It seems to exist for the MVT::ppcf128 type. A quick grep doesn't show any other users. Test coverage is lacking. I added a test using the intrinsic, but I had to mark it expected fail since I couldn't get it to work. To avoid the risk of bugs getting introduced later I went ahead and implemented the intrinsic. Would it be better to not have the intrinsic and to instead have a pass that replaces the non-STRICT SDNode with a STRICT version? That would avoid said leaking into the IR space. It would, however, mean that llvm would have an opinion on when STRICT nodes should be used. I'm not sure that's a good thing.
				andrew.w.kaylor: I think it's better not to have the intrinsic for now. I don't understand what the SD node is doing well enough to say much more, but it looks to me like the SD node shouldn't be there either. It's a target-specific hack that leaked into the target-independent code if my understanding is correct.
				kpn: Done. It will be removed in the next diff.

				craig.topper: This also reads funny
				kpn: Same as fptrunc. I could just copy the text from the fpext instruction?
				craig.topper: Sure
	The '``llvm.experimental.constrained.fpext``' intrinsic extends a			The '``llvm.experimental.constrained.fpext``' intrinsic extends a
	floating-point ``value`` to a larger floating-point value.			floating-point ``value`` to a larger floating-point value.

	Arguments:			Arguments:
	""""""""""			""""""""""

	The first argument to the '``llvm.experimental.constrained.fpext``'			The first argument to the '``llvm.experimental.constrained.fpext``'
	intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector			intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
	[23 lines hidden]
	They do not change the runtime floating-point environment.			They do not change the runtime floating-point environment.


	'``llvm.experimental.constrained.sqrt``' Intrinsic			'``llvm.experimental.constrained.sqrt``' Intrinsic
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Syntax:			Syntax:
	"""""""			"""""""

				andrew.w.kaylor: This is a replacement for fpext, right? I think you should say that somewhere.
				kpn: I think I picked names for the intrinsics that matched the SelectionDAG node enum. Perhaps it would be better to match the bitcode language names? In which case this intrinsic would be "fpext" instead of "extend". Either way that's a good idea for the documentation to at least mention fpext.
				andrew.w.kaylor: These intrinsics should be driven by what the front end needs. If no front end is generating an equivalent now then we don't want an intrinsic. So, yes, please match the bitcode language names.
				kpn: Will do.
				kpn: Say, now that I think about this, what does rounding have to do with extending a FP value? Shouldn't this intrinsic just plain not have rounding metadata?
				andrew.w.kaylor: That's a reasonable point. The same issue came up with frem. The rounding mode doesn't apply to frem, but I kept the argument just so that it could be handled internally the same way as the other constrained intrinsics and then documented in the language reference that the rounding mode has no effect. I'm not completely convinced that was a good decision on my part. If you want to look at what would have to change to support constrained intrinsics with no rounding mode argument, I would likely support that. I don't think it would be much extra handling. There are just some things where the class that is used to represent the intrinsics (ConstrainedFPIntrinsic) would need to be aware of the possibility that this argument is omitted.
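				Editorial illustration of why a rounding mode argument has nothing to act on for fpext: every float value is exactly representable as a double, so widening never rounds. This is a C++ analogue, not LLVM code; `widen_under_mode` is a hypothetical helper and the sketch assumes the target supports the rounding modes passed in.

```cpp
#include <cfenv>

// Hypothetical helper, not part of the patch: widen a float to double while
// the dynamic rounding mode is temporarily set to `mode`.
double widen_under_mode(float x, int mode) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  // volatile keeps the compiler from folding the conversion at compile time.
  volatile float v = x;
  const double result = static_cast<double>(v);
  std::fesetround(saved);
  return result;
}
```

				`widen_under_mode(0.1f, FE_UPWARD)` and `widen_under_mode(0.1f, FE_DOWNWARD)` return the same double, and converting that double back to float recovers the original value. Only the exception behavior argument remains meaningful for such a conversion.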
	::			::

	declare <type>			declare <type>
	@llvm.experimental.constrained.sqrt(<type> <op1>,			@llvm.experimental.constrained.sqrt(<type> <op1>,
	metadata <rounding mode>,			metadata <rounding mode>,
	metadata <exception behavior>)			metadata <exception behavior>)

	Overview:			Overview:
	[2,016 lines hidden]

include/llvm/CodeGen/ISDOpcodes.h

[296 lines hidden]	enum NodeType {
/// These will be lowered to the equivalent non-constrained pseudo-op		/// These will be lowered to the equivalent non-constrained pseudo-op
/// (or expanded to the equivalent library call) before final selection.		/// (or expanded to the equivalent library call) before final selection.
/// They are used to limit optimizations while the DAG is being optimized.		/// They are used to limit optimizations while the DAG is being optimized.
STRICT_FSQRT, STRICT_FPOW, STRICT_FPOWI, STRICT_FSIN, STRICT_FCOS,		STRICT_FSQRT, STRICT_FPOW, STRICT_FPOWI, STRICT_FSIN, STRICT_FCOS,
STRICT_FEXP, STRICT_FEXP2, STRICT_FLOG, STRICT_FLOG10, STRICT_FLOG2,		STRICT_FEXP, STRICT_FEXP2, STRICT_FLOG, STRICT_FLOG10, STRICT_FLOG2,
STRICT_FRINT, STRICT_FNEARBYINT, STRICT_FMAXNUM, STRICT_FMINNUM,		STRICT_FRINT, STRICT_FNEARBYINT, STRICT_FMAXNUM, STRICT_FMINNUM,
STRICT_FCEIL, STRICT_FFLOOR, STRICT_FROUND, STRICT_FTRUNC,		STRICT_FCEIL, STRICT_FFLOOR, STRICT_FROUND, STRICT_FTRUNC,

		/// STRICT_FP_TO_[US]INT - Convert a floating point value to a signed or
		/// unsigned integer. These have the same semantics as fptosi and fptoui
		/// in IR. If the FP value cannot fit in the integer type, the results are
		/// undefined.
		/// They are used to limit optimizations while the DAG is being optimized.
		STRICT_FP_TO_SINT,
		STRICT_FP_TO_UINT,

/// X = STRICT_FP_ROUND(Y, TRUNC) - Rounding 'Y' from a larger floating		/// X = STRICT_FP_ROUND(Y, TRUNC) - Rounding 'Y' from a larger floating
/// point type down to the precision of the destination VT. TRUNC is a		/// point type down to the precision of the destination VT. TRUNC is a
/// flag, which is always an integer that is zero or one. If TRUNC is 0,		/// flag, which is always an integer that is zero or one. If TRUNC is 0,
/// this is a normal rounding, if it is 1, this FP_ROUND is known to not		/// this is a normal rounding, if it is 1, this FP_ROUND is known to not
/// change the value of Y.		/// change the value of Y.
///		///
/// The TRUNC = 1 case is used in cases where we know that the value will		/// The TRUNC = 1 case is used in cases where we know that the value will
/// not be modified by the node, because Y is not using any of the extra		/// not be modified by the node, because Y is not using any of the extra
[261 lines hidden]	enum NodeType {
/// in a register of the same size. This operation effectively just		/// in a register of the same size. This operation effectively just
/// discards excess precision. The type to round down to is specified by		/// discards excess precision. The type to round down to is specified by
/// the VT operand, a VTSDNode.		/// the VT operand, a VTSDNode.
FP_ROUND_INREG,		FP_ROUND_INREG,

/// X = FP_EXTEND(Y) - Extend a smaller FP type into a larger FP type.		/// X = FP_EXTEND(Y) - Extend a smaller FP type into a larger FP type.
FP_EXTEND,		FP_EXTEND,

/// BITCAST - This operator converts between integer, vector and FP		/// BITCAST - This operator converts between integer, vector and FP
		craig.topper: These need comments. Does STRICT_FP_ROUND have the TRUNC argument that FP_ROUND has?
		kpn: I could rearrange and lump them in with the non-strict versions of each. Then I'd just need one extra line restating that the STRICT_ versions prevent optimizations. Yes, STRICT_FP_ROUND does have the TRUNC argument that FP_ROUND has, but it is currently always zero. Fixing this requires rerouting this one strict node to go through the same codepath as the non-strict node. That would make it different from all the other constrained nodes. Should I go ahead and make that change? I _think_ it is safe if the TRUNC argument really does work like it is documented. I also just noticed that I need to put back STRICT_FP_TO_UINT in at least one place. It should be everywhere STRICT_FP_TO_SINT is handled _except_ in the default lowering.
		cameron.mcinally: This ordering does not mesh with what is already in place. The STRICT_XXX opcodes are currently clustered together, where these new opcodes are grouped with their non-strict counterparts. I'm not opposed to grouping the corresponding non-strict and strict opcodes, but that would probably be better left to a separate patch. I think it makes sense to keep everything uniform until a final decision is made, for clarity's sake.
/// values, as if the value was stored to memory with one type and loaded		/// values, as if the value was stored to memory with one type and loaded
/// from the same address with the other type (or equivalently for vector		/// from the same address with the other type (or equivalently for vector
/// format conversions, etc). The source and result are required to have		/// format conversions, etc). The source and result are required to have
/// the same bit size (e.g. f32 <-> i32). This can also be used for		/// the same bit size (e.g. f32 <-> i32). This can also be used for
/// int-to-int or fp-to-fp conversions, but that is a noop, deleted by		/// int-to-int or fp-to-fp conversions, but that is a noop, deleted by
/// getNode().		/// getNode().
///		///
/// This operator is subtly different from the bitcast instruction from		/// This operator is subtly different from the bitcast instruction from
[484 lines hidden]

include/llvm/CodeGen/SelectionDAGNodes.h

[685 lines hidden]	switch (NodeType) {
case ISD::STRICT_FRINT:		case ISD::STRICT_FRINT:
case ISD::STRICT_FNEARBYINT:		case ISD::STRICT_FNEARBYINT:
case ISD::STRICT_FMAXNUM:		case ISD::STRICT_FMAXNUM:
case ISD::STRICT_FMINNUM:		case ISD::STRICT_FMINNUM:
case ISD::STRICT_FCEIL:		case ISD::STRICT_FCEIL:
case ISD::STRICT_FFLOOR:		case ISD::STRICT_FFLOOR:
case ISD::STRICT_FROUND:		case ISD::STRICT_FROUND:
case ISD::STRICT_FTRUNC:		case ISD::STRICT_FTRUNC:
		case ISD::STRICT_FP_TO_SINT:
		case ISD::STRICT_FP_TO_UINT:
case ISD::STRICT_FP_ROUND:		case ISD::STRICT_FP_ROUND:
case ISD::STRICT_FP_EXTEND:		case ISD::STRICT_FP_EXTEND:
return true;		return true;
}		}
}		}

/// Test if this node has a post-isel opcode, directly		/// Test if this node has a post-isel opcode, directly
/// corresponding to a MachineInstr opcode.		/// corresponding to a MachineInstr opcode.
[1,912 lines hidden]

include/llvm/CodeGen/TargetLowering.h

[895 lines hidden]	switch (Op) {
case ISD::STRICT_FRINT: EqOpc = ISD::FRINT; break;		case ISD::STRICT_FRINT: EqOpc = ISD::FRINT; break;
case ISD::STRICT_FNEARBYINT: EqOpc = ISD::FNEARBYINT; break;		case ISD::STRICT_FNEARBYINT: EqOpc = ISD::FNEARBYINT; break;
case ISD::STRICT_FMAXNUM: EqOpc = ISD::FMAXNUM; break;		case ISD::STRICT_FMAXNUM: EqOpc = ISD::FMAXNUM; break;
case ISD::STRICT_FMINNUM: EqOpc = ISD::FMINNUM; break;		case ISD::STRICT_FMINNUM: EqOpc = ISD::FMINNUM; break;
case ISD::STRICT_FCEIL: EqOpc = ISD::FCEIL; break;		case ISD::STRICT_FCEIL: EqOpc = ISD::FCEIL; break;
case ISD::STRICT_FFLOOR: EqOpc = ISD::FFLOOR; break;		case ISD::STRICT_FFLOOR: EqOpc = ISD::FFLOOR; break;
case ISD::STRICT_FROUND: EqOpc = ISD::FROUND; break;		case ISD::STRICT_FROUND: EqOpc = ISD::FROUND; break;
case ISD::STRICT_FTRUNC: EqOpc = ISD::FTRUNC; break;		case ISD::STRICT_FTRUNC: EqOpc = ISD::FTRUNC; break;
		case ISD::STRICT_FP_TO_SINT: EqOpc = ISD::FP_TO_SINT; break;
		case ISD::STRICT_FP_TO_UINT: EqOpc = ISD::FP_TO_UINT; break;
case ISD::STRICT_FP_ROUND: EqOpc = ISD::FP_ROUND; break;		case ISD::STRICT_FP_ROUND: EqOpc = ISD::FP_ROUND; break;
case ISD::STRICT_FP_EXTEND: EqOpc = ISD::FP_EXTEND; break;		case ISD::STRICT_FP_EXTEND: EqOpc = ISD::FP_EXTEND; break;
}		}

auto Action = getOperationAction(EqOpc, VT);		auto Action = getOperationAction(EqOpc, VT);

// We don't currently handle Custom or Promote for strict FP pseudo-ops.		// We don't currently handle Custom or Promote for strict FP pseudo-ops.
// For now, we just expand for those cases.		// For now, we just expand for those cases.
[3,154 lines hidden]

include/llvm/IR/IntrinsicInst.h

[232 lines hidden]	public:
static bool classof(const IntrinsicInst *I) {		static bool classof(const IntrinsicInst *I) {
switch (I->getIntrinsicID()) {		switch (I->getIntrinsicID()) {
case Intrinsic::experimental_constrained_fadd:		case Intrinsic::experimental_constrained_fadd:
case Intrinsic::experimental_constrained_fsub:		case Intrinsic::experimental_constrained_fsub:
case Intrinsic::experimental_constrained_fmul:		case Intrinsic::experimental_constrained_fmul:
case Intrinsic::experimental_constrained_fdiv:		case Intrinsic::experimental_constrained_fdiv:
case Intrinsic::experimental_constrained_frem:		case Intrinsic::experimental_constrained_frem:
case Intrinsic::experimental_constrained_fma:		case Intrinsic::experimental_constrained_fma:
		case Intrinsic::experimental_constrained_fptosi:
		case Intrinsic::experimental_constrained_fptoui:
case Intrinsic::experimental_constrained_fptrunc:		case Intrinsic::experimental_constrained_fptrunc:
case Intrinsic::experimental_constrained_fpext:		case Intrinsic::experimental_constrained_fpext:
case Intrinsic::experimental_constrained_sqrt:		case Intrinsic::experimental_constrained_sqrt:
case Intrinsic::experimental_constrained_pow:		case Intrinsic::experimental_constrained_pow:
case Intrinsic::experimental_constrained_powi:		case Intrinsic::experimental_constrained_powi:
case Intrinsic::experimental_constrained_sin:		case Intrinsic::experimental_constrained_sin:
case Intrinsic::experimental_constrained_cos:		case Intrinsic::experimental_constrained_cos:
case Intrinsic::experimental_constrained_exp:		case Intrinsic::experimental_constrained_exp:
[617 lines hidden]

include/llvm/IR/Intrinsics.td

[606 lines hidden]	let IntrProperties = [IntrInaccessibleMemOnly] in {

def int_experimental_constrained_fma : Intrinsic<[ llvm_anyfloat_ty ],		def int_experimental_constrained_fma : Intrinsic<[ llvm_anyfloat_ty ],
[ LLVMMatchType<0>,		[ LLVMMatchType<0>,
LLVMMatchType<0>,		LLVMMatchType<0>,
LLVMMatchType<0>,		LLVMMatchType<0>,
llvm_metadata_ty,		llvm_metadata_ty,
llvm_metadata_ty ]>;		llvm_metadata_ty ]>;

		def int_experimental_constrained_fptosi : Intrinsic<[ llvm_anyint_ty ],
		[ llvm_anyfloat_ty,
		llvm_metadata_ty ]>;

		def int_experimental_constrained_fptoui : Intrinsic<[ llvm_anyint_ty ],
		[ llvm_anyfloat_ty,
		llvm_metadata_ty ]>;

def int_experimental_constrained_fptrunc : Intrinsic<[ llvm_anyfloat_ty ],		def int_experimental_constrained_fptrunc : Intrinsic<[ llvm_anyfloat_ty ],
[ llvm_anyfloat_ty,		[ llvm_anyfloat_ty,
llvm_metadata_ty,		llvm_metadata_ty,
llvm_metadata_ty ]>;		llvm_metadata_ty ]>;

def int_experimental_constrained_fpext : Intrinsic<[ llvm_anyfloat_ty ],		def int_experimental_constrained_fpext : Intrinsic<[ llvm_anyfloat_ty ],
[ llvm_anyfloat_ty,		[ llvm_anyfloat_ty,
llvm_metadata_ty ]>;		llvm_metadata_ty ]>;
[74 lines hidden]	def int_experimental_constrained_round : Intrinsic<[ llvm_anyfloat_ty ],
[ LLVMMatchType<0>,		[ LLVMMatchType<0>,
llvm_metadata_ty,		llvm_metadata_ty,
llvm_metadata_ty ]>;		llvm_metadata_ty ]>;
def int_experimental_constrained_trunc : Intrinsic<[ llvm_anyfloat_ty ],		def int_experimental_constrained_trunc : Intrinsic<[ llvm_anyfloat_ty ],
[ LLVMMatchType<0>,		[ LLVMMatchType<0>,
llvm_metadata_ty,		llvm_metadata_ty,
llvm_metadata_ty ]>;		llvm_metadata_ty ]>;
}		}
// FIXME: Add intrinsics for fcmp, fptoui and fptosi.		// FIXME: Add intrinsic for fcmp.
		// FIXME: Consider maybe adding intrinsics for sitofp, uitofp.

//===------------------------- Expect Intrinsics --------------------------===//		//===------------------------- Expect Intrinsics --------------------------===//
//		//
def int_expect : Intrinsic<[llvm_anyint_ty],		def int_expect : Intrinsic<[llvm_anyint_ty],
[LLVMMatchType<0>, LLVMMatchType<0>], [IntrNoMem]>;		[LLVMMatchType<0>, LLVMMatchType<0>], [IntrNoMem]>;

//===-------------------- Bit Manipulation Intrinsics ---------------------===//		//===-------------------- Bit Manipulation Intrinsics ---------------------===//
//		//
[492 lines hidden]

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

[946 lines hidden]	if (Chain.getNode() != Node) {
if (UpdatedNodes) {		if (UpdatedNodes) {
UpdatedNodes->insert(Value.getNode());		UpdatedNodes->insert(Value.getNode());
UpdatedNodes->insert(Chain.getNode());		UpdatedNodes->insert(Chain.getNode());
}		}
ReplacedNode(Node);		ReplacedNode(Node);
}		}
}		}

/// Return a legal replacement for the given operation, with all legal operands.		/// Return a legal replacement for the given operation, with all legal operands.
		andrew.w.kaylor: STRICT_FP_TO_SINT needs to do something more than this. The default lowering of FP_TO_SINT is known to raise spurious FE_INEXACT exceptions because it involves speculative execution.
		kpn: I have seen FP_TO_SINT cause traps when it shouldn't. Would having that default lowering use the chain in the STRICT_ case solve that issue?
		andrew.w.kaylor: No, I don't think the chain will fix this. We need to implement strict lowering that does something different.
		kpn: Is there something we can do at this level to fix this? If there is then I'm all for it, but if there isn't then we should probably still put the intrinsic in. We'll need it eventually, and currently none of the constrained intrinsics solve the complete optimization problem. So I wonder if this intrinsic is really all that different from the other experimental constrained intrinsics. If a backend models FP side effects then wouldn't the existing default lowering work correctly?
		andrew.w.kaylor: To be honest I'm not entirely sure how to fix this in the current selection DAG model. The issue is that we need to introduce a branch to fix the problem, but by the time we're selecting instructions it's too late to do that. I think it needs to be addressed when we're building the DAG,
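		Editorial illustration of why a branch matters here. The sketch assumes an unsigned 64-bit conversion that has to be built from a signed one; it is a C++ analogue under that assumption, not LLVM's lowering, and `fp_to_u64_branchy` and `kTwoPow63` are hypothetical names. A branch-free variant would compute both paths and select one, so it would also evaluate `x - kTwoPow63` for small inputs such as 0.1, where the subtraction is inexact and is the kind of speculative operation that can set FE_INEXACT. With a branch, only the path that is in range runs, and for inputs at or above 2^63 the subtraction is exact.

```cpp
#include <cstdint>

// 2^63 as a double: the smallest value that no longer fits in int64_t.
constexpr double kTwoPow63 = 9223372036854775808.0;

// Hypothetical illustration, not LLVM code. Assumes 0 <= x < 2^64 and that
// x is not NaN; anything else is outside the scope of this sketch.
std::uint64_t fp_to_u64_branchy(double x) {
  if (x < kTwoPow63) {
    // In range for the signed conversion: convert directly.
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(x));
  }
  // Shift the value into signed range, convert, then restore the top bit.
  const std::int64_t low = static_cast<std::int64_t>(x - kTwoPow63);
  return static_cast<std::uint64_t>(low) ^ (std::uint64_t{1} << 63);
}
```

		For instance, `fp_to_u64_branchy(3.9)` returns 3 without ever touching the subtraction, and `fp_to_u64_branchy(9223372036854775808.0)` returns 2^63 through the second path.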
void SelectionDAGLegalize::LegalizeOp(SDNode *Node) {		void SelectionDAGLegalize::LegalizeOp(SDNode *Node) {
LLVM_DEBUG(dbgs() << "\nLegalizing: "; Node->dump(&DAG));		LLVM_DEBUG(dbgs() << "\nLegalizing: "; Node->dump(&DAG));

// Allow illegal target nodes and illegal registers.		// Allow illegal target nodes and illegal registers.
if (Node->getOpcode() == ISD::TargetConstant ||		if (Node->getOpcode() == ISD::TargetConstant ||
Node->getOpcode() == ISD::Register)		Node->getOpcode() == ISD::Register)
return;		return;

▲ Show 20 Lines • Show All 156 Lines • ▼ Show 20 Lines	#endif
case ISD::STRICT_FRINT:		case ISD::STRICT_FRINT:
case ISD::STRICT_FNEARBYINT:		case ISD::STRICT_FNEARBYINT:
case ISD::STRICT_FMAXNUM:		case ISD::STRICT_FMAXNUM:
case ISD::STRICT_FMINNUM:		case ISD::STRICT_FMINNUM:
case ISD::STRICT_FCEIL:		case ISD::STRICT_FCEIL:
case ISD::STRICT_FFLOOR:		case ISD::STRICT_FFLOOR:
case ISD::STRICT_FROUND:		case ISD::STRICT_FROUND:
case ISD::STRICT_FTRUNC:		case ISD::STRICT_FTRUNC:
		case ISD::STRICT_FP_TO_SINT:
		case ISD::STRICT_FP_TO_UINT:
case ISD::STRICT_FP_ROUND:		case ISD::STRICT_FP_ROUND:
case ISD::STRICT_FP_EXTEND:		case ISD::STRICT_FP_EXTEND:
// These pseudo-ops get legalized as if they were their non-strict		// These pseudo-ops get legalized as if they were their non-strict
// equivalent. For instance, if ISD::FSQRT is legal then ISD::STRICT_FSQRT		// equivalent. For instance, if ISD::FSQRT is legal then ISD::STRICT_FSQRT
// is also legal, but if ISD::FSQRT requires expansion then so does		// is also legal, but if ISD::FSQRT requires expansion then so does
// ISD::STRICT_FSQRT.		// ISD::STRICT_FSQRT.
Action = TLI.getStrictFPOperationAction(Node->getOpcode(),		Action = TLI.getStrictFPOperationAction(Node->getOpcode(),
Node->getValueType(0));		Node->getValueType(0));
break;		break;
case ISD::SADDSAT:		case ISD::SADDSAT:
case ISD::UADDSAT:		case ISD::UADDSAT:
case ISD::SSUBSAT:		case ISD::SSUBSAT:
		andrew.w.kaylor: The style/formatting is wrong here. I think you need curly braces around your else-clause and the "else" itself needs to be on the same line as the curly brace above it.
case ISD::USUBSAT: {		case ISD::USUBSAT: {
Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));		Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));
break;		break;
}		}
case ISD::SMULFIX:		case ISD::SMULFIX:
case ISD::SMULFIXSAT:		case ISD::SMULFIXSAT:
case ISD::UMULFIX: {		case ISD::UMULFIX: {
unsigned Scale = Node->getConstantOperandVal(2);		unsigned Scale = Node->getConstantOperandVal(2);
Action = TLI.getFixedPointOperationAction(Node->getOpcode(),		Action = TLI.getFixedPointOperationAction(Node->getOpcode(),
Node->getValueType(0), Scale);		Node->getValueType(0), Scale);
break;		break;
}		}
case ISD::MSCATTER:		case ISD::MSCATTER:
Action = TLI.getOperationAction(Node->getOpcode(),		Action = TLI.getOperationAction(Node->getOpcode(),
cast<MaskedScatterSDNode>(Node)->getValue().getValueType());		cast<MaskedScatterSDNode>(Node)->getValue().getValueType());
break;		break;
case ISD::MSTORE:		case ISD::MSTORE:
Action = TLI.getOperationAction(Node->getOpcode(),		Action = TLI.getOperationAction(Node->getOpcode(),
cast<MaskedStoreSDNode>(Node)->getValue().getValueType());		cast<MaskedStoreSDNode>(Node)->getValue().getValueType());
break;		break;
case ISD::VECREDUCE_FADD:		case ISD::VECREDUCE_FADD:
case ISD::VECREDUCE_FMUL:		case ISD::VECREDUCE_FMUL:
case ISD::VECREDUCE_ADD:		case ISD::VECREDUCE_ADD:
case ISD::VECREDUCE_MUL:		case ISD::VECREDUCE_MUL:
case ISD::VECREDUCE_AND:		case ISD::VECREDUCE_AND:
case ISD::VECREDUCE_OR:		case ISD::VECREDUCE_OR:
case ISD::VECREDUCE_XOR:		case ISD::VECREDUCE_XOR:
case ISD::VECREDUCE_SMAX:		case ISD::VECREDUCE_SMAX:
case ISD::VECREDUCE_SMIN:		case ISD::VECREDUCE_SMIN:
case ISD::VECREDUCE_UMAX:		case ISD::VECREDUCE_UMAX:
case ISD::VECREDUCE_UMIN:		case ISD::VECREDUCE_UMIN:
case ISD::VECREDUCE_FMAX:		case ISD::VECREDUCE_FMAX:
case ISD::VECREDUCE_FMIN:		case ISD::VECREDUCE_FMIN:
Action = TLI.getOperationAction(		Action = TLI.getOperationAction(
Node->getOpcode(), Node->getOperand(0).getValueType());		Node->getOpcode(), Node->getOperand(0).getValueType());
		andrew.w.kaylor: Since this gets unique handling, why isn't it just a separate case from the others?
		kpn (author): Good point. And making it a separate case also takes care of the formatting issue in the else block.
break;		break;
default:		default:
if (Node->getOpcode() >= ISD::BUILTIN_OP_END) {		if (Node->getOpcode() >= ISD::BUILTIN_OP_END) {
Action = TargetLowering::Legal;		Action = TargetLowering::Legal;
} else {		} else {
Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));		Action = TLI.getOperationAction(Node->getOpcode(), Node->getValueType(0));
}		}
break;		break;
▲ Show 20 Lines • Show All 1,657 Lines • ▼ Show 20 Lines	case ISD::STRICT_FP_EXTEND:
LLVM_DEBUG(dbgs() << "Successfully expanded STRICT_FP_EXTEND node\n");		LLVM_DEBUG(dbgs() << "Successfully expanded STRICT_FP_EXTEND node\n");
return true;		return true;
case ISD::FP_EXTEND:		case ISD::FP_EXTEND:
Tmp1 = EmitStackConvert(Node->getOperand(0),		Tmp1 = EmitStackConvert(Node->getOperand(0),
Node->getOperand(0).getValueType(),		Node->getOperand(0).getValueType(),
Node->getValueType(0), dl);		Node->getValueType(0), dl);
Results.push_back(Tmp1);		Results.push_back(Tmp1);
break;		break;
case ISD::SIGN_EXTEND_INREG: {		case ISD::SIGN_EXTEND_INREG: {
		cameron.mcinally: This doesn't seem correct. Shouldn't Node->getOperand(0) be the argument for FP_ROUND and the Chain for STRICT_FP_ROUND? Same for FP_EXTEND too. Also, does using a truncating store provide us with the same trapping behavior as an explicit trunc instruction? I don't know off the top of my head, but it may be different. To be fair, a user probably won't care too much about optimizing away a trap, but the purists might.
EVT ExtraVT = cast<VTSDNode>(Node->getOperand(1))->getVT();		EVT ExtraVT = cast<VTSDNode>(Node->getOperand(1))->getVT();
EVT VT = Node->getValueType(0);		EVT VT = Node->getValueType(0);

// An in-register sign-extend of a boolean is a negation:		// An in-register sign-extend of a boolean is a negation:
// 'true' (1) sign-extended is -1.		// 'true' (1) sign-extended is -1.
// 'false' (0) sign-extended is 0.		// 'false' (0) sign-extended is 0.
// However, we must mask the high bits of the source operand because the		// However, we must mask the high bits of the source operand because the
// SIGN_EXTEND_INREG does not guarantee that the high bits are already zero.		// SIGN_EXTEND_INREG does not guarantee that the high bits are already zero.
▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
case ISD::SINT_TO_FP:		case ISD::SINT_TO_FP:
Tmp1 = ExpandLegalINT_TO_FP(Node->getOpcode() == ISD::SINT_TO_FP,		Tmp1 = ExpandLegalINT_TO_FP(Node->getOpcode() == ISD::SINT_TO_FP,
Node->getOperand(0), Node->getValueType(0), dl);		Node->getOperand(0), Node->getValueType(0), dl);
Results.push_back(Tmp1);		Results.push_back(Tmp1);
break;		break;
case ISD::FP_TO_SINT:		case ISD::FP_TO_SINT:
if (TLI.expandFP_TO_SINT(Node, Tmp1, DAG))		if (TLI.expandFP_TO_SINT(Node, Tmp1, DAG))
Results.push_back(Tmp1);		Results.push_back(Tmp1);
break;		break;
		case ISD::STRICT_FP_TO_SINT:
		if (TLI.expandFP_TO_SINT(Node, Tmp1, DAG)) {
		ReplaceNode(Node, Tmp1.getNode());
		LLVM_DEBUG(dbgs() << "Successfully expanded STRICT_FP_TO_SINT node\n");
		return true;
		}
		break;
		cameron.mcinally: This seems incorrect, unless I missed something. The first operand to FP_TO_SINT should be the argument, but the first operand to STRICT_FP_TO_SINT should be the Chain. It does not look like expandFP_TO_SINT accounts for that.
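The operand-layout point in this comment can be shown with a toy model. This is a sketch under stated assumptions, not LLVM code: `ToyNode`, `fpSourceIndex`, and `fpSource` are hypothetical stand-ins for an SDNode and its operand accessors, and the only fact taken from the review is that a strict node carries its chain as operand 0 and the floating-point source as operand 1.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an SDNode: a strictness flag plus named operands.
struct ToyNode {
  bool IsStrict;
  std::vector<std::string> Operands;
};

// Index of the floating-point source operand in this toy model.
// Strict nodes put the chain at index 0, so the value moves to index 1.
inline unsigned fpSourceIndex(const ToyNode &N) {
  return N.IsStrict ? 1u : 0u;
}

// Fetch the source operand without assuming it is always operand 0.
inline const std::string &fpSource(const ToyNode &N) {
  assert(fpSourceIndex(N) < N.Operands.size() && "malformed toy node");
  return N.Operands[fpSourceIndex(N)];
}
```

A helper that unconditionally reads operand 0, as the comment suggests expandFP_TO_SINT does, would pick up the chain when handed the strict form.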
case ISD::FP_TO_UINT:		case ISD::FP_TO_UINT:
if (TLI.expandFP_TO_UINT(Node, Tmp1, DAG))		if (TLI.expandFP_TO_UINT(Node, Tmp1, DAG))
Results.push_back(Tmp1);		Results.push_back(Tmp1);
break;		break;
		case ISD::STRICT_FP_TO_UINT:
		if (TLI.expandFP_TO_UINT(Node, Tmp1, DAG)) {
		ReplaceNode(Node, Tmp1.getNode());
		LLVM_DEBUG(dbgs() << "Successfully expanded STRICT_FP_TO_UINT node\n");
		return true;
		}
		break;
case ISD::LROUND:		case ISD::LROUND:
Results.push_back(ExpandArgFPLibCall(Node, RTLIB::LROUND_F32,		Results.push_back(ExpandArgFPLibCall(Node, RTLIB::LROUND_F32,
RTLIB::LROUND_F64, RTLIB::LROUND_F80,		RTLIB::LROUND_F64, RTLIB::LROUND_F80,
RTLIB::LROUND_F128,		RTLIB::LROUND_F128,
RTLIB::LROUND_PPCF128));		RTLIB::LROUND_PPCF128));
break;		break;
case ISD::LLROUND:		case ISD::LLROUND:
Results.push_back(ExpandArgFPLibCall(Node, RTLIB::LLROUND_F32,		Results.push_back(ExpandArgFPLibCall(Node, RTLIB::LLROUND_F32,
▲ Show 20 Lines • Show All 1,678 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

Show First 20 Lines • Show All 106 Lines • ▼ Show 20 Lines	#endif
case ISD::SIGN_EXTEND_VECTOR_INREG:		case ISD::SIGN_EXTEND_VECTOR_INREG:
case ISD::ZERO_EXTEND_VECTOR_INREG:		case ISD::ZERO_EXTEND_VECTOR_INREG:
Res = PromoteIntRes_EXTEND_VECTOR_INREG(N); break;		Res = PromoteIntRes_EXTEND_VECTOR_INREG(N); break;

case ISD::SIGN_EXTEND:		case ISD::SIGN_EXTEND:
case ISD::ZERO_EXTEND:		case ISD::ZERO_EXTEND:
case ISD::ANY_EXTEND: Res = PromoteIntRes_INT_EXTEND(N); break;		case ISD::ANY_EXTEND: Res = PromoteIntRes_INT_EXTEND(N); break;

		case ISD::STRICT_FP_TO_SINT:
		case ISD::STRICT_FP_TO_UINT:
case ISD::FP_TO_SINT:		case ISD::FP_TO_SINT:
case ISD::FP_TO_UINT: Res = PromoteIntRes_FP_TO_XINT(N); break;		case ISD::FP_TO_UINT: Res = PromoteIntRes_FP_TO_XINT(N); break;

case ISD::FP_TO_FP16: Res = PromoteIntRes_FP_TO_FP16(N); break;		case ISD::FP_TO_FP16: Res = PromoteIntRes_FP_TO_FP16(N); break;

case ISD::FLT_ROUNDS_: Res = PromoteIntRes_FLT_ROUNDS(N); break;		case ISD::FLT_ROUNDS_: Res = PromoteIntRes_FLT_ROUNDS(N); break;

case ISD::AND:		case ISD::AND:
▲ Show 20 Lines • Show All 366 Lines • ▼ Show 20 Lines	SDValue DAGTypeLegalizer::PromoteIntRes_FP_TO_XINT(SDNode *N) {
// not Legal, check to see if we can use FP_TO_SINT instead. (If both UINT		// not Legal, check to see if we can use FP_TO_SINT instead. (If both UINT
// and SINT conversions are Custom, there is no way to tell which is		// and SINT conversions are Custom, there is no way to tell which is
// preferable. We choose SINT because that's the right thing on PPC.)		// preferable. We choose SINT because that's the right thing on PPC.)
if (N->getOpcode() == ISD::FP_TO_UINT &&		if (N->getOpcode() == ISD::FP_TO_UINT &&
!TLI.isOperationLegal(ISD::FP_TO_UINT, NVT) &&		!TLI.isOperationLegal(ISD::FP_TO_UINT, NVT) &&
TLI.isOperationLegalOrCustom(ISD::FP_TO_SINT, NVT))		TLI.isOperationLegalOrCustom(ISD::FP_TO_SINT, NVT))
NewOpc = ISD::FP_TO_SINT;		NewOpc = ISD::FP_TO_SINT;

SDValue Res = DAG.getNode(NewOpc, dl, NVT, N->getOperand(0));		if (N->getOpcode() == ISD::STRICT_FP_TO_UINT &&
		!TLI.isOperationLegal(ISD::STRICT_FP_TO_UINT, NVT) &&
		TLI.isOperationLegalOrCustom(ISD::STRICT_FP_TO_SINT, NVT))
		NewOpc = ISD::STRICT_FP_TO_SINT;

		SDValue Res;
		if (N->isStrictFPOpcode()) {
		Res = DAG.getNode(NewOpc, dl, { NVT, MVT::Other },
		{ N->getOperand(0), N->getOperand(1) });
		// Legalize the chain result - switch anything that used the old chain to
		// use the new one.
		ReplaceValueWith(SDValue(N, 1), Res.getValue(1));
		} else
		Res = DAG.getNode(NewOpc, dl, NVT, N->getOperand(0));

// Assert that the converted value fits in the original type. If it doesn't		// Assert that the converted value fits in the original type. If it doesn't
// (eg: because the value being converted is too big), then the result of the		// (eg: because the value being converted is too big), then the result of the
// original operation was undefined anyway, so the assert is still correct.		// original operation was undefined anyway, so the assert is still correct.
//		//
// NOTE: fp-to-uint to fp-to-sint promotion guarantees zero extend. For example:		// NOTE: fp-to-uint to fp-to-sint promotion guarantees zero extend. For example:
// before legalization: fp-to-uint16, 65534. -> 0xfffe		// before legalization: fp-to-uint16, 65534. -> 0xfffe
// after legalization: fp-to-sint32, 65534. -> 0x0000fffe		// after legalization: fp-to-sint32, 65534. -> 0x0000fffe
return DAG.getNode(N->getOpcode() == ISD::FP_TO_UINT ?		return DAG.getNode((N->getOpcode() == ISD::FP_TO_UINT ||
		N->getOpcode() == ISD::STRICT_FP_TO_UINT) ?
ISD::AssertZext : ISD::AssertSext, dl, NVT, Res,		ISD::AssertZext : ISD::AssertSext, dl, NVT, Res,
DAG.getValueType(N->getValueType(0).getScalarType()));		DAG.getValueType(N->getValueType(0).getScalarType()));
}		}
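The NOTE in the comment above (fp-to-uint16 of 65534 becoming fp-to-sint32 with a zero-extended result) can be checked with ordinary C++ casts. This is a sketch only: the casts stand in for the DAG nodes, and `promotedBits` and `narrowAssumingZext` are hypothetical names. The assertion in the second helper plays the role the AssertZext node plays in the patch, recording that the high bits are already known to be zero.

```cpp
#include <cassert>
#include <cstdint>

// Convert through the wider signed type, as the promoted node would,
// and return the raw 32-bit pattern.
inline std::uint32_t promotedBits(double Value) {
  std::int32_t Wide = static_cast<std::int32_t>(Value);
  return static_cast<std::uint32_t>(Wide);
}

// Narrow back to 16 bits, asserting the zero-extension property first.
inline std::uint16_t narrowAssumingZext(std::uint32_t Bits) {
  assert((Bits >> 16) == 0 && "high bits must already be zero");
  return static_cast<std::uint16_t>(Bits);
}
```

For in-range inputs such as 65534.0 the signed 32-bit conversion leaves the upper half clear, which is why the promoted result can be tagged as zero-extended even though a signed opcode produced it.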

SDValue DAGTypeLegalizer::PromoteIntRes_FP_TO_FP16(SDNode *N) {		SDValue DAGTypeLegalizer::PromoteIntRes_FP_TO_FP16(SDNode *N) {
EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));		EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
SDLoc dl(N);		SDLoc dl(N);

▲ Show 20 Lines • Show All 3,669 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/LegalizeTypes.h

Show First 20 Lines • Show All 709 Lines • ▼ Show 20 Lines	private:
SDValue ScalarizeVecRes_VECTOR_SHUFFLE(SDNode *N);		SDValue ScalarizeVecRes_VECTOR_SHUFFLE(SDNode *N);

SDValue ScalarizeVecRes_MULFIX(SDNode *N);		SDValue ScalarizeVecRes_MULFIX(SDNode *N);

// Vector Operand Scalarization: <1 x ty> -> ty.		// Vector Operand Scalarization: <1 x ty> -> ty.
bool ScalarizeVectorOperand(SDNode *N, unsigned OpNo);		bool ScalarizeVectorOperand(SDNode *N, unsigned OpNo);
SDValue ScalarizeVecOp_BITCAST(SDNode *N);		SDValue ScalarizeVecOp_BITCAST(SDNode *N);
SDValue ScalarizeVecOp_UnaryOp(SDNode *N);		SDValue ScalarizeVecOp_UnaryOp(SDNode *N);
		SDValue ScalarizeVecOp_UnaryOp_StrictFP(SDNode *N);
SDValue ScalarizeVecOp_CONCAT_VECTORS(SDNode *N);		SDValue ScalarizeVecOp_CONCAT_VECTORS(SDNode *N);
SDValue ScalarizeVecOp_EXTRACT_VECTOR_ELT(SDNode *N);		SDValue ScalarizeVecOp_EXTRACT_VECTOR_ELT(SDNode *N);
SDValue ScalarizeVecOp_VSELECT(SDNode *N);		SDValue ScalarizeVecOp_VSELECT(SDNode *N);
SDValue ScalarizeVecOp_VSETCC(SDNode *N);		SDValue ScalarizeVecOp_VSETCC(SDNode *N);
SDValue ScalarizeVecOp_STORE(StoreSDNode *N, unsigned OpNo);		SDValue ScalarizeVecOp_STORE(StoreSDNode *N, unsigned OpNo);
SDValue ScalarizeVecOp_FP_ROUND(SDNode *N, unsigned OpNo);		SDValue ScalarizeVecOp_FP_ROUND(SDNode *N, unsigned OpNo);
SDValue ScalarizeVecOp_STRICT_FP_ROUND(SDNode *N, unsigned OpNo);		SDValue ScalarizeVecOp_STRICT_FP_ROUND(SDNode *N, unsigned OpNo);
SDValue ScalarizeVecOp_VECREDUCE(SDNode *N);		SDValue ScalarizeVecOp_VECREDUCE(SDNode *N);
▲ Show 20 Lines • Show All 248 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp

Show First 20 Lines • Show All 327 Lines • ▼ Show 20 Lines	SDValue VectorLegalizer::LegalizeOp(SDValue Op) {
case ISD::STRICT_FRINT:		case ISD::STRICT_FRINT:
case ISD::STRICT_FNEARBYINT:		case ISD::STRICT_FNEARBYINT:
case ISD::STRICT_FMAXNUM:		case ISD::STRICT_FMAXNUM:
case ISD::STRICT_FMINNUM:		case ISD::STRICT_FMINNUM:
case ISD::STRICT_FCEIL:		case ISD::STRICT_FCEIL:
case ISD::STRICT_FFLOOR:		case ISD::STRICT_FFLOOR:
case ISD::STRICT_FROUND:		case ISD::STRICT_FROUND:
case ISD::STRICT_FTRUNC:		case ISD::STRICT_FTRUNC:
		case ISD::STRICT_FP_TO_SINT:
		case ISD::STRICT_FP_TO_UINT:
case ISD::STRICT_FP_ROUND:		case ISD::STRICT_FP_ROUND:
case ISD::STRICT_FP_EXTEND:		case ISD::STRICT_FP_EXTEND:
// These pseudo-ops get legalized as if they were their non-strict		// These pseudo-ops get legalized as if they were their non-strict
// equivalent. For instance, if ISD::FSQRT is legal then ISD::STRICT_FSQRT		// equivalent. For instance, if ISD::FSQRT is legal then ISD::STRICT_FSQRT
// is also legal, but if ISD::FSQRT requires expansion then so does		// is also legal, but if ISD::FSQRT requires expansion then so does
// ISD::STRICT_FSQRT.		// ISD::STRICT_FSQRT.
Action = TLI.getStrictFPOperationAction(Node->getOpcode(),		Action = TLI.getStrictFPOperationAction(Node->getOpcode(),
Node->getValueType(0));		Node->getValueType(0));
▲ Show 20 Lines • Show All 495 Lines • ▼ Show 20 Lines	SDValue VectorLegalizer::Expand(SDValue Op) {
case ISD::STRICT_FRINT:		case ISD::STRICT_FRINT:
case ISD::STRICT_FNEARBYINT:		case ISD::STRICT_FNEARBYINT:
case ISD::STRICT_FMAXNUM:		case ISD::STRICT_FMAXNUM:
case ISD::STRICT_FMINNUM:		case ISD::STRICT_FMINNUM:
case ISD::STRICT_FCEIL:		case ISD::STRICT_FCEIL:
case ISD::STRICT_FFLOOR:		case ISD::STRICT_FFLOOR:
case ISD::STRICT_FROUND:		case ISD::STRICT_FROUND:
case ISD::STRICT_FTRUNC:		case ISD::STRICT_FTRUNC:
		case ISD::STRICT_FP_TO_SINT:
		case ISD::STRICT_FP_TO_UINT:
return ExpandStrictFPOp(Op);		return ExpandStrictFPOp(Op);
case ISD::VECREDUCE_ADD:		case ISD::VECREDUCE_ADD:
case ISD::VECREDUCE_MUL:		case ISD::VECREDUCE_MUL:
case ISD::VECREDUCE_AND:		case ISD::VECREDUCE_AND:
case ISD::VECREDUCE_OR:		case ISD::VECREDUCE_OR:
case ISD::VECREDUCE_XOR:		case ISD::VECREDUCE_XOR:
case ISD::VECREDUCE_SMAX:		case ISD::VECREDUCE_SMAX:
case ISD::VECREDUCE_SMIN:		case ISD::VECREDUCE_SMIN:
▲ Show 20 Lines • Show All 540 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

Show First 20 Lines • Show All 165 Lines • ▼ Show 20 Lines	#endif
case ISD::STRICT_FRINT:		case ISD::STRICT_FRINT:
case ISD::STRICT_FNEARBYINT:		case ISD::STRICT_FNEARBYINT:
case ISD::STRICT_FMAXNUM:		case ISD::STRICT_FMAXNUM:
case ISD::STRICT_FMINNUM:		case ISD::STRICT_FMINNUM:
case ISD::STRICT_FCEIL:		case ISD::STRICT_FCEIL:
case ISD::STRICT_FFLOOR:		case ISD::STRICT_FFLOOR:
case ISD::STRICT_FROUND:		case ISD::STRICT_FROUND:
case ISD::STRICT_FTRUNC:		case ISD::STRICT_FTRUNC:
		case ISD::STRICT_FP_TO_SINT:
		case ISD::STRICT_FP_TO_UINT:
case ISD::STRICT_FP_EXTEND:		case ISD::STRICT_FP_EXTEND:
R = ScalarizeVecRes_StrictFPOp(N);		R = ScalarizeVecRes_StrictFPOp(N);
break;		break;
case ISD::UADDO:		case ISD::UADDO:
case ISD::SADDO:		case ISD::SADDO:
case ISD::USUBO:		case ISD::USUBO:
case ISD::SSUBO:		case ISD::SSUBO:
case ISD::UMULO:		case ISD::UMULO:
▲ Show 20 Lines • Show All 417 Lines • ▼ Show 20 Lines	#endif
case ISD::SIGN_EXTEND:		case ISD::SIGN_EXTEND:
case ISD::TRUNCATE:		case ISD::TRUNCATE:
case ISD::FP_TO_SINT:		case ISD::FP_TO_SINT:
case ISD::FP_TO_UINT:		case ISD::FP_TO_UINT:
case ISD::SINT_TO_FP:		case ISD::SINT_TO_FP:
case ISD::UINT_TO_FP:		case ISD::UINT_TO_FP:
Res = ScalarizeVecOp_UnaryOp(N);		Res = ScalarizeVecOp_UnaryOp(N);
break;		break;
		case ISD::STRICT_FP_TO_SINT:
		case ISD::STRICT_FP_TO_UINT:
		Res = ScalarizeVecOp_UnaryOp_StrictFP(N);
		break;
case ISD::CONCAT_VECTORS:		case ISD::CONCAT_VECTORS:
Res = ScalarizeVecOp_CONCAT_VECTORS(N);		Res = ScalarizeVecOp_CONCAT_VECTORS(N);
break;		break;
case ISD::EXTRACT_VECTOR_ELT:		case ISD::EXTRACT_VECTOR_ELT:
Res = ScalarizeVecOp_EXTRACT_VECTOR_ELT(N);		Res = ScalarizeVecOp_EXTRACT_VECTOR_ELT(N);
break;		break;
case ISD::VSELECT:		case ISD::VSELECT:
Res = ScalarizeVecOp_VSELECT(N);		Res = ScalarizeVecOp_VSELECT(N);
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	SDValue DAGTypeLegalizer::ScalarizeVecOp_UnaryOp(SDNode *N) {
SDValue Elt = GetScalarizedVector(N->getOperand(0));		SDValue Elt = GetScalarizedVector(N->getOperand(0));
SDValue Op = DAG.getNode(N->getOpcode(), SDLoc(N),		SDValue Op = DAG.getNode(N->getOpcode(), SDLoc(N),
N->getValueType(0).getScalarType(), Elt);		N->getValueType(0).getScalarType(), Elt);
// Revectorize the result so the types line up with what the uses of this		// Revectorize the result so the types line up with what the uses of this
// expression expect.		// expression expect.
return DAG.getNode(ISD::SCALAR_TO_VECTOR, SDLoc(N), N->getValueType(0), Op);		return DAG.getNode(ISD::SCALAR_TO_VECTOR, SDLoc(N), N->getValueType(0), Op);
}		}

		/// If the input is a vector that needs to be scalarized, it must be <1 x ty>.
		/// Do the strict FP operation on the element instead.
		SDValue DAGTypeLegalizer::ScalarizeVecOp_UnaryOp_StrictFP(SDNode *N) {
		assert(N->getValueType(0).getVectorNumElements() == 1 &&
		"Unexpected vector type!");
		SDValue Elt = GetScalarizedVector(N->getOperand(1));
		SDValue Res = DAG.getNode(N->getOpcode(), SDLoc(N),
		{ N->getValueType(0).getScalarType(), MVT::Other },
		{ N->getOperand(0), Elt });
		// Legalize the chain result - switch anything that used the old chain to
		// use the new one.
		ReplaceValueWith(SDValue(N, 1), Res.getValue(1));
		// Revectorize the result so the types line up with what the uses of this
		// expression expect.
		return DAG.getNode(ISD::SCALAR_TO_VECTOR, SDLoc(N), N->getValueType(0), Res);
		}

/// The vectors to concatenate have length one - use a BUILD_VECTOR instead.		/// The vectors to concatenate have length one - use a BUILD_VECTOR instead.
SDValue DAGTypeLegalizer::ScalarizeVecOp_CONCAT_VECTORS(SDNode *N) {		SDValue DAGTypeLegalizer::ScalarizeVecOp_CONCAT_VECTORS(SDNode *N) {
SmallVector<SDValue, 8> Ops(N->getNumOperands());		SmallVector<SDValue, 8> Ops(N->getNumOperands());
for (unsigned i = 0, e = N->getNumOperands(); i < e; ++i)		for (unsigned i = 0, e = N->getNumOperands(); i < e; ++i)
Ops[i] = GetScalarizedVector(N->getOperand(i));		Ops[i] = GetScalarizedVector(N->getOperand(i));
return DAG.getBuildVector(N->getValueType(0), SDLoc(N), Ops);		return DAG.getBuildVector(N->getValueType(0), SDLoc(N), Ops);
}		}

▲ Show 20 Lines • Show All 185 Lines • ▼ Show 20 Lines	#endif
case ISD::FLOG2:		case ISD::FLOG2:
case ISD::FNEARBYINT:		case ISD::FNEARBYINT:
case ISD::FNEG:		case ISD::FNEG:
case ISD::FP_EXTEND:		case ISD::FP_EXTEND:
case ISD::STRICT_FP_EXTEND:		case ISD::STRICT_FP_EXTEND:
case ISD::FP_ROUND:		case ISD::FP_ROUND:
case ISD::STRICT_FP_ROUND:		case ISD::STRICT_FP_ROUND:
case ISD::FP_TO_SINT:		case ISD::FP_TO_SINT:
		case ISD::STRICT_FP_TO_SINT:
case ISD::FP_TO_UINT:		case ISD::FP_TO_UINT:
		case ISD::STRICT_FP_TO_UINT:
case ISD::FRINT:		case ISD::FRINT:
case ISD::FROUND:		case ISD::FROUND:
case ISD::FSIN:		case ISD::FSIN:
case ISD::FSQRT:		case ISD::FSQRT:
case ISD::FTRUNC:		case ISD::FTRUNC:
case ISD::SINT_TO_FP:		case ISD::SINT_TO_FP:
case ISD::TRUNCATE:		case ISD::TRUNCATE:
case ISD::UINT_TO_FP:		case ISD::UINT_TO_FP:
▲ Show 20 Lines • Show All 1,008 Lines • ▼ Show 20 Lines	#endif
case ISD::UINT_TO_FP:		case ISD::UINT_TO_FP:
if (N->getValueType(0).bitsLT(N->getOperand(0).getValueType()))		if (N->getValueType(0).bitsLT(N->getOperand(0).getValueType()))
Res = SplitVecOp_TruncateHelper(N);		Res = SplitVecOp_TruncateHelper(N);
else		else
Res = SplitVecOp_UnaryOp(N);		Res = SplitVecOp_UnaryOp(N);
break;		break;
case ISD::FP_TO_SINT:		case ISD::FP_TO_SINT:
case ISD::FP_TO_UINT:		case ISD::FP_TO_UINT:
		case ISD::STRICT_FP_TO_SINT:
		case ISD::STRICT_FP_TO_UINT:
case ISD::CTTZ:		case ISD::CTTZ:
case ISD::CTLZ:		case ISD::CTLZ:
case ISD::CTPOP:		case ISD::CTPOP:
case ISD::STRICT_FP_EXTEND:		case ISD::STRICT_FP_EXTEND:
case ISD::FP_EXTEND:		case ISD::FP_EXTEND:
case ISD::SIGN_EXTEND:		case ISD::SIGN_EXTEND:
case ISD::ZERO_EXTEND:		case ISD::ZERO_EXTEND:
case ISD::ANY_EXTEND:		case ISD::ANY_EXTEND:
▲ Show 20 Lines • Show All 801 Lines • ▼ Show 20 Lines	#endif
case ISD::TRUNCATE:		case ISD::TRUNCATE:
case ISD::UINT_TO_FP:		case ISD::UINT_TO_FP:
case ISD::ZERO_EXTEND:		case ISD::ZERO_EXTEND:
Res = WidenVecRes_Convert(N);		Res = WidenVecRes_Convert(N);
break;		break;

case ISD::STRICT_FP_EXTEND:		case ISD::STRICT_FP_EXTEND:
case ISD::STRICT_FP_ROUND:		case ISD::STRICT_FP_ROUND:
		case ISD::STRICT_FP_TO_SINT:
		case ISD::STRICT_FP_TO_UINT:
Res = WidenVecRes_Convert_StrictFP(N);		Res = WidenVecRes_Convert_StrictFP(N);
break;		break;

case ISD::FABS:		case ISD::FABS:
case ISD::FCEIL:		case ISD::FCEIL:
case ISD::FCOS:		case ISD::FCOS:
case ISD::FEXP:		case ISD::FEXP:
case ISD::FEXP2:		case ISD::FEXP2:
▲ Show 20 Lines • Show All 1,286 Lines • ▼ Show 20 Lines	#endif
case ISD::SIGN_EXTEND:		case ISD::SIGN_EXTEND:
case ISD::ZERO_EXTEND:		case ISD::ZERO_EXTEND:
Res = WidenVecOp_EXTEND(N);		Res = WidenVecOp_EXTEND(N);
break;		break;

case ISD::FP_EXTEND:		case ISD::FP_EXTEND:
case ISD::STRICT_FP_EXTEND:		case ISD::STRICT_FP_EXTEND:
case ISD::FP_TO_SINT:		case ISD::FP_TO_SINT:
		case ISD::STRICT_FP_TO_SINT:
case ISD::FP_TO_UINT:		case ISD::FP_TO_UINT:
		case ISD::STRICT_FP_TO_UINT:
case ISD::SINT_TO_FP:		case ISD::SINT_TO_FP:
case ISD::UINT_TO_FP:		case ISD::UINT_TO_FP:
case ISD::TRUNCATE:		case ISD::TRUNCATE:
Res = WidenVecOp_Convert(N);		Res = WidenVecOp_Convert(N);
break;		break;

case ISD::VECREDUCE_FADD:		case ISD::VECREDUCE_FADD:
case ISD::VECREDUCE_FMUL:		case ISD::VECREDUCE_FMUL:
▲ Show 20 Lines • Show All 945 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/SelectionDAG.cpp

Show First 20 Lines • Show All 7,716 Lines • ▼ Show 20 Lines	case ISD::STRICT_FNEARBYINT:
IsUnary = true;		IsUnary = true;
break;		break;
case ISD::STRICT_FMAXNUM: NewOpc = ISD::FMAXNUM; break;		case ISD::STRICT_FMAXNUM: NewOpc = ISD::FMAXNUM; break;
case ISD::STRICT_FMINNUM: NewOpc = ISD::FMINNUM; break;		case ISD::STRICT_FMINNUM: NewOpc = ISD::FMINNUM; break;
case ISD::STRICT_FCEIL: NewOpc = ISD::FCEIL; IsUnary = true; break;		case ISD::STRICT_FCEIL: NewOpc = ISD::FCEIL; IsUnary = true; break;
case ISD::STRICT_FFLOOR: NewOpc = ISD::FFLOOR; IsUnary = true; break;		case ISD::STRICT_FFLOOR: NewOpc = ISD::FFLOOR; IsUnary = true; break;
case ISD::STRICT_FROUND: NewOpc = ISD::FROUND; IsUnary = true; break;		case ISD::STRICT_FROUND: NewOpc = ISD::FROUND; IsUnary = true; break;
case ISD::STRICT_FTRUNC: NewOpc = ISD::FTRUNC; IsUnary = true; break;		case ISD::STRICT_FTRUNC: NewOpc = ISD::FTRUNC; IsUnary = true; break;
		case ISD::STRICT_FP_TO_SINT: NewOpc = ISD::FP_TO_SINT; IsUnary = true; break;
		case ISD::STRICT_FP_TO_UINT: NewOpc = ISD::FP_TO_UINT; IsUnary = true; break;
// STRICT_FP_ROUND takes an extra argument describing whether or not		// STRICT_FP_ROUND takes an extra argument describing whether or not
// the value will be changed by this node. See ISDOpcodes.h for details.		// the value will be changed by this node. See ISDOpcodes.h for details.
case ISD::STRICT_FP_ROUND: NewOpc = ISD::FP_ROUND; break;		case ISD::STRICT_FP_ROUND: NewOpc = ISD::FP_ROUND; break;
case ISD::STRICT_FP_EXTEND: NewOpc = ISD::FP_EXTEND; IsUnary = true; break;		case ISD::STRICT_FP_EXTEND: NewOpc = ISD::FP_EXTEND; IsUnary = true; break;
}		}

// We're taking this node out of the chain, so we need to re-link things.		// We're taking this node out of the chain, so we need to re-link things.
SDValue InputChain = Node->getOperand(0);		SDValue InputChain = Node->getOperand(0);
SDValue OutputChain = SDValue(Node, 1);		SDValue OutputChain = SDValue(Node, 1);
ReplaceAllUsesOfValueWith(OutputChain, InputChain);		ReplaceAllUsesOfValueWith(OutputChain, InputChain);

SDVTList VTs;		SDVTList VTs;
SDNode *Res = nullptr;		SDNode *Res = nullptr;

switch (OrigOpc) {		switch (OrigOpc) {
default:		default:
VTs = getVTList(Node->getOperand(1).getValueType());		VTs = getVTList(Node->getOperand(1).getValueType());
		cameron.mcinally: There are a lot of comments on this, so I may have missed something. Take with a grain of salt... I don't think these are correct. These can trap, so they can't be speculatively executed. They would need a chain.
		kpn (author): I couldn't figure out how to have these be chained but have the non-strict continue to not be chained. Too many things fell over if they didn't match.
		cameron.mcinally: I'm not an expert with this code, so cc @andrew.w.kaylor. This doesn't seem like the right direction though. Maybe these unchained operations should be left to a different patch until a proper solution is found.
break;		break;
		case ISD::STRICT_FP_TO_SINT:
		case ISD::STRICT_FP_TO_UINT:
case ISD::STRICT_FP_ROUND:		case ISD::STRICT_FP_ROUND:
case ISD::STRICT_FP_EXTEND:		case ISD::STRICT_FP_EXTEND:
VTs = getVTList(Node->getValueType(0));		VTs = getVTList(Node->getValueType(0));
break;		break;
}		}

if (IsUnary)		if (IsUnary)
Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1) });		Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1) });
else if (IsTernary)		else if (IsTernary)
Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1),		Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1),
Node->getOperand(2),		Node->getOperand(2),
Node->getOperand(3)});		Node->getOperand(3)});
else		else
Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1),		Res = MorphNodeTo(Node, NewOpc, VTs, { Node->getOperand(1),
Node->getOperand(2) });		Node->getOperand(2) });

// MorphNodeTo can operate in two ways: if an existing node with the		// MorphNodeTo can operate in two ways: if an existing node with the
// specified operands exists, it can just return it. Otherwise, it		// specified operands exists, it can just return it. Otherwise, it
// updates the node in place to have the requested operands.		// updates the node in place to have the requested operands.
if (Res == Node) {		if (Res == Node) {
// If we updated the node in place, reset the node ID. To the isel,		// If we updated the node in place, reset the node ID. To the isel,
		craig.topper: This doesn't copy the second argument to FP_ROUND over, does it?
		kpn (author): No. It should.
// this should be just like a newly allocated machine node.		// this should be just like a newly allocated machine node.
Res->setNodeId(-1);		Res->setNodeId(-1);
} else {		} else {
ReplaceAllUsesWith(Node, Res);		ReplaceAllUsesWith(Node, Res);
RemoveDeadNode(Node);		RemoveDeadNode(Node);
}		}

return Res;		return Res;
▲ Show 20 Lines • Show All 1,772 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Show First 20 Lines • Show All 6,093 Lines • ▼ Show 20 Lines	setValue(&I, DAG.getNode(ISD::FMA, sdl,
getValue(I.getArgOperand(2))));		getValue(I.getArgOperand(2))));
return;		return;
case Intrinsic::experimental_constrained_fadd:		case Intrinsic::experimental_constrained_fadd:
case Intrinsic::experimental_constrained_fsub:		case Intrinsic::experimental_constrained_fsub:
case Intrinsic::experimental_constrained_fmul:		case Intrinsic::experimental_constrained_fmul:
case Intrinsic::experimental_constrained_fdiv:		case Intrinsic::experimental_constrained_fdiv:
case Intrinsic::experimental_constrained_frem:		case Intrinsic::experimental_constrained_frem:
case Intrinsic::experimental_constrained_fma:		case Intrinsic::experimental_constrained_fma:
		case Intrinsic::experimental_constrained_fptosi:
		case Intrinsic::experimental_constrained_fptoui:
case Intrinsic::experimental_constrained_fptrunc:		case Intrinsic::experimental_constrained_fptrunc:
case Intrinsic::experimental_constrained_fpext:		case Intrinsic::experimental_constrained_fpext:
case Intrinsic::experimental_constrained_sqrt:		case Intrinsic::experimental_constrained_sqrt:
case Intrinsic::experimental_constrained_pow:		case Intrinsic::experimental_constrained_pow:
case Intrinsic::experimental_constrained_powi:		case Intrinsic::experimental_constrained_powi:
case Intrinsic::experimental_constrained_sin:		case Intrinsic::experimental_constrained_sin:
case Intrinsic::experimental_constrained_cos:		case Intrinsic::experimental_constrained_cos:
case Intrinsic::experimental_constrained_exp:		case Intrinsic::experimental_constrained_exp:
[...]	case Intrinsic::experimental_constrained_fdiv:
Opcode = ISD::STRICT_FDIV;		Opcode = ISD::STRICT_FDIV;
break;		break;
case Intrinsic::experimental_constrained_frem:		case Intrinsic::experimental_constrained_frem:
Opcode = ISD::STRICT_FREM;		Opcode = ISD::STRICT_FREM;
break;		break;
case Intrinsic::experimental_constrained_fma:		case Intrinsic::experimental_constrained_fma:
Opcode = ISD::STRICT_FMA;		Opcode = ISD::STRICT_FMA;
break;		break;
		case Intrinsic::experimental_constrained_fptosi:
		Opcode = ISD::STRICT_FP_TO_SINT;
		break;
		case Intrinsic::experimental_constrained_fptoui:
		Opcode = ISD::STRICT_FP_TO_UINT;
		break;
case Intrinsic::experimental_constrained_fptrunc:		case Intrinsic::experimental_constrained_fptrunc:
Opcode = ISD::STRICT_FP_ROUND;		Opcode = ISD::STRICT_FP_ROUND;
break;		break;
case Intrinsic::experimental_constrained_fpext:		case Intrinsic::experimental_constrained_fpext:
Opcode = ISD::STRICT_FP_EXTEND;		Opcode = ISD::STRICT_FP_EXTEND;
break;		break;
case Intrinsic::experimental_constrained_sqrt:		case Intrinsic::experimental_constrained_sqrt:
Opcode = ISD::STRICT_FSQRT;		Opcode = ISD::STRICT_FSQRT;
[...]	void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
case Intrinsic::experimental_constrained_trunc:		case Intrinsic::experimental_constrained_trunc:
Opcode = ISD::STRICT_FTRUNC;		Opcode = ISD::STRICT_FTRUNC;
break;		break;
}		}
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
SDValue Chain = getRoot();		SDValue Chain = getRoot();
SmallVector<EVT, 4> ValueVTs;		SmallVector<EVT, 4> ValueVTs;
ComputeValueVTs(TLI, DAG.getDataLayout(), FPI.getType(), ValueVTs);		ComputeValueVTs(TLI, DAG.getDataLayout(), FPI.getType(), ValueVTs);
ValueVTs.push_back(MVT::Other); // Out chain		ValueVTs.push_back(MVT::Other); // Out chain
		andrew.w.kaylor (Not Done): Why are you not attaching these nodes to the chain?
		kpn (Author, Not Done): Because they need to match what the default lowering is expecting. Otherwise a variety of failures happen.
		andrew.w.kaylor (Not Done): There is code that removes the chain when the strict node is mutated to a non-strict node. That should be preventing lowering problems. I believe the chain was necessary to prevent re-ordering prior to final instruction selection.
		kpn (Author, Not Done): Mutation happens too late in some cases. If it happened early enough, there wouldn't be any need for the strict intrinsics to be mentioned in SelectionDAGLegalize::ExpandNode(). Since it doesn't, we can't use the chain. Should that mutation happen earlier?
		andrew.w.kaylor (Not Done): We need the mutation to be put off as long as possible. I think it should be possible to unlink the chain in ExpandNode if necessary. Depending on what's happening there, we might even want the expanded node to make use of the chain.

SDVTList VTs = DAG.getVTList(ValueVTs);		SDVTList VTs = DAG.getVTList(ValueVTs);
SDValue Result;		SDValue Result;
if (Opcode == ISD::STRICT_FP_ROUND)		if (Opcode == ISD::STRICT_FP_ROUND)
Result = DAG.getNode(Opcode, sdl, VTs,		Result = DAG.getNode(Opcode, sdl, VTs,
{ Chain, getValue(FPI.getArgOperand(0)),		{ Chain, getValue(FPI.getArgOperand(0)),
DAG.getTargetConstant(0, sdl,		DAG.getTargetConstant(0, sdl,
TLI.getPointerTy(DAG.getDataLayout())) });		TLI.getPointerTy(DAG.getDataLayout())) });
else if (FPI.isUnaryOp())		else if (FPI.isUnaryOp())
		craig.topper (Not Done): Is this adding a second argument to STRICT_FP_EXTEND as well? I don't think the non-strict FP_EXTEND has two arguments.
		kpn (Author, Not Done): Agreed. I'll fix it.
Result = DAG.getNode(Opcode, sdl, VTs,		Result = DAG.getNode(Opcode, sdl, VTs,
{ Chain, getValue(FPI.getArgOperand(0)) });		{ Chain, getValue(FPI.getArgOperand(0)) });
else if (FPI.isTernaryOp())		else if (FPI.isTernaryOp())
Result = DAG.getNode(Opcode, sdl, VTs,		Result = DAG.getNode(Opcode, sdl, VTs,
{ Chain, getValue(FPI.getArgOperand(0)),		{ Chain, getValue(FPI.getArgOperand(0)),
getValue(FPI.getArgOperand(1)),		getValue(FPI.getArgOperand(1)),
getValue(FPI.getArgOperand(2)) });		getValue(FPI.getArgOperand(2)) });
else		else

lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

[...]	#endif
case ISD::FLT_ROUNDS_: return "flt_rounds";		case ISD::FLT_ROUNDS_: return "flt_rounds";
case ISD::FP_ROUND_INREG: return "fp_round_inreg";		case ISD::FP_ROUND_INREG: return "fp_round_inreg";
case ISD::FP_EXTEND: return "fp_extend";		case ISD::FP_EXTEND: return "fp_extend";
case ISD::STRICT_FP_EXTEND: return "strict_fp_extend";		case ISD::STRICT_FP_EXTEND: return "strict_fp_extend";

case ISD::SINT_TO_FP: return "sint_to_fp";		case ISD::SINT_TO_FP: return "sint_to_fp";
case ISD::UINT_TO_FP: return "uint_to_fp";		case ISD::UINT_TO_FP: return "uint_to_fp";
case ISD::FP_TO_SINT: return "fp_to_sint";		case ISD::FP_TO_SINT: return "fp_to_sint";
		case ISD::STRICT_FP_TO_SINT: return "strict_fp_to_sint";
case ISD::FP_TO_UINT: return "fp_to_uint";		case ISD::FP_TO_UINT: return "fp_to_uint";
		case ISD::STRICT_FP_TO_UINT: return "strict_fp_to_uint";
case ISD::BITCAST: return "bitcast";		case ISD::BITCAST: return "bitcast";
case ISD::ADDRSPACECAST: return "addrspacecast";		case ISD::ADDRSPACECAST: return "addrspacecast";
case ISD::FP16_TO_FP: return "fp16_to_fp";		case ISD::FP16_TO_FP: return "fp16_to_fp";
case ISD::FP_TO_FP16: return "fp_to_fp16";		case ISD::FP_TO_FP16: return "fp_to_fp16";
case ISD::LROUND: return "lround";		case ISD::LROUND: return "lround";
case ISD::LLROUND: return "llround";		case ISD::LLROUND: return "llround";
case ISD::LRINT: return "lrint";		case ISD::LRINT: return "lrint";
case ISD::LLRINT: return "llrint";		case ISD::LLRINT: return "llrint";

lib/CodeGen/SelectionDAG/TargetLowering.cpp

[...]	bool TargetLowering::expandROT(SDNode *Node, SDValue &Result,
SDValue And1 = DAG.getNode(ISD::AND, DL, ShVT, NegOp1, BitWidthMinusOneC);		SDValue And1 = DAG.getNode(ISD::AND, DL, ShVT, NegOp1, BitWidthMinusOneC);
Result = DAG.getNode(ISD::OR, DL, VT, DAG.getNode(ShOpc, DL, VT, Op0, And0),		Result = DAG.getNode(ISD::OR, DL, VT, DAG.getNode(ShOpc, DL, VT, Op0, And0),
DAG.getNode(HsOpc, DL, VT, Op0, And1));		DAG.getNode(HsOpc, DL, VT, Op0, And1));
return true;		return true;
}		}

bool TargetLowering::expandFP_TO_SINT(SDNode *Node, SDValue &Result,		bool TargetLowering::expandFP_TO_SINT(SDNode *Node, SDValue &Result,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
SDValue Src = Node->getOperand(0);		unsigned OpNo = Node->isStrictFPOpcode() ? 1 : 0;
		SDValue Src = Node->getOperand(OpNo);
EVT SrcVT = Src.getValueType();		EVT SrcVT = Src.getValueType();
EVT DstVT = Node->getValueType(0);		EVT DstVT = Node->getValueType(0);
SDLoc dl(SDValue(Node, 0));		SDLoc dl(SDValue(Node, 0));

// FIXME: Only f32 to i64 conversions are supported.		// FIXME: Only f32 to i64 conversions are supported.
if (SrcVT != MVT::f32 || DstVT != MVT::i64)		if (SrcVT != MVT::f32 || DstVT != MVT::i64)
return false;		return false;

		if (Node->isStrictFPOpcode())
		// When a NaN is converted to an integer a trap is allowed. We can't
		// use this expansion here because it would eliminate that trap. Other
		// traps are also allowed and cannot be eliminated. See
		// IEEE 754-2008 sec 5.8.
		return false;

// Expand f32 -> i64 conversion		// Expand f32 -> i64 conversion
// This algorithm comes from compiler-rt's implementation of fixsfdi:		// This algorithm comes from compiler-rt's implementation of fixsfdi:
// https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/builtins/fixsfdi.c		// https://github.com/llvm/llvm-project/blob/master/compiler-rt/lib/builtins/fixsfdi.c
unsigned SrcEltBits = SrcVT.getScalarSizeInBits();		unsigned SrcEltBits = SrcVT.getScalarSizeInBits();
EVT IntVT = SrcVT.changeTypeToInteger();		EVT IntVT = SrcVT.changeTypeToInteger();
EVT IntShVT = getShiftAmountTy(IntVT, DAG.getDataLayout());		EVT IntShVT = getShiftAmountTy(IntVT, DAG.getDataLayout());

SDValue ExponentMask = DAG.getConstant(0x7F800000, dl, IntVT);		SDValue ExponentMask = DAG.getConstant(0x7F800000, dl, IntVT);
[...]	bool TargetLowering::expandFP_TO_SINT(SDNode *Node, SDValue &Result,
Result = DAG.getSelectCC(dl, Exponent, DAG.getConstant(0, dl, IntVT),		Result = DAG.getSelectCC(dl, Exponent, DAG.getConstant(0, dl, IntVT),
DAG.getConstant(0, dl, DstVT), Ret, ISD::SETLT);		DAG.getConstant(0, dl, DstVT), Ret, ISD::SETLT);
return true;		return true;
}		}
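
The new comment in expandFP_TO_SINT says a NaN-to-integer conversion is allowed to trap (IEEE 754-2008 sec 5.8), so the bit-twiddling expansion must not replace it under strict semantics. A minimal standalone sketch of the effect that would be lost, assuming a platform that follows IEC 60559 for `std::lrint`; the function name `nanConversionRaisesInvalid` is hypothetical and not part of the patch. `std::lrint` is used because a plain C++ cast of NaN to an integer is undefined behavior.

```cpp
#include <cfenv>
#include <cmath>
#include <limits>

// Hypothetical helper (not from the patch): reports whether converting a NaN
// to an integer raised the "invalid" floating-point exception flag.
bool nanConversionRaisesInvalid() {
  // volatile keeps the compiler from folding the conversion at compile time.
  volatile double nan = std::numeric_limits<double>::quiet_NaN();

  std::feclearexcept(FE_ALL_EXCEPT);

  // The numeric result is unspecified for NaN; only the flag matters here.
  volatile long ignored = std::lrint(nan);
  (void)ignored;

  return std::fetestexcept(FE_INVALID) != 0;
}
```

An integer-only expansion like the fixsfdi-style code that follows never executes a floating-point conversion instruction, so it would leave the flag clear. That appears to be the reason for the early `return false` on strict nodes.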

bool TargetLowering::expandFP_TO_UINT(SDNode *Node, SDValue &Result,		bool TargetLowering::expandFP_TO_UINT(SDNode *Node, SDValue &Result,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
SDLoc dl(SDValue(Node, 0));		SDLoc dl(SDValue(Node, 0));
SDValue Src = Node->getOperand(0);		unsigned OpNo = Node->isStrictFPOpcode() ? 1 : 0;
		SDValue Src = Node->getOperand(OpNo);

EVT SrcVT = Src.getValueType();		EVT SrcVT = Src.getValueType();
EVT DstVT = Node->getValueType(0);		EVT DstVT = Node->getValueType(0);
EVT SetCCVT =		EVT SetCCVT =
getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), SrcVT);		getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), SrcVT);

// Only expand vector types if we have the appropriate vector bit operations.		// Only expand vector types if we have the appropriate vector bit operations.
if (DstVT.isVector() && (!isOperationLegalOrCustom(ISD::FP_TO_SINT, DstVT) ||		unsigned SIntOpcode = Node->isStrictFPOpcode() ? ISD::STRICT_FP_TO_SINT :
		ISD::FP_TO_SINT;
		if (DstVT.isVector() && (!isOperationLegalOrCustom(SIntOpcode, DstVT) ||
!isOperationLegalOrCustomOrPromote(ISD::XOR, SrcVT)))		!isOperationLegalOrCustomOrPromote(ISD::XOR, SrcVT)))
return false;		return false;

// If the maximum float value is smaller then the signed integer range,		// If the maximum float value is smaller then the signed integer range,
// the destination signmask can't be represented by the float, so we can		// the destination signmask can't be represented by the float, so we can
// just use FP_TO_SINT directly.		// just use FP_TO_SINT directly.
const fltSemantics &APFSem = DAG.EVTToAPFloatSemantics(SrcVT);		const fltSemantics &APFSem = DAG.EVTToAPFloatSemantics(SrcVT);
APFloat APF(APFSem, APInt::getNullValue(SrcVT.getScalarSizeInBits()));		APFloat APF(APFSem, APInt::getNullValue(SrcVT.getScalarSizeInBits()));
APInt SignMask = APInt::getSignMask(DstVT.getScalarSizeInBits());		APInt SignMask = APInt::getSignMask(DstVT.getScalarSizeInBits());
if (APFloat::opOverflow &		if (APFloat::opOverflow &
APF.convertFromAPInt(SignMask, false, APFloat::rmNearestTiesToEven)) {		APF.convertFromAPInt(SignMask, false, APFloat::rmNearestTiesToEven)) {
		if (Node->isStrictFPOpcode()) {
		Result = DAG.getNode(ISD::STRICT_FP_TO_SINT, dl, { DstVT, MVT::Other },
		{ Node->getOperand(0), Src });
		// Relink the chain
		DAG.ReplaceAllUsesOfValueWith(SDValue(Node,1), Result.getValue(1));
		}
		else
Result = DAG.getNode(ISD::FP_TO_SINT, dl, DstVT, Src);		Result = DAG.getNode(ISD::FP_TO_SINT, dl, DstVT, Src);
return true;		return true;
}		}

SDValue Cst = DAG.getConstantFP(APF, dl, SrcVT);		SDValue Cst = DAG.getConstantFP(APF, dl, SrcVT);
SDValue Sel = DAG.getSetCC(dl, SetCCVT, Src, Cst, ISD::SETLT);		SDValue Sel = DAG.getSetCC(dl, SetCCVT, Src, Cst, ISD::SETLT);

bool Strict = shouldUseStrictFP_TO_INT(SrcVT, DstVT, /*IsSigned*/ false);		bool Strict = Node->isStrictFPOpcode() ||
		shouldUseStrictFP_TO_INT(SrcVT, DstVT, /*IsSigned*/ false);

if (Strict) {		if (Strict) {
// Expand based on maximum range of FP_TO_SINT, if the value exceeds the		// Expand based on maximum range of FP_TO_SINT, if the value exceeds the
// signmask then offset (the result of which should be fully representable).		// signmask then offset (the result of which should be fully representable).
// Sel = Src < 0x8000000000000000		// Sel = Src < 0x8000000000000000
// Val = select Sel, Src, Src - 0x8000000000000000		// Val = select Sel, Src, Src - 0x8000000000000000
// Ofs = select Sel, 0, 0x8000000000000000		// Ofs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val) ^ Ofs		// Result = fp_to_sint(Val) ^ Ofs

// TODO: Should any fast-math-flags be set for the FSUB?		// TODO: Should any fast-math-flags be set for the FSUB?
SDValue Val = DAG.getSelect(dl, SrcVT, Sel, Src,		SDValue SrcBiased;
DAG.getNode(ISD::FSUB, dl, SrcVT, Src, Cst));		if (Node->isStrictFPOpcode())
		SrcBiased = DAG.getNode(ISD::STRICT_FSUB, dl, { SrcVT, MVT::Other },
		{ Node->getOperand(0), Src, Cst });
		else
		SrcBiased = DAG.getNode(ISD::FSUB, dl, SrcVT, Src, Cst);
		SDValue Val = DAG.getSelect(dl, SrcVT, Sel, Src, SrcBiased);
SDValue Ofs = DAG.getSelect(dl, DstVT, Sel, DAG.getConstant(0, dl, DstVT),		SDValue Ofs = DAG.getSelect(dl, DstVT, Sel, DAG.getConstant(0, dl, DstVT),
DAG.getConstant(SignMask, dl, DstVT));		DAG.getConstant(SignMask, dl, DstVT));
Result = DAG.getNode(ISD::XOR, dl, DstVT,		SDValue SInt;
DAG.getNode(ISD::FP_TO_SINT, dl, DstVT, Val), Ofs);		if (Node->isStrictFPOpcode()) {
		SInt = DAG.getNode(ISD::STRICT_FP_TO_SINT, dl, { DstVT, MVT::Other },
		{ SrcBiased.getValue(1), Val });
		// Relink the chain
		DAG.ReplaceAllUsesOfValueWith(SDValue(Node,1), SInt.getValue(1));
		}
		else
		SInt = DAG.getNode(ISD::FP_TO_SINT, dl, DstVT, Val);
		Result = DAG.getNode(ISD::XOR, dl, DstVT, SInt, Ofs);
} else {		} else {
// Expand based on maximum range of FP_TO_SINT:		// Expand based on maximum range of FP_TO_SINT:
// True = fp_to_sint(Src)		// True = fp_to_sint(Src)
// False = 0x8000000000000000 + fp_to_sint(Src - 0x8000000000000000)		// False = 0x8000000000000000 + fp_to_sint(Src - 0x8000000000000000)
// Result = select (Src < 0x8000000000000000), True, False		// Result = select (Src < 0x8000000000000000), True, False

SDValue True = DAG.getNode(ISD::FP_TO_SINT, dl, DstVT, Src);		SDValue True = DAG.getNode(ISD::FP_TO_SINT, dl, DstVT, Src);
// TODO: Should any fast-math-flags be set for the FSUB?		// TODO: Should any fast-math-flags be set for the FSUB?
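
The Sel/Val/Ofs pseudocode in the `Strict` branch of expandFP_TO_UINT can be modeled in plain C++. This is a rough sketch, not the DAG code: `fpToUintViaSint` is a hypothetical name, the source type is fixed to `double`, the destination to a 64-bit integer, and the input is assumed to be a non-NaN value in [0, 2^64).

```cpp
#include <cstdint>

// Hypothetical scalar model of the expansion: an unsigned conversion built
// from a signed one, following the Sel/Val/Ofs comment in the diff.
std::uint64_t fpToUintViaSint(double src) {
  const double twoTo63 = 9223372036854775808.0;           // 2^63, exact in double
  const std::uint64_t signMask = std::uint64_t{1} << 63;  // 0x8000000000000000

  const bool sel = src < twoTo63;                 // Sel = Src < 0x8000000000000000
  const double val = sel ? src : src - twoTo63;   // Val = select Sel, Src, Src - 2^63
  const std::uint64_t ofs = sel ? 0 : signMask;   // Ofs = select Sel, 0, 2^63

  // fp_to_sint(Val): Val now fits in int64_t, so the signed cast is defined.
  const std::int64_t sint = static_cast<std::int64_t>(val);

  // Result = fp_to_sint(Val) ^ Ofs
  return static_cast<std::uint64_t>(sint) ^ ofs;
}
```

For example, `fpToUintViaSint(9223372036854775808.0)` takes the offset path: `val` becomes 0, and the XOR restores the top bit. One difference from the DAG form is worth noting: the C++ conditional only evaluates the selected arm, whereas a DAG select has both operands computed, so the subtraction always executes. The patch uses STRICT_FSUB and threads the chain through it for that reason.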

lib/IR/IntrinsicInst.cpp

[...]	return StringSwitch<ExceptionBehavior>(ExceptionArg)
.Case("fpexcept.strict", ebStrict)		.Case("fpexcept.strict", ebStrict)
.Default(ebInvalid);		.Default(ebInvalid);
}		}

bool ConstrainedFPIntrinsic::isUnaryOp() const {		bool ConstrainedFPIntrinsic::isUnaryOp() const {
switch (getIntrinsicID()) {		switch (getIntrinsicID()) {
default:		default:
return false;		return false;
		case Intrinsic::experimental_constrained_fptosi:
		case Intrinsic::experimental_constrained_fptoui:
case Intrinsic::experimental_constrained_fptrunc:		case Intrinsic::experimental_constrained_fptrunc:
case Intrinsic::experimental_constrained_fpext:		case Intrinsic::experimental_constrained_fpext:
case Intrinsic::experimental_constrained_sqrt:		case Intrinsic::experimental_constrained_sqrt:
case Intrinsic::experimental_constrained_sin:		case Intrinsic::experimental_constrained_sin:
case Intrinsic::experimental_constrained_cos:		case Intrinsic::experimental_constrained_cos:
case Intrinsic::experimental_constrained_exp:		case Intrinsic::experimental_constrained_exp:
case Intrinsic::experimental_constrained_exp2:		case Intrinsic::experimental_constrained_exp2:
case Intrinsic::experimental_constrained_log:		case Intrinsic::experimental_constrained_log:

lib/IR/Verifier.cpp

[...]	case Intrinsic::coro_id: {
break;		break;
}		}
case Intrinsic::experimental_constrained_fadd:		case Intrinsic::experimental_constrained_fadd:
case Intrinsic::experimental_constrained_fsub:		case Intrinsic::experimental_constrained_fsub:
case Intrinsic::experimental_constrained_fmul:		case Intrinsic::experimental_constrained_fmul:
case Intrinsic::experimental_constrained_fdiv:		case Intrinsic::experimental_constrained_fdiv:
case Intrinsic::experimental_constrained_frem:		case Intrinsic::experimental_constrained_frem:
case Intrinsic::experimental_constrained_fma:		case Intrinsic::experimental_constrained_fma:
		case Intrinsic::experimental_constrained_fptosi:
		case Intrinsic::experimental_constrained_fptoui:
case Intrinsic::experimental_constrained_fptrunc:		case Intrinsic::experimental_constrained_fptrunc:
case Intrinsic::experimental_constrained_fpext:		case Intrinsic::experimental_constrained_fpext:
case Intrinsic::experimental_constrained_sqrt:		case Intrinsic::experimental_constrained_sqrt:
case Intrinsic::experimental_constrained_pow:		case Intrinsic::experimental_constrained_pow:
case Intrinsic::experimental_constrained_powi:		case Intrinsic::experimental_constrained_powi:
case Intrinsic::experimental_constrained_sin:		case Intrinsic::experimental_constrained_sin:
case Intrinsic::experimental_constrained_cos:		case Intrinsic::experimental_constrained_cos:
case Intrinsic::experimental_constrained_exp:		case Intrinsic::experimental_constrained_exp:
[...]	void Verifier::visitConstrainedFPIntrinsic(ConstrainedFPIntrinsic &FPI) {
case Intrinsic::experimental_constrained_nearbyint:		case Intrinsic::experimental_constrained_nearbyint:
case Intrinsic::experimental_constrained_ceil:		case Intrinsic::experimental_constrained_ceil:
case Intrinsic::experimental_constrained_floor:		case Intrinsic::experimental_constrained_floor:
case Intrinsic::experimental_constrained_round:		case Intrinsic::experimental_constrained_round:
case Intrinsic::experimental_constrained_trunc:		case Intrinsic::experimental_constrained_trunc:
Assert((NumOperands == 3), "invalid arguments for constrained FP intrinsic",		Assert((NumOperands == 3), "invalid arguments for constrained FP intrinsic",
&FPI);		&FPI);
HasExceptionMD = true;		HasExceptionMD = true;
HasRoundingMD = true;		HasRoundingMD = true;
		andrew.w.kaylor (Done): Since you've broken this out into a switch statement, can you separate the unary and ternary ops and give them each the appropriate assert? I think that would be much more readable than this compound check (which I realize was my creation).
break;		break;

case Intrinsic::experimental_constrained_fma:		case Intrinsic::experimental_constrained_fma:
Assert((NumOperands == 5), "invalid arguments for constrained FP intrinsic",		Assert((NumOperands == 5), "invalid arguments for constrained FP intrinsic",
&FPI);		&FPI);
HasExceptionMD = true;		HasExceptionMD = true;
HasRoundingMD = true;		HasRoundingMD = true;
break;		break;

case Intrinsic::experimental_constrained_fadd:		case Intrinsic::experimental_constrained_fadd:
case Intrinsic::experimental_constrained_fsub:		case Intrinsic::experimental_constrained_fsub:
case Intrinsic::experimental_constrained_fmul:		case Intrinsic::experimental_constrained_fmul:
case Intrinsic::experimental_constrained_fdiv:		case Intrinsic::experimental_constrained_fdiv:
		kbarton (Not Done): Could you use isFPOrFPVectorTy here to check both conditions, and then remove the assert in the else below?
case Intrinsic::experimental_constrained_frem:		case Intrinsic::experimental_constrained_frem:
		kbarton (Not Done): Can you use a dyn_cast here instead? if (auto *OperandT = dyn_cast<VectorType>(Operand->getType())) { do vector stuff }
		kpn (Author, Done): Yes, that's much more concise.
case Intrinsic::experimental_constrained_pow:		case Intrinsic::experimental_constrained_pow:
case Intrinsic::experimental_constrained_powi:		case Intrinsic::experimental_constrained_powi:
		andrew.w.kaylor (Done): The default should probably always be an error.
case Intrinsic::experimental_constrained_maxnum:		case Intrinsic::experimental_constrained_maxnum:
case Intrinsic::experimental_constrained_minnum:		case Intrinsic::experimental_constrained_minnum:
Assert((NumOperands == 4), "invalid arguments for constrained FP intrinsic",		Assert((NumOperands == 4), "invalid arguments for constrained FP intrinsic",
&FPI);		&FPI);
HasExceptionMD = true;		HasExceptionMD = true;
HasRoundingMD = true;		HasRoundingMD = true;
break;		break;

		case Intrinsic::experimental_constrained_fptosi:
		case Intrinsic::experimental_constrained_fptoui: {
		andrew.w.kaylor (Not Done): How about this? int RoundingIdx = (HasExceptionMD ? NumOperands - 2 : NumOperands - 1); On the other hand, are we ever going to have intrinsics that have a rounding mode but not exception behavior?
		kpn (Author, Not Done): Done. I don't know if we'll have a rounding mode but not exceptions. I doubt it, but I can't say for certain.
		Assert((NumOperands == 2),
		"invalid arguments for constrained FP intrinsic", &FPI);
		HasExceptionMD = true;
		kbarton (Done): Similarly, could you use isIntOrIntVectorTy here and remove the assert in the if/else below?

		Value *Operand = FPI.getArgOperand(0);
		uint64_t NumSrcElem = 0;
		Assert(Operand->getType()->isFPOrFPVectorTy(),
		"Intrinsic first argument must be floating point", &FPI);
		if (auto *OperandT = dyn_cast<VectorType>(Operand->getType())) {
		NumSrcElem = OperandT->getNumElements();
		}

		Operand = &FPI;
		Assert((NumSrcElem > 0) == Operand->getType()->isVectorTy(),
		"Intrinsic first argument and result disagree on vector use", &FPI);
		Assert(Operand->getType()->isIntOrIntVectorTy(),
		"Intrinsic result must be an integer", &FPI);
		if (auto *OperandT = dyn_cast<VectorType>(Operand->getType())) {
		Assert(NumSrcElem == OperandT->getNumElements(),
		"Intrinsic first argument and result vector lengths must be equal",
		&FPI);
		}
		}
		break;

case Intrinsic::experimental_constrained_fptrunc:		case Intrinsic::experimental_constrained_fptrunc:
case Intrinsic::experimental_constrained_fpext: {		case Intrinsic::experimental_constrained_fpext: {
		kbarton (Not Done): Same comment about commoning the asserts here.
		kpn (Author, Done): The fptrunc and fpext changes were split out into another ticket and updated there. The committed code is more concise, I suspect in the way you intended.
if (FPI.getIntrinsicID() == Intrinsic::experimental_constrained_fptrunc) {		if (FPI.getIntrinsicID() == Intrinsic::experimental_constrained_fptrunc) {
Assert((NumOperands == 3),		Assert((NumOperands == 3),
"invalid arguments for constrained FP intrinsic", &FPI);		"invalid arguments for constrained FP intrinsic", &FPI);
HasRoundingMD = true;		HasRoundingMD = true;
} else {		} else {
Assert((NumOperands == 2),		Assert((NumOperands == 2),
"invalid arguments for constrained FP intrinsic", &FPI);		"invalid arguments for constrained FP intrinsic", &FPI);
}		}
[...]	case Intrinsic::experimental_constrained_fpext: {
}		}
}		}
break;		break;

default:		default:
llvm_unreachable("Invalid constrained FP intrinsic!");		llvm_unreachable("Invalid constrained FP intrinsic!");
}		}

// If a non-metadata argument is passed in a metadata slot then the		// If a non-metadata argument is passed in a metadata slot then the
		kbarton (Not Done): It looks like getArgOperand expects an unsigned. Is there a specific reason you are using an int here? If it's possible that RoundingIdx is negative, then you should probably add an assert here before passing it to getArgOperand.
// error will be caught earlier when the incorrect argument doesn't		// error will be caught earlier when the incorrect argument doesn't
// match the specification in the intrinsic call table. Thus, no		// match the specification in the intrinsic call table. Thus, no
// argument type check is needed here.		// argument type check is needed here.

if (HasExceptionMD) {		if (HasExceptionMD) {
Assert(FPI.getExceptionBehavior() != ConstrainedFPIntrinsic::ebInvalid,		Assert(FPI.getExceptionBehavior() != ConstrainedFPIntrinsic::ebInvalid,
"invalid exception behavior argument", &FPI);		"invalid exception behavior argument", &FPI);
}		}

test/CodeGen/X86/fp-intrinsics.ll

[...]	%result = call double @llvm.experimental.constrained.fma.f64(
metadata !"fpexcept.strict")		metadata !"fpexcept.strict")
ret double %result		ret double %result
}		}

; CHECK-LABEL: f19		; CHECK-LABEL: f19
; COMMON: fmod		; COMMON: fmod
define double @f19() {		define double @f19() {
entry:		entry:
%rem = call double @llvm.experimental.constrained.frem.f64(		%rem = call double @llvm.experimental.constrained.frem.f64(
		andrew.w.kaylor (Done): Can you check the entire expanded IR pattern here? It might be worth having a separate test that verifies the StrictFP pass in isolation.
double 1.000000e+00,		double 1.000000e+00,
double 1.000000e+01,		double 1.000000e+01,
metadata !"round.dynamic",		metadata !"round.dynamic",
metadata !"fpexcept.strict")		metadata !"fpexcept.strict")
ret double %rem		ret double %rem
}		}

		; Verify that fptosi(42.1) isn't simplified when the rounding mode is
		; unknown.
		; Verify that no gross errors happen.
		; CHECK-LABEL: @f20
		; COMMON: cvttsd2si
		cameron.mcinally (Not Done): Same here as the vector version below. Do we want the truncating convert? That was surprising to me.
		kpn (Author, Not Done): The strict intrinsic results in the same instruction as the regular fptosi instruction. Having them be the same means the mutation from strict to non-strict is working correctly. And, yes, rounding towards zero is correct. That's why there's no rounding metadata.
		define i32 @f20() {
		entry:
		%result = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double 42.1,
		metadata !"fpexcept.strict")
		ret i32 %result
		}

		; Verify that fptoui(42.1) isn't simplified when the rounding mode is
		; unknown.
		; Verify that no gross errors happen.
		; CHECK-LABEL: @f20u
		; COMMON: cvttsd2si
		define i32 @f20u() {
		entry:
		%result = call i32 @llvm.experimental.constrained.fptoui.i32.f64(double 42.1,
		metadata !"fpexcept.strict")
		ret i32 %result
		}

; Verify that round(42.1) isn't simplified when the rounding mode is		; Verify that round(42.1) isn't simplified when the rounding mode is
; unknown.		; unknown.
; Verify that no gross errors happen.		; Verify that no gross errors happen.
; CHECK-LABEL: @f21		; CHECK-LABEL: @f21
; COMMON: cvtsd2ss		; COMMON: cvtsd2ss
define float @f21() {		define float @f21() {
entry:		entry:
%result = call float @llvm.experimental.constrained.fptrunc.f32.f64(		%result = call float @llvm.experimental.constrained.fptrunc.f32.f64(
[...]
declare double @llvm.experimental.constrained.exp2.f64(double, metadata, metadata)		declare double @llvm.experimental.constrained.exp2.f64(double, metadata, metadata)
declare double @llvm.experimental.constrained.log.f64(double, metadata, metadata)		declare double @llvm.experimental.constrained.log.f64(double, metadata, metadata)
declare double @llvm.experimental.constrained.log10.f64(double, metadata, metadata)		declare double @llvm.experimental.constrained.log10.f64(double, metadata, metadata)
declare double @llvm.experimental.constrained.log2.f64(double, metadata, metadata)		declare double @llvm.experimental.constrained.log2.f64(double, metadata, metadata)
declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata)		declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata)
declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata)		declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata)
declare float @llvm.experimental.constrained.fma.f32(float, float, float, metadata, metadata)		declare float @llvm.experimental.constrained.fma.f32(float, float, float, metadata, metadata)
declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata)		declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata)
		declare i32 @llvm.experimental.constrained.fptosi.i32.f64(double, metadata)
		andrew.w.kaylor (Done): I believe your decorated intrinsic names are incorrect in all of these cases. The name needs to specify both return type and argument type. Take a look at what opt produces if you give it these names as inputs.
		cameron.mcinally (Not Done): Do we want unsigned convert tests too? fptoui? I see that there are SystemZ tests to cover them, so maybe that's sufficient? Just pointing this out so others can see.
		kpn (Author, Not Done): The SystemZ tests target hardware new enough to lower to a single instruction. The tests for fptoui on x86 use the default lowering, but the default lowering is disallowed since it does speculative execution and traps. I had support for fixing that in this patch, but was asked to split it out into another patch. So there's no constrained fptoui test using the default lowering in this patch.
		declare i32 @llvm.experimental.constrained.fptoui.i32.f64(double, metadata)
declare float @llvm.experimental.constrained.fptrunc.f32.f64(double, metadata, metadata)		declare float @llvm.experimental.constrained.fptrunc.f32.f64(double, metadata, metadata)
declare double @llvm.experimental.constrained.fpext.f64.f32(float, metadata)		declare double @llvm.experimental.constrained.fpext.f64.f32(float, metadata)
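
The review comment above about decorated names can be illustrated with a small sketch. It assumes the naming pattern visible in the declarations in this diff: the return-type suffix comes first, then the argument-type suffix, and vector types are written as `v<lanes><element>`. The helpers `typeSuffix` and `constrainedConvName` are hypothetical and exist only for illustration; they are not LLVM APIs.

```cpp
#include <string>

// Hypothetical helper (not an LLVM API): builds a type suffix such as "f64"
// for a scalar (lanes == 0) or "v4f64" for a 4-lane vector.
std::string typeSuffix(const std::string &elem, unsigned lanes = 0) {
  return lanes == 0 ? elem : "v" + std::to_string(lanes) + elem;
}

// Hypothetical helper: composes the decorated name of a constrained
// conversion intrinsic. The return-type suffix is written first and the
// argument-type suffix second, matching the declarations in this diff.
std::string constrainedConvName(const std::string &op,
                                const std::string &retSuffix,
                                const std::string &argSuffix) {
  return "llvm.experimental.constrained." + op + "." + retSuffix + "." +
         argSuffix;
}
```

For example, `constrainedConvName("fptosi", typeSuffix("i32"), typeSuffix("f64"))` yields `llvm.experimental.constrained.fptosi.i32.f64`, the scalar declaration shown above.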

test/CodeGen/X86/vector-constrained-fp-intrinsics.ll

[... 3,825 lines not shown ...]	%min = call <4 x double> @llvm.experimental.constrained.minnum.v4f64(
double 46.0, double 47.0>,		double 46.0, double 47.0>,
<4 x double> <double 40.0, double 41.0,		<4 x double> <double 40.0, double 41.0,
double 42.0, double 43.0>,		double 42.0, double 43.0>,
metadata !"round.dynamic",		metadata !"round.dynamic",
metadata !"fpexcept.strict")		metadata !"fpexcept.strict")
ret <4 x double> %min		ret <4 x double> %min
}		}

		define <1 x i32> @constrained_vector_fptosi_v1i32_v1f32() {
		; CHECK-LABEL: constrained_vector_fptosi_v1i32_v1f32:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %eax
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptosi_v1i32_v1f32:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %eax
		; AVX-NEXT: retq
		entry:
		%result = call <1 x i32> @llvm.experimental.constrained.fptosi.v1i32.v1f32(
		<1 x float><float 42.0>,
		metadata !"fpexcept.strict")
		ret <1 x i32> %result
		}

		define <2 x i32> @constrained_vector_fptosi_v2i32_v2f32() {
		; CHECK-LABEL: constrained_vector_fptosi_v2i32_v2f32:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm0
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptosi_v2i32_v2f32:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm0
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
		; AVX-NEXT: retq
		entry:
		%result = call <2 x i32> @llvm.experimental.constrained.fptosi.v2i32.v2f32(
		<2 x float><float 42.0, float 43.0>,
		metadata !"fpexcept.strict")
		ret <2 x i32> %result
		}

		define <3 x i32> @constrained_vector_fptosi_v3i32_v3f32() {
		; CHECK-LABEL: constrained_vector_fptosi_v3i32_v3f32:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %eax
		; CHECK-NEXT: movd %eax, %xmm1
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %eax
		; CHECK-NEXT: movd %eax, %xmm0
		; CHECK-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %eax
		; CHECK-NEXT: movd %eax, %xmm1
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptosi_v3i32_v3f32:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %eax
		; AVX-NEXT: vmovd %eax, %xmm0
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %eax
		; AVX-NEXT: vpinsrd $1, %eax, %xmm0, %xmm0
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %eax
		; AVX-NEXT: vpinsrd $2, %eax, %xmm0, %xmm0
		; AVX-NEXT: retq
		entry:
		%result = call <3 x i32> @llvm.experimental.constrained.fptosi.v3i32.v3f32(
		<3 x float><float 42.0, float 43.0,
		float 44.0>,
		metadata !"fpexcept.strict")
		ret <3 x i32> %result
		}

		define <4 x i32> @constrained_vector_fptosi_v4i32_v4f32() {
		; CHECK-LABEL: constrained_vector_fptosi_v4i32_v4f32:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttps2dq {{.*}}(%rip), %xmm0
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptosi_v4i32_v4f32:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttps2dq {{.*}}(%rip), %xmm0
		; AVX-NEXT: retq
		entry:
		%result = call <4 x i32> @llvm.experimental.constrained.fptosi.v4i32.v4f32(
		<4 x float><float 42.0, float 43.0,
		float 44.0, float 45.0>,
		metadata !"fpexcept.strict")
		ret <4 x i32> %result
		}

		define <1 x i64> @constrained_vector_fptosi_v1i64_v1f32() {
		; CHECK-LABEL: constrained_vector_fptosi_v1i64_v1f32:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptosi_v1i64_v1f32:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: retq
		entry:
		%result = call <1 x i64> @llvm.experimental.constrained.fptosi.v1i64.v1f32(
		<1 x float><float 42.0>,
		metadata !"fpexcept.strict")
		ret <1 x i64> %result
		}

		define <2 x i64> @constrained_vector_fptosi_v2i64_v2f32() {
		; CHECK-LABEL: constrained_vector_fptosi_v2i64_v2f32:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm0
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptosi_v2i64_v2f32:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm0
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
		; AVX-NEXT: retq
		entry:
		%result = call <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f32(
		<2 x float><float 42.0, float 43.0>,
		metadata !"fpexcept.strict")
		ret <2 x i64> %result
		}

		define <3 x i64> @constrained_vector_fptosi_v3i64_v3f32() {
		; CHECK-LABEL: constrained_vector_fptosi_v3i64_v3f32:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rdx
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rcx
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptosi_v3i64_v3f32:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm0
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
		; AVX-NEXT: retq
		entry:
		%result = call <3 x i64> @llvm.experimental.constrained.fptosi.v3i64.v3f32(
		<3 x float><float 42.0, float 43.0,
		float 44.0>,
		metadata !"fpexcept.strict")
		ret <3 x i64> %result
		}

		define <4 x i64> @constrained_vector_fptosi_v4i64_v4f32() {
		; CHECK-LABEL: constrained_vector_fptosi_v4i64_v4f32:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm0
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm2
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptosi_v4i64_v4f32:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm0
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm2
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
		; AVX-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
		; AVX-NEXT: retq
		entry:
		%result = call <4 x i64> @llvm.experimental.constrained.fptosi.v4i64.v4f32(
		<4 x float><float 42.0, float 43.0,
		float 44.0, float 45.0>,
		metadata !"fpexcept.strict")
		cameron.mcinally: This surprised me. Should this be the truncating convert? Or should it be vcvtsd2si?
		kpn (author): Yes, it should be a truncating conversion.
		ret <4 x i64> %result
		}
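
The exchange above turns on the difference between a truncating conversion (the `cvtt*` instructions, which always round toward zero) and one that honors the current rounding mode (the non-truncating `cvtsd2si` family). A minimal C++ sketch of that difference follows. It assumes the default round-to-nearest mode is in effect, and the function names are illustrative only.

```cpp
#include <cmath>

// Truncating conversion: a C++ static_cast from floating point to integer
// always discards the fractional part (rounds toward zero).
long long convertTruncating(double x) {
  return static_cast<long long>(x);
}

// Rounding-mode-dependent conversion: std::llrint rounds according to the
// current floating-point rounding mode (round-to-nearest by default).
long long convertCurrentMode(double x) {
  return std::llrint(x);
}
```

With an input of 42.7, `convertTruncating` gives 42, while `convertCurrentMode` gives 43 under the assumed default rounding mode. The test inputs in this file (42.1, 42.2, and so on) convert to the same integer either way, so the choice only shows up in which instruction is selected.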

		define <1 x i32> @constrained_vector_fptosi_v1i32_v1f64() {
		; CHECK-LABEL: constrained_vector_fptosi_v1i32_v1f64:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %eax
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptosi_v1i32_v1f64:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %eax
		; AVX-NEXT: retq
		entry:
		%result = call <1 x i32> @llvm.experimental.constrained.fptosi.v1i32.v1f64(
		<1 x double><double 42.1>,
		metadata !"fpexcept.strict")
		ret <1 x i32> %result
		}


		define <2 x i32> @constrained_vector_fptosi_v2i32_v2f64() {
		; CHECK-LABEL: constrained_vector_fptosi_v2i32_v2f64:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm0
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptosi_v2i32_v2f64:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm0
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
		; AVX-NEXT: retq
		entry:
		%result = call <2 x i32> @llvm.experimental.constrained.fptosi.v2i32.v2f64(
		<2 x double><double 42.1, double 42.2>,
		metadata !"fpexcept.strict")
		ret <2 x i32> %result
		}

		define <3 x i32> @constrained_vector_fptosi_v3i32_v3f64() {
		; CHECK-LABEL: constrained_vector_fptosi_v3i32_v3f64:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %eax
		; CHECK-NEXT: movd %eax, %xmm1
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %eax
		; CHECK-NEXT: movd %eax, %xmm0
		; CHECK-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %eax
		; CHECK-NEXT: movd %eax, %xmm1
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptosi_v3i32_v3f64:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %eax
		; AVX-NEXT: vmovd %eax, %xmm0
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %eax
		; AVX-NEXT: vpinsrd $1, %eax, %xmm0, %xmm0
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %eax
		; AVX-NEXT: vpinsrd $2, %eax, %xmm0, %xmm0
		; AVX-NEXT: retq
		entry:
		%result = call <3 x i32> @llvm.experimental.constrained.fptosi.v3i32.v3f64(
		<3 x double><double 42.1, double 42.2,
		double 42.3>,
		metadata !"fpexcept.strict")
		ret <3 x i32> %result
		}

		define <4 x i32> @constrained_vector_fptosi_v4i32_v4f64() {
		; CHECK-LABEL: constrained_vector_fptosi_v4i32_v4f64:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm0
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm2
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm0
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
		; CHECK-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptosi_v4i32_v4f64:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttpd2dqy {{.*}}(%rip), %xmm0
		; AVX-NEXT: retq
		entry:
		%result = call <4 x i32> @llvm.experimental.constrained.fptosi.v4i32.v4f64(
		<4 x double><double 42.1, double 42.2,
		double 42.3, double 42.4>,
		metadata !"fpexcept.strict")
		ret <4 x i32> %result
		}

		define <1 x i64> @constrained_vector_fptosi_v1i64_v1f64() {
		; CHECK-LABEL: constrained_vector_fptosi_v1i64_v1f64:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptosi_v1i64_v1f64:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: retq
		entry:
		%result = call <1 x i64> @llvm.experimental.constrained.fptosi.v1i64.v1f64(
		<1 x double><double 42.1>,
		metadata !"fpexcept.strict")
		ret <1 x i64> %result
		}

		define <2 x i64> @constrained_vector_fptosi_v2i64_v2f64() {
		; CHECK-LABEL: constrained_vector_fptosi_v2i64_v2f64:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm0
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptosi_v2i64_v2f64:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm0
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
		; AVX-NEXT: retq
		entry:
		%result = call <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f64(
		<2 x double><double 42.1, double 42.2>,
		metadata !"fpexcept.strict")
		ret <2 x i64> %result
		}

		define <3 x i64> @constrained_vector_fptosi_v3i64_v3f64() {
		; CHECK-LABEL: constrained_vector_fptosi_v3i64_v3f64:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rdx
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rcx
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptosi_v3i64_v3f64:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm0
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
		; AVX-NEXT: retq
		entry:
		%result = call <3 x i64> @llvm.experimental.constrained.fptosi.v3i64.v3f64(
		<3 x double><double 42.1, double 42.2,
		double 42.3>,
		metadata !"fpexcept.strict")
		ret <3 x i64> %result
		}

		define <4 x i64> @constrained_vector_fptosi_v4i64_v4f64() {
		; CHECK-LABEL: constrained_vector_fptosi_v4i64_v4f64:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm0
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm2
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptosi_v4i64_v4f64:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm0
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm2
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
		; AVX-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
		; AVX-NEXT: retq
		entry:
		%result = call <4 x i64> @llvm.experimental.constrained.fptosi.v4i64.v4f64(
		<4 x double><double 42.1, double 42.2,
		double 42.3, double 42.4>,
		metadata !"fpexcept.strict")
		ret <4 x i64> %result
		}

		define <1 x i32> @constrained_vector_fptoui_v1i32_v1f32() {
		; CHECK-LABEL: constrained_vector_fptoui_v1i32_v1f32:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %eax
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptoui_v1i32_v1f32:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %eax
		; AVX-NEXT: retq
		entry:
		%result = call <1 x i32> @llvm.experimental.constrained.fptoui.v1i32.v1f32(
		<1 x float><float 42.0>,
		metadata !"fpexcept.strict")
		ret <1 x i32> %result
		}

		define <2 x i32> @constrained_vector_fptoui_v2i32_v2f32() {
		; CHECK-LABEL: constrained_vector_fptoui_v2i32_v2f32:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm0
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptoui_v2i32_v2f32:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm0
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
		; AVX-NEXT: retq
		entry:
		%result = call <2 x i32> @llvm.experimental.constrained.fptoui.v2i32.v2f32(
		<2 x float><float 42.0, float 43.0>,
		metadata !"fpexcept.strict")
		ret <2 x i32> %result
		}

		define <3 x i32> @constrained_vector_fptoui_v3i32_v3f32() {
		; CHECK-LABEL: constrained_vector_fptoui_v3i32_v3f32:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %eax
		; CHECK-NEXT: movd %eax, %xmm1
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %eax
		; CHECK-NEXT: movd %eax, %xmm0
		; CHECK-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %eax
		; CHECK-NEXT: movd %eax, %xmm1
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptoui_v3i32_v3f32:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %eax
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %ecx
		; AVX-NEXT: vmovd %ecx, %xmm0
		; AVX-NEXT: vpinsrd $1, %eax, %xmm0, %xmm0
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %eax
		; AVX-NEXT: vpinsrd $2, %eax, %xmm0, %xmm0
		; AVX-NEXT: retq
		entry:
		%result = call <3 x i32> @llvm.experimental.constrained.fptoui.v3i32.v3f32(
		<3 x float><float 42.0, float 43.0,
		float 44.0>,
		metadata !"fpexcept.strict")
		ret <3 x i32> %result
		}

		define <4 x i32> @constrained_vector_fptoui_v4i32_v4f32() {
		; CHECK-LABEL: constrained_vector_fptoui_v4i32_v4f32:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %eax
		; CHECK-NEXT: movd %eax, %xmm0
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %eax
		; CHECK-NEXT: movd %eax, %xmm1
		; CHECK-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %eax
		; CHECK-NEXT: movd %eax, %xmm2
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %eax
		; CHECK-NEXT: movd %eax, %xmm0
		; CHECK-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptoui_v4i32_v4f32:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %eax
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %ecx
		; AVX-NEXT: vmovd %ecx, %xmm0
		; AVX-NEXT: vpinsrd $1, %eax, %xmm0, %xmm0
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %eax
		; AVX-NEXT: vpinsrd $2, %eax, %xmm0, %xmm0
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %eax
		; AVX-NEXT: vpinsrd $3, %eax, %xmm0, %xmm0
		; AVX-NEXT: retq
		entry:
		%result = call <4 x i32> @llvm.experimental.constrained.fptoui.v4i32.v4f32(
		<4 x float><float 42.0, float 43.0,
		float 44.0, float 45.0>,
		metadata !"fpexcept.strict")
		ret <4 x i32> %result
		}

		define <1 x i64> @constrained_vector_fptoui_v1i64_v1f32() {
		; CHECK-LABEL: constrained_vector_fptoui_v1i64_v1f32:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptoui_v1i64_v1f32:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: retq
		entry:
		%result = call <1 x i64> @llvm.experimental.constrained.fptoui.v1i64.v1f32(
		<1 x float><float 42.0>,
		metadata !"fpexcept.strict")
		ret <1 x i64> %result
		}

		define <2 x i64> @constrained_vector_fptoui_v2i64_v2f32() {
		; CHECK-LABEL: constrained_vector_fptoui_v2i64_v2f32:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm0
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptoui_v2i64_v2f32:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm0
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
		; AVX-NEXT: retq
		entry:
		%result = call <2 x i64> @llvm.experimental.constrained.fptoui.v2i64.v2f32(
		<2 x float><float 42.0, float 43.0>,
		metadata !"fpexcept.strict")
		ret <2 x i64> %result
		}

		define <3 x i64> @constrained_vector_fptoui_v3i64_v3f32() {
		; CHECK-LABEL: constrained_vector_fptoui_v3i64_v3f32:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rdx
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rcx
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptoui_v3i64_v3f32:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm0
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
		; AVX-NEXT: retq
		entry:
		%result = call <3 x i64> @llvm.experimental.constrained.fptoui.v3i64.v3f32(
		<3 x float><float 42.0, float 43.0,
		float 44.0>,
		metadata !"fpexcept.strict")
		ret <3 x i64> %result
		}

		define <4 x i64> @constrained_vector_fptoui_v4i64_v4f32() {
		; CHECK-LABEL: constrained_vector_fptoui_v4i64_v4f32:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm0
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm2
		; CHECK-NEXT: cvttss2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptoui_v4i64_v4f32:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm0
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vcvttss2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm2
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
		; AVX-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
		; AVX-NEXT: retq
		entry:
		%result = call <4 x i64> @llvm.experimental.constrained.fptoui.v4i64.v4f32(
		<4 x float><float 42.0, float 43.0,
		float 44.0, float 45.0>,
		metadata !"fpexcept.strict")
		ret <4 x i64> %result
		}

		define <1 x i32> @constrained_vector_fptoui_v1i32_v1f64() {
		; CHECK-LABEL: constrained_vector_fptoui_v1i32_v1f64:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %eax
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptoui_v1i32_v1f64:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %eax
		; AVX-NEXT: retq
		entry:
		%result = call <1 x i32> @llvm.experimental.constrained.fptoui.v1i32.v1f64(
		<1 x double><double 42.1>,
		metadata !"fpexcept.strict")
		ret <1 x i32> %result
		}

		define <2 x i32> @constrained_vector_fptoui_v2i32_v2f64() {
		; CHECK-LABEL: constrained_vector_fptoui_v2i32_v2f64:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm0
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptoui_v2i32_v2f64:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm0
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
		; AVX-NEXT: retq
		entry:
		%result = call <2 x i32> @llvm.experimental.constrained.fptoui.v2i32.v2f64(
		<2 x double><double 42.1, double 42.2>,
		metadata !"fpexcept.strict")
		ret <2 x i32> %result
		}

		define <3 x i32> @constrained_vector_fptoui_v3i32_v3f64() {
		; CHECK-LABEL: constrained_vector_fptoui_v3i32_v3f64:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %eax
		; CHECK-NEXT: movd %eax, %xmm1
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %eax
		; CHECK-NEXT: movd %eax, %xmm0
		; CHECK-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %eax
		; CHECK-NEXT: movd %eax, %xmm1
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptoui_v3i32_v3f64:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %eax
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %ecx
		; AVX-NEXT: vmovd %ecx, %xmm0
		; AVX-NEXT: vpinsrd $1, %eax, %xmm0, %xmm0
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %eax
		; AVX-NEXT: vpinsrd $2, %eax, %xmm0, %xmm0
		; AVX-NEXT: retq
		entry:
		%result = call <3 x i32> @llvm.experimental.constrained.fptoui.v3i32.v3f64(
		<3 x double><double 42.1, double 42.2,
		double 42.3>,
		metadata !"fpexcept.strict")
		ret <3 x i32> %result
		}

		define <4 x i32> @constrained_vector_fptoui_v4i32_v4f64() {
		; CHECK-LABEL: constrained_vector_fptoui_v4i32_v4f64:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm0
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm0[0]
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm2
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm0
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm2[0]
		; CHECK-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptoui_v4i32_v4f64:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %eax
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %ecx
		; AVX-NEXT: vmovd %ecx, %xmm0
		; AVX-NEXT: vpinsrd $1, %eax, %xmm0, %xmm0
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %eax
		; AVX-NEXT: vpinsrd $2, %eax, %xmm0, %xmm0
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %eax
		; AVX-NEXT: vpinsrd $3, %eax, %xmm0, %xmm0
		; AVX-NEXT: retq
		entry:
		%result = call <4 x i32> @llvm.experimental.constrained.fptoui.v4i32.v4f64(
		<4 x double><double 42.1, double 42.2,
		double 42.3, double 42.4>,
		metadata !"fpexcept.strict")
		ret <4 x i32> %result
		}

		define <1 x i64> @constrained_vector_fptoui_v1i64_v1f64() {
		; CHECK-LABEL: constrained_vector_fptoui_v1i64_v1f64:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptoui_v1i64_v1f64:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: retq
		entry:
		%result = call <1 x i64> @llvm.experimental.constrained.fptoui.v1i64.v1f64(
		<1 x double><double 42.1>,
		metadata !"fpexcept.strict")
		ret <1 x i64> %result
		}

		define <2 x i64> @constrained_vector_fptoui_v2i64_v2f64() {
		; CHECK-LABEL: constrained_vector_fptoui_v2i64_v2f64:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm0
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptoui_v2i64_v2f64:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm0
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
		; AVX-NEXT: retq
		entry:
		%result = call <2 x i64> @llvm.experimental.constrained.fptoui.v2i64.v2f64(
		<2 x double><double 42.1, double 42.2>,
		metadata !"fpexcept.strict")
		ret <2 x i64> %result
		}

		define <3 x i64> @constrained_vector_fptoui_v3i64_v3f64() {
		; CHECK-LABEL: constrained_vector_fptoui_v3i64_v3f64:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rdx
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rcx
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptoui_v3i64_v3f64:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm0
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
		; AVX-NEXT: retq
		entry:
		%result = call <3 x i64> @llvm.experimental.constrained.fptoui.v3i64.v3f64(
		<3 x double><double 42.1, double 42.2,
		double 42.3>,
		metadata !"fpexcept.strict")
		ret <3 x i64> %result
		}

		define <4 x i64> @constrained_vector_fptoui_v4i64_v4f64() {
		; CHECK-LABEL: constrained_vector_fptoui_v4i64_v4f64:
		; CHECK: # %bb.0: # %entry
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm0
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm2
		; CHECK-NEXT: cvttsd2si {{.*}}(%rip), %rax
		; CHECK-NEXT: movq %rax, %xmm1
		; CHECK-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]
		; CHECK-NEXT: retq
		;
		; AVX-LABEL: constrained_vector_fptoui_v4i64_v4f64:
		; AVX: # %bb.0: # %entry
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm0
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm1
		; AVX-NEXT: vcvttsd2si {{.*}}(%rip), %rax
		; AVX-NEXT: vmovq %rax, %xmm2
		; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm2[0],xmm1[0]
		; AVX-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
		; AVX-NEXT: retq
		entry:
		%result = call <4 x i64> @llvm.experimental.constrained.fptoui.v4i64.v4f64(
		<4 x double><double 42.1, double 42.2,
		double 42.3, double 42.4>,
		metadata !"fpexcept.strict")
		ret <4 x i64> %result
		}


define <1 x float> @constrained_vector_fptrunc_v1f64() {		define <1 x float> @constrained_vector_fptrunc_v1f64() {
; CHECK-LABEL: constrained_vector_fptrunc_v1f64:		; CHECK-LABEL: constrained_vector_fptrunc_v1f64:
; CHECK: # %bb.0: # %entry		; CHECK: # %bb.0: # %entry
; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero		; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
; CHECK-NEXT: cvtsd2ss %xmm0, %xmm0		; CHECK-NEXT: cvtsd2ss %xmm0, %xmm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
;		;
; AVX-LABEL: constrained_vector_fptrunc_v1f64:		; AVX-LABEL: constrained_vector_fptrunc_v1f64:
[... 777 lines not shown ...]
declare <2 x double> @llvm.experimental.constrained.exp2.v2f64(<2 x double>, metadata, metadata)		declare <2 x double> @llvm.experimental.constrained.exp2.v2f64(<2 x double>, metadata, metadata)
declare <2 x double> @llvm.experimental.constrained.log.v2f64(<2 x double>, metadata, metadata)		declare <2 x double> @llvm.experimental.constrained.log.v2f64(<2 x double>, metadata, metadata)
declare <2 x double> @llvm.experimental.constrained.log10.v2f64(<2 x double>, metadata, metadata)		declare <2 x double> @llvm.experimental.constrained.log10.v2f64(<2 x double>, metadata, metadata)
declare <2 x double> @llvm.experimental.constrained.log2.v2f64(<2 x double>, metadata, metadata)		declare <2 x double> @llvm.experimental.constrained.log2.v2f64(<2 x double>, metadata, metadata)
declare <2 x double> @llvm.experimental.constrained.rint.v2f64(<2 x double>, metadata, metadata)		declare <2 x double> @llvm.experimental.constrained.rint.v2f64(<2 x double>, metadata, metadata)
declare <2 x double> @llvm.experimental.constrained.nearbyint.v2f64(<2 x double>, metadata, metadata)		declare <2 x double> @llvm.experimental.constrained.nearbyint.v2f64(<2 x double>, metadata, metadata)
declare <2 x double> @llvm.experimental.constrained.maxnum.v2f64(<2 x double>, <2 x double>, metadata, metadata)		declare <2 x double> @llvm.experimental.constrained.maxnum.v2f64(<2 x double>, <2 x double>, metadata, metadata)
declare <2 x double> @llvm.experimental.constrained.minnum.v2f64(<2 x double>, <2 x double>, metadata, metadata)		declare <2 x double> @llvm.experimental.constrained.minnum.v2f64(<2 x double>, <2 x double>, metadata, metadata)
		declare <2 x i32> @llvm.experimental.constrained.fptosi.v2i32.v2f32(<2 x float>, metadata)
		declare <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f32(<2 x float>, metadata)
		declare <2 x i32> @llvm.experimental.constrained.fptosi.v2i32.v2f64(<2 x double>, metadata)
		declare <2 x i64> @llvm.experimental.constrained.fptosi.v2i64.v2f64(<2 x double>, metadata)
		declare <2 x i32> @llvm.experimental.constrained.fptoui.v2i32.v2f32(<2 x float>, metadata)
		declare <2 x i64> @llvm.experimental.constrained.fptoui.v2i64.v2f32(<2 x float>, metadata)
		declare <2 x i32> @llvm.experimental.constrained.fptoui.v2i32.v2f64(<2 x double>, metadata)
		declare <2 x i64> @llvm.experimental.constrained.fptoui.v2i64.v2f64(<2 x double>, metadata)
declare <2 x float> @llvm.experimental.constrained.fptrunc.v2f32.v2f64(<2 x double>, metadata, metadata)		declare <2 x float> @llvm.experimental.constrained.fptrunc.v2f32.v2f64(<2 x double>, metadata, metadata)
declare <2 x double> @llvm.experimental.constrained.fpext.v2f64.v2f32(<2 x float>, metadata)		declare <2 x double> @llvm.experimental.constrained.fpext.v2f64.v2f32(<2 x float>, metadata)
declare <2 x double> @llvm.experimental.constrained.ceil.v2f64(<2 x double>, metadata, metadata)		declare <2 x double> @llvm.experimental.constrained.ceil.v2f64(<2 x double>, metadata, metadata)
declare <2 x double> @llvm.experimental.constrained.floor.v2f64(<2 x double>, metadata, metadata)		declare <2 x double> @llvm.experimental.constrained.floor.v2f64(<2 x double>, metadata, metadata)
declare <2 x double> @llvm.experimental.constrained.round.v2f64(<2 x double>, metadata, metadata)		declare <2 x double> @llvm.experimental.constrained.round.v2f64(<2 x double>, metadata, metadata)
declare <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double>, metadata, metadata)		declare <2 x double> @llvm.experimental.constrained.trunc.v2f64(<2 x double>, metadata, metadata)

; Scalar width declarations		; Scalar width declarations
[11 lines not shown]
declare <1 x float> @llvm.experimental.constrained.exp2.v1f32(<1 x float>, metadata, metadata)		declare <1 x float> @llvm.experimental.constrained.exp2.v1f32(<1 x float>, metadata, metadata)
declare <1 x float> @llvm.experimental.constrained.log.v1f32(<1 x float>, metadata, metadata)		declare <1 x float> @llvm.experimental.constrained.log.v1f32(<1 x float>, metadata, metadata)
declare <1 x float> @llvm.experimental.constrained.log10.v1f32(<1 x float>, metadata, metadata)		declare <1 x float> @llvm.experimental.constrained.log10.v1f32(<1 x float>, metadata, metadata)
declare <1 x float> @llvm.experimental.constrained.log2.v1f32(<1 x float>, metadata, metadata)		declare <1 x float> @llvm.experimental.constrained.log2.v1f32(<1 x float>, metadata, metadata)
declare <1 x float> @llvm.experimental.constrained.rint.v1f32(<1 x float>, metadata, metadata)		declare <1 x float> @llvm.experimental.constrained.rint.v1f32(<1 x float>, metadata, metadata)
declare <1 x float> @llvm.experimental.constrained.nearbyint.v1f32(<1 x float>, metadata, metadata)		declare <1 x float> @llvm.experimental.constrained.nearbyint.v1f32(<1 x float>, metadata, metadata)
declare <1 x float> @llvm.experimental.constrained.maxnum.v1f32(<1 x float>, <1 x float>, metadata, metadata)		declare <1 x float> @llvm.experimental.constrained.maxnum.v1f32(<1 x float>, <1 x float>, metadata, metadata)
declare <1 x float> @llvm.experimental.constrained.minnum.v1f32(<1 x float>, <1 x float>, metadata, metadata)		declare <1 x float> @llvm.experimental.constrained.minnum.v1f32(<1 x float>, <1 x float>, metadata, metadata)
		declare <1 x i32> @llvm.experimental.constrained.fptosi.v1i32.v1f32(<1 x float>, metadata)
		declare <1 x i64> @llvm.experimental.constrained.fptosi.v1i64.v1f32(<1 x float>, metadata)
		declare <1 x i32> @llvm.experimental.constrained.fptosi.v1i32.v1f64(<1 x double>, metadata)
		declare <1 x i64> @llvm.experimental.constrained.fptosi.v1i64.v1f64(<1 x double>, metadata)
		declare <1 x i32> @llvm.experimental.constrained.fptoui.v1i32.v1f32(<1 x float>, metadata)
		declare <1 x i64> @llvm.experimental.constrained.fptoui.v1i64.v1f32(<1 x float>, metadata)
		declare <1 x i32> @llvm.experimental.constrained.fptoui.v1i32.v1f64(<1 x double>, metadata)
		declare <1 x i64> @llvm.experimental.constrained.fptoui.v1i64.v1f64(<1 x double>, metadata)
declare <1 x float> @llvm.experimental.constrained.fptrunc.v1f32.v1f64(<1 x double>, metadata, metadata)		declare <1 x float> @llvm.experimental.constrained.fptrunc.v1f32.v1f64(<1 x double>, metadata, metadata)
declare <1 x double> @llvm.experimental.constrained.fpext.v1f64.v1f32(<1 x float>, metadata)		declare <1 x double> @llvm.experimental.constrained.fpext.v1f64.v1f32(<1 x float>, metadata)
declare <1 x float> @llvm.experimental.constrained.ceil.v1f32(<1 x float>, metadata, metadata)		declare <1 x float> @llvm.experimental.constrained.ceil.v1f32(<1 x float>, metadata, metadata)
declare <1 x float> @llvm.experimental.constrained.floor.v1f32(<1 x float>, metadata, metadata)		declare <1 x float> @llvm.experimental.constrained.floor.v1f32(<1 x float>, metadata, metadata)
declare <1 x float> @llvm.experimental.constrained.round.v1f32(<1 x float>, metadata, metadata)		declare <1 x float> @llvm.experimental.constrained.round.v1f32(<1 x float>, metadata, metadata)
declare <1 x float> @llvm.experimental.constrained.trunc.v1f32(<1 x float>, metadata, metadata)		declare <1 x float> @llvm.experimental.constrained.trunc.v1f32(<1 x float>, metadata, metadata)

; Illegal width declarations		; Illegal width declarations
[30 lines not shown]
declare <3 x float> @llvm.experimental.constrained.rint.v3f32(<3 x float>, metadata, metadata)		declare <3 x float> @llvm.experimental.constrained.rint.v3f32(<3 x float>, metadata, metadata)
declare <3 x double> @llvm.experimental.constrained.rint.v3f64(<3 x double>, metadata, metadata)		declare <3 x double> @llvm.experimental.constrained.rint.v3f64(<3 x double>, metadata, metadata)
declare <3 x float> @llvm.experimental.constrained.nearbyint.v3f32(<3 x float>, metadata, metadata)		declare <3 x float> @llvm.experimental.constrained.nearbyint.v3f32(<3 x float>, metadata, metadata)
declare <3 x double> @llvm.experimental.constrained.nearbyint.v3f64(<3 x double>, metadata, metadata)		declare <3 x double> @llvm.experimental.constrained.nearbyint.v3f64(<3 x double>, metadata, metadata)
declare <3 x float> @llvm.experimental.constrained.maxnum.v3f32(<3 x float>, <3 x float>, metadata, metadata)		declare <3 x float> @llvm.experimental.constrained.maxnum.v3f32(<3 x float>, <3 x float>, metadata, metadata)
declare <3 x double> @llvm.experimental.constrained.maxnum.v3f64(<3 x double>, <3 x double>, metadata, metadata)		declare <3 x double> @llvm.experimental.constrained.maxnum.v3f64(<3 x double>, <3 x double>, metadata, metadata)
declare <3 x float> @llvm.experimental.constrained.minnum.v3f32(<3 x float>, <3 x float>, metadata, metadata)		declare <3 x float> @llvm.experimental.constrained.minnum.v3f32(<3 x float>, <3 x float>, metadata, metadata)
declare <3 x double> @llvm.experimental.constrained.minnum.v3f64(<3 x double>, <3 x double>, metadata, metadata)		declare <3 x double> @llvm.experimental.constrained.minnum.v3f64(<3 x double>, <3 x double>, metadata, metadata)
		declare <3 x i32> @llvm.experimental.constrained.fptosi.v3i32.v3f32(<3 x float>, metadata)
		declare <3 x i64> @llvm.experimental.constrained.fptosi.v3i64.v3f32(<3 x float>, metadata)
		declare <3 x i32> @llvm.experimental.constrained.fptosi.v3i32.v3f64(<3 x double>, metadata)
		declare <3 x i64> @llvm.experimental.constrained.fptosi.v3i64.v3f64(<3 x double>, metadata)
		declare <3 x i32> @llvm.experimental.constrained.fptoui.v3i32.v3f32(<3 x float>, metadata)
		declare <3 x i64> @llvm.experimental.constrained.fptoui.v3i64.v3f32(<3 x float>, metadata)
		declare <3 x i32> @llvm.experimental.constrained.fptoui.v3i32.v3f64(<3 x double>, metadata)
		declare <3 x i64> @llvm.experimental.constrained.fptoui.v3i64.v3f64(<3 x double>, metadata)
declare <3 x float> @llvm.experimental.constrained.fptrunc.v3f32.v3f64(<3 x double>, metadata, metadata)		declare <3 x float> @llvm.experimental.constrained.fptrunc.v3f32.v3f64(<3 x double>, metadata, metadata)
declare <3 x double> @llvm.experimental.constrained.fpext.v3f64.v3f32(<3 x float>, metadata)		declare <3 x double> @llvm.experimental.constrained.fpext.v3f64.v3f32(<3 x float>, metadata)
declare <3 x float> @llvm.experimental.constrained.ceil.v3f32(<3 x float>, metadata, metadata)		declare <3 x float> @llvm.experimental.constrained.ceil.v3f32(<3 x float>, metadata, metadata)
declare <3 x double> @llvm.experimental.constrained.ceil.v3f64(<3 x double>, metadata, metadata)		declare <3 x double> @llvm.experimental.constrained.ceil.v3f64(<3 x double>, metadata, metadata)
declare <3 x float> @llvm.experimental.constrained.floor.v3f32(<3 x float>, metadata, metadata)		declare <3 x float> @llvm.experimental.constrained.floor.v3f32(<3 x float>, metadata, metadata)
declare <3 x double> @llvm.experimental.constrained.floor.v3f64(<3 x double>, metadata, metadata)		declare <3 x double> @llvm.experimental.constrained.floor.v3f64(<3 x double>, metadata, metadata)
declare <3 x float> @llvm.experimental.constrained.round.v3f32(<3 x float>, metadata, metadata)		declare <3 x float> @llvm.experimental.constrained.round.v3f32(<3 x float>, metadata, metadata)
declare <3 x double> @llvm.experimental.constrained.round.v3f64(<3 x double>, metadata, metadata)		declare <3 x double> @llvm.experimental.constrained.round.v3f64(<3 x double>, metadata, metadata)
[15 lines not shown]
declare <4 x double> @llvm.experimental.constrained.exp2.v4f64(<4 x double>, metadata, metadata)		declare <4 x double> @llvm.experimental.constrained.exp2.v4f64(<4 x double>, metadata, metadata)
declare <4 x double> @llvm.experimental.constrained.log.v4f64(<4 x double>, metadata, metadata)		declare <4 x double> @llvm.experimental.constrained.log.v4f64(<4 x double>, metadata, metadata)
declare <4 x double> @llvm.experimental.constrained.log10.v4f64(<4 x double>, metadata, metadata)		declare <4 x double> @llvm.experimental.constrained.log10.v4f64(<4 x double>, metadata, metadata)
declare <4 x double> @llvm.experimental.constrained.log2.v4f64(<4 x double>, metadata, metadata)		declare <4 x double> @llvm.experimental.constrained.log2.v4f64(<4 x double>, metadata, metadata)
declare <4 x double> @llvm.experimental.constrained.rint.v4f64(<4 x double>, metadata, metadata)		declare <4 x double> @llvm.experimental.constrained.rint.v4f64(<4 x double>, metadata, metadata)
declare <4 x double> @llvm.experimental.constrained.nearbyint.v4f64(<4 x double>, metadata, metadata)		declare <4 x double> @llvm.experimental.constrained.nearbyint.v4f64(<4 x double>, metadata, metadata)
declare <4 x double> @llvm.experimental.constrained.maxnum.v4f64(<4 x double>, <4 x double>, metadata, metadata)		declare <4 x double> @llvm.experimental.constrained.maxnum.v4f64(<4 x double>, <4 x double>, metadata, metadata)
declare <4 x double> @llvm.experimental.constrained.minnum.v4f64(<4 x double>, <4 x double>, metadata, metadata)		declare <4 x double> @llvm.experimental.constrained.minnum.v4f64(<4 x double>, <4 x double>, metadata, metadata)
		declare <4 x i32> @llvm.experimental.constrained.fptosi.v4i32.v4f32(<4 x float>, metadata)
		declare <4 x i64> @llvm.experimental.constrained.fptosi.v4i64.v4f32(<4 x float>, metadata)
		cameron.mcinally: Same question as the scalar versions. Do we want <4 x float> to <4 x i32>/etc casts?
		kpn (author): On the odd chance that it may tickle the vector legalizer I'll go ahead and add tests with float. Same answer as the scalar tests: Rounding towards zero is correct.
		declare <4 x i32> @llvm.experimental.constrained.fptosi.v4i32.v4f64(<4 x double>, metadata)
		declare <4 x i64> @llvm.experimental.constrained.fptosi.v4i64.v4f64(<4 x double>, metadata)
		declare <4 x i32> @llvm.experimental.constrained.fptoui.v4i32.v4f32(<4 x float>, metadata)
		declare <4 x i64> @llvm.experimental.constrained.fptoui.v4i64.v4f32(<4 x float>, metadata)
		declare <4 x i32> @llvm.experimental.constrained.fptoui.v4i32.v4f64(<4 x double>, metadata)
		declare <4 x i64> @llvm.experimental.constrained.fptoui.v4i64.v4f64(<4 x double>, metadata)
declare <4 x float> @llvm.experimental.constrained.fptrunc.v4f32.v4f64(<4 x double>, metadata, metadata)		declare <4 x float> @llvm.experimental.constrained.fptrunc.v4f32.v4f64(<4 x double>, metadata, metadata)
declare <4 x double> @llvm.experimental.constrained.fpext.v4f64.v4f32(<4 x float>, metadata)		declare <4 x double> @llvm.experimental.constrained.fpext.v4f64.v4f32(<4 x float>, metadata)
declare <4 x double> @llvm.experimental.constrained.ceil.v4f64(<4 x double>, metadata, metadata)		declare <4 x double> @llvm.experimental.constrained.ceil.v4f64(<4 x double>, metadata, metadata)
declare <4 x double> @llvm.experimental.constrained.floor.v4f64(<4 x double>, metadata, metadata)		declare <4 x double> @llvm.experimental.constrained.floor.v4f64(<4 x double>, metadata, metadata)
declare <4 x double> @llvm.experimental.constrained.round.v4f64(<4 x double>, metadata, metadata)		declare <4 x double> @llvm.experimental.constrained.round.v4f64(<4 x double>, metadata, metadata)
declare <4 x double> @llvm.experimental.constrained.trunc.v4f64(<4 x double>, metadata, metadata)		declare <4 x double> @llvm.experimental.constrained.trunc.v4f64(<4 x double>, metadata, metadata)

test/Feature/fp-intrinsics.ll

	[236 lines not shown]
	define double @f17() {			define double @f17() {
	entry:			entry:
	%result = call double @llvm.experimental.constrained.fma.f64(double 42.1, double 42.1, double 42.1,			%result = call double @llvm.experimental.constrained.fma.f64(double 42.1, double 42.1, double 42.1,
	metadata !"round.dynamic",			metadata !"round.dynamic",
	metadata !"fpexcept.strict")			metadata !"fpexcept.strict")
	ret double %result			ret double %result
	}			}

				; Verify that fptoui(42.1) isn't simplified when the rounding mode is
				; unknown.
				; CHECK-LABEL: f18
				; CHECK: call zeroext i32 @llvm.experimental.constrained.fptoui
				define zeroext i32 @f18() {
				entry:
				%result = call zeroext i32 @llvm.experimental.constrained.fptoui.i32.f64(
				double 42.1,
				metadata !"fpexcept.strict")
				ret i32 %result
				}

				; Verify that fptosi(42.1) isn't simplified when the rounding mode is
				; unknown.
				; CHECK-LABEL: f19
				; CHECK: call i32 @llvm.experimental.constrained.fptosi
				define i32 @f19() {
				entry:
				%result = call i32 @llvm.experimental.constrained.fptosi.i32.f64(double 42.1,
				metadata !"fpexcept.strict")
				ret i32 %result
				}

	; Verify that fptrunc(42.1) isn't simplified when the rounding mode is			; Verify that fptrunc(42.1) isn't simplified when the rounding mode is
	; unknown.			; unknown.
	; CHECK-LABEL: f20			; CHECK-LABEL: f20
	; CHECK: call float @llvm.experimental.constrained.fptrunc			; CHECK: call float @llvm.experimental.constrained.fptrunc
	define float @f20() {			define float @f20() {
	entry:			entry:
	%result = call float @llvm.experimental.constrained.fptrunc.f32.f64(			%result = call float @llvm.experimental.constrained.fptrunc.f32.f64(
	double 42.1,			double 42.1,
	[26 lines not shown]
	declare double @llvm.experimental.constrained.exp.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.exp.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.exp2.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.exp2.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.log.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.log.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.log10.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.log10.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.log2.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.log2.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.rint.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata)			declare double @llvm.experimental.constrained.nearbyint.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata)			declare double @llvm.experimental.constrained.fma.f64(double, double, double, metadata, metadata)
				declare i32 @llvm.experimental.constrained.fptosi.i32.f64(double, metadata)
				declare i32 @llvm.experimental.constrained.fptoui.i32.f64(double, metadata)
				cameron.mcinally: I haven't been following along closely, so please forgive if this was already discussed... Should we have float->i32 casts too? Also double->i64?
				kpn (author): This is an opt test. I'm not sure we'd benefit from placing those extra tests here.
	declare float @llvm.experimental.constrained.fptrunc.f32.f64(double, metadata, metadata)			declare float @llvm.experimental.constrained.fptrunc.f32.f64(double, metadata, metadata)
	declare double @llvm.experimental.constrained.fpext.f64.f32(float, metadata)			declare double @llvm.experimental.constrained.fpext.f64.f32(float, metadata)
				cameron.mcinally: Should 'fpext.f64' have a float argument instead of a double?
				kpn (author): Probably, yes. Will fix.

Revision Contents

Diff 201965

docs/LangRef.rst

include/llvm/CodeGen/ISDOpcodes.h

include/llvm/CodeGen/SelectionDAGNodes.h

include/llvm/CodeGen/TargetLowering.h

include/llvm/IR/IntrinsicInst.h

include/llvm/IR/Intrinsics.td

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

lib/CodeGen/SelectionDAG/LegalizeTypes.h

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

lib/CodeGen/SelectionDAG/SelectionDAG.cpp

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

lib/CodeGen/SelectionDAG/TargetLowering.cpp

lib/IR/IntrinsicInst.cpp

lib/IR/Verifier.cpp

test/CodeGen/X86/fp-intrinsics.ll

test/CodeGen/X86/vector-constrained-fp-intrinsics.ll

test/Feature/fp-intrinsics.ll