This is an archive of the discontinued LLVM Phabricator instance.

Part 2 to fix x86_64 fp128 calling convention.
Closed, Public

Authored by chh on Jul 22 2015, 4:12 PM.

Details

Reviewers

qcolombet
grosbach
davidxl
echristo
rnk
resistor

Commits

rG7993e18e804d: [X86] Part 2 to fix x86-64 fp128 calling convention.
rL255558: [X86] Part 2 to fix x86-64 fp128 calling convention.

Summary

Part 1 was submitted in http://reviews.llvm.org/D15134.
Bugs:

Changes:

X86RegisterInfo.td, X86RecognizableInstr.cpp: Add FR128 register class.
X86CallingConv.td: Pass f128 values in XMM registers or on stack.
X86InstrCompiler.td, X86InstrInfo.td, X86InstrSSE.td: Add instruction selection patterns for f128.
X86ISelLowering.cpp: When target has MMX registers, configure MVT::f128 in FR128RegClass, with TypeSoftenFloat action, and custom actions for some opcodes. Add missed cases of MVT::f128 in places that handle f32, f64, or vector types. Add TODO comment to support f128 type in inline assembly code.
Add unit tests for x86_64 fp128 type.

Event Timeline

There are a very large number of changes, so older changes are hidden.

Same changes as the previous diff, but use -U999999.

davidxl added inline comments.Sep 28 2015, 10:13 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8589 ↗	(On Diff #35928)	Can you explain this a little more?
8593 ↗	(On Diff #35928)	Should it be ... && (N1VT == N1OpVT \|\| N1OpVT != MVT::f128)?
lib/CodeGen/SelectionDAG/InstrEmitter.cpp
169 ↗	(On Diff #35928)	Why does hasType(VT) return false for FR128? Is it expected?
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3535 ↗	(On Diff #35928)	Unconditionally expand? Do you need to add an assertion on the expected type?
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
52 ↗	(On Diff #35928)	Why this change?
68 ↗	(On Diff #35928)	Why return false for some cases and true for others?
78 ↗	(On Diff #35928)	Explain this change?
145 ↗	(On Diff #35928)	Should the problem be fixed in SoftenFloatOperand method?

Add more comments and changes per davidxl's suggestions.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8589 ↗	(On Diff #35928)	Done. Please see new comment in code.
8593 ↗	(On Diff #35928)	Good suggestion. I might be too conservative here. Now I changed the condition to only N1Op0VT != MVT::f128, and all my tests are still working.
lib/CodeGen/SelectionDAG/InstrEmitter.cpp
169 ↗	(On Diff #35928)	See also my previous answers and comments at line 143. The question is whether ComRC returned by TRI->getCommonSubClass should guarantee ComRC->hasType(Node->getSimpleValueType(ResNo)), or whether FR128 should be defined to work with the existing getCommonSubClass. I couldn't find a clean way to change either FR128 or getCommonSubClass without breaking many other things. So the smallest fix, it seems to me, is here in the user of ComRC (UseRC).
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3535 ↗	(On Diff #35928)	Is there some invalid constant type? I think ExpandConstant(CP) should already check necessary node types.
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
52 ↗	(On Diff #35928)	The new default value R will make other code simpler like SetSoftenedFloat(R, R), which just wants to mark this node as softened without changing it.
68 ↗	(On Diff #35928)	See new inline comment at line 62.
78 ↗	(On Diff #35928)	Added more comments in code, line 62. I hope that makes the comments about FABS, FCOPYSIGN, FNEG, and ConstantFP clearer.
145 ↗	(On Diff #35928)	SoftenFloatOperand works with ScanOperands if !KeepFloat. So, the new condition here is when KeepFloat, we should call ReplaceValueWith. It might be possible to fix every SoftenFloatOperand method and also simplify ScanOperands, but that looks like a more complex and risky move.

davidxl added inline comments.Oct 5 2015, 11:22 AM

lib/CodeGen/SelectionDAG/InstrEmitter.cpp
169 ↗	(On Diff #36313)	What is VT in this case? What reg class is ComRC here? How about the DstRC below TLI->getRegClassFor(VT)?
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
291 ↗	(On Diff #36313)	Change the name to ExpandConstantFP128?
3535 ↗	(On Diff #36313)	I don't see what expected EVTs are checked in this method?
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
64 ↗	(On Diff #36313)	Does the following change affect types other than f128?

Adding Owen and Quentin to this review. I'd feel more comfortable if they were to comment on the long term maintainability of this.

Thanks!

-eric

chh added inline comments.Oct 7 2015, 3:03 PM

lib/CodeGen/SelectionDAG/InstrEmitter.cpp
169 ↗	(On Diff #36313)	One of the test cases showing this problem is test/CodeGen/X86/avx512-calling-conv.ll, compiled with -mtriple=x86_64-apple-darwin -mcpu=knl. There, VT is v16i8 (23), ComRC is the new FR128RegClassID (74), and DstRC is not set yet, but the correct TLI->getRegClassFor(VT) is VR128RegClassID (75). It means that my current definition of FR128 has registers in VR128, and ComRC could be FR128 where VR128 was the only candidate before. But of course FR128 with my current instruction selection patterns won't work for v16i8, and it should not. Early in the development of this patch I tried to store f128 values in existing register classes, without a new FR128, but that approach caused even more trouble. I also tried to copy some VR128 instruction patterns to FR128 to make this work without this patch in EmitCopyFromReg, but that requires a lot of duplication in the instruction selection patterns, and I had not even resolved all the problems. One thing I think might be possible, as in my original TODO item, is to change the semantics of getCommonSubClass to guarantee ComRC->hasType(VT), or to add an extra flag to getCommonSubClass for that guarantee. I was hoping to break this patch into smaller pieces, and those refactoring tasks seem to be good candidates for follow-up patches.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
291 ↗	(On Diff #36313)	Although it is triggered now only for the f128 type, it should work for other types, say f80, f96, etc. Actually, the constant value is not a floating point type, but an integer (binary bits) of the floating point value from CP->getConstantIntValue.
3535 ↗	(On Diff #36313)	I think there is no target-independent type restriction here. I see TLI.isFPImmLegal check in ISD::ConstantFP, but don't see similar limitation for non-FP types. ExpandConstant could have more value range limit if necessary.
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
64 ↗	(On Diff #36313)	Not now. Only x86_64 f128 is configured this way. But in the future new configurations can use this logic to keep value legal (in register) but with soften operators.

Quentin, can you help do a quick review of the patch and give
suggestions on high level direction of the patch?

thanks,

David

Hi Chih-Hung,

I believe the current patch can be split into several smaller patches. For instance, improving the AsmPrinter and changing the ABI seem orthogonal enough that they do not have to come in one patch.

Now, as a high level comment, this patch seems quite involved for something that sounds fairly common from the legalizer point of view. I.e., we have a legal type, fp128, with illegal instructions (soften float). Have you checked how we deal with those cases usually? If so, could you explain what does and does not work?
I have not spent a lot of time thinking about or reviewing the patch, so I admit I may be talking nonsense, but the feeling I have about the patch is that we are doing it wrong.

Regarding the ABI changes, the patch obviously breaks the existing code that has been generated by clang for those types. I would like to have a sense of how risky those changes are. Could you list the targets (OS) that use the variants you are changing?

Thanks,
-Quentin

include/llvm/Target/TargetLowering.h
1907 ↗	(On Diff #36313)	That change makes me shiver. We probably do something against the design if we need to do that.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8592 ↗	(On Diff #36313)	Based on the comment, this seems like a hack that impacts all targets to fix an x86_64 backend limitation. The bottom line is that I am guessing we should push that change.
lib/CodeGen/SelectionDAG/InstrEmitter.cpp
147 ↗	(On Diff #36313)	By definition the common sub class of A and B must be able to hold both the types of A and B, since the related registers are both in A and in B (i.e., could be accessed by A or B register names). The bottom line is that the comment must always be wrong, and the target must do the right thing to ensure this is true, IMO.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1205 ↗	(On Diff #36313)	How could that be different?! That looks wrong to have to check for both conditions.

This patch should change output of fp128 for Android x86_64 only,
to fix mainly the problem in https://llvm.org/bugs/show_bug.cgi?id=23897.

No regression test failure was found for other targets, but since I am not sure if all other targets have enough regression tests on fp128, I tried to make changes only depend on the new configuration or work for all configurations.

It's possible to break this patch into several ones, but then only the last one should turn on the configuration change for Android x86_64. Then, all patches before the last one will be incomplete and not really tested. It's not right to change the Android x86_64 configuration before the last patch, because right now, fp128 for Android x86_64 works by itself, just not compatible with gcc. A partial change of the fp128 code will fail to compile libm. For example, the AsmPrinter change was actually discovered when libm was compiled with -g. The current code cannot handle new fp128 constant value.

The change was larger than a typical bug patch because I think this is the first configuration to keep fp128 values in SSE registers. Before this, most x86 configurations used fp80 on the floating point stack, and fp128 on x86 was split into two 64-bit registers. Hence, fp128 was passed in two 64-bit registers or on the stack, but we need to pass them in SSE registers to be compatible with gcc and the ABI. The current legalization abstraction for floating point values has worked well for most targets, except for a few cases that we have seen hacked for ppc. For fp128 in SSE registers, we need more enhancements to this legalization abstraction layer, or I would have to write another new kind of action that is maybe 90% similar to soften floating point but does some things differently.

Some changes to ad-hoc optimizations related to fp128 are made only now because the existing code has not handled fp128 before. The fp128 values were split into two registers and/or converted to library function calls, and did not hit the optimization code. Even the ppc f128 type is in two f64 registers. These were mostly found in the compilation of libm, which heavily casts fp128 to/from i128 to use faster bit-wise operations. I have added test cases for the problems I found, so hopefully if any of these changes were reverted, some regression test would fail.
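
As an illustration of the kind of bit-wise fp128 code mentioned above, here is a minimal sketch of sign-bit manipulation on the raw bits of an IEEE binary128 value. The Bits128 type and the function names are hypothetical and are not taken from libm or from the patch; the only facts relied on are that the sign is bit 127 and that 1.0 is encoded as 0x3FFF followed by 112 zero bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the 128 raw bits of an IEEE binary128 value,
// split into two 64-bit halves. Hi holds the sign bit (bit 127 overall).
struct Bits128 {
  std::uint64_t Lo;
  std::uint64_t Hi;
};

constexpr std::uint64_t SignBitInHi = 1ULL << 63;

// fabs on the raw bits: clear the sign bit, leave everything else untouched.
inline Bits128 fabsBits(Bits128 V) {
  V.Hi &= ~SignBitInHi;
  return V;
}

// fneg on the raw bits: flip the sign bit.
inline Bits128 fnegBits(Bits128 V) {
  V.Hi ^= SignBitInHi;
  return V;
}

// Self-check using the binary128 encodings of 1.0 and -1.0.
inline void checkSignBitOps() {
  const Bits128 One = {0, 0x3FFF000000000000ULL};
  assert(fnegBits(One).Hi == 0xBFFF000000000000ULL);
  assert(fabsBits(fnegBits(One)).Hi == One.Hi);
}
```

Operations of this shape never need a library call, which is presumably why code like this runs into the optimization paths once f128 lives in a single register.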

include/llvm/Target/TargetLowering.h
1907 ↗	(On Diff #36313)	I think an alternative is to keep ValueTypeActions private here, but to hack into computeRegisterProperties of lib/CodeGen/TargetLoweringBase.cpp to define different actions for Android x86_64 fp128. That function now has one hack for ppcf128, but that is depending on a special MVT::ppcf128. If we add target dependent code into computeRegisterProperties, for only Android x86_64, not all f128, then that might look more like a hack than making ValueTypeActions accessible to child classes.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8592 ↗	(On Diff #36313)	Yes, I think this impacts all x86_64 f128 values in one register. We didn't hit this problem because no target keeps f128 in one register. What is the alternative change?
lib/CodeGen/SelectionDAG/InstrEmitter.cpp
147 ↗	(On Diff #36313)	I can think about this more. I was putting off that kind of change to getCommonSubClass as a TODO in the future, but that was more than a month ago.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1205 ↗	(On Diff #36313)	This is needed only for the new special configuration of Android x86_64 fp128, which is TypeLegal to be in a register, but with a type action to soften the operations to library calls. Its TypeAction is not TypeLegal.

Hi Chih-Hung,

A few email discussions about this patch were sent to reviews+D11438+public+6e2ebe02f28cebc7@reviews.llvm.org and llvm-commits@lists.llvm.org, but they are not shown here. I don't know the cause, but please check the llvm-commits mailing list about this topic. Thanks.

Sync with latest LLVM source and
add -mattr=+mmx to test cases since MMX is now off by default.
Add one more parameter to getCommonSubClass and firstCommonClass,
to guarantee that returned common sub class will contain the specified simple value type.
This extra parameter is used by EmitCopyFromReg in InstrEmitter.cpp.
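
The effect of the extra value-type parameter can be sketched with a toy model. Everything below (ToyRegClass, toyCommonSubClass, the table order, the type names as strings) is a hypothetical simplification for illustration and is not LLVM's TargetRegisterInfo API; in particular, the real lookup does not simply take the first match in table order.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy register class: a set of registers plus the value types it can hold.
struct ToyRegClass {
  std::string Name;
  std::set<std::string> Regs;
  std::set<std::string> Types;
  bool hasType(const std::string &VT) const { return Types.count(VT) != 0; }
};

static bool isSubset(const std::set<std::string> &Sub,
                     const std::set<std::string> &Super) {
  for (const std::string &R : Sub)
    if (Super.count(R) == 0)
      return false;
  return true;
}

// Return the first class in Table whose registers all belong to both A and B.
// If VT is non-empty, additionally require that the class can hold VT.
const ToyRegClass *toyCommonSubClass(const std::vector<ToyRegClass> &Table,
                                     const ToyRegClass &A, const ToyRegClass &B,
                                     const std::string &VT = "") {
  for (const ToyRegClass &RC : Table) {
    if (!isSubset(RC.Regs, A.Regs) || !isSubset(RC.Regs, B.Regs))
      continue;
    if (!VT.empty() && !RC.hasType(VT))
      continue;
    return &RC;
  }
  return nullptr;
}

// FR128 is listed before VR128 and covers the same registers, mirroring the
// situation described in the review (FR128 found where VR128 was expected).
inline std::vector<ToyRegClass> makeToyTable() {
  return {
      {"FR128", {"XMM0", "XMM1"}, {"f128"}},
      {"VR128", {"XMM0", "XMM1"}, {"v16i8", "v4f32"}},
  };
}
```

In this model, asking for the common sub class of VR128 with itself yields FR128 unless the caller passes "v16i8", in which case the lookup skips FR128 and returns VR128.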

chh updated this object.Oct 20 2015, 5:24 PM

martell added a subscriber: martell.Oct 21 2015, 2:20 AM

In D11438#271749, @chh wrote:

I usually found missed handling of f128 type from testing libm long double on Android target, and those cases appeared in many different places.

Hi @chh,
I'm currently looking into a similar calling convention bug. I just came across this patch as you updated it today
I'll start testing your patch soon but I feel you would have some insight into this bug
https://llvm.org/bugs/show_bug.cgi?id=24398
based on the content of this patch ?

Martell, I don't have a Windows or mingw-w64 system to test on now; specifically, I don't know what gcc generates for that system. If you were compiling test code with gcc and clang, and linking with libraries compiled by gcc, then you could hit a calling convention problem. You could first compare the assembly code output from both compilers.
Please take a look and try the test cases in
https://llvm.org/bugs/show_bug.cgi?id=23897
https://llvm.org/bugs/show_bug.cgi?id=24111

http://reviews.llvm.org/D11437 has the clang changes to fix the calling convention for f128, and this patch is for the llvm part. If mingw-w64 does not use 128-bit floating point, then my patches shouldn't affect your problem, but you can find the source locations to fix mingw-w64 issues.

I've posted my reply to the llvm-developers list to not clog your review
with noise :)

All reviewers, please take a look at the latest diff.
Please add a comment for anything you think must be changed before submit.
I tested it with Android, and ran the LLVM regression tests on Linux.
After this long review period, I think a field test would be more effective to find any error.
Thanks.

It seems to me the patch does not use the right approach in SoftenFloatResult and SoftenFloatOperands. The patch tries to work around the problem that the softened result does not have the right type (same size integer instead of float type). To work around the problem, the patch skips 'softening' completely for some opcodes such as BITCAST and SELECT, and as a result has to change the prototype of SoftenFloatResult and allow operand scanning, which should not be necessary if done properly.

It seems to me the right approach is to make sure those variants of SoftenFloatRes_<OPCODE> return and set the right 'softened' float type for f128. There are two main places that need to be changed: TargetLowering::makeLibCall and BitConvertToInteger, by making them f128 aware. Does it make sense?

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
726 ↗	(On Diff #37952)	Why is keepFloat computed differently from SoftenFloatResult?
lib/CodeGen/SelectionDAG/LegalizeTypes.h
381 ↗	(On Diff #37952)	If SetSoftenedFloat was called with the right value, then there is no need to do special handling here. This is the reason why I think SoftenFloatResult should not special-case those opcodes.

David, I have reconsidered a few alternatives and checked again current llvm architecture.

First a few observations for other people not following all the issues before:

The core problem is that llvm classifies types into
legally-in-register-with-HW-instruction or
illegal-and-should-be-expanded-promoted-or-soften.
f128 is classified as illegal for most targets and softened to i128,
which is passed/returned in two 64-bit registers.
The SoftenFloat* functions not only convert operations to library function calls,
but also the result type to i128. 
But on x86_64, calling convention for i128 is different from f128.
GCC and llvm are correct for i128, but llvm incorrectly passed f128 the same as i128.
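
The mismatch described above can be sketched as follows. This is a deliberately simplified model with hypothetical names (ToyVT, typeSeenByCallingConv, firstArgLocation), not LLVM code; the register locations are the ones the x86-64 System V convention uses for a first __int128 or __float128 argument.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified value types; not LLVM's MVT.
enum class ToyVT { i128, f128 };

// The type that calling-convention lowering ends up seeing.
// SoftenF128ToI128 == true models the old behaviour, where type
// legalization rewrites every f128 value into an i128 value.
inline ToyVT typeSeenByCallingConv(ToyVT VT, bool SoftenF128ToI128) {
  if (VT == ToyVT::f128 && SoftenF128ToI128)
    return ToyVT::i128;
  return VT;
}

// Location of the first argument of the given type (simplified).
inline std::string firstArgLocation(ToyVT VT) {
  switch (VT) {
  case ToyVT::i128:
    return "RDI:RSI"; // two 64-bit general purpose registers
  case ToyVT::f128:
    return "XMM0"; // one SSE register
  }
  assert(false && "unknown toy value type");
  return "";
}
```

With softening to i128 the f128 argument lands in RDI:RSI, which is where the incompatibility with gcc comes from; keeping the type as f128 puts it in XMM0.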

To give f128 its own calling convention, I still don't have any other solution but
to create its own register class. That makes it a new kind of type that does not fit well
into the current scheme.

It does not seem feasible to me to keep using SoftenFloat* functions and
convert f128 to i128, and then add other hacks to use different calling convention
for real i128 and f128-converted-i128. If it is possible, please let me know
how to do it.

If I understand correctly, your suggestion is to fix some SoftenFloatRes_* functions
to return f128 for f128 in this case instead of skipping them.
If we want to fix them and avoid the changes to LegalizeTypes.cpp, they will
also need to scan-and-convert operands, because current LegalizeTypes.cpp simply
"goto NodeDone" after calling SoftenFloatResult.
Wouldn't that change duplicate the ScanOperands actions from LegalizeTypes.cpp to
LegalizeFloatTypes.cpp?

I did not like the duplication of ScanOperands, so I added new return value to
SoftenFloatResult to indicate if ScanOperands is necessary. Then, some
node types no longer need any change inside SoftenFloatResult.

Maybe what I should do is adding more comments to SoftenFloat* functions to
explain that they now do not always convert floating point types to integer types.
In fact, they convert nodes to the TransformToType[VT], which could be f128 or i128.
Current comment of getTypeToTransformTo says:

/// For types supported by the target, this is an identity function....

That can be extended to "supported by the target or passed/returned in registers".

Another alternative is to keep the exact meaning of LegalizeTypeAction::TypeSoftenFloat
and add a new action like TypeSoftenOnlyFloatOps. We still need to change comments of
several functions to deal with the new legally in register but no HW support f128 type.
For other targets, with the same calling convention for f128 and i128,
they can keep on using TypeSoftenFloat, which still converts nodes to the
TransformToType[VT] but that is assumed to be an integer type.
For x86_64, the new TypeSoftenOnlyFloatOps action will assume that TransformToType[VT]
could be the same as VT and VT could be legally in register.

At the end, SoftenFloat* and SoftenOnlyFloatOps* functions could share some code
to minimize duplication. The shared code would have some flag to serve similar purpose
as the KeepFloat variable in my patch.
Many places that handle LegalizeTypeAction::TypeSoftenFloat will need to be changed
to also handle LegalizeTypeAction::TypeSoftenOnlyFloatOps.
That doesn't seem to be a simpler patch either.

BTW, I am going to LLVM dev meeting too.
So if I don't have time left this week, I will continue next week.

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
726 ↗	(On Diff #37952)	SoftenFloatResult converts its given SDNode N, so its KeepFloat is computed from N's value type:
EVT VT = N->getValueType(ResNo);
EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
bool KeepFloat = VT.isSimple() && VT == NVT && TLI.isTypeLegal(VT);
SoftenFloatOperand converts N's operand, so its KeepFloat is computed from the OpNo operand:
EVT OpVT = N->getOperand(OpNo).getValueType();
bool KeepFloat = OpVT.isSimple() && TLI.isTypeLegal(OpVT);
The extra check of VT == TLI.getTypeToTransformTo(....) is not really necessary when we have only x86_64 f128 that can be legal and transformed to itself at the same time. I can remove it, or keep it as a redundant check in both places to filter out any other new unexpected configuration. Is this the difference you noticed? Would you rather keep or remove (VT == NVT) at both places?
lib/CodeGen/SelectionDAG/LegalizeTypes.h
381 ↗	(On Diff #37952)	In LegalizeTypesGeneric.cpp, you can see that a caller of GetSoftenedFloat assumes that a softened float should be split into two integers. We don't want that. So either we modify the meaning of TargetLowering::TypeSoftenFloat and GetSoftenedFloat or we need to use a new action type like TypeSoftenOnlyFloatOps.
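
The two KeepFloat computations discussed for LegalizeFloatTypes.cpp above can be modelled in isolation. This is only a sketch: ToyLowering and the two factory functions are hypothetical stand-ins for the target queries, value types are plain strings, and the isSimple() check is omitted because it has no counterpart in this model.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the two target queries used by KeepFloat.
struct ToyLowering {
  std::set<std::string> TypesWithRegClass;        // types legal in a register
  std::map<std::string, std::string> TransformTo; // overrides; default is identity

  bool isTypeLegal(const std::string &VT) const {
    return TypesWithRegClass.count(VT) != 0;
  }
  std::string getTypeToTransformTo(const std::string &VT) const {
    auto It = TransformTo.find(VT);
    return It == TransformTo.end() ? VT : It->second;
  }
};

// Result side: the type must be legal and must transform to itself.
inline bool keepFloatForResult(const ToyLowering &TLI, const std::string &VT) {
  const std::string NVT = TLI.getTypeToTransformTo(VT);
  return VT == NVT && TLI.isTypeLegal(VT);
}

// Operand side: only legality of the operand type is checked.
inline bool keepFloatForOperand(const ToyLowering &TLI, const std::string &OpVT) {
  return TLI.isTypeLegal(OpVT);
}

// Models the new configuration: f128 has a register class and maps to itself.
inline ToyLowering makeF128InRegisterTarget() {
  ToyLowering TLI;
  TLI.TypesWithRegClass = {"i64", "f128"};
  return TLI;
}

// Models a target where f128 is illegal and is softened to i128.
inline ToyLowering makeSoftenToI128Target() {
  ToyLowering TLI;
  TLI.TypesWithRegClass = {"i64"};
  TLI.TransformTo["f128"] = "i128";
  return TLI;
}
```

In both modelled configurations the two predicates agree for f128, which matches the remark that the extra VT == NVT check is redundant as long as x86_64 f128 is the only type that is legal and transformed to itself.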

Sync with latest source and update comments about TypeSoftenFloat, SoftenFloat*, KeepFloat, and GetSoftenedFloat.
New comments should clarify that only non-HW-supported operations need to be converted and use integer type.
We should think about non-HW-supported operations instead of "illegal" types.
SoftenFloat is really about converting float operations, not necessarily converting to integers.

I am still not convinced that you need to special-case certain opcodes in SoftenFloatResult and return a bool to indicate if operands need to be scanned.

The difference I can see is that without this patch, CopyToReg and CopyFromReg opcodes are expected to be already type legalized. With your patch, can you add these functions:

SoftenFloatRes_CopyToReg and SoftenFloatRes_CopyFromReg -- it does not need to do anything but it makes sure that the 'softened' result is stored. Currently you simply short-circuit it with

default:
  if (keepFloat)
    return false;

Use davidxl's suggestion and add cases ISD::Register, CopyFromReg, and CopyToReg to SoftenFloatResult. They are now only used when KeepFloat, to keep the node unchanged. All other unhandled node opcodes will still abort the compiler by default.

davidxl added inline comments.Nov 10 2015, 9:29 AM

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
60 ↗	(On Diff #39757)	KeepFloat does not convey the real meaning. Is it better to call it LegalInHWReg?
68 ↗	(On Diff #39757)	With the handling of CopyFromReg CopyToReg, there does not seem to be a need to special case SELECT, SELECT_CC by briefly looking at the code. In fact, I think we should push the logic all down to the leaf and get rid of return true/false change. It is the change of 'flow' and special handling of operand scanning for f128 that makes the current patch look intrusive.
145 ↗	(On Diff #39757)	Which sub-methods need special handling here?
159 ↗	(On Diff #39757)	Just return SDValue(N, ResNo)?

Rename KeepFloat to LegalInHWReg.
Clarify some comments.
Embed into the original switch cases the handling of LegalInHWReg for the softening of opcodes Register, CopyFromReg, CopyToReg, ConstantFP, FABS, FCOPYSIGN, and FNEG.

chh marked an inline comment as done.Nov 10 2015, 5:17 PM

chh added inline comments.

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
68 ↗	(On Diff #39872)	David, before my patch, DAGTypeLegalizer::run's loop over results always (1) goes to NodeDone with a changed (new) node, or (2) goes to ScanOperands without a changed node, for the TargetLowering::TypeLegal case. All SoftenFloatRes_* functions made new nodes of integer types. For x86_64 fp128, we really don't want to change the f128 type, so that instruction selection can be correct or more efficient based on the f128 type instead of i128. For the BITCAST, SELECT, and SELECT_CC opcodes, we need to go down and check their operands as if f128 is TypeLegal. So there is no simpler way to treat them the same as other SoftenFloatRes_* functions. Register, CopyFromReg, CopyToReg, ConstantFP, FABS, FCOPYSIGN, and FNEG are simple enough to skip ScanOperands, so we can let them return the same SDNode, pretend that the result node was changed, and go to NodeDone. So I simplified all of them into the old switch cases without new extra functions.
154 ↗	(On Diff #39872)	See updated comment. Examples are those calling TLI.makeLibCall.
168 ↗	(On Diff #39872)	Okay, changed to an assert, suppress compiler warning, and simple return.
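
The control flow described in the reply on line 68 above can be summarized in a small decision function. This is a sketch of the description only, with hypothetical names (ToyOpcode, NextStep, afterSoftenFloatResult); it is not the code in DAGTypeLegalizer::run, and FADD stands in for any opcode that is turned into a library call.

```cpp
#include <cassert>

// A few of the opcodes named in the discussion.
enum class ToyOpcode {
  FADD, BITCAST, SELECT, SELECT_CC,
  Register, CopyFromReg, CopyToReg, ConstantFP, FABS, FCOPYSIGN, FNEG
};

enum class NextStep { NodeDone, ScanOperands };

// Where the legalizer loop continues after a result with the soften-float
// action has been handled.
inline NextStep afterSoftenFloatResult(ToyOpcode Opc, bool LegalInHWReg) {
  // Old behaviour: the result is always replaced by a new integer-typed node.
  if (!LegalInHWReg)
    return NextStep::NodeDone;

  switch (Opc) {
  case ToyOpcode::BITCAST:
  case ToyOpcode::SELECT:
  case ToyOpcode::SELECT_CC:
    // Node keeps its f128 type; treat it like a legal type and visit operands.
    return NextStep::ScanOperands;
  default:
    // Either the same node is returned and reported as done (Register,
    // CopyFromReg, CopyToReg, ConstantFP, FABS, FCOPYSIGN, FNEG), or the
    // operation became a library call.
    return NextStep::NodeDone;
  }
}
```

The point of contention in the review is exactly the ScanOperands branch, since it is the only place where the soften-float path changes the loop's flow.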

davidxl added inline comments.Nov 11 2015, 12:25 AM

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
70 ↗	(On Diff #39872)	I got curious why these opcodes need to be skipped and picked ISD::SELECT to look at. Commenting out line 70 does trigger an error, but further debugging indicates the bug may be in the original code. In particular, PromoteIntRes_SETCC should call ReplaceValueWith, but does not. Fixing the SETCC handling makes the test case TestMax pass. Can you verify this is a bug triggered by the f128 change?
158 ↗	(On Diff #39872)	Where is the short-cut? Is it the new default handling in SoftenFloatOperand? Making short cut there and have weak handshaking seems fragile.
752 ↗	(On Diff #39872)	What are the new opcodes that reach here? Please make such opcodes explicit and add assertions for unexpected ones.
791 ↗	(On Diff #39872)	This seems redundant -- see above early return.

Will upload diff 11 soon with suggested changes.

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
70 ↗	(On Diff #39872)	Yes, I can reproduce that error too. I believe that it happens only with my new x86_64 f128 configuration, but I have no way to verify this for all targets and illegal types. It's arguably a "bug", as it worked with the existing ScanOperands. However, the reason to skip some opcodes is not to work around the problem of missed calls to ReplaceValueWith. We want to keep some opcodes with the f128 type so that instruction selection gets the correct calling convention or more efficient code. When we do not change the SDNode for the f128 type, we need to tell LegalizeTypes to treat it as a "legal" type and go to ScanOperands, so all other existing cases will work with this unchanged SDNode.
158 ↗	(On Diff #39872)	Yes, in SoftenFloatOperand, I do not try to repeatedly replace an SDNode when its operand type can be kept as float and not changed. The current LegalizeTypes and LegalizeFloatTypes depend on ScanOperands to replace an SDNode where DAGTypeLegalizer::Result functions "missed" calls to ReplaceValueWith. That is kind of strange and somewhat inefficient. However, changing that architecture is too big a risk and too much work for me, when the goal is to fix the x86_64 f128 calling convention and make sure the change works for all targets now and in the future. I don't think there are enough regression tests for all targets and types that depend on LegalizeTypes. I found all the broken cases when trying to compile libm on Android. Hence, I use this safer and "smaller" patch to match the existing architecture and limit the impact to only the SoftenFloat and LegalInHWReg cases. It could be a separate patch to clean up the complicated loops inside LegalizeTypes, or LegalizeTypes could be rewritten, as we know that there are non-HW-supported operations on some types, which are not really "illegal types".
752 ↗	(On Diff #39872)	Unspecified cases will dump and abort as before.
791 ↗	(On Diff #39872)	Fixed.

Update some comments, remove redundant code, and explicitly list legal
operand types in SoftenFloatOperand.

davidxl added inline comments.Nov 21 2015, 11:54 AM

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
158 ↗	(On Diff #40071)	I don't quite like the short cut here either. The original architected flow is reasonable to me. Basically: (1) after SoftenFloatResult, the old SDValue is mapped to the new result; (2) the worklist-based loop follows DU, and the use instructions of the old result get processed, i.e. their operands are scanned; (3) during scanning, the use of the old value is replaced with the new value; (4) if scanning operands results in a new node, ReplaceValueWith then needs to be eagerly called here on demand. I prefer that you do not change the flow in SoftenFloatOperands and make a patch here to work around it. The resulting change may look bigger (as more opcodes need to be handled), but it is what it is. I also suggest you extract the following patches out separately: (1) softenFloatResult and softenFloatOperands; (2) f128 constantFP; (3) register class; (4) small refactoring; (5) the rest (may also be breakable).

David,

I will upload a new patch with some of your older suggested changes first.
Please see details in the new code diff and upload comment.

I met many difficulties implementing your latest inline comments
in LegalizeFloatTypes.cpp, so there is no change with that idea yet.
Please see my reply to the inline comment.

For separating this large patch, I have not tried yet,
but it might be okay to split at least into two parts:
(1) All the other files.
(2) The new unit test files *.ll and X86CallingConv.td, X86ISelLowering.cpp, X86InstrInfo.td, X86InstrSSE.td, X86RegisterInfo.td.

We should be able to check in (1) and expect no change to all targets.
Patch (2) will fix Android x86_64 problem, but still no change to other targets.
I am assuming that we want some time to test (1) before submitting (2).
Further split of (1) might be possible, but that would take more review
and test cycles if there is little regression. When there is any regression,
it should be easy to revert all changes in (1) and then debug-and-retry, right?
I don't see a way to split (2), as we don't want to have the Android target
in a new incomplete state different from the current one.

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
158 ↗	(On Diff #40071)	The difficulties I found in the current architecture are (1) avoiding duplicating a lot of code from SoftenFloatRes_* to SoftenFloatOp_*, and (2) making sure the changes in multiple SoftenFloatRes_* and SoftenFloatOp_* functions will be fixed correctly at the same time in the future. In SoftenFloatOperand, we can see that it calls ReplaceValueWith at the end for all kinds of opcodes. The number of calls to ReplaceValueWith is not as large as we expected. Hence, it seems to me the norm is to share code at a higher level rather than pushing it down into SoftenFloatOp_* and SoftenFloatRes_*. It seems reasonable to call ReplaceValueWith at the end of SoftenFloatOperand and SoftenFloatResult. I tried to remove this part, but the result was either (a) calling ReplaceValueWith inside SoftenFloatRes_*, or (b) copying a lot of SoftenFloatRes_* code into SoftenFloatOp_* and then calling ReplaceValueWith in SoftenFloatOp_*. The copied code is not all simple and small. For example, SoftenFloatRes_FCOPYSIGN is complicated, and I don't see how it is simpler to duplicate it into SoftenFloatOp_FCOPYSIGN, or to abstract it out into a shared function. What you mentioned in another email earlier, moving some SoftenFloatOp_* responsibility into SoftenFloatRes_*, might be better. Then we have less code duplication, and we need to make sure that ReplaceValueWith is called whenever something is changed by SoftenFloatRes_*. I could try that when I have more time. My concern (2) is more worrying. During my tests of those code movements, I made a few mistakes, and the compiler could still run and generate inefficient code because some ReplaceValueWith call was missed, or some fp128 was not kept in a register because a short-cut was missed. So I would prefer an easier and smaller change, one that can be verified to have no impact on existing targets because it is conditioned on isLegalInHWReg, even if some of it could be applied to other targets.

Push short cuts inside SoftenFloatResult down to SoftenFloatResultOp_* functions,
which now take extra argument ResNo. SoftenFloatResult is cleaner and faster,
SoftenFloatResultOp_* functions have some repeated patterns.

Abstract out the function isLegalInHWReg used in multiple places.
Spell out more short-cut opcode inside SoftenFloatOperand when isLegalInHWReg.
Add assert of isLegalInHWReg after call of SoftenFloatResult when result is not changed.

Split out part 1 in http://reviews.llvm.org/D15134.

Sync with latest llvm source and new changes suggested in http://reviews.llvm.org/D15134.

chh updated this revision to Diff 41694.Dec 2 2015, 4:41 PM

chh mentioned this in D15134: Part 1 to fix x86_64 fp128 calling convention..Dec 3 2015, 9:48 AM

chh updated this revision to Diff 41778.Dec 3 2015, 10:47 AM

Now the diff does not contain changes in submitted part1.

davidxl added inline comments.Dec 5 2015, 6:46 PM

lib/Target/X86/X86ISelLowering.cpp
14628	This needs some explanation. Why can the Op1's value type be i128?
27775	Why TODO here? 'x' constraint should work.
27888	Explain TODO here.
lib/Target/X86/X86InstrInfo.td
957–962	Unrelated format change here.

davidxl added inline comments.Dec 5 2015, 9:32 PM

lib/Target/X86/X86InstrSSE.td
8862	Move the comment above the pattern def. movaps is shorter; as for the 'faster' part, don't say 'should be' -- put a reference there. In fact, f128 operations should be considered to be in the integer domain, so movdqa should be used to avoid the domain bypass penalty.
8868	pand is for SIMD integer. andps is shorter though.

chh updated this revision to Diff 42090.Dec 7 2015, 12:13 PM

chh added inline comments.

lib/Target/X86/X86ISelLowering.cpp
14628	Removed it. It was a condition only triggered by my older hacks. Now it should not happen.
27775	f128 and i128 were not in SSE_REG before. So I think this is a new feature that can be added later.
27888	Same as above. f128 and i128 not in SSE_REG before this change.
lib/Target/X86/X86InstrInfo.td
957–962	They are changed to align up with new line 962.
lib/Target/X86/X86InstrSSE.td
8862	I updated the comment to keep only the 'shorter' and 'available in SSE' reasons. I was not sure about 'faster with movaps', which seems to be used more in clang than in gcc. My main reasons are that it is shorter and available in SSE for Android's applications.
8868	I also chose andps over pand because it is shorter and available in SSE; I am not sure about the performance difference when combined with other instructions. I think that by not splitting f128 into two registers, we have already saved more code and execution time.
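
To illustrate why a single andps (or pand) is enough once the whole f128 lives in one XMM register, here is a minimal sketch of the sign operations as plain bitwise operations on the 128-bit pattern. It models the value as two 64-bit halves; `Bits128` and the function names are hypothetical, and the masks are the standard IEEE binary128 sign-bit masks rather than values copied from the patch.

```cpp
#include <cstdint>

// Hypothetical model of an f128 value as raw bits; not an LLVM type.
struct Bits128 {
  uint64_t lo; // low 64 bits of the significand
  uint64_t hi; // sign (bit 63), 15-bit exponent, high significand bits
};

constexpr uint64_t kSignBit = 1ULL << 63;

// fabs: AND with a mask that clears only the sign bit.
inline Bits128 fabsBits(Bits128 x) {
  x.hi &= ~kSignBit;
  return x;
}

// fneg: XOR with a mask that has only the sign bit set.
inline Bits128 fnegBits(Bits128 x) {
  x.hi ^= kSignBit;
  return x;
}

// fneg(fabs(x)): OR with the sign-bit mask.
inline Bits128 fnabsBits(Bits128 x) {
  x.hi |= kSignBit;
  return x;
}

// copysign: magnitude bits from mag, sign bit from sgn.
inline Bits128 copySignBits(Bits128 mag, Bits128 sgn) {
  mag.hi = (mag.hi & ~kSignBit) | (sgn.hi & kSignBit);
  return mag;
}
```

Each function touches only bit 127, which is why the choice between andps and pand affects encoding size and execution domain but not the result.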

davidxl added inline comments.Dec 8 2015, 5:39 PM

lib/Target/X86/X86InstrInfo.td
957–963	It seems you are adding extra space before :
lib/Target/X86/X86InstrSSE.td
8871	Are there any coding style guidelines for TableGen code? There are long lines that are wrapped.
test/CodeGen/X86/fp128-calling-conv.ll
16	Is this a relevant test? non-f128 related tests can be submitted in a different patch.
test/CodeGen/X86/fp128-cast.ll
13	There are also lots of irrelevant tests added in this file.

chh added inline comments.Dec 9 2015, 4:37 PM

lib/Target/X86/X86InstrInfo.td
957–963	Yes, it seems to be the style at line 964-969 too.
lib/Target/X86/X86InstrSSE.td
8871	Not sure if there is a special rule for TableGen code. http://llvm.org/docs/CodingStandards.html says 80 columns. There are a few exceptions in this file, but I have now wrapped all my new lines to less than 80 characters.
test/CodeGen/X86/fp128-calling-conv.ll
16	Okay, I will move the non-fp128 tests to another patch.
test/CodeGen/X86/fp128-cast.ll
13	Okay, I will move the non-fp128 tests to another patch.

chh updated this revision to Diff 42354.Dec 9 2015, 4:38 PM

davidxl added inline comments.Dec 10 2015, 11:32 AM

test/CodeGen/X86/fp128-cast.ll
155	Please move this test case to a different patch.
test/CodeGen/X86/fp128-compare.ll
5	Move this test case out of the patch -- there are more than 20 test cases below that need to be separated out.

Will reduce fp128-compare.ll in the next diff.

test/CodeGen/X86/fp128-cast.ll
155	This is about f128 calling convention to return f128 in SSE. Are you sure about removing this one?

davidxl added inline comments.Dec 10 2015, 11:42 AM

test/CodeGen/X86/fp128-cast.ll
155	you are right -- this one is relevant. Please check the rest.

Removed test cases without f128 type.

great. I will look at the tests in more details.

David

davidxl added inline comments.Dec 10 2015, 2:56 PM

test/CodeGen/X86/fp128-calling-conv.ll
5	Is this variable used?
8	Is it used?
11	Are the two parts swapped? GCC seems to generate: 3FFF0000000000000000000000000000
test/CodeGen/X86/fp128-cast.ll
12	TestFPExtF32_F128
25	TestFPExt ..
38	Missing a test for conversion to unsigned I32
126	It might be better to relax the check a little: testq %rax, %rax should be fine too.
135	Can you simplify the variable names ?
166	There is no guarantee adcq will be after movq ...
test/CodeGen/X86/fp128-compare.ll
23	missing check of 'set<cc>'
33	is this correct?
45	Missing check
test/CodeGen/X86/fp128-i128.ll
4	Better comment?
92	var names can be cleaned up to be shorter.
125	seems irrelevant.
137	Seems irrelevant.
205	Can this test case be simplified more?

chh marked 7 inline comments as done.Dec 10 2015, 7:42 PM

chh added inline comments.

test/CodeGen/X86/fp128-calling-conv.ll
5	No. Removed now.
8	No. Removed now.
11	That looks strange but is correct. I copied it from clang's dump, and the llvm output assembly code is the same as gcc's. http://llvm.org/docs/LangRef.html says big-endian is used for hexadecimal floating point constants.
test/CodeGen/X86/fp128-cast.ll
38	Added conversion to uint32 and uint64.
135	These were generated by clang for my simplified C code from libm. They are useful to show the clang transformations. I will add the C code example as comments.
166	Okay, relaxed the check patterns.
test/CodeGen/X86/fp128-compare.ll
33	Yes. It's a strange optimization, which returns 1 if %cmp is negative, as the value returned by __lttf2 is when %d1 < %d2.
test/CodeGen/X86/fp128-i128.ll
92	This IR was copied from libm compiled code. Clang has its own way of converting C union structure references. I will add the original C code as a comment.
125	Okay, removed the i64 test.
137	We need to test i128 too, since this patch also puts i128 into the FR128 register class. The i128 instruction was generated from f128 C code.
205	This was from libm C code, which triggered an error related to f128 when the patch was incomplete, so I added it. I tried to reduce the original C code, but then it no longer triggered the problem.
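
On the question of whether the two parts of the constant are swapped: as far as I can tell, the 32 hex digits after `0xL` in an LLVM IR fp128 literal list the low 64 bits first and the high 64 bits second, whereas GCC's dump is most-significant-first. A minimal sketch of the reordering, assuming the literal under discussion is the `0xL` form of 1.0; `toMostSignificantFirst` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <string>

// Reorders the 32 hex digits that follow "0xL" in an LLVM IR fp128 literal
// (assumed here to be written low 64 bits first) into most-significant-first
// order, which is how the GCC value quoted in the review is written.
std::string toMostSignificantFirst(const std::string &digits) {
  assert(digits.size() == 32 && "expected exactly 32 hex digits");
  // The second group of 16 digits is the high half; move it to the front.
  return digits.substr(16, 16) + digits.substr(0, 16);
}
```

Applied to `00000000000000003FFF000000000000`, this yields the GCC form quoted in the review.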

chh updated this revision to Diff 42494.Dec 10 2015, 7:42 PM

chh edited edge metadata.

chh marked an inline comment as done.

davidxl added inline comments.Dec 11 2015, 10:59 AM

test/CodeGen/X86/fp128-cast.ll
137	Why do we care what transformations have been done to get the IR? The IR code should be readable by itself -- so while the C example is useful, I still prefer the naming in the IR to be simplified.
test/CodeGen/X86/fp128-compare.ll
35	OK. But it is possible with test + sets, right? Maybe add a comment so that people know how to fix the test if it breaks in the future?
test/CodeGen/X86/fp128-i128.ll
8	__float128
24	long double --> __float128?
45	The pattern checked is pretty long -- I worry it may break in the future. Is it possible to relax it somehow?
58	long double --> __float128
82	long double --> __float128
107	long double --> __float128

chh added inline comments.Dec 11 2015, 3:38 PM

test/CodeGen/X86/fp128-cast.ll
137	The comment now seems to be showing at the wrong place in this code diff. The most confusing name is u.sroa.0.4.extract.shift in TestBits128, so I will shorten the long names in this function for now. The C code is important for anyone who wants to test this in the future without having time to rebuild and test the whole AOSP with libm. A simplification of the IR might work, but clang could generate different bit operators for those long double union types and trigger problems with any ad-hoc f128 optimization.
test/CodeGen/X86/fp128-compare.ll
35	Sure, added comment. If it is broken in the future, maybe it would be easier to continue using this trick. :-)
test/CodeGen/X86/fp128-i128.ll
8	Unfortunately, clang does not accept the __float128 keyword, although it can emit f128 for llvm.
24	No __float128 in clang.
45	I tried to reduce the C code, but no reduced version triggers the complicated IR that made my partial fix core dump. Maybe we can take out a few CHECK-NEXT requirements. On the other hand, I was terrified by how many ad-hoc floating-point optimizations exist for the usage patterns in libm. I guess llvm tried to match or do better than gcc, and libm tried to use every possible bit operation. So maybe it is better for Android, or anyone who depends on the f128 type, to have more check rules here. Whoever changes float optimizations in the future had better fully understand and update these tests.
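
For context on the fp128-compare.ll discussion above (returning 1 when the libcall result is negative instead of using test + sets): libgcc documents `__lttf2` as returning a value less than zero when neither argument is NaN and the first is strictly less than the second. A minimal sketch of the shift-based form, assuming a 32-bit result; `lessThanFromCmp` is a hypothetical name used only here.

```cpp
#include <cstdint>

// Turns "the comparison libcall returned a negative value" into 0 or 1 by
// moving the sign bit of the 32-bit result into bit 0. The unsigned cast
// makes the shift a logical one, so no sign bits are smeared in.
inline int lessThanFromCmp(int32_t cmp) {
  return static_cast<int>(static_cast<uint32_t>(cmp) >> 31);
}
```

This gives the same answer as `cmp < 0`, which is what a test + sets sequence would compute.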

chh updated this revision to Diff 42595.Dec 11 2015, 3:39 PM

davidxl added inline comments.Dec 11 2015, 3:49 PM

test/CodeGen/X86/fp128-i128.ll
9	This should be fixed in clang FE. By default, long double is extended FP, not quadFP --- so do fix the comment to avoid confusion.

chh updated this revision to Diff 42634.Dec 11 2015, 10:57 PM

chh marked an inline comment as done.

chh added inline comments.

test/CodeGen/X86/fp128-i128.ll
9	Used __float128 here and added more comments at the top of file.

LGTM.

This revision is now accepted and ready to land.Dec 12 2015, 7:04 PM

Fix infinite loop in SelectionDAGBuilder.cpp, caught by new regression tests in this patch.
It happens only with the new f128 type, for which VT == TLI.getTypeToTransformTo(Ctx, VT).
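
The shape of this kind of bug can be sketched as follows. This is not the SelectionDAGBuilder code, which is not shown in this review; it is a toy model that assumes a loop of the form "while the type is not legal, transform it", where `ToyVT`, `ToyLowering` and `settleType` are hypothetical names. A type that is not marked legal but transforms to itself makes such a loop spin forever unless the fixed point is detected.

```cpp
#include <map>
#include <set>

// Toy stand-ins for value types and target lowering info (hypothetical).
enum class ToyVT { I64, I128, F128 };

struct ToyLowering {
  std::set<ToyVT> legal;              // types treated as legal
  std::map<ToyVT, ToyVT> transformTo; // missing entry: type maps to itself

  bool isLegal(ToyVT vt) const { return legal.count(vt) != 0; }

  ToyVT getTypeToTransformTo(ToyVT vt) const {
    auto it = transformTo.find(vt);
    return it == transformTo.end() ? vt : it->second;
  }
};

// Follows the transform chain, but stops at a fixed point instead of
// looping forever when a non-legal type transforms to itself.
inline ToyVT settleType(const ToyLowering &tli, ToyVT vt) {
  while (!tli.isLegal(vt)) {
    ToyVT next = tli.getTypeToTransformTo(vt);
    if (next == vt)
      break; // same type again: nothing more to do
    vt = next;
  }
  return vt;
}
```

Without the `next == vt` check, calling `settleType` on the toy F128 type would never return.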

David, thanks for the review suggestions and approval.
There is one recent regression caught by two of my unit tests,
so I need to fix it too in the updated diff.

Closed by commit rL255558: [X86] Part 2 to fix x86-64 fp128 calling convention. (authored by chh).Dec 14 2015, 2:11 PM

This revision was automatically updated to reflect the committed changes.

fhahn mentioned this in D29265: [legalize-types] Remove stale entries from SoftenedFloats..Mar 1 2017, 2:43 PM

Revision Contents

Path	Size
lib/Target/X86/X86CallingConv.td	5 lines
lib/Target/X86/X86ISelLowering.cpp	59 lines
lib/Target/X86/X86InstrCompiler.td	1 line
lib/Target/X86/X86InstrInfo.td	11 lines
lib/Target/X86/X86InstrSSE.td	58 lines
lib/Target/X86/X86RegisterInfo.td	2 lines
test/CodeGen/X86/fp128-calling-conv.ll	53 lines
test/CodeGen/X86/fp128-cast.ll	196 lines
test/CodeGen/X86/fp128-compare.ll	83 lines
test/CodeGen/X86/fp128-i128.ll	241 lines
test/CodeGen/X86/fp128-libcalls.ll	107 lines
test/CodeGen/X86/fp128-load.ll	35 lines
test/CodeGen/X86/fp128-store.ll	14 lines
test/CodeGen/X86/soft-fp.ll	35 lines
utils/TableGen/X86RecognizableInstr.cpp	5 lines

Diff 42455

lib/Target/X86/X86CallingConv.td

Show First 20 Lines • Show All 152 Lines • ▼ Show 20 Lines	def RetCC_X86_32_VectorCall : CallingConv<[
CCDelegateTo<RetCC_X86Common>		CCDelegateTo<RetCC_X86Common>
]>;		]>;

// X86-64 C return-value convention.		// X86-64 C return-value convention.
def RetCC_X86_64_C : CallingConv<[		def RetCC_X86_64_C : CallingConv<[
// The X86-64 calling convention always returns FP values in XMM0.		// The X86-64 calling convention always returns FP values in XMM0.
CCIfType<[f32], CCAssignToReg<[XMM0, XMM1]>>,		CCIfType<[f32], CCAssignToReg<[XMM0, XMM1]>>,
CCIfType<[f64], CCAssignToReg<[XMM0, XMM1]>>,		CCIfType<[f64], CCAssignToReg<[XMM0, XMM1]>>,
		CCIfType<[f128], CCAssignToReg<[XMM0, XMM1]>>,

// MMX vector types are always returned in XMM0.		// MMX vector types are always returned in XMM0.
CCIfType<[x86mmx], CCAssignToReg<[XMM0, XMM1]>>,		CCIfType<[x86mmx], CCAssignToReg<[XMM0, XMM1]>>,
CCDelegateTo<RetCC_X86Common>		CCDelegateTo<RetCC_X86Common>
]>;		]>;

// X86-Win64 C return-value convention.		// X86-Win64 C return-value convention.
def RetCC_X86_Win64_C : CallingConv<[		def RetCC_X86_Win64_C : CallingConv<[
▲ Show 20 Lines • Show All 119 Lines • ▼ Show 20 Lines	def CC_X86_64_C : CallingConv<[
CCIfType<[v2i1], CCPromoteToType<v2i64>>,		CCIfType<[v2i1], CCPromoteToType<v2i64>>,
CCIfType<[v4i1], CCPromoteToType<v4i32>>,		CCIfType<[v4i1], CCPromoteToType<v4i32>>,
CCIfType<[v8i1], CCPromoteToType<v8i16>>,		CCIfType<[v8i1], CCPromoteToType<v8i16>>,
CCIfType<[v16i1], CCPromoteToType<v16i8>>,		CCIfType<[v16i1], CCPromoteToType<v16i8>>,
CCIfType<[v32i1], CCPromoteToType<v32i8>>,		CCIfType<[v32i1], CCPromoteToType<v32i8>>,
CCIfType<[v64i1], CCPromoteToType<v64i8>>,		CCIfType<[v64i1], CCPromoteToType<v64i8>>,

// The first 8 FP/Vector arguments are passed in XMM registers.		// The first 8 FP/Vector arguments are passed in XMM registers.
CCIfType<[f32, f64, v16i8, v8i16, v4i32, v2i64, v4f32, v2f64],		CCIfType<[f32, f64, f128, v16i8, v8i16, v4i32, v2i64, v4f32, v2f64],
CCIfSubtarget<"hasSSE1()",		CCIfSubtarget<"hasSSE1()",
CCAssignToReg<[XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7]>>>,		CCAssignToReg<[XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7]>>>,

// The first 8 256-bit vector arguments are passed in YMM registers, unless		// The first 8 256-bit vector arguments are passed in YMM registers, unless
// this is a vararg function.		// this is a vararg function.
// FIXME: This isn't precisely correct; the x86-64 ABI document says that		// FIXME: This isn't precisely correct; the x86-64 ABI document says that
// fixed arguments to vararg functions are supposed to be passed in		// fixed arguments to vararg functions are supposed to be passed in
// registers. Actually modeling that would be a lot of work, though.		// registers. Actually modeling that would be a lot of work, though.
CCIfNotVarArg<CCIfType<[v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],		CCIfNotVarArg<CCIfType<[v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
CCIfSubtarget<"hasFp256()",		CCIfSubtarget<"hasFp256()",
CCAssignToReg<[YMM0, YMM1, YMM2, YMM3,		CCAssignToReg<[YMM0, YMM1, YMM2, YMM3,
YMM4, YMM5, YMM6, YMM7]>>>>,		YMM4, YMM5, YMM6, YMM7]>>>>,

// The first 8 512-bit vector arguments are passed in ZMM registers.		// The first 8 512-bit vector arguments are passed in ZMM registers.
CCIfNotVarArg<CCIfType<[v64i8, v32i16, v16i32, v8i64, v16f32, v8f64],		CCIfNotVarArg<CCIfType<[v64i8, v32i16, v16i32, v8i64, v16f32, v8f64],
CCIfSubtarget<"hasAVX512()",		CCIfSubtarget<"hasAVX512()",
CCAssignToReg<[ZMM0, ZMM1, ZMM2, ZMM3, ZMM4, ZMM5, ZMM6, ZMM7]>>>>,		CCAssignToReg<[ZMM0, ZMM1, ZMM2, ZMM3, ZMM4, ZMM5, ZMM6, ZMM7]>>>>,

// Integer/FP values get stored in stack slots that are 8 bytes in size and		// Integer/FP values get stored in stack slots that are 8 bytes in size and
// 8-byte aligned if there are no more registers to hold them.		// 8-byte aligned if there are no more registers to hold them.
CCIfType<[i32, i64, f32, f64], CCAssignToStack<8, 8>>,		CCIfType<[i32, i64, f32, f64], CCAssignToStack<8, 8>>,

// Long doubles get stack slots whose size and alignment depends on the		// Long doubles get stack slots whose size and alignment depends on the
// subtarget.		// subtarget.
CCIfType<[f80], CCAssignToStack<0, 0>>,		CCIfType<[f80, f128], CCAssignToStack<0, 0>>,

// Vectors get 16-byte stack slots that are 16-byte aligned.		// Vectors get 16-byte stack slots that are 16-byte aligned.
CCIfType<[v16i8, v8i16, v4i32, v2i64, v4f32, v2f64], CCAssignToStack<16, 16>>,		CCIfType<[v16i8, v8i16, v4i32, v2i64, v4f32, v2f64], CCAssignToStack<16, 16>>,

// 256-bit vectors get 32-byte stack slots that are 32-byte aligned.		// 256-bit vectors get 32-byte stack slots that are 32-byte aligned.
CCIfType<[v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],		CCIfType<[v32i8, v16i16, v8i32, v4i64, v8f32, v4f64],
CCAssignToStack<32, 32>>,		CCAssignToStack<32, 32>>,

▲ Show 20 Lines • Show All 518 Lines • Show Last 20 Lines
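
The effect of the CC_X86_64_C change for f128 arguments can be sketched with a small simulation. This is a simplified model under stated assumptions, not the TableGen-generated code: it looks only at a run of f128 arguments, ignores other FP/vector arguments that would compete for the same XMM registers, ignores the hasSSE1() and vararg conditions, and assumes a 16-byte, 16-byte-aligned stack slot (the rule itself says CCAssignToStack<0, 0>, which defers size and alignment to the subtarget). `assignF128Args` is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Returns a location label for each of `count` consecutive f128 arguments:
// the first eight go to XMM0..XMM7, the rest to stack slots.
inline std::vector<std::string> assignF128Args(unsigned count) {
  const unsigned kNumXmmArgRegs = 8; // XMM0..XMM7, as in the rule above
  const unsigned kSlotBytes = 16;    // assumed f128 slot size and alignment
  std::vector<std::string> locs;
  unsigned stackOffset = 0;
  for (unsigned i = 0; i < count; ++i) {
    if (i < kNumXmmArgRegs) {
      locs.push_back("XMM" + std::to_string(i));
    } else {
      locs.push_back("stack+" + std::to_string(stackOffset));
      stackOffset += kSlotBytes;
    }
  }
  return locs;
}
```

With ten f128 arguments, the ninth and tenth land in the first two stack slots under these assumptions.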

lib/Target/X86/X86ISelLowering.cpp

Show First 20 Lines • Show All 290 Lines • ▼ Show 20 Lines	for (auto VT : { MVT::i8, MVT::i16, MVT::i32, MVT::i64 }) {
setOperationAction(ISD::SUBE, VT, Custom);		setOperationAction(ISD::SUBE, VT, Custom);
}		}

setOperationAction(ISD::BR_JT , MVT::Other, Expand);		setOperationAction(ISD::BR_JT , MVT::Other, Expand);
setOperationAction(ISD::BRCOND , MVT::Other, Custom);		setOperationAction(ISD::BRCOND , MVT::Other, Custom);
setOperationAction(ISD::BR_CC , MVT::f32, Expand);		setOperationAction(ISD::BR_CC , MVT::f32, Expand);
setOperationAction(ISD::BR_CC , MVT::f64, Expand);		setOperationAction(ISD::BR_CC , MVT::f64, Expand);
setOperationAction(ISD::BR_CC , MVT::f80, Expand);		setOperationAction(ISD::BR_CC , MVT::f80, Expand);
		setOperationAction(ISD::BR_CC , MVT::f128, Expand);
setOperationAction(ISD::BR_CC , MVT::i8, Expand);		setOperationAction(ISD::BR_CC , MVT::i8, Expand);
setOperationAction(ISD::BR_CC , MVT::i16, Expand);		setOperationAction(ISD::BR_CC , MVT::i16, Expand);
setOperationAction(ISD::BR_CC , MVT::i32, Expand);		setOperationAction(ISD::BR_CC , MVT::i32, Expand);
setOperationAction(ISD::BR_CC , MVT::i64, Expand);		setOperationAction(ISD::BR_CC , MVT::i64, Expand);
setOperationAction(ISD::SELECT_CC , MVT::f32, Expand);		setOperationAction(ISD::SELECT_CC , MVT::f32, Expand);
setOperationAction(ISD::SELECT_CC , MVT::f64, Expand);		setOperationAction(ISD::SELECT_CC , MVT::f64, Expand);
setOperationAction(ISD::SELECT_CC , MVT::f80, Expand);		setOperationAction(ISD::SELECT_CC , MVT::f80, Expand);
		setOperationAction(ISD::SELECT_CC , MVT::f128, Expand);
setOperationAction(ISD::SELECT_CC , MVT::i8, Expand);		setOperationAction(ISD::SELECT_CC , MVT::i8, Expand);
setOperationAction(ISD::SELECT_CC , MVT::i16, Expand);		setOperationAction(ISD::SELECT_CC , MVT::i16, Expand);
setOperationAction(ISD::SELECT_CC , MVT::i32, Expand);		setOperationAction(ISD::SELECT_CC , MVT::i32, Expand);
setOperationAction(ISD::SELECT_CC , MVT::i64, Expand);		setOperationAction(ISD::SELECT_CC , MVT::i64, Expand);
if (Subtarget->is64Bit())		if (Subtarget->is64Bit())
setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i32, Legal);		setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i32, Legal);
setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16 , Legal);		setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i16 , Legal);
setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i8 , Legal);		setOperationAction(ISD::SIGN_EXTEND_INREG, MVT::i8 , Legal);
▲ Show 20 Lines • Show All 96 Lines • ▼ Show 20 Lines	X86TargetLowering::X86TargetLowering(const X86TargetMachine &TM,
setOperationAction(ISD::SELECT , MVT::i1 , Promote);		setOperationAction(ISD::SELECT , MVT::i1 , Promote);
// X86 wants to expand cmov itself.		// X86 wants to expand cmov itself.
setOperationAction(ISD::SELECT , MVT::i8 , Custom);		setOperationAction(ISD::SELECT , MVT::i8 , Custom);
setOperationAction(ISD::SELECT , MVT::i16 , Custom);		setOperationAction(ISD::SELECT , MVT::i16 , Custom);
setOperationAction(ISD::SELECT , MVT::i32 , Custom);		setOperationAction(ISD::SELECT , MVT::i32 , Custom);
setOperationAction(ISD::SELECT , MVT::f32 , Custom);		setOperationAction(ISD::SELECT , MVT::f32 , Custom);
setOperationAction(ISD::SELECT , MVT::f64 , Custom);		setOperationAction(ISD::SELECT , MVT::f64 , Custom);
setOperationAction(ISD::SELECT , MVT::f80 , Custom);		setOperationAction(ISD::SELECT , MVT::f80 , Custom);
		setOperationAction(ISD::SELECT , MVT::f128 , Custom);
setOperationAction(ISD::SETCC , MVT::i8 , Custom);		setOperationAction(ISD::SETCC , MVT::i8 , Custom);
setOperationAction(ISD::SETCC , MVT::i16 , Custom);		setOperationAction(ISD::SETCC , MVT::i16 , Custom);
setOperationAction(ISD::SETCC , MVT::i32 , Custom);		setOperationAction(ISD::SETCC , MVT::i32 , Custom);
setOperationAction(ISD::SETCC , MVT::f32 , Custom);		setOperationAction(ISD::SETCC , MVT::f32 , Custom);
setOperationAction(ISD::SETCC , MVT::f64 , Custom);		setOperationAction(ISD::SETCC , MVT::f64 , Custom);
setOperationAction(ISD::SETCC , MVT::f80 , Custom);		setOperationAction(ISD::SETCC , MVT::f80 , Custom);
		setOperationAction(ISD::SETCC , MVT::f128 , Custom);
setOperationAction(ISD::SETCCE , MVT::i8 , Custom);		setOperationAction(ISD::SETCCE , MVT::i8 , Custom);
setOperationAction(ISD::SETCCE , MVT::i16 , Custom);		setOperationAction(ISD::SETCCE , MVT::i16 , Custom);
setOperationAction(ISD::SETCCE , MVT::i32 , Custom);		setOperationAction(ISD::SETCCE , MVT::i32 , Custom);
if (Subtarget->is64Bit()) {		if (Subtarget->is64Bit()) {
setOperationAction(ISD::SELECT , MVT::i64 , Custom);		setOperationAction(ISD::SELECT , MVT::i64 , Custom);
setOperationAction(ISD::SETCC , MVT::i64 , Custom);		setOperationAction(ISD::SETCC , MVT::i64 , Custom);
setOperationAction(ISD::SETCCE , MVT::i64 , Custom);		setOperationAction(ISD::SETCCE , MVT::i64 , Custom);
}		}
▲ Show 20 Lines • Show All 182 Lines • ▼ Show 20 Lines	if (!Subtarget->useSoftFloat() && X86ScalarSSEf64) {
addLegalFPImmediate(APFloat(-0.0f)); // FLD0/FCHS		addLegalFPImmediate(APFloat(-0.0f)); // FLD0/FCHS
addLegalFPImmediate(APFloat(-1.0f)); // FLD1/FCHS		addLegalFPImmediate(APFloat(-1.0f)); // FLD1/FCHS
}		}

// We don't support FMA.		// We don't support FMA.
setOperationAction(ISD::FMA, MVT::f64, Expand);		setOperationAction(ISD::FMA, MVT::f64, Expand);
setOperationAction(ISD::FMA, MVT::f32, Expand);		setOperationAction(ISD::FMA, MVT::f32, Expand);

// Long double always uses X87.		// Long double always uses X87, except f128 in MMX.
if (!Subtarget->useSoftFloat()) {		if (!Subtarget->useSoftFloat()) {
		if (Subtarget->is64Bit() && Subtarget->hasMMX()) {
		addRegisterClass(MVT::f128, &X86::FR128RegClass);
		ValueTypeActions.setTypeAction(MVT::f128, TypeSoftenFloat);
		setOperationAction(ISD::FABS , MVT::f128, Custom);
		setOperationAction(ISD::FNEG , MVT::f128, Custom);
		setOperationAction(ISD::FCOPYSIGN, MVT::f128, Custom);
		}

addRegisterClass(MVT::f80, &X86::RFP80RegClass);		addRegisterClass(MVT::f80, &X86::RFP80RegClass);
setOperationAction(ISD::UNDEF, MVT::f80, Expand);		setOperationAction(ISD::UNDEF, MVT::f80, Expand);
setOperationAction(ISD::FCOPYSIGN, MVT::f80, Expand);		setOperationAction(ISD::FCOPYSIGN, MVT::f80, Expand);
{		{
APFloat TmpFlt = APFloat::getZero(APFloat::x87DoubleExtended);		APFloat TmpFlt = APFloat::getZero(APFloat::x87DoubleExtended);
addLegalFPImmediate(TmpFlt); // FLD0		addLegalFPImmediate(TmpFlt); // FLD0
TmpFlt.changeSign();		TmpFlt.changeSign();
addLegalFPImmediate(TmpFlt); // FLD0/FCHS		addLegalFPImmediate(TmpFlt); // FLD0/FCHS
▲ Show 20 Lines • Show All 1,716 Lines • ▼ Show 20 Lines	X86TargetLowering::LowerCallResult(SDValue Chain, SDValue InFlag,
CCInfo.AnalyzeCallResult(Ins, RetCC_X86);		CCInfo.AnalyzeCallResult(Ins, RetCC_X86);

// Copy all of the result registers out of their specified physreg.		// Copy all of the result registers out of their specified physreg.
for (unsigned i = 0, e = RVLocs.size(); i != e; ++i) {		for (unsigned i = 0, e = RVLocs.size(); i != e; ++i) {
CCValAssign &VA = RVLocs[i];		CCValAssign &VA = RVLocs[i];
EVT CopyVT = VA.getLocVT();		EVT CopyVT = VA.getLocVT();

// If this is x86-64, and we disabled SSE, we can't return FP values		// If this is x86-64, and we disabled SSE, we can't return FP values
if ((CopyVT == MVT::f32 || CopyVT == MVT::f64) &&		if ((CopyVT == MVT::f32 || CopyVT == MVT::f64 || CopyVT == MVT::f128) &&
((Is64Bit || Ins[i].Flags.isInReg()) && !Subtarget->hasSSE1())) {		((Is64Bit || Ins[i].Flags.isInReg()) && !Subtarget->hasSSE1())) {
report_fatal_error("SSE register return with SSE disabled");		report_fatal_error("SSE register return with SSE disabled");
}		}

// If we prefer to use the value in xmm registers, copy it out as f80 and		// If we prefer to use the value in xmm registers, copy it out as f80 and
// use a truncate to move it from fp stack reg to xmm reg.		// use a truncate to move it from fp stack reg to xmm reg.
bool RoundAfterCopy = false;		bool RoundAfterCopy = false;
if ((VA.getLocReg() == X86::FP0 || VA.getLocReg() == X86::FP1) &&		if ((VA.getLocReg() == X86::FP0 || VA.getLocReg() == X86::FP1) &&
▲ Show 20 Lines • Show All 267 Lines • ▼ Show 20 Lines	if (VA.isRegLoc()) {
if (RegVT == MVT::i32)		if (RegVT == MVT::i32)
RC = &X86::GR32RegClass;		RC = &X86::GR32RegClass;
else if (Is64Bit && RegVT == MVT::i64)		else if (Is64Bit && RegVT == MVT::i64)
RC = &X86::GR64RegClass;		RC = &X86::GR64RegClass;
else if (RegVT == MVT::f32)		else if (RegVT == MVT::f32)
RC = &X86::FR32RegClass;		RC = &X86::FR32RegClass;
else if (RegVT == MVT::f64)		else if (RegVT == MVT::f64)
RC = &X86::FR64RegClass;		RC = &X86::FR64RegClass;
		else if (RegVT == MVT::f128)
		RC = &X86::FR128RegClass;
else if (RegVT.is512BitVector())		else if (RegVT.is512BitVector())
RC = &X86::VR512RegClass;		RC = &X86::VR512RegClass;
else if (RegVT.is256BitVector())		else if (RegVT.is256BitVector())
RC = &X86::VR256RegClass;		RC = &X86::VR256RegClass;
else if (RegVT.is128BitVector())		else if (RegVT.is128BitVector())
RC = &X86::VR128RegClass;		RC = &X86::VR128RegClass;
else if (RegVT == MVT::x86mmx)		else if (RegVT == MVT::x86mmx)
RC = &X86::VR64RegClass;		RC = &X86::VR64RegClass;
▲ Show 20 Lines • Show All 10,732 Lines • ▼ Show 20 Lines	static SDValue LowerFABSorFNEG(SDValue Op, SelectionDAG &DAG) {
if (IsFABS)		if (IsFABS)
for (SDNode *User : Op->uses())		for (SDNode *User : Op->uses())
if (User->getOpcode() == ISD::FNEG)		if (User->getOpcode() == ISD::FNEG)
return Op;		return Op;

SDLoc dl(Op);		SDLoc dl(Op);
MVT VT = Op.getSimpleValueType();		MVT VT = Op.getSimpleValueType();

		bool IsF128 = (VT == MVT::f128);

// FIXME: Use function attribute "OptimizeForSize" and/or CodeGenOpt::Level to		// FIXME: Use function attribute "OptimizeForSize" and/or CodeGenOpt::Level to
// decide if we should generate a 16-byte constant mask when we only need 4 or		// decide if we should generate a 16-byte constant mask when we only need 4 or
// 8 bytes for the scalar case.		// 8 bytes for the scalar case.

MVT LogicVT;		MVT LogicVT;
MVT EltVT;		MVT EltVT;
unsigned NumElts;		unsigned NumElts;

if (VT.isVector()) {		if (VT.isVector()) {
LogicVT = VT;		LogicVT = VT;
EltVT = VT.getVectorElementType();		EltVT = VT.getVectorElementType();
NumElts = VT.getVectorNumElements();		NumElts = VT.getVectorNumElements();
		} else if (IsF128) {
		// SSE instructions are used for optimized f128 logical operations.
		LogicVT = MVT::f128;
		EltVT = VT;
		NumElts = 1;
} else {		} else {
// There are no scalar bitwise logical SSE/AVX instructions, so we		// There are no scalar bitwise logical SSE/AVX instructions, so we
// generate a 16-byte vector constant and logic op even for the scalar case.		// generate a 16-byte vector constant and logic op even for the scalar case.
// Using a 16-byte mask allows folding the load of the mask with		// Using a 16-byte mask allows folding the load of the mask with
// the logic op, so it can save (~4 bytes) on code size.		// the logic op, so it can save (~4 bytes) on code size.
LogicVT = (VT == MVT::f64) ? MVT::v2f64 : MVT::v4f32;		LogicVT = (VT == MVT::f64) ? MVT::v2f64 : MVT::v4f32;
EltVT = VT;		EltVT = VT;
NumElts = (VT == MVT::f64) ? 2 : 4;		NumElts = (VT == MVT::f64) ? 2 : 4;
Show All 15 Lines	SDValue Mask =
false, false, false, Alignment);		false, false, false, Alignment);

SDValue Op0 = Op.getOperand(0);		SDValue Op0 = Op.getOperand(0);
bool IsFNABS = !IsFABS && (Op0.getOpcode() == ISD::FABS);		bool IsFNABS = !IsFABS && (Op0.getOpcode() == ISD::FABS);
unsigned LogicOp =		unsigned LogicOp =
IsFABS ? X86ISD::FAND : IsFNABS ? X86ISD::FOR : X86ISD::FXOR;		IsFABS ? X86ISD::FAND : IsFNABS ? X86ISD::FOR : X86ISD::FXOR;
SDValue Operand = IsFNABS ? Op0.getOperand(0) : Op0;		SDValue Operand = IsFNABS ? Op0.getOperand(0) : Op0;

if (VT.isVector())		if (VT.isVector() || IsF128)
return DAG.getNode(LogicOp, dl, LogicVT, Operand, Mask);		return DAG.getNode(LogicOp, dl, LogicVT, Operand, Mask);

// For the scalar case extend to a 128-bit vector, perform the logic op,		// For the scalar case extend to a 128-bit vector, perform the logic op,
// and extract the scalar result back out.		// and extract the scalar result back out.
Operand = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, LogicVT, Operand);		Operand = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, LogicVT, Operand);
SDValue LogicNode = DAG.getNode(LogicOp, dl, LogicVT, Operand, Mask);		SDValue LogicNode = DAG.getNode(LogicOp, dl, LogicVT, Operand, Mask);
return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, VT, LogicNode,		return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, VT, LogicNode,
DAG.getIntPtrConstant(0, dl));		DAG.getIntPtrConstant(0, dl));
}		}

static SDValue LowerFCOPYSIGN(SDValue Op, SelectionDAG &DAG) {		static SDValue LowerFCOPYSIGN(SDValue Op, SelectionDAG &DAG) {
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
LLVMContext *Context = DAG.getContext();		LLVMContext *Context = DAG.getContext();
SDValue Op0 = Op.getOperand(0);		SDValue Op0 = Op.getOperand(0);
SDValue Op1 = Op.getOperand(1);		SDValue Op1 = Op.getOperand(1);
SDLoc dl(Op);		SDLoc dl(Op);
MVT VT = Op.getSimpleValueType();		MVT VT = Op.getSimpleValueType();
MVT SrcVT = Op1.getSimpleValueType();		MVT SrcVT = Op1.getSimpleValueType();
		bool IsF128 = (VT == MVT::f128);

// If second operand is smaller, extend it first.		// If second operand is smaller, extend it first.
if (SrcVT.bitsLT(VT)) {		if (SrcVT.bitsLT(VT)) {
Op1 = DAG.getNode(ISD::FP_EXTEND, dl, VT, Op1);		Op1 = DAG.getNode(ISD::FP_EXTEND, dl, VT, Op1);
SrcVT = VT;		SrcVT = VT;
}		}
// And if it is bigger, shrink it first.		// And if it is bigger, shrink it first.
if (SrcVT.bitsGT(VT)) {		if (SrcVT.bitsGT(VT)) {
Op1 = DAG.getNode(ISD::FP_ROUND, dl, VT, Op1, DAG.getIntPtrConstant(1, dl));		Op1 = DAG.getNode(ISD::FP_ROUND, dl, VT, Op1, DAG.getIntPtrConstant(1, dl));
SrcVT = VT;		SrcVT = VT;
}		}

// At this point the operands and the result should have the same		// At this point the operands and the result should have the same
// type, and that won't be f80 since that is not custom lowered.		// type, and that won't be f80 since that is not custom lowered.
		assert((VT == MVT::f64 || VT == MVT::f32 || IsF128) &&
		"Unexpected type in LowerFCOPYSIGN");

const fltSemantics &Sem =		const fltSemantics &Sem =
VT == MVT::f64 ? APFloat::IEEEdouble : APFloat::IEEEsingle;		VT == MVT::f64 ? APFloat::IEEEdouble :
		(IsF128 ? APFloat::IEEEquad : APFloat::IEEEsingle);
const unsigned SizeInBits = VT.getSizeInBits();		const unsigned SizeInBits = VT.getSizeInBits();

SmallVector<Constant *, 4> CV(		SmallVector<Constant *, 4> CV(
VT == MVT::f64 ? 2 : 4,		VT == MVT::f64 ? 2 : (IsF128 ? 1 : 4),
ConstantFP::get(*Context, APFloat(Sem, APInt(SizeInBits, 0))));		ConstantFP::get(*Context, APFloat(Sem, APInt(SizeInBits, 0))));

// First, clear all bits but the sign bit from the second operand (sign).		// First, clear all bits but the sign bit from the second operand (sign).
CV[0] = ConstantFP::get(*Context,		CV[0] = ConstantFP::get(*Context,
APFloat(Sem, APInt::getHighBitsSet(SizeInBits, 1)));		APFloat(Sem, APInt::getHighBitsSet(SizeInBits, 1)));
Constant *C = ConstantVector::get(CV);		Constant *C = ConstantVector::get(CV);
auto PtrVT = TLI.getPointerTy(DAG.getDataLayout());		auto PtrVT = TLI.getPointerTy(DAG.getDataLayout());
SDValue CPIdx = DAG.getConstantPool(C, PtrVT, 16);		SDValue CPIdx = DAG.getConstantPool(C, PtrVT, 16);

// Perform all logic operations as 16-byte vectors because there are no		// Perform all logic operations as 16-byte vectors because there are no
// scalar FP logic instructions in SSE. This allows load folding of the		// scalar FP logic instructions in SSE. This allows load folding of the
// constants into the logic instructions.		// constants into the logic instructions.
MVT LogicVT = (VT == MVT::f64) ? MVT::v2f64 : MVT::v4f32;		MVT LogicVT = (VT == MVT::f64) ? MVT::v2f64 : (IsF128 ? MVT::f128 : MVT::v4f32);
SDValue Mask1 =		SDValue Mask1 =
DAG.getLoad(LogicVT, dl, DAG.getEntryNode(), CPIdx,		DAG.getLoad(LogicVT, dl, DAG.getEntryNode(), CPIdx,
MachinePointerInfo::getConstantPool(DAG.getMachineFunction()),		MachinePointerInfo::getConstantPool(DAG.getMachineFunction()),
false, false, false, 16);		false, false, false, 16);
		if (!IsF128)
Op1 = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, LogicVT, Op1);		Op1 = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, LogicVT, Op1);
SDValue SignBit = DAG.getNode(X86ISD::FAND, dl, LogicVT, Op1, Mask1);		SDValue SignBit = DAG.getNode(X86ISD::FAND, dl, LogicVT, Op1, Mask1);

// Next, clear the sign bit from the first operand (magnitude).		// Next, clear the sign bit from the first operand (magnitude).
// If it's a constant, we can clear it here.		// If it's a constant, we can clear it here.
if (ConstantFPSDNode *Op0CN = dyn_cast<ConstantFPSDNode>(Op0)) {		if (ConstantFPSDNode *Op0CN = dyn_cast<ConstantFPSDNode>(Op0)) {
APFloat APF = Op0CN->getValueAPF();		APFloat APF = Op0CN->getValueAPF();
// If the magnitude is a positive zero, the sign bit alone is enough.		// If the magnitude is a positive zero, the sign bit alone is enough.
if (APF.isPosZero())		if (APF.isPosZero())
return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, SrcVT, SignBit,		return IsF128 ? SignBit :
		DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, SrcVT, SignBit,
DAG.getIntPtrConstant(0, dl));		DAG.getIntPtrConstant(0, dl));
APF.clearSign();		APF.clearSign();
CV[0] = ConstantFP::get(*Context, APF);		CV[0] = ConstantFP::get(*Context, APF);
} else {		} else {
CV[0] = ConstantFP::get(		CV[0] = ConstantFP::get(
*Context,		*Context,
APFloat(Sem, APInt::getLowBitsSet(SizeInBits, SizeInBits - 1)));		APFloat(Sem, APInt::getLowBitsSet(SizeInBits, SizeInBits - 1)));
}		}
C = ConstantVector::get(CV);		C = ConstantVector::get(CV);
CPIdx = DAG.getConstantPool(C, PtrVT, 16);		CPIdx = DAG.getConstantPool(C, PtrVT, 16);
SDValue Val =		SDValue Val =
DAG.getLoad(LogicVT, dl, DAG.getEntryNode(), CPIdx,		DAG.getLoad(LogicVT, dl, DAG.getEntryNode(), CPIdx,
MachinePointerInfo::getConstantPool(DAG.getMachineFunction()),		MachinePointerInfo::getConstantPool(DAG.getMachineFunction()),
false, false, false, 16);		false, false, false, 16);
// If the magnitude operand wasn't a constant, we need to AND out the sign.		// If the magnitude operand wasn't a constant, we need to AND out the sign.
if (!isa<ConstantFPSDNode>(Op0)) {		if (!isa<ConstantFPSDNode>(Op0)) {
		if (!IsF128)
Op0 = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, LogicVT, Op0);		Op0 = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, LogicVT, Op0);
Val = DAG.getNode(X86ISD::FAND, dl, LogicVT, Op0, Val);		Val = DAG.getNode(X86ISD::FAND, dl, LogicVT, Op0, Val);
}		}
// OR the magnitude value with the sign bit.		// OR the magnitude value with the sign bit.
Val = DAG.getNode(X86ISD::FOR, dl, LogicVT, Val, SignBit);		Val = DAG.getNode(X86ISD::FOR, dl, LogicVT, Val, SignBit);
return DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, SrcVT, Val,		return IsF128 ? Val :
		DAG.getNode(ISD::EXTRACT_VECTOR_ELT, dl, SrcVT, Val,
DAG.getIntPtrConstant(0, dl));		DAG.getIntPtrConstant(0, dl));
}		}
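
The f128 path above reuses the two-mask scheme of the f32/f64 paths: AND the sign operand with a mask that keeps only the top bit, AND the magnitude operand with the complementary mask, then OR the results. Below is a minimal standalone sketch of that bit logic, assuming the 128-bit value is modeled as two 64-bit halves. `Bits128` and `copySignBits` are hypothetical names for illustration, not LLVM APIs.

```cpp
#include <cstdint>

// Hypothetical model of an IEEE binary128 bit pattern as two 64-bit halves.
struct Bits128 {
  uint64_t Lo; // bits 0..63
  uint64_t Hi; // bits 64..127; bit 63 of Hi is the sign bit
};

// Mirrors the FAND / FAND / FOR sequence in the lowering: the result has
// the magnitude of Mag and the sign of Sign.
inline Bits128 copySignBits(Bits128 Mag, Bits128 Sign) {
  const uint64_t SignMask = uint64_t{1} << 63;   // only the top bit of the high half
  Bits128 SignBit{0, Sign.Hi & SignMask};        // clear all but the sign bit
  Bits128 Val{Mag.Lo, Mag.Hi & ~SignMask};       // clear the sign bit of the magnitude
  return Bits128{Val.Lo | SignBit.Lo, Val.Hi | SignBit.Hi}; // OR them together
}
```

Because the whole 128-bit pattern is one scalar here, no SCALAR_TO_VECTOR or EXTRACT_VECTOR_ELT step is needed, which matches the `IsF128` special cases in the diff.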

static SDValue LowerFGETSIGN(SDValue Op, SelectionDAG &DAG) {		static SDValue LowerFGETSIGN(SDValue Op, SelectionDAG &DAG) {
SDValue N0 = Op.getOperand(0);		SDValue N0 = Op.getOperand(0);
SDLoc dl(Op);		SDLoc dl(Op);
MVT VT = Op.getSimpleValueType();		MVT VT = Op.getSimpleValueType();

// Lower ISD::FGETSIGN to (AND (X86ISD::FGETSIGNx86 ...) 1).		// Lower ISD::FGETSIGN to (AND (X86ISD::FGETSIGNx86 ...) 1).
▲ Show 20 Lines • Show All 1,062 Lines • ▼ Show 20 Lines	if (SDValue NewSetCC = LowerToBT(Op0, CC, dl, DAG)) {
return DAG.getNode(ISD::TRUNCATE, dl, MVT::i1, NewSetCC);		return DAG.getNode(ISD::TRUNCATE, dl, MVT::i1, NewSetCC);
return NewSetCC;		return NewSetCC;
}		}
}		}

// Look for X == 0, X == 1, X != 0, or X != 1. We can simplify some forms of		// Look for X == 0, X == 1, X != 0, or X != 1. We can simplify some forms of
// these.		// these.
if ((isOneConstant(Op1) \|\| isNullConstant(Op1)) &&		if ((isOneConstant(Op1) \|\| isNullConstant(Op1)) &&
(CC == ISD::SETEQ \|\| CC == ISD::SETNE)) {		(CC == ISD::SETEQ \|\| CC == ISD::SETNE)) {
		davidxl: This needs some explanation. Why can Op1's value type be i128?
		chh: Removed it. It was a condition only triggered by my older hacks. Now it should not happen.

// If the input is a setcc, then reuse the input setcc or use a new one with		// If the input is a setcc, then reuse the input setcc or use a new one with
// the inverted condition.		// the inverted condition.
if (Op0.getOpcode() == X86ISD::SETCC) {		if (Op0.getOpcode() == X86ISD::SETCC) {
X86::CondCode CCode = (X86::CondCode)Op0.getConstantOperandVal(0);		X86::CondCode CCode = (X86::CondCode)Op0.getConstantOperandVal(0);
bool Invert = (CC == ISD::SETNE) ^ isNullConstant(Op1);		bool Invert = (CC == ISD::SETNE) ^ isNullConstant(Op1);
if (!Invert)		if (!Invert)
return Op0;		return Op0;
▲ Show 20 Lines • Show All 7,377 Lines • ▼ Show 20 Lines	X86TargetLowering::EmitInstrWithCustomInserter(MachineInstr *MI,
case X86::SEG_ALLOCA_32:		case X86::SEG_ALLOCA_32:
case X86::SEG_ALLOCA_64:		case X86::SEG_ALLOCA_64:
return EmitLoweredSegAlloca(MI, BB);		return EmitLoweredSegAlloca(MI, BB);
case X86::TLSCall_32:		case X86::TLSCall_32:
case X86::TLSCall_64:		case X86::TLSCall_64:
return EmitLoweredTLSCall(MI, BB);		return EmitLoweredTLSCall(MI, BB);
case X86::CMOV_FR32:		case X86::CMOV_FR32:
case X86::CMOV_FR64:		case X86::CMOV_FR64:
		case X86::CMOV_FR128:
case X86::CMOV_GR8:		case X86::CMOV_GR8:
case X86::CMOV_GR16:		case X86::CMOV_GR16:
case X86::CMOV_GR32:		case X86::CMOV_GR32:
case X86::CMOV_RFP32:		case X86::CMOV_RFP32:
case X86::CMOV_RFP64:		case X86::CMOV_RFP64:
case X86::CMOV_RFP80:		case X86::CMOV_RFP80:
case X86::CMOV_V2F64:		case X86::CMOV_V2F64:
case X86::CMOV_V2I64:		case X86::CMOV_V2I64:
▲ Show 20 Lines • Show All 1,647 Lines • ▼ Show 20 Lines	static SDValue PerformSELECTCombine(SDNode *N, SelectionDAG &DAG,
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();

// If we have SSE[12] support, try to form min/max nodes. SSE min/max		// If we have SSE[12] support, try to form min/max nodes. SSE min/max
// instructions match the semantics of the common C idiom x<y?x:y but not		// instructions match the semantics of the common C idiom x<y?x:y but not
// x<=y?x:y, because of how they handle negative zero (which can be		// x<=y?x:y, because of how they handle negative zero (which can be
// ignored in unsafe-math mode).		// ignored in unsafe-math mode).
// We also try to create v2f32 min/max nodes, which we later widen to v4f32.		// We also try to create v2f32 min/max nodes, which we later widen to v4f32.
if (Cond.getOpcode() == ISD::SETCC && VT.isFloatingPoint() &&		if (Cond.getOpcode() == ISD::SETCC && VT.isFloatingPoint() &&
VT != MVT::f80 && (TLI.isTypeLegal(VT) \|\| VT == MVT::v2f32) &&		VT != MVT::f80 && VT != MVT::f128 &&
		(TLI.isTypeLegal(VT) \|\| VT == MVT::v2f32) &&
(Subtarget->hasSSE2() \|\|		(Subtarget->hasSSE2() \|\|
(Subtarget->hasSSE1() && VT.getScalarType() == MVT::f32))) {		(Subtarget->hasSSE1() && VT.getScalarType() == MVT::f32))) {
ISD::CondCode CC = cast<CondCodeSDNode>(Cond.getOperand(2))->get();		ISD::CondCode CC = cast<CondCodeSDNode>(Cond.getOperand(2))->get();

unsigned Opcode = 0;		unsigned Opcode = 0;
// Check for x CC y ? x : y.		// Check for x CC y ? x : y.
if (DAG.isEqualTo(LHS, Cond.getOperand(0)) &&		if (DAG.isEqualTo(LHS, Cond.getOperand(0)) &&
DAG.isEqualTo(RHS, Cond.getOperand(1))) {		DAG.isEqualTo(RHS, Cond.getOperand(1))) {
▲ Show 20 Lines • Show All 4,071 Lines • ▼ Show 20 Lines	case 'x': // SSE_REGS if SSE1 allowed or AVX_REGS if AVX allowed
default: break;		default: break;
// Scalar SSE types.		// Scalar SSE types.
case MVT::f32:		case MVT::f32:
case MVT::i32:		case MVT::i32:
return std::make_pair(0U, &X86::FR32RegClass);		return std::make_pair(0U, &X86::FR32RegClass);
case MVT::f64:		case MVT::f64:
case MVT::i64:		case MVT::i64:
return std::make_pair(0U, &X86::FR64RegClass);		return std::make_pair(0U, &X86::FR64RegClass);
		// TODO: Handle f128 and i128 in FR128RegClass after it is tested well.
		davidxl: Why a TODO here? The 'x' constraint should work.
		chh: f128 and i128 were not in SSE_REG before, so I think this is a new feature that can be added later.
// Vector types.		// Vector types.
case MVT::v16i8:		case MVT::v16i8:
case MVT::v8i16:		case MVT::v8i16:
case MVT::v4i32:		case MVT::v4i32:
case MVT::v2i64:		case MVT::v2i64:
case MVT::v4f32:		case MVT::v4f32:
case MVT::v2f64:		case MVT::v2f64:
return std::make_pair(0U, &X86::VR128RegClass);		return std::make_pair(0U, &X86::VR128RegClass);
▲ Show 20 Lines • Show All 96 Lines • ▼ Show 20 Lines	if (DestReg > 0) {
Class == &X86::FR32XRegClass \|\| Class == &X86::FR64XRegClass \|\|		Class == &X86::FR32XRegClass \|\| Class == &X86::FR64XRegClass \|\|
Class == &X86::VR128XRegClass \|\| Class == &X86::VR256XRegClass \|\|		Class == &X86::VR128XRegClass \|\| Class == &X86::VR256XRegClass \|\|
Class == &X86::VR512RegClass) {		Class == &X86::VR512RegClass) {
// Handle references to XMM physical registers that got mapped into the		// Handle references to XMM physical registers that got mapped into the
// wrong class. This can happen with constraints like {xmm0} where the		// wrong class. This can happen with constraints like {xmm0} where the
// target independent register mapper will just pick the first match it can		// target independent register mapper will just pick the first match it can
// find, ignoring the required type.		// find, ignoring the required type.

		// TODO: Handle f128 and i128 in FR128RegClass after it is tested well.
		davidxl: Explain the TODO here.
		chh: Same as above. f128 and i128 were not in SSE_REG before this change.
if (VT == MVT::f32 \|\| VT == MVT::i32)		if (VT == MVT::f32 \|\| VT == MVT::i32)
Res.second = &X86::FR32RegClass;		Res.second = &X86::FR32RegClass;
else if (VT == MVT::f64 \|\| VT == MVT::i64)		else if (VT == MVT::f64 \|\| VT == MVT::i64)
Res.second = &X86::FR64RegClass;		Res.second = &X86::FR64RegClass;
else if (X86::VR128RegClass.hasType(VT))		else if (X86::VR128RegClass.hasType(VT))
Res.second = &X86::VR128RegClass;		Res.second = &X86::VR128RegClass;
else if (X86::VR256RegClass.hasType(VT))		else if (X86::VR256RegClass.hasType(VT))
Res.second = &X86::VR256RegClass;		Res.second = &X86::VR256RegClass;

lib/Target/X86/X86InstrCompiler.td

Show First 20 Lines • Show All 506 Lines • ▼ Show 20 Lines	let usesCustomInserter = 1, Uses = [EFLAGS] in {

let Predicates = [FPStackf64] in		let Predicates = [FPStackf64] in
defm _RFP64 : CMOVrr_PSEUDO<RFP64, f64>;		defm _RFP64 : CMOVrr_PSEUDO<RFP64, f64>;

defm _RFP80 : CMOVrr_PSEUDO<RFP80, f80>;		defm _RFP80 : CMOVrr_PSEUDO<RFP80, f80>;

defm _FR32 : CMOVrr_PSEUDO<FR32, f32>;		defm _FR32 : CMOVrr_PSEUDO<FR32, f32>;
defm _FR64 : CMOVrr_PSEUDO<FR64, f64>;		defm _FR64 : CMOVrr_PSEUDO<FR64, f64>;
		defm _FR128 : CMOVrr_PSEUDO<FR128, f128>;
defm _V4F32 : CMOVrr_PSEUDO<VR128, v4f32>;		defm _V4F32 : CMOVrr_PSEUDO<VR128, v4f32>;
defm _V2F64 : CMOVrr_PSEUDO<VR128, v2f64>;		defm _V2F64 : CMOVrr_PSEUDO<VR128, v2f64>;
defm _V2I64 : CMOVrr_PSEUDO<VR128, v2i64>;		defm _V2I64 : CMOVrr_PSEUDO<VR128, v2i64>;
defm _V8F32 : CMOVrr_PSEUDO<VR256, v8f32>;		defm _V8F32 : CMOVrr_PSEUDO<VR256, v8f32>;
defm _V4F64 : CMOVrr_PSEUDO<VR256, v4f64>;		defm _V4F64 : CMOVrr_PSEUDO<VR256, v4f64>;
defm _V4I64 : CMOVrr_PSEUDO<VR256, v4i64>;		defm _V4I64 : CMOVrr_PSEUDO<VR256, v4i64>;
defm _V8I64 : CMOVrr_PSEUDO<VR512, v8i64>;		defm _V8I64 : CMOVrr_PSEUDO<VR512, v8i64>;
defm _V8F64 : CMOVrr_PSEUDO<VR512, v8f64>;		defm _V8F64 : CMOVrr_PSEUDO<VR512, v8f64>;

lib/Target/X86/X86InstrInfo.td

Show First 20 Lines • Show All 948 Lines • ▼ Show 20 Lines	def loadi32 : PatFrag<(ops node:$ptr), (i32 (unindexedload node:$ptr)), [{
ISD::LoadExtType ExtType = LD->getExtensionType();		ISD::LoadExtType ExtType = LD->getExtensionType();
if (ExtType == ISD::NON_EXTLOAD)		if (ExtType == ISD::NON_EXTLOAD)
return true;		return true;
if (ExtType == ISD::EXTLOAD)		if (ExtType == ISD::EXTLOAD)
return LD->getAlignment() >= 4 && !LD->isVolatile();		return LD->getAlignment() >= 4 && !LD->isVolatile();
return false;		return false;
}]>;		}]>;

def loadi8 : PatFrag<(ops node:$ptr), (i8 (load node:$ptr))>;		def loadi8 : PatFrag<(ops node:$ptr), (i8 (load node:$ptr))>;
def loadi64 : PatFrag<(ops node:$ptr), (i64 (load node:$ptr))>;		def loadi64 : PatFrag<(ops node:$ptr), (i64 (load node:$ptr))>;
def loadf32 : PatFrag<(ops node:$ptr), (f32 (load node:$ptr))>;		def loadf32 : PatFrag<(ops node:$ptr), (f32 (load node:$ptr))>;
def loadf64 : PatFrag<(ops node:$ptr), (f64 (load node:$ptr))>;		def loadf64 : PatFrag<(ops node:$ptr), (f64 (load node:$ptr))>;
def loadf80 : PatFrag<(ops node:$ptr), (f80 (load node:$ptr))>;		def loadf80 : PatFrag<(ops node:$ptr), (f80 (load node:$ptr))>;
		def loadf128 : PatFrag<(ops node:$ptr), (f128 (load node:$ptr))>;
		davidxl: Unrelated format change here.
		chh: They are changed to align with new line 962.

		davidxl: It seems you are adding an extra space before ':'.
		chh: Yes, it seems to be the style at lines 964-969 too.
def sextloadi16i8 : PatFrag<(ops node:$ptr), (i16 (sextloadi8 node:$ptr))>;		def sextloadi16i8 : PatFrag<(ops node:$ptr), (i16 (sextloadi8 node:$ptr))>;
def sextloadi32i8 : PatFrag<(ops node:$ptr), (i32 (sextloadi8 node:$ptr))>;		def sextloadi32i8 : PatFrag<(ops node:$ptr), (i32 (sextloadi8 node:$ptr))>;
def sextloadi32i16 : PatFrag<(ops node:$ptr), (i32 (sextloadi16 node:$ptr))>;		def sextloadi32i16 : PatFrag<(ops node:$ptr), (i32 (sextloadi16 node:$ptr))>;
def sextloadi64i8 : PatFrag<(ops node:$ptr), (i64 (sextloadi8 node:$ptr))>;		def sextloadi64i8 : PatFrag<(ops node:$ptr), (i64 (sextloadi8 node:$ptr))>;
def sextloadi64i16 : PatFrag<(ops node:$ptr), (i64 (sextloadi16 node:$ptr))>;		def sextloadi64i16 : PatFrag<(ops node:$ptr), (i64 (sextloadi16 node:$ptr))>;
def sextloadi64i32 : PatFrag<(ops node:$ptr), (i64 (sextloadi32 node:$ptr))>;		def sextloadi64i32 : PatFrag<(ops node:$ptr), (i64 (sextloadi32 node:$ptr))>;

def zextloadi8i1 : PatFrag<(ops node:$ptr), (i8 (zextloadi1 node:$ptr))>;		def zextloadi8i1 : PatFrag<(ops node:$ptr), (i8 (zextloadi1 node:$ptr))>;

lib/Target/X86/X86InstrSSE.td


Show First 20 Lines • Show All 407 Lines • ▼ Show 20 Lines	let Predicates = [HasSSE2] in {
def : Pat<(v4f32 (bitconvert (v8i16 VR128:$src))), (v4f32 VR128:$src)>;		def : Pat<(v4f32 (bitconvert (v8i16 VR128:$src))), (v4f32 VR128:$src)>;
def : Pat<(v4f32 (bitconvert (v16i8 VR128:$src))), (v4f32 VR128:$src)>;		def : Pat<(v4f32 (bitconvert (v16i8 VR128:$src))), (v4f32 VR128:$src)>;
def : Pat<(v4f32 (bitconvert (v2f64 VR128:$src))), (v4f32 VR128:$src)>;		def : Pat<(v4f32 (bitconvert (v2f64 VR128:$src))), (v4f32 VR128:$src)>;
def : Pat<(v2f64 (bitconvert (v2i64 VR128:$src))), (v2f64 VR128:$src)>;		def : Pat<(v2f64 (bitconvert (v2i64 VR128:$src))), (v2f64 VR128:$src)>;
def : Pat<(v2f64 (bitconvert (v4i32 VR128:$src))), (v2f64 VR128:$src)>;		def : Pat<(v2f64 (bitconvert (v4i32 VR128:$src))), (v2f64 VR128:$src)>;
def : Pat<(v2f64 (bitconvert (v8i16 VR128:$src))), (v2f64 VR128:$src)>;		def : Pat<(v2f64 (bitconvert (v8i16 VR128:$src))), (v2f64 VR128:$src)>;
def : Pat<(v2f64 (bitconvert (v16i8 VR128:$src))), (v2f64 VR128:$src)>;		def : Pat<(v2f64 (bitconvert (v16i8 VR128:$src))), (v2f64 VR128:$src)>;
def : Pat<(v2f64 (bitconvert (v4f32 VR128:$src))), (v2f64 VR128:$src)>;		def : Pat<(v2f64 (bitconvert (v4f32 VR128:$src))), (v2f64 VR128:$src)>;
		def : Pat<(f128 (bitconvert (i128 FR128:$src))), (f128 FR128:$src)>;
		def : Pat<(i128 (bitconvert (f128 FR128:$src))), (i128 FR128:$src)>;
}		}

// Bitcasts between 256-bit vector types. Return the original type since		// Bitcasts between 256-bit vector types. Return the original type since
// no instruction is needed for the conversion		// no instruction is needed for the conversion
let Predicates = [HasAVX] in {		let Predicates = [HasAVX] in {
def : Pat<(v4f64 (bitconvert (v8f32 VR256:$src))), (v4f64 VR256:$src)>;		def : Pat<(v4f64 (bitconvert (v8f32 VR256:$src))), (v4f64 VR256:$src)>;
def : Pat<(v4f64 (bitconvert (v8i32 VR256:$src))), (v4f64 VR256:$src)>;		def : Pat<(v4f64 (bitconvert (v8i32 VR256:$src))), (v4f64 VR256:$src)>;
def : Pat<(v4f64 (bitconvert (v4i64 VR256:$src))), (v4f64 VR256:$src)>;		def : Pat<(v4f64 (bitconvert (v4i64 VR256:$src))), (v4f64 VR256:$src)>;
▲ Show 20 Lines • Show All 8,422 Lines • ▼ Show 20 Lines	let ExeDomain = SSEPackedDouble in {
defm VGATHERQPD : avx2_gather<0x93, "vgatherqpd", VR256, vx64mem, vy64mem>, VEX_W;		defm VGATHERQPD : avx2_gather<0x93, "vgatherqpd", VR256, vx64mem, vy64mem>, VEX_W;
}		}

let ExeDomain = SSEPackedSingle in {		let ExeDomain = SSEPackedSingle in {
defm VGATHERDPS : avx2_gather<0x92, "vgatherdps", VR256, vx32mem, vy32mem>;		defm VGATHERDPS : avx2_gather<0x92, "vgatherdps", VR256, vx32mem, vy32mem>;
defm VGATHERQPS : avx2_gather<0x93, "vgatherqps", VR128, vx32mem, vy32mem>;		defm VGATHERQPS : avx2_gather<0x93, "vgatherqps", VR128, vx32mem, vy32mem>;
}		}
}		}

		//===----------------------------------------------------------------------===//
		// Extra selection patterns for FR128, f128, f128mem

		// movaps is shorter than movdqa. movaps is in SSE and movdqa is in SSE2.
		def : Pat<(store (f128 FR128:$src), addr:$dst),
		(MOVAPSmr addr:$dst, (COPY_TO_REGCLASS (f128 FR128:$src), VR128))>;
		davidxl: Move the comment above the pattern def. 1) movaps is shorter, not 'should be'. 2) Regarding the 'faster' part, put a reference there. In fact, f128 operations should be considered in the integer domain, so movdqa should be used to avoid the domain bypass penalty.
		chh: I updated the comment to keep only the shorter and SSE reasons. I was not sure about 'faster with movaps', which seems to be used more in clang than gcc. My main reasons are that it is shorter and available in SSE for Android's applications.

		def : Pat<(loadf128 addr:$src),
		(COPY_TO_REGCLASS (MOVAPSrm addr:$src), FR128)>;

		// andps is shorter than andpd or pand. andps is SSE and andpd/pand are in SSE2
		def : Pat<(X86fand FR128:$src1, (loadf128 addr:$src2)),
		davidxl: pand is for SIMD integer. andps is shorter though.
		chh: I also chose andps over pand because it is shorter and available in SSE. I am not sure about the performance difference when combined with other instructions. I think by not splitting f128 into two registers, we already saved more code and execution time.
		(COPY_TO_REGCLASS
		(ANDPSrm (COPY_TO_REGCLASS FR128:$src1, VR128), f128mem:$src2),
		FR128)>;
		davidxl: Are there any coding style guidelines for TableGen code? There are long lines that are wrapped.
		chh: Not sure if there is a special rule for TableGen code. http://llvm.org/docs/CodingStandards.html says 80 columns. There are a few exceptions in this file, but now I wrapped all my new lines to less than 80 characters.

		def : Pat<(X86fand FR128:$src1, FR128:$src2),
		(COPY_TO_REGCLASS
		(ANDPSrr (COPY_TO_REGCLASS FR128:$src1, VR128),
		(COPY_TO_REGCLASS FR128:$src2, VR128)), FR128)>;

		def : Pat<(and FR128:$src1, FR128:$src2),
		(COPY_TO_REGCLASS
		(ANDPSrr (COPY_TO_REGCLASS FR128:$src1, VR128),
		(COPY_TO_REGCLASS FR128:$src2, VR128)), FR128)>;

		def : Pat<(X86for FR128:$src1, (loadf128 addr:$src2)),
		(COPY_TO_REGCLASS
		(ORPSrm (COPY_TO_REGCLASS FR128:$src1, VR128), f128mem:$src2),
		FR128)>;

		def : Pat<(X86for FR128:$src1, FR128:$src2),
		(COPY_TO_REGCLASS
		(ORPSrr (COPY_TO_REGCLASS FR128:$src1, VR128),
		(COPY_TO_REGCLASS FR128:$src2, VR128)), FR128)>;

		def : Pat<(or FR128:$src1, FR128:$src2),
		(COPY_TO_REGCLASS
		(ORPSrr (COPY_TO_REGCLASS FR128:$src1, VR128),
		(COPY_TO_REGCLASS FR128:$src2, VR128)), FR128)>;

		def : Pat<(X86fxor FR128:$src1, (loadf128 addr:$src2)),
		(COPY_TO_REGCLASS
		(XORPSrm (COPY_TO_REGCLASS FR128:$src1, VR128), f128mem:$src2),
		FR128)>;

		def : Pat<(X86fxor FR128:$src1, FR128:$src2),
		(COPY_TO_REGCLASS
		(XORPSrr (COPY_TO_REGCLASS FR128:$src1, VR128),
		(COPY_TO_REGCLASS FR128:$src2, VR128)), FR128)>;

		def : Pat<(xor FR128:$src1, FR128:$src2),
		(COPY_TO_REGCLASS
		(XORPSrr (COPY_TO_REGCLASS FR128:$src1, VR128),
		(COPY_TO_REGCLASS FR128:$src2, VR128)), FR128)>;
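
The "shorter" argument in the comments above comes down to the mandatory 0x66 prefix that the SSE2 forms carry. The following sketch tabulates the register/memory-source opcode bytes (prefix plus opcode, without ModRM, REX, or displacement) as I recall them from the Intel instruction set reference. The `Encoding` struct and helper names are hypothetical, and the table says nothing about the domain-bypass cost the reviewer raises.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record: mnemonic plus its prefix/opcode bytes.
struct Encoding {
  const char *Mnemonic;
  std::vector<uint8_t> Bytes;
};

// Opcode bytes for the load/register-source forms discussed in the review.
inline std::vector<Encoding> logicAndMoveEncodings() {
  return {
      {"movaps", {0x0F, 0x28}},       // SSE, no prefix
      {"movdqa", {0x66, 0x0F, 0x6F}}, // SSE2, 0x66 prefix
      {"andps", {0x0F, 0x54}},        // SSE, no prefix
      {"andpd", {0x66, 0x0F, 0x54}},  // SSE2, 0x66 prefix
      {"pand", {0x66, 0x0F, 0xDB}},   // SSE2, 0x66 prefix
  };
}

// Number of prefix and opcode bytes for one entry.
inline std::size_t opcodeLength(const Encoding &E) { return E.Bytes.size(); }
```

Each SSE1 form is one byte shorter than its SSE2 counterpart, which is the size saving chh cites for choosing MOVAPS, ANDPS, ORPS, and XORPS in these patterns.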

lib/Target/X86/X86RegisterInfo.td

	Show First 20 Lines • Show All 417 Lines • ▼ Show 20 Lines
	// A class to support the 'A' assembler constraint: EAX then EDX.			// A class to support the 'A' assembler constraint: EAX then EDX.
	def GR32_AD : RegisterClass<"X86", [i32], 32, (add EAX, EDX)>;			def GR32_AD : RegisterClass<"X86", [i32], 32, (add EAX, EDX)>;

	// Scalar SSE2 floating point registers.			// Scalar SSE2 floating point registers.
	def FR32 : RegisterClass<"X86", [f32], 32, (sequence "XMM%u", 0, 15)>;			def FR32 : RegisterClass<"X86", [f32], 32, (sequence "XMM%u", 0, 15)>;

	def FR64 : RegisterClass<"X86", [f64], 64, (add FR32)>;			def FR64 : RegisterClass<"X86", [f64], 64, (add FR32)>;

				def FR128 : RegisterClass<"X86", [i128, f128], 128, (add FR32)>;


	// FIXME: This sets up the floating point register files as though they are f64			// FIXME: This sets up the floating point register files as though they are f64
	// values, though they really are f80 values. This will cause us to spill			// values, though they really are f80 values. This will cause us to spill
	// values as 64-bit quantities instead of 80-bit quantities, which is much much			// values as 64-bit quantities instead of 80-bit quantities, which is much much
	// faster on common hardware. In reality, this should be controlled by a			// faster on common hardware. In reality, this should be controlled by a
	// command line option or something.			// command line option or something.

	def RFP32 : RegisterClass<"X86",[f32], 32, (sequence "FP%u", 0, 6)>;			def RFP32 : RegisterClass<"X86",[f32], 32, (sequence "FP%u", 0, 6)>;

test/CodeGen/X86/fp128-calling-conv.ll

				; RUN: llc < %s -O2 -mtriple=x86_64-linux-android -mattr=+mmx \| FileCheck %s
				; RUN: llc < %s -O2 -mtriple=x86_64-linux-gnu -mattr=+mmx \| FileCheck %s

				; double myD = 1.0;
				@myD = global double 1.000000e+00, align 8
				davidxl: Is this variable used?
				chh: No. Removed now.

				; long double myFP80 = 1.0L; // x86_64-linux-gnu
				@myFP80 = global x86_fp80 0xK3FFF8000000000000000, align 16
				davidxl: Is it used?
				chh: No. Removed now.

				; long double myFP128 = 1.0L; // x86_64-linux-android
				@myFP128 = global fp128 0xL00000000000000003FFF000000000000, align 16
				davidxl: Are the two parts swapped? GCC seems to generate: 3FFF0000000000000000000000000000
				chh: That looked strange but is correct. I copied it from clang's dump, and the llvm output assembly code is the same as gcc's. http://llvm.org/docs/LangRef.html says big-endian is used for hexadecimal floating point constants.
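
To see why both spellings can denote 1.0, here is a small sketch that decodes the `0xL` form. It assumes, as this exchange suggests, that the first 16 hex digits after `0xL` hold the low 64 bits and the last 16 hold the high 64 bits. `Quad`, `parseHexFP128`, `unbiasedExponent`, and `highFirstHex` are hypothetical helper names, not LLVM APIs.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical model of a binary128 bit pattern.
struct Quad {
  uint64_t Lo; // bits 0..63
  uint64_t Hi; // bits 64..127 (sign, 15-bit exponent, top of the fraction)
};

// Parses "0xL" followed by 32 hex digits.
// Assumption: the low 64 bits are printed first, then the high 64 bits.
inline Quad parseHexFP128(const std::string &Text) {
  Quad Q;
  Q.Lo = std::stoull(Text.substr(3, 16), nullptr, 16);
  Q.Hi = std::stoull(Text.substr(19, 16), nullptr, 16);
  return Q;
}

// Exponent field minus the binary128 bias of 16383.
inline int unbiasedExponent(const Quad &Q) {
  return static_cast<int>((Q.Hi >> 48) & 0x7FFF) - 16383;
}

// Prints the same bits with the high half first, the order davidxl quotes.
inline std::string highFirstHex(const Quad &Q) {
  char Buf[33];
  std::snprintf(Buf, sizeof(Buf), "%016llX%016llX",
                static_cast<unsigned long long>(Q.Hi),
                static_cast<unsigned long long>(Q.Lo));
  return Buf;
}
```

Under that assumption the constant in the test has an exponent field of 0x3FFF and a zero fraction, that is 1.0, and printing the high half first yields the digit string davidxl quotes from GCC.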

				; The first few parameters are passed in registers and the other are on stack.

				define fp128 @TestParam_FP128_0(fp128 %d0, fp128 %d1, fp128 %d2, fp128 %d3, fp128 %d4, fp128 %d5, fp128 %d6, fp128 %d7, fp128 %d8, fp128 %d9, fp128 %d10, fp128 %d11, fp128 %d12, fp128 %d13, fp128 %d14, fp128 %d15, fp128 %d16, fp128 %d17, fp128 %d18, fp128 %d19) {
				entry:
				davidxl: Is this a relevant test? Non-f128 related tests can be submitted in a different patch.
				chh: Okay, I will put non-fp128 type tests in another patch.
				ret fp128 %d0
				; CHECK-LABEL: TestParam_FP128_0:
				; CHECK-NOT: mov
				; CHECK: retq
				}

				define fp128 @TestParam_FP128_1(fp128 %d0, fp128 %d1, fp128 %d2, fp128 %d3, fp128 %d4, fp128 %d5, fp128 %d6, fp128 %d7, fp128 %d8, fp128 %d9, fp128 %d10, fp128 %d11, fp128 %d12, fp128 %d13, fp128 %d14, fp128 %d15, fp128 %d16, fp128 %d17, fp128 %d18, fp128 %d19) {
				entry:
				ret fp128 %d1
				; CHECK-LABEL: TestParam_FP128_1:
				; CHECK: movaps %xmm1, %xmm0
				; CHECK-NEXT: retq
				}

				define fp128 @TestParam_FP128_7(fp128 %d0, fp128 %d1, fp128 %d2, fp128 %d3, fp128 %d4, fp128 %d5, fp128 %d6, fp128 %d7, fp128 %d8, fp128 %d9, fp128 %d10, fp128 %d11, fp128 %d12, fp128 %d13, fp128 %d14, fp128 %d15, fp128 %d16, fp128 %d17, fp128 %d18, fp128 %d19) {
				entry:
				ret fp128 %d7
				; CHECK-LABEL: TestParam_FP128_7:
				; CHECK: movaps %xmm7, %xmm0
				; CHECK-NEXT: retq
				}

				define fp128 @TestParam_FP128_8(fp128 %d0, fp128 %d1, fp128 %d2, fp128 %d3, fp128 %d4, fp128 %d5, fp128 %d6, fp128 %d7, fp128 %d8, fp128 %d9, fp128 %d10, fp128 %d11, fp128 %d12, fp128 %d13, fp128 %d14, fp128 %d15, fp128 %d16, fp128 %d17, fp128 %d18, fp128 %d19) {
				entry:
				ret fp128 %d8
				; CHECK-LABEL: TestParam_FP128_8:
				; CHECK: movaps 8(%rsp), %xmm0
				; CHECK-NEXT: retq
				}

				define fp128 @TestParam_FP128_9(fp128 %d0, fp128 %d1, fp128 %d2, fp128 %d3, fp128 %d4, fp128 %d5, fp128 %d6, fp128 %d7, fp128 %d8, fp128 %d9, fp128 %d10, fp128 %d11, fp128 %d12, fp128 %d13, fp128 %d14, fp128 %d15, fp128 %d16, fp128 %d17, fp128 %d18, fp128 %d19) {
				entry:
				ret fp128 %d9
				; CHECK-LABEL: TestParam_FP128_9:
				; CHECK: movaps 24(%rsp), %xmm0
				; CHECK-NEXT: retq
				}
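
The CHECK lines in these five functions follow a simple rule: the first eight fp128 arguments arrive in %xmm0 through %xmm7, and later ones occupy 16-byte stack slots starting just above the return address. A minimal sketch of that rule follows, assuming every argument is fp128 as in the tests above. `fp128ArgLocation` is a hypothetical helper, not part of LLVM.

```cpp
#include <string>

// Hypothetical helper: where the callee finds the Index-th argument on
// entry when all arguments are fp128 (as in TestParam_FP128_*).
inline std::string fp128ArgLocation(unsigned Index) {
  const unsigned NumXMMArgRegs = 8; // XMM0..XMM7 carry the first eight
  if (Index < NumXMMArgRegs)
    return "%xmm" + std::to_string(Index);
  // On entry, (%rsp) holds the 8-byte return address; the stack-passed
  // arguments follow in 16-byte slots.
  unsigned Offset = 8 + 16 * (Index - NumXMMArgRegs);
  return std::to_string(Offset) + "(%rsp)";
}
```

For example, index 8 maps to `8(%rsp)` and index 9 to `24(%rsp)`, the operands the tests expect for `%d8` and `%d9`.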

test/CodeGen/X86/fp128-cast.ll

				; RUN: llc < %s -O2 -mtriple=x86_64-linux-android -mattr=+mmx \| FileCheck %s
				; RUN: llc < %s -O2 -mtriple=x86_64-linux-gnu -mattr=+mmx \| FileCheck %s

				; Check soft floating point conversion function calls.

				@vi32 = common global i32 0, align 4
				@vi64 = common global i64 0, align 8
				@vf32 = common global float 0.000000e+00, align 4
				@vf64 = common global double 0.000000e+00, align 8
				@vf128 = common global fp128 0xL00000000000000000000000000000000, align 16

				define void @TestCastF32_F128() {
				davidxl: TestFPExtF32_F128
				entry:
				davidxl: There are also lots of irrelevant tests added in this file.
				chh: Okay, I will put non-fp128 type tests in another patch.
				%0 = load float, float* @vf32, align 4
				%conv = fpext float %0 to fp128
				store fp128 %conv, fp128* @vf128, align 16
				ret void
				; CHECK-LABEL: TestCastF32_F128:
				; CHECK: movss vf32(%rip), %xmm0
				; CHECK-NEXT: callq __extendsftf2
				; CHECK-NEXT: movaps %xmm0, vf128(%rip)
				; CHECK: retq
				}

				define void @TestCastF64_F128() {
				davidxl: TestFPExt ..
				entry:
				%0 = load double, double* @vf64, align 8
				%conv = fpext double %0 to fp128
				store fp128 %conv, fp128* @vf128, align 16
				ret void
				; CHECK-LABEL: TestCastF64_F128:
				; CHECK: movsd vf64(%rip), %xmm0
				; CHECK-NEXT: callq __extenddftf2
				; CHECK-NEXT: movapd %xmm0, vf128(%rip)
				; CHECK: ret
				}

				define void @TestCastF128_I32() {
				davidxl: Missing a test for conversion to unsigned I32.
				chh: Added conversions to uint32 and uint64.
				entry:
				%0 = load fp128, fp128* @vf128, align 16
				%conv = fptosi fp128 %0 to i32
				store i32 %conv, i32* @vi32, align 4
				ret void
				; CHECK-LABEL: TestCastF128_I32:
				; CHECK: movaps vf128(%rip), %xmm0
				; CHECK-NEXT: callq __fixtfsi
				; CHECK-NEXT: movl %eax, vi32(%rip)
				; CHECK: retq
				}

				define void @TestCastF128_I64() {
				entry:
				%0 = load fp128, fp128* @vf128, align 16
				%conv = fptosi fp128 %0 to i32
				%conv1 = sext i32 %conv to i64
				store i64 %conv1, i64* @vi64, align 8
				ret void
				; CHECK-LABEL: TestCastF128_I64:
				; CHECK: movaps vf128(%rip), %xmm0
				; CHECK-NEXT: callq __fixtfsi
				; CHECK-NEXT: cltq
				; CHECK-NEXT: movq %rax, vi64(%rip)
				; CHECK: retq
				}

				define void @TestCastF128_F32() {
				entry:
				%0 = load fp128, fp128* @vf128, align 16
				%conv = fptrunc fp128 %0 to float
				store float %conv, float* @vf32, align 4
				ret void
				; CHECK-LABEL: TestCastF128_F32:
				; CHECK: movaps vf128(%rip), %xmm0
				; CHECK-NEXT: callq __trunctfsf2
				; CHECK-NEXT: movss %xmm0, vf32(%rip)
				; CHECK: retq
				}

				define void @TestCastF128_F64() {
				entry:
				%0 = load fp128, fp128* @vf128, align 16
				%conv = fptrunc fp128 %0 to double
				store double %conv, double* @vf64, align 8
				ret void
				; CHECK-LABEL: TestCastF128_F64:
				; CHECK: movapd vf128(%rip), %xmm0
				; CHECK-NEXT: callq __trunctfdf2
				; CHECK-NEXT: movsd %xmm0, vf64(%rip)
				; CHECK: retq
				}

				define void @TestCastI32_F128() {
				entry:
				%0 = load i32, i32* @vi32, align 4
				%conv = sitofp i32 %0 to fp128
				store fp128 %conv, fp128* @vf128, align 16
				ret void
				; CHECK-LABEL: TestCastI32_F128:
				; CHECK: movl vi32(%rip), %edi
				; CHECK-NEXT: callq __floatsitf
				; CHECK-NEXT: movaps %xmm0, vf128(%rip)
				; CHECK: retq
				}

				define void @TestCastI64_F128(){
				entry:
				%0 = load i64, i64* @vi64, align 8
				%conv = sitofp i64 %0 to fp128
				store fp128 %conv, fp128* @vf128, align 16
				ret void
				; CHECK-LABEL: TestCastI64_F128:
				; CHECK: movq vi64(%rip), %rdi
				; CHECK-NEXT: callq __floatditf
				; CHECK-NEXT: movaps %xmm0, vf128(%rip)
				; CHECK: retq
				}

				define i32 @TestConst128(fp128 %v) {
				entry:
				%cmp = fcmp ogt fp128 %v, 0xL00000000000000003FFF000000000000
				%conv = zext i1 %cmp to i32
				ret i32 %conv
				; CHECK-LABEL: TestConst128:
				; CHECK: movaps {{.*}}, %xmm1
				; CHECK-NEXT: callq __gttf2
				; CHECK-NEXT: testl %eax, %eax
				davidxl (Done): Might be better to relax the check a little: testq %rax, %rax should be fine too.
				; CHECK-NEXT: setg %al
				; CHECK: retq
				}

				define i32 @TestBits128(fp128 %ld) {
				entry:
				%mul = fmul fp128 %ld, %ld
				%0 = bitcast fp128 %mul to i128
				%u.sroa.0.4.extract.shift = lshr i128 %0, 32
				davidxl (Not Done): Can you simplify the variable names?
				chh (Author, Not Done): These were generated by clang for my simplified C code from libm. They are useful to show the clang transformations. I will add the C code example as comments.
				%or5 = or i128 %u.sroa.0.4.extract.shift, %0
				%or = trunc i128 %or5 to i32
				davidxl (Not Done): Why do we care what transformations have been done to get the IR? The IR code should be readable by itself, so while the C example is useful, I still prefer the naming in the IR simplified.
				chh (Author, Not Done): The comment now seems to be showing at the wrong place in this code diff. The most confusing name is u.sroa.0.4.extract.shift in TestBits128, so I will shorten long names in this function for now. The C code is important for anyone who wants to test in the future without having time to rebuild and test the whole AOSP with libm. Any simplification of the IR might work, but clang could generate different bit operators for those long double union types and trigger problems with any ad-hoc f128 optimization.
				%cmp = icmp eq i32 %or, 0
				%conv = zext i1 %cmp to i32
				ret i32 %conv
				; CHECK-LABEL: TestBits128:
				; CHECK: movaps %xmm0, %xmm1
				; CHECK-NEXT: callq __multf3
				; CHECK-NEXT: movaps %xmm0, (%rsp)
				; CHECK-NEXT: movq (%rsp),
				; CHECK-NEXT: movq %
				; CHECK-NEXT: shrq $32,
				; CHECK: orl
				; CHECK-NEXT: sete %al
				; CHECK-NEXT: movzbl %al, %eax
				; CHECK: retq
				}

				define fp128 @TestPair128(i64 %a, i64 %b) {
				entry:
				davidxl (Not Done): Please move this test case to a different patch.
				chh (Author, Not Done): This is about the f128 calling convention to return f128 in SSE. Are you sure about removing this one?
				davidxl (Not Done): You are right, this one is relevant. Please check the rest.
				%conv = zext i64 %a to i128
				%shl = shl nuw i128 %conv, 64
				%conv1 = zext i64 %b to i128
				%or = or i128 %shl, %conv1
				%add = add i128 %or, 3
				%0 = bitcast i128 %add to fp128
				ret fp128 %0
				; CHECK-LABEL: TestPair128:
				; CHECK: addq $3, %rsi
				; CHECK-NEXT: movq %rsi, -24(%rsp)
				; CHECK-NEXT: adcq $0, %rdi
				davidxl (Not Done): There is no guarantee adcq will be after movq ...
				chh (Author, Not Done): Okay, relaxed the check patterns.
				; CHECK-NEXT: movq %rdi, -16(%rsp)
				; CHECK-NEXT: movaps -24(%rsp), %xmm0
				; CHECK-NEXT: retq
				}
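
The add in TestPair128 is a 128-bit addition carried out on two 64-bit halves, which is why the checks see an addq on the low word followed at some point by an adcq on the high word. Below is a minimal C sketch of that arithmetic, assuming a hypothetical two-word struct (`u128_words`) and helper name (`pair_add3`) that are not part of the patch. Since a plain movq does not modify the flags, storing the low half may legally be scheduled before or after the adcq, which is the ordering concern raised above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-word view of a 128-bit value; not a type from the patch. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_words;

/* Mirrors the IR in TestPair128: ((a << 64) | b) + 3, done on 64-bit halves. */
u128_words pair_add3(uint64_t a, uint64_t b) {
    u128_words r;
    r.lo = b + 3;            /* addq $3, %rsi */
    r.hi = a + (r.lo < b);   /* adcq $0, %rdi: add the carry out of the low half */
    return r;
}

/* Small self-check: one case without a carry, one with a carry. */
void demo_pair_add3(void) {
    u128_words r = pair_add3(1, 2);
    assert(r.lo == 5 && r.hi == 1);
    r = pair_add3(7, UINT64_MAX);
    assert(r.lo == 2 && r.hi == 8);
}
```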

				define fp128 @TestTruncCopysign(fp128 %x, i32 %n) {
				entry:
				%cmp = icmp sgt i32 %n, 50000
				br i1 %cmp, label %if.then, label %cleanup

				if.then: ; preds = %entry
				%conv = fptrunc fp128 %x to double
				%call = tail call double @copysign(double 0x7FF0000000000000, double %conv) #2
				%conv1 = fpext double %call to fp128
				br label %cleanup

				cleanup: ; preds = %entry, %if.then
				%retval.0 = phi fp128 [ %conv1, %if.then ], [ %x, %entry ]
				ret fp128 %retval.0
				; CHECK-LABEL: TestTruncCopysign:
				; CHECK: callq __trunctfdf2
				; CHECK-NEXT: andpd {{.*}}, %xmm0
				; CHECK-NEXT: orpd {{.*}}, %xmm0
				; CHECK-NEXT: callq __extenddftf2
				; CHECK: retq
				}

				declare double @copysign(double, double) #1

				attributes #2 = { nounwind readnone }

test/CodeGen/X86/fp128-compare.ll

				; RUN: llc < %s -O2 -mtriple=x86_64-linux-android -mattr=+mmx \| FileCheck %s
				; RUN: llc < %s -O2 -mtriple=x86_64-linux-gnu -mattr=+mmx \| FileCheck %s

				define i32 @TestComp128GT(fp128 %d1, fp128 %d2) {
				entry:
				davidxl (Done): Move this test case out of the patch. There are more than 20 or so test cases below that need to be separated out.
				%cmp = fcmp ogt fp128 %d1, %d2
				%conv = zext i1 %cmp to i32
				ret i32 %conv
				; CHECK-LABEL: TestComp128GT:
				; CHECK: callq __gttf2
				; CHECK: setg %al
				; CHECK: retq
				}

				define i32 @TestComp128GE(fp128 %d1, fp128 %d2) {
				entry:
				%cmp = fcmp oge fp128 %d1, %d2
				%conv = zext i1 %cmp to i32
				ret i32 %conv
				; CHECK-LABEL: TestComp128GE:
				; CHECK: callq __getf2
				; CHECK: testl %eax, %eax
				; CHECK: retq
				davidxl (Done): Missing check of 'set<cc>'.
				}

				define i32 @TestComp128LT(fp128 %d1, fp128 %d2) {
				entry:
				%cmp = fcmp olt fp128 %d1, %d2
				%conv = zext i1 %cmp to i32
				ret i32 %conv
				; CHECK-LABEL: TestComp128LT:
				; CHECK: callq __lttf2
				; CHECK-NEXT: shrl $31, %eax
				davidxl (Not Done): Is this correct?
				chh (Author, Not Done): Yes. It's a strange optimization: it returns 1 if the libcall result is negative, which is what __lttf2 returns when %d1 < %d2.
				; CHECK: retq
				}
				davidxl (Not Done): OK. But it is also possible with test + sets, right? Maybe add a comment so that people know how to fix the test if it breaks in the future?
				chh (Author, Not Done): Sure, added comment. If it is broken in the future, maybe it would be easier to continue using this trick. :-)
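
To make the `shrl $31, %eax` trick concrete: the comparison libcall returns a negative int when the first operand is less than the second, so the boolean result is just the sign bit of that int. The C sketch below compares the shift form with the test + sets form. The function names are hypothetical and the libcall itself is not invoked; an arbitrary int stands in for its return value.

```c
#include <assert.h>
#include <stdint.h>

/* Shift form: move the sign bit of the libcall result into bit 0. */
int is_less_via_shift(int32_t libcall_result) {
    return (int)((uint32_t)libcall_result >> 31);  /* shrl $31, %eax */
}

/* Test + sets form: the same predicate written as a signed comparison. */
int is_less_via_test(int32_t libcall_result) {
    return libcall_result < 0;                     /* testl %eax, %eax; sets %al */
}

/* Both forms agree on negative, zero and positive inputs. */
void demo_is_less(void) {
    const int32_t samples[] = { INT32_MIN, -1, 0, 1, INT32_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(is_less_via_shift(samples[i]) == is_less_via_test(samples[i]));
}
```

Either instruction sequence is a valid lowering, so a future change that switches between them would only require updating the CHECK line.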

				define i32 @TestComp128LE(fp128 %d1, fp128 %d2) {
				entry:
				%cmp = fcmp ole fp128 %d1, %d2
				%conv = zext i1 %cmp to i32
				ret i32 %conv
				; CHECK-LABEL: TestComp128LE:
				; CHECK: callq __letf2
				; CHECK-NEXT: testl %eax, %eax
				; CHECK: retq
				davidxl (Done): Missing check.
				}

				define i32 @TestComp128EQ(fp128 %d1, fp128 %d2) {
				entry:
				%cmp = fcmp oeq fp128 %d1, %d2
				%conv = zext i1 %cmp to i32
				ret i32 %conv
				; CHECK-LABEL: TestComp128EQ:
				; CHECK: callq __eqtf2
				; CHECK-NEXT: testl %eax, %eax
				; CHECK: retq
				}

				define i32 @TestComp128NE(fp128 %d1, fp128 %d2) {
				entry:
				%cmp = fcmp une fp128 %d1, %d2
				%conv = zext i1 %cmp to i32
				ret i32 %conv
				; CHECK-LABEL: TestComp128NE:
				; CHECK: callq __netf2
				; CHECK-NEXT: testl %eax, %eax
				; CHECK: retq
				}

				define fp128 @TestMax(fp128 %x, fp128 %y) {
				entry:
				%cmp = fcmp ogt fp128 %x, %y
				%cond = select i1 %cmp, fp128 %x, fp128 %y
				ret fp128 %cond
				; CHECK-LABEL: TestMax:
				; CHECK: movaps %xmm1
				; CHECK: movaps %xmm0
				; CHECK: callq __gttf2
				; CHECK: movaps {{.*}}, %xmm0
				; CHECK: testl %eax, %eax
				; CHECK: movaps {{.*}}, %xmm0
				; CHECK: retq
				}

test/CodeGen/X86/fp128-i128.ll

				; RUN: llc < %s -O2 -mtriple=x86_64-linux-android -mattr=+mmx \| FileCheck %s
				; RUN: llc < %s -O2 -mtriple=x86_64-linux-gnu -mattr=+mmx \| FileCheck %s

				; Check some i128 instruction patterns triggered by fp128.
				davidxl (Done): Better comment?

				define void @TestUnionLD1(fp128 %s, i64 %n) #0 {
				entry:
				%0 = bitcast fp128 %s to i128
				davidxl (Not Done): __float128
				chh (Author, Not Done): Unfortunately, clang does not accept the __float128 keyword, although it can emit f128 for llvm.
				%1 = zext i64 %n to i128
				davidxl (Done): This should be fixed in the clang FE. By default, long double is extended FP, not quad FP, so do fix the comment to avoid confusion.
				chh (Author, Not Done): Used __float128 here and added more comments at the top of the file.
				%bf.value = shl nuw i128 %1, 64
				%bf.shl = and i128 %bf.value, 5192296858534809181786422619668480
				%bf.clear = and i128 %0, -5192296858534809181786422619668481
				%bf.set = or i128 %bf.shl, %bf.clear
				%2 = bitcast i128 %bf.set to fp128
				tail call void @foo(fp128 %2) #2
				ret void
				; CHECK-LABEL: TestUnionLD1:
				; CHECK: movaps %xmm0, -24(%rsp)
				; CHECK-NEXT: movq -24(%rsp), %rax
				; CHECK-NEXT: movabsq $281474976710655, %rcx
				; CHECK-NEXT: andq %rdi, %rcx
				; CHECK-NEXT: movabsq $-281474976710656, %rdx
				; CHECK-NEXT: andq -16(%rsp), %rdx
				; CHECK-NEXT: movq %rax, -40(%rsp)
				davidxl (Not Done): long double --> __float128?
				chh (Author, Not Done): No __float128 in clang.
				; CHECK-NEXT: orq %rcx, %rdx
				; CHECK-NEXT: movq %rdx, -32(%rsp)
				; CHECK-NEXT: movaps -40(%rsp), %xmm0
				; CHECK-NEXT: jmp foo
				}
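
The two i128 masks in TestUnionLD1 select bits 64..111 of the value, which in the binary128 layout are the upper 48 bits of the fraction. On 64-bit halves that becomes the pair of movabsq constants in the CHECK lines (281474976710655 and its complement). The following C sketch shows the same field update, assuming a hypothetical two-word struct and helper name that do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-word view of an fp128 bit pattern; not a type from the patch. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_words;

/* Low 48 bits of the high word: the top of the 112-bit fraction field. */
#define FRAC_HI48_MASK 0x0000FFFFFFFFFFFFULL   /* 281474976710655 */

/* Replace the upper 48 fraction bits with the low 48 bits of n,
   leaving sign, exponent and the low word untouched. */
u128_words set_frac_hi48(u128_words s, uint64_t n) {
    s.hi = (s.hi & ~FRAC_HI48_MASK) | (n & FRAC_HI48_MASK);
    return s;
}

void demo_set_frac_hi48(void) {
    u128_words s = { 0x1234ULL, 0x3FFF000000000000ULL };
    u128_words r = set_frac_hi48(s, 0xABCD00000000000FULL);
    assert(r.lo == 0x1234ULL);               /* low word is copied through */
    assert(r.hi == 0x3FFF00000000000FULL);   /* exponent kept, fraction bits replaced */
}
```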

				define fp128 @TestUnionLD2(fp128 %s) #0 {
				entry:
				%0 = bitcast fp128 %s to i128
				%bf.clear = and i128 %0, -18446744073709551616
				%1 = bitcast i128 %bf.clear to fp128
				ret fp128 %1
				; CHECK-LABEL: TestUnionLD2:
				; CHECK: movaps %xmm0, -24(%rsp)
				; CHECK-NEXT: movq -16(%rsp), %rax
				; CHECK-NEXT: movq %rax, -32(%rsp)
				; CHECK-NEXT: movq $0, -40(%rsp)
				; CHECK-NEXT: movaps -40(%rsp), %xmm0
				; CHECK-NEXT: retq
				}

				davidxl (Not Done): The pattern checked is pretty long. I worry it may break in the future. Is it possible to relax it somehow?
				chh (Author, Not Done): I tried to reduce the C code, but no reduction would trigger the complicated IL that reached the point where my partial fix core dumped. Maybe we can take out a few CHECK-NEXT requirements. On the other hand, I was terrified by so many ad-hoc optimizations of floating points for the usage patterns in libm. I guess llvm tried to match or do better than gcc, and libm tried to use every possible bit operation. So maybe it is better for Android, or anyone who depends on the f128 type, to have more check rules here. Whoever changes float optimization in the future had better fully understand and update these tests.
				define fp128 @TestI128_1(fp128 %x) #0 {
				entry:
				%0 = bitcast fp128 %x to i128
				%bf.clear = and i128 %0, 170141183460469231731687303715884105727
				%1 = bitcast i128 %bf.clear to fp128
				%cmp = fcmp olt fp128 %1, 0xL999999999999999A3FFB999999999999
				%cond = select i1 %cmp, fp128 0xL00000000000000003FFF000000000000, fp128 0xL00000000000000004000000000000000
				ret fp128 %cond
				; CHECK-LABEL: TestI128_1:
				; CHECK: movaps %xmm0,
				; CHECK: movabsq $9223372036854775807,
				; CHECK: callq __lttf2
				; CHECK: testl %eax, %eax
				davidxl (Not Done): long double --> __float128
				; CHECK: movaps {{.*}}, %xmm0
				; CHECK: retq
				}

				define fp128 @TestI128_2(fp128 %x, fp128 %y) #0 {
				entry:
				%0 = bitcast fp128 %x to i128
				%cmp = icmp sgt i128 %0, -1
				%cond = select i1 %cmp, fp128 %x, fp128 %y
				ret fp128 %cond
				; CHECK-LABEL: TestI128_2:
				; CHECK: movaps %xmm0, -24(%rsp)
				; CHECK-NEXT: cmpq $0, -16(%rsp)
				; CHECK-NEXT: jns
				; CHECK: movaps %xmm1, %xmm0
				; CHECK: retq
				}

				define fp128 @TestI128_3(fp128 %x, i32* nocapture readnone %ex) #0 {
				entry:
				%0 = bitcast fp128 %x to i128
				%bf.cast = and i128 %0, 170135991163610696904058773219554885632
				%cmp = icmp eq i128 %bf.cast, 0
				br i1 %cmp, label %if.then, label %if.end
				davidxl (Not Done): long double --> __float128

				if.then: ; preds = %entry
				%mul = fmul fp128 %x, 0xL00000000000000004201000000000000
				%1 = bitcast fp128 %mul to i128
				%bf.clear4 = and i128 %1, -170135991163610696904058773219554885633
				%bf.set = or i128 %bf.clear4, 85060207136517546210586590865283612672
				br label %if.end

				if.end: ; preds = %if.then, %entry
				%u.sroa.0.0 = phi i128 [ %bf.set, %if.then ], [ %0, %entry ]
				davidxl (Not Done): Variable names can be cleaned up to be shorter.
				chh (Author, Not Done): This IL was copied from libm compiled code. Clang has its own way to convert C union structure references. I will add the original C code as a comment.
				%2 = bitcast i128 %u.sroa.0.0 to fp128
				ret fp128 %2
				; CHECK-LABEL: TestI128_3:
				; CHECK: movaps %xmm0,
				; CHECK: movabsq $9223090561878065152,
				; CHECK: testq
				; CHECK: callq __multf3
				; CHECK-NEXT: movaps %xmm0
				; CHECK: movabsq $-9223090561878065153,
				; CHECK: movabsq $4611123068473966592,
				; CHECK: retq
				}
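
The three movabsq constants checked in TestI128_3 are the high-word forms of the i128 masks in the IR: the 15-bit exponent field (0x7FFF << 48), its complement, and a replacement exponent of 0x3FFE. A small C sketch of those operations on the high 64-bit word follows; the macro and function names are hypothetical and only illustrate the bit manipulation, not the original libm source.

```c
#include <assert.h>
#include <stdint.h>

/* Exponent field of the high word of a binary128 value (bits 48..62). */
#define EXP_MASK 0x7FFF000000000000ULL   /* 9223090561878065152 */

/* True when the biased exponent is zero (zero or subnormal input). */
int exponent_is_zero(uint64_t hi) {
    return (hi & EXP_MASK) == 0;
}

/* Overwrite the biased exponent, keeping the sign and fraction bits. */
uint64_t set_exponent(uint64_t hi, unsigned biased_exp) {
    return (hi & ~EXP_MASK) | ((uint64_t)(biased_exp & 0x7FFFu) << 48);
}

void demo_exponent_field(void) {
    assert(exponent_is_zero(0x8000FFFFFFFFFFFFULL));    /* only sign and fraction set */
    assert(!exponent_is_zero(0x3FFF000000000000ULL));   /* high word of 1.0 */
    assert(set_exponent(0xC201000000000001ULL, 0x3FFE) == 0xBFFE000000000001ULL);
}
```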

				define fp128 @TestI128_4(fp128 %x) #0 {
				entry:
				davidxl (Not Done): long double --> __float128
				%0 = bitcast fp128 %x to i128
				%bf.clear = and i128 %0, -18446744073709551616
				%1 = bitcast i128 %bf.clear to fp128
				%add = fadd fp128 %1, %x
				ret fp128 %add
				; CHECK-LABEL: TestI128_4:
				; CHECK: movaps %xmm0, %xmm1
				; CHECK-NEXT: movaps %xmm1, 16(%rsp)
				; CHECK-NEXT: movq 24(%rsp), %rax
				; CHECK-NEXT: movq %rax, 8(%rsp)
				; CHECK-NEXT: movq $0, (%rsp)
				; CHECK-NEXT: movaps (%rsp), %xmm0
				; CHECK-NEXT: callq __addtf3
				; CHECK: retq
				}

				define { i64, i64 } @TestShift128(i64 %x.coerce0, i64 %x.coerce1) #0 {
				entry:
				davidxl (Not Done): Seems irrelevant.
				chh (Author, Not Done): Okay, removed the i64 test.
				%.fca.1.insert = insertvalue { i64, i64 } { i64 0, i64 undef }, i64 %x.coerce0, 1
				ret { i64, i64 } %.fca.1.insert
				; CHECK-LABEL: TestShift128:
				; CHECK: xorl %eax, %eax
				; CHECK-NEXT: movq %rdi, %rdx
				; CHECK-NEXT: retq
				}

				@v128 = common global i128 0, align 16
				@v128_2 = common global i128 0, align 16

				define void @TestShift128_2() #2 {
				davidxl (Not Done): Seems irrelevant.
				chh (Author, Not Done): We need to test i128 too, since this patch also puts i128 into the FR128 register class. The i128 instruction was generated from f128 C code.
				entry:
				%0 = load i128, i128* @v128, align 16
				%shl = shl i128 %0, 96
				%1 = load i128, i128* @v128_2, align 16
				%or = or i128 %shl, %1
				store i128 %or, i128* @v128, align 16
				ret void
				; CHECK-LABEL: TestShift128_2:
				; CHECK: movq v128(%rip), %rax
				; CHECK-NEXT: shlq $32, %rax
				; CHECK-NEXT: movq v128_2(%rip), %rcx
				; CHECK-NEXT: orq v128_2+8(%rip), %rax
				; CHECK-NEXT: movq %rcx, v128(%rip)
				; CHECK-NEXT: movq %rax, v128+8(%rip)
				; CHECK-NEXT: retq
				}

				define fp128 @acosl(fp128 %x) #0 {
				entry:
				%0 = bitcast fp128 %x to i128
				%bf.clear = and i128 %0, -18446744073709551616
				%1 = bitcast i128 %bf.clear to fp128
				%add = fadd fp128 %1, %x
				ret fp128 %add
				; CHECK-LABEL: acosl:
				; CHECK: movaps %xmm0, %xmm1
				; CHECK-NEXT: movaps %xmm1, 16(%rsp)
				; CHECK-NEXT: movq 24(%rsp), %rax
				; CHECK-NEXT: movq %rax, 8(%rsp)
				; CHECK-NEXT: movq $0, (%rsp)
				; CHECK-NEXT: movaps (%rsp), %xmm0
				; CHECK-NEXT: callq __addtf3
				; CHECK: retq
				}

				; Compare i128 values and check i128 constants.
				define fp128 @TestComp(fp128 %x, fp128 %y) #0 {
				entry:
				%0 = bitcast fp128 %x to i128
				%cmp = icmp sgt i128 %0, -1
				%cond = select i1 %cmp, fp128 %x, fp128 %y
				ret fp128 %cond
				; CHECK-LABEL: TestComp:
				; CHECK: movaps %xmm0, -24(%rsp)
				; CHECK-NEXT: cmpq $0, -16(%rsp)
				; CHECK-NEXT: jns
				; CHECK: movaps %xmm1, %xmm0
				; CHECK: retq
				}

				declare void @foo(fp128) #1

				; Test logical operations on fp128 values.
				define fp128 @TestFABS_LD(fp128 %x) #0 {
				entry:
				%call = tail call fp128 @fabsl(fp128 %x) #2
				ret fp128 %call
				; CHECK-LABEL: TestFABS_LD
				; CHECK: andps {{.*}}, %xmm0
				; CHECK-NEXT: retq
				}

				declare fp128 @fabsl(fp128) #1

				declare fp128 @copysignl(fp128, fp128) #1

				; Test more complicated logical operations generated from copysignl.
				define void @TestCopySign({ fp128, fp128 }* noalias nocapture sret %agg.result, { fp128, fp128 }* byval nocapture readonly align 16 %z) #0 {
				davidxl (Not Done): Can this test case be simplified more?
				chh (Author, Not Done): This was from libm C code, which triggered one error related to f128 without a complete patch, so I added it. I tried to reduce the original C code, but then it no longer triggered the problem.
				entry:
				%z.realp = getelementptr inbounds { fp128, fp128 }, { fp128, fp128 }* %z, i64 0, i32 0
				%z.real = load fp128, fp128* %z.realp, align 16
				%z.imagp = getelementptr inbounds { fp128, fp128 }, { fp128, fp128 }* %z, i64 0, i32 1
				%z.imag4 = load fp128, fp128* %z.imagp, align 16
				%cmp = fcmp ogt fp128 %z.real, %z.imag4
				%sub = fsub fp128 %z.imag4, %z.imag4
				br i1 %cmp, label %if.then, label %cleanup

				if.then: ; preds = %entry
				%call = tail call fp128 @fabsl(fp128 %sub) #2
				br label %cleanup

				cleanup: ; preds = %entry, %if.then
				%z.real.sink = phi fp128 [ %z.real, %if.then ], [ %sub, %entry ]
				%call.sink = phi fp128 [ %call, %if.then ], [ %z.real, %entry ]
				%call5 = tail call fp128 @copysignl(fp128 %z.real.sink, fp128 %z.imag4) #2
				%0 = getelementptr inbounds { fp128, fp128 }, { fp128, fp128 }* %agg.result, i64 0, i32 0
				%1 = getelementptr inbounds { fp128, fp128 }, { fp128, fp128 }* %agg.result, i64 0, i32 1
				store fp128 %call.sink, fp128* %0, align 16
				store fp128 %call5, fp128* %1, align 16
				ret void
				; CHECK-LABEL: TestCopySign
				; CHECK-NOT: call
				; CHECK: callq __subtf3
				; CHECK-NOT: call
				; CHECK: callq __gttf2
				; CHECK-NOT: call
				; CHECK: andps {{.*}}, %xmm0
				; CHECK: retq
				}


				attributes #0 = { nounwind uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+ssse3,+sse3,+popcnt,+sse,+sse2,+sse4.1,+sse4.2" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+ssse3,+sse3,+popcnt,+sse,+sse2,+sse4.1,+sse4.2" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #2 = { nounwind readnone }

test/CodeGen/X86/fp128-libcalls.ll

				; RUN: llc < %s -O2 -mtriple=x86_64-linux-android -mattr=+mmx \| FileCheck %s
				; RUN: llc < %s -O2 -mtriple=x86_64-linux-gnu -mattr=+mmx \| FileCheck %s

				; Check all soft floating point library function calls.

				@vf64 = common global double 0.000000e+00, align 8
				@vf128 = common global fp128 0xL00000000000000000000000000000000, align 16

				define void @Test128Add(fp128 %d1, fp128 %d2) {
				entry:
				%add = fadd fp128 %d1, %d2
				store fp128 %add, fp128* @vf128, align 16
				ret void
				; CHECK-LABEL: Test128Add:
				; CHECK: callq __addtf3
				; CHECK-NEXT: movaps %xmm0, vf128(%rip)
				; CHECK: retq
				}

				define void @Test128_1Add(fp128 %d1){
				entry:
				%0 = load fp128, fp128* @vf128, align 16
				%add = fadd fp128 %0, %d1
				store fp128 %add, fp128* @vf128, align 16
				ret void
				; CHECK-LABEL: Test128_1Add:
				; CHECK: movaps %xmm0, %xmm1
				; CHECK-NEXT: movaps vf128(%rip), %xmm0
				; CHECK-NEXT: callq __addtf3
				; CHECK-NEXT: movaps %xmm0, vf128(%rip)
				; CHECK: retq
				}

				define void @Test128Sub(fp128 %d1, fp128 %d2){
				entry:
				%sub = fsub fp128 %d1, %d2
				store fp128 %sub, fp128* @vf128, align 16
				ret void
				; CHECK-LABEL: Test128Sub:
				; CHECK: callq __subtf3
				; CHECK-NEXT: movaps %xmm0, vf128(%rip)
				; CHECK: retq
				}

				define void @Test128_1Sub(fp128 %d1){
				entry:
				%0 = load fp128, fp128* @vf128, align 16
				%sub = fsub fp128 %0, %d1
				store fp128 %sub, fp128* @vf128, align 16
				ret void
				; CHECK-LABEL: Test128_1Sub:
				; CHECK: movaps %xmm0, %xmm1
				; CHECK-NEXT: movaps vf128(%rip), %xmm0
				; CHECK-NEXT: callq __subtf3
				; CHECK-NEXT: movaps %xmm0, vf128(%rip)
				; CHECK: retq
				}

				define void @Test128Mul(fp128 %d1, fp128 %d2){
				entry:
				%mul = fmul fp128 %d1, %d2
				store fp128 %mul, fp128* @vf128, align 16
				ret void
				; CHECK-LABEL: Test128Mul:
				; CHECK: callq __multf3
				; CHECK-NEXT: movaps %xmm0, vf128(%rip)
				; CHECK: retq
				}

				define void @Test128_1Mul(fp128 %d1){
				entry:
				%0 = load fp128, fp128* @vf128, align 16
				%mul = fmul fp128 %0, %d1
				store fp128 %mul, fp128* @vf128, align 16
				ret void
				; CHECK-LABEL: Test128_1Mul:
				; CHECK: movaps %xmm0, %xmm1
				; CHECK-NEXT: movaps vf128(%rip), %xmm0
				; CHECK-NEXT: callq __multf3
				; CHECK-NEXT: movaps %xmm0, vf128(%rip)
				; CHECK: retq
				}

				define void @Test128Div(fp128 %d1, fp128 %d2){
				entry:
				%div = fdiv fp128 %d1, %d2
				store fp128 %div, fp128* @vf128, align 16
				ret void
				; CHECK-LABEL: Test128Div:
				; CHECK: callq __divtf3
				; CHECK-NEXT: movaps %xmm0, vf128(%rip)
				; CHECK: retq
				}

				define void @Test128_1Div(fp128 %d1){
				entry:
				%0 = load fp128, fp128* @vf128, align 16
				%div = fdiv fp128 %0, %d1
				store fp128 %div, fp128* @vf128, align 16
				ret void
				; CHECK-LABEL: Test128_1Div:
				; CHECK: movaps %xmm0, %xmm1
				; CHECK-NEXT: movaps vf128(%rip), %xmm0
				; CHECK-NEXT: callq __divtf3
				; CHECK-NEXT: movaps %xmm0, vf128(%rip)
				; CHECK: retq
				}

test/CodeGen/X86/fp128-load.ll

				; RUN: llc < %s -O2 -mtriple=x86_64-linux-android -mattr=+mmx \| FileCheck %s
				; RUN: llc < %s -O2 -mtriple=x86_64-linux-gnu -mattr=+mmx \| FileCheck %s

				; long double myFP128 = 1.0L; // x86_64-linux-android
				@my_fp128 = global fp128 0xL00000000000000003FFF000000000000, align 16

				define fp128 @get_fp128() {
				entry:
				%0 = load fp128, fp128* @my_fp128, align 16
				ret fp128 %0
				; CHECK-LABEL: get_fp128:
				; CHECK: movaps my_fp128(%rip), %xmm0
				; CHECK-NEXT: retq
				}

				@TestLoadExtend.data = internal unnamed_addr constant [2 x float] [float 0x3FB99999A0000000, float 0x3FC99999A0000000], align 4

				define fp128 @TestLoadExtend(fp128 %x, i32 %n) {
				entry:
				%idxprom = sext i32 %n to i64
				%arrayidx = getelementptr inbounds [2 x float], [2 x float]* @TestLoadExtend.data, i64 0, i64 %idxprom
				%0 = load float, float* %arrayidx, align 4
				%conv = fpext float %0 to fp128
				ret fp128 %conv
				; CHECK-LABEL: TestLoadExtend:
				; CHECK: movslq %edi, %rax
				; CHECK-NEXT: movss TestLoadExtend.data(,%rax,4), %xmm0
				; CHECK-NEXT: callq __extendsftf2
				; CHECK: retq
				}

				; CHECK-LABEL: my_fp128:
				; CHECK-NEXT: .quad 0
				; CHECK-NEXT: .quad 4611404543450677248
				; CHECK-NEXT: .size my_fp128, 16
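
The two `.quad` values checked for my_fp128 are the little-endian halves of the binary128 encoding of 1.0: a zero low word, and a high word holding sign 0, biased exponent 16383 (0x3FFF) and a zero fraction. As a rough illustration, the C sketch below assembles that high word from its fields. The helper name is hypothetical and the code only reproduces the bit layout; it does not use any fp128 type.

```c
#include <assert.h>
#include <stdint.h>

/* Build the high 64-bit word of a binary128 value:
   1 sign bit, 15 exponent bits, then the top 48 fraction bits. */
uint64_t binary128_high_word(unsigned sign, unsigned biased_exp, uint64_t frac_hi48) {
    return ((uint64_t)(sign & 1u) << 63) |
           ((uint64_t)(biased_exp & 0x7FFFu) << 48) |
           (frac_hi48 & 0x0000FFFFFFFFFFFFULL);
}

void demo_binary128_one(void) {
    /* 1.0: matches ".quad 4611404543450677248" and 0xL...3FFF000000000000. */
    assert(binary128_high_word(0, 16383, 0) == 4611404543450677248ULL);
    assert(binary128_high_word(0, 16383, 0) == 0x3FFF000000000000ULL);
}
```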

test/CodeGen/X86/fp128-store.ll

				; RUN: llc < %s -O2 -mtriple=x86_64-linux-android -mattr=+mmx \| FileCheck %s
				; RUN: llc < %s -O2 -mtriple=x86_64-linux-gnu -mattr=+mmx \| FileCheck %s

				; long double myFP128 = 1.0L; // x86_64-linux-android
				@myFP128 = global fp128 0xL00000000000000003FFF000000000000, align 16

				define void @set_FP128(fp128 %x) {
				entry:
				store fp128 %x, fp128* @myFP128, align 16
				ret void
				; CHECK-LABEL: set_FP128:
				; CHECK: movaps %xmm0, myFP128(%rip)
				; CHECK-NEXT: retq
				}

test/CodeGen/X86/soft-fp.ll

	; RUN: llc < %s -march=x86 -mattr=+sse2,+soft-float \| FileCheck %s			; RUN: llc < %s -march=x86 -mattr=+mmx,+sse,+soft-float \
	; RUN: llc < %s -march=x86-64 -mattr=+sse2,+soft-float \| FileCheck %s			; RUN: \| FileCheck %s --check-prefix=SOFT1 --check-prefix=CHECK
	; RUN: llc < %s -mtriple=x86_64-gnux32 -mattr=+sse2,+soft-float \| FileCheck %s			; RUN: llc < %s -march=x86-64 -mattr=+mmx,+sse2,+soft-float \
				; RUN: \| FileCheck %s --check-prefix=SOFT2 --check-prefix=CHECK
				; RUN: llc < %s -march=x86-64 -mattr=+mmx,+sse \
				; RUN: \| FileCheck %s --check-prefix=SSE1 --check-prefix=CHECK
				; RUN: llc < %s -march=x86-64 -mattr=+mmx,+sse2 \
				; RUN: \| FileCheck %s --check-prefix=SSE2 --check-prefix=CHECK
				; RUN: llc < %s -mtriple=x86_64-gnux32 -mattr=+mmx,+sse2,+soft-float \| FileCheck %s

	; CHECK-NOT: xmm{[0-9]+}			; CHECK-NOT: xmm{{[0-9]+}}

	%struct.__va_list_tag = type { i32, i32, i8, i8 }			%struct.__va_list_tag = type { i32, i32, i8, i8 }

	define i32 @t1(i32 %a, ...) nounwind {			define i32 @t1(i32 %a, ...) nounwind {
	entry:			entry:
	%va = alloca [1 x %struct.__va_list_tag], align 8 ; <[1 x %struct.__va_list_tag]*> [#uses=2]			%va = alloca [1 x %struct.__va_list_tag], align 8 ; <[1 x %struct.__va_list_tag]*> [#uses=2]
	%va12 = bitcast [1 x %struct.__va_list_tag]* %va to i8* ; <i8*> [#uses=2]			%va12 = bitcast [1 x %struct.__va_list_tag]* %va to i8* ; <i8*> [#uses=2]
	call void @llvm.va_start(i8* %va12)			call void @llvm.va_start(i8* %va12)
	%va3 = getelementptr [1 x %struct.__va_list_tag], [1 x %struct.__va_list_tag]* %va, i64 0, i64 0 ; <%struct.__va_list_tag*> [#uses=1]			%va3 = getelementptr [1 x %struct.__va_list_tag], [1 x %struct.__va_list_tag]* %va, i64 0, i64 0 ; <%struct.__va_list_tag*> [#uses=1]
	call void @bar(%struct.__va_list_tag* %va3) nounwind			call void @bar(%struct.__va_list_tag* %va3) nounwind
	call void @llvm.va_end(i8* %va12)			call void @llvm.va_end(i8* %va12)
	ret i32 undef			ret i32 undef
				; CHECK-LABEL: t1:
				; CHECK: ret{{[lq]}}
	}			}

	declare void @llvm.va_start(i8*) nounwind			declare void @llvm.va_start(i8*) nounwind

	declare void @bar(%struct.__va_list_tag*)			declare void @bar(%struct.__va_list_tag*)

	declare void @llvm.va_end(i8*) nounwind			declare void @llvm.va_end(i8*) nounwind

	define float @t2(float %a, float %b) nounwind readnone {			define float @t2(float %a, float %b) nounwind readnone {
	entry:			entry:
	%0 = fadd float %a, %b ; <float> [#uses=1]			%0 = fadd float %a, %b ; <float> [#uses=1]
	ret float %0			ret float %0
				; CHECK-LABEL: t2:
				; SOFT1-NOT: xmm{{[0-9]+}}
				; SOFT2-NOT: xmm{{[0-9]+}}
				; SSE1: xmm{{[0-9]+}}
				; SSE2: xmm{{[0-9]+}}
				; CHECK: ret{{[lq]}}
				}

				; soft-float means no SSE instruction and passing fp128 as pair of i64.
				define fp128 @t3(fp128 %a, fp128 %b) nounwind readnone {
				entry:
				%0 = fadd fp128 %b, %a
				ret fp128 %0
				; CHECK-LABEL: t3:
				; SOFT1-NOT: xmm{{[0-9]+}}
				; SOFT2-NOT: xmm{{[0-9]+}}
				; SSE1: xmm{{[0-9]+}}
				; SSE2: xmm{{[0-9]+}}
				; CHECK: ret{{[lq]}}
	}			}

utils/TableGen/X86RecognizableInstr.cpp

[... 945 lines hidden ...]	OperandType RecognizableInstr::typeFromString(const std::string &s,
TYPE("u8imm", TYPE_UIMM8)		TYPE("u8imm", TYPE_UIMM8)
TYPE("i32u8imm", TYPE_UIMM8)		TYPE("i32u8imm", TYPE_UIMM8)
TYPE("GR8", TYPE_R8)		TYPE("GR8", TYPE_R8)
TYPE("VR128", TYPE_XMM128)		TYPE("VR128", TYPE_XMM128)
TYPE("VR128X", TYPE_XMM128)		TYPE("VR128X", TYPE_XMM128)
TYPE("f128mem", TYPE_M128)		TYPE("f128mem", TYPE_M128)
TYPE("f256mem", TYPE_M256)		TYPE("f256mem", TYPE_M256)
TYPE("f512mem", TYPE_M512)		TYPE("f512mem", TYPE_M512)
		TYPE("FR128", TYPE_XMM128)
TYPE("FR64", TYPE_XMM64)		TYPE("FR64", TYPE_XMM64)
TYPE("FR64X", TYPE_XMM64)		TYPE("FR64X", TYPE_XMM64)
TYPE("f64mem", TYPE_M64FP)		TYPE("f64mem", TYPE_M64FP)
TYPE("sdmem", TYPE_M64FP)		TYPE("sdmem", TYPE_M64FP)
TYPE("FR32", TYPE_XMM32)		TYPE("FR32", TYPE_XMM32)
TYPE("FR32X", TYPE_XMM32)		TYPE("FR32X", TYPE_XMM32)
TYPE("f32mem", TYPE_M32FP)		TYPE("f32mem", TYPE_M32FP)
TYPE("ssmem", TYPE_M32FP)		TYPE("ssmem", TYPE_M32FP)
[... 102 lines hidden ...]	RecognizableInstr::immediateEncodingFromString(const std::string &s,
ENCODING("i64i8imm", ENCODING_IB)		ENCODING("i64i8imm", ENCODING_IB)
ENCODING("i8imm", ENCODING_IB)		ENCODING("i8imm", ENCODING_IB)
ENCODING("u8imm", ENCODING_IB)		ENCODING("u8imm", ENCODING_IB)
ENCODING("i32u8imm", ENCODING_IB)		ENCODING("i32u8imm", ENCODING_IB)
// This is not a typo. Instructions like BLENDVPD put		// This is not a typo. Instructions like BLENDVPD put
// register IDs in 8-bit immediates nowadays.		// register IDs in 8-bit immediates nowadays.
ENCODING("FR32", ENCODING_IB)		ENCODING("FR32", ENCODING_IB)
ENCODING("FR64", ENCODING_IB)		ENCODING("FR64", ENCODING_IB)
		ENCODING("FR128", ENCODING_IB)
ENCODING("VR128", ENCODING_IB)		ENCODING("VR128", ENCODING_IB)
ENCODING("VR256", ENCODING_IB)		ENCODING("VR256", ENCODING_IB)
ENCODING("FR32X", ENCODING_IB)		ENCODING("FR32X", ENCODING_IB)
ENCODING("FR64X", ENCODING_IB)		ENCODING("FR64X", ENCODING_IB)
ENCODING("VR128X", ENCODING_IB)		ENCODING("VR128X", ENCODING_IB)
ENCODING("VR256X", ENCODING_IB)		ENCODING("VR256X", ENCODING_IB)
ENCODING("VR512", ENCODING_IB)		ENCODING("VR512", ENCODING_IB)
errs() << "Unhandled immediate encoding " << s << "\n";		errs() << "Unhandled immediate encoding " << s << "\n";
llvm_unreachable("Unhandled immediate encoding");		llvm_unreachable("Unhandled immediate encoding");
}		}

OperandEncoding		OperandEncoding
RecognizableInstr::rmRegisterEncodingFromString(const std::string &s,		RecognizableInstr::rmRegisterEncodingFromString(const std::string &s,
uint8_t OpSize) {		uint8_t OpSize) {
ENCODING("RST", ENCODING_FP)		ENCODING("RST", ENCODING_FP)
ENCODING("GR16", ENCODING_RM)		ENCODING("GR16", ENCODING_RM)
ENCODING("GR32", ENCODING_RM)		ENCODING("GR32", ENCODING_RM)
ENCODING("GR32orGR64", ENCODING_RM)		ENCODING("GR32orGR64", ENCODING_RM)
ENCODING("GR64", ENCODING_RM)		ENCODING("GR64", ENCODING_RM)
ENCODING("GR8", ENCODING_RM)		ENCODING("GR8", ENCODING_RM)
ENCODING("VR128", ENCODING_RM)		ENCODING("VR128", ENCODING_RM)
ENCODING("VR128X", ENCODING_RM)		ENCODING("VR128X", ENCODING_RM)
		ENCODING("FR128", ENCODING_RM)
ENCODING("FR64", ENCODING_RM)		ENCODING("FR64", ENCODING_RM)
ENCODING("FR32", ENCODING_RM)		ENCODING("FR32", ENCODING_RM)
ENCODING("FR64X", ENCODING_RM)		ENCODING("FR64X", ENCODING_RM)
ENCODING("FR32X", ENCODING_RM)		ENCODING("FR32X", ENCODING_RM)
ENCODING("VR64", ENCODING_RM)		ENCODING("VR64", ENCODING_RM)
ENCODING("VR256", ENCODING_RM)		ENCODING("VR256", ENCODING_RM)
ENCODING("VR256X", ENCODING_RM)		ENCODING("VR256X", ENCODING_RM)
ENCODING("VR512", ENCODING_RM)		ENCODING("VR512", ENCODING_RM)
(13 lines hidden)
RecognizableInstr::roRegisterEncodingFromString(const std::string &s,		RecognizableInstr::roRegisterEncodingFromString(const std::string &s,
uint8_t OpSize) {		uint8_t OpSize) {
ENCODING("GR16", ENCODING_REG)		ENCODING("GR16", ENCODING_REG)
ENCODING("GR32", ENCODING_REG)		ENCODING("GR32", ENCODING_REG)
ENCODING("GR32orGR64", ENCODING_REG)		ENCODING("GR32orGR64", ENCODING_REG)
ENCODING("GR64", ENCODING_REG)		ENCODING("GR64", ENCODING_REG)
ENCODING("GR8", ENCODING_REG)		ENCODING("GR8", ENCODING_REG)
ENCODING("VR128", ENCODING_REG)		ENCODING("VR128", ENCODING_REG)
		ENCODING("FR128", ENCODING_REG)
ENCODING("FR64", ENCODING_REG)		ENCODING("FR64", ENCODING_REG)
ENCODING("FR32", ENCODING_REG)		ENCODING("FR32", ENCODING_REG)
ENCODING("VR64", ENCODING_REG)		ENCODING("VR64", ENCODING_REG)
ENCODING("SEGMENT_REG", ENCODING_REG)		ENCODING("SEGMENT_REG", ENCODING_REG)
ENCODING("DEBUG_REG", ENCODING_REG)		ENCODING("DEBUG_REG", ENCODING_REG)
ENCODING("CONTROL_REG", ENCODING_REG)		ENCODING("CONTROL_REG", ENCODING_REG)
ENCODING("VR256", ENCODING_REG)		ENCODING("VR256", ENCODING_REG)
ENCODING("VR256X", ENCODING_REG)		ENCODING("VR256X", ENCODING_REG)
(21 lines hidden)
}		}

OperandEncoding		OperandEncoding
RecognizableInstr::vvvvRegisterEncodingFromString(const std::string &s,		RecognizableInstr::vvvvRegisterEncodingFromString(const std::string &s,
uint8_t OpSize) {		uint8_t OpSize) {
ENCODING("GR32", ENCODING_VVVV)		ENCODING("GR32", ENCODING_VVVV)
ENCODING("GR64", ENCODING_VVVV)		ENCODING("GR64", ENCODING_VVVV)
ENCODING("FR32", ENCODING_VVVV)		ENCODING("FR32", ENCODING_VVVV)
		ENCODING("FR128", ENCODING_VVVV)
ENCODING("FR64", ENCODING_VVVV)		ENCODING("FR64", ENCODING_VVVV)
ENCODING("VR128", ENCODING_VVVV)		ENCODING("VR128", ENCODING_VVVV)
ENCODING("VR256", ENCODING_VVVV)		ENCODING("VR256", ENCODING_VVVV)
ENCODING("FR32X", ENCODING_VVVV)		ENCODING("FR32X", ENCODING_VVVV)
ENCODING("FR64X", ENCODING_VVVV)		ENCODING("FR64X", ENCODING_VVVV)
ENCODING("VR128X", ENCODING_VVVV)		ENCODING("VR128X", ENCODING_VVVV)
ENCODING("VR256X", ENCODING_VVVV)		ENCODING("VR256X", ENCODING_VVVV)
ENCODING("VR512", ENCODING_VVVV)		ENCODING("VR512", ENCODING_VVVV)
(125 lines hidden)
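
Note on the hunks above: every `ENCODING("FR128", ...)` line adds one more string comparison to a lookup function that maps a TableGen operand-type name to an operand encoding. The following is a minimal sketch of that pattern, not the code from the patch. It assumes the `ENCODING` macro expands to a compare-and-return, which is how I understand the file to work. The enum, the function name `rmRegisterEncoding`, and the use of an exception in place of `llvm_unreachable` are hypothetical simplifications so the sketch compiles without LLVM headers.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the disassembler's OperandEncoding enum.
enum OperandEncoding { ENCODING_RM, ENCODING_REG, ENCODING_VVVV, ENCODING_IB };

// Assumed shape of the macro: one string comparison with an early return.
#define ENCODING(str, encoding) \
  if (s == str)                 \
    return encoding;

// Simplified model of rmRegisterEncodingFromString (OpSize omitted).
OperandEncoding rmRegisterEncoding(const std::string &s) {
  ENCODING("VR128", ENCODING_RM)
  ENCODING("FR128", ENCODING_RM) // the kind of entry this patch adds
  ENCODING("FR64", ENCODING_RM)
  // The real code reports the name and calls llvm_unreachable; an
  // exception is used here only to keep the sketch self-contained.
  throw std::invalid_argument("Unhandled R/M register encoding: " + s);
}

#undef ENCODING
```

With the `FR128` entry present, `rmRegisterEncoding("FR128")` returns `ENCODING_RM`. Without it, the lookup falls through to the error path, which is presumably why the new register class has to be listed in each of these tables.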

Revision Contents

Diff 42455

lib/Target/X86/X86CallingConv.td

lib/Target/X86/X86ISelLowering.cpp

lib/Target/X86/X86InstrCompiler.td

lib/Target/X86/X86InstrInfo.td

lib/Target/X86/X86InstrSSE.td

lib/Target/X86/X86RegisterInfo.td

test/CodeGen/X86/fp128-calling-conv.ll

test/CodeGen/X86/fp128-cast.ll

test/CodeGen/X86/fp128-compare.ll

test/CodeGen/X86/fp128-i128.ll

test/CodeGen/X86/fp128-libcalls.ll

test/CodeGen/X86/fp128-load.ll

test/CodeGen/X86/fp128-store.ll

test/CodeGen/X86/soft-fp.ll

utils/TableGen/X86RecognizableInstr.cpp