This is an archive of the discontinued LLVM Phabricator instance.

Part 2 to fix x86_64 fp128 calling convention.
ClosedPublic

Authored by chh on Jul 22 2015, 4:12 PM.

Details

Summary

Part 1 was submitted in http://reviews.llvm.org/D15134.
Bugs:

Changes:

  • X86RegisterInfo.td, X86RecognizableInstr.cpp: Add FR128 register class.
  • X86CallingConv.td: Pass f128 values in XMM registers or on stack.
  • X86InstrCompiler.td, X86InstrInfo.td, X86InstrSSE.td: Add instruction selection patterns for f128.
  • X86ISelLowering.cpp: When target has MMX registers, configure MVT::f128 in FR128RegClass, with TypeSoftenFloat action, and custom actions for some opcodes. Add missed cases of MVT::f128 in places that handle f32, f64, or vector types. Add TODO comment to support f128 type in inline assembly code.
  • Add unit tests for x86_64 fp128 type.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
chh updated this revision to Diff 35928.Sep 28 2015, 5:08 PM

Same changes as the previous diff, but use -U999999.

davidxl added inline comments.Sep 28 2015, 10:13 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8589 ↗(On Diff #35928)

Can you explain this a little more?

8593 ↗(On Diff #35928)

Should it be ... && (N1VT == N1OpVT || N1OpVT != MVT::f128)?

lib/CodeGen/SelectionDAG/InstrEmitter.cpp
169 ↗(On Diff #35928)

Why hasType(VT) returns false for FR128? Is it expected?

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3535 ↗(On Diff #35928)

unconditionally expand? Do you need to add some assertion on the type expected?

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
52 ↗(On Diff #35928)

why this change?

68 ↗(On Diff #35928)

Why returning false for some cases and true for others?

78 ↗(On Diff #35928)

Explain this change?

145 ↗(On Diff #35928)

Should the problem be fixed in SoftenFloatOperand method?

chh updated this revision to Diff 36313.Oct 1 2015, 3:29 PM
chh marked an inline comment as done.

Add more comments and changes per davlidxl's suggestions.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8589 ↗(On Diff #35928)

Done. Please see new comment in code.

8593 ↗(On Diff #35928)

Good suggestion. I might be too conservative here.
Now I changed the condition to only N1Op0VT != MVT::f128,
and all my tests are still working.

lib/CodeGen/SelectionDAG/InstrEmitter.cpp
169 ↗(On Diff #35928)

See also my previous answers and comments at line 143.
The question is whether ComRC returned by TRI->getCommonSubClass should guarantee ComRC->hasType(Node->getSimpleValueType(ResNo), or FR128 should be defined to work with existing getCommonSubClass. I couldn't find a clean way to change either FR128 or getCommonSubClass, without breaking many other things. So, here it seems to me the smallest fix on the user of ComRC (UseRC).

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3535 ↗(On Diff #35928)

Is there some invalid constant type?
I think ExpandConstant(CP) should already check necessary node types.

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
52 ↗(On Diff #35928)

The new default value R will make other code simpler like SetSoftenedFloat(R, R),
which just wants to mark this node as softened without changing it.

68 ↗(On Diff #35928)

See new inline comment at line 62.

78 ↗(On Diff #35928)

Added more comments in code, line 62.
Hope that will make more sense to the comments about FABS, FCOPYSIGN, FNEG, ConstantFP.

145 ↗(On Diff #35928)

SoftenFloatOperand works with ScanOperands if !KeepFloat.
So, the new condition here is when KeepFloat, we should call ReplaceValueWith.
It might be possible to fix every SoftenFloatOperand method and also simplify ScanOperands, but that looks like a more complex and risky move.

davidxl added inline comments.Oct 5 2015, 11:22 AM
lib/CodeGen/SelectionDAG/InstrEmitter.cpp
169 ↗(On Diff #36313)

What is VT in this case? What reg class is ComRC here? How about the DstRC below TLI->getRegClassFor(VT)?

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
291 ↗(On Diff #36313)

Change the name to ExpandConstantFP128?

3535 ↗(On Diff #36313)

I don't see what expected EVTs are checked in this method?

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
64 ↗(On Diff #36313)

Does the following change affects other types other than f128?

Adding Owen and Quentin to this review. I'd feel more comfortable if they were to comment on the long term maintainability of this.

Thanks!

-eric

chh added inline comments.Oct 7 2015, 3:03 PM
lib/CodeGen/SelectionDAG/InstrEmitter.cpp
169 ↗(On Diff #36313)

One of the test cases showing this problem is
test/CodeGen/X86/avx512-calling-conv.ll,
compiled with -mtriple=x86_64-apple-darwin -mcpu=knl

VT is v16i8 (23)
ComRC is the new FR128RegClassID (74)
DstRC is not set yet, but the correct
TLI->getRegClassFor(VT) is VR128RegClassID (75)

It means that my current definition of FR128 has registers in VR128,
and ComRC could be FR128 where VR128 was the only candidate before.
But of course FR128 with my current instruction selection patterns won't
work for v16i8, and it should not.

  • I tried in my early development of this patch to store f128 value in existing register classes, without a new FR128. But that approach has even more troubles.
  • I also tried to copy some VR128 instruction patterns to FR128 to make this work without this patch in EmitCopyFromReg. But that requires a lot of duplication in instruction selection pattern, and I had not even resolved all the problems.
  • One thing I think might be possible, as my original TODO item, is to change the semantic of getCommonSubClass to guarantee ComRC->hasType(VT), or an extra flag for getCommonSubclass to guarantee ComRC->hasType(VT). I was hoping to break this patch into smaller pieces, and those refactoring tasks seem to be good candidates for follow up patches.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
291 ↗(On Diff #36313)

Although it is triggered now only for f128 type, but it should work for other types say f80, f96, etc. Actually, the constant value is not floating type, but an integer (binary bits) of the floating point value from CP->getConstantIntValue.

3535 ↗(On Diff #36313)

I think there is no target-independent type restriction here.
I see TLI.isFPImmLegal check in ISD::ConstantFP, but don't see similar limitation for non-FP types.
ExpandConstant could have more value range limit if necessary.

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
64 ↗(On Diff #36313)

Not now. Only x86_64 f128 is configured this way. But in the future new configurations can use this logic to keep value legal (in register) but with soften operators.

Quentin, can you help do a quick review of the patch and give
suggestions on high level direction of the patch?

thanks,

David

qcolombet edited edge metadata.Oct 12 2015, 1:08 PM

Hi Chih-Hung,

I believe the current patch can be split in several smaller patches. For instance, improving the AsmPrinter and changing the ABI seem pretty orthogonal to not have to come in one patch.

Now, as a high level comment, this patch seems quite involved for something that sounds fairly common from the legalizer point of view. I.e., we have a legal type, fp128, with illegal instructions (soften float). Have you checked how we deal with those cases usually? If so, could you explain what does and does not work?
I have not spent a lot of time on thinking or reviewing the patch, so I admit I can tell nonsense, but the fell I have on the patch is that we are doing it wrong.

Regarding the ABI changes, the patch obviously breaks the existing code that have been generated by clang for those types. I would like to have a sense of how risky are that changes. Could you list the targets (OS) that use the variants you are changing?

Thanks,
-Quentin

include/llvm/Target/TargetLowering.h
1907 ↗(On Diff #36313)

That change makes me shiver.
We probably do something against the design if we need to do that.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8592 ↗(On Diff #36313)

Based on the comment, this seems like a hack that impact all targets to fix a x86_64 backend limitation.
The bottom line is that I am guessing we should push that change.

lib/CodeGen/SelectionDAG/InstrEmitter.cpp
147 ↗(On Diff #36313)

By definition the common sub class of A and B must be able to hold both the types of A and B, since the related registers are both in A and in B (i.e., could be access by A or B register names).
The bottom line is that comment must be always wrong and the target must do the right thing to ensure this is true, IMO.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1205 ↗(On Diff #36313)

How could that be different?!
That looks wrong to have to check for both conditions.

chh added a comment.Oct 12 2015, 5:27 PM

This patch should change output of fp128 for Android x86_64 only,
to fix mainly the problem in https://llvm.org/bugs/show_bug.cgi?id=23897.

No regression test failure was found for other targets, but since I am not sure if all other targets have enough regression tests on fp128, I tried to make changes only depend on the new configuration or work for all configurations.

It's possible to break this patch into several ones, but then only the last one should turn on the configuration change for Android x86_64. Then, all patches before the last one will be incomplete and not really tested. It's not right to change the Android x86_64 configuration before the last patch, because right now, fp128 for Android x86_64 works by itself, just not compatible with gcc. A partial change of the fp128 code will fail to compile libm. For example, the AsmPrinter change was actually discovered when libm was compiled with -g. The current code cannot handle new fp128 constant value.

The change was larger than typical bug patch because I think this is the first configuration to keep fp128 value in SSE registers. Before this, most x86 configurations used fp80 on floating point stack, and fp128 on x86 was split into two 64-bit registers. Hence, fp128 was passed in two 64-bit registers or on stack, but we need to pass them in SSE to be compatible with gcc and the ABI. Current nice legalization abstraction for floating point values worked mostly for most target, except a few cases that we have seen hacked for ppc. For fp128 in SSE registers, we need more enhancements to this legalization abstraction layer, or I would have to write up another new kind of action maybe 90% similar to soften floating point, but do something differently.

Some changes to ad-hoc optimizations related to fp128 were made only now because the existing code have not handled fp128 before. The fp128 values were split into two registers and/or converted to library function calls and do not hit the optimization code. Even the ppc f128 type is in two f64 registers. These were mostly found in the compilation of libm, which heavily cast fp128 to/from i128 to use faster bit-wise operations. I have added test cases for those problems I found. So hopefully if any of such changes were reverted, some regression test will fail.

include/llvm/Target/TargetLowering.h
1907 ↗(On Diff #36313)

I think an alternative is to keep ValueTypeActions private here, but to hack into computeRegisterProperties of lib/CodeGen/TargetLoweringBase.cpp to define different actions for Android x86_64 fp128. That function now has one hack for ppcf128, but that is depending on a special MVT::ppcf128. If we add target dependent code into computeRegisterProperties, for only Android x86_64, not all f128, then that might look more like a hack than making ValueTypeActions accessible to child classes.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
8592 ↗(On Diff #36313)

Yes, i think this impact all x86_64 f128 values in one register.
We didn't hit this problem because no target keeps f128 in one register.
What is the alternative change?

lib/CodeGen/SelectionDAG/InstrEmitter.cpp
147 ↗(On Diff #36313)

I can think about this more. I was putting off that kind of change to getCommonSubClass as a TODO in the future, but that was more than a month ago.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1205 ↗(On Diff #36313)

This is needed only for the new special configuration of Android x86_64 fp128, which is TypeLegal to be in a register, but with a type action to soften the operations to library calls. It's TypeAction is not TypeLegal.

chh added a comment.Oct 16 2015, 4:57 PM

A few email discussions about this patch was sent to reviews+D11438+public+6e2ebe02f28cebc7@reviews.llvm.org and llvm-commits@lists.llvm.org, but they were not shown here. I don't know the cause, but please check llvm-commits mailing list about this topic. Thanks.

chh updated this revision to Diff 37952.Oct 20 2015, 5:23 PM
chh edited edge metadata.

Sync with latest LLVM source and
add -attr=+mmx to test cases since new default mmx is off.
Add one more parameter to getCommonSubClass and firstCommonClass,
to guarantee that returned common sub class will contain the specified simple value type.
This extra parameter is used by EmitCopyFromReg in InstrEmitter.cpp.

chh updated this object.Oct 20 2015, 5:24 PM
In D11438#271749, @chh wrote:

I usually found missed handling of f128 type from testing libm long double on Android target, and those cases appeared in many different places.

Hi @chh,
I'm currently looking into a similar calling convention bug. I just came across this patch as you updated it today
I'll start testing your patch soon but I feel you would have some insight into this bug
https://llvm.org/bugs/show_bug.cgi?id=24398
based on the content of this patch ?

chh added a comment.Oct 21 2015, 10:41 AM

Martell, I don't have Windows or mingw-w64 system to test now, specifically I don't know what gcc generates for that system. If you were compiling test code with gcc and clang, and link with libraries compiled by gcc, then you could hit calling convention problem. You could compare first assembly code output from both compilers.
Please take a look and try the test cases in
https://llvm.org/bugs/show_bug.cgi?id=23897
https://llvm.org/bugs/show_bug.cgi?id=24111

http://reviews.llvm.org/D11437 has the clang changes to fix the calling convention for f128, and this patch is for the llvm part. If mingw-w64 does not use 128-bit floating point, then my patches shouldn't affect your problem, but you can find the source locations to fix mingw-w64 issues.

I've posted my reply to the llvm-developers list to not clog your review
with noise :)

chh added a comment.Oct 23 2015, 3:18 PM

All reviewers, please take a look of the latest diff.
Please add comment for anything you think must be changed before submit.
I test it with Android and on Linux the LLVM regression tests.
After this long review period, I think a field test would be more effective to find any error.
Thanks.

It seems to me the patch does not use the right approach in SoftenFloatResult and SoftenFloatOperands. The patch tries to workaround the the problem that the softened result does not have the right type (same size integer instead of float type). To workaround the problem, the patch skips 'softening' completely for some opcodes such as BITCAST, SELECT, and as a result, has to change the prototype of SoftenFloatResult and allows operands scanning which should not be necessary if done properly.

It seems to be the right approach is to make sure those variants of SoftenFloatRes_<OPCODE> to return and set the right 'softened' float type for f128. There are two main places that need to be changed: TargetLowering::makeLibCall and BitConvertToInteger by making them f128 aware. Does it make sense?

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
726 ↗(On Diff #37952)

Why is keepFloat computed differently from SoftenFloatResult?

lib/CodeGen/SelectionDAG/LegalizeTypes.h
381 ↗(On Diff #37952)

If SetSoftenedFloat was called with the right value, then there is no need to do special handling here. This is the reason why I think SoftenFloatResult should not special handling those op code.

chh added a comment.Oct 28 2015, 10:03 PM

David, I have reconsidered a few alternatives and checked again current llvm architecture.

First a few observations for other people not following all the issues before:

The core problem is that llvm classifies types into
legally-in-register-with-HW-instruction or
illegal-and-should-be-expanded-promoted-or-soften.
f128 is classified as illegal for most targets and soften to i128,
which is passed/returned in two 64-bit registers.
The SoftenFloat* functions not only convert operations to library function calls,
but also the result type to i128. 
But on x86_64, calling convention for i128 is different from f128.
GCC and llvm are correct for i128, but llvm incorrectly passed f128 the same as i128.

To give f128 its own calling convention, I still don't have other solution but
to create its own register class. That makes it a new kind of type not well fit
into current scheme.

It does not seem feasible to me to keep using SoftenFloat* functions and
convert f128 to i128, and then add other hacks to use different calling convention
for real i128 and f128-converted-i128. If it is possible, please let me know
how to do it.

If I understand correctly, your suggestion is to fix some SoftenFloatRes_* functions
to return f128 for f128 in this case instead of skipping them.
If we want to fix them and avoid the changes to LegalizeTypes.cpp, they will
also need to scan-and-convert operands, because current LegalizeTypes.cpp simply
"goto NodeDone" after calling SoftenFloatResult.
Wouldn't that change duplicate the ScanOperands actions from LegalizeTypes.cpp to
LegalizeFloatTypes.cpp?

I did not like the duplication of ScanOperands, so I added new return value to
SoftenFloatResult to indicate if ScanOperands is necessary. Then, some
node types no longer need any change inside SoftenFloatResult.

Maybe what I should do is adding more comments to SoftenFloat* functions to
explain that they now do not always convert floating point types to integer types.
In fact, they convert nodes to the TransformToType[VT], which could be f128 or i128.
Current comment of getTypeToTransformTo says:

/// For types supported by the target, this is an identity function....

That can be extended to as "supported by the target or passed/returned in registers".

Another alternative is to keep the exact meaning of LegalizeTypeAction::TypeSoftenFloat
and add a new action like TypeSoftenOnlyFloatOps. We still need to change comments of
several functions to deal with the new legally in register but no HW support f128 type.
For other targets, with the same calling convention for f128 and i128,
they can keep on using TypeSoftenFloat, which still converts nodes to the
TransformToType[VT] but that is assumed to be an integer type.
For x86_64, the new TypeSoftenOnlyFloatOps action will assume that TransformToType[VT]
could be the same as VT and VT could be legally in register.

At the end, SoftenFloat* and SoftenOnlyFloatOps* functions could share some code
to minimize duplication. The shared code would have some flag to serve similar purpose
as the KeepFloat variable in my patch.
Many places that handle LegalizeTypeAction::TypeSoftenFloat will need to be changed
to also handle LegalizeTypeAction::TypeSoftenOnlyFloatOps.
That doesn't seem to be a simpler patch either.

BTW, I am going to LLVM dev meeting too.
So if I don't have time left this week, I will continue next week.

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
726 ↗(On Diff #37952)

SoftenFloatResult converts its given SDNode N,
so its KeepFloat is computed from N's value type:

EVT VT = N->getValueType(ResNo);
EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
bool KeepFloat = VT.isSimple() && VT == NVT && TLI.isTypeLegal(VT);

SoftenFloatOperand converts N's operand,
so its KeepFloat is computed from the OpNo operand:

EVT OpVT = N->getOperand(OpNo).getValueType();
bool KeepFloat = OpVT.isSimple() && TLI.isTypeLegal(OpVT);

The extra check of VT == TLI.getTypeToTransformTo(....)
is not really necessary when we have only x86_64 f128 that
can be legal and transformed to itself at the same time.
I can remove it, or keep it as an redundant check in both places
to filter out any other new unexpected configuration.

Is this the difference you noticed?
Would you rather keep or remove (VT == NVT ) at both places?

lib/CodeGen/SelectionDAG/LegalizeTypes.h
381 ↗(On Diff #37952)

In LegalizeTypesGeneric.cpp, you can see that a caller of GetSoftenedFloat
assumes that a softened float should be split into two integers.
We don't want that. So either we modify the meaning of TargetLowering::TypeSoftenFloat
and GetSoftenedFloat or we need to use a new action type like TypeSoftenOnlyFloatOps.

chh updated this revision to Diff 39010.Nov 2 2015, 5:18 PM

Sync with latest source and update comments about TypeSoftenFloat, SoftenFloat*, KeepFloat, and GetSoftenedFloat.
New comments should clarify that only non-HW-supported operations need to be converted and use integer type.
We should think about non-HW-supported operations instead of "illegal" types.
SoftenFloat is really about converting float operations, not necessarily converting to integers.

I am still not convinced that you need to special handling certain opcodes in SoftenFloatResult and returns a bool to indicate if operands need to be scanned.

The difference I can see is that without this patch, CopyToReg and CopyFromReg opcodes are expected to be already type legalized. With your patch, can you add these functions:

SoftenFloatRes_CopyToReg and SoftenFloatRes_CopyFromReg -- it does not need to do anything but it makes sure that the 'softened' result is stored. Currently you simply short-circuit it with

default:
if (keepFloat)

return false;
chh updated this revision to Diff 39757.Nov 9 2015, 2:57 PM

Use davidxl's suggestion and add case ISD::Register, CopyFromReg, CopyToReg into SoftenFloatResult. They are now only used when KeepFloat, to keep the node unchanged. All other not handled node opcode will still by default abort the compiler.

davidxl added inline comments.Nov 10 2015, 9:29 AM
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
60 ↗(On Diff #39757)

KeepFloat does not convey the real meaning. Is is better to call it LegalInHWReg?

68 ↗(On Diff #39757)

With the handling of CopyFromReg CopyToReg, there does not seem to be a need to special case SELECT, SELECT_CC by briefly looking at the code.

In fact, I think we should push the logic all down to the leaf and get rid of return true/false change. It is the change of 'flow' and special handling of operand scanning for f128 that makes the current patch look intrusive.

145 ↗(On Diff #39757)

Which sub-methods need special handling here?

159 ↗(On Diff #39757)

Just return SDValue(N, ResNo)?

chh updated this revision to Diff 39872.Nov 10 2015, 5:13 PM
  • Rename KeepFloat to LegalInHWReg.
  • Clarify some comments.
  • Embed into original switch cases the handle of LegalInHWReg for the soften of opcode Register, CopyFromReg, CopyToReg, ConstantFP, FABS, FCOPYSIGN, and FNEG.
chh marked an inline comment as done.Nov 10 2015, 5:17 PM
chh added inline comments.
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
68 ↗(On Diff #39872)

David,

Before my patch, DAGTypeLegalizer::run's loop over results always
(1) goto NodeDone with a changed (new) node, or
(2) goto ScanOperands without changed node,

for the TargetLowering::TypeLegal case.

All SoftenFloatRes_* functions made new nodes of integer types.

For x86_64 fp128, we really don't want to change the f128 type so that instruction selection can be correct or more efficient based on f128 type instead of i128.
For BITCAST, SELECT, and SELECT_CC opcode, we need to do down and check their operands as if f128 is TypeLegal. So there is no simpler way to treat it the same as other SoftFloatRes_* functions.
For Register, CopyFromReg, CopyToReg, ConstantFP, FABS, FCOPYSIGN, and FNEG they are simple enough to skip ScanOperands, so we can let them return the same SDNode and pretend that the result node was changed and goto NodeDone. So I simplified all of them into the old switch cases without new extra functions.

154 ↗(On Diff #39872)

See updated comment. Examples are those calling TLI.makeLibCall.

168 ↗(On Diff #39872)

Okay, changed to an assert, suppress compiler warning, and simple return.

davidxl added inline comments.Nov 11 2015, 12:25 AM
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
70 ↗(On Diff #39872)

I got curious why these opcodes need to be skipped and picked ISD::SELECT to look at.

Commented line 70 out does trigger error -- but further debugging indicate the bug can be in the original code. In particular, PromoteIntRes_SETCC should call ReplaceValueWith, but does not.

Fixing SETCC handling makes the test case TestMax pass.

Can you verify this is a bug triggered by f128 change?

158 ↗(On Diff #39872)

Where is the short-cut? Is it the new default handling in SoftenFloatOperand? Making short cut there and have weak handshaking seems fragile.

752 ↗(On Diff #39872)

What are the new opcodes that reach here? Please make such opcode explict and add assertions for unexpected ones.

791 ↗(On Diff #39872)

This seems redundant -- see above early return.

chh marked 2 inline comments as done.Nov 12 2015, 11:06 AM

Will upload diff 11 soon with suggested changes.

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
70 ↗(On Diff #39872)

Yes, I can reproduce that error too. I believe that it happens only with
my new x86_64 f128 configuration, but I have no way to verify for all targets
and illegal types. It's arguably a "bug" as it worked with existing ScanOperands.

However, the reason to skip some Opcode is not to workaround the problem of
missed calls to ReplaceValueWith. We want to keep some Opcode with f128 type
for instruction selection to correct calling convention or more efficient code.
When we do not change SDNode for f128 type, we need to tell LegalizeTypes to
treat it as a "legal" type and go to ScanOperands, so all other existing cases
will work with this unchanged SDNode.

158 ↗(On Diff #39872)

Yes, in SoftenFloatOperand, I do not try to repeatedly replace a SDNode
when its operand type can be kept as float and not changed.
Current LegalizeTypes and LegalizeFloatTypes depend on ScanOperands to
replace SDNode where DAGTypeLegalizer::*Result functions "missed" calls
to ReplaceValueWith. That is kind of strange and somewhat inefficient.

However, changing that architecture is too big a risk and work for me to
fix x86_64 f128 calling convention and make sure the change work for
all targets now and in the future. I don't think there are enough regression
tests for all target and types that depend on LegalizeTypes. I found all broken
cases when trying to compile libm on Android.

Hence, I use this safer and "smaller" patch to match existing architecture
and limit the impact to only SoftenFloat* and LegalInHWReg cases.
It could be a separate patch to clean up the complicated loops inside LegalizeTypes,
or LegalizeTypes could be rewritten as we know that there are non-HW-supported operations
on some types but not really "illegal types".

752 ↗(On Diff #39872)

Unspecified cases will dump and abort as before.

791 ↗(On Diff #39872)

Fixed.

chh updated this revision to Diff 40071.Nov 12 2015, 11:07 AM
chh marked 2 inline comments as done.

Update some comments, remove redundant code, and explicitly list legal
operand types in SoftenFloatOperand.

davidxl added inline comments.Nov 21 2015, 11:54 AM
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
158 ↗(On Diff #40071)

I don't quite like the short cut here either. The original architected flow is reasonable to me: basically

  1. After softenfloat result, the old SDValue is mapped to the new result
  2. the worklist based loop follow DU and use instrctions of the old result get processed -- operands are scanned.
  3. during scanning, the use of the old value will be replaced with the new value.
  4. if scan operands result in a new node, replaceValueWith then needs to be eagerly called here on demand.

I prefer you do not change the flow in SoftenFloatOperands and make a patch here to workaround it. The resulting change may look bigger (as more opcodes need to be handled), but it is what it is.

I also suggest you extract the following patches out seperately

  1. softenFloatResult and softenFloatOperands
  2. f128 constantFP
  3. register class small refactoring
  4. the rest (may also be breakable )
chh added a comment.Nov 24 2015, 5:22 PM

David,

I will upload a new patch with some of your older suggested changes first.
Please see details in the new code diff and upload comment.

I met many difficulties implementing your latest inline comments
in LegalizeFloatTypes.cpp, so there is no change with that idea yet.
Please see my reply to the inline comment.

For separating this large patch, I have not tried yet,
but it might be okay to split at least into two parts:
(2) The new unit test files *.ll and X86CallingConv.td, X86ISelLowering.cpp, X86InstrInfo.td, X86InstrSSE.td, X86RegisterInfo.td.
(1) All the other files.

We should be able to check in (1) and expect no change to all targets.
Patch (2) will fix Android x86_64 problem, but still no change to other targets.
I am assuming that we want some time to test (1) before submitting (2).
Further split of (1) might be possible, but that would take more review
and test cycles if there is little regression. When there is any regression,
it should be easy to revert all changes in (1) and then debug-and-retry, right?
I don't see a way to split (2), as we don't want to have the Android target
in a new incomplete state different from the current one.

lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
158 ↗(On Diff #40071)

The difficulties I found in the current architecture are
(1) Avoid duplicating a lot of code from SoftenFloatRes_* to SoftenFloatOp_*.
(2) Make sure the changes in multiple SoftenFloatRes_* and SoftenFloatOp_* will be fixed correctly at the same time in the future.

In SoftenFloatOperand, we can see that it calls ReplaceValueWith
at the end for all kinds of opcode. The number of calls to ReplaceValueWith
is not as many as we expected. Hence, it seems to me the norm is to share
code at higher level than pushing down into SoftenFloatOp_* and SoftenFloatRes_*.
It seems reasonable to call ReplaceValueWith at the end of
SoftenFloatOperand and SoftenFloatResult.

I tried to remove this part, but the result was either
(a) calling ReplaceValuewith inside SoftenFloatRes_*, or
(b) copying many SoftenFloatRes_* code into SoftenFloatOp_* and then call ReplaceValueWith in SoftenFloatOp_*. The copied code is not all simple and small. For example, SoftenFloatRes_FCOPYSIGN is complicated, and I don't see how simpler it is to duplicate it into SoftenFloatOp_FCOPYSIGN, or abstract it out into a shared function.

What you mentioned in another email earlier might be better to
move some SoftenFloatOp_* responsibility into SoftenFloatRes_*.
Then, we have less code duplication, and we need to make sure
that ReplaceValueWith is called whenever something is changed
by SoftenFloatRes_*. I could try that when I have more time.

My concern (2) is more scaring. During my test of those code movement,
I made a few mistakes and the compiler could run and generate inefficient
code because some ReplaceValueWith call was missed or some fp128 was
not kept in register due to a short-cut was missed. So I would prefer
a way easier and smaller change to verify no no-impact to existing targets,
by conditioned on isLegalInHWReg, even if some can be applied to other targets.

chh updated this revision to Diff 41100.Nov 24 2015, 5:23 PM

Push short cuts inside SoftenFloatResult down to SoftenFloatResultOp_* functions,
which now take extra argument ResNo. SoftenFloatResult is cleaner and faster,
SoftenFloatResultOp_* functions have some repeated patterns.

Abstract out the function isLegalInHWReg used in multiple places.
Spell out more short-cut opcode inside SoftenFloatOperand when isLegalInHWReg.
Add assert of isLegalInHWReg after call of SoftenFloatResult when result is not changed.

chh updated this revision to Diff 41673.Dec 2 2015, 1:53 PM

Sync with latest llvm source and new changes suggested in http://reviews.llvm.org/D15134.

chh updated this revision to Diff 41694.Dec 2 2015, 4:41 PM
chh updated this revision to Diff 41778.Dec 3 2015, 10:47 AM
chh updated this revision to Diff 41962.Dec 4 2015, 4:55 PM
chh retitled this revision from Fix x86_64 fp128 calling convention to Part 2 to fix x86_64 fp128 calling convention. .
chh updated this object.
chh added a reviewer: davidxl.
chh removed a subscriber: davidxl.

Now the diff does not contain changes in submitted part1.

davidxl added inline comments.Dec 5 2015, 6:46 PM
lib/Target/X86/X86ISelLowering.cpp
14628

This needs some explanation. Why can the Op1's value type be i128?

27775

Why TODO here? 'x' constraint should work.

27888

Explain TODO here.

lib/Target/X86/X86InstrInfo.td
957–962

Unrelated format change here.

davidxl added inline comments.Dec 5 2015, 9:32 PM
lib/Target/X86/X86InstrSSE.td
8862

Move the comment above the pattern def.

  1. movaps is shorter, not 'should be'
  2. regarding 'faster' part -- put a reference there. In fact, f128 operations should be considered in integer domain so movdqa should be used to avoid domain bypass penalty.
8868

pand is for SIMD integer. andps is shorter though.

chh updated this revision to Diff 42090.Dec 7 2015, 12:13 PM
chh added inline comments.
lib/Target/X86/X86ISelLowering.cpp
14628

Removed it. It was an condition only triggered by my older hacks.
Not it should not happen.

27775

f128 and i128 were not in SSE_REG before.
So I think this is a new feature that can be added later.

27888

Same as above. f128 and i128 not in SSE_REG before this change.

lib/Target/X86/X86InstrInfo.td
957–962

They are changed to align up with new line 962.

lib/Target/X86/X86InstrSSE.td
8862

I updated comment to keep only the shorter and SSE reasons.
I was not sure about 'faster with movaps',
which seems to be used more in clang than gcc.
My main reasons are shorter and available in SSE for Android's applications.

8868

I also choose andps over pand for shorter and availability in SSE,
not sure about performance difference when combined with other instructions.
I think by not splitting f128 into two registers, we already saved more code and execution time.

davidxl added inline comments.Dec 8 2015, 5:39 PM
lib/Target/X86/X86InstrInfo.td
957–963

It seems you are adding extra space before :

lib/Target/X86/X86InstrSSE.td
8871

Is there any coding style guidelines for table gen code? There are long lines that are wrapped ..

test/CodeGen/X86/fp128-calling-conv.ll
16

Is this a relevant test? non-f128 related tests can be submitted in a different patch.

test/CodeGen/X86/fp128-cast.ll
13

There are also lots of irrelevant tests added in this file .

chh added inline comments.Dec 9 2015, 4:37 PM
lib/Target/X86/X86InstrInfo.td
957–963

Yes, it seems to be the style at line 964-969 too.

lib/Target/X86/X86InstrSSE.td
8871

Note sure if there is special rule for table gen code.
http://llvm.org/docs/CodingStandards.html says 80 columns.
There are few exceptions in this file,
but now I wrapped all my new lines to less than 80 characters.

test/CodeGen/X86/fp128-calling-conv.ll
16

Okay, i will put non-fp128 type tests to another patch.

test/CodeGen/X86/fp128-cast.ll
13

Okay, i will put non-fp128 type tests to another patch.

chh updated this revision to Diff 42354.Dec 9 2015, 4:38 PM
davidxl added inline comments.Dec 10 2015, 11:32 AM
test/CodeGen/X86/fp128-cast.ll
155

Please move this test case to a different patch.

test/CodeGen/X86/fp128-compare.ll
5

Move this test case out of the patch -- -there are more than 20 or so test cases below that need to be separated out.

chh added a comment.Dec 10 2015, 11:39 AM

Will reduce fp128-compare.ll in the next diff.

test/CodeGen/X86/fp128-cast.ll
155

This is about f128 calling convention to return f128 in SSE.
Are you sure about removing this one?

davidxl added inline comments.Dec 10 2015, 11:42 AM
test/CodeGen/X86/fp128-cast.ll
155

you are right -- this one is relevant.

Please check the rest.

chh updated this revision to Diff 42455.Dec 10 2015, 12:38 PM

Removed test cases without f128 type.

chh marked an inline comment as done.Dec 10 2015, 12:41 PM
davidxl edited edge metadata.Dec 10 2015, 12:54 PM

great. I will look at the tests in more details.

David

davidxl added inline comments.Dec 10 2015, 2:56 PM
test/CodeGen/X86/fp128-calling-conv.ll
5

Is this variable used?

8

Is it used?

11

Are the two parts swapped? GCC seems to generates: 3FFF0000000000000000000000000000

test/CodeGen/X86/fp128-cast.ll
12

TestFPExtF32_F128

25

TestFPExt ..

38

Missing a test for conversion to unsigned I32

126

might be better to relax the check a little : testq %rax, %rax should be fine too.

135

Can you simplify the variable names ?

166

There is no guarantee adcq will be after movq ...

test/CodeGen/X86/fp128-compare.ll
23

missing check of 'set<cc>'

33

is this correct?

45

Missing check

test/CodeGen/X86/fp128-i128.ll
4

Better comment?

92

var names can be cleaned up to be shorter.

125

seems irrelevant.

137

Seems irrelevant.

205

Can this test case be simplified more?

chh marked 7 inline comments as done.Dec 10 2015, 7:42 PM
chh added inline comments.
test/CodeGen/X86/fp128-calling-conv.ll
5

No, Removed now.

8

No. Removed now.

11

That looked strange but correct. I copied that from clang's dump, and the llvm output assembly code is the same as gcc's. http://llvm.org/docs/LangRef.html says big-endian is used for hexadecimal floating point constants.

test/CodeGen/X86/fp128-cast.ll
38

Added conversion to uint32 and uint64.

135

These were generated by clang for my simplified C code from libm.
They are useful to show the clang transformations.
I will add the C code example as comments.

166

Okay, relaxed the check patterns.

test/CodeGen/X86/fp128-compare.ll
33

Yes. It's a strange optimization, which returns 1 if %cmp is negative as __lttf2 will return when %d1 < %d2.

test/CodeGen/X86/fp128-i128.ll
92

These IL were copied from libm compiled code. Clang has its way to convert C union structure references. I will add original C code as comment.

125

Okay, removed the i64 test.

137

We need to test i128 too, since this patch put also i128 into the FR128 register class. The i128 instruction was generated from f128 C code.

205

This was from libm C code, which triggered one error related to f128 without a complete patch. So I added it. I tried to reduce the original C code but then that won't trigger the problem.

chh updated this revision to Diff 42494.Dec 10 2015, 7:42 PM
chh edited edge metadata.
chh marked an inline comment as done.
davidxl added inline comments.Dec 11 2015, 10:59 AM
test/CodeGen/X86/fp128-cast.ll
137

Why do we care what transformations have been done to get the IR? The IR code should by itself readable -- so while the C example is useful, I still prefer the naming in IR simplified.

test/CodeGen/X86/fp128-compare.ll
35

ok. But it is possible with test + sets, right? may be adding a comment so that people know how to fix the test if it breaks in the future?

test/CodeGen/X86/fp128-i128.ll
8

__float128

24

long double --> __float128?

45

The pattern checked is pretty long -- I worry it may break in the future. Is it possible to relax it some how?

58

long double --> __float128

82

long double --> __float128

107

long double --> __float128

chh added inline comments.Dec 11 2015, 3:38 PM
test/CodeGen/X86/fp128-cast.ll
137

The comment now seems to be showing at wrong place in this code diff.
The biggest confusing name is u.sroa.0.4.extract.shift in TestBits128.
So I will shorten long names in this function for now.

The C code is important for anyone in the future to test, if not having time to rebuild and test the whole AOSP with libm.
Any simplification of the IR might work, but clang could generate different bit operators for those long double union types and trigger problems with any ad-hoc f128 optimization.

test/CodeGen/X86/fp128-compare.ll
35

Sure, added comment.
If it is broken in the future, maybe it would be easier to continue using this trick. :-)

test/CodeGen/X86/fp128-i128.ll
8

Unfortunately, clang does not accept __float128 keyword, although it can emit f128 for llvm.

24

no __float128 in clang.

45

I tried to reduce the C code, but any reduction won't trigger the complicated IL that reached a point that my partial fix core dumped. Maybe we can take out a few CHECK-NEXT requirements.

On the other hard, I was terrified by so many ad-hoc optimizations of floating points for the usage patterns in libm.
I guess llvm tried to match or do better then gcc and libm tried to use every possible bit operations.
So maybe it is better for Android or anyone depends on f128 type to have more check rules here.
Whoever changes float optimization in the future has better fully understand and update these tests.

chh updated this revision to Diff 42595.Dec 11 2015, 3:39 PM
davidxl added inline comments.Dec 11 2015, 3:49 PM
test/CodeGen/X86/fp128-i128.ll
9

This should be fixed in clang FE. By default, long double is extended FP, not quadFP --- so do fix the comment to avoid confusion.

chh updated this revision to Diff 42634.Dec 11 2015, 10:57 PM
chh marked an inline comment as done.
chh added inline comments.
test/CodeGen/X86/fp128-i128.ll
9

Used __float128 here and added more comments at the top of file.

davidxl accepted this revision.Dec 12 2015, 7:04 PM
davidxl edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Dec 12 2015, 7:04 PM
chh updated this revision to Diff 42766.Dec 14 2015, 1:51 PM
chh edited edge metadata.

Fix infinite loop in SelectionDAGBuilder.cpp, caught by new regression tests in this patch.
It happens only with new f128 type, whose VT == TLI.getTypeToTransformTo(Ctx, VT).

chh added a comment.Dec 14 2015, 1:54 PM

David, thanks for the review suggestions and approval.
There is one recent regression caught by two of my unit tests,
so I need to fix it too in the updated diff.

This revision was automatically updated to reflect the committed changes.