Page MenuHomePhabricator

[X86] Move x86_64 fp128 conversion to libcalls from type legalization to DAG legalization
ClosedPublic

Authored by craig.topper on Tue, Sep 3, 2:53 PM.

Details

Summary

fp128 is considered a legal type for a register, but has almost no legal operations so everything needs to be converted to a libcall. Previously this was implemented by tricking type legalization into softening the operations with various checks for "is legal in hardware register" to change the behavior to still use f128 as the resulting type instead of converting to i128.

This patch abandons this approach and instead moves the libcall conversions to LegalizeDAG. This is the approach taken by AArch64 where they also have a legal fp128 type, but no legal operations. I think this is more in spirit with how SelectionDAG's phases are supposed to work.

I had to make some hacks for STRICT_FP_ROUND because some of the strict FP handling checks if ISD::FP_ROUND is Legal for a given result type, but I had to make ISD::FP_ROUND Custom to allow making a libcall when the input is f128. For all other types the Custom handler just returns the original node. These hacks are incomplete and don't work for a strict truncate from f128, but I don't think it worked before either since LegalizeFloatTypes doesn't know about strict ops yet. I've also raised PR43209 against AArch64 which currently crashes on a strict ftrunc from f64->f32 because of FP_ROUND being marked Custom for the same reason there.

I wouldn't be surprised if I'm missing some operation handling here, so @chh if you can test this on real code that would be appreciated.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Tue, Sep 3, 2:53 PM
craig.topper marked an inline comment as done.Tue, Sep 3, 2:56 PM
craig.topper added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
666 ↗(On Diff #218536)

I suspect we can use Expand here and let LegalizeDAG form libcalls for FADD/FSUB/FMUL/FDIV, but what I have here is what AArch64 does. I think LegalizeDAG didn't handle those operations for libcalls at the time the AArch64 code was written. I went conservative for X86 here because LegalizeDAG can form tail calls to the library which legalize types didn't do and my custom handler doesn't either. This avoided some test divergences.

Only mark FP_ROUND as Custom for legal types. Not sure if we can end up getting called by the type legalizer otherwise.

chh added a comment.Wed, Sep 4, 11:22 AM

I haven't read all code details yet; just some quick questions and comments.

Are the 3 test cases the only ones changed?
I know that we don't have very good coverage.
The ultimate test will be compatibility with Android x86_64 target,
which has some unique ABI not the same as AArch64 and normal x86_64.

Among those 3 changed unit tests, 2 seemed getting longer output code.
Could they be improved to be as good as before?

I'm surprised that no change is needed to LegalizeFloatTypes.cpp.
On the other hand, please keep this change as small as possible.
In the worst case, if we find any regression, we will revert it in the Android release.

llvm/lib/Target/X86/X86ISelLowering.cpp
626 ↗(On Diff #218568)

This comment is incorrect now. f128 still should be stored in SSE when available.

682–693 ↗(On Diff #218568)

Are the changes to f32, f64, f80 dependent to this f128 change?
If they are, please add some comment.

chh added subscribers: srhines, enh.Wed, Sep 4, 11:22 AM

Improve comments.

craig.topper marked 2 inline comments as done.Wed, Sep 4, 11:59 AM
In D67128#1658096, @chh wrote:

I haven't read all code details yet; just some quick questions and comments.

Are the 3 test cases the only ones changed?

Yes these are the only test changes. I did have issues with other tests failing while I was developing this, but this is where we ended up at the end.

I know that we don't have very good coverage.
The ultimate test will be compatibility with Android x86_64 target,
which has some unique ABI not the same as AArch64 and normal x86_64.

Among those 3 changed unit tests, 2 seemed getting longer output code.
Could they be improved to be as good as before?

I think with a little bit of work they could be improved. The one in fp128-compare got longer because originally we lowered to __lttf2 and ISD::SETCC. We have a DAG combine that was able to optimize the zext and that setcc to the shr. But now we never create the ISD::SETCC and go directly to a target specific X86ISD::CMP and X86ISD::SETCC. We don't have a DAG combine to optimize that. In the usual case that the rseult of the fcmp is being used by a branch or select we probably still generate the same code. The zext is what is special here.

For the fp128-i128.ll case it looks like X86TargetLowering::reduceSelectOfFPConstantLoads needs to know that the FP compare is for an f128 type and not f32/f64. The interface doesn't allow for that currently, but could easily be fixed in a follow up.

I'm surprised that no change is needed to LegalizeFloatTypes.cpp.
On the other hand, please keep this change as small as possible.
In the worst case, if we find any regression, we will revert it in the Android release.

I think a bunch of changes can be removed from LegalizeFloatTypes.cpp if this patch goes in.

llvm/lib/Target/X86/X86ISelLowering.cpp
626 ↗(On Diff #218568)

I've removed the reference to "long double" here. That's a C/C++ source language type which has different sizes on different targets. Better to be agnostic here and just talk in IR types.

llvm/test/CodeGen/X86/fp128-compare.ll
62 ↗(On Diff #218761)

I need to remove this comment until we go back to shr.

Replace test comment after shrl with a FIXME

chh added a comment.Fri, Sep 6, 11:02 AM

Craig, I tested your patch with some AOSP libm code.

The good news is that we found several codegen fatal errors
and they might be the same pattern. So I attached only one
preprocessed file, its compilation command, and the error dump.
You only need to replace the CLANG variable to your local clang
compiler and compare the output. Please let me know if you have
trouble reproducing it or need more test cases.

The bad news is that I need to modify AOSP compilation flags,
to suppress new warnings from the latest clang compiler.
I haven't been able to compile all AOSP source code yet.
When you have a fix for this bug, I will try again your new patch.

The libm code has some very tricky f128 and i128 conversion code.
In the past, I usually found and fixed failures one after another.
I added unit tests from those found cases, but still we don't cover
all libm code patterns. Please add unit tests for the cases you found too.

We also need to check if output code size increases with this patch.
I compared with gcc f128 output before, and it sometimes take extra
llvm codegen tricks to generate comparable code.

Thanks.

Expand FSQRT nodes. A test case for this was pre-commited in r371283. The codegen does not change with this patch from what we had before.

chh added a comment.Tue, Sep 10, 10:50 AM

I applied this patch to current Android clang/llvm toolchain to build clang compiler.
Not all toolchain was built due to dependencies on other clang/llvm changes not picked into Android yet.
We will known only later when Android toolchain is updated with this and all dependent changes
whether f128 type still works on Android x86_64 devices.
As long as we keep this and future dependent changes small, we should be able to manage any regression found later.

So far new clang compiler built with this change can compile all Android code, especially the libm.
I didn't run on device test due to other missing components. The output code of libm look good.
Some .o files have increased size but overall libm.a and libm.so size difference is negligible, less than 0.1%.

I added some trivial inline comments.
I think this change is good to submit after those issues are addressed.

llvm/lib/Target/X86/X86ISelLowering.cpp
687 ↗(On Diff #219283)

s/ff128/f128/

18694 ↗(On Diff #219283)

Is the change to SrcVT at line 18694 to fix some other bug or an error?
Before this change, it was Op.getSimpleValueType() or DstVT.
Why is it changed to SrcVT?

19520 ↗(On Diff #219283)

Coding style comment.
Shouldn't it be

RTLIB::Libcall LC = RTLIB::getFPEXT(SVT, VT);
craig.topper marked 2 inline comments as done.Tue, Sep 10, 2:04 PM
craig.topper added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
18694 ↗(On Diff #219283)

Looks like a mistake on my part. I didn't mean to change it. If DstVT is a vector, SrcVT will also be a vector and vice versa so it doesn't matter. But I'll change it back to DstVT.

19520 ↗(On Diff #219283)

I copied part of this from somewhere else that had an if after the declaration. Looks like I failed to merge the two lines together the way I intended

Fix review comments

chh accepted this revision.Wed, Sep 11, 10:14 AM
This revision is now accepted and ready to land.Wed, Sep 11, 10:14 AM
This revision was automatically updated to reflect the committed changes.