This is an archive of the discontinued LLVM Phabricator instance.

[X86] Custom lowering of llvm.is_fpclass for x86_fp80
Needs ReviewPublic

Authored by sepavloff on Nov 8 2021, 9:07 AM.

Diff Detail

Event Timeline

sepavloff created this revision.Nov 8 2021, 9:07 AM
sepavloff requested review of this revision.Nov 8 2021, 9:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 8 2021, 9:07 AM

This patch was tested using runtime tests from https://reviews.llvm.org/D112933 and a patch for clang https://reviews.llvm.org/D112932.

pengfei added inline comments.Nov 8 2021, 6:52 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
721

Add f32/f64 too?

22707

Why only f80?

22762

Can we define it with a new name? This give an impression the check conditions can be permuted. But the code seems don't support it.

22765

ditto.

22769

Define it in FPSWFlag ?

22808

It's equal to assert at line 22751 and better there with message.

22848

This only works for fp80. We should check the type first.
Can we check exception #IA for f32 and f64?

llvm/test/CodeGen/X86/x86-is_fpclass-fp80.ll
9

Where's fucomp generated from? I didn't see it in the code.

21

Why fucompi?

sepavloff updated this revision to Diff 385847.Nov 9 2021, 9:04 AM

Address reviewer's notes.

sepavloff added inline comments.Nov 9 2021, 9:25 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
721

There is concern that FXAM can be slow (https://reviews.llvm.org/D104853#2839746). So it is used only for f80, where x87 anyway is used. For other types a default lowering is used, which use usual float operations or integer arithmetic to make the tests. The resulted code for x86 is presented in D112025 in the test.

22707

You are right, this check is not needed. Removed.

22762

Actually the checks can be permuted. I reformatted the comments, hopefully they make this code clearer.

22769

I would like but I cannot invent a good name for it. Added comment to describe why this constant is used.

22808

The assert only checked that for each CCToCompare there is corresponding ExpectedCC. There are not many cases here, it seems that this check is safe to remove.

22848

Fixed.

llvm/test/CodeGen/X86/x86-is_fpclass-fp80.ll
9

It appears as a result of the call DAG.getSetCC(DL, ResultVT, Arg, Arg, ISD::CondCode::SETUNE), which is called at the beginning of lowerIS_FPCLASS if exceptions are ignored.

21

It is also the result of getSetCC. This code must be generated for unordered comparison.

pengfei added inline comments.Nov 9 2021, 10:18 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
721

Given that and we still need to distinguish sNaN with qNaN. Why don't we use integer arithmetic for f80 too? We just need to check the high 32 bits for all classes expect fcZero and fcInf.

22704

This should also help for nan combined with other flags.

22712–22725

Can we assert the type only be f80?

22762

But the switch doesn't enumerate all possibilities, e.g., ISD::fcZero | ISD::fcInf. I don't think silently break for these cases are correct.

22848

I don't understand. If we only customize for f80. Why do we always add code for f32 and f64?

sepavloff updated this revision to Diff 386144.Nov 10 2021, 6:40 AM

Remove support of f32 and f64

sepavloff added inline comments.Nov 10 2021, 7:34 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
721

There is no i80, so doing integer arithmetic on f80 is complicated. Existing default lowering simply crashes on f80. Also for many classes, like fcZero, fcSubnormal, fcInf or fcSNan we need to analyze all three words of f80. One instruction FXAM does all the job, and seems faster even if its latency is high.

22704

It looks faster to execute FXAM and then analyze its result in integer pipeline than doing checks in x87 registers and move the results to integer registers for analysis.

22712–22725

Done just at the start of the function.

22762

This switch only processes some combinations that are simple to implement. For all other combinations the general case is applied.

This patch was tested using runtime tests from D112933. They contains all combinations of two basic tests and many other combinations.

22848

There was some hesitation whether using FXAM could be used for types other than f80 in some cases. Now support of these types is removed.

pengfei added inline comments.Nov 10 2021, 6:27 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
721

I mean we still customize f80 here, but enrich the code for checking S/QNAN to more classes by loading bits [79:48] rather than [63:32]. We can easily check whether the remainding 48 bits are zero for fcZero and fcInf.
Not only for performance, but also we should lower it when f80 is used without x87 enabled. Although it's rare of such case, it's supported by compiler.

sepavloff added inline comments.Nov 11 2021, 2:35 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
721

but also we should lower it when f80 is used without x87 enabled. Although it's rare of such case, it's supported by compiler.

How can this mode be activated? If I provide option -mno-x87, compiler fails in TargetLoweringBase::getRegClassFor in:

assert(RC && "This value type is not natively supported!");
pengfei added inline comments.Nov 11 2021, 6:13 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
721

Yeah, Clang may still have some problem to handle such cases, but GCC does support it. https://godbolt.org/z/aza81fKx4
We shouldn't make it worse :)

sepavloff added inline comments.Nov 11 2021, 8:29 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
721

As I understand, it is what -msoft-float should do but it doesn't in clang. Things are already bad :)

If FP operations in this case are implemented by library functions, it must be easier, more efficient and reliable to implement a special function for this intrinsic, rather than emulate the FP operations in the compiler.

sepavloff updated this revision to Diff 410288.Feb 21 2022, 5:45 AM

Updated patch because its dependency changed.

sepavloff added inline comments.Feb 21 2022, 5:52 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
721

Adding a new library function is not an option, because we cannot add it to libgcc. The base patch (D112025) changed to support fp80 as well. Functions with suffix "soft" in the test file is_fpclass-fp80.ll demonstrate lowering for this case when x87 unit is not available.