This is an archive of the discontinued LLVM Phabricator instance.

clang: Replace implementation of __builtin_isnormal
AbandonedPublic

Authored by arsenm on Dec 19 2022, 5:00 AM.

Details

Summary

This was doing an explicit non-canonical isnan check, then two
unordered comparisons. We can fold the ordered check into an ordered
compare. I recently fixed InstCombine to fold this pattern, but might
as well give it less work to do.

https://alive2.llvm.org/ce/z/ZWVHhT

Diff Detail

Event Timeline

arsenm created this revision.Dec 19 2022, 5:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 19 2022, 5:00 AM
arsenm requested review of this revision.Dec 19 2022, 5:00 AM
foad added inline comments.Dec 19 2022, 5:31 AM
clang/lib/CodeGen/CGBuiltin.cpp
3308

Why not make both compares ordered and write this as just fabs(x) < infinity && fabs(x) >= float_min? That seems conceptually simpler - i.e. it makes the comment easier to understand and the IR no worse.

This change can have negative consequences in some cases. Some targets have dedicated instruction to test FP class and often this instruction is faster than arithmetic operations. Replacement of one operation with two arithmetic and two logic plus cost of FP constant materialization does not provide any visible benefit.

One of advantages that llvm.is_fpclass provides is the possibility for a target to use FCLASS instruction if it is available. Besides, llvm.is_fpclass incapsulates semantics of a class check, which is not identical to compare operations. Keeping the class checks in IR can facilitate FP-specific optimizations which provide different handing depending on possible FP classes. Fast math is one example, denormals is another.

I would propose to generate llvm.is_fpclass for all classification intrinsics (see D137811). Targets then may lower the intrinsic in a way most suitable way depending on ISA.

This change can have negative consequences in some cases. Some targets have dedicated instruction to test FP class and often this instruction is faster than arithmetic operations. Replacement of one operation with two arithmetic and two logic plus cost of FP constant materialization does not provide any visible benefit.

Not sure what you're talking about, there's no class call here. Besides, optimal codegen is not clang's concern

One of advantages that llvm.is_fpclass provides is the possibility for a target to use FCLASS instruction if it is available. Besides, llvm.is_fpclass incapsulates semantics of a class check, which is not identical to compare operations. Keeping the class checks in IR can facilitate FP-specific optimizations which provide different handing depending on possible FP classes. Fast math is one example, denormals is another.

It's almost but not quite the same. The middle end can swap back to fcmp as long as we don't have FP exceptions to worry about (in the denormal case we can reform compares against 0/smallest_normal depending on the known denormal mode). I have patches ready to swap between all of these cases

I would propose to generate llvm.is_fpclass for all classification intrinsics (see D137811). Targets then may lower the intrinsic in a way most suitable way depending on ISA.

Right, I think clang emitting is.fpclass and letting the middle and and backend fold back into fcmp as legal is a reasonable strategy. In cases surrounding denormals we just need to know the floating point mode. The possible downside to always emitting class upfront in clang is you immediately lose fast math flags (I'm not entirely sure having fast math flags on fcmp is a good idea though)

As mentioned in another review, using fcmp to represent isnormal in IR as well as any other classification function would result in incorrect code in strict mode. The source:

inline __attribute__((always_inline)) _Bool func(float x) {
  return __builtin_isnormal(x);
}

_Bool check(float x) {
#pragma STDC FENV_ACCESS ON
  return func(x);
}

in this case would produce code that uses FP comparisons to implement isnormal (in the form of constrained intrinsics), this is incorrect. Using correct representation is impossible in this case, as the information about original check is already lost.

arsenm abandoned this revision.Jun 22 2023, 10:39 AM

Obsoleted by D112932