This was doing an explicit non-canonical isnan check, then two
unordered comparisons. We can fold the ordered check into an ordered
compare. I recently fixed InstCombine to fold this pattern, but might
as well give it less work to do.
Details
- Reviewers
bkramer foad sepavloff andrew.w.kaylor kpn
Diff Detail
Event Timeline
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
3308 | Why not make both compares ordered and write this as just fabs(x) < infinity && fabs(x) >= float_min? That seems conceptually simpler - i.e. it makes the comment easier to understand and the IR no worse. |
This change can have negative consequences in some cases. Some targets have dedicated instruction to test FP class and often this instruction is faster than arithmetic operations. Replacement of one operation with two arithmetic and two logic plus cost of FP constant materialization does not provide any visible benefit.
One of advantages that llvm.is_fpclass provides is the possibility for a target to use FCLASS instruction if it is available. Besides, llvm.is_fpclass incapsulates semantics of a class check, which is not identical to compare operations. Keeping the class checks in IR can facilitate FP-specific optimizations which provide different handing depending on possible FP classes. Fast math is one example, denormals is another.
I would propose to generate llvm.is_fpclass for all classification intrinsics (see D137811). Targets then may lower the intrinsic in a way most suitable way depending on ISA.
Not sure what you're talking about, there's no class call here. Besides, optimal codegen is not clang's concern
One of advantages that llvm.is_fpclass provides is the possibility for a target to use FCLASS instruction if it is available. Besides, llvm.is_fpclass incapsulates semantics of a class check, which is not identical to compare operations. Keeping the class checks in IR can facilitate FP-specific optimizations which provide different handing depending on possible FP classes. Fast math is one example, denormals is another.
It's almost but not quite the same. The middle end can swap back to fcmp as long as we don't have FP exceptions to worry about (in the denormal case we can reform compares against 0/smallest_normal depending on the known denormal mode). I have patches ready to swap between all of these cases
I would propose to generate llvm.is_fpclass for all classification intrinsics (see D137811). Targets then may lower the intrinsic in a way most suitable way depending on ISA.
Right, I think clang emitting is.fpclass and letting the middle and and backend fold back into fcmp as legal is a reasonable strategy. In cases surrounding denormals we just need to know the floating point mode. The possible downside to always emitting class upfront in clang is you immediately lose fast math flags (I'm not entirely sure having fast math flags on fcmp is a good idea though)
As mentioned in another review, using fcmp to represent isnormal in IR as well as any other classification function would result in incorrect code in strict mode. The source:
inline __attribute__((always_inline)) _Bool func(float x) { return __builtin_isnormal(x); } _Bool check(float x) { #pragma STDC FENV_ACCESS ON return func(x); }
in this case would produce code that uses FP comparisons to implement isnormal (in the form of constrained intrinsics), this is incorrect. Using correct representation is impossible in this case, as the information about original check is already lost.
Why not make both compares ordered and write this as just fabs(x) < infinity && fabs(x) >= float_min? That seems conceptually simpler - i.e. it makes the comment easier to understand and the IR no worse.