This is an archive of the discontinued LLVM Phabricator instance.

[builtins] Define fmax and scalbn inline
ClosedPublic

Authored by rprichard on Nov 19 2020, 9:17 PM.

Details

Summary

Define inline versions of compiler_rt_fmax* and compiler_rt_scalbn*
rather than depend on the versions in libm. As with
__compiler_rt_logbn*, these functions are only defined for single,
double, and quad precision (binary128).

Fixes PR32279 for targets using only these FP formats (e.g. Android
on arm/arm64/x86/x86_64).

For single and double precision, on AArch64, use __builtin_fmax[f]
instead of the new inline function, because the builtin expands to the
AArch64 fmaxnm instruction.

Diff Detail

Event Timeline

rprichard created this revision.Nov 19 2020, 9:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 19 2020, 9:17 PM
Herald added subscribers: Restricted Project, pengfei, kbarton and 2 others. · View Herald Transcript
rprichard requested review of this revision.Nov 19 2020, 9:17 PM

With this change, I think PR32279 is still an issue with non-standard FP formats, like the x86 80-bit extended-precision format. In principle, it seems straightforward to extend the __compiler_rt_* inline functions to handle that format.

compiler-rt/test/builtins/Unit/compiler_rt_scalbn_test.c
35

FWIW, msvc added support for C99/C++11 hex float literals somewhere between 19.10 and 19.14: https://godbolt.org/z/fchz37. I'm not sure it's OK to take this dependency, but it's also used only in a test.

The C11/C++11 {FLT,DBL,LDBL}_TRUE_MIN macros are defined in msvc 19.00.23506 and up, though (verified with rextester.com).

Switch scalbn calls in ppc/divtc3.c to __compiler_rt_scalbn. I verified that, after doing so, libclang_rt.builtins-powerpc64le.a has no unresolved symbols for fmax*/logb*/scalbn*.

The existing __compiler_rt_logb* inline functions were also added to address PR32279, in D49514. On Android, all of fmax*/logb*/scalbn* are in libm, not libc. I commented more on the bug: https://bugs.llvm.org/show_bug.cgi?id=32279#c7.

I wrote D49514 a long time ago, and since then, llvm-libc was created -- and it looks like it implements fmax, though not scalbn. Anyway, I wonder if it would be possible to use the llvm-libc implementation of fmax instead of adding one here? And similarly, for scalbn, maybe it would be easier/better long term to add one to libc instead of here.

Siva: are there docs anywhere saying what the status of each libc function is? Which functions have been added, are mature enough to actually use, etc.?

Siva: are there docs anywhere saying what the status of each libc function is? Which functions have been added, are mature enough to actually use, etc.?

What is available can be found here: https://github.com/llvm/llvm-project/tree/master/libc/src/math
Specifically, fmax and friends are available. While scalbn is not available, ldexp which is equivalent to is scalbn for radix 2 systems, is available.

I do not know much about the compiler_rt use cases. But, it seems to me that you want to avoid dependence on a libc. So, what exactly do you mean by pulling the implementations from LLVM libc? Do you mean to suggest that compiler_rt should use [parts of] LLVM libc as a library of math functions? If yes, then that is possible with the basic functions: https://github.com/llvm/llvm-project/tree/master/libc/utils/FPUtil

FPUtil went through few iterations so it is not as clean as we would like it to be. But, all the functions defined in there are template functions which handle float, double and long double (even for x86_64). So, for the callers, it is just a header library one can call into. No link time dependency.

I don't think that we can use llvm-libc, but I suppose that lifting the functionality from there is possible. What does that gain over this though?

compiler-rt/lib/builtins/divdc3.c
23–24

If you want to change the prefix (which I think is probably a good idea), please do so in a separate (preliminary) change for crt_fmax, crt_scalbn, and crt_fabs.

The __compiler_rt_{logb,scalbn,fmax}* functions don't work with x87 extended-precision FP, and I think FPUtil would handle that. Perhaps the existing inline functions could be extended to support x87, though.

Is there an ABI issue with using FPUtil headers from the builtins? e.g. The compiler is allowed to output an out-of-line version of C++ inline functions, so if a builtins archive and an LLVM libc.a were built from different versions of LLVM, could we have incompatible versions of (say) NormalFloat's methods between the two archives? Maybe FPUtil could be in an anonymous namespace.

I think FPBits is assuming a GCC/Clang-like compiler, and the builtins have some support for MSVC. e.g. FPBits is using __attribute__((packed)) and appears to use bitfields to match an FP type's layout. The bitfields have different base types, so I think the layout with MSVC won't work as expected. (But MSVC should be OK if the base types are the same for all fields, I think.)

FPUtil logb has this code:

if (bits.isZero()) {
  // TODO(Floating point exception): Raise div-by-zero exception.
  // TODO(errno): POSIX requires setting errno to ERANGE.
  return FPBits<T>::negInf();
} else if (bits.isNaN()) {

I think the builtins are not supposed to set errno. (But they currently can: https://bugs.llvm.org/show_bug.cgi?id=32279#c8)

The builtins currently don't have any C++ code, AFAIK. I don't know that this matters, though.

I also noticed that __llvm_libc::fputil::ldexp appears to ignore the rounding mode (fesetround), whereas the ldexp/scalbn functions in glibc, musl, and bionic all respect the rounding mode. I don't know whether the compiler-rt builtins (e.g. for complex division) currently have the correct rounding-mode behavior -- for __compiler_rt_scalbn*, I was just preserving the behavior I saw in scalbn.

compiler-rt/lib/builtins/divdc3.c
23–24

I haven't renamed the functions. My change adds __compiler_rt_scalbn* and __compiler_rt_fmax* as functions in fp_lib.h, but it doesn't remove the existing crt_scalbn* and crt_fmax* functions from int_math.h.

The long double crt_logbl, crt_scalbnl, and crt_fmaxl functions are still used:

  • As the fallback for non-binary128 long double FP in fp_lib.h. (I'm not sure that the __compiler_rt_*l functions are actually called for a non-binary128 QUAD_PRECISION mode, though...)
  • In divxc3.c (x87 80-bit long double complex division)

I noticed that D49514 removed the non-long-double crt_* functions it replaced, so I suppose I should also do that.

rprichard updated this revision to Diff 307236.Nov 23 2020, 9:03 PM

Remove obsoleted crt_fmax[f] and crt_scalbn[f] functions.

sivachandra added a comment.EditedNov 23 2020, 9:55 PM

The __compiler_rt_{logb,scalbn,fmax}* functions don't work with x87 extended-precision FP, and I think FPUtil would handle that. Perhaps the existing inline functions could be extended to support x87, though.

Is there an ABI issue with using FPUtil headers from the builtins? e.g. The compiler is allowed to output an out-of-line version of C++ inline functions, so if a builtins archive and an LLVM libc.a were built from different versions of LLVM, could we have incompatible versions of (say) NormalFloat's methods between the two archives? Maybe FPUtil could be in an anonymous namespace.

I think FPBits is assuming a GCC/Clang-like compiler, and the builtins have some support for MSVC. e.g. FPBits is using __attribute__((packed)) and appears to use bitfields to match an FP type's layout. The bitfields have different base types, so I think the layout with MSVC won't work as expected. (But MSVC should be OK if the base types are the same for all fields, I think.)

If one wants cross-platform code out of the box, LLVM libc in general is not yet ready for it. We are working on it, and if everything goes well, end of Q1/early Q2 2021 is when I would think we will see LLVM libc slowly coming up on Windows.

I also noticed that __llvm_libc::fputil::ldexp appears to ignore the rounding mode (fesetround), whereas the ldexp/scalbn functions in glibc, musl, and bionic all respect the rounding mode. I don't know whether the compiler-rt builtins (e.g. for complex division) currently have the correct rounding-mode behavior -- for __compiler_rt_scalbn*, I was just preserving the behavior I saw in scalbn.

Yes. We are working on the floating point exception and rounding mode story currently. If everything goes well, we expect to sort this out before the end of this year.

If the concerns you raised here are actually hard requirements, I would say FPUtil is not quite ready for your use case. But, if you do want to use FPUtil, I will be happy to prioritize your uses cases higher and get them out faster.

I don't think that we can use llvm-libc, but I suppose that lifting the functionality from there is possible. What does that gain over this though?

It lets you reuse an already written implementation instead of adding a (possibly buggy) implementation here. In general it's better to have one implementation of things than N implementations -- code is a liability, not an asset.

Anyway, it sounds like it isn't possible, at least not yet. Thanks all for at least entertaining my idea :)

I may not be the best reviewer for this, but I'll take a pass today.

LGTM, though it'd be good to have another set of eyes on this to approve it.

compiler-rt/lib/builtins/fp_lib.h
309–310

IIUC this also handles a = -0.0, so the comment should say // +/- 0.0, NaN, ...

346–347

For my own curiosity, C99 says on fmax:

361) Ideally, fmax would be sensitive to the sign of zero, for example fmax(−0. 0, +0. 0) would
return +0; however, implementation in software might be impractical.

If this were exported outside of compiler-rt, I'd consider asking for that to be changed, but it looks like it doesn't matter -- the only context it's used in compiler-rt is variants of "fmax(fabs(x), fabs(y))", so sign should never be a factor.

362–368

Normally, the way to go would be to override this in compiler-rt/lib/builtins/aarch64. I'm not sure that's an option here though.

rprichard added inline comments.Nov 30 2020, 8:33 PM
compiler-rt/lib/builtins/fp_lib.h
362–368

I could also leave the special case AArch64 out. Maybe it's faster to use fmaxnm, but it also might not be important. Clang manages to compile the expression into 3 FP instructions. GCC manages it with 5 instructions. https://godbolt.org/z/Yc9zoK.

This isn't so much an AArch64-specific version of fmax as a list of configurations where the compiler is expected to inline __builtin_fmax* rather than call a library function.

rprichard updated this revision to Diff 308522.Nov 30 2020, 8:43 PM

Rename new fmax/scalbn fp_lib.h params from a/b to x/y for consistency with the new tests and with the existing __compiler_rt_logb*.

rprichard updated this revision to Diff 308524.Nov 30 2020, 8:47 PM

Add +/- to __compiler_rt_scalbnX comment.

MaskRay added a subscriber: MaskRay.Dec 2 2020, 9:52 AM

The runtime libraries should have a proper layering. Generally compiler-rt builtins (libgcc_s.so.1) should not depend on libc. (There are use cases where people depend on compiler-rt builtins but not libc) abort is a known exception but we should generally avoid adding more exceptions.

Implementing the relevant functions called by divxc3 is one choice, another choice is to emulate libgcc - don't use scalbn.

Implementing the relevant functions called by divxc3 is one choice, another choice is to emulate libgcc - don't use scalbn.

The libgcc version goes about the computation in a slightly different way -- e.g. it computes an intermediate ratio of c / d or d / c (ensuring that the fabs(ratio) is at most 1.0) to avoid overflow, whereas the current compiler-rt version uses scalbn (and closely matches the reference implementation in Annex G of the C specification). The scalbn approach seems to be more accurate? e.g.:

  • From G.5.1 Multiplicative operators (paragraph 9):

Scaling the denominator alleviates the main overflow and underflow problem, which is more serious than for multiplication.
In the spirit of the multiplication example above, this code does not defend against overflow and underflow in the calculation
of the numerator. Scaling with the scalbn function, instead of with division, provides better roundoff characteristics.

/* ??? We can get better behavior from logarithmic scaling instead of
   the division.  But that would mean starting to link libgcc against
   libm.  We could implement something akin to ldexp/frexp as gcc builtins
   fairly easily...  */

I'm guessing we wouldn't want to reduce accuracy to break the libm dependency?

Adding a couple of other people as possible reviewers.

Implementing the relevant functions called by divxc3 is one choice, another choice is to emulate libgcc - don't use scalbn.

The libgcc version goes about the computation in a slightly different way -- e.g. it computes an intermediate ratio of c / d or d / c (ensuring that the fabs(ratio) is at most 1.0) to avoid overflow, whereas the current compiler-rt version uses scalbn (and closely matches the reference implementation in Annex G of the C specification). The scalbn approach seems to be more accurate? e.g.:

  • From G.5.1 Multiplicative operators (paragraph 9):

Scaling the denominator alleviates the main overflow and underflow problem, which is more serious than for multiplication.
In the spirit of the multiplication example above, this code does not defend against overflow and underflow in the calculation
of the numerator. Scaling with the scalbn function, instead of with division, provides better roundoff characteristics.

/* ??? We can get better behavior from logarithmic scaling instead of
   the division.  But that would mean starting to link libgcc against
   libm.  We could implement something akin to ldexp/frexp as gcc builtins
   fairly easily...  */

I'm guessing we wouldn't want to reduce accuracy to break the libm dependency?

@MaskRay do you still have concerns with the approach, or are you happy with the better accuracy? I think in practice most binaries will depend on libm anyway (iirc libc++ requires it already). If you're okay with this approach, mind LGTMing? :)

MaskRay accepted this revision.Feb 19 2021, 7:07 PM

LGTM.

This revision is now accepted and ready to land.Feb 19 2021, 7:07 PM
This revision was landed with ongoing or failed builds.Feb 24 2021, 2:33 PM
This revision was automatically updated to reflect the committed changes.

The new unit tests failed on sanitizer-windows. I reverted it for now. https://lab.llvm.org/buildbot/#builders/127/builds/6620

rprichard reopened this revision.Feb 26 2021, 4:26 AM
This revision is now accepted and ready to land.Feb 26 2021, 4:26 AM
rprichard updated this revision to Diff 326653.Feb 26 2021, 4:27 AM

Disable non-default-rounding-mode scalbn[f] tests when using the MSVC libraries.

MSVC's scalbn appears to ignore the current rounding mode, so the value returned from __compiler_rt_scalbn[f] disagreed with the value returned from scalbn[f]. I think it's sufficient to disable the non-default-rounding-mode tests for MSVC. With this change, the __divdc3 and __divsc3 builtins, when compiled with MSVC, should produce the same results as on Linux systems (or as with MinGW and the other Unix systems I tested).

These two builtins don't seem to have much use when LLVM is configured to use MSVC's libraries. In this configuration, complex.h doesn't define a complex macro for complex float and complex double. It's possible to use _Complex float and _Complex double directly, but they can't be passed to functions in the C library, because those functions accept MSVC's _Fcomplex and _Dcomplex struct types instead. By default, the builtins library isn't linked -- I must pass --rtlib=compiler-rt to enable it.

For reference, I wrote a test demonstrating some behavior I saw with MSVC's scalbn and ldexp functions: https://gist.github.com/rprichard/8a4e8c3edd73ffbceb187a9e8cb7e9ed#file-zzz-output-msvc-19-16-x64-txt

  • ldexp respected the current rounding mode on overflow, but scalbn didn't.
  • For rounding of subnormals:
    • scalbn rounded to nearest-even and didn't set FE_INEXACT or FE_UNDERFLOW.
    • ldexp rounded towards zero and set FE_INEXACT and FE_UNDERFLOW.
srhines accepted this revision.Feb 26 2021, 3:04 PM

+2 for relanding this.

This revision was landed with ongoing or failed builds.Feb 26 2021, 4:22 PM
This revision was automatically updated to reflect the committed changes.
zzheng added a subscriber: zzheng.Apr 9 2021, 4:28 PM

Hello,

When integrating this change to our downstream toolchain, the new tests for scalbn and scalbnf fails:

  • TEST 'Builtins-arm-linux :: compiler_rt_scalbn_test.c' FAILED ****

Exit Code: 1
Command Output (stdout):

error: [FE_UPWARD] in __compiler_rt_scalbn(-0x1p+0 [BFF0000000000000], 10000) = -0x1.fffffffffffffp+1023 [FFEFFFFFFFFFFFFF] != -inf [FFF0000000000000]

  • TEST 'Builtins-arm-linux :: compiler_rt_scalbnf_test.c' FAILED ****

Exit Code: 1
Command Output (stdout):

error: [FE_UPWARD] in __compiler_rt_scalbnf(-0x1p+0 [BF800000], 1000) = -0x1.fffffep+127 [FF7FFFFF] != -inf [FF800000]

It seems there's some inaccuracies, can someone provide some tip on how to debug these errors?

error: [FE_UPWARD] in __compiler_rt_scalbn(-0x1p+0 [BFF0000000000000], 10000) = -0x1.fffffffffffffp+1023 [FFEFFFFFFFFFFFFF] != -inf [FFF0000000000000]

It looks like compiler-rt's inlined __compiler_rt_scalbn is respecting the floating-point rounding mode (fesetround(FE_UPWARD)), but the libm scalbn is ignoring it instead. What libc/libm and (sub)architecture is this?

The rounding mode tests are already disabled on arm if __ARM_FP isn't defined, but maybe that's not sufficient.

What libc/libm and (sub)architecture is this?

ARM32 on linux, using -mfloat-abi=soft. We've tried libc/libm distributed with gcc (4.9.0 and 7.2 released by fsf, and 7.1.1 release by linaro).

We also tried -mfloat-abi=hard on gcc 10.2's libc/libm, released by ARM, which passed the tests. So.. these tests should be disabled when using -mfloat-abi=soft?

hmm... per GCC/ARM manual about -mfloat-abi:

-mfloat-abi=name

Specifies which floating-point ABI to use. Permissible values are: ‘soft’, ‘softfp’ and ‘hard’.

Specifying ‘soft’ causes GCC to generate output containing library calls for floating-point operations. **‘softfp’ allows the generation of code using hardware floating-point instructions, but still uses the soft-float calling conventions.** ‘hard’ allows generation of floating-point instructions and uses FPU-specific calling conventions.

We build compiler-rt and the these tests using 'softfp' by default in our downstream environment. We'll build & run these tests with 'soft' for now. Please let us know if this is the proper solution @rprichard.