- Rework the host runtime table so that it is constexpr to avoid having to construct it and to store/propagate it.
- Make the interface simpler (remove many templates and a file)
- Enable 16bits float folding using 32bits float host runtime
- Move StaticMultimapView into its own header to use it for host folding
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
There's a lot of work here and it's quite well done; thank you.
I suggest that you add meinersbur as a reviewer to check that this code doesn't run afoul of some MSVC bug.
flang/include/flang/Common/static-multimap-view.h | ||
---|---|---|
43 | The argument could be const std::array<V,N> &array. | |
48 | This could be a constexpr check in the constructor. | |
51 | We use GetRange naming in Common. Could GetRange be private? | |
86 | C++20 has constexpr versions of std::equal_range, std::lower_bound, std::binary_search, &c. If you could use them now, would this new class be necessary? If we will eventually use them, maybe what's needed here would be just a place-holding constexpr function or two, possibly conditional on the C++ language standard version. | |
flang/include/flang/Evaluate/common.h | ||
11–12 | Does removing this member from the folding context make them cheap to construct again? | |
flang/lib/Evaluate/fold-implementation.h | ||
88 | return std::nullopt; is a little more clear, I think. | |
flang/lib/Evaluate/intrinsics-library.cpp | ||
142 | Please capitalize the names of these functions; thanks. | |
flang/lib/Lower/IntrinsicCall.cpp | ||
507–508 | I know that this isn't part of this change, but maybe this argument should be a const std::array<> &. | |
flang/unittests/Evaluate/folding.cpp | ||
55–56 | I wonder if this constant should be checked; I worry about C++ compilation on a target machine that likes to flush subnormals to zero. Maybe a hexadecimal floating point constant should be used, or even a reinterpreted 0x00400000. (The value should be 5.877471754111438E-39 if you want a double constant that converts to and back from the desired value; the value you have above is slightly too large and converts to a double with a lower bit set that's lost in the conversion to float in most rounding modes, but why take chances.) |
Thanks for working on this, Jean. I have to confess that I don't understand most of the code. But it all built and tested correctly for me. It would be nice to have some constant folding tests that fold REAL(2) operands.
Where do things stand on 80 bit and 128 bit REALs?
- Simplify StaticMultimapView (use std::equal_range and removed unused member functions)
- Add REAL(2) and REAL(3) folding test
- Enable X87 float folding with IEEE 128 bits float runtime if available on the host.
- Use RealKindForPrecision instead of TypeOf
- Add a compile time verfication that StaticMultimapView is sorted
- Capitalize some functions. std::nullopt instead of {}.
- Use raw subnormal in flushing test instead of decimal input.
Remove static_assert from StaticMultimapView
The last update added a compile time assert that the array
is sorted before constructing a StaticMultimapView. It worked well with clang
but was actually most likely illegal C++ and failed with g++ since there is
no way to guarantee that the content of the array passed in template argument
is constexpr (only its address is guaranteed to be so). Clang++ seems to be
clever enough to propagate the constexpr aspect of the array address it gets
in template argument, but this is not standard C++.
Instead Verify() has to be called next to every instantiation of the map
where it is visible to the compiler that the map was built at compile time...
Also avoid the early return since constexpr functions should only have one return.
Thanks for the reviews, I have updated the patch where I could.
flang/include/flang/Common/static-multimap-view.h | ||
---|---|---|
43 | I do not know a way to do that without having the tables declared as std::array which I find annoying because you cannot nicely delegate the array length to the compiler (e.g. T x[]{...} vs std::array<T, /* need to compute/ guess that */> x{...}). | |
48 | Thanks, I tried hard to make that happen in the ctor/a static Create helper, but there is no way I could find to have it constexpr check that way since the constexpr aspect does not actually guarantee that this will be called with a constexpr array at compile time, so static_assert are rejected. | |
51 | GetRange was removed. | |
86 | What is mainly missing for me here is a constexpr std::is_storted (also C++20), as well as something like std::span (C++20) that (from what I understand) can "erase" the lenght from the base container type and still allow constexpr iteration on it. | |
flang/unittests/Evaluate/folding.cpp | ||
55–56 | Thanks for the tip, I used the raw evaluate::Real constructor to avoid any issue with C++ compilers here. |
Where do things stand on 80 bit and 128 bit REALs?
I have updated the patch to allow using IEEE 128bits to fold 80 bits (X87) floats when IEEE 128 bits is available but not X87.
However 80/128 bits folding that require host runtime is currently not expected to work on all host at all (hence I did not add tests yet).
The folding code is designed to take advantage of when 80/128bits can be safely folded using long double without assuming too much about the platform. It is also designed to allow platform where 80/128bits cannot be safely folded to still compile/run flang normally and only refuse to compile programs that require such host folding. More precisely:
- 80 bits folding works if the host long double is the X87 or IEEE 128bits float (for this last case we would convert 80bits to 128bits before folding).
- 128 bits folding works if the host long double is the IEEE 128 bits.
- When the host long double type does not allow folding 80bits/128bits, Fold() will return the original expressions, and a warning message (non fatal) saying " intrinsic acos(real(kind=16)) cannot be folded on host". If the expressions must be a constant, compilation should then failed complaining it did not get a constant after Fold() calls. Otherwise, compilation will continue normally (leaving the expression unfolded since we do not have to, but warning the user that we could not fold it due to host limitation).
Hosts that needs to fold 80bits/128bits will need to modify host.h to map some non standard types (e.g. __float128) in host.h provided the c++ standard math library works with them (or they will need to provide an ad-hoc runtime with it in intrinsics-library.cpp).
I tested that 80 bits folding works OK on X86-64 linux. I expect 80bits and 128bits folding to work on Aarch64, but did not test it. On Power8/9, I think that depends on how the OS is compiled there. By default long double maps to IBM double double format, but the OS/libc/gcc... can also be compiled to map long double to IEEE 128bits in which case folding 80bits and 128bits should work. As you see, all this is very dependent on the platform.
Note that this only applies to folding intrinsics for which there is no abstract implementation in include/flang/Evaluate/real.h and for which we have to resort to host runtime, otherwise all REALs can be folded regardless of the platform.
@Meinersbur, I have added you as a reviewer since this patch is touching a lot of template/constexpr features so it is likely it will hit MSVC errors. If you have the opportunity to test it ahead that would be great, otherwise beware that it may lead to MSVC regressions.
The patch compiles successfully with msvc (with a patch to trunk that I still need to upload a patch for).
The patch compiles successfully with msvc (with a patch to trunk that I still need to upload a patch for).
Thanks for testing this @Meinersbur !
flang/include/flang/Evaluate/common.h | ||
---|---|---|
11–12 | Yes, FoldingContext are 100 times cheaper to construct according to my measurements. This improves fcvs f18 -fparse-only time by on 12% on average. FoldingContext ctor 100x speedup Given FoldingContext are constructed for every function call check when an explicit interface that can translate in x4 speed-up one time f18 -fparse-only on carefully designed tests like: real, parameter :: x = 0.5 ! Each following line semantic analysis end-up in 3 FoldingContext ctor call real, parameter :: y1 = acos(x) real, parameter :: y2 = acos(x) ! ... repeated 9997 times real, parameter :: y10000 = acos(x) end I measured 2s before vs 0.5s with this patch (time f18 -fparse-only real time). Host folding 1.2x slowdown Conclusion: 12% overall parsing+semantics speed-up on real code |
The argument could be const std::array<V,N> &array.