Generate outline atomics if compiling for armv8-a non-LSE AArch64 Linux
(including Android) targets to use LSE instructions, if they are available,
at runtime. Library support is checked by clang driver which doesn't enable
outline atomics if no proper libraries (libgcc >= 9.3.1 or compiler-rt) found.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
As it stands this needs to be a platform-level change. The compiler-rt stuff may now compile outside Linux (thanks!), but the capability detection is still fundamentally Linux (and even glibc) only so nothing else would use it as intended.
I'd be wary of making it universal even if that's fixed though, unless it turns out that the code-size improvement outweighs the extra operations.
Speaking of which, do you have any benchmark numbers?
Also, even on Linux it seems Clang is inclined to link against libgcc rather than libclang by default (at least that's what my Debian one does). When did these functions get into libgcc? I'm worried that a significant proportion of users there will find themselves having to add extra command-line options (whether disabling this or forcing libclang) to produce binaries.
Outline atomics were added with gcc 9.3.1 and turned on by default in gcc 10.1. Consequently most of distributions had libgcc with outline atomics already.
Besides Linux, Android will utilize them as well. Don't know about iOs and MacOs, I guess they are compiling with LSE enabled, so outline atomics should not affect them.
As for benchmarks, I rely on investigations completed during gcc outline atomics enablement work.
Choice of solution ../gcc/libgcc/config/aarch64/lse.S:
The problem that we are trying to solve is operating system deployment of ARMv8.1-Atomics, also known as Large System Exensions (LSE). There are a number of potential solutions for this problem which have been proposed and rejected for various reasons. To recap: (1) Multiple builds. The dynamic linker will examine /lib64/atomics/ if HWCAP_ATOMICS is set, allowing entire libraries to be overwritten. However, not all Linux distributions are happy with multiple builds, and anyway it has no effect on main applications. (2) IFUNC. We could put these functions into libgcc_s.so, and have a single copy of each function for all DSOs. However, ARM is concerned that the branch-to-indirect-branch that is implied by using a PLT, as required by IFUNC, is too much overhead for smaller cpus. (3) Statically predicted direct branches. This is the approach that is taken here. These functions are linked into every DSO that uses them. All of the symbols are hidden, so that the functions are called via a direct branch. The choice of LSE vs non-LSE is done via one byte load followed by a well-predicted direct branch. The functions are compiled separately to minimize code size.
Performance impact:
Various members in the Arm ecosystem have measured the performance impact of this indirection on a diverse set of systems and we were happy to find out that it was minimal compared to the benefit of using the LSE instructions for better scalability at large core counts.
Outline atomics were added with gcc 9.3.1 and turned on by default in gcc 10.1. Consequently most of distributions had libgcc with outline atomics already.
I think that works for people who use the packages that come with their distro (if they have post-now Clang they ought to have that GCC available). But we release binaries for an LTS Linux distribution too (Ubuntu 16.04 currently) and that only has 5.3.1.
I think this ought to be a Clang patch that detects which platform and libgcc it's targeting before adding the attribute.
Don't know about iOs and MacOs, I guess they are compiling with LSE enabled, so outline atomics should not affect them.
Most iOS compilation still targets baseline ARMv8.0 so doesn't use LSE. The latest OS still supports phones that lack the instructions, and we'll allow back-deployment of user apps even after that stops being true.
I think this ought to be a Clang patch that detects which platform and libgcc it's targeting before adding the attribute.
Do you mean implementing in Clang Driver something like
Detect runtime library used: if ToolChain::RLT_CompilerRT leave outline atomics enabled if ToolChain::RLT_Libgcc run "gcc -dumpfullversion" to get version and disable outline atomics if version < 9.3.1 else disable outline atomics
?
Most iOS compilation still targets baseline ARMv8.0 so doesn't use LSE. The latest OS still supports phones that lack the instructions, and we'll allow back-deployment of user apps even after that stops being true.
Nice! So iOS will benefit outline atomics too.
Forgot to mention, some outline atomics benchmarks results (compiler-rt and libgcc) were posted here: https://reviews.llvm.org/D91156 https://reviews.llvm.org/D91157
Nice! So iOS will benefit outline atomics too.
It potentially could if someone ported the compiler-rt feature detection code, but that has to be done on a platform-by-platform basis and should be a prerequisite for enabling the feature there. As it stands iOS will always use the fallback path because the check is based on a glibc function.
You said Android is planning to adopt this. Do you know how? I thought they had their own libc; does it also implement getauxval, or do they have a separate compiler-rt, or something entirely different?
Do you mean implementing in Clang Driver something like
Detect runtime library used: if ToolChain::RLT_CompilerRT leave outline atomics enabled if ToolChain::RLT_Libgcc run "gcc -dumpfullversion" to get version and disable outline atomics if version < 9.3.1 else disable outline atomics
There would need to be more checks in the CompilerRT case because most platforms won't benefit. There's already Clang logic to discover GCC installations (search for GCCInstallationDetector); I hope you'll be able to use that instead of trying to redo it.
You said Android is planning to adopt this. Do you know how? I thought they had their own libc; does it also implement getauxval, or do they have a separate compiler-rt, or something entirely different?
Android uses compiler-rt and has getauxval support: https://android.googlesource.com/platform/bionic/+/master/libc/bionic/getauxval.cpp
There would need to be more checks in the CompilerRT case because most platforms won't benefit. There's already Clang logic to discover GCC installations (search for GCCInstallationDetector); I hope you'll be able to use that instead of trying to redo it.
Thanks for the tip, I'll try to utilize it.
clang/include/clang/Driver/ToolChain.h | ||
---|---|---|
460 ↗ | (On Diff #315613) | This is a pretty niche feature, so it might be better to spell out the full name. Maybe even throw "AArch64" in there so more people know they can ignore it. |
clang/lib/Driver/ToolChains/Linux.cpp | ||
840 ↗ | (On Diff #315613) | Can this be an assertion? How does a non-Linux triple end up in Linux::IsOADefault? Also, I think the isAndroid check is redundant: Android is Linux (both in reality and in valid llvm::Triples). |
llvm/lib/Target/AArch64/AArch64.td | ||
1062 | I think this still enables it more widely than we want. Clang overrides it with -outline-atomics, but other front-ends don't. |
llvm/lib/Target/AArch64/AArch64.td | ||
---|---|---|
1062 | Could I ask you to clarify what front-ends you meant (to check outline atomics suport for them)? |
llvm/lib/Target/AArch64/AArch64.td | ||
---|---|---|
1062 | Any front-end that generates LLVM IR. Generally important ones are Swift and Rust, but there are many more and I kind of doubt it's feasible to ensure they all will. It's unfortunate, and maybe at some point we can put logic in LLVM to approximate Clang's decision-making and get the benefit without any opt-in, but I kind of think it's a bit early yet. We'd risk breaking too many people's builds. Also, putting it in the generic CPU model here means that if a front-end allows something like -mcpu=cortex-a57 it will suddenly disable outlined atomics. Whether they're good or bad, that's just plain weird behaviour. A CPU-level feature isn't the right place for this. |
llvm/lib/Target/AArch64/AArch64.td | ||
---|---|---|
1062 | Not necessarily arguing in either direction for where these should be enabled, but is that behaviour really so weird? I would expect that we want to specialise to inline atomics as soon as we're given architecture permission (-march=armv8.1-a+lse or greater) or CPU-directed specialisation. These things are only useful for the transition case where your platform mandates Armv8.0 only binaries but you're expecting to also run fast on cores with LSE support. What we're nervous about is burying this option, which we believe to be generally useful for the Armv8.0 platforms, such that it is opt-in. It really isn't the end of the world if this is C/C++ on LInux/Android only for now, but it feels like missed opportunity. That said, it is safe, so probably makes sense to go for that now Pavel and think again about where else we want it on. |
@t.p.northover sorry for pinging you, do you have any concerns about latest patch? It enables outline atomics by default on Aarch64/Linux/Clang with libraries check. I am hoping to get this in before LLVM 12 branch.
clang/lib/Driver/ToolChains/Clang.cpp | ||
---|---|---|
6448 ↗ | (On Diff #317849) | This needs a Clang driver tests. |
Thanks!
Is there a way we can test -rtlib=libgcc?
clang/lib/Driver/ToolChains/Linux.cpp | ||
---|---|---|
855 ↗ | (On Diff #319693) | Nit: if we change this into: if (GCCInstallation.getVersion().isOlderThan(9, 3, 1)) we can get rid of the curly brackets. |
857 ↗ | (On Diff #319693) | This means we need one more test and RUN line with the rtlib not being compiler-rt or libgcc. |
Sorry one last question: how about -mno-outline-atomics in combination with this? Think we need to add a test for that too?
Thanks, LGTM too.
One nit inlined that can be addressed before committing.
clang/test/Driver/aarch64-features.c | ||
---|---|---|
47 ↗ | (On Diff #320123) | Probably better to just check for the OFF case: "-target-feature" "-outline-atomics" |
I think this still enables it more widely than we want. Clang overrides it with -outline-atomics, but other front-ends don't.