
[AArch64] Enable out-of-line atomics by default.
Needs Review · Public

Authored by ilinpv on Sat, Dec 19, 12:45 PM.

Details

Summary

Generate outline atomics when compiling for baseline armv8-a targets
so that LSE instructions are used if they are available at runtime.
Toolchain support for outline atomics is checked by the clang driver, which
disables them for targets that don't have it yet.

Event Timeline

ilinpv created this revision. Sat, Dec 19, 12:45 PM
ilinpv requested review of this revision. Sat, Dec 19, 12:45 PM
Herald added a project: Restricted Project. Sat, Dec 19, 12:45 PM

As it stands this needs to be a platform-level change. The compiler-rt stuff may now compile outside Linux (thanks!), but the capability detection is still fundamentally Linux (and even glibc) only so nothing else would use it as intended.

I'd be wary of making it universal even if that's fixed though, unless it turns out that the code-size improvement outweighs the extra operations.

Speaking of which, do you have any benchmark numbers?

Also, even on Linux it seems Clang is inclined to link against libgcc rather than libclang by default (at least that's what my Debian one does). When did these functions get into libgcc? I'm worried that a significant proportion of users there will find themselves having to add extra command-line options (whether disabling this or forcing libclang) to produce binaries.

Outline atomics were added in gcc 9.3.1 and turned on by default in gcc 10.1, so most distributions already ship a libgcc with outline atomics.
Besides Linux, Android will use them as well. I don't know about iOS and macOS; I guess they compile with LSE enabled, so outline atomics should not affect them.
As for benchmarks, I rely on the investigations done during the gcc outline atomics enablement work.
Choice of solution ../gcc/libgcc/config/aarch64/lse.S:

The problem that we are trying to solve is operating system deployment
of ARMv8.1-Atomics, also known as Large System Extensions (LSE).

There are a number of potential solutions for this problem which have
been proposed and rejected for various reasons.  To recap:

(1) Multiple builds.  The dynamic linker will examine /lib64/atomics/
if HWCAP_ATOMICS is set, allowing entire libraries to be overwritten.
However, not all Linux distributions are happy with multiple builds,
and anyway it has no effect on main applications.
(2) IFUNC.  We could put these functions into libgcc_s.so, and have
a single copy of each function for all DSOs.  However, ARM is concerned
that the branch-to-indirect-branch that is implied by using a PLT,
as required by IFUNC, is too much overhead for smaller cpus.
(3) Statically predicted direct branches.  This is the approach that
is taken here.  These functions are linked into every DSO that uses them.
All of the symbols are hidden, so that the functions are called via a
direct branch.  The choice of LSE vs non-LSE is done via one byte load
followed by a well-predicted direct branch.  The functions are compiled
separately to minimize code size.
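
Approach (3) can be illustrated with a rough C++ sketch. This is not the libgcc or compiler-rt implementation, which is written in assembly; the flag `have_lse_atomics` and the three function names are hypothetical stand-ins, and both paths use `std::atomic` so the sketch runs on any host. The point is only the shape: one byte load, then a branch to one of two implementations.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the one-byte flag that the runtime sets once at
// start-up after probing the CPU.
static unsigned char have_lse_atomics = 0;

// Stand-in for the LSE path. On real hardware this would be a single
// LSE instruction such as LDADD.
static std::uint32_t fetch_add_lse(std::atomic<std::uint32_t> &v,
                                   std::uint32_t n) {
  return v.fetch_add(n, std::memory_order_seq_cst);
}

// Stand-in for the baseline path: a retry loop, similar in spirit to an
// LDXR/STXR sequence on ARMv8.0.
static std::uint32_t fetch_add_baseline(std::atomic<std::uint32_t> &v,
                                        std::uint32_t n) {
  std::uint32_t old = v.load(std::memory_order_relaxed);
  while (!v.compare_exchange_weak(old, old + n, std::memory_order_seq_cst,
                                  std::memory_order_relaxed)) {
    // 'old' was refreshed by the failed exchange; try again.
  }
  return old;
}

// The out-of-line helper: one byte load, then a well-predicted branch.
std::uint32_t outline_fetch_add(std::atomic<std::uint32_t> &v,
                                std::uint32_t n) {
  if (have_lse_atomics)
    return fetch_add_lse(v, n);
  return fetch_add_baseline(v, n);
}
```

Because the flag never changes after start-up, the branch always goes the same way in a given process, which is why it is expected to predict well.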

Performance impact:

Various members in the Arm ecosystem have measured the performance impact of this indirection on a diverse set of systems and we were happy to find out that it was minimal compared to the benefit of using the LSE instructions for better scalability at large core counts.

Outline atomics were added in gcc 9.3.1 and turned on by default in gcc 10.1, so most distributions already ship a libgcc with outline atomics.

I think that works for people who use the packages that come with their distro (if they have post-now Clang they ought to have that GCC available). But we release binaries for an LTS Linux distribution too (Ubuntu 16.04 currently) and that only has 5.3.1.

I think this ought to be a Clang patch that detects which platform and libgcc it's targeting before adding the attribute.

I don't know about iOS and macOS; I guess they compile with LSE enabled, so outline atomics should not affect them.

Most iOS compilation still targets baseline ARMv8.0 so doesn't use LSE. The latest OS still supports phones that lack the instructions, and we'll allow back-deployment of user apps even after that stops being true.

ilinpv added a comment. Edited Tue, Dec 22, 4:53 PM

I think this ought to be a Clang patch that detects which platform and libgcc it's targeting before adding the attribute.

Do you mean implementing in Clang Driver something like

Detect runtime library used:
  if ToolChain::RLT_CompilerRT
    leave outline atomics enabled
  else if ToolChain::RLT_Libgcc
    run "gcc -dumpfullversion" to get version and disable outline atomics if version < 9.3.1
  else
    disable outline atomics

?

Most iOS compilation still targets baseline ARMv8.0 so doesn't use LSE. The latest OS still supports phones that lack the instructions, and we'll allow back-deployment of user apps even after that stops being true.

Nice! So iOS will benefit from outline atomics too.

Forgot to mention: some outline atomics benchmark results (compiler-rt and libgcc) were posted here: https://reviews.llvm.org/D91156 https://reviews.llvm.org/D91157

Nice! So iOS will benefit from outline atomics too.

It potentially could if someone ported the compiler-rt feature detection code, but that has to be done on a platform-by-platform basis and should be a prerequisite for enabling the feature there. As it stands iOS will always use the fallback path because the check is based on a glibc function.
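
For reference, a minimal sketch of the kind of check being discussed, assuming Linux on AArch64. The function name `cpu_has_lse_atomics` is hypothetical and this is not the compiler-rt code. `getauxval` and `AT_HWCAP` come from `<sys/auxv.h>`; the fallback definition of `HWCAP_ATOMICS` assumes the AArch64 Linux hwcap layout (bit 8). On any other platform the sketch reports no LSE, which matches the "always use the fallback path" behaviour described above.

```cpp
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h> // getauxval, AT_HWCAP
#ifndef HWCAP_ATOMICS
// Assumption: AArch64 Linux hwcap bit for the LSE atomics.
#define HWCAP_ATOMICS (1UL << 8)
#endif
#endif

// Returns true only when the kernel reports LSE atomics in the ELF
// auxiliary vector. Platforms without this mechanism get 'false', so the
// outlined helpers would always take the baseline path there.
bool cpu_has_lse_atomics() {
#if defined(__linux__) && defined(__aarch64__)
  return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
  return false;
#endif
}
```

A port to another OS would replace the `getauxval` call with that platform's own CPU feature query, which is the platform-by-platform work mentioned above.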

You said Android is planning to adopt this. Do you know how? I thought they had their own libc; does it also implement getauxval, or do they have a separate compiler-rt, or something entirely different?

Do you mean implementing in Clang Driver something like

Detect runtime library used:
  if ToolChain::RLT_CompilerRT
    leave outline atomics enabled
  else if ToolChain::RLT_Libgcc
    run "gcc -dumpfullversion" to get version and disable outline atomics if version < 9.3.1
  else
    disable outline atomics

There would need to be more checks in the CompilerRT case because most platforms won't benefit. There's already Clang logic to discover GCC installations (search for GCCInstallationDetector); I hope you'll be able to use that instead of trying to redo it.
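
The decision being discussed could look roughly like the sketch below. Everything in it is hypothetical and for illustration only: `RuntimeLib`, `GccVersion`, `versionAtLeast`, and `enableOutlineAtomics` are not Clang APIs, and a real driver change would take the version from the existing GCC installation detection rather than from a hand-built struct. The 9.3.1 threshold is the libgcc version quoted earlier in this thread.

```cpp
#include <tuple>

// Hypothetical enum, loosely modelled on the runtime-library kinds the
// driver distinguishes.
enum class RuntimeLib { CompilerRT, Libgcc, Unknown };

// Hypothetical version triple for the detected GCC installation.
struct GccVersion {
  int Major;
  int Minor;
  int Patch;
};

// Lexicographic comparison: true if V >= Min.
bool versionAtLeast(const GccVersion &V, const GccVersion &Min) {
  return std::tie(V.Major, V.Minor, V.Patch) >=
         std::tie(Min.Major, Min.Minor, Min.Patch);
}

// Decide whether to enable outline atomics by default.
// PlatformHasDetection: whether compiler-rt can detect LSE on this platform
// (an extra check for the compiler-rt case, as requested in the review).
bool enableOutlineAtomics(RuntimeLib RT, bool PlatformHasDetection,
                          const GccVersion &Gcc) {
  switch (RT) {
  case RuntimeLib::CompilerRT:
    return PlatformHasDetection;
  case RuntimeLib::Libgcc:
    // libgcc needs to be new enough to contain the helper functions.
    return versionAtLeast(Gcc, GccVersion{9, 3, 1});
  default:
    return false;
  }
}
```

With these assumptions, a libgcc 5.3.1 toolchain such as the Ubuntu 16.04 one mentioned above would leave outline atomics disabled, while a gcc 10.1 installation would enable them.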

You said Android is planning to adopt this. Do you know how? I thought they had their own libc; does it also implement getauxval, or do they have a separate compiler-rt, or something entirely different?

Android uses compiler-rt and has getauxval support: https://android.googlesource.com/platform/bionic/+/master/libc/bionic/getauxval.cpp

There would need to be more checks in the CompilerRT case because most platforms won't benefit. There's already Clang logic to discover GCC installations (search for GCCInstallationDetector); I hope you'll be able to use that instead of trying to redo it.

Thanks for the tip, I'll try to utilize it.

ilinpv updated this revision to Diff 315613. Sat, Jan 9, 9:29 AM
ilinpv edited the summary of this revision.

RT library detection and check for outline atomics support added to the driver.

Herald added a project: Restricted Project. Sat, Jan 9, 9:29 AM
Herald added a subscriber: cfe-commits.
t.p.northover added inline comments. Mon, Jan 11, 5:55 AM
clang/include/clang/Driver/ToolChain.h
460

This is a pretty niche feature, so it might be better to spell out the full name. Maybe even throw "AArch64" in there so more people know they can ignore it.

clang/lib/Driver/ToolChains/Linux.cpp
840

Can this be an assertion? How does a non-Linux triple end up in Linux::IsOADefault? Also, I think the isAndroid check is redundant: Android is Linux (both in reality and in valid llvm::Triples).

llvm/lib/Target/AArch64/AArch64.td
1087

I think this still enables it more widely than we want. Clang overrides it with -outline-atomics, but other front-ends don't.

ilinpv updated this revision to Diff 316471. Wed, Jan 13, 11:37 AM
ilinpv marked 2 inline comments as done. Wed, Jan 13, 11:50 AM
ilinpv added inline comments.
llvm/lib/Target/AArch64/AArch64.td
1087

Could you clarify which front-ends you meant, so that I can check outline atomics support for them?