Page MenuHomePhabricator

[AArch64][Clang][Linux] Enable out-of-line atomics by default.
ClosedPublic

Authored by ilinpv on Dec 19 2020, 12:45 PM.

Details

Summary

Generate outline atomics if compiling for armv8-a non-LSE AArch64 Linux
(including Android) targets to use LSE instructions, if they are available,
at runtime. Library support is checked by clang driver which doesn't enable
outline atomics if no proper libraries (libgcc >= 9.3.1 or compiler-rt) found.

Diff Detail

Unit TestsFailed

TimeTest
310 msx64 debian > libarcher.races::task-dependency.c
Script: -- : 'RUN: at line 13'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-dependency.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-dependency.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/deflake.bash /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-dependency.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-dependency.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-dependency.c
260 msx64 debian > libarcher.races::task-taskgroup-unrelated.c
Script: -- : 'RUN: at line 13'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-taskgroup-unrelated.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskgroup-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/deflake.bash /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskgroup-unrelated.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskgroup-unrelated.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-taskgroup-unrelated.c
260 msx64 debian > libarcher.races::task-taskwait-nested.c
Script: -- : 'RUN: at line 13'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-taskwait-nested.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/deflake.bash /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-taskwait-nested.c
250 msx64 debian > libarcher.races::task-two.c
Script: -- : 'RUN: at line 13'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-two.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-two.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/deflake.bash /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-two.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-two.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-two.c
340 msx64 debian > libarcher.task::task-barrier.c
Script: -- : 'RUN: at line 15'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/task/task-barrier.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/task/Output/task-barrier.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/task/Output/task-barrier.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/task/Output/task-barrier.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/task/task-barrier.c
View Full Test Results (13 Failed)

Event Timeline

ilinpv created this revision.Dec 19 2020, 12:45 PM
ilinpv requested review of this revision.Dec 19 2020, 12:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 19 2020, 12:45 PM

As it stands this needs to be a platform-level change. The compiler-rt stuff may now compile outside Linux (thanks!), but the capability detection is still fundamentally Linux (and even glibc) only so nothing else would use it as intended.

I'd be wary of making it universal even if that's fixed though, unless it turns out that the code-size improvement outweighs the extra operations.

Speaking of which, do you have any benchmark numbers?

Also, even on Linux it seems Clang is inclined to link against libgcc rather than libclang by default (at least that's what my Debian one does). When did these functions get into libgcc? I'm worried that a significant proportion of users there will find themselves having to add extra command-line options (whether disabling this or forcing libclang) to produce binaries.

Outline atomics were added with gcc 9.3.1 and turned on by default in gcc 10.1. Consequently most of distributions had libgcc with outline atomics already.
Besides Linux, Android will utilize them as well. Don't know about iOs and MacOs, I guess they are compiling with LSE enabled, so outline atomics should not affect them.
As for benchmarks, I rely on investigations completed during gcc outline atomics enablement work.
Choice of solution ../gcc/libgcc/config/aarch64/lse.S:

The problem that we are trying to solve is operating system deployment
of ARMv8.1-Atomics, also known as Large System Exensions (LSE).

There are a number of potential solutions for this problem which have
been proposed and rejected for various reasons.  To recap:

(1) Multiple builds.  The dynamic linker will examine /lib64/atomics/
if HWCAP_ATOMICS is set, allowing entire libraries to be overwritten.
However, not all Linux distributions are happy with multiple builds,
and anyway it has no effect on main applications.
(2) IFUNC.  We could put these functions into libgcc_s.so, and have
a single copy of each function for all DSOs.  However, ARM is concerned
that the branch-to-indirect-branch that is implied by using a PLT,
as required by IFUNC, is too much overhead for smaller cpus.
(3) Statically predicted direct branches.  This is the approach that
is taken here.  These functions are linked into every DSO that uses them.
All of the symbols are hidden, so that the functions are called via a
direct branch.  The choice of LSE vs non-LSE is done via one byte load
followed by a well-predicted direct branch.  The functions are compiled
separately to minimize code size.

Performance impact:

Various members in the Arm ecosystem have measured the performance impact of this indirection on a diverse set of systems and we were happy to find out that it was minimal compared to the benefit of using the LSE instructions for better scalability at large core counts.

Outline atomics were added with gcc 9.3.1 and turned on by default in gcc 10.1. Consequently most of distributions had libgcc with outline atomics already.

I think that works for people who use the packages that come with their distro (if they have post-now Clang they ought to have that GCC available). But we release binaries for an LTS Linux distribution too (Ubuntu 16.04 currently) and that only has 5.3.1.

I think this ought to be a Clang patch that detects which platform and libgcc it's targeting before adding the attribute.

Don't know about iOs and MacOs, I guess they are compiling with LSE enabled, so outline atomics should not affect them.

Most iOS compilation still targets baseline ARMv8.0 so doesn't use LSE. The latest OS still supports phones that lack the instructions, and we'll allow back-deployment of user apps even after that stops being true.

ilinpv added a comment.EditedDec 22 2020, 4:53 PM

I think this ought to be a Clang patch that detects which platform and libgcc it's targeting before adding the attribute.

Do you mean implementing in Clang Driver something like

Detect runtime library used: 
  if ToolChain::RLT_CompilerRT 
    leave outline atomics enabled
  if ToolChain::RLT_Libgcc 
    run "gcc -dumpfullversion" to get version and disable outline atomics if version < 9.3.1 
  else disable outline atomics

?

Most iOS compilation still targets baseline ARMv8.0 so doesn't use LSE. The latest OS still supports phones that lack the instructions, and we'll allow back-deployment of user apps even after that stops being true.

Nice! So iOS will benefit outline atomics too.

Forgot to mention, some outline atomics benchmarks results (compiler-rt and libgcc) were posted here: https://reviews.llvm.org/D91156 https://reviews.llvm.org/D91157

Nice! So iOS will benefit outline atomics too.

It potentially could if someone ported the compiler-rt feature detection code, but that has to be done on a platform-by-platform basis and should be a prerequisite for enabling the feature there. As it stands iOS will always use the fallback path because the check is based on a glibc function.

You said Android is planning to adopt this. Do you know how? I thought they had their own libc; does it also implement getauxval, or do they have a separate compiler-rt, or something entirely different?

Do you mean implementing in Clang Driver something like

Detect runtime library used: 
  if ToolChain::RLT_CompilerRT 
    leave outline atomics enabled
  if ToolChain::RLT_Libgcc 
    run "gcc -dumpfullversion" to get version and disable outline atomics if version < 9.3.1 
  else disable outline atomics

There would need to be more checks in the CompilerRT case because most platforms won't benefit. There's already Clang logic to discover GCC installations (search for GCCInstallationDetector); I hope you'll be able to use that instead of trying to redo it.

You said Android is planning to adopt this. Do you know how? I thought they had their own libc; does it also implement getauxval, or do they have a separate compiler-rt, or something entirely different?

Android uses compiler-rt and has getauxval support: https://android.googlesource.com/platform/bionic/+/master/libc/bionic/getauxval.cpp

There would need to be more checks in the CompilerRT case because most platforms won't benefit. There's already Clang logic to discover GCC installations (search for GCCInstallationDetector); I hope you'll be able to use that instead of trying to redo it.

Thanks for the tip, I'll try to utilize it.

ilinpv updated this revision to Diff 315613.Jan 9 2021, 9:29 AM
ilinpv edited the summary of this revision. (Show Details)

RT library detection and check for outline atomics support added to the driver.

Herald added a project: Restricted Project. · View Herald TranscriptJan 9 2021, 9:29 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
t.p.northover added inline comments.Jan 11 2021, 5:55 AM
clang/include/clang/Driver/ToolChain.h
460

This is a pretty niche feature, so it might be better to spell out the full name. Maybe even throw "AArch64" in there so more people know they can ignore it.

clang/lib/Driver/ToolChains/Linux.cpp
849

Can this be an assertion? How does a non-Linux triple end up in Linux::IsOADefault? Also, I think the isAndroid check is redundant: Android is Linux (both in reality and in valid llvm::Triples).

llvm/lib/Target/AArch64/AArch64.td
1087 ↗(On Diff #315613)

I think this still enables it more widely than we want. Clang overrides it with -outline-atomics, but other front-ends don't.

ilinpv updated this revision to Diff 316471.Jan 13 2021, 11:37 AM
ilinpv marked 2 inline comments as done.Jan 13 2021, 11:50 AM
ilinpv added inline comments.
llvm/lib/Target/AArch64/AArch64.td
1087 ↗(On Diff #315613)

Could I ask you to clarify what front-ends you meant (to check outline atomics suport for them)?

t.p.northover added inline comments.Jan 19 2021, 10:55 AM
llvm/lib/Target/AArch64/AArch64.td
1087 ↗(On Diff #315613)

Any front-end that generates LLVM IR. Generally important ones are Swift and Rust, but there are many more and I kind of doubt it's feasible to ensure they all will. It's unfortunate, and maybe at some point we can put logic in LLVM to approximate Clang's decision-making and get the benefit without any opt-in, but I kind of think it's a bit early yet. We'd risk breaking too many people's builds.

Also, putting it in the generic CPU model here means that if a front-end allows something like -mcpu=cortex-a57 it will suddenly disable outlined atomics. Whether they're good or bad, that's just plain weird behaviour. A CPU-level feature isn't the right place for this.

jgreenhalgh added inline comments.
llvm/lib/Target/AArch64/AArch64.td
1087 ↗(On Diff #315613)

Not necessarily arguing in either direction for where these should be enabled, but is that behaviour really so weird?

I would expect that we want to specialise to inline atomics as soon as we're given architecture permission (-march=armv8.1-a+lse or greater) or CPU-directed specialisation. These things are only useful for the transition case where your platform mandates Armv8.0 only binaries but you're expecting to also run fast on cores with LSE support.

What we're nervous about is burying this option, which we believe to be generally useful for the Armv8.0 platforms, such that it is opt-in. It really isn't the end of the world if this is C/C++ on LInux/Android only for now, but it feels like missed opportunity. That said, it is safe, so probably makes sense to go for that now Pavel and think again about where else we want it on.

ilinpv updated this revision to Diff 317849.Jan 20 2021, 5:30 AM
ilinpv retitled this revision from [AArch64] Enable out-of-line atomics by default. to [AArch64][Clang][Linux] Enable out-of-line atomics by default..
ilinpv edited the summary of this revision. (Show Details)
ilinpv marked an inline comment as done.

@t.p.northover sorry for pinging you, do you have any concerns about latest patch? It enables outline atomics by default on Aarch64/Linux/Clang with libraries check. I am hoping to get this in before LLVM 12 branch.

SjoerdMeijer added inline comments.
clang/lib/Driver/ToolChains/Clang.cpp
6488

This needs a Clang driver tests.

ilinpv updated this revision to Diff 319693.Jan 27 2021, 4:04 PM

Clang driver tests for outline atomics were added.

ilinpv marked an inline comment as done.Jan 27 2021, 4:04 PM

Clang driver tests for outline atomics were added.

Thanks!

Is there a way we can test -rtlib=libgcc?

clang/lib/Driver/ToolChains/Linux.cpp
855

Nit: if we change this into:

if (GCCInstallation.getVersion().isOlderThan(9, 3, 1))

we can get rid of the curly brackets.

857

This means we need one more test and RUN line with the rtlib not being compiler-rt or libgcc.

ilinpv updated this revision to Diff 319951.Jan 28 2021, 2:47 PM
ilinpv marked 2 inline comments as done.

Sorry one last question: how about -mno-outline-atomics in combination with this? Think we need to add a test for that too?

I think this looks good now, modulo Sjoerd's last comment.

ilinpv updated this revision to Diff 320123.Jan 29 2021, 6:30 AM

Tests for "-m[no]outline-atomics" options added.

SjoerdMeijer accepted this revision.Jan 29 2021, 6:43 AM

Thanks, LGTM too.

One nit inlined that can be addressed before committing.

clang/test/Driver/aarch64-features.c
47

Probably better to just check for the OFF case:

"-target-feature" "-outline-atomics"
This revision is now accepted and ready to land.Jan 29 2021, 6:43 AM
ilinpv updated this revision to Diff 320142.Jan 29 2021, 8:21 AM
ilinpv marked an inline comment as done.
This revision was landed with ongoing or failed builds.Jan 29 2021, 9:46 AM
This revision was automatically updated to reflect the committed changes.