Page MenuHomePhabricator

[AArch64] Compiler-rt interface for out-of-line atomics.
ClosedPublic

Authored by ilinpv on Nov 10 2020, 5:52 AM.

Details

Summary

Out-of-line helper functions to support LSE deployment added.
This is a port of libgcc implementation:
https://gcc.gnu.org/git/?p=gcc.git;h=33befddcb849235353dc263db1c7d07dc15c9faa

Diff Detail

Event Timeline

ilinpv created this revision.Nov 10 2020, 5:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 10 2020, 5:52 AM
Herald added subscribers: Restricted Project, danielkiss, phosek and 5 others. · View Herald Transcript
ilinpv requested review of this revision.Nov 10 2020, 5:52 AM

This is a port of libgcc implementation

I take it the licensing stuff all works out because Linaro is the original author and you're working with them?

danielkiss added inline comments.Nov 10 2020, 8:41 AM
compiler-rt/lib/builtins/assembly.h
91

.pushsection and a .popsection at the end would be safer.

103

bti c mnemonic can be used here, it is accepted for all architecture versions.

jfb added inline comments.Nov 10 2020, 8:48 AM
compiler-rt/lib/builtins/aarch64/lse-init.c
30 ↗(On Diff #304157)

Having an extra constructor isn't great, it would be better to put it in compiler-rt/lib/builtins/cpu_model.c.

jyknight added inline comments.Nov 10 2020, 2:31 PM
compiler-rt/cmake/builtin-config-ix.cmake
25

Question: is there a requirement that compiler-rt be buildable with an assembler which doesn't support the LSE instruction syntax?

This is a port of libgcc implementation

I take it the licensing stuff all works out because Linaro is the original author and you're working with them?

Thank you for pointing out, need to postpone this patch for licensing clarification.

ilinpv abandoned this revision.Nov 11 2020, 5:47 AM

Need to resolve license issue first.

As the original gcc author, and on behalf of Linaro, I'm happy to ack the relicense.

ilinpv reclaimed this revision.Nov 17 2020, 8:50 AM
danielkiss added inline comments.Nov 17 2020, 8:56 AM
compiler-rt/lib/builtins/CMakeLists.txt
512

I think it is enough to add it after line 502.

compiler-rt/lib/builtins/aarch64/lse-init.c
16 ↗(On Diff #304157)

Do we compile compiler-rt against glibc in any situation?

ilinpv updated this revision to Diff 307087.Nov 23 2020, 9:43 AM

Comments were taken into account.

ilinpv marked 3 inline comments as done.Nov 23 2020, 9:45 AM
ilinpv added inline comments.
compiler-rt/cmake/builtin-config-ix.cmake
25

I wouldn't say it is a requirement, be buildable by various gnu toolchains which include assembler without lse syntax is a plus.

ilinpv marked an inline comment as done.Nov 23 2020, 9:46 AM
ilinpv added inline comments.
compiler-rt/lib/builtins/aarch64/lse-init.c
16 ↗(On Diff #304157)

AOSP toolchain build against bionic

rth7680 added inline comments.Nov 23 2020, 12:57 PM
compiler-rt/lib/builtins/cpu_model.c
773

You can't include the weak attribute without also testing vs NULL when you use it.

Is there a particular system for which you added this attribute? If the function does not exist on a given system, you'd be better off just omitting the entire constructor.

ilinpv added inline comments.Nov 23 2020, 3:15 PM
compiler-rt/lib/builtins/cpu_model.c
773

Weak could be helpful if you compiled on old Android API without getauxval but run on newer one that has it.
You are right, if system has no getauxval you'll get NULL deference. Do you think it is worth to fix this "running binary on wrong system" case at the cost of additional instructions to check NULL? This binary will not work on such system anyway.

rth7680 added inline comments.Nov 23 2020, 4:09 PM
compiler-rt/lib/builtins/cpu_model.c
773

I don't know if it's worth providing the future API for an old Android. If you omit that, you can remove the defines for AT_HWCAP and HWCAP_ATOMICS as well. We could also reduce the explicit version checks with

#if defined(__has_include)
# if __has_include(<sys/auxv.h>)
#  include <sys/auxv.h>
#  if defined(AT_HWCAP) && defined(HWCAP_ATOMICS)
// constructor
#  endif
# endif
#endif
ilinpv updated this revision to Diff 307545.Nov 25 2020, 2:03 AM

Correct getauxval check.

ilinpv added inline comments.Nov 25 2020, 2:15 AM
compiler-rt/lib/builtins/cpu_model.c
773

Thanks for suggestion, I think we can omit that now and add support for old Android API later if required. HWCAP_ATOMICS is not defined on AOSP so I've added ifndefs.

Thank you, does anyone have any more concerns? Could you accept revision?

danielkiss accepted this revision.Dec 2 2020, 5:27 AM

LGTM, Thanks.

This revision is now accepted and ready to land.Dec 2 2020, 5:27 AM
This revision was landed with ongoing or failed builds.Dec 2 2020, 12:07 PM
This revision was automatically updated to reflect the committed changes.

This broke building for non-ELF targets, I'll send a patch for fixing that.

compiler-rt/lib/builtins/CMakeLists.txt
521

These generated templated files end up placed directly in the root of the builddir (when building standalone, with cmake pointing at compiler-rt/lib/builtins) which I find a bit unusual. Could some subdir be made for them to avoid cluttering the build dir too much?

Also, I see that this wasn't actually sent to any mailing list - I guess we should have some list auto-added for anything touching compiler-rt?

ilinpv added a comment.Dec 3 2020, 4:25 AM

Also, I see that this wasn't actually sent to any mailing list - I guess we should have some list auto-added for anything touching compiler-rt?

Plans and questions were sent to llvm-dev http://lists.llvm.org/pipermail/llvm-dev/2020-October/145841.html.

ilinpv added inline comments.Dec 3 2020, 4:33 AM
compiler-rt/lib/builtins/CMakeLists.txt
521

Yes, if it is more convenient I can create subdir like 'outline_atomic_helpers' for them

ilinpv added a comment.Dec 3 2020, 4:34 AM

This broke building for non-ELF targets, I'll send a patch for fixing that.

Thank you for that!

Also, I see that this wasn't actually sent to any mailing list - I guess we should have some list auto-added for anything touching compiler-rt?

Plans and questions were sent to llvm-dev http://lists.llvm.org/pipermail/llvm-dev/2020-October/145841.html.

Fair enough, but despite that, afaik the individual patch reviews still should have a public list set as subscriber by principle (either llvm-commits, cfe-commits or libcxx-commits). This becomes a bit easier to miss by the fact that such lists are automatically added as subscribers for certain subprojects, but apparently not for compiler-rt. (Also - even if the patch reviews had been sent to the list, I most certainly wouldn't have picked up and commented on them.)

compiler-rt/lib/builtins/CMakeLists.txt
521

Yeah, something like that sounds good to me.

ilinpv added inline comments.Dec 5 2020, 1:35 PM
compiler-rt/lib/builtins/CMakeLists.txt
521

Patch for that is here: https://reviews.llvm.org/D92724

sebpop added a subscriber: sebpop.Dec 16 2020, 10:15 PM

I tested this change on Graviton2 aarch64-linux with clang -moutline-atomics.
clang was configured with compiler-rt:

cmake -v -G Ninja \
  -DCLANG_DEFAULT_RTLIB:STRING=compiler-rt \
  -DLLVM_ENABLE_PROJECTS:STRING="clang;compiler-rt;libunwind" \
  -DCLANG_DEFAULT_UNWINDLIB:STRING=libunwind \
  -DCMAKE_BUILD_TYPE:STRING=Release \
  -DCMAKE_INSTALL_PREFIX:PATH=/home/ubuntu/llvm/usr/ \
  ../llvm

I built and tested https://github.com/xianyi/OpenBLAS and https://github.com/boostorg/boost.git and there are no new fails.
I also tested the micro-benchmark as described in https://reviews.llvm.org/D91157

+ export LD_LIBRARY_PATH=/home/ubuntu/llvm/usr/lib
+ LD_LIBRARY_PATH=/home/ubuntu/llvm/usr/lib
+ /home/ubuntu/llvm/usr/bin/clang++ -o clang-generic-v8 a.cc -std=c++11 -O2 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread
+ ./clang-generic-v8
2020-12-16 23:22:45
Running ./clang-generic-v8
Run on (64 X 243.75 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x64)
  L1 Instruction 64 KiB (x64)
  L2 Unified 1024 KiB (x64)
  L3 Unified 32768 KiB (x1)
Load Average: 0.33, 2.58, 7.68
***WARNING*** Library was built as DEBUG. Timings may be affected.
-------------------------------------------------------------------------
Benchmark                               Time             CPU   Iterations
-------------------------------------------------------------------------
BM_atomic_increment                  9.21 ns         9.21 ns     76001652
BM_atomic_fetch_add                  9.21 ns         9.21 ns     76021489
BM_atomic_compare_exchange           7.61 ns         7.61 ns     92024848
BM_std_atomic_compare_exchange       12.4 ns         12.4 ns     56402874
+ /home/ubuntu/llvm/usr/bin/clang++ -o clang-lse -march=armv8-a+lse a.cc -std=c++11 -O2 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread
+ ./clang-lse
2020-12-16 23:22:49
Running ./clang-lse
Run on (64 X 243.75 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x64)
  L1 Instruction 64 KiB (x64)
  L2 Unified 1024 KiB (x64)
  L3 Unified 32768 KiB (x1)
Load Average: 0.38, 2.55, 7.64
***WARNING*** Library was built as DEBUG. Timings may be affected.
-------------------------------------------------------------------------
Benchmark                               Time             CPU   Iterations
-------------------------------------------------------------------------
BM_atomic_increment                  7.21 ns         7.21 ns     97070064
BM_atomic_fetch_add                  7.21 ns         7.21 ns     97124037
BM_atomic_compare_exchange           7.21 ns         7.21 ns     97134576
BM_std_atomic_compare_exchange       11.8 ns         11.8 ns     59414449
+ /home/ubuntu/llvm/usr/bin/clang++ -o clang-moutline -moutline-atomics a.cc -std=c++11 -O2 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread
+ ./clang-moutline
2020-12-16 23:22:52
Running ./clang-moutline
Run on (64 X 243.75 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB (x64)
  L1 Instruction 64 KiB (x64)
  L2 Unified 1024 KiB (x64)
  L3 Unified 32768 KiB (x1)
Load Average: 0.38, 2.55, 7.64
***WARNING*** Library was built as DEBUG. Timings may be affected.
-------------------------------------------------------------------------
Benchmark                               Time             CPU   Iterations
-------------------------------------------------------------------------
BM_atomic_increment                  7.21 ns         7.21 ns     97090953
BM_atomic_fetch_add                  7.21 ns         7.21 ns     97136432
BM_atomic_compare_exchange           7.21 ns         7.21 ns     97132760
BM_std_atomic_compare_exchange       11.6 ns         11.6 ns     60292533