Page MenuHomePhabricator

cmake/modules/CheckAtomic.cmake: catch false positives in RISC-V
ClosedPublic

Authored by gokturk on Oct 14 2019, 5:12 PM.

Details

Summary

The check for 'HAVE_CXX_ATOMICS_WITHOUT_LIB' may create false
positives in RISC-V. This is reproducible when compiling LLVM natively
using GCC on a rv64gc (rv64imafdgc) host. Due to the 'A' (atomic)
extension, g++ replaces calls to libatomic operations on the
std::atomic<int> type with the native hardware instructions. As a
result, the compilation succeeds and the build system thinks it
doesn't need to pass '-latomic'.

To complicate matters, even when the 'A' extension is forcibly
disabled (by passing '-march=rv64imfd'), g++ compiles the test code
snippet just fine.

To the contrary, '__atomic' builtins or C++11 atomics requires linking
against '-latomic' in RISC-V (See
https://github.com/riscv/riscv-gcc/issues/12).

This causes the following build failure for dsymutil:

: && /usr/bin/riscv64-unknown-linux-gnu-g++ -O2 -pipe -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++14 -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-class-memaccess -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -fdiagnostics-color -ffunction-sections -fdata-sections -Wl,-O1 -Wl,--as-needed -Wl,-allow-shlib-undefined -Wl,-rpath-link,/var/tmp/portage/sys-devel/llvm-10.0.0.9999/work/llvm-10.0.0.9999-abi_riscv_lp64d.lp64d/./lib64/lp64d -Wl,-O3 -Wl,--gc-sections tools/dsymutil/CMakeFiles/dsymutil.dir/dsymutil.cpp.o tools/dsymutil/CMakeFiles/dsymutil.dir/BinaryHolder.cpp.o tools/dsymutil/CMakeFiles/dsymutil.dir/CFBundle.cpp.o tools/dsymutil/CMakeFiles/dsymutil.dir/CompileUnit.cpp.o tools/dsymutil/CMakeFiles/dsymutil.dir/DebugMap.cpp.o tools/dsymutil/CMakeFiles/dsymutil.dir/DeclContext.cpp.o tools/dsymutil/CMakeFiles/dsymutil.dir/DwarfLinker.cpp.o tools/dsymutil/CMakeFiles/dsymutil.dir/DwarfStreamer.cpp.o tools/dsymutil/CMakeFiles/dsymutil.dir/MachODebugMapParser.cpp.o tools/dsymutil/CMakeFiles/dsymutil.dir/MachOUtils.cpp.o tools/dsymutil/CMakeFiles/dsymutil.dir/NonRelocatableStringpool.cpp.o tools/dsymutil/CMakeFiles/dsymutil.dir/SymbolMap.cpp.o -o bin/dsymutil -Wl,-rpath,"\$ORIGIN/../lib64/lp64d" lib64/lp64d/libLLVM-10svn.so -lpthread && :
/usr/lib/gcc/riscv64-unknown-linux-gnu/9.1.0/../../../../riscv64-unknown-linux-gnu/bin/ld: tools/dsymutil/CMakeFiles/dsymutil.dir/dsymutil.cpp.o: in function `.LEHB79':
dsymutil.cpp:(.text._ZNSt17_Function_handlerIFvvESt5_BindIFZ4mainEUlSt10shared_ptrIN4llvm14raw_fd_ostreamEENS3_8dsymutil11LinkOptionsEE_S5_S7_EEE9_M_invokeERKSt9_Any_data+0x168): undefined reference to `atomic_fetch_and_1'
/usr/lib/gcc/riscv64-unknown-linux-gnu/9.1.0/../../../../riscv64-unknown-linux-gnu/bin/ld: tools/dsymutil/CMakeFiles/dsymutil.dir/dsymutil.cpp.o: in function `.L0 ':
dsymutil.cpp:(.text._ZNSt17_Function_handlerIFvvESt5_BindIFZ4mainEUlSt10shared_ptrIN4llvm14raw_fd_ostreamEENS3_8dsymutil11LinkOptionsEE_S5_S7_EEE9_M_invokeERKSt9_Any_data+0x2ee): undefined reference to `
atomic_fetch_and_1'
/usr/lib/gcc/riscv64-unknown-linux-gnu/9.1.0/../../../../riscv64-unknown-linux-gnu/bin/ld: dsymutil.cpp:(.text.startup.main+0xefc): undefined reference to `atomic_fetch_and_1'
/usr/lib/gcc/riscv64-unknown-linux-gnu/9.1.0/../../../../riscv64-unknown-linux-gnu/bin/ld: tools/dsymutil/CMakeFiles/dsymutil.dir/dsymutil.cpp.o: in function `.L2508':
dsymutil.cpp:(.text.startup.main+0x1b2a): undefined reference to `
atomic_fetch_and_1'
collect2: error: ld returned 1 exit status

This is likely because dsymutil is defining a variable of type
'std::atomic_char' (See:
https://github.com/llvm/llvm-project/blob/release/9.x/llvm/tools/dsymutil/dsymutil.cpp#L542)

Improve the reliability of the 'HAVE_CXX_ATOMICS_WITHOUT_LIB' test in
two steps:

  1. Force a pre-increment on x (++x), which should force a call to a

libatomic function per the prototype:

__int_type
operator++(int) volatile noexcept
{ return fetch_add(1); }
  1. Because step 1 would resolve the increment to 'amoadd.w.aq' under

the 'A' extension, force the same operation on sub-word types, for
which there is no hardware support. This effectively forces g++ to
insert calls to libatomic functions.

Diff Detail

Event Timeline

gokturk created this revision.Oct 14 2019, 5:12 PM

libc++ also has a version of this check (https://github.com/llvm/llvm-project/blob/master/libcxx/cmake/Modules/CheckLibcxxAtomic.cmake). Does that need to be adjusted as well?

jfb accepted this revision.Oct 14 2019, 8:54 PM

This is pretty brittle... but 🤷‍♂️

This revision is now accepted and ready to land.Oct 14 2019, 8:54 PM

libc++ also has a version of this check (https://github.com/llvm/llvm-project/blob/master/libcxx/cmake/Modules/CheckLibcxxAtomic.cmake). Does that need to be adjusted as well?

It does create false positives. If I use the following code sample from the libcxx cmake test:

#include <cstdint>
#include <atomic>
std::atomic<uintptr_t> x;
std::atomic<uintmax_t> y;
int main(int, char**) {
  return x + y;
}

g++ compiles it just fine, with or without the 'A' extension defined (-march=rv64imafd vs -march=rv64imfd). It seems like g++ is inlining the following two functions in the binary:

std::__atomic_base<unsigned long>::operator unsigned long() const;
std::operator&(std::memory_order, std::__memory_order_modifier;

but performing the x + y as two sign-extended integer addition:

...
# initialize and sign-extend x, store it in register s1
jal	ra,3a8 <std::__atomic_base<unsigned long>::operator unsigned long() const>
mv	a5,a0
sext.w	s1,a5
...
# Initialize and sign-extend y, store it in register a5
jal	ra,3a8 <std::__atomic_base<unsigned long>::operator unsigned long() const>
mv	a5,a0
sext.w	a5,a5
# Perform x + y, sign extend the result twice (??)
addw	a5,s1,a5
sext.w	a5,a5
sext.w	a5,a5

As a result, no need for libatomic in the compilation. If I tweak the code sample a little bit so that x + y is a separate statement:

#include <cstdint>
#include <atomic>
std::atomic<uintptr_t> x;
std::atomic<uintmax_t> y;
int main(int, char**) {
  x += y;
  return x;
}

this still compiles fine under the 'A' extension:

$ g++ -march=rv64imafd -std=c++11 -nostdlib libcxx-atomic-check.cpp 
/usr/lib/gcc/riscv64-unknown-linux-gnu/9.1.0/../../../../riscv64-unknown-linux-gnu/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000000298

but fails without the 'A' extension:

$ g++ -march=rv64imfd -std=c++11 -nostdlib libcxx-atomic-check.cpp 
/usr/lib/gcc/riscv64-unknown-linux-gnu/9.1.0/../../../../riscv64-unknown-linux-gnu/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000000310
/usr/lib/gcc/riscv64-unknown-linux-gnu/9.1.0/../../../../riscv64-unknown-linux-gnu/bin/ld: /tmp/ccNebYb9.o: in function `.L0 ':
libcxx-atomic-check.cpp:(.text._ZNSt13__atomic_baseImEpLEm[_ZNSt13__atomic_baseImEpLEm]+0x28): undefined reference to `__atomic_fetch_add_8'
collect2: error: ld returned 1 exit status

I think it needs to be fixed as well.

Please do not merge.

I do not see how the result of this check is propagated to LDFLAGS by the build system. Even after this patch, the build fails the same way:

/usr/lib/gcc/riscv64-unknown-linux-gnu/9.1.0/../../../../riscv64-unknown-linux-gnu/bin/ld: tools/dsymutil/CMakeFiles/dsymutil.dir/dsymutil.cpp.o: in function `.LEHB79':
dsymutil.cpp:(.text._ZNSt17_Function_handlerIFvvESt5_BindIFZ4mainEUlSt10shared_ptrIN4llvm14raw_fd_ostreamEENS3_8dsymutil11LinkOptionsEE_S5_S7_EEE9_M_invokeERKSt9_Any_data+0x168): undefined reference to `__atomic_fetch_and_1'
/usr/lib/gcc/riscv64-unknown-linux-gnu/9.1.0/../../../../riscv64-unknown-linux-gnu/bin/ld: tools/dsymutil/CMakeFiles/dsymutil.dir/dsymutil.cpp.o: in function `.L0 ':
dsymutil.cpp:(.text._ZNSt17_Function_handlerIFvvESt5_BindIFZ4mainEUlSt10shared_ptrIN4llvm14raw_fd_ostreamEENS3_8dsymutil11LinkOptionsEE_S5_S7_EEE9_M_invokeERKSt9_Any_data+0x2ee): undefined reference to `__atomic_fetch_and_1'
/usr/lib/gcc/riscv64-unknown-linux-gnu/9.1.0/../../../../riscv64-unknown-linux-gnu/bin/ld: dsymutil.cpp:(.text.startup.main+0xefc): undefined reference to `__atomic_fetch_and_1'
/usr/lib/gcc/riscv64-unknown-linux-gnu/9.1.0/../../../../riscv64-unknown-linux-gnu/bin/ld: tools/dsymutil/CMakeFiles/dsymutil.dir/dsymutil.cpp.o: in function `.L2508':
dsymutil.cpp:(.text.startup.main+0x1b2a): undefined reference to `__atomic_fetch_and_1'
collect2: error: ld returned 1 exit status

libcxx has the following in CMakeLists.txt:

if (LIBCXX_HAVE_CXX_ATOMICS_WITH_LIB)
  target_link_libraries(${target} PRIVATE atomic)
endif()

but LLVM's CMakeLists.txt does not contain anything like that.

I think we have potentially 3 separate bugs:

  1. In LLVM, the check for libatomic has false positives in RISC-V (which the patch is addressing).
  2. In LLVM, the result of the libatomic check is not being used to add '-latomic' to LDFLAGS.
  3. libcxx libatomic check is likely to exhibit false positive in RISC-V.

I think we have potentially 3 separate bugs:

  1. In LLVM, the check for libatomic has false positives in RISC-V (which the patch is addressing).
  2. In LLVM, the result of the libatomic check is not being used to add '-latomic' to LDFLAGS.

See: https://reviews.llvm.org/D69003

I'm not sure if this was the right location for it. Let me know if you want it fixed differently.

  1. libcxx libatomic check is likely to exhibit false positive in RISC-V.

This is bogus -- the RISCV GCC implementation is broken. It should not be mixing libatomic function calls with direct atomics.

jfb added a comment.Oct 16 2019, 9:53 AM

This is bogus -- the RISCV GCC implementation is broken. It should not be mixing libatomic function calls with direct atomics.

Yeah, but if it doesn't work then we should work around it. Using a single atomic load in the test, or using multiple atomic instructions which are representative of what LLVM actually uses, is fine by me. The test is still simple, it's still brittle and silly, but then it lets them build so that's fine.

This revision was automatically updated to reflect the committed changes.

Could this be backported to 10.0?

smeenai added a subscriber: hans.Feb 18 2020, 9:49 AM

CC @hans for 10.0

hans added a reviewer: asb.Feb 19 2020, 4:14 AM

CC @hans for 10.0

+asb from code_owners.txt, what do you think?

asb added a comment.Feb 19 2020, 9:29 AM

CC @hans for 10.0

+asb from code_owners.txt, what do you think?

The backport request seems ok to me, assuming you're happy taking a patch like this at this stage in the release process.

hans added a comment.Feb 20 2020, 6:23 AM
In D68964#1882946, @asb wrote:

CC @hans for 10.0

+asb from code_owners.txt, what do you think?

The backport request seems ok to me, assuming you're happy taking a patch like this at this stage in the release process.

Yes, pushed as a572a8a147c76b9d31585c2d4257a5db566c9a9d. Thanks!

In D68964#1882946, @asb wrote:

The backport request seems ok to me, assuming you're happy taking a patch like this at this stage in the release process.

This patch is part of a broader effort from @gokturk, where the various patches addressed different facets of this issue. If this is to be backported, then probably all of them should:

To properly land D69869 you'll need the tweak I introduced in commit f128f442a3d, otherwise MSVC builds will fail. (@hans)

hans added a comment.Feb 20 2020, 6:45 AM
In D68964#1882946, @asb wrote:

The backport request seems ok to me, assuming you're happy taking a patch like this at this stage in the release process.

This patch is part of a broader effort from @gokturk, where the various patches addressed different facets of this issue. If this is to be backported, then probably all of them should:

Oh I see. I've backed it out again in d75ce45777d9802d43b555993fde8ed6562fb368, this is too much stuff to land so late. I think it can wait until the 11 release.

Yes, please backport (if possible) all other related changes mentioned above (if they are not yet part of 10.0).

This would help distributions to avoid hacking LDFLAGS / build systems files while building LLVM/Clang stack on Linux.

Oh I see. I've backed it out again in d75ce45777d9802d43b555993fde8ed6562fb368, this is too much stuff to land so late. I think it can wait until the 11 release.

@hans: this is a set of 5 patches, so it might seem a lot, but they are fairly simple (this effort just happens to be split across several patches, for review friendliness) and quite nonintrusive, once f128f442a3d is applied. Several people seem to be asking for this, so maybe reconsider if there's still time to backport them?

asb added a comment.Feb 20 2020, 7:48 AM

Yes, please backport (if possible) all other related changes mentioned above (if they are not yet part of 10.0).

This would help distributions to avoid hacking LDFLAGS / build systems files while building LLVM/Clang stack on Linux.

It's unfortunate, but as Hans says this series of patches is likely too much at this stage in the release process. If there's a 10.0.1, perhaps it makes sense there. They benefit the RISC-V community, but there's a risk of regressing somewhere else in a way that's probably not fair to other targets and their release testers.

In D68964#1884694, @asb wrote:

It's unfortunate, but as Hans says this series of patches is likely too much at this stage in the release process. If there's a 10.0.1, perhaps it makes sense there. They benefit the RISC-V community, but there's a risk of regressing somewhere else in a way that's probably not fair to other targets and their release testers.

That's a reasonable point. Thanks.