Set setMaxAtomicSizeInBitsSupported for SystemZ.
|60,080 ms||x64 debian > ThreadSanitizer-x86_64.ThreadSanitizer-x86_64::restore_stack.cpp|
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -fsanitize=thread -Wall -m64 -msse4.2 -gline-tables-only -I/var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/tsan/../ -std=c++11 -I/var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/tsan/../ -O1 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/tsan/restore_stack.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/tsan/X86_64Config/Output/restore_stack.cpp.tmp && not /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/tsan/X86_64Config/Output/restore_stack.cpp.tmp 2>&1 | FileCheck /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/tsan/restore_stack.cpp
|60,060 ms||x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test|
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/var/lib/buildkite-agent/builds/llvm-project/compiler-rt/lib/fuzzer -m64 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/fuzzer/LeakTest.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/fuzzer/X86_64DefaultLinuxConfig/Output/fuzzer-leak.test.tmp-LeakTest
|60,060 ms||x64 debian > libFuzzer.libFuzzer::minimize_crash.test|
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang --driver-mode=g++ -O2 -gline-tables-only -fsanitize=address,fuzzer -I/var/lib/buildkite-agent/builds/llvm-project/compiler-rt/lib/fuzzer -m64 /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/fuzzer/NullDerefTest.cpp -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/fuzzer/X86_64DefaultLinuxConfig/Output/minimize_crash.test.tmp-NullDerefTest
I don't think this is quite correct, as we do support 128-bit atomics on properly aligned addresses. In any case, the back-end implements atomic load, store, and cmpxchg for i128. We do not currently implement atomicrmw, but that could be handled in terms of cmpxchg of course. For non-aligned addresses, we do need to fall back to the library routine, however.
Interestingly enough, the SystemZ back-end does not even use the AtomicExpandPass at all at the moment, so the setMaxAtomicSizeInBitsSupported change probably doesn't even have any effect? Everything still seems to work, however ...
@jonpa we don't seem to have test cases for i128 atomicrmw at the moment - could you have a loop at what happens here, and if there's maybe something we need to change? Thanks!
LegalizeTypes expands the i128 result of e.g. an AtomicLoadAdd node, which is done by DAGTypeLegalizer::ExpandAtomic(), which results in a library call. There are no tests for the atomic operations with i128, but it should work due to the TypeLegalizer. i128 is not a legal type so I guess therefore they are all expanded this way.
I could imagine that it might be beneficial to use the AtomicExpandPass to use the target instructions whenever possible, and libcalls otherwise. However, I am unsure about this: The AtomicExpandPass seems to do this (emit the libcalls only if needed, after checking alignment as well), but I see this in the Atomics document:
"One very important property of the atomic operations is that if your backend supports any inline lock-free atomic operations of a given size, you should support ALL operations of that size in a lock-free manner...it is invalid to implement atomic load using the native instruction, but cmpxchg using a library call to a function that uses a mutex, atomic load must also expand to a library call on such architectures, so that it can remain atomic with regards to a simultaneous cmpxchg, by using the same mutex."
I guess I must be missing something somehow...
No, you don't. You cannot mix lock-free and mutex atomics. E.g, if loads are lock-free and stores use a mutex. Then you have lost atomicity. Your mutex version could do stores one byte at a time.
Note that this is handled by the particular implementation of the 16-byte atomic library routines on s390x: they all perform a *runtime* check for proper alignment, and if the address actually *is* 16-byte aligned, the library routine will in fact use the appropriate 16-byte atomic instructions to implement the operation (instead of a separate mutex). That is why it *is* safe for the compiler to emit those instructions itself *if* it can prove at compile-time that the address will in fact be properly aligned at run-time. (That's also done by GCC, for example.)