This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt][test] Heed COMPILER_RT_DEBUG when compiling unittests
ClosedPublic

Authored by ro on Nov 17 2020, 4:57 AM.

Details

Summary

When trying to debug some compiler-rt unittest, I initially had a hard time because

  • even in a Debug build one needs to set COMPILER_RT_DEBUG to get debugging info for some of the code and
  • even the so the unittests used a hardcoded -O2 which often makes debugging impossible.

This patch addresses this by instead using -O0 if COMPILER_RT_DEBUG. sanitizer_stacktrace_test.cpp needs a definition of BufferedStackTrace::UnwindImpl to avoid a link failure.

Tested on sparcv9-sun-solaris2.11, amd64-pc-solaris2.11, and x86_64-pc-linux-gnu.

Diff Detail

Event Timeline

ro created this revision.Nov 17 2020, 4:57 AM
Herald added subscribers: Restricted Project, pengfei, fedor.sergeev and 3 others. · View Herald TranscriptNov 17 2020, 4:57 AM
ro requested review of this revision.Nov 17 2020, 4:57 AM
vitalybuka added inline comments.Nov 18 2020, 1:19 AM
compiler-rt/lib/asan/tests/CMakeLists.txt
41–42

why do we need this after list(APPEND COMPILER_RT_TEST_COMPILER_CFLAGS -gline-tables-only) above

compiler-rt/lib/sanitizer_common/tests/sanitizer_stacktrace_test.cpp
71 ↗(On Diff #305745)

why it's here?

ro added inline comments.Nov 18 2020, 6:04 AM
compiler-rt/lib/asan/tests/CMakeLists.txt
41–42

AFAICS COMPILER_RT_TEST_COMPILER_CFLAGS is only used in compiler-rt/test/* (via get_test_cc_for_arch) whereas ASAN_UNITTEST_COMMON_CFLAGS is used in compiler-rt/lib/asan/test. Pretty confusing IMO ;-)

compiler-rt/lib/sanitizer_common/tests/sanitizer_stacktrace_test.cpp
71 ↗(On Diff #305745)

Without it, e.g. Sanitizer-sparc-Test fails to link at g3 -O0:

Undefined			first referenced
 symbol  			    in file
_ZN11__sanitizer18BufferedStackTrace10UnwindImplEmmPvbj SANITIZER_TEST_OBJECTS.sanitizer_stacktrace_test.cpp.sparc.o

Every user of BufferedStackStrace::Unwind needs to provide its own implementation of UnwindImpl, e.g. in asan_stack.cpp. However, that's not appropriate in sanitizer_common, thus I chose to place it in the test case that triggers the use.

vitalybuka added inline comments.Nov 18 2020, 11:32 PM
compiler-rt/lib/asan/tests/CMakeLists.txt
41–42

Maybe better to update COMPILER_RT_UNITTEST_CFLAGS where we update COMPILER_RT_TEST_COMPILER_CFLAGS
and remove all following blocks

if(COMPILER_RT_TEST_COMPILER_ID MATCHES "Clang" AND NOT COMPILER_RT_DEBUG)
  list(APPEND *_TEST_CFLAGS_COMMON -gline-tables-only)
elseif(COMPILER_RT_DEBUG)
  # -g3 already in COMPILER_RT_UNITEST_CFLAGS.
else()
  list(APPEND *_TEST_CFLAGS_COMMON -g)
endif()
vitalybuka requested changes to this revision.Feb 17 2021, 1:55 PM
This revision now requires changes to proceed.Feb 17 2021, 1:55 PM
ro updated this revision to Diff 464957.Oct 4 2022, 4:50 AM
  • Update COMPILER_RT_UNITTEST_CFLAGS as requested.
  • Change SANITIZER_COMMON_CFLAGS to -O0 for COMPILER_RT_DEBUG, cannot reliably debug with optimization.
  • compiler-rt/lib/sanitizer_common/tests/sanitizer_type_traits_test.cpp tests SanitizerCommon.IsTriviallyDestructible and SanitizerCommon.IsTriviallyCopyable fail to link with debug build:
FAILED: projects/compiler-rt/lib/sanitizer_common/tests/Sanitizer-i386-Test
[...]
Undefined                       first referenced
 symbol                             in file
_ZN11__sanitizer17integral_constantIbLb1EE5valueE SANITIZER_TEST_OBJECTS.sanitizer_type_traits_test.cpp.i386.o
ld: fatal: symbol referencing errors
Maybe one can provide a definition of `__sanitizer::integral_constant<bool, true>::value`?
Herald added a project: Restricted Project. · View Herald TranscriptOct 4 2022, 4:50 AM
Herald added a subscriber: Enna1. · View Herald Transcript
vitalybuka accepted this revision.Oct 4 2022, 10:34 PM
This revision is now accepted and ready to land.Oct 4 2022, 10:34 PM
ro added a comment.Oct 5 2022, 2:02 AM

This patch caused a regression on the sanitizer-x86_64-linux-autoconf buildbot running a debug build:

ThreadSanitizer-x86_64 :: Linux/check_memcpy.c

The executable now contains calls to memset. AFAICS this is a consequence of changing debug builds from -O1 to -O0 and thus is expected. Therefore I suggest to just add

XFAIL: !compiler-rt-optimized

Unfortunately I missed this in my local builds: I had run Debug builds on x86_64-pc-linux-gnu both with a vanilla tree and my patch, but already the vanilla results were insanely bad (309 FAILs), so I didn't notice the regression.

ro added a subscriber: uweigand.Oct 5 2022, 7:09 AM

This seems to cause another regression, apparently only visible in cross builds: clang-s390x-linux buildbot: a link failure like this:

******************** TEST 'AddressSanitizer-s390x-linux ::
TestCases/Linux/init-order-dlopen.cpp' FAILED ********************
Script:
[...]

: 'RUN: at line 5';
/home/uweigand/sandbox/buildbot/clang-s390x-linux/stage1/./bin/clang
--driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer
-fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only
-O3;-g -mbackchain -O0
/home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/compiler-rt/test/asan/TestCases/Linux/init-order-dlopen.cpp
-ldl -Wl,--export-dynamic -o
/home/uweigand/sandbox/buildbot/clang-s390x-linux/stage1/projects/compiler-rt/test/asan/S390XLinuxConfig/TestCases/Linux/Output/init-order-dlopen.cpp.tmp
[...]
--
Exit Code: 127

Command Output (stderr):
--
clang-16: error: no input files
/home/uweigand/sandbox/buildbot/clang-s390x-linux/stage1/projects/compiler-rt/test/asan/S390XLinuxConfig/TestCases/Linux/Output/init-order-dlopen.cpp.script:
line 1: -g: command not found

The -O3;-g on the link line seems to come from compiler-rt/test/lit.common.configured. I suspect that can be traced down to code in compiler-rt/CMakeLists.txt which replaces " " by ";" in COMPILER_RT_TEST_COMPILER_CFLAGS. I don't really understand this and it seems terribly fragile to me, but I believe this can be avoided by prepending the new instances of setting COMPILER_RT_TEST_COMPILER_CFLAGS in compiler-rt/CMakeLists.txt with a blank. I have a patch for that and am trying to test it, but as mentioned the issue hasn't affected me. @uweigand may be able to help.

In D91620#3836854, @ro wrote:

This seems to cause another regression, apparently only visible in cross builds: clang-s390x-linux buildbot: a link failure like this:

Just to clarify, this build bot does *not* perform a cross build, it's a regular native build, just running on the s390x architecture.

******************** TEST 'AddressSanitizer-s390x-linux ::
TestCases/Linux/init-order-dlopen.cpp' FAILED ********************
Script:
[...]

: 'RUN: at line 5';
/home/uweigand/sandbox/buildbot/clang-s390x-linux/stage1/./bin/clang
--driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer
-fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only
-O3;-g -mbackchain -O0
/home/uweigand/sandbox/buildbot/clang-s390x-linux/llvm/compiler-rt/test/asan/TestCases/Linux/init-order-dlopen.cpp
-ldl -Wl,--export-dynamic -o
/home/uweigand/sandbox/buildbot/clang-s390x-linux/stage1/projects/compiler-rt/test/asan/S390XLinuxConfig/TestCases/Linux/Output/init-order-dlopen.cpp.tmp
[...]
--
Exit Code: 127

Command Output (stderr):
--
clang-16: error: no input files
/home/uweigand/sandbox/buildbot/clang-s390x-linux/stage1/projects/compiler-rt/test/asan/S390XLinuxConfig/TestCases/Linux/Output/init-order-dlopen.cpp.script:
line 1: -g: command not found

The -O3;-g on the link line seems to come from compiler-rt/test/lit.common.configured. I suspect that can be traced down to code in compiler-rt/CMakeLists.txt which replaces " " by ";" in COMPILER_RT_TEST_COMPILER_CFLAGS. I don't really understand this and it seems terribly fragile to me, but I believe this can be avoided by prepending the new instances of setting COMPILER_RT_TEST_COMPILER_CFLAGS in compiler-rt/CMakeLists.txt with a blank. I have a patch for that and am trying to test it, but as mentioned the issue hasn't affected me. @uweigand may be able to help.

I don't see anything particularly special to s390x here, with the only exception of -mbackchain. This flag is added to sanitizer tests on s390x, where we need it for the same reason other platforms tend to need -fno-omit-frame-pointer. It's unclear how that would be related to this ";" issue.

ro added a comment.Oct 5 2022, 7:38 AM

[...]

The -O3;-g on the link line seems to come from compiler-rt/test/lit.common.configured. I suspect that can be traced down to code in compiler-rt/CMakeLists.txt which replaces " " by ";" in COMPILER_RT_TEST_COMPILER_CFLAGS. I don't really understand this and it seems terribly fragile to me, but I believe this can be avoided by prepending the new instances of setting COMPILER_RT_TEST_COMPILER_CFLAGS in compiler-rt/CMakeLists.txt with a blank. I have a patch for that and am trying to test it, but as mentioned the issue hasn't affected me. @uweigand may be able to help.

I don't see anything particularly special to s390x here, with the only exception of -mbackchain. This flag is added to sanitizer tests on s390x, where we need it for the same reason other platforms tend to need -fno-omit-frame-pointer. It's unclear how that would be related to this ";" issue.

As I said, I do see target_cflags set to -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta;-O0;-g in lit.common.configured, but it had no ill effect on the the x86_64-pc-linux-gnu build. I'm running a new build with the patch to prepend a blank in every case COMPILER_RT_TEST_COMPILER_CFLAGS is set. Unfortunately (the build isn't completed yet), it just changes target_cflags to -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta; -O0; -g .

I'd really be grateful if someone could explain what this " " -> ";" substitution is for. Maybe it can just be dropped?

If we don't have a fix, better to revert it for now. I can help with debugging on these bot later, but not today.

gulfem added a subscriber: gulfem.Oct 5 2022, 9:41 AM

We started seeing test failures in AddressSanitizer. For ex:

FAIL: AddressSanitizer-x86_64-linux :: TestCases/Linux/asan_default_suppressions.cpp (254 of 12246)
******************** TEST 'AddressSanitizer-x86_64-linux :: TestCases/Linux/asan_default_suppressions.cpp' FAILED ********************
Script:
--
: 'RUN: at line 2';      /b/s/w/ir/x/w/staging/llvm_build/./bin/clang  --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only   -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta;-O3;-gline-tables-only   /b/s/w/ir/x/w/llvm-llvm-project/compiler-rt/test/asan/TestCases/Linux/asan_default_suppressions.cpp -o /b/s/w/ir/x/w/staging/llvm_build/runtimes/runtimes-x86_64-unknown-linux-gnu-bins/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Linux/Output/asan_default_suppressions.cpp.tmp && not  /b/s/w/ir/x/w/staging/llvm_build/runtimes/runtimes-x86_64-unknown-linux-gnu-bins/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Linux/Output/asan_default_suppressions.cpp.tmp 2>&1 | FileCheck /b/s/w/ir/x/w/llvm-llvm-project/compiler-rt/test/asan/TestCases/Linux/asan_default_suppressions.cpp
--
Exit Code: 127

Command Output (stderr):
--
clang-16: error: no input files
/b/s/w/ir/x/w/staging/llvm_build/runtimes/runtimes-x86_64-unknown-linux-gnu-bins/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Linux/Output/asan_default_suppressions.cpp.script: line 1: -O3: command not found
/b/s/w/ir/x/w/staging/llvm_build/runtimes/runtimes-x86_64-unknown-linux-gnu-bins/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Linux/Output/asan_default_suppressions.cpp.script: line 1: -gline-tables-only: command not found

--

https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8801158669988232321/+/u/clang/test/stdout

vitalybuka reopened this revision.Oct 5 2022, 9:59 AM
This revision is now accepted and ready to land.Oct 5 2022, 9:59 AM
ro added a comment.Oct 6 2022, 4:45 AM

I think I've figured out what's wrong. AFAICS there are three separate issues:

  • Stray ; in compiler flags:

    cmake --trace revealed that while SANITIZER_COMMON_CFLAGS is added to with list(APPEND), all additions to COMPILER_RT_TEST_COMPILER_CFLAGS are with string(APPEND), which explains the vs. ; difference. While this difference is highly confusing, it's easily fixed.
  • Tests FAILing in Debug/COMPILER_RT_DEBUG builds:

    E.g ThreadSanitizer-x86_64 :: Linux/check_memcpy.c which now contains calls to memset when previously there were none. This can be explained by the change from -O1 to -O0, is to be expected IMO and can be avoided by XFAILing the test. The only thing I'm not certain about is if this affects only select targets (x86_64 so far) or all.
  • Tests FAILing in Release builds:

    There are quite a number of those, e.g. Issue #58169, SanitizerCommon-asan-aarch64-Linux :: Posix/weak_hook_test.cpp. Comparing compile commands for the latter between Linux/aarch64 and Linux/x86_64 reveals: the former uses -gline-tables-only -fsanitize=address -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta -O3 -gline-tables-only -funwind-tables while the latter only has -gline-tables-only -fsanitize=address -m64 -funwind-tables , i.e. -O3 among others is lost/missing. If I build the test on Linux/x86_64 with -O3, it fails the same way as on Linux/aarch64. I found that lit.common.configured has
set_default("target_cflags", " -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta -O3 -gline-tables-only ")

on Linux/x86_64, too, but that's later overridden in asan/X86_64LinuxConfig/lit.site.cfg.py which has just

config.target_cflags = "-m64"

However, there's more: checking cmake --trace again shows that before my patch COMPILER_RT_TEST_COMPILER_CFLAGS is only changed/set like

compiler-rt/CMakeLists.txt(400):  string(APPEND COMPILER_RT_TEST_COMPILER_CFLAGS  ${thread_safety_flags_space_sep} )
compiler-rt/CMakeLists.txt(538):  string(APPEND COMPILER_RT_TEST_COMPILER_CFLAGS  ${stdlib_flag} )
compiler-rt/CMakeLists.txt(542):  string(REPLACE   ; COMPILER_RT_UNITTEST_CFLAGS ${COMPILER_RT_TEST_COMPILER_CFLAGS} )

while now there's

compiler-rt/CMakeLists.txt(400):  string(APPEND COMPILER_RT_TEST_COMPILER_CFLAGS  ${thread_safety_flags_space_sep} )
compiler-rt/CMakeLists.txt(413):  string(APPEND COMPILER_RT_TEST_COMPILER_CFLAGS  -O3 )
compiler-rt/CMakeLists.txt(459):  string(APPEND COMPILER_RT_TEST_COMPILER_CFLAGS  -gline-tables-only )
compiler-rt/CMakeLists.txt(546):  string(APPEND COMPILER_RT_TEST_COMPILER_CFLAGS  ${stdlib_flag} )
compiler-rt/CMakeLists.txt(550):  string(REPLACE   ; COMPILER_RT_UNITTEST_CFLAGS ${COMPILER_RT_TEST_COMPILER_CFLAGS} )

So, -O3 is added when it wasn't there before. While I believe this is correct (a Release build should be tested with optimization), but given the fallout this should be omitted for now and handled separately.

In addition to this, I believe I've found a way to define an instantiation of __sanitizer::integral_constant<bool, true>::value, based on what GCC's <type_traits>. The only thing I'm unsure about is if we can rely on the presence of __cpp_inline_variables.

ro updated this revision to Diff 465701.Oct 6 2022, 4:51 AM
  • Use string(APPEND) for COMPILER_RT_TEST_COMPILER_CFLAGS.
  • Omit -O3 from COMPILER_RT_TEST_COMPILER_CFLAGS in non-debug builds for now.
  • Provide __sanitizer::integral_constant<bool, true>::value instantiation.
  • XFAIL tsan/Linux/check_memcpy.c in debug builds.
ro added a comment.Oct 6 2022, 4:52 AM

I'd be grateful if the owners of the affected builders could retest the patch before relanding, preferably in Release and Debug/COMPILER_RT_DEBUG builds. Thanks a lot.

gulfem added a comment.EditedOct 6 2022, 11:56 AM
In D91620#3839716, @ro wrote:

I'd be grateful if the owners of the affected builders could retest the patch before relanding, preferably in Release and Debug/COMPILER_RT_DEBUG builds. Thanks a lot.

I locally verified that with we do not see a timeout issue in Clang toolchain builders for Fuchsia in the latest patchset.

vitalybuka added a comment.EditedOct 11 2022, 1:51 PM

On zorg/buildbot/builders/sanitizers/buildbot_standard.sh (or sanitizer-x86_64-linux-autoconf) I see

+ cd tsan_debug_build
+ ninja compiler-rt-clear
[1/1] Clobberring compiler-rt build and stamp directories
+ cd tsan_debug_build
+ ninja tsan
ninja: error: 'tools/clang/runtime/compiler-rt-stamps/compiler-rt-source_dirinfo.txt', needed by 'tools/clang/runtime/compiler-rt-stamps/compiler-rt-download', missing and no known

Please disregard this. It was something wrong with my setup.

Please disregard this. It was something wrong with my setup.

So I applied the patch on the actual bot and still see tsan go timeouts.

Maybe we go incrementally with -O0 -> -O1 in this patch? And if needed figure out -O0 in follow up.

vitalybuka added inline comments.Oct 11 2022, 7:07 PM
compiler-rt/test/tsan/Linux/check_memcpy.c
10

I see, XFAIL will run the test. So if it's about timeouts, it will still timeout.

Just make it
// REQUIRES: compiler-rt-optimized

vitalybuka accepted this revision.Oct 12 2022, 1:43 PM

Patched sanitizer-aarch64-linux-bootstrap-hwasan looks good to me too.

compiler-rt/test/tsan/Linux/check_memcpy.c
10
ro added a comment.Oct 14 2022, 5:34 AM

Please disregard this. It was something wrong with my setup.

So I applied the patch on the actual bot and still see tsan go timeouts.

I hadn't seen anything like this, but investgation led me down a rathole:

  • At first I thought this was because my Linux/x86_64 test system had neither go nor gccgo installed. However, adding go didn't make a difference in test results.
  • Looking for TSan Go tests initially found nothing. Later, I stumbled across buildgo.sh, which is a odd beast in various ways.
  • It builds the test behind the back of the build system, so ninja -v shows nothing.
  • It also runs the test outside of the testsuite (lit) and produces no test results (PASS/FAIL/UNSUPPORTED), so the test doesn't show u p in the summary.
  • The ninja check-all output shows
[295/1379] Checking TSan Go runtime...
/vol/llvm/src/llvm-project/local/compiler-rt/lib/tsan/rtl/../go/buildgo.sh: 62: [: unexpected operator
  • buildgo.sh uses == which is unportable, as documented e.g. in bash(1). Ubuntu 20/04 /bin/sh is dash, which doesn't support it. In the same place, it refers to a variable GOAMD64 which is never set anywhere AFAICT.

Whatever the case, even if run manually with bash, succeeds for me, no timeout or anything.

Maybe we go incrementally with -O0 -> -O1 in this patch? And if needed figure out -O0 in follow up.

In all my attempts, -O1 builds were useless for debugging because so many variables were optimized away. However, if that's a way to get the bulk of the patch in and only need a one-byte change to get debuggable output, that's ok, I guess.

However, speaking of timeouts there are four instances of the same issue I see on x86_64-pc-linux-gnu: those tests just hang and have to be killed manually. The test summary just shows

ThreadSanitizer-Unit :: unit/./TsanUnitTest-x86_64-Test/31/41
ThreadSanitizer-Unit :: unit/./TsanUnitTest-x86_64-Test/32/41
ThreadSanitizer-Unit :: unit/./TsanUnitTest-x86_64-Test/33/41
ThreadSanitizer-Unit :: unit/./TsanUnitTest-x86_64-Test/34/41

Here gtest sharding is raising its ugly head again (this feature has only caused problems so far). To investigate, I let TsanUnitTest-x86_64-Test list its subtests and ran them one by one. The four failing ones are

Trace.RestoreAccess
Trace.MemoryAccessSize
Trace.RestoreMutexLock
Trace.MultiPart

which all show the same failure mode:

[ RUN      ] Trace.MultiPart
ThreadSanitizer: internal deadlock: can't lock Slots under Slots mutex
ThreadSanitizer: internal deadlock: can't lock Slots under Slots mutex

I've augmented my patch to just disable those subtests if SANITIZER_DEBUG.

compiler-rt/test/tsan/Linux/check_memcpy.c
10

I see, XFAIL will run the test. So if it's about timeouts, it will still timeout.

Just make it
// REQUIRES: compiler-rt-optimized

It's not: the test FAILs at -O0 with

/vol/llvm/src/llvm-project/local-debug/compiler-rt/test/tsan/Linux/check_memcpy.c:17:15: error: CHECK-NOT: excluded string found in input
// CHECK-NOT: callq {{.*<(__interceptor_)?mem(cpy|set)>}}
              ^
<stdin>:47254:24: note: found here
 454dc: e8 4f d4 02 00 callq 0x72930 <memset>
                       ^~~~~~~~~~~~~~~~~~~~~~

i.e. memset isn't inlined unlike at -O1. This is to be expected, I think, thus the XFAIL.

ro updated this revision to Diff 467753.Oct 14 2022, 5:36 AM

Disable subtests of tsan/tests/unit/tsan_trace_test.cpp that deadlock with SANITIZER_DEBUG.

vitalybuka accepted this revision.Oct 14 2022, 1:10 PM
vitalybuka added inline comments.
compiler-rt/test/tsan/Linux/check_memcpy.c
10

I see, XFAIL will run the test. So if it's about timeouts, it will still timeout.

Just make it
// REQUIRES: compiler-rt-optimized

It's not: the test FAILs at -O0 with

/vol/llvm/src/llvm-project/local-debug/compiler-rt/test/tsan/Linux/check_memcpy.c:17:15: error: CHECK-NOT: excluded string found in input
// CHECK-NOT: callq {{.*<(__interceptor_)?mem(cpy|set)>}}
              ^
<stdin>:47254:24: note: found here
 454dc: e8 4f d4 02 00 callq 0x72930 <memset>
                       ^~~~~~~~~~~~~~~~~~~~~~

i.e. memset isn't inlined unlike at -O1. This is to be expected, I think, thus the XFAIL.

Thanks. Looks like I identified the source of timeouts incorrectly.
I guess you should try to land it. If necessary we can revert and reiterate again.

ro added inline comments.Oct 14 2022, 1:16 PM
compiler-rt/test/tsan/Linux/check_memcpy.c
10

Thanks. Looks like I identified the source of timeouts incorrectly.
I guess you should try to land it. If necessary we can revert and reiterate again.

I'll be travelling for a week starting this Sunday, so I'd rather wait with relanding until after I return. Thanks.

ro added a comment.Dec 13 2022, 1:56 AM

I got distracted after returning from vacation, but I'll reland the patch shortly after retesting on x86_64-pc-linux-gnu.

This revision was landed with ongoing or failed builds.Dec 13 2022, 1:59 AM
This revision was automatically updated to reflect the committed changes.
haowei added a subscriber: haowei.Dec 13 2022, 12:53 PM

We are seeing following test failures on Fuchsia's Clang Linux x64 builders after this change (255c3e3dcb06299aa2365f70817322a8a381c351) landed:

MemorySanitizer-Unit :: ./Msan-x86_64-Test/34/38
MemorySanitizer-Unit :: ./Msan-x86_64-Test/MemorySanitizer/SelectPartial
MemorySanitizer-Unit :: ./Msan-x86_64-with-call-Test/34/38
MemorySanitizer-Unit :: ./Msan-x86_64-with-call-Test/MemorySanitizer/SelectPartial
MemorySanitizer-Unit :: ./Msan-x86_64-with-call-Test/MemorySanitizer/SelectPartial
Profile-x86_64 :: Linux/comdat_rename.test
Profile-x86_64 :: Linux/instrprof-show-debug-info-correlation.c
Profile-x86_64 :: instrprof-dump.c
Profile-x86_64 :: instrprof-write-file-only.c
Profile-x86_64 :: instrprof-write-file.c

Example of error message:

Script:
--
/b/s/w/ir/x/w/staging/llvm_build/runtimes/runtimes-x86_64-unknown-linux-gnu-bins/compiler-rt/lib/msan/tests/./Msan-x86_64-Test --gtest_filter=MemorySanitizer.SelectPartial
--
compiler-rt/lib/msan/tests/msan_test.cpp:178
Expected: (-1) != (__msan_test_shadow((void*)&t, sizeof(t))), actual: -1 vs -1

Link to the failed builder: https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8794900470653863441/overview

Please take a look. And if it takes a long time to fix, could you revert this change please?

ro added a comment.Dec 13 2022, 1:05 PM

We are seeing following test failures on Fuchsia's Clang Linux x64 builders after this change (255c3e3dcb06299aa2365f70817322a8a381c351) landed:

MemorySanitizer-Unit :: ./Msan-x86_64-Test/34/38
MemorySanitizer-Unit :: ./Msan-x86_64-Test/MemorySanitizer/SelectPartial
MemorySanitizer-Unit :: ./Msan-x86_64-with-call-Test/34/38
MemorySanitizer-Unit :: ./Msan-x86_64-with-call-Test/MemorySanitizer/SelectPartial
MemorySanitizer-Unit :: ./Msan-x86_64-with-call-Test/MemorySanitizer/SelectPartial

I don't see any of those. In fact, Msan-x86_64-Test etc. aren't even built during my attempts, even in a vanilla tree. Initially, this seems to have been due to the lack of libcxx and libcxxabi, but even with those no dice. Unless someone can tell me what's wrong (or, better yet, check the differences without and with my patch), I fear there's nothing I can do.

Profile-x86_64 :: Linux/comdat_rename.test
Profile-x86_64 :: Linux/instrprof-show-debug-info-correlation.c
Profile-x86_64 :: instrprof-dump.c
Profile-x86_64 :: instrprof-write-file-only.c
Profile-x86_64 :: instrprof-write-file.c

All of those PASS for me. Again, not the slightest idea what's different in your builds.

Link to the failed builder: https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8794900470653863441/overview

At least those build logs are ways more useful than the ones from the LLVM buildbots: the latter lack all the crucial details like exact compile commands etc.

Please take a look. And if it takes a long time to fix, could you revert this change please?

I heavly rely on the builder maintainer's help here, when none of this is reproducible on my systems. Otherwise I'll have to give up on this patch for good.

What is your cmake command line when you test the code (when all Profile-x86_64 tests passes)?

Those tests shouldn't be affected by the way Fuchsia build the clang toolchain.
I am trying to reproduce the test failures without Fuchsia related stuff and see if I can get a reproducer.

So I tried

cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang;lld" -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;compiler-rt" ../llvm -DCMAKE_C_COMPILER=/mnt/nvme_sec/SRC/fuchsia/prebuilt/third_party/clang/linux-x64/bin/clang -DCMAKE_CXX_COMPILER=/mnt/nvme_sec/SRC/fuchsia/prebuilt/third_party/clang/linux-x64/bin/clang++ -DLLVM_ENABLE_LLD=On -DLLVM_TARGETS_TO_BUILD="X86"

and run

ninja
ninja check-runtimes

And I triggered those MSAN failures. The host toolchain I used are the prebuilts we built and can be found at https://chrome-infra-packages.appspot.com/p/fuchsia/third_party/clang/linux-amd64/+/integration The host toolchain shouldn't matter much as the runtimes should be built by the stage1 toolchain from the source.

Example of the test output:

FAIL: MemorySanitizer-Unit :: ./Msan-x86_64-Test/37/75 (15519 of 16268)
******************** TEST 'MemorySanitizer-Unit :: ./Msan-x86_64-Test/37/75' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/mnt/nvme_sec/SRC/llvm-project/build-cmake/msan-failure/runtimes/runtimes-bins/compiler-rt/lib/msan/tests/./Msan-x86_64-Test-MemorySanitizer-Unit-3368237-37-75.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=75 GTEST_SHARD_INDEX=37 /mnt/nvme_sec/SRC/llvm-project/build-cmake/msan-failure/runtimes/runtimes-bins/compiler-rt/lib/msan/tests/./Msan-x86_64-Test
--

Note: This is test shard 38 of 75.
[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from MemorySanitizer
[ RUN      ] MemorySanitizer.pipe2
[       OK ] MemorySanitizer.pipe2 (0 ms)
[ RUN      ] MemorySanitizer.wcstod
[       OK ] MemorySanitizer.wcstod (0 ms)
[ RUN      ] MemorySanitizer.VAArgManyTest
[       OK ] MemorySanitizer.VAArgManyTest (0 ms)
[ RUN      ] MemorySanitizer.Bmi
MemorySanitizer:DEADLYSIGNAL
==229427==ERROR: MemorySanitizer: SEGV on unknown address 0x000000000000 (pc 0x55d6565b0663 bp 0x7ffc5788abb0 sp 0x7ffc5788ab60 T229427)
==229427==The signal is caused by a WRITE memory access.
==229427==Hint: address points to the zero page.
    #0 0x55d6565b0663 in HaveBmi() /mnt/nvme_sec/SRC/llvm-project/compiler-rt/lib/msan/tests/msan_test.cpp:4714:3
    #1 0x55d6565b04d2 in MemorySanitizer_Bmi_Test::TestBody() /mnt/nvme_sec/SRC/llvm-project/compiler-rt/lib/msan/tests/msan_test.cpp:4846:7
    #2 0x55d6569943a4 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /mnt/nvme_sec/SRC/llvm-project/runtimes/../third-party/unittest/googletest/src/gtest.cc:2433:10
    #3 0x55d6567dfa76 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /mnt/nvme_sec/SRC/llvm-project/runtimes/../third-party/unittest/googletest/src/gtest.cc:2469:14
    #4 0x55d6567df15e in testing::Test::Run() /mnt/nvme_sec/SRC/llvm-project/runtimes/../third-party/unittest/googletest/src/gtest.cc:2508:5
    #5 0x55d6567e69a0 in testing::TestInfo::Run() /mnt/nvme_sec/SRC/llvm-project/runtimes/../third-party/unittest/googletest/src/gtest.cc:2684:11
    #6 0x55d6567ee1ef in testing::TestSuite::Run() /mnt/nvme_sec/SRC/llvm-project/runtimes/../third-party/unittest/googletest/src/gtest.cc:2816:28
    #7 0x55d65684106a in testing::internal::UnitTestImpl::RunAllTests() /mnt/nvme_sec/SRC/llvm-project/runtimes/../third-party/unittest/googletest/src/gtest.cc:5338:44
    #8 0x55d6569cda4f in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /mnt/nvme_sec/SRC/llvm-project/runtimes/../third-party/unittest/googletest/src/gtest.cc:2433:10
    #9 0x55d65683dd2d in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /mnt/nvme_sec/SRC/llvm-project/runtimes/../third-party/unittest/googletest/src/gtest.cc:2469:14
    #10 0x55d65683cc1e in testing::UnitTest::Run() /mnt/nvme_sec/SRC/llvm-project/runtimes/../third-party/unittest/googletest/src/gtest.cc:4925:10
    #11 0x55d656779555 in RUN_ALL_TESTS() /mnt/nvme_sec/SRC/llvm-project/runtimes/../third-party/unittest/googletest/include/gtest/gtest.h:2472:46
    #12 0x55d6567792e5 in main /mnt/nvme_sec/SRC/llvm-project/compiler-rt/lib/msan/tests/msan_test_main.cpp:19:10
    #13 0x7f3142229209 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #14 0x7f31422292bb in __libc_start_main csu/../csu/libc-start.c:389:3
    #15 0x55d65627a420 in _start (/mnt/nvme_sec/SRC/llvm-project/build-cmake/msan-failure/runtimes/runtimes-bins/compiler-rt/lib/msan/tests/Msan-x86_64-Test+0x16e420)

MemorySanitizer can not provide additional info.
SUMMARY: MemorySanitizer: SEGV /mnt/nvme_sec/SRC/llvm-project/compiler-rt/lib/msan/tests/msan_test.cpp:4714:3 in HaveBmi()
==229427==ABORTING

--
exit: 1

The profile tests are seems to not being set using this command line so they didn't run.

Those MSAN tests didn't fail when I use the same cmake/ninja command to build 0fa8f29a11c70426dda74158716ccd13dd7ff81a (1 patch ahead of this one).
Since this is an comfirmed test failure caused by this patch and we have a reproducer. Can we revert this change while it is being fixed?

I tried again with

cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang;lld" -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;compiler-rt" ../llvm -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_TARGETS_TO_BUILD="X86"

ninja

ninja check-runtimes

Which uses the host clang from debian and disabled lld. The same MSAN tests are still failing.

I reverted this change as this is a confirmed breakage now.

I can confirm my frustration with sanitizer settings -O for various configurations... I'd like that we move toward a cleaner state that no automatic -O is added. The user is free to specify explicit CXXFLAGS by themselves.

compiler-rt/CMakeLists.txt sets list(APPEND SANITIZER_COMMON_CFLAGS -O1) for COMPILER_RT_DEBUG. Even -O1 is oftentimes problematic for runtime debugging...

For a sanitized llvm-project build, LLVM_OPTIMIZE_SANITIZED_BUILDS changes -O0 to -O1.