This is an archive of the discontinued LLVM Phabricator instance.

[libunwind] Fix testing with sanitizers enabled
ClosedPublic

Authored by ldionne on Nov 22 2021, 12:01 PM.

Details

Reviewers
MaskRay
Group Reviewers
Restricted Project
Restricted Project
Commits
rGebfeeec4c4bc: [libunwind] Fix testing with sanitizers enabled
Summary

When testing with sanitizers enabled, we need to link against a plethora
of system libraries. Using -nodefaultlibs like we used to breaks this,
and we would have to add all these system libraries manually, which is
not portable and error prone. Instead, stop using -nodefaultlibs so
that we get the libraries added by default by the compiler.

The only caveat with this approach is that we are now relying on the
fact that -L <path-to-local-libunwind> will cause the just built
libunwind to be selected before the system implementation (either of
libunwind or libgcc_s.so), which is somewhat fragile.

Diff Detail

Event Timeline

ldionne created this revision.Nov 22 2021, 12:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 22 2021, 12:01 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
ldionne requested review of this revision.Nov 22 2021, 12:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 22 2021, 12:01 PM
MaskRay added inline comments.
libunwind/test/configs/llvm-libunwind-shared.cfg.in
50

-ldl needs to be retained, otherwise on Linux x86-64:

ld.lld: error: undefined symbol: dladdr                                                                                                                                                                                                  
>>> referenced by forceunwind.pass.cpp                                                                                                                                                                                                   
>>>               /tmp/forceunwind-7489af.o:(stop(int, _Unwind_Action, unsigned long, _Unwind_Exception*, _Unwind_Context*, void*))                                                                                    
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)                                                            

error: command failed with exit status: 1                              

--                                                                     

********************                                                   
********************                                                   
Failed Tests (3):                                                      
  llvm-libunwind-shared.cfg.in :: forceunwind.pass.cpp                                                                                        
  llvm-libunwind-shared.cfg.in :: signal_unwind.pass.cpp                                                                                      
  llvm-libunwind-shared.cfg.in :: unwind_leaffunction.pass.cpp
libunwind/test/configs/llvm-libunwind-static.cfg.in
52

ditto for -ldl

MaskRay accepted this revision.EditedNov 22 2021, 3:02 PM

When testing with sanitizers enabled, we need to link against a plethora of system libraries.

I do now know how to enable sanitizers for ninja check-unwind.

-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_USE_SANITIZER=Address -DLLVM_ENABLE_PROJECTS='clang' -DLLVM_ENABLE_RUNTIMES='libcxx;libcxxabi;libunwind;compiler-rt' doesn't pass -fsanitize= to ninja check-unwind tests.

Using -nodefaultlibs like we used to breaks this, and we would have to add all these system libraries manually, which is not portable and error prone.

That said, I agree with removing -nodefaultlibs.

The tests will pass if -ldl is added back.

This revision is now accepted and ready to land.Nov 22 2021, 3:02 PM
ldionne updated this revision to Diff 389191.Nov 23 2021, 6:48 AM

Add back -ldl. Thanks for the review!

MaskRay accepted this revision.Nov 23 2021, 11:14 AM
ldionne updated this revision to Diff 389302.Nov 23 2021, 1:17 PM

Fix CI.

The unwind_leaffunction.pass.cpp test was failing with ubsan because it was
relying on undefined behavior to "do the right thing" and result in a SIGSEGV.
IMO it makes more sense to simply raise a SIGSEGV directly, since that is guaranteed
to work.

The only CI that will still be failing is the 32 bits CI. I traced it back to the fact
that there is no /usr/lib32/libstdc++.so provided on Ubuntu, which leads to ld being
unable to find the library for -lstdc++ (yes, we do link against the system's C++ library
unconditionally now -- we need to, otherwise the sanitizer builds can't work).

I don't know who relies on the 32 bits configuration. We have been supporting it historically,
but in practice it doesn't seem to be very well supported (for example the missing libstdc++
library on Ubuntu). Furthermore, we started seeing a bunch of novel and extremely hard to explain
issues when running 32 bits CI after upgrading our version of Ubuntu. I will create a separate
review with a proposal to remove support for building in 32 bits mode and see what happens there.

I am going to make the 32 bit CI build a soft failure until we remove it entirely and ship this patch. See D114473 for removing the 32 bit multilib support.

This revision was landed with ongoing or failed builds.Nov 25 2021, 12:28 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptNov 25 2021, 12:28 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Herald added a subscriber: arichardson. · View Herald Transcript

The unwind_leaffunction.pass.cpp test was failing with ubsan because it was
relying on undefined behavior to "do the right thing" and result in a SIGSEGV.
IMO it makes more sense to simply raise a SIGSEGV directly, since that is guaranteed
to work.

It looks like using raise causes this test to fail on our aarch64 builder with exit code 255 (from the _Exit(-1) in signal_handler). I'm unsure why it would fail when using raise. These are a couple of things I noticed when debugging:

  • If I replace raise(SIGSEGV) with kill(getpid(), SIGSEGV), the test passes normally. The man page for raise says this is functionally equivalent.
  • If I manually print out the frame names from info.dli_sname, I notice with raise I only cover 2 frames (_Z14signal_handleri, __kernel_rt_sigreturn) but with kill I get 5 frames (_Z14signal_handleri, __kernel_rt_sigreturn, kill, _Z18crashing_leaf_funcv, main).
    • If I use the old UB approach, I get 4 frames (just excluding the call to kill).

Any ideas for why this might be the case?

The unwind_leaffunction.pass.cpp test was failing with ubsan because it was
relying on undefined behavior to "do the right thing" and result in a SIGSEGV.
IMO it makes more sense to simply raise a SIGSEGV directly, since that is guaranteed
to work.

It looks like using raise causes this test to fail on our aarch64 builder with exit code 255 (from the _Exit(-1) in signal_handler). I'm unsure why it would fail when using raise. These are a couple of things I noticed when debugging:

  • If I replace raise(SIGSEGV) with kill(getpid(), SIGSEGV), the test passes normally. The man page for raise says this is functionally equivalent.
  • If I manually print out the frame names from info.dli_sname, I notice with raise I only cover 2 frames (_Z14signal_handleri, __kernel_rt_sigreturn) but with kill I get 5 frames (_Z14signal_handleri, __kernel_rt_sigreturn, kill, _Z18crashing_leaf_funcv, main).
    • If I use the old UB approach, I get 4 frames (just excluding the call to kill).

Any ideas for why this might be the case?

No, that doesn't ring a bell. I think this is a question for whoever provides the C Standard Library on your aarch64 bot. Perhaps libunwind is somehow unable to understand the stack frames below raise before it gets to the kernel? Honestly I'm kind of shooting in the dark.

I'd be fine with switching to kill(getpid(), SIGSEGV) temporarily to unbreak your bot, but I'd like to avoid doing a blind fix and get the ball rolling on this issue with the C library folks. Do you think you can provide a patch to switch to kill?

No, that doesn't ring a bell. I think this is a question for whoever provides the C Standard Library on your aarch64 bot. Perhaps libunwind is somehow unable to understand the stack frames below raise before it gets to the kernel? Honestly I'm kind of shooting in the dark.

I'd be fine with switching to kill(getpid(), SIGSEGV) temporarily to unbreak your bot, but I'd like to avoid doing a blind fix and get the ball rolling on this issue with the C library folks. Do you think you can provide a patch to switch to kill?

I have D114743 up for review. Asking around to see why we might be running into this issue before submitting.

Ok I did more digging and it looks like the issue is that my host libc doesn't have an FDE for raise, but it does have one for kill (which I don't have an explanation for). I suppose that would mean D114743 isn't a bad alternative, but I guess this means it could also fail again for any user with a libc that doesn't happen to have unwind info for kill (could be a wait-and-see scenario). If we want to avoid the potential missing unwind info altogether, we could perhaps having some inline assembly that does a bad memory reference, but that might not be very portable. Thoughts?

phosek added a subscriber: phosek.Nov 29 2021, 10:21 PM

Ok I did more digging and it looks like the issue is that my host libc doesn't have an FDE for raise, but it does have one for kill (which I don't have an explanation for). I suppose that would mean D114743 isn't a bad alternative, but I guess this means it could also fail again for any user with a libc that doesn't happen to have unwind info for kill (could be a wait-and-see scenario). If we want to avoid the potential missing unwind info altogether, we could perhaps having some inline assembly that does a bad memory reference, but that might not be very portable. Thoughts?

I'd just point out that the host libc in this case is glibc shipped with Ubuntu 18.04 on arm64 which is a popular distribution (although it's possible that arm64 version doesn't receive as much testing as amd64).

danielkiss added inline comments.
libunwind/test/unwind_leaffunction.pass.cpp
41–43

adding a call here turns the leaf function into a not-leaf function which changes the intention of this test.

ldionne added inline comments.Nov 30 2021, 2:44 AM
libunwind/test/unwind_leaffunction.pass.cpp
41–43

Ugh, that's a good point. I'm not sure how to trigger SIGSEGV without invoking a function nor invoking undefined behavior. Do you have a suggestion?

Would using __builtin_trap() do any kind of good?

I have D114743 up for review. Asking around to see why we might be running into this issue before submitting.

Thanks a lot. Per @danielkiss 's comment, I think we'll need to find a way to do what we want without calling any other function, otherwise we're missing the point of this test entirely. Go ahead and ship D114743 just so it unbreaks the bots until we find a real solution.

danielkiss added inline comments.Nov 30 2021, 8:43 AM
libunwind/test/unwind_leaffunction.pass.cpp
41–43

__builtin_trap() looks good from codegen / unwind info point of view.

it generates SIGILL on X86, while SIGSEGV on Aarch64 but with the below change it should work.

int main(int, char**) {
  signal(SIGSEGV, signal_handler);
  signal(SIGILL, signal_handler);
danielkiss added inline comments.Nov 30 2021, 8:47 AM
libunwind/test/unwind_leaffunction.pass.cpp
41–43

mybad, it is SIGTRAP on linux/aarch64.

signal(SIGTRAP, signal_handler);
signal(SIGILL, signal_handler);