This is an archive of the discontinued LLVM Phabricator instance.

[sanitizer] Fall back to fast unwinder
ClosedPublic

Authored by MaskRay on May 6 2021, 9:35 PM.

Details

Reviewers
vitalybuka
yln
Group Reviewers
Restricted Project
Commits
rGfa27255d16c3: [sanitizer] Fall back to fast unwinder
Summary

-fno-exceptions -fno-asynchronous-unwind-tables compiled programs don't
produce .eh_frame on Linux and other ELF platforms, so the slow unwinder cannot
print stack traces. Just fall back to the fast unwinder: this allows
-fno-asynchronous-unwind-tables without requiring the sanitizer option
fast_unwind_on_fatal=1

Diff Detail

Event Timeline

MaskRay requested review of this revision.May 6 2021, 9:35 PM
MaskRay created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptMay 6 2021, 9:35 PM
vitalybuka added inline comments.May 13 2021, 5:17 PM
compiler-rt/lib/lsan/lsan.cpp
38 ↗(On Diff #343576)

what do you think if we have a single
void __sanitizer::BufferedStackTrace::UnwindImpl in sanitizer_common

and only make sanitizers to implement own GetStackBounds()

we have common GetThreadStackTopAndBottom but it's going to slower without info some sanitizers have

MaskRay added inline comments.May 13 2021, 6:03 PM
compiler-rt/lib/lsan/lsan.cpp
38 ↗(On Diff #343576)

Happy to investigate the code sharing idea after this patch.

Doing the simplification in this patch would change too many things.

vitalybuka accepted this revision.May 13 2021, 7:54 PM
vitalybuka added inline comments.
compiler-rt/lib/lsan/lsan.cpp
38 ↗(On Diff #343576)

Could you land BufferedStackTrace::UnwindImpl change in a separate first patch
and BufferedStackTrace::Unwind in the second
I believe the first one is expected to be NFC
Can be useful to bisect if I am wrong and it actually break stuff somewhere.

compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cpp
87

if (size > 2 || size >= max_depth)

This revision is now accepted and ready to land.May 13 2021, 7:54 PM
MaskRay updated this revision to Diff 345348.May 13 2021, 9:29 PM
MaskRay marked an inline comment as done.

Pushed the simplification as 261d6e05d5574bec753ea6b7e9a7f99229927753

Will wait til tomorrow to ensure the first commit is stable.

MaskRay marked an inline comment as done.May 13 2021, 9:29 PM
This revision was landed with ongoing or failed builds.May 14 2021, 12:27 PM
This revision was automatically updated to reflect the committed changes.

The new test case fails on s390x with "WARNING: Symbolizer buffer too small" from SymbolizerProcess::ReadFromSymbolizer, and it is indeed true that the full backtrace at this point does not fit in the default buffer size of 16 KB. In the innermost function we have 1024 instances of the string "std::vector<int, std::allocator<int> >" in the function type, which in itself is already 40 KB. Plus the higher stack frames as well ...

I can make it pass again by increasing the buffer size to 128 KB. But I'm not quite sure if that is the intended behavior here?

The new test case fails on s390x with "WARNING: Symbolizer buffer too small" from SymbolizerProcess::ReadFromSymbolizer, and it is indeed true that the full backtrace at this point does not fit in the default buffer size of 16 KB. In the innermost function we have 1024 instances of the string "std::vector<int, std::allocator<int> >" in the function type, which in itself is already 40 KB. Plus the higher stack frames as well ...

I can make it pass again by increasing the buffer size to 128 KB. But I'm not quite sure if that is the intended behavior here?

I copied the test code from ../symbolize_stack.cpp, so presumably that file may have a similar issue?
Both give me WARNING: Symbolizer buffer too small on my x86-64 Linux machine.

How about s/A<10>()/A<7>()/? Does that pass on s390x?

The new test case fails on s390x with "WARNING: Symbolizer buffer too small" from SymbolizerProcess::ReadFromSymbolizer, and it is indeed true that the full backtrace at this point does not fit in the default buffer size of 16 KB. In the innermost function we have 1024 instances of the string "std::vector<int, std::allocator<int> >" in the function type, which in itself is already 40 KB. Plus the higher stack frames as well ...

I can make it pass again by increasing the buffer size to 128 KB. But I'm not quite sure if that is the intended behavior here?

I copied the test code from ../symbolize_stack.cpp, so presumably that file may have a similar issue?
Both give me WARNING: Symbolizer buffer too small on my x86-64 Linux machine.

Well, the symbolize_stack.cpp passes for me on s390x for some reason.

How about s/A<10>()/A<7>()/? Does that pass on s390x?

This passes, yes.

How about s/A<10>()/A<7>()/? Does that pass on s390x?

This passes, yes.

Can we make that change in mainline? I'd really like to get our build bots green again ...

How about s/A<10>()/A<7>()/? Does that pass on s390x?

This passes, yes.

Can we make that change in mainline? I'd really like to get our build bots green again ...

Pushed 3a678fe3e29fe48d9b4efc936edd2ac43c720bae

You may want to investigate ../symbolize_stack.cpp further why that test passes while this fails...
And why "WARNING: Symbolizer buffer too small" is benign on other arches while being a hard error on s390x...

uweigand added a comment.EditedMay 31 2021, 6:00 AM

How about s/A<10>()/A<7>()/? Does that pass on s390x?

This passes, yes.

Can we make that change in mainline? I'd really like to get our build bots green again ...

Pushed 3a678fe3e29fe48d9b4efc936edd2ac43c720bae

Thanks! With a bunch unrelated issues also resolved, all our build bots are finally green again!

You may want to investigate ../symbolize_stack.cpp further why that test passes while this fails...
And why "WARNING: Symbolizer buffer too small" is benign on other arches while being a hard error on s390x...

I've looked into this a bit more, and the actual problem seems to be that for some reason the unwinder fails to unwind beyond the innermost RecursiveTemplateFunction frame. Now, in the A<7> case, the line printed for that innermost frame already matches the {{vector<.*vector<.*vector<.*vector<.*vector<}} string that FileCheck checks for, and therefore the test passes. In the A<10> case, for the innermost frame, no decoded line is printed (because it exceeds the buffer size), and because unwind fails to display outer frames, that FileCheck string never matches anywhere, and the test fails. On other target where the unwinder (presumably) works, the innermost frame will still have no decoded line, but the FileCheck string will match in one of the outer frames, and therefore the test passes again.

I'll have a look why the unwinder doesn't work here.

Ah wait: the test case uses -fno-asynchronous-unwind-tables ! This explains why unwind cannot work - on s390x the ABI requires .eh_frame everywhere, and there is no alternative unwind mechanism via a frame pointer. The "fast unwinder" instead uses the (deprecated) "backchain" mechanism, but that is off by default and requires use of the -mbackchain option. That option is actually set by the various .lit.cfg files, but I guess the special invocation in this test case ignores those flags?

I can confirm that the A<10> version also passes on s390x when adding -mbackchain to the compiler command line.

We have the same test failing on thumb because the fast unwinder doesn't (can't, for various reasons) work. Perhaps:

// REQUIRES: fast-unwinder-works

I have implemented that requires for Arm in https://reviews.llvm.org/D103512.

Perhaps the same could apply to s390x?

I have implemented that requires for Arm in https://reviews.llvm.org/D103512.

Perhaps the same could apply to s390x?

I mean, the fast unwinder does work on s390x (as opposed to Arm), it just requires passing the -mbackchain option at compiler time. So I think this is a bit different.