This is an archive of the discontinued LLVM Phabricator instance.

Cache unwind frame headers as they are found.
ClosedPublic

Authored by saugustine on Mar 10 2020, 12:09 PM.

Details

Summary

This improves unwind performance substantially, following an
approach similar to the one used in libgcc_s, as described in the
thread here:

https://gcc.gnu.org/ml/gcc/2005-02/msg00625.html

On certain extremely exception-heavy internal tests, the time
drops from about 80 minutes to about five minutes.
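In outline, the mechanism being added: the dl_iterate_phdr scan that libunwind performs on every unwind step is expensive, so the result of each module lookup is cached and reused. The sketch below illustrates only the idea; CacheEntrySketch, cacheFind, and the most-recently-used list shown here are illustrative stand-ins, not the patch's actual code.

#include <stdint.h>

struct UnwindInfoSections; // stand-in for libunwind's real structure

struct CacheEntrySketch {
  uintptr_t LowPC = 0, HighPC = 0;        // address range of one module
  UnwindInfoSections *Sections = nullptr; // its cached unwind info
  CacheEntrySketch *Next = nullptr;       // most-recently-used list link
};

// On a hit, the expensive dl_iterate_phdr scan is skipped entirely; on a
// miss, the caller performs the full scan and caches the result.
UnwindInfoSections *cacheFind(CacheEntrySketch *MostRecentlyUsed,
                              uintptr_t pc) {
  for (CacheEntrySketch *E = MostRecentlyUsed; E != nullptr; E = E->Next)
    if (pc >= E->LowPC && pc < E->HighPC)
      return E->Sections;
  return nullptr; // miss
}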

Diff Detail

Event Timeline

saugustine created this revision.Mar 10 2020, 12:09 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2020, 12:09 PM

I have some tests nearing completion, but want to get started with the feedback now.

jgorbe added inline comments.Mar 10 2020, 3:18 PM
libunwind/src/FrameHeaderCache.hpp
46

Do we need any synchronization here if there are multiple threads?

122

This line is redundant; Current has just been initialized to Unused. Did you want to explicitly set Current on each branch? If so, maybe something like this would be clearer:

CacheEntry *Current = nullptr;

if (Unused != nullptr) {
  Current = Unused;
  Unused = Unused->Next;
} else {
  Current = MostRecentlyUsed;
  // ... etc ...
}

In any case, I think changing the condition to Unused != nullptr would be a win, because at least to me it reads more easily as "if there are unused entries...".

saugustine marked 3 inline comments as done.

Update for upstream comments.

saugustine added inline comments.Mar 11 2020, 10:09 AM
libunwind/src/FrameHeaderCache.hpp
46

Good question. We don't, because dl_iterate_phdr already holds the load lock and therefore does all the synchronization for us. That isn't obvious at all from reading the code, so I'm adding a comment to that effect.
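To make the assumption concrete, here is a minimal sketch of the access pattern being relied on: the cache is only read and written from inside the dl_iterate_phdr callback, and glibc runs all such callbacks under its internal loader lock, so no extra locking is added. findUnwindSectionsByPhdr matches the real callback's name; the body shown is illustrative.

#include <link.h>
#include <stddef.h>

// The cache is only ever touched from inside this callback.
static int findUnwindSectionsByPhdr(struct dl_phdr_info *info, size_t size,
                                    void *data) {
  // Under the stated assumption, glibc already holds its loader lock here,
  // so two threads can never run this callback concurrently.
  // ProcessFrameHeaderCache.add(info); ...
  (void)info; (void)size; (void)data;
  return 0;
}

void lookupUnwindSections(void *data) {
  dl_iterate_phdr(findUnwindSectionsByPhdr, data); // lock held inside
}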

Add missing comment.

I think it looks good now. My only issue is that it seems to rely on a couple of glibc-specific features: glibc modifying the fields adds and subs in dl_phdr_info when loading/unloading libraries (which this patch uses to know when to invalidate the cache), and dl_iterate_phdr holding a lock (which the patch relies on to avoid races while accessing the cache). What other libc implementations do we support? Do they share these behaviors we rely on here?

FreeBSD has had dlpi_adds and dlpi_subs since at least 2012, and musl has them too. I'm pretty sure they provide equivalent synchronization guarantees; otherwise they wouldn't be compatible with glibc. And looking at the similar libgcc code, it appears that libgcc also relies on such synchronization.

So I do believe we are OK from that perspective. If it comes to it, we could add a configure-time option, but I'm hoping to avoid that.
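For illustration, a sketch of the invalidation scheme under discussion: dlpi_adds and dlpi_subs are real dl_phdr_info fields (bumped by glibc, FreeBSD, and musl on each dlopen/dlclose), but the surrounding code below is a simplified stand-in for the patch's actual cache bookkeeping.

#include <link.h>

// Counters remembered from the last time the cache was (re)built.
static unsigned long long CachedAdds = 0, CachedSubs = 0;

// Returns true if any shared object has been loaded or unloaded since the
// last check, in which case every cached entry may be stale and the cache
// must be flushed. (Real callers should check the callback's `size`
// argument first to confirm these fields are present.)
static bool cacheIsStale(const struct dl_phdr_info *info) {
  if (info->dlpi_adds == CachedAdds && info->dlpi_subs == CachedSubs)
    return false;
  CachedAdds = info->dlpi_adds;
  CachedSubs = info->dlpi_subs;
  return true;
}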

Add tests, and minor formatting changes.

Herald added a project: Restricted Project. · View Herald TranscriptMar 11 2020, 3:15 PM

Update test to only run when relevant.

Thanks for doing the extra work of adding a test. Just one final nit.

libunwind/test/frameheadercache_test.pass.cpp
3

I think referring to AddressSpace.hpp for the reasoning behind the ugly #ifdef chain is fine, but I'd add a note about the intention. Suggestion:

// Run the test only in configurations where the caching system is used.
// This #if chain...

Correct the "do not run" cases to match the logic in AddressSpace.hpp.

fix comment

jgorbe accepted this revision.Mar 12 2020, 9:30 AM
This revision is now accepted and ready to land.Mar 12 2020, 9:30 AM
This revision was automatically updated to reflect the committed changes.

Hello @saugustine,

here is a failed libunwind test on the AArch64 toolchain builder:
http://lab.llvm.org:8011/builders/llvm-clang-win-x-aarch64/builds/5660

  • FAIL: libunwind:: frameheadercache_test.pass.cpp

with a set of similar errors:

C:\buildbot\as-builder-2\llvm-clang-win-x-aarch64\llvm-project\libunwind\test/../src/Registers.hpp:3158:59: error: unused parameter 'regNum' [-Werror,-Wunused-parameter]

inline bool Registers_mips_newabi::validFloatRegister(int regNum) const {

Would you take care of it?
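For reference, the conventional C++ fixes for -Wunused-parameter look like the sketch below; the actual follow-up commit may have chosen differently, and RegistersSketch is an illustrative stand-in for the real Registers_mips_newabi class.

struct RegistersSketch {
  // Style 1: leave the parameter unnamed (or commented out) in the definition.
  bool validFloatRegister(int /*regNum*/) const { return false; }

  // Style 2: keep the name but explicitly discard it.
  bool validVectorRegister(int regNum) const {
    (void)regNum;
    return false;
  }
};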

dim added subscribers: emaste, dim.Aug 15 2020, 10:47 AM

It seems that after this change, I'm getting sporadic lld segfaults in libunwind, which look like:

Core was generated by `/usr/obj/share/dim/src/freebsd/clang1100-import/amd64.amd64/tmp/usr/bin/ld --sys'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000002479166 in libunwind::findUnwindSectionsByPhdr(dl_phdr_info*, unsigned long, void*) ()
[Current thread is 1 (LWP 100282)]
(gdb) bt
#0  0x0000000002479166 in libunwind::findUnwindSectionsByPhdr(dl_phdr_info*, unsigned long, void*) ()
#1  0x000000000247a67c in libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::setInfoBasedOnIPRegister(bool) ()
#2  0x000000000247927a in unw_init_local ()
#3  0x0000000002477ba8 in _Unwind_Backtrace ()
#4  0x0000000002416bb1 in backtrace ()
#5  0x0000000001097c83 in llvm::sys::PrintStackTrace(llvm::raw_ostream&) ()
#6  0x0000000001096105 in llvm::sys::RunSignalHandlers() ()
#7  0x0000000001098676 in SignalHandler(int) ()
#8  0x0000000002437c70 in handle_signal ()
#9  0x0000000002437248 in thr_sighandler ()
#10 <signal handler called>
#11 0x0000000002479166 in libunwind::findUnwindSectionsByPhdr(dl_phdr_info*, unsigned long, void*) ()
#12 0x000000000247a67c in libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::setInfoBasedOnIPRegister(bool) ()
#13 0x000000000247927a in unw_init_local ()
#14 0x000000000247884c in unwind_phase2_forced ()
#15 0x0000000002478b43 in _Unwind_ForcedUnwind ()
#16 0x00000000024421b6 in thread_unwind ()
#17 0x000000000244211f in _pthread_exit_mask ()
#18 0x000000000244208b in pthread_exit ()
#19 0x000000000243fcf3 in thread_start ()
#20 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdfbfc000

When I look at the code, it appears that the compiler has inlined FrameHeaderCache::add(), and execution goes into the else branch:

Current = MostRecentlyUsed;
CacheEntry *Previous = nullptr;
while (Current->Next != nullptr) {
  Previous = Current;
  Current = Current->Next;
}

However, it enters that branch with MostRecentlyUsed being nullptr, which causes a segfault in the expression while (Current->Next != nullptr).

Would it make sense for MostRecentlyUsed to ever be null at this point, and if so, shouldn't the code be mitigating this? And if MostRecentlyUsed should never be null, what could be the reason that it is?
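For illustration, a guarded version of the entry-selection logic might look like the following sketch; this is not the committed fix, and CacheEntry/pickEntry here are stripped-down stand-ins. As the later comments show, the underlying bug is unsynchronized concurrent access, so a guard like this would only mask the race, not fix it.

struct CacheEntry {
  CacheEntry *Next = nullptr;
};

// Picks an entry to (re)use: an unused one if available, otherwise the
// least recently used. Returns nullptr instead of crashing when both lists
// are empty, which is the state a racing reset can leave behind.
CacheEntry *pickEntry(CacheEntry *&Unused, CacheEntry *&MostRecentlyUsed) {
  if (Unused != nullptr) {
    CacheEntry *Current = Unused;
    Unused = Unused->Next;
    return Current;
  }
  if (MostRecentlyUsed == nullptr)
    return nullptr; // caller skips caching this lookup
  CacheEntry *Current = MostRecentlyUsed;
  while (Current->Next != nullptr)
    Current = Current->Next; // walk to the tail, i.e. the LRU entry
  return Current;
}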

Herald added a project: Restricted Project. · View Herald TranscriptAug 15 2020, 10:47 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
dim added a comment.Aug 15 2020, 11:36 AM

I turned on _LIBUNWIND_DEBUG_FRAMEHEADER_CACHE, and indeed such segfaults appear immediately after the message libunwind: FrameHeaderCache reset. That means FrameHeaderCache::MostRecentlyUsed has just been set to nullptr ...

dim added a subscriber: hans.Aug 15 2020, 12:17 PM

Hm, the static FrameHeaderCache ProcessFrameHeaderCache; is biting us here, at least in a multithreaded process like lld, which is the process that is crashing for me all the time now. I added some instrumentation, which shows the problem by adding a thread ID to each log message, and by adding an assert in FrameHeaderCache::add() that MostRecentlyUsed != nullptr:

cc -target x86_64-unknown-freebsd13.0 --sysroot=/usr/obj/usr/src/amd64.amd64/tmp -B/usr/obj/usr/src/amd64.amd64/tmp/usr/bin -Wl,--no-undefined -Wl,--version-script=/usr/src/contrib/com_err/version-script.map   -fstack-protector-strong -shared -Wl,-x -Wl,--fatal-warnings -Wl,--warn-shared-textrel  -o libcom_err.so.5.full -Wl,-soname,libcom_err.so.5  `NM='nm' NMFLAGS='' lorder com_err.pico error.pico |  tsort -q`
libunwind FrameHeaderCache.hpp(57)[tid=0x803805a00]: FrameHeaderCache reset
libunwind FrameHeaderCache.hpp(130)[tid=0x803805000]: UIS=0x7fffdfdfc9e0
libunwind FrameHeaderCache.hpp(130)[tid=0x803805500]: UIS=0x7fffdfbfb9e0
libunwind FrameHeaderCache.hpp(140)[tid=0x803805500]: Unused == nullptr, MostRecentlyUsed=0x0
libunwind FrameHeaderCache.hpp(140)[tid=0x803805000]: Unused == nullptr, MostRecentlyUsed=0x0
Assertion failed: (MostRecentlyUsed != nullptr), function add, file FrameHeaderCache.hpp, line 141.
libunwind FrameHeaderCache.hpp(130)[tid=0x803805a00]: UIS=0x7fffdf9fa9e0
libunwind FrameHeaderCache.hpp(134)[tid=0x803805a00]: Unused != nullptr
PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and include the crash backtrace.
libunwind FrameHeaderCache.hpp(161)[tid=0x803805a00]: FrameHeaderCache add [b060b0 - 256ec70)
cc: error: unable to execute command: Segmentation fault (core dumped)
cc: error: linker command failed due to signal (use -v to see invocation)

So what you see here is that thread 0x803805a00 resets the cache, then threads 0x803805000 and 0x803805500 simultaneously invoke FrameHeaderCache::add(), and that leads to both Unused and MostRecentlyUsed being nullptr, resulting in the crash.

It looks like the need for some locking in this FrameHeaderCache was completely overlooked? Or at least, having a cache per thread would make locking unnecessary, but it would cost more memory.

@hans, in the state this is in now, I would suggest this be reverted for 11.0; otherwise you will end up with a seriously broken release for multi-threaded programs.

In D75954#2219870, @dim wrote:

It looks like the need for some locking in this FrameHeaderCache was completely overlooked? Or at least, having a cache per thread would make locking unnecessary, but it would cost more memory.

Well, not _completely_ overlooked. Looks like glibc locks things and other libcs don't. Should be easy to add a lock here for non-glibc targets.
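A minimal sketch of that suggestion, assuming pthreads is available and no loader lock is held: serialize every cache operation behind a single mutex. Names are illustrative; a real patch might instead use libunwind's existing platform abstractions.

#include <link.h>
#include <pthread.h>

static pthread_mutex_t CacheLock = PTHREAD_MUTEX_INITIALIZER;

// Every cache operation (reset, find, add) takes the same lock, so a reset
// in one thread can no longer race an add in another.
void cacheAddLocked(struct dl_phdr_info *info) {
  pthread_mutex_lock(&CacheLock);
  // ProcessFrameHeaderCache.add(info); // the shared, process-wide cache
  (void)info;
  pthread_mutex_unlock(&CacheLock);
}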

hans added a comment.Aug 18 2020, 1:30 AM
Sterling, do you think this will be fixed soon so we can get it in the llvm 11 release?

It works well right now for glibc-based systems; in fact, Google already uses it internally in production systems today.

I am sending a patch shortly to allow enabling and disabling this cache at configure time.

hans added a comment.Aug 20 2020, 8:23 AM

I am sending a patch shortly to allow enabling and disabling this cache at configure time.

That was https://reviews.llvm.org/D86163 and it's been cherry-picked to 11.x.
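In outline, a configure-time switch of that kind compiles the cache out entirely unless the build opts in. The sketch below assumes a macro named _LIBUNWIND_USE_FRAME_HEADER_CACHE in the spirit of D86163; the exact option and macro names are defined by that review, not here.

#include <link.h>
#include <stddef.h>

#if defined(_LIBUNWIND_USE_FRAME_HEADER_CACHE)
#include "FrameHeaderCache.hpp" // the real cache, compiled in on request
static FrameHeaderCache ProcessFrameHeaderCache;
#endif

static int findUnwindSectionsByPhdr(struct dl_phdr_info *info, size_t size,
                                    void *data) {
#if defined(_LIBUNWIND_USE_FRAME_HEADER_CACHE)
  // Consult and populate ProcessFrameHeaderCache here.
#endif
  (void)info; (void)size; (void)data;
  // ... otherwise, always perform the full program-header scan ...
  return 0;
}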