This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt] Spring cleaning of compiler-rt symbolizers
AbandonedPublic

Authored by kubamracek on Feb 23 2015, 4:50 AM.

Details

Reviewers
samsonov
Summary

Hi everyone,

this is a refactoring, NFC patch that tries to clean up the current symbolizer code in sanitizer_common/sanitizer_symbolizer*.[cc|h]. One of the reasons for that is that I'd like to implement some OS X specific symbolizers that could be available in case llvm-symbolizer isn't (AtosSymbolizer and/or DlAddr symbolizer, see http://reviews.llvm.org/D6588). However, there are other weirdnesses in the current implementation: We have several symbolizers available (libbacktrace, llvm-symbolizer, internal symbolizer, addr2line), but they don't share a common interface. The POSIXSymbolizer which operates on top of the individual symbolizers has a hardcoded format of output that it expects. The symbolizers also form a very weird chain, e.g. if libbacktrace symbolizer is available but cannot symbolize something, we still try to symbolize using an internal or external symbolizer. But when an internal symbolizer returns "false", we don't try an external one. SymbolizerProcess which is supposed to only transfer communication from/to an external process actually contains logic to format commands.

This patch extracts an interface, SymbolizerInterface that all the symbolizers now implement. SymbolizerProcess now only takes a string command and returns a string response. Things in POSIXSymbolizer that were specific to llvm-symbolizer and now extracted into LLVMSymbolizer class. Parsing of llvm-symbolizer output is extracted into ParseSymbolizePCOutput and ParseSymbolizeDataOutput methods, and it's used to parse lines from both llvm-symbolizer and addr2line. The symbolizers now form an explicit chain via the next_symbolizer_ field, and POSIXSymbolizer walks this linked list when symbolizing an address.

Thanks for reviewing!
Kuba

Diff Detail

Event Timeline

kubamracek updated this revision to Diff 20505.Feb 23 2015, 4:50 AM
kubamracek retitled this revision from to [compiler-rt] Spring cleaning of compiler-rt symbolizers.
kubamracek updated this object.
kubamracek edited the test plan for this revision. (Show Details)
kubamracek added subscribers: Unknown Object (MLST), zaks.anna, glider and 2 others.
samsonov edited edge metadata.Feb 24 2015, 11:58 AM

Thanks for working on this! Let's work on landing this patch in pieces before moving on to http://reviews.llvm.org/D6588. The main comment is: this patch is still way too large to digest. Let's try to split it into bite-size pieces and land them one by one (move Extract routines, introduce and move SymbolizerProcess to header, change the interface, etc. For now a few high-level comments:

  • think of splitting the code in sanitizer_symbolizer_posix_libcdep.cc into more files, or moving it around. Some code shouldn't really be POSIX-specific (e.g. Extract routines).
  • think of adding more headers. For example, sanitizer_symbolizer.h is the header used by actual sanitizers, and Symbolizer is the class they are interested in. In the meantime, SymbolizerInterface (btw. the naming is really poor, considering we already have Symbolizer, maybe SymbolizerTool?) is an implementation detail, and could be hidden in another header.
  • I'm fine with general approach "use the chain of symbolizers - try the first one, fallback to the next if necessary, then to the next etc.". But this approach then should be shared between POSIX and Windows - that is, WinSymbolizer will be no different from, say, InternalSymbolizer - it will be just another symbolizer in a chain. In particular, probably Symbolizer class will not even need to be abstract, and you could move all the logic to common sources.
  • please re-use IntrusiveList instead of re-implementing linked lists, if possible.

SymbolizeData family of methods is really confusing. We should have renamed it to SymbolizeGlobal long ago (same for DataInfo -> GlobalInfo), I will do it soon.

Let's try to split it into bite-size pieces and land them one by one (move Extract routines, introduce and move SymbolizerProcess to header, change the interface, etc.

Sounds like a plan! Here's the first pieces: http://reviews.llvm.org/D7867, http://reviews.llvm.org/D7868.

http://reviews.llvm.org/D7936: SymbolizerTool and better interface

Just saw this CL.

Silently falling back to a different symbolizer is not always acceptable, for performance reasons. In Chromium, given the binary size, we never want to fall back to addr2line if llvm-symbolizer cannot be found. Otherwise this happens:
https://code.google.com/p/chromium/issues/detail?id=390552

So something like a flag to disable this behavior would be nice.

kubamracek abandoned this revision.Apr 12 2015, 3:23 AM