This is an archive of the discontinued LLVM Phabricator instance.

Refactor llvm-symbolizer to support using PDB instead of DWARF.
AbandonedPublic

Authored by zturner on Apr 20 2015, 9:30 AM.

Details

Reviewers
samsonov
Summary

In order to make llvm-symbolizer work on Windows, a first step is to remove the assumption that DebugInfo will be in DWARF format. In theory this is simple, because there are only about 2 places where llvm-symbolizer actually queries the DWARF. In practice abstracting this away turned out to be a bit of a large refactor. Basically instead of storing DIContext directly we introduce abstract classes SymbolizationContext, SymbolizedLineInfo, and SymbolizedInliningInfo. Concrete implementations of each of these are provided for the DWARF case as well as the PDB case. The PDB cases are not filled out yet, I planned to do that in a separate CL to keep this from getting too big.

Diff Detail

Event Timeline

zturner updated this revision to Diff 24025.Apr 20 2015, 9:30 AM
zturner retitled this revision from to Refactor llvm-symbolizer to support using PDB instead of DWARF..
zturner updated this object.
zturner edited the test plan for this revision. (Show Details)
zturner added a reviewer: samsonov.
zturner added a subscriber: Unknown Object (MLST).
emaste added a subscriber: emaste.Apr 20 2015, 10:07 AM
samsonov edited edge metadata.Apr 20 2015, 5:55 PM

Nit: could you upload a patch to Phabricator in a way that would save the context (http://llvm.org/docs/Phabricator.html)?

Now, I don't really like introducing this hierarchy in tools/llvm-symbolizer. IMO we should keep the tool code straightforward, and instead add the functionality to the library (lib/DebugInfo), so that anyone can use format-agnostic symbolization-as-a-library. Now, I admit that I've missed the discussion and patches that added lib/DebugInfo/PDB and include/llvm/DebugInfo/PDB, so let me know if I'm revising some stuff discussed long ago. Specifically,

adding SymbolizedLineInfo and SymbolizedLineInfoPDB doesn't look right. llvm::DILineInfo was already supposed to be "format-neutral container for source line information", and now
it's moved under DebugInfo/DWARF. Can we move it back to include/llvm/DebugInfo folder, and use both in DWARF and PDB debug info parsers? Same holds for DINameKind - it certainly should not be a DWARF-specific enum.

Same holds for DIContext - it would be nice to use that instead of introducing SymbolizationContext. We might have to change the layering of libraries - either introduce the library with code for DILineInfo/DIInliningInfo/DIContext used by DWARF and PDB readers, or merge DebugInfoDWARF and DebugInfoPDB into one fat library.

There had been significant discussion about producing a Format agnostic
debug info interface, but we all kind of decided that it would be
impossible in the general case, or at the very least extremely difficult.
Most situations don't require consuming both types of debug info so we
decided it would be best to defer this kind of abstraction to the places
that needed it.

We might be able to move a few specific utility classes to a common level,
but I don't think it would work in the general case.

The context, session information, way that you make queries against the
debug info are fundamentally different.

Also, even if we make DILineInfo and treat it as a format agnostic
container, we will still need something to wrap that on the PDB side,
because DebugInfoPDB library only supplies that information through pdb
specific interfaces. So I'm not sure it buys us much.

There had been significant discussion about producing a Format agnostic
debug info interface, but we all kind of decided that it would be
impossible in the general case, or at the very least extremely difficult.
Most situations don't require consuming both types of debug info so we
decided it would be best to defer this kind of abstraction to the places
that needed it.

We might be able to move a few specific utility classes to a common level,
but I don't think it would work in the general case.

The context, session information, way that you make queries against the
debug info are fundamentally different.

Fair enough. So, it means that, in particular, DIContext abstraction (and especially stuff
like DIDumpType) just doesn't work, and we should get rid of it and leave just DWARFContext and
friends. Then we can implement PDBContext in DebugInfoPDB library, that would have its own methods to
query debug info, and dynamically choose between these objects in llvm-symbolizer implementation.

However, for not I'm not convinced about DILineInfo. I haven't seen suggested
SymbolizationContextPDB::getLineInfoForAddress implementation yet, but probably it would be easy
enough to create DILineInfo structure, fill four its fields and return it, than create a class hierarchy
with a bunch of virtual functions.

We can discuss the alternatives in person tomorrow to save cycles.

tools/llvm-symbolizer/SymbolizationContext.h
30

Any specific reason why you wouldn't be able to use plain DILineInfo struct to hold data parsed from PDB (if its declaration would be visible?)

tools/llvm-symbolizer/SymbolizationContextDWARF.cpp
23

This should probably be moved to DILineInfo definition.

zturner abandoned this revision.Apr 23 2015, 2:35 PM