Page MenuHomePhabricator

Teach the symbolizer lib symbolize objects directly.
ClosedPublic

Authored by ychen on Tue, Jun 18, 3:25 PM.

Details

Summary

Currently, the symbolizer lib can only symbolize a file on disk.
This patch teaches the symbolizer lib to symbolize objects.
llvm-objdump needs this to support archive disassembly with source info.

https://bugs.llvm.org/show_bug.cgi?id=41871

Diff Detail

Repository
rL LLVM

Event Timeline

ychen created this revision.Tue, Jun 18, 3:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptTue, Jun 18, 3:25 PM
jhenderson added inline comments.
llvm/test/tools/llvm-objdump/X86/disassemble-archive-with-source.ll
3 ↗(On Diff #205452)

debuginfo is usually two words (i.e. debug info) or better yet, call it debug data, to avoid confusion with the .debug_info DWARF section.

ychen updated this revision to Diff 205678.Wed, Jun 19, 2:41 PM
  • update comment
ychen marked an inline comment as done.Wed, Jun 19, 2:41 PM
jhenderson accepted this revision.Thu, Jun 20, 2:17 AM

So I'm happy with this approach, but I'd like somebody else to confirm that they are too.

This revision is now accepted and ready to land.Thu, Jun 20, 2:17 AM

I am probably not the right person to accept this, so minor suggestion is below.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
92 ↗(On Diff #205678)

I'd simplify.

const auto &I = Modules.find(ModuleName);
if (I != Modules.end())
  return symbolizeCodeCommon(I->second.get(), ModuleOffset);

Expected<SymbolizableModule *> InfoOrErr =
                    createModuleInfo(&Obj, std::move(Context), ModuleName);
if (!InfoOrErr)
  return InfoOrErr.takeError();
return symbolizeCodeCommon(InfoOrErr.get(), ModuleOffset);
MaskRay added inline comments.Fri, Jun 21, 3:02 AM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
92 ↗(On Diff #205678)

Let's avoid the lifetime extension:

auto I = Modules.find(ModuleName);

const auto &I is a const reference to a temporary iterator. It is odd.

MaskRay added inline comments.Fri, Jun 21, 3:35 AM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
403 ↗(On Diff #205678)

I think we can split getOrCreateModuleInfo into two halves. The latter half can be reused by this part of code in the new symbolizeCode:

  std::unique_ptr<DIContext> Context =
      DWARFContext::create(Obj, nullptr, DWARFContext::defaultErrorHandler);
/////////// codeview is not supported
  Expected<SymbolizableModule *> InfoOrErr =
      createModuleInfo(&Obj, std::move(Context), ModuleName);
  if (!InfoOrErr)
    return InfoOrErr.takeError();
  Info = InfoOrErr.get();

This way, it may make it easy to add codeview support for archive files is someone cares about it.

(getOrCreateObjectPair handles .gnu_debuglink, but I think we can forget about that for an archive member..

470 ↗(On Diff #205678)

Nonnullness of Context is obvious.

ychen updated this revision to Diff 206255.Mon, Jun 24, 10:21 AM
  • address reviewer's comments.
ychen marked 4 inline comments as done.Mon, Jun 24, 10:22 AM
jhenderson added inline comments.Tue, Jun 25, 1:43 AM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
82 ↗(On Diff #206255)

This TODO seems somewhat arbitrary in this location. Is CodeView supported for other parts of LLVM symbolizer? What would happen if somebody tried to use this bit of code with CodeView, compared with old behaviour?

ychen marked an inline comment as done.Tue, Jun 25, 9:25 AM
ychen added inline comments.
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
82 ↗(On Diff #206255)

I'm not entirely sure. I know PDB and CodeView are not exactly the same thing. Looks like the name is used interchangeably in this context? COFFObjectFile::getDebugPDBInfo said it is COFF::IMAGE_DEBUG_TYPE_CODEVIEW. If somebody uses this code with CodeView, loadDataForEXE would fail because of Objects.first->getFileName(), symbolizeCode would return error. Old behavior of symbolizeCode(const ObjectFile &Obj, ..) would always create a DWARFContext.

// If this is a COFF object containing PDB info, use a PDBContext to
// symbolize. Otherwise, use DWARF.
if (auto CoffObject = dyn_cast<COFFObjectFile>(Objects.first)) {
  const codeview::DebugInfo *DebugInfo;
  StringRef PDBFileName;
  auto EC = CoffObject->getDebugPDBInfo(DebugInfo, PDBFileName);
  if (!EC && DebugInfo != nullptr && !PDBFileName.empty()) {
    using namespace pdb;
    std::unique_ptr<IPDBSession> Session;
    if (auto Err = loadDataForEXE(PDB_ReaderType::DIA,
                                  Objects.first->getFileName(), Session)) {
      Modules.insert(
          std::make_pair(ModuleName, std::unique_ptr<SymbolizableModule>()));
      // Return along the PDB filename to provide more context
      return createFileError(PDBFileName, std::move(Err));
    }
    Context.reset(new PDBContext(*CoffObject, std::move(Session)));
  }
}
jhenderson added inline comments.Wed, Jun 26, 2:04 AM
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
82 ↗(On Diff #206255)

I'm not comfortable with a loss in functionality versus the previous version, unless the old functionality was broken and wouldn't work anyway. Not knowing enough about how CodeView/PDB work, I'm not sure I can comment much further.

I don't think it's a loss of functionality versus the previous version which creates DWARFContext even for COFF object (not work obviously). This version still does not work for COFF archive but looks a little bit more reasonable. IMHO, we should have no or all support for COFF archive, not partial support. I could either

  1. Land the previous version (with the additional check for COFF so users are notified of the lack of support) and revisit COFF archive support later.
  2. Support COFF archive file in this patch with the necessary tests.

I would prefer the first choice since one patch for each makes more sense. @MaskRay @jhenderson thoughts?

MaskRay accepted this revision.Thu, Jun 27, 2:14 AM
MaskRay added a subscriber: rnk.
MaskRay added inline comments.
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
82 ↗(On Diff #206255)

This todo (archive not supported) is fine, as long as the normal code path LLVMSymbolizer::symbolizeCode(const std::string &ModuleName, object::SectionedAddress ModuleOffset) below still works.

93 ↗(On Diff #206255)

Should make sure test/tools/llvm-symbolizer/pdb/ still works.

I don't think it's a loss of functionality versus the previous version which creates DWARFContext even for COFF object (not work obviously). This version still does not work for COFF archive but looks a little bit more reasonable. IMHO, we should have no or all support for COFF archive, not partial support. I could either

  1. Land the previous version (with the additional check for COFF so users are notified of the lack of support) and revisit COFF archive support later.
  2. Support COFF archive file in this patch with the necessary tests.

    I would prefer the first choice since one patch for each makes more sense. @MaskRay @jhenderson thoughts?

If we don't support COFF archives already at all (i.e. they don't work under any situation), then I don't think there's a problem that needs solving here, but we should definitely not make things worse. If archives containing COFF with PDBs work as well as before, then I think this is fine. If they no longer work, that's a problem and you should go back to the old behaviour. Beyond that, I don't think you need to fix the world in this patch. You've fixed the problem for DWARF-based debug information, and further improvements can be separate patches. Perhaps best would be to file a separate bug to track the additional cases that are unfixed by this?

rnk added a comment.Thu, Jun 27, 1:46 PM

I'm pretty sure the PDB symbolization context assumes it has a PE image (EXE or DLL), which it can use to extract the moral equivalent of .gnu_debuglink to find the PDB. That's all handled internally.

Hypothetically, we could change things to read the debug info sections from plain object files before they have been linked, but it's difficult and probably not worth the effort.

ychen added a comment.Fri, Jun 28, 9:52 AM

Hi @MaskRay, from @rnk 's inputs, seems this is not a common case worth pursuing at least for the moment, is it ok to move forward with the first choice I described?

Hi @MaskRay, from @rnk 's inputs, seems this is not a common case worth pursuing at least for the moment, is it ok to move forward with the first choice I described?

My comment on your first version of this patch is about the potential code duplication. Now the codeview concern of the code duplication has been addressed.

The current version doesn't look bad to me. Do you think the initial version is better?

ychen added a comment.Mon, Jul 1, 10:02 AM

Hi @MaskRay, from @rnk 's inputs, seems this is not a common case worth pursuing at least for the moment, is it ok to move forward with the first choice I described?

My comment on your first version of this patch is about the potential code duplication. Now the codeview concern of the code duplication has been addressed.

The current version doesn't look bad to me. Do you think the initial version is better?

The expected client for this new symbolizeCode interface is for archive members and COFF is unlikely to symbolize plain object files for the moment.
I think the concern here the code movement for removing the potential duplication puts some untested code in the path for DWARF. I assume if there is a need for the COFF archive member symbolization we could easily deduplicate the code in the patch for COFF?

Hi @MaskRay, from @rnk 's inputs, seems this is not a common case worth pursuing at least for the moment, is it ok to move forward with the first choice I described?

My comment on your first version of this patch is about the potential code duplication. Now the codeview concern of the code duplication has been addressed.

The current version doesn't look bad to me. Do you think the initial version is better?

The expected client for this new symbolizeCode interface is for archive members and COFF is unlikely to symbolize plain object files for the moment.
I think the concern here the code movement for removing the potential duplication puts some untested code in the path for DWARF. I assume if there is a need for the COFF archive member symbolization we could easily deduplicate the code in the patch for COFF?

I think you meant "puts some untested code in the path for COFF". That is fine to me.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
431 ↗(On Diff #206255)

You can place this block before createModuleInfo to minimize the diff.

ychen updated this revision to Diff 207601.Tue, Jul 2, 12:15 PM
  • update
jhenderson accepted this revision.Wed, Jul 3, 1:40 AM

LGTM, with a couple of nits. Happy with or without them.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
88 ↗(On Diff #207601)

It's more common to use *InfoOrErr for Expected, I believe, to get the actual value.

97 ↗(On Diff #207601)

Ditto.

MaskRay accepted this revision.Wed, Jul 3, 2:03 AM
MaskRay added inline comments.
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
429 ↗(On Diff #207601)

Nit: std::error_code

ychen updated this revision to Diff 207855.Wed, Jul 3, 11:56 AM
  • update
ychen marked 3 inline comments as done.Wed, Jul 3, 11:56 AM
jhenderson accepted this revision.Thu, Jul 4, 1:36 AM
This revision was automatically updated to reflect the committed changes.