This is an archive of the discontinued LLVM Phabricator instance.

DWARF: Implementation of v5 string offsets table (.debug_str_offsets[.dwo] section)
AbandonedPublic

Authored by wolfgangp on Apr 27 2017, 3:00 PM.

Details

Summary

This is a proposed implementation of the DWARF v5 string offsets table for both producer and consumer side. This includes handling of the DW_AT_str_offsets_base attribute in the unit headers and the usage of DW_FORM_strx for string references.

I'd like to leave the emission of DW_FORM_strx{1,2,3,4} for a follow-on patch. Also, apologies for not handling Mach-O. I'm not familiar with it just yet, though I would like to handle it in another follow-on unless someone else wants to.

This could be easily broken into 2 patches (1 producer, 1 consumer) it that's more manageable.

Diff Detail

Event Timeline

wolfgangp created this revision.Apr 27 2017, 3:00 PM
aprantl edited edge metadata.Apr 27 2017, 3:59 PM

Thanks! The patch may be large, but the changes are mostly straightforward and generally look good to me. I found a couple of nits inline.

include/llvm/DebugInfo/DWARF/DWARFContext.h
241
/// DWARF 5.
/// @{
...
/// @}
include/llvm/DebugInfo/DWARF/DWARFUnit.h
217

const RelocAddrMap &getStringOffsetsRelocMap() const?

lib/CodeGen/AsmPrinter/DIE.cpp
618

http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments
wants us to comment in on place only. This should be a doxygen comment and it should be on the function prototype inside DIEIndexedString.

lib/CodeGen/AsmPrinter/DwarfUnit.cpp
80

nullptr?

1587

Move comment into header.

lib/CodeGen/AsmPrinter/DwarfUnit.h
314

/// Add a label delta attribute data and value.

318

ditto

lib/DebugInfo/DWARF/DWARFContext.cpp
102

comment why?

104

drop the else

317

"DWARF 5" vs "DWARF v5" vs "DWARF Version 5"?

Also, apologies for not handling Mach-O.

Typically we try to do a best effort on all supported containers (ELF, Mach-O, COFF) when adding new features.
Could you outline what is necessary there? Is it just coming up with a short-enough section name or is there more to it?

Also, apologies for not handling Mach-O.

Typically we try to do a best effort on all supported containers (ELF, Mach-O, COFF) when adding new features.
Could you outline what is necessary there? Is it just coming up with a short-enough section name or is there more to it?

That is probably it. Basically, the hardest part is just making string references unique to the unit (compile unit, type unit) and putting all the relocatable string references into the new section. Nothing that should be specific to the object file format. I can take a stab at it, it shouldn't take too long.

Also, apologies for not handling Mach-O.

Typically we try to do a best effort on all supported containers (ELF, Mach-O, COFF) when adding new features.
Could you outline what is necessary there? Is it just coming up with a short-enough section name or is there more to it?

That is probably it. Basically, the hardest part is just making string references unique to the unit (compile unit, type unit) and putting all the relocatable string references into the new section. Nothing that should be specific to the object file format. I can take a stab at it, it shouldn't take too long.

Thanks! I looked through my mail archive and found this text that I compiled for the dwarf mailing list last year about the various platforms' name limitations:

I recommend adding the following non-normative text about Mach-O [1] (as used by OS X, iOS, ...):

In the Mach-O object file format, debug info sections are located
in the __DEBUG segment. By convention, section names start with a
double underscore instead of the leading dot. Section names are
truncated to 16 characters. Since Mach-O linkers do not link the
debug information, there is no real use-case for the .dwo
sections.

If I had to invent names for the .dwo sections I would replace the debug prefix with dwo. For example:

1234567890123456 1234567890123456
.debug_str_offsets.dwo -> __dwo_str_offset

But, since no compiler implements DWO support for Mach-O (it is solving a problem that does not exist on the platform), I’m unsure whether we should standardize names at this point.

For COFF, which is used on Windows, the situation is confusing. I’ve been told that the COFF linker truncates section names after 8 characters. Still, apparently the file format can hold more than that and compilers like clang use the full ELF section names and this does not seem to be problem. My best guess is that this works because all of the DWARF 2 section names are unique after 8 characters. Other compilers like Microsoft’s Visual Studio don’t emit DWARF, but use their own CodeView debug file format. I’m unsure how to describe this situation in the text. Maybe we should pass.

  • adrian
wolfgangp updated this revision to Diff 97311.May 1 2017, 11:32 AM

Added support for Mach-O and COFF and addressed review comments.

wolfgangp marked 8 inline comments as done.May 1 2017, 11:36 AM
wolfgangp added inline comments.
lib/CodeGen/AsmPrinter/DwarfUnit.cpp
80

This symbol is not needed in v4 and earlier and only gets created when there is a string offsets section. Could you elaborate what your objection is?

aprantl added inline comments.May 1 2017, 11:45 AM
lib/CodeGen/AsmPrinter/DwarfUnit.cpp
80

It looks like phabricator misleadingly interpreted my source code as formatting :-)
this was meant to say: = nullptr, i.e., doing a C++11-style initialization in the header file might look better here?

Also, Dave would prefer to split the patch in 2, in producer/consumer. Any objections?

lib/CodeGen/AsmPrinter/DwarfUnit.cpp
80

Ah, OK, sure. I agree. I'll make that change.

wolfgangp abandoned this revision.May 3 2017, 10:20 AM

By the way: you probably noticed that since last week have a brand new DWARF Verifier (accessible via dwarfdump --verify). It might be nice to develop Verifier checks alongside with the support for new sections/formats.