This is an archive of the discontinued LLVM Phabricator instance.

[DWARF5]:Added support for .debug_macro.dwo section in llvm-dwarfdump
ClosedPublic

Authored by SouraVX on Apr 20 2020, 8:54 AM.

Details

Summary

This patch extends the parsing and dumping support of llvm-dwarfdump for debug_macro.dwo section-

    • DW_MACRO_define
    • DW_MACRO_undef
    • DW_MACRO_start_file
    • DW_MACRO_end_file
    • DW_MACRO_define_strx
    • DW_MACRO_undef_strx
  • DW_MACRO_define_strp
    • DW_MACRO_undef_strp

Diff Detail

Event Timeline

SouraVX created this revision.Apr 20 2020, 8:54 AM
SouraVX marked an inline comment as done.Apr 20 2020, 9:05 AM
SouraVX added inline comments.Apr 20 2020, 9:05 AM
llvm/test/DebugInfo/X86/debug-macro-dwo.s
59 ↗(On Diff #258758)

Primary reason why this test case contains debug_info and other sections too, because llvm-dwarfdump , as of now, uses DWARF version number present in Header to distinguish between version 5 debug_str_offsets.dwo and pre-standardized str_offset section.
As a result if we try to reduce this test case to minimal, then we'll end up parsing the contents incorrectly.

dblaikie added inline comments.Apr 20 2020, 3:54 PM
llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp
154–155 ↗(On Diff #258758)

Is there an API you could call to provide this answer, rather than hardcoding it? (like a MacroHeader structure that tells you the size based on DWARF32/64 enum - you could hardcode passing the DWARF32 enum value to such an API for now, but at least calling the API rather than hardcoding it would be good)

163 ↗(On Diff #258758)

Does the StringOffsetsExtractor already know which contribution it's reading from (if you have two objects linked together, with two string offsets contributions, would this code correctly read the desired one?)

Speaking of: these forms (_strx) aren't unique to debug_macro.dwo, they can be used in debug_macro - perhaps it'd be good to separate adding support for these new forms & testing them in debug_macro (non-dwo) support, then, separately, adding support for debug_macro.dwo?

SouraVX added inline comments.Apr 20 2020, 9:04 PM
llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp
154–155 ↗(On Diff #258758)

Primary reason why this value is hard-coded here is because in a dwo object there can be only one contribution to debug_str_offets section. Hence StrOffsetsBase can be safely set to 8 bytes(Header Size).

Should we leave it as it ?? is so that iff strx form got supported in a non-dwo object too, then a function can should set up this correctly(discussed in next comment) with exception that if it's a dwo object set it up as 8/16 bytes according to DWARF32/64.

163 ↗(On Diff #258758)

Does the StringOffsetsExtractor already know which contribution it's reading from (if you have two objects linked together, with two string offsets contributions, would this code correctly read the desired one?)

In a dwo object there can't be multiple contributions, and perhaps the naming(of the Extractor) is confusing here, I 'll modify that too getStringOffsetsDWOExtractor as that's what has been passed by DWARFContext.

Speaking of: these forms (_strx) aren't unique to debug_macro.dwo, they can be used in debug_macro - perhaps it'd be good to separate adding support for these new forms & testing them in debug_macro (non-dwo) support, then, separately, adding support for debug_macro.dwo?

Yes they can be used in macro (non-dwo) too, emission from clang/llvm won't be a problem. But parsing/dumping is where it will add complexity that IMO is un-justified.
Added complexity in a sense, for example to parse a single strx entry form we have to look for its CU(from a list of all CU's) contribution to debug_str_offsets section i.e DW_AT_str_offsets_base, then correctly parsing it to set StrOffsetsBase and subsequent actual macro defined in debug_str section.
(For getting CU contribution, we could use DW_AT_macros to search among the list of CU's then extract DW_AT_str_offsets_base from it ?)

This whole setup looks a bit complex/error-prone at-least in contrast to the simplicity provided by strp entry form in case multiple CU's linked in a normal object.

SouraVX added a comment.EditedApr 21 2020, 5:09 AM

+1 more point to be considered here is that as of now strp form in (non-split) binaries are supported by debuggers including GDB and LLDB (can expand macros in debugger correctly). But strx form is not supported in any case split or non-split.

SouraVX updated this revision to Diff 259009.Apr 21 2020, 8:25 AM

Addressed @dblaikie comments, thanks for this!
This revision extends llvm-dwarfdump to support *define_strx/*undef_strx forms dumping present in either macro or macro.dwo section.
As it is evident from here strx form support is a bit expensive from parsing/dumping point of view. In case of non-split macro we have to search in all CUs for DW_AT_str_offsets_base, in order to dump info correctly.

  • Added test case containing multiple CU's contribution to debug_macro section represented using strx forms. llvm-dwarfdump is able to parse and dump macro info correctly.
ikudrin added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
267–270

As string offsets sections are only involved in parsing .debug_macro[.dwo] section, you should not pass the extractors when parsing .debug_macinfo[.dwo] sections.

llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp
118 ↗(On Diff #259009)

What if there are two or more CUs which use separate contributions to the macro section with different string offsets bases?

131 ↗(On Diff #259009)

You may want to support pre-v5 .debug_macro sections. They were implemented as a GNU extension in GCC 4.7. As they use the pre-standard format of .debug_str_offsets.dwo, there is no header.

See http://www.dwarfstd.org/ShowIssue.php?issue=110722.1 and https://gcc.gnu.org/wiki/DebugFission.

SouraVX added inline comments.Apr 22 2020, 9:22 AM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
267–270

Thanks, I'll mark/pass this parameter as optional.

llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp
118 ↗(On Diff #259009)

Not sure what you meant here, this patch provide support for this situation too.
The test case debug-macro-multi-cu-strx.s contains multiple CUs(2 CU) having separated contributions to macro section with different string offsets base.

131 ↗(On Diff #259009)

This should be taken separately IMO, we should first support spec.

SouraVX updated this revision to Diff 259340.Apr 22 2020, 11:05 AM

Addressed @ikudrin comments, Thanks for this!

  • Made StringOffsetsExtractor parmeter as Optional since it is only getting used in debug_macro[.dwo] case.
SouraVX marked 2 inline comments as done.Apr 22 2020, 11:06 AM

I am supporting @dblaikie in that the patch should be split. The part of adding *_strx forms should be separated from adding the support of debug_macro.dwo. The parts look independent of each other and can be added in any order.

And please, reduce the tests.

llvm/include/llvm/DebugInfo/DWARF/DWARFDebugMacro.h
107–109 ↗(On Diff #259340)

CUs and StringExtractor should be optional because they are not needed when parsing debug_macinfo[.dwo].

llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
265–273

This should not be filled if IsMacro == false. This way you can just pass it later in the code without additional condition.

269

This should be dwo_compile_units() if you parse a DWO section, no?

275–276

The strings section is not needed when parsing .debug_macinfo[.dwo], right? The argument should be optional, too. As it is written at the moment, it looks a bit misleading when you pass the extractor for .debug_str when parsing .debug_macinfo.dwo.

llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp
118 ↗(On Diff #259009)

Ah, I see, M->Offset is hidden deep inside the lambda. A bit counterintuitive.

Anyway, this approach is very inefficient. The set of compile units is scanned each time *_strx code is encountered. And the implementation is very limited in the sense of what it can handle. DWARFUnit already has the code to find the contribution in the string offsets sections, no need to reimplement it again. Please, take a look at dumping the string offsets sections, dumpDWARFv5StringOffsetsSection() for now or dumpStringOffsetsSection() after D78555, in DWARFContext.cpp. I guess the approach may be adopted here.

163 ↗(On Diff #258758)

In a dwo object there can't be multiple contributions

What about dwp?

llvm/test/DebugInfo/X86/debug-macro-dwo.s
33 ↗(On Diff #259340)

Please, avoid hardcoding the length. Add a pair of labels and use an expression.

55 ↗(On Diff #259340)

Please, avoid hardcoding the offsets. Use something like .long .Linfo_string3-.debug_str.dwo instead.

68–71 ↗(On Diff #259340)

These attributes are not required for the test, right?

SouraVX marked 2 inline comments as done.Apr 23 2020, 1:30 AM

I am supporting @dblaikie in that the patch should be split. The part of adding *_strx forms should be separated from adding the support of debug_macro.dwo. The parts look independent of each other and can be added in any order.

Okay, I'll separate them, but first let us reach to a consensus on the approach used here and the way forward.

And please, reduce the tests.

Briefly stated the reason why they are a bit long, assuming D78555 will address this, I'll be happy to reduce them to bare minimal.

llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp
118 ↗(On Diff #259009)

Anyway, this approach is very inefficient. The set of compile units is scanned each time *_strx code is encountered. And the implementation is very limited in the sense of what it can handle. DWARFUnit already has the code to find the contribution in the string offsets sections, no need to reimplement it again.

Primary problem with _strx form is that they are just indices so you have to map them to there corresponding CU get str_offset_base and using that dump. otherwise we'll end up doing this in-correctly.

And for mapping indices correctly to CU, we need to find the CU in list CU's based on macro contributions(since strx is contributing here). This is done here using DW_AT_macros to map to appropriate CU, subsequently getting the DW_AT_str_offsets_base and proceed forward.

Anyway, this approach is very inefficient.

Yea, that was one of the reason I was inclined to not to use _strx in debug_macro section at-least(In my first revision). But for debug_macro.dwo _strx seems a good fit there and _strp can't be allowed since that involves relocation .

BTW Is there a better/efficient way to accomplish this that I'm missing out here ?

llvm/test/DebugInfo/X86/debug-macro-dwo.s
33 ↗(On Diff #259340)

This comment is addressed in next comment.

55 ↗(On Diff #259340)

These are clang generated assembly, although slightly tweaked to fit our purpose. I was also surprised to see raw offsets, but as of now this is the way it is.
Do you want me to replace them with your approach ?

68–71 ↗(On Diff #259340)

I think(at the moment, writing this patch), they are needed. Since str_offsets section dumping relies on the MaxVersion for discriminating between v5 and pre-standardized str_offsets section. So CU Header is needed, that's how Maxversion is set to 5.

if (MaxVersion >= 5)
     dumpDWARFv5StringOffsetsSection(OS, DumpOpts, SectionName, Obj ...

But I guess after D78555, this shouldn't be the problem :) Thanks for this!

And please, reduce the tests.

Briefly stated the reason why they are a bit long, assuming D78555 will address this, I'll be happy to reduce them to bare minimal.

I can't imagine how D78555 can help with that.

llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp
118 ↗(On Diff #259009)

BTW Is there a better/efficient way to accomplish this that I'm missing out here ?

As I've said, take a look at dumpDWARFv5StringOffsetsSection().

llvm/test/DebugInfo/X86/debug-macro-dwo.s
55 ↗(On Diff #259340)

The test is for llvm-dwarfdump, not clang. clang can use raw values because it produces the output primarily for other tools, not for people to read it; thus, if it can calculate the exact value, it does not need to use descriptive expressions. On the other hand, the test files, like any kind of source files, should be aimed for people in the first place. They have to be simple, clear, and understandable so that they are easy to maintain.

68–71 ↗(On Diff #259340)

I think(at the moment, writing this patch), they are needed.

How are DW_AT_producer, DW_AT_language, DW_AT_name etc. needed to set MaxVersion?

But I guess after D78555, this shouldn't be the problem :)

D78555 is solely about supporting DWARF64 string offsets tables. It is not expected to have any effect on this patch.

As I've said, take a look at dumpDWARFv5StringOffsetsSection().

@ikudrin, Sorry for churn :). But I'm still not getting as to how to map the indices(strx) to appropriate CU, without having to search in list(of CU's) using DW_AT_macros.
Sort of let's say we got 0 as an index(In every CU index will start from 0) from _strx form how do we map this to appropriate CU and retrieve the macro or string correctly.

dumpDWARFv5StringOffsetsSection() this function just collects the contributions from list of CU's and dump/prints it.

 for (auto &Contribution : Contributions) { 
Offset = Contribution->Base; 
// print dump using this as StrOffset

Could you please briefly elaborate the approach you have in mind.
Thanks a lot !

The function first scans the compilation units and collects their contributions to the string offsets section. Then it scans the collected list of contributions and dumps the section according to them, detecting overlaps and gaps as a bonus.

You can do the same for the macro section: scan the units, add to the list the offsets to the macro section and the corresponding contributions to the offsets section; then, go through the list, and for each item you will have both the offset in the macro section and the valid reference to the offsets section (if defined).

Thanks @ikudrin , I've created separate revision D78736 based on your suggestion.
I'm planning revise this after D78736 get's reviewed.

SouraVX updated this revision to Diff 260117.Apr 25 2020, 9:39 AM

Addressed @ikudrin comments. Thanks for this!

Rebased on top of D78736.

  • Reduced test case further, replaced hardcoded length and offsets from test case by creating temporary symbols.
ikudrin added inline comments.Apr 27 2020, 5:00 AM
llvm/include/llvm/MC/MCObjectFileInfo.h
116 ↗(On Diff #260117)

It looks like the changes in MC files should not be in this patch.

llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
265–266

I would prefer a positive condition rather than negative.

265–274

Please, run clang-format on that.

268

It should be dwo_compile_units() to correspond compile_units() in the other branch.

llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp
89–92 ↗(On Diff #260117)

Why the formatting was changed?

SouraVX updated this revision to Diff 260356.Apr 27 2020, 9:39 AM

Addressed @ikudrin, Thanks for this.

SouraVX marked 5 inline comments as done.Apr 27 2020, 9:43 AM
SouraVX added inline comments.
llvm/include/llvm/MC/MCObjectFileInfo.h
116 ↗(On Diff #260117)

It is needed in MCObjectFileInfo.cpp.

DwarfMacroDWOSection =
     Ctx->getELFSection(".debug_macro.dwo", DebugSecType, ELF::SHF_EXCLUDE);
ikudrin added inline comments.Apr 28 2020, 4:32 AM
llvm/include/llvm/MC/MCObjectFileInfo.h
116 ↗(On Diff #260117)

Yes, but the changes in MCObjectFileInfo.cpp are also not needed for this patch.

llvm/test/DebugInfo/X86/debug-macro-dwo.s
34–35 ↗(On Diff #260356)

Please, add comments for these two lines.

38–40 ↗(On Diff #260356)

The comments are misleading.

41 ↗(On Diff #260356)

Please, avoid splitting definitions of sections.

Add an empty line before the .section directives.

jdoerfert resigned from this revision.Apr 30 2020, 6:09 AM
SouraVX updated this revision to Diff 263357.May 11 2020, 11:14 PM

Rebased to tip!

  • Addressed @ikudrin comments WRT test case and the other. Thanks for this!
SouraVX updated this revision to Diff 263360.May 11 2020, 11:23 PM

MCSection *DwarfMacroSection = nullptr; removal.

SouraVX marked 4 inline comments as done.May 11 2020, 11:24 PM
ikudrin added inline comments.May 19 2020, 3:30 AM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
266–272

Passing an extractor for a .debug_str section when we parse a .debug_macro.dwo does not seem correct. Please, fix and add a test with DW_MACRO_*_strp entries in a .debug_macro.dwo section.

SouraVX added inline comments.May 19 2020, 4:08 AM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
266–272

Passing an extractor for a .debug_str section when we parse a .debug_macro.dwo does not seem correct.

Yes, StringExtractor is not getting used anywhere except for parsing DW_MACRO_*_strp` form. but the current interface(parseMacro) does not distinguish whether we're parsing .debug_macro or .debug_macro.dwo (except the units part(that is being distiguished)).
Do you think, it's a good idea to make this parameter optional ? Since it's only going to be used in parsing .debug_macro section.

Please, fix and add a test with DW_MACRO_*_strp entries in a .debug_macro.dwo section.

These forms are not allowed in a DWO object since these forms involves a direct relocation to the debug_str section. And in LLVM a relocation cannot be present in a DWO object.

ikudrin added inline comments.May 19 2020, 4:55 AM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
266–272

These forms are not allowed in a DWO object since these forms involves a direct relocation to the debug_str section. And in LLVM a relocation cannot be present in a DWO object.

There is no need for a relocation to reference a record in the .debug_str.dwo section; e.g. .debug_str_offsets.dwo references the strings somehow, right? Moreover, in figure B.2 on page 278 you may find a link (po) between sections .debug_macro.dwo and .debug_str.dwo. The comment for the link is missing, but it should correspond to the similar link (p) in figure B.1.

SouraVX updated this revision to Diff 264928.May 19 2020, 8:35 AM
SouraVX edited the summary of this revision. (Show Details)

Addressed @ikudrin comments, Thanks for this!

  • Extended llvm-dwarfdump to dump DW_MACRO_*_strp forms present in a debug_macro.dwo section.
  • Added test case for the same.
  • Edited the patch summary to reflect this.
SouraVX marked 3 inline comments as done.May 19 2020, 8:36 AM
dblaikie added inline comments.May 19 2020, 2:39 PM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
266–272

Oh, I was going to say that was just a spec bug, but seems to be explicitly supported:

In a .dwo file, referring to a string using DW_FORM_strp is valid, but such use results in a file that cannot be incorporated into a package file (which involves string merging).

(DWARFv5, F.1, top of page 393)

Thanks for pointing that out, @ikudrin !

ikudrin added inline comments.May 19 2020, 9:08 PM
llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
266–272

The standard definitely needs clarification on using *_strp forms in dwo/dwp files. A few pages after that it says:

In a split DWARF object file, all references to strings go through this table (there
are no other offsets to .debug_str.dwo in a split DWARF object file). That is,
there is no use of DW_FORM_strp in a split DWARF object file.

(DWARFv5, p. 405, lines 4-6).

As it seems from now, the tool should be ready to handle *_strp records in dwo (and maybe even in dwp) files, because they are allowed at least in some parts of the standard.

That was one of the reason(for adding *_strx forms dumping support) and as it seems like this whole ambiguity can be avoided(at least in LLVM) by unifying the emission of _strx forms in both split(debug_macro) and non-split(debug_macro.dwo) mode. D78865 and D78866 revisions addresses these respectively.

I am still not really contented that there are two distinct methods to get strings from a string section in DWARFDebugMacro::parseImpl(), one for _strp codes and another for _strx. Is there a way to unify them?

llvm/test/DebugInfo/X86/debug-macro-strx-dwo.s
50

Remove this empty line.

61

Add an empty line before .section.

64

In fact, this DIE does not have any children.

68

There should be yet another zero byte which terminates the abbreviation table for the CU.

SouraVX updated this revision to Diff 265466.May 21 2020, 3:03 AM

Addressed @ikudrin comments, Thanks for this!

  • Addressed test case related comments
SouraVX marked 4 inline comments as done.May 21 2020, 3:10 AM

I am still not really contented that there are two distinct methods to get strings from a string section in DWARFDebugMacro::parseImpl(), one for _strp codes and another for _strx. Is there a way to unify them?

I'm afraid not, since _strp and _strx forms parsing is entirely different.
_strp forms doesn't require any CU information(Test case shows that) to parse it that's why we need to explicitly pass the relevant StringExtractor based on context(macro or macro.dwo).
While for parsing _strx forms we need CU information(Test case shows that) to parse it, once we have the CU we can extract the desired StringExtractor and use that to parse macro or macro.dwo.

@ikudrin , does these changes looks good to you ?

ikudrin accepted this revision.May 29 2020, 5:55 AM

Well, improving DWARFDebugMacro::parseImpl() is not really related to this particular patch.

This revision is now accepted and ready to land.May 29 2020, 5:55 AM

Thanks @ikudrin for investing in reviewing this!
@dblaikie and all other reviewers, does these changes looks good to you folks ?

dblaikie accepted this revision.May 29 2020, 9:38 AM

Looks OK - thanks!

As @ikudrin mentioned, this could probably use some tidying up/refactoring after-the-fact.

This revision was automatically updated to reflect the committed changes.