Page MenuHomePhabricator

[DWARF5] Added support for debug_macro section parsing and dumping in llvm-dwarfdump.
Needs RevisionPublic

Authored by SouraVX on Jan 21 2020, 2:00 AM.

Details

Summary

Summary:
This patch adds parsing and dumping DWARFv5 macro section in llvm-dwarfdump, it does not introduce any new switch. Existing "--debug-macro" can be used to dump macinfo/macro section.

Diff Detail

Event Timeline

SouraVX created this revision.Jan 21 2020, 2:00 AM

Unit tests: unknown.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt

SouraVX retitled this revision from [DWARF5] Added support for emission of debug_macro section. to [DWARF5] Added support for debug_macro section parsing and dumping in llvm-dwarfdump..Jan 21 2020, 2:27 AM
ikudrin added inline comments.
llvm/include/llvm/DebugInfo/DWARF/DWARFDebugMacro.h
55

Make this field uint64_t so that it will be compatible with 64-bit DWARF from the beginning. Changing it later is always more difficult and error-prone.

SouraVX updated this revision to Diff 239484.Jan 21 2020, 8:00 PM

Quick Rebase.

SouraVX marked an inline comment as done.Jan 22 2020, 4:14 AM
SouraVX added inline comments.
llvm/include/llvm/DebugInfo/DWARF/DWARFDebugMacro.h
55

Hi @ikudrin Thanks for reviewing this. Even I'm confused here about DWARF32, DWARF64. For instance here to access debug_str section Offset in uint64_t that means debug_str is already DWARF64. ??
Even, while accessing macro, I've used uint64_t Offset, is it wrong ?? If macro section is already DWARF64, then I've have correct generartion patch D72828 also to emit DWARF64 in macro header.

Can somebody help/explain this. ?? @probinson @aprantl
Spec Pg. 144 Section 6.1.1.4.2 says ->
"DWARF-32 format, a section offset is 4 bytes, while in the DWARF-64 format, a section offset is 8 bytes."

ikudrin added inline comments.Jan 22 2020, 5:30 AM
llvm/include/llvm/DebugInfo/DWARF/DWARFDebugMacro.h
55

You don't need to implement the support for DWARF64 right now, at least in this patch. DWARF32 is the most popular and, as such, a more important format. Support for DWARF64 may be added later.

Anyway, it is better to be ready for it to minimize rewritings in the future.

SouraVX updated this revision to Diff 242083.Mon, Feb 3, 8:28 AM

This revision, rebase itself on top of @dblaikie https://reviews.llvm.org/rG9e8bff71d073abf0ef7f8bb4ac04ec8689babcad
Thanks @dblaikie for this, that was in my mind, but initially I thought, I should seek your comments/suggestion on the overall patch first.
Certainly it's still has some issues, since it is also built/uses existing macinfo infra. Primary reason behind going towards this approach was that macinfo and new macro section, Shares couple of common things -> encodings, internal storage and dumping infra can be shared.
But, I'm open to suggestions, If you think macro dumping infra should be separate from macinfo from ground up we can do that also.

Ping!!
To all reviewers!

This revision cannot be compiled on its own. It looks like it depends on D72828, but, as @dblaikie suggested in D72828#1824527, the support in llvm-dwarfdump should be implemented first, so that it can be used later for tests. Please, also add some artificial tests here to check llvm-dwarfdump itself.

Please, make the DebugLineOffset field optional, as that is required by the DWARF standard.

Please, handle the possible errors in the input data while parsing the tables, like insufficient section size, unexpected field values, etc.

The former implementation of DWARFDebugMacro represented the whole section. The new implementation can represent only one macro unit, as it is defined in the DWARF5 standard. Please, fix that so that the meaning of the class is not changed.

It may be worth to support .debug_macro.dwo, too.

llvm/include/llvm/DebugInfo/DWARF/DWARFDebugMacro.h
30

You probably don't need the header fields in a separate struct. That was used in some places to calculate the total size of the header by sizeof(Header), however, as some fields are optional, like debug_line_offset and opcode_operands_table, or have different sizes in DWARF32 and DWARF64, that approach is not usable anymore.

55

Please, make this field uint64_t.

68

You call this method from DWARFDebugMacro::parseMacroHeader(), which reads the value from an untrusted source. Using assert() here means that the program just crashes on an unexpected input in Debug build or ignore that input in Release build. Both are wrong. It should print a diagnostic message instead.

By the way, why do you use an explicit constant here if you have already defined the corresponding named constants?

70

Using |= here seems incorrect because the usage of this method implies setting the whole set of flags.

73

Make this uint64_t.

llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
327

I am pretty sure that DWARFContext should not be aware of the internals of that section. All that should be hidden inside the DWARFDebugMacro class. parse() should parse the whole thing as well as dump() also should print everything. Whether the "whole thing" should be the whole section or just one macro unit might be debatable, though.

834

I find the name of this and the following method a bit misleading now. For my taste, they should be renamed to getDebugMacinfoDWO() and getDebugMacinfo().

SouraVX marked 7 inline comments as done.Mon, Feb 10, 6:40 AM

Hi! Thank You so much for you comments.

Please, make the DebugLineOffset field optional, as that is required by the DWARF standard.

Indeed it's optional, but I think it should be absolutely present, Please consider @probinson comment in D72828.

Please, handle the possible errors in the input data while parsing the tables, like insufficient section size, unexpected field values, etc.

The former implementation of DWARFDebugMacro represented the whole section. The new implementation can represent only one macro unit, as it is defined in the DWARF5 standard. Please, fix that so that the meaning of the class is not changed.

Not sure, what you meant here, I've tested this implementation with multiple CU's having debug_macro sections. This implementation can dump the entire section[individual and final merged sections in a.out]. It reads/dump header then section content. After reaching to end check's for validity of offset, if valid repeat this cycle until the Offset is invalid.

It may be worth to support .debug_macro.dwo, too.

On my TODO list once, debug_macro emission/dumping get's in.

llvm/lib/DebugInfo/DWARF/DWARFContext.cpp
327

Yes, I had that in mind while writing this. That can be done, but we'll end with another function *parsev5/dumpv5* or so Do we need to ?
Since most part of dumping/parsing can be reused.
Or extending the exisiting parse/dump to accomodate v5 header also ?
Or should we opt for fresh start for v5 macro. ?
So I ended writing a separate function for the header. Consider location list for instance, slightly same sort of thing happens their.
Parse/dump Header -> then section contents.

834

Yes, that's because of naming convention previously followed, for instance. Line:455 --

if (shouldDump(Explicit, ".debug_macinfo", DIDT_ID_DebugMacro,
                DObj->getMacinfoSection())) {
   getDebugMacro()->dump(OS);
 }

I plan to clean this up. After this.

SouraVX marked 2 inline comments as done.Mon, Feb 10, 6:58 AM
SouraVX added inline comments.
llvm/include/llvm/DebugInfo/DWARF/DWARFDebugMacro.h
55

Sorry for the trouble, I some how forgot revise and update my revision. It's been already done.

Currently the testing is very self-referential. Should we add a test in assembler (.s) in test/tools/llvm-dwarfdump that hardcodes a properly formatted macro section to make sure we don't add symmetric mistakes (where we implement the same mistake in both the generator and llvm-dwarfdump)?

llvm/include/llvm/DebugInfo/DWARF/DWARFDebugMacro.h
34

nit: This should probably be one long doxygen comment. (i.e., use / instead of )

Currently the testing is very self-referential. Should we add a test in assembler (.s) in test/tools/llvm-dwarfdump that hardcodes a properly formatted macro section to make sure we don't add symmetric mistakes (where we implement the same mistake in both the generator and llvm-dwarfdump)?

Not sure what you meant here. The generation/emission patch emits/formats the contents of debug_macro correctly, for instance if you try to generate macro section using emission patch in clang. You can dump section contents of debug_macro section using objdump, It will dump contents entirely without any warnings/errors. As one more step for content verification, you use generated binary and can expand macros in LLDB. All these reaffirms that our emission implementation is correct!

Only reason, I've segregated this in 2 patches{emission/dumping} is for ease of review process.
Once we get through this, I'm planning to merge back these 2 patches into single patch and land it as debug_macro section support in llvm and llvm-dwarfdump. Or in a mutually agreed way.

Currently the testing is very self-referential. Should we add a test in assembler (.s) in test/tools/llvm-dwarfdump that hardcodes a properly formatted macro section to make sure we don't add symmetric mistakes (where we implement the same mistake in both the generator and llvm-dwarfdump)?

Not sure what you meant here. The generation/emission patch emits/formats the contents of debug_macro correctly, for instance if you try to generate macro section using emission patch in clang. You can dump section contents of debug_macro section using objdump, It will dump contents entirely without any warnings/errors. As one more step for content verification, you use generated binary and can expand macros in LLDB. All these reaffirms that our emission implementation is correct!

If we only test the encoding output by LLVM with the decoding in LLVM, we could have a mistake in both that our testing doesn't catch. It would be good to add a hardcoded assembler, binary, or yaml2obj input (perhaps even one that was generated be another compiler, but manually written is just fine, too) to make sure we can parse a known-to-be-correct reference input.

I'm speaking out of personal experience here. It happened to me before that I misinterpreted the DWARF spec and implemented a wrong encoding in both LLVM and LLDB that worked fine within the tools, but would have broken interoperability with other producers and consumers.

I'm speaking out of personal experience here. It happened to me before that I misinterpreted the DWARF spec and implemented a wrong encoding in both LLVM and LLDB that worked fine within the tools, but would have broken interoperability with other producers and consumers.

Thank You, for sharing this. Will do this in next revision.

Yeah, we don't always do this perfectly (sometimes get lazy and add dumping/output support in the same patch, without standalone tests for the dumping) but it's a good practice to get into & helps separate patches out (yes, the dumping and output patches should be committed separately and should be tested when they're committed - which means assembly or yaml based input for the dumping functionality first, then the codegen/output testing is done in terms of the dumping functionality already submitted/tested).

llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp
100–101
SouraVX updated this revision to Diff 244168.Wed, Feb 12, 7:05 AM

Thank you for reviewing this.
Revised based on suggestions by @dblaikie @aprantl @ikudrin .

  • Added 2 asm test 1 positive and other negative case for llvm-dwardump coverage.
  • Removed assertion based error handling, when flag in header is different then 2. It's been replaced by more functional String based error reporting.
  • Remove else-after-return.
SouraVX marked an inline comment as done.Wed, Feb 12, 7:06 AM

@aprantl and others, are you guys Okay with these updated changes ?

dblaikie added inline comments.Tue, Feb 18, 1:32 PM
llvm/test/DebugInfo/X86/unsupported-flag-debug-macro-v5.s
7

This input seems more complicated/much longer than it needs to be to test the error message(s)?

SouraVX updated this revision to Diff 245270.Tue, Feb 18, 2:26 PM

@dblaikie , thanks for the comment, removed the unnecessary stuff from test case.

SouraVX marked an inline comment as done.Tue, Feb 18, 2:33 PM
dblaikie added inline comments.Tue, Feb 18, 2:55 PM
llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp
91–97

I can't seem to make sense of this comment. Both the commend and the code seem incorrect to me - there should still be one debug_macro contribution per-CU in DWARFv5 (it'd not be possible for there to be only one across all CUs without DWARF-aware linking, which is something the DWARF spec tries very hard to not require)

Have you tried compiling two files with DWARFv5 debug_macro contents using your prototype patches & then looking at the result? I would expect there should be two debug_macro contributions, and that this functionality as currently commented/written would only dump one of them.

108–117

This switch has a mix of DW_MACRO and DW_MACINFO enumerators in it, which seems suspicous/questionable.

Perhaps it'd be better to use DW_MACRO macros exclusively and include a comment prior to the switch explaining that the macinfo values are a strict subset (with identical constant values) of the macro values that existed previously?

llvm/test/DebugInfo/X86/debug-macro-v5.s
11–29

Maybe strip this down to just one example of each of the new-in-v5 forms?

SouraVX marked an inline comment as done.Tue, Feb 18, 3:08 PM
SouraVX added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp
91–97

Apologies for the confusion here, Yes I've tested the generation/ dumping using dwarfdump even with objdump rigorously.
This work flawlessly for multipe CU case.
Snippet from dump taken just now 2 CU --

.debug_macro contents:
0x00000000: macro header: version = 0x0005, flags = 0x02, debug_line_offset = 0x0000
DW_MACRO_start_file - lineno: 0 filenum: 0
  DW_MACRO_define_strp - lineno: 1 macro: BAR 4
DW_MACRO_end_file
DW_MACRO_define_strp - lineno: 0 macro: __llvm__ 1

0080a: macro header: version = 0x0005, flags = 0x02, debug_line_offset = 0x0067
DW_MACRO_start_file - lineno: 0 filenum: 0
  DW_MACRO_define_strp - lineno: 1 macro: BAR 4
DW_MACRO_end_file
DW_MACRO_define_strp - lineno: 0 macro: __llvm__ 1

I've crossed checked these macro and debug_line offsets with objdump also using same binary. matches!

dblaikie added inline comments.Tue, Feb 18, 3:21 PM
llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp
91–97

OK, so this is a subtle divergence between how debug_macinfo is parsed and debug_macro is parsed, even though they share some code. With macinfo the looping happens here in this function and with macro the looping happens in the caller to this in dumpMacroSection.

Please fix this so macro and macinfo are handled in similar/the same way - both should build a single DWARFDebugMacro object representing all the contributions, then dump them together from there. (look at the way the existing macinfo support is written in DWARFContext - starting there/making the macro support similar)

Hi @dblaikie, I'm noticing a unusual behavior here, somehow crept it, maybe due multiple branches and versions. Can you please hold on till I revise it.

SouraVX marked an inline comment as done.Tue, Feb 18, 3:40 PM
SouraVX added inline comments.
llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp
91–97

Thank You for the suggestion.
I think the situation here is same/a bit less/ like location list. macro got header, rest of it is same as macinfo for the most part.

As a result opted for separate header and content parsing/dumping. This segregates both header part and content parsing/dumping can be shared with both macinfo/macro with slight inevitable modifications.

Nevertheless, I'll try your suggested approach too.

BTW, If you're okay with generation part D72828, Can you please accept that?? I'm asking this because, this dumping patch is based on top of that.
So while working on this, I do not have to think about /changes/rebase and all/ about that. As I recall, Adrian & Paul has already shared their acceptance previously.

BTW, If you're okay with generation part D72828, Can you please accept that??

I do not want to approve that patch until this patch has been committed - since that patch should use the functionality provided by this patch for testing.

I'm asking this because, this dumping patch is based on top of that.

It should be (& I think it is?) the other way around - the LLVM functionality patch should depend on this dumping patch, because the functionality patch should use the dumping patch for testing.

So while working on this, I do not have to think about /changes/rebase and all/ about that. As I recall, Adrian & Paul has already shared their acceptance previously.

I'd not worry about keeping the other patch up to date, etc, while this review is happening - wait until this patch is done and committed, then update the other patch to use the final functionality that's committed here.

ikudrin requested changes to this revision.Tue, Feb 18, 11:00 PM

Please, make the DebugLineOffset field optional, as that is required by the DWARF standard.

Indeed it's optional, but I think it should be absolutely present, Please consider @probinson comment in D72828.

I am sure that @probinson meant a producer, not a dumper. Note that the current implementation is even unable to parse an example from the standard, see fig. D.72 on p. 363, DWARFv5.

llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp
120

Please, add a "// TODO: Add support for DWARF64" comment.

This revision now requires changes to proceed.Tue, Feb 18, 11:00 PM
probinson added inline comments.Thu, Feb 20, 12:41 PM
llvm/lib/DebugInfo/DWARF/DWARFDebugMacro.cpp
24

I think printing the DebugLineOffset should be done only if the field is present (i.e., the corresponding flag is set).
You might also leave a FIXME for printing the opcode_operands_table.

148

This assumes that the offset_size_flag indicates 32-bit; it should either do the right thing, or check and reject 64-bit format.
This also assumes the debug_line_offset_flag bit is 1; it should check and skip parsing that field if the flag is 0.
Finally, it should parse (or skip with a warning) the optional opcode_operands_table.

Hi Thanks for your comments, this revision is getting ontrack/offtrack a bit. Sorry about that. As for opcode_openrand_table I initially added a FIXME me for that, but removed after
David's remark in D72828 -
I don't think this is a FIXME situation - LLVM has no extensions here, so there's no need for the opcode_operands_table.

Hi Thanks for your comments, this revision is getting ontrack/offtrack a bit. Sorry about that. As for opcode_openrand_table I initially added a FIXME me for that, but removed after
David's remark in D72828 -
I don't think this is a FIXME situation - LLVM has no extensions here, so there's no need for the opcode_operands_table.

That's because D72828 is the producer side; and LLVM has no need to emit the opcode_operands_table.
This patch is the consumer (dumper) side, and should be prepared for inputs that do have an opcode_operands_table (and don't have a debug_line_offset, or might be in 64-bit format). The dumper should operate correctly on objects produced by other compilers, not just Clang, and support all DWARF v5 features of the .debug_macro section.

@dblaikie, I tried parsing multiple CU's contribution's[1 CU is fine] in a single DWARFDebugMacro object. It's not working out correctly, since every unit's contribution has it's own *header* apart from list of macros which can be easily parsed and dumped through a container. Even dumping is not straightforward, if all contributions are in a single object.
I'll try again,.
BTW are their down sights of the approach I followed. One step at a time take CU parse/dump and continue till offset is valid, since this is also used in location list parsing and dumping i.e Each contribution getting it's own object and header resides separately in other place.

Thanks @probinson for clearing the context here. Next time when I'll revise these 2 patches, I'll try not to intermix the goals of consumers and producers.

@dblaikie, I tried parsing multiple CU's contribution's[1 CU is fine] in a single DWARFDebugMacro object. It's not working out correctly, since every unit's contribution has it's own *header* apart from list of macros which can be easily parsed and dumped through a container. Even dumping is not straightforward, if all contributions are in a single object.
I'll try again,.
BTW are their down sights of the approach I followed. One step at a time take CU parse/dump and continue till offset is valid, since this is also used in location list parsing and dumping i.e Each contribution getting it's own object and header resides separately in other place.

Yes, the particular downside I was trying to avoid is that the common code ended up being used in two very different ways - especially the early return in DWARFDebugMacro.cpp:96 - that the debug_macinfo code uses that loop to read all the contributions, but the debug_macro uses the same code to stop after the first list based on version number is pretty subtle/divergent in ways that would be best avoided to make the code consistent and thus easier to understand and maintain.

The header information would need to be kept on a per-MacroList basis (the MacroList is currently just the raw vector - that won't be sufficient - probably a refactoring patch or two to make MacroList an actual structure, so it can have the header member(s) added to it for debug_macro support would be good) because it could vary by contribution, so it can't be stored singularly on the DWARFDebugMacro object since that represents multiple contributions.