This is an archive of the discontinued LLVM Phabricator instance.

[ObjectYAML][ELF] Add support for emitting the .debug_gnu_pubnames/pubtypes sections.
Closed, Public

Authored by Higuoxing on Jun 23 2020, 4:36 AM.

Details

Summary

This patch adds support for emitting the .debug_gnu_pubnames and .debug_gnu_pubtypes sections.

The .debug_gnu_pub* sections are verified by llvm-dwarfdump.

Known issues:

  • Doesn't support emitting multiple pub-tables.

Event Timeline

Higuoxing created this revision. Jun 23 2020, 4:36 AM

My understanding of the .debug_gnu_pubnames table is that it is a slightly different format to the .debug_pubnames table. The output of yaml2obj here is not consistent with the parsing code for GNU style, as far as I can tell, because of this and the bug you mentioned. I don't think we should add support for the section unless it is actually formatted correctly, as it is likely to cause confusion, should anybody attempt to use it in its seemingly-working-but-broken state.
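
For readers following the format point above, here is a minimal Python sketch of how the two table layouts differ. It assumes DWARF32, little-endian encoding, and the commonly described GNU layout in which each entry carries one extra descriptor byte between the DIE offset and the name. The function name encode_pub_table and the sample values are hypothetical; this is not the yaml2obj implementation.

```python
import struct

def encode_pub_table(unit_offset, unit_size, entries, gnu_style):
    """Encode one DWARF32, little-endian pub table (illustrative only).

    entries is a list of (die_offset, descriptor, name) tuples; the
    descriptor is only written when gnu_style is True.
    """
    # Header after unit_length: version (2 bytes), CU offset, CU size.
    body = struct.pack("<HII", 2, unit_offset, unit_size)
    for die_offset, descriptor, name in entries:
        body += struct.pack("<I", die_offset)
        if gnu_style:
            # Assumed GNU extension: one descriptor byte per entry.
            body += struct.pack("<B", descriptor)
        body += name.encode("ascii") + b"\x00"
    body += struct.pack("<I", 0)  # a zero offset terminates the entry list
    # unit_length counts everything after the length field itself.
    return struct.pack("<I", len(body)) + body

entries = [(0x2A, 0x30, "main")]  # hypothetical sample entry
plain = encode_pub_table(0, 0x40, entries, gnu_style=False)
gnu = encode_pub_table(0, 0x40, entries, gnu_style=True)
```

With one entry, the GNU-style table is exactly one byte longer than the plain one, which is why a parser expecting one layout misreads the other.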

Higuoxing planned changes to this revision. Jun 23 2020, 6:16 AM

Got it! I will try to fix it later.

Higuoxing edited the summary of this revision. Jun 30 2020, 1:15 AM
jhenderson accepted this revision. Jun 30 2020, 1:55 AM

LGTM, with one suggestion.

llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubtypes.yaml
110

Probably missed this elsewhere, but can't this be SIZE-NEXT?

This revision is now accepted and ready to land. Jun 30 2020, 1:55 AM
This revision was automatically updated to reflect the committed changes.
dblaikie added inline comments.
llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml
9–10

Should this be tested via llvm-dwarfdump instead? (perhaps there's already lots of precedent/reasons that yaml2obj is being tested via readobj?)

Higuoxing marked an inline comment as done. Jul 6 2020, 6:58 PM
Higuoxing added inline comments.
llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml
9–10

Some llvm-dwarfdump tests already use yaml2obj to generate DWARF sections, e.g., llvm-dwarfdump/X86/verify_overlapping_cu_ranges.yaml, llvm-dwarfdump/X86/Inputs/i386_macho_with_debug.yaml, etc., so testing yaml2obj with llvm-dwarfdump would create a circular dependency, which we want to avoid. Does that make sense?

dblaikie added inline comments. Jul 6 2020, 8:45 PM
llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml
9–10

Hmm, fair enough. Not sure what the right call is there - I would've thought assembly would be easier to read than hex object dumps? Case in point with these hex dumps and multiline ASCII art comments, compared to assembly with comments & appropriate-width values, symbolic expressions, etc.

(so using assembly tests for llvm-dwarfdump and then llvm-dwarfdump for tests of obj2yaml, rather than obj2yaml tests of llvm-dwarfdump and objdump tests of obj2yaml)

jhenderson added inline comments. Jul 7 2020, 2:38 AM
llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml
9–10

(Just in case you missed it, this is a yaml2obj test.) The intent longer term with @Higuoxing's project is to get yaml2obj DWARF support to a good enough state that it makes it much easier to craft tests for llvm-dwarfdump etc. without needing to specify all the fine details that assembly currently requires (just consider how much assembly some of the existing llvm-dwarfdump tests require, for example). Assembly would probably still work well for creating broken inputs, but yaml2obj would be better for the higher-level testing.

The problem of course with using yaml2obj to test llvm-dwarfdump is that we can't use the reverse. Somewhere, we have to test either hex output or use assembly (or YAML + raw content hex) input. Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great).

Once we have basic testing in place for all the DWARF sections, it should be possible to use llvm-dwarfdump to verify the higher level auto-generation of things by yaml2obj that is intended for later in the project.

Higuoxing marked an inline comment as done. Jul 7 2020, 2:48 AM
Higuoxing added inline comments.
llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml
9–10

Oops, I missed @dblaikie 's previous comments. Thank you @jhenderson for clarifying this for me!

dblaikie added inline comments. Jul 7 2020, 5:29 PM
llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml
9–10

Whilst I agree assembly input would be easier to read than this hex output, it rather defeats the point of the project, and it doesn't scale well (in theory, the testing here can be kept fairly small, so the costs of having hex aren't too great).

Not sure - why is it likely that the yaml2obj+hexdump tests scale better than the assembly+llvm-dwarfdump tests directly? Seems like we'd have to test maybe as many weird cases of DWARF emission to get a nice legible format for writing dwarfdump tests as we would for the dwarfdump tests themselves? It's starting to feel a bit "turtles all the way down" to me.

Something like yaml2obj could be handy for testing lldb, for instance - constructing arbitrarily interesting inputs. But for the yaml2obj<>llvm-dwarfdump circularity, I'm not so sure.

jhenderson added inline comments. Jul 8 2020, 12:08 AM
llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml
9–10

By "scale" I meant the auto-generation aspects probably don't need to be tested using hex dumps, so can be tested using llvm-dwarfdump, but honestly I'm not sure either way too.

dblaikie added inline comments. Jul 8 2020, 10:53 AM
llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml
9–10

By "scale" I meant the auto-generation aspects probably don't need to be tested using hex dumps, so can be tested using llvm-dwarfdump, but honestly I'm not sure either way too.

What do you mean by "auto-generation aspects"?

But, yeah, I'm not holding this patch up over this direction that's already got precedent, etc - but raising the question at least for consideration/thinking about over time.

jhenderson added inline comments. Jul 9 2020, 12:46 AM
llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml
9–10

At the moment, to use yaml2obj to generate DWARF, you have to specify pretty much every detail of the DWARF, including the details of the abbrev table and the string table for example. Ideally, we should be able to describe the DWARF in a higher level manner (e.g. by just specifying the attributes and values in the .debug_info description, letting yaml2obj do all the leg work of selecting a form, populating the abbrev and string tables etc). You'll see details of this in @Higuoxing's mailing list posts about his GSOC project.
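
As a rough illustration of the "leg work" described above, the following Python sketch builds a string table from a list of names and returns the offset each name would get, which is the kind of value a DW_FORM_strp attribute refers to. The function build_debug_str is a hypothetical helper written for this example; it does not reflect how yaml2obj implements or plans to implement the feature.

```python
def build_debug_str(names):
    """Build a .debug_str-like blob and map each name to its offset.

    Duplicate names share a single entry, so callers only supply names
    and never compute offsets by hand.
    """
    table = bytearray()
    offsets = {}
    for name in names:
        if name not in offsets:
            offsets[name] = len(table)  # offset of the first byte of the name
            table += name.encode("utf-8") + b"\x00"
    return bytes(table), offsets

table, offsets = build_debug_str(["main", "foo", "main"])
```

Here the user describes only the names; the offsets into the table fall out of the generation step rather than being written into the test input.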

We can use the basic-level testing for "bootstrapping". yaml2obj can generate valid raw sections, tested via hex -> allows testing of llvm-dwarfdump section dumping -> allows testing of yaml2obj higher-level functionality (because we know that llvm-dwarfdump section dumping now works).
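
The bootstrapping chain above can be sketched as a dependency graph, under the assumption that "X is tested via Y" means Y must be trusted first. The node labels and the helper trust_order are hypothetical and only model the argument; they are not part of any LLVM tooling. The sketch uses graphlib from the Python standard library (3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

def trust_order(deps):
    """Return an order in which components can be trusted, or None on a cycle.

    deps maps each component to the set of things its tests rely on.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None

# Testing each tool only with the other gives no starting point.
circular = {
    "yaml2obj": {"llvm-dwarfdump"},
    "llvm-dwarfdump": {"yaml2obj"},
}

# Anchoring the lowest layer on hex dumps breaks the cycle.
bootstrapped = {
    "yaml2obj raw sections": {"hex dump"},
    "llvm-dwarfdump section dumping": {"yaml2obj raw sections"},
    "yaml2obj higher-level generation": {"llvm-dwarfdump section dumping"},
}
```

trust_order(circular) finds no valid order, while trust_order(bootstrapped) yields the chain starting from the hex dump.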

dblaikie added inline comments. Jul 11 2020, 11:18 AM
llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml
9–10

That seems like it's going to be fairly subtle/hard to maintain the separation here - if some yaml2obj tests use hex dumping but others can use llvm-dwarfdump - if/when/that's happening, might be worth separate directories for the two kinds of tests and some fairly specific documentation about how to determine which tests go where.

Higuoxing marked an inline comment as done. Jul 12 2020, 1:56 AM
Higuoxing added inline comments.
llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml
9–10

What do you think of making elf2yaml support dumping DWARF sections? In the future, we can use raw assembly to test elf2yaml and use elf2yaml to test yaml2elf.

dblaikie added inline comments. Jul 12 2020, 9:35 AM
llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml
9–10

Probably useful that elf2yaml and yaml2elf roundtrip/support the same features (would make it easier to create yaml files to work with/pare down, etc).

But as for testing - not sure - seems like it adds another layer of indirection (then we'd use raw assembly+llvm-mc to test elf2yaml, to test yaml2elf, to test llvm-dwarfdump - when we could've been using raw assembly to test llvm-dwarfdump) & not sure how much it improves/streamlines the testing matrix.

All that said, we did used to test llvm-dwarfdump with checked in object files - then we accepted that assembly + llvm-mc didn't especially reduce the test quality despite increasing the surface area of the test by using llvm-mc. Though I think the more DWARF-specific the functionality gets the less that sort of line of reasoning applies (ie: Once we're generating all of DWARF - we're reaching the same complexity as the parsing logic and have now written a whole other DWARF representation with all the risk of bugs, etc).

But really - I don't have any particular action/takeaway from these thoughts right now, but I think they're worth keeping in mind/thinking about as this work continues.

Is ".debug_gnu_pubtypes" different and/or better than the old ".debug_pubtypes" in terms of contents? AFAIK all debuggers completely ignore these sections as they don't contain all types (only public types). I did a quick web search for ".debug_gnu_pubtypes" and didn't find any docs or anything documenting this format.

My general understanding is that GCC/GDB folks implemented .debug_gnu_pubnames(/types) because of the lack of guarantees in debug_pubnames(/types) - GCC does consume/rely on these sections (& gold can produce .gdb_index from these gnu_pub* sections - which is /necessary/ for correctness of gdb when using Split DWARF, in fact).

I'm guessing like lots of stuff, it's probably basically defined by the agreement between GCC and GDB - and Clang/LLVM just tries to implement gnu_pub* to be as compatible as possible with that.
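
Since no written specification is cited in this thread, the following Python sketch only records an assumption about the per-entry descriptor byte in the gnu_pub* sections: bits 4 to 6 hold the symbol kind and bit 7 marks static linkage, mirroring the symbol attributes used by .gdb_index. The names KINDS and decode_descriptor are hypothetical, and the bit layout should be checked against the GCC, GDB, or LLVM sources before being relied on.

```python
# Assumed kind values, mirroring the .gdb_index symbol kinds.
KINDS = {0: "NONE", 1: "TYPE", 2: "VARIABLE", 3: "FUNCTION", 4: "OTHER"}

def decode_descriptor(byte):
    """Split a gnu_pub* descriptor byte into (linkage, kind).

    Assumed layout: bit 7 = static flag, bits 4..6 = kind.
    """
    kind = (byte >> 4) & 0x7
    linkage = "STATIC" if byte & 0x80 else "EXTERNAL"
    return linkage, KINDS.get(kind, "UNKNOWN")

external_function = decode_descriptor(0x30)
static_variable = decode_descriptor(0xA0)
```

Under that assumption, the extra byte is what lets a linker classify each name when building an index, which the plain .debug_pubnames entries cannot express.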

Was there any documentation on this where the contents of this new format are detailed? Sounds promising and I would love to have LLDB to not have to manually index DWARF on all linux/android targets if at all possible.

The GCC doc mentions that the "-ggnu-pubnames" option is only useful with a linker that can produce GDB index version 7.

@item -ggnu-pubnames
@opindex ggnu-pubnames
Generate @code{.debug_pubnames} and @code{.debug_pubtypes} sections in a format
suitable for conversion into a GDB@ index.  This option is only useful
with a linker that can produce GDB@ index version 7.

See: https://raw.githubusercontent.com/gcc-mirror/gcc/master/gcc/doc/invoke.texi (The file is large; searching for the keyword "ggnu-pubnames" may help.)

Was there any documentation on this where the contents of this new format are detailed?

Not that I know of.

Sounds promising and I would love to have LLDB to not have to manually index DWARF on all linux/android targets if at all possible.

Presumably LLDB should probably be looking at the newer .debug_names format that's been standardized/documented, and derived from the apple names accelerator tables.

We have support already and we do look at that. Not many compilers enable DWARF5 at the moment, so we rarely see the .debug_names in any binaries, but LLDB is ready for these. It is hard to control and change the toolchains that people build with. My idea was to write a post processing tool that can add the .debug_names to binaries, like maybe llvm-objcopy or other tool.

Yep, a post-processing tool to add .debug_names could be nice. Given that folks already have to add an extra flag to get gnu (or non-gnu) pubnames (-ggnu-pubnames in both GCC and Clang), they could instead switch to DWARFv5 "just as easily" (not quite, but close), though there are certainly ecosystem issues if they need to support tools that can't consume DWARFv5, etc. (Perhaps .debug_names could be supported in pre-v5 modes as a backwards/partial compatibility mode, rather than having to produce gnu_pubnames.)

Though producing gnu_pubnames/plain pubnames and linking those into .debug_names in the linker wouldn't be a bad thing to support either (though maybe has the problem of "this doesn't have all the names it needs to have" which is the general problem with all this).