This is an archive of the discontinued LLVM Phabricator instance.

[llvm-dwarfdump] Print type names in DW_AT_type DIEs
ClosedPublic

Authored by JDevlieghere on Aug 21 2017, 5:51 PM.

Details

Summary

This patch adds printing for DW_AT_type DIEs like it is already the case for DW_AT_specification DIEs.

0x00000063:     DW_TAG_variable [4]
                  DW_AT_location [DW_FORM_exprloc]      (<0x2> 91 50 )
                  DW_AT_name [DW_FORM_strp]     ( .debug_str[0x0000007b] = "ptr")
                  DW_AT_decl_file [DW_FORM_data1]       ("/private/tmp/verify_type_names.cpp")
                  DW_AT_decl_line [DW_FORM_data1]       (7)
                  DW_AT_type [DW_FORM_ref4]     (cu + 0x00c7 => {0x000000c7} "int*[]")

Diff Detail

Repository
rL LLVM

Event Timeline

JDevlieghere created this revision.Aug 21 2017, 5:51 PM
probinson requested changes to this revision.Aug 22 2017, 10:09 AM
probinson added a subscriber: probinson.

We like to have the original source available for tests like this, ideally with commands to regenerate it. That practice helps people to understand the test and makes it easier to accommodate any syntactic changes that might come along (although that's more the case with IR tests than assembler tests).

test/tools/llvm-dwarfdump/X86/verify_type_names.s
6 ↗(On Diff #112096)

The DW_AT_type is on the member, not the class type.

10 ↗(On Diff #112096)

Seems like it would be simpler to have one subprogram with a return type and a formal parameter, rather than two subprograms.
I'd also like to see what happens with a type more exciting than "int".

This revision now requires changes to proceed.Aug 22 2017, 10:09 AM
JDevlieghere edited edge metadata.

Thank you for the feedback Paul! I have simplified the test and included the source code.

This revision is now accepted and ready to land.Aug 22 2017, 2:03 PM
JDevlieghere edited the summary of this revision. (Show Details)
JDevlieghere added a reviewer: friss.
  • Use a more intelligent approach to dumping type names
  • Update tests
friss edited edge metadata.Aug 24 2017, 9:06 AM

It looks like you didn't add any actual testing of what you are dumping. It doesnt' need to be exhaustive, but there needs to be some.

aprantl added inline comments.Aug 24 2017, 9:26 AM
lib/DebugInfo/DWARF/DWARFDie.cpp
88 ↗(On Diff #112480)

remove extra braces

98 ↗(On Diff #112480)

In other places we condense this to:

if (const char *Name = D.getName(DINameKind::LinkageName))
  return OS << Name;

This is mostly up to personal taste though.

114 ↗(On Diff #112480)

. at the end. LLVM prefers full sentences in comments.

195 ↗(On Diff #112480)

maybe OS <<'"'; is more legible?

probinson added inline comments.Aug 24 2017, 11:18 AM
lib/DebugInfo/DWARF/DWARFDie.cpp
84 ↗(On Diff #112480)

This function name isn't descriptive enough; I had to read the code to understand what it was doing. Maybe dumpTypeTagName or something like that.

130 ↗(On Diff #112480)

Missing rvalue_reference_type.

JDevlieghere marked 8 inline comments as done.

Thanks for the feedback everyone, it is very much appreciated!

I forgot to add my test case again after adding the type dumper. I've included a more elaborate example that covers references, pointers and arrays.

I'm happy, but Adrian should have a chance to comment too.

aprantl edited edge metadata.Aug 26 2017, 10:32 AM

Looks fine to me, too. Thanks!

Where's the 'const' being printed from (couldn't spot it from a quick glance at the code) for the test cases like "const Struct"?

test/DebugInfo/Generic/tu-composite.ll
35 ↗(On Diff #112611)

probably easier/fine to just drop the ')' from all these cases, the } and other features are probably an adequate match rather than adding the "{{.*}}") at the end

Where's the 'const' being printed from (couldn't spot it from a quick glance at the code) for the test cases like "const Struct"?

The function dumpTypeTagName extracts the 'type name' from a tag. For the example the function dumps DW_TAG_const_type as const.

  • Feedback David
JDevlieghere marked an inline comment as done.Aug 29 2017, 1:59 AM

Where's the 'const' being printed from (couldn't spot it from a quick glance at the code) for the test cases like "const Struct"?

The function dumpTypeTagName extracts the 'type name' from a tag. For the example the function dumps DW_TAG_const_type as const.

That seems a bit subtle - does that codepath trigger/do things for other tag types, or just const? Especially if it's just const, I'd probably handle const where reference, array, pointer, etc, are handled. Seems clearer.

Where's the 'const' being printed from (couldn't spot it from a quick glance at the code) for the test cases like "const Struct"?

The function dumpTypeTagName extracts the 'type name' from a tag. For the example the function dumps DW_TAG_const_type as const.

That seems a bit subtle - does that codepath trigger/do things for other tag types, or just const? Especially if it's just const, I'd probably handle const where reference, array, pointer, etc, are handled. Seems clearer.

It works for all tag types, but some (such as pointers and references) are handled explicitly in the switch because we don't want to print their "type name" but rather a * or &. I totally agree that it's a little subtle. However originally every tag had a corresponding switch case but this was really verbose and not very maintainable. Personally I feel it's worth the trade-off, but I'd definitely be open to a different alternative!

David: Do you want to discuss this further or is this okay to land as is?

David, apologies for missing your e-mail. I really hate that it doesn't automatically show up in Phabricator! 🙁

If the tag doesn't have a name attribute, everything will go through this function except: DW_TAG_pointer_type, DW_TAG_ptr_to_member_type, DW_TAG_reference_type, DW_TAG_rvalue_reference_type. The first part explains why class and struct don't show up. I prefer this approach because it's guaranteed to be robust. Every DW_TAG_*_type encountered without a name will have something meaningful printed.

IIRC, the original switch had between 20 and 25 cases.

David, apologies for missing your e-mail. I really hate that it doesn't automatically show up in Phabricator! 🙁

If the tag doesn't have a name attribute, everything will go through this function except: DW_TAG_pointer_type, DW_TAG_ptr_to_member_type, DW_TAG_reference_type, DW_TAG_rvalue_reference_type. The first part explains why class and struct don't show up. I prefer this approach because it's guaranteed to be robust. Every DW_TAG_*_type encountered without a name will have something meaningful printed.

IIRC, the original switch had between 20 and 25 cases.

I'm curious what those 20-25 cases were - do you have a copy/roughly describe their contents? Because while 'const' does print nicely, (& volatile would be similar) I'm not sure what the other 10 or so cases might be and whether that's a reasonable way to print them.

David, apologies for missing your e-mail. I really hate that it doesn't automatically show up in Phabricator! 🙁

If the tag doesn't have a name attribute, everything will go through this function except: DW_TAG_pointer_type, DW_TAG_ptr_to_member_type, DW_TAG_reference_type, DW_TAG_rvalue_reference_type. The first part explains why class and struct don't show up. I prefer this approach because it's guaranteed to be robust. Every DW_TAG_*_type encountered without a name will have something meaningful printed.

IIRC, the original switch had between 20 and 25 cases.

I'm curious what those 20-25 cases were - do you have a copy/roughly describe their contents? Because while 'const' does print nicely, (& volatile would be similar) I'm not sure what the other 10 or so cases might be and whether that's a reasonable way to print them.

Here's the list of cases I had originally:

case DW_TAG_array_type:
case DW_TAG_base_type:
case DW_TAG_class_type:
case DW_TAG_const_type:
case DW_TAG_enumeration_type:
case DW_TAG_file_type:
case DW_TAG_interface_type:
case DW_TAG_packed_type:
case DW_TAG_pointer_type:
case DW_TAG_ptr_to_member_type:
case DW_TAG_reference_type:
case DW_TAG_restrict_type:
case DW_TAG_set_type:
case DW_TAG_shared_type:
case DW_TAG_string_type
case DW_TAG_structure_type:
case DW_TAG_subrange_type:
case DW_TAG_subroutine_type:
case DW_TAG_thrown_type:
case DW_TAG_union_type:
case DW_TAG_unspecified_type:
case DW_TAG_volatile_type:

David, apologies for missing your e-mail. I really hate that it doesn't automatically show up in Phabricator! 🙁

If the tag doesn't have a name attribute, everything will go through this function except: DW_TAG_pointer_type, DW_TAG_ptr_to_member_type, DW_TAG_reference_type, DW_TAG_rvalue_reference_type. The first part explains why class and struct don't show up. I prefer this approach because it's guaranteed to be robust. Every DW_TAG_*_type encountered without a name will have something meaningful printed.

IIRC, the original switch had between 20 and 25 cases.

I'm curious what those 20-25 cases were - do you have a copy/roughly describe their contents? Because while 'const' does print nicely, (& volatile would be similar) I'm not sure what the other 10 or so cases might be and whether that's a reasonable way to print them.

Here's the list of cases I had originally:

case DW_TAG_array_type:
case DW_TAG_base_type:
case DW_TAG_class_type:
case DW_TAG_const_type:
case DW_TAG_enumeration_type:
case DW_TAG_file_type:
case DW_TAG_interface_type:
case DW_TAG_packed_type:
case DW_TAG_pointer_type:
case DW_TAG_ptr_to_member_type:
case DW_TAG_reference_type:
case DW_TAG_restrict_type:
case DW_TAG_set_type:
case DW_TAG_shared_type:
case DW_TAG_string_type
case DW_TAG_structure_type:
case DW_TAG_subrange_type:
case DW_TAG_subroutine_type:
case DW_TAG_thrown_type:
case DW_TAG_union_type:
case DW_TAG_unspecified_type:
case DW_TAG_volatile_type:

Ah, thanks!

I feel like maybe this should be examined more closely (an example of how each of these would be printed would be ideal, though that might be a bit much) - for example I don't think it makes sense to print out subroutine types like "int subroutine" (rather than "int(float, double)", say) which I /think/ is how they might look based on the current code)

lib/DebugInfo/DWARF/DWARFDie.cpp
91–95 ↗(On Diff #113040)

This might be the wrong approach for, for example, DW_AT_thrown_type - that's an attribute that refers to a type, but isn't itself a DW_AT_type (it should probably still print the type name)

Should this logic maybe work based on the type of the DIE being referenced, rather than the attribute type used to reference it? (or maybe some of both? Not sure - maybe there's only a couple of tags that can refer to types so a short list might suffice, I'm not sure)

David, apologies for missing your e-mail. I really hate that it doesn't automatically show up in Phabricator! 🙁

If the tag doesn't have a name attribute, everything will go through this function except: DW_TAG_pointer_type, DW_TAG_ptr_to_member_type, DW_TAG_reference_type, DW_TAG_rvalue_reference_type. The first part explains why class and struct don't show up. I prefer this approach because it's guaranteed to be robust. Every DW_TAG_*_type encountered without a name will have something meaningful printed.

IIRC, the original switch had between 20 and 25 cases.

I'm curious what those 20-25 cases were - do you have a copy/roughly describe their contents? Because while 'const' does print nicely, (& volatile would be similar) I'm not sure what the other 10 or so cases might be and whether that's a reasonable way to print them.

Here's the list of cases I had originally:

case DW_TAG_array_type:
case DW_TAG_base_type:
case DW_TAG_class_type:
case DW_TAG_const_type:
case DW_TAG_enumeration_type:
case DW_TAG_file_type:
case DW_TAG_interface_type:
case DW_TAG_packed_type:
case DW_TAG_pointer_type:
case DW_TAG_ptr_to_member_type:
case DW_TAG_reference_type:
case DW_TAG_restrict_type:
case DW_TAG_set_type:
case DW_TAG_shared_type:
case DW_TAG_string_type
case DW_TAG_structure_type:
case DW_TAG_subrange_type:
case DW_TAG_subroutine_type:
case DW_TAG_thrown_type:
case DW_TAG_union_type:
case DW_TAG_unspecified_type:
case DW_TAG_volatile_type:

Ah, thanks!

I feel like maybe this should be examined more closely (an example of how each of these would be printed would be ideal, though that might be a bit much) - for example I don't think it makes sense to print out subroutine types like "int subroutine" (rather than "int(float, double)", say) which I /think/ is how they might look based on the current code)

Generally agreed, but I think it might make sense to improve this in separate follow-on patches on a case-by-case basis. Getting the pretty-printing entirely right would mean that we would have to implement different pretty-printers for each DW_LANG_foo, since e.g., a C function type would have to be rendered very differently from the same DWARF type-representation in an. e.g., Swift or Fortran context. And even if we choose to always render types as C types it is unclear what to do with types such as DW_TAG_set_type.

I played around with how we could generate testcases for this.

  • I'm absolutely certain that yaml2obj (the standalone tool) is too low-level to do this. Having to manually update the object file header (or Mach-O load commands) every time a section changes in size is not doable unless you are starting from an existing object file and only modifying a couple bytes to inject, e.g., an error.
  • Using just the yaml2obj DWARF emitter like in the DWARFDebugInfoTest.cpp unittest is slightly better, but you still have to manually update the size of the CU header and compute all offsets by hand. Plus IMHO the use of yaml in a string constant is kind of weird., but I can get used to it.
  • Using assembler allows using labels, which gets rid off all the error-prone offset calculation, but on the other hand is much more verbose and low-level than some aspects of yaml2obj.
  • Using LLVM IR is much more compact than all of the above, but it has the drawback that we can only represent DWARF that LLVM can generate,
  • Source code is the most compact representation, but difficult to integrate into the LLVM testsuite because of the frontend dependency. It also restricts what DWARF we can generate even more.

What I would want is a format where I can specify DWARF at the same level of abstraction that dwarfdump is presenting. The easiest way to achieve this is probably to generate the DWARF programmatically in a unit test using the same API that yaml2obj is using internally. This way I can at least write code to calculate all the offsets. It would be kind of cute to write a tool that takes dwarfdump-like output as input and emits a yaml2obj-like yaml file, but I feel that that would be overkill since it isn't going to be very useful outside of this patch.

I will think some more about this.

  • Rebase on master
  • Add FIXME regarding pretty printers

It seems like this got stuck while thinking about how we could test the individual types efficiently.

Does anyone object to landing the patch in the current state as a first general step? The goal as I see it would then be to refine our approach with separate patches that have different pretty printers per language, which can be thoroughly tested for the types that actually make sense for that particular language.

dblaikie accepted this revision.Oct 9 2017, 11:30 AM

I still feel like the implementation that handles 'const' by truncating the name of the DWARF TAG itself isn't sufficiently general to be a great way to go, but it's not exactly destructive to other things, etc, so carry on.

This revision was automatically updated to reflect the committed changes.

FWIW, I ended up adding test coverage for some of this in r348954 while improving the printing of arrays to include array dimensions - hand written assembly, so yeah, it's a bit less than ideal to maintain (mostly having to edit both the abbreviation and the definition - a solution to that would be handy (& having to lookup the numeric values of the enums (DW_AT/DW_TAGs))) but I think it's worth having.

FWIW, I ended up adding test coverage for some of this in r348954 while improving the printing of arrays to include array dimensions - hand written assembly, so yeah, it's a bit less than ideal to maintain (mostly having to edit both the abbreviation and the definition - a solution to that would be handy (& having to lookup the numeric values of the enums (DW_AT/DW_TAGs))) but I think it's worth having.

I can imagine some scheme involving a tool that generates assembler-source constants by pulling in Dwarf.def, but it's harder to imagine getting that hooked into the test infrastructure somehow.

FWIW, I ended up adding test coverage for some of this in r348954 while improving the printing of arrays to include array dimensions - hand written assembly, so yeah, it's a bit less than ideal to maintain (mostly having to edit both the abbreviation and the definition - a solution to that would be handy (& having to lookup the numeric values of the enums (DW_AT/DW_TAGs))) but I think it's worth having.

I can imagine some scheme involving a tool that generates assembler-source constants by pulling in Dwarf.def, but it's harder to imagine getting that hooked into the test infrastructure somehow.

Something like yaml2obj probably already uses LLVM's object libraries, so having access to the DWARF constants there wouldn't be too hard - I think they're down at a pretty low level near there.