This is an archive of the discontinued LLVM Phabricator instance.

Add DWARF for discriminated unions
ClosedPublic

Authored by tromey on Jan 15 2018, 11:00 AM.

Details

Summary

In Rust, an enum that carries data in the variants is, essentially, a
discriminated union. Furthermore, the Rust compiler will perform
space optimizations on such enums in some situations. Previously,
DWARF for these constructs was emitted using a hack (a magic field
name); but this approach stopped working when more space optimizations
were added in https://github.com/rust-lang/rust/pull/45225.

This patch changes LLVM to allow discriminated unions to be
represented in DWARF. It adds createDiscriminatedUnionType and
createDiscriminatedMemberType to DIBuilder and then arranges for this
to be emitted using DWARF's DW_TAG_variant_part and DW_TAG_variant.

Note that DWARF requires that a discriminated union be represented as
a structure with a variant part. However, as Rust only needs to emit
pure discriminated unions, this is what I chose to expose on
DIBuilder.

Diff Detail

Repository
rL LLVM

Event Timeline

tromey created this revision.Jan 15 2018, 11:00 AM
dblaikie added a project: debug-info.
dblaikie added a subscriber: probinson.

This generally something we should support in LLVM, thanks!
I didn't have time to give this a thorough read-through yet, but I have some quick comments: One thing I noticed is that the patch is missing an update tp the metadata unittests. The testcase could also be more verbose and check for all the attributes expected in the dwarf output for the variant type.

You'll also need tests for the LLVM IR assembler, and probably IR Verifier checks.

So, to summarize the changes for myself/others, focusing on the changes to the metadata syntax/semantics (rather than the DIBuilder API changes - though they're equivalent):

  • Adds another DIDerivedType field to DICompositeType, null for existing cases, but otherwise holds a reference to a member that is the discriminator
  • Uses the 'extra data' in a DIDerivedType member (existing uses include for the constant value in a static member variable, that doesn't overlap with this usage that only applies to non-static members) to hold the constant discriminant value (what value of the discriminator denotes this as the active member)

Any DICompositeType with a non-null discriminator gets emitted with all its member variables wrapped in a DW_TAG_variant & each member other than the discriminator gets a discriminant value.

Seems OK - as noted in the review description, this is not as general as DWARF supports (it allows arbitrary groupings of members into discriminated unions - not just the "all or nothing" provided here), but I'm not sure that matters too much (open to hearing other opinions if folks feel like this is too special-case and would be a pain to generalize later if needed, etc).

Mostly curious to hear design feedback/discussion first, then can dig into the specifics (which shouldn't take long - it's mostly the usual mechanical stuff)

DWARF discriminated unions were likely designed to support Pascal variant records, which do allow multiple fields per variant. If we're going to support discriminated unions (which seems reasonable) then it ought to be general enough to handle the Pascal case.

DWARF discriminated unions were likely designed to support Pascal variant records, which do allow multiple fields per variant. If we're going to support discriminated unions (which seems reasonable) then it ought to be general enough to handle the Pascal case.

The reason I did not do this is that I didn't need it for Rust, and it seemed to me that it would be better for someone truly needing this functionality to add it -- I felt I would not do a good job guessing what might be needed in the future.

As I poke around the web looking for how a Rust enum-with-value works, it appears that Rust has what in Pascal would be a tagged variant record with no invariant part and no default variant. It's the web, of course, so it might be lying to me, but that's what I see, and it's about 90% of what you need for Pascal. The DWARF spec does not provide an example of a variant record, but I would expect to see something along these lines in the final DWARF:

DW_TAG_struct_type // type entry for the enum-with-fields
  DW_TAG_variant_part // because DWARF says variant_part is owned by the struct_type
      DW_AT_discr // reference to the discriminator field
    DW_TAG_member // this is the discriminator field, where the enum value is stored
    DW_TAG_variant // wrapper for the first variant
        DW_AT_discr_value // first discriminant value
      DW_TAG_member ... // member(s) for the first variant
    DW_TAG_variant // wrapper for the second variant
        DW_AT_discr_value // second discriminant value 
      DW_TAG_member ... // member(s) for the second variant
    // etc

Is that what you are trying to produce?

As I poke around the web looking for how a Rust enum-with-value works, it appears that Rust has what in Pascal would be a tagged variant record with no invariant part and no default variant.

The Rust compiler can introduce a default variant by optimizing out other variants. The test case in this patch is a reduced version of output from a case where that happened.

Recent versions of the compiler can do this in more situations, but the "non-zero" optimization goes back quite a ways. Code looks like:

enum NonZero {
    None,
    Some(String),
}

Because the string cannot be zero, the two variants share space and zero is used as the discriminant for None.

DW_TAG_struct_type // type entry for the enum-with-fields
  DW_TAG_variant_part // because DWARF says variant_part is owned by the struct_type
      DW_AT_discr // reference to the discriminator field
    DW_TAG_member // this is the discriminator field, where the enum value is stored
    DW_TAG_variant // wrapper for the first variant
        DW_AT_discr_value // first discriminant value
      DW_TAG_member ... // member(s) for the first variant
    DW_TAG_variant // wrapper for the second variant
        DW_AT_discr_value // second discriminant value 
      DW_TAG_member ... // member(s) for the second variant
    // etc

Is that what you are trying to produce?

Almost; in DWARF the discriminant has to be a member of the outer structure. So the output ends up a little funny. Here's one from a test case:

<3><444>: Abbrev Number: 6 (DW_TAG_structure_type)
   <445>   DW_AT_name        : (indirect string, offset: 0xa7f7f): Option<&u8>
   <449>   DW_AT_byte_size   : 8
   <44a>   DW_AT_alignment   : 8
<4><44b>: Abbrev Number: 9 (DW_TAG_member)
   <44c>   DW_AT_type        : <0x509>
   <450>   DW_AT_alignment   : 8
   <451>   DW_AT_data_member_location: 0
   <452>   DW_AT_artificial  : 1
<4><452>: Abbrev Number: 10 (DW_TAG_variant_part)
   <453>   DW_AT_discr       : <0x44b>
<5><457>: Abbrev Number: 11 (DW_TAG_variant)
   <458>   DW_AT_discr_value : 0
<6><459>: Abbrev Number: 12 (DW_TAG_member)
   <45a>   DW_AT_type        : <0x46b>
   <45e>   DW_AT_alignment   : 8
   <45f>   DW_AT_data_member_location: 0
<6><460>: Abbrev Number: 0
<5><461>: Abbrev Number: 13 (DW_TAG_variant)
<6><462>: Abbrev Number: 12 (DW_TAG_member)
   <463>   DW_AT_type        : <0x472>
   <467>   DW_AT_alignment   : 8
   <468>   DW_AT_data_member_location: 0

Here the discriminant member is DIE 0x44b; the rest is as you wrote.

DWARF allows some other constructs, like extra members in the outer struct, and lists of discriminant values. However, Rust doesn't need this, so I did not implement them. This is also why the interface talks about "discriminated unions" and not variants -- it is more limited.

Another approach I've considered is to just not use the DWARF variant record support at all and instead introduce a new DW_TAG_Rust_enum. This would be simpler in some ways, but I figured I'd first try to use what DWARF provides.

The DWARF spec says: "If the variant part has a discriminant, the discriminant is represented by a separate debugging information entry which is a child of the variant part entry. This entry has the form of a structure data member entry. The variant part entry will have a DW_AT_descr attribute whose value is a reference to the member entry for the discriminant."

I read this as saying the discriminant goes under the variant_part; it seems slightly redundant to have it both as a child and with a reference, but that's what it says. If you want you can bring it up on dwarf-discuss and see if anyone else thinks the discriminant ought to go outside the variant_part, but as it stands I think that would not be syntactically valid.

The DWARF spec says: "If the variant part has a discriminant, the discriminant is represented by a separate debugging information entry which is a child of the variant part entry. This entry has the form of a structure data member entry. The variant part entry will have a DW_AT_descr attribute whose value is a reference to the member entry for the discriminant."

I read this as saying the discriminant goes under the variant_part; it seems slightly redundant to have it both as a child and with a reference, but that's what it says. If you want you can bring it up on dwarf-discuss and see if anyone else thinks the discriminant ought to go outside the variant_part, but as it stands I think that would not be syntactically valid.

Yeah, I guess I misread that. FWIW GCC generates this style as well. From an Ada test case:

<1><7c0>: Abbrev Number: 4 (DW_TAG_structure_type)
   <7c1>   DW_AT_name        : (indirect string, offset: 0xb26): pck__rec
   <7c5>   DW_AT_byte_size   : 13 byte block: 97 94 1 99 95 0 0 0 23 7 9 fc 1a 
   <7d3>   DW_AT_decl_file   : 2
   <7d4>   DW_AT_decl_line   : 15
   <7d5>   DW_AT_sibling     : <0x817>
<2><7d9>: Abbrev Number: 5 (DW_TAG_member)
   <7da>   DW_AT_name        : (indirect string, offset: 0xaf8): discr
   <7de>   DW_AT_decl_file   : 2
   <7df>   DW_AT_decl_line   : 15
   <7e0>   DW_AT_type        : <0x7af>
   <7e4>   DW_AT_data_member_location: 0
<2><7e5>: Abbrev Number: 6 (DW_TAG_variant_part)
   <7e6>   DW_AT_discr       : <0x7d9>

I think this approach actually makes more sense, because surely the discriminant is something whose placement cannot vary.

I'm thinking now DW_TAG_Rust_enum would be a better fit for what I need. This approach would let me handle univariant enums consistently as well.

I think this approach actually makes more sense, because surely the discriminant is something whose placement cannot vary.

But it doesn't. It's a child of TAG_variant_part, but not under any individual TAG_variant.
That said, I'm happy to support a proposal to move the discriminant's member description out from under TAG_variant_part. Especially given GCC's practice, which probably is the same for Pascal as it is for Ada. Semantically it doesn't make any real difference where that member goes.

I'm thinking now DW_TAG_Rust_enum would be a better fit for what I need. This approach would let me handle univariant enums consistently as well.

I'd argue against a new tag, because it raises the barrier to debugger support for Rust. I can see wanting to emit all Rust enums consistently, whether they have variant data or not.

How about this: A Rust enum is implicitly a struct, which has one member in the "univariant" case, and a variant part if it does have additional data. I'll accept putting the member outside the TAG_variant_part; it's pedantically incorrect but there's a good case for doing it the other way.

That said, I'm happy to support a proposal to move the discriminant's member description out from under TAG_variant_part. Especially given GCC's practice, which probably is the same for Pascal as it is for Ada.

The support was added for Ada, but isn't used by default, because gdb doesn't understand DW_TAG_variant_part (I'm doing that work too...). So I doubt that GNU Pascal uses this. Not that it matters.

I'd argue against a new tag, because it raises the barrier to debugger support for Rust.

In practice the barrier is the same. Neither gdb nor lldb read variant parts, either. I'm writing the gdb support as part of my Rust work there, and elsewhere I'm also working on an lldb plugin to support Rust.

Maybe there are proprietary debuggers or something, but ...

I can see wanting to emit all Rust enums consistently, whether they have variant data or not.

Yeah. In Rust, a univariant enum won't have a discriminant at all. It is more or less just a struct (you can't access the members in the same way, but this hardly matters in a debugger). I think it might be nice to be able to distinguish univariant enums from structs in DWARF, but I couldn't really make sense of the language in DWARF:

If the variant part does not have a discriminant (tag field), the variant part entry has a DW_AT_type attribute to represent the tag type.

Does this make sense to you? What I particularly didn't understand is what the tag type might mean if there isn't a tag field. What would it be the type of?

How about this: A Rust enum is implicitly a struct, which has one member in the "univariant" case, and a variant part if it does have additional data.

I'd like to be able to distinguish a Rust enum from an ordinary struct. Previously I was punting on this but this discussion has made me reconsider. Perhaps I could have it emit a variant part without a discriminant. What do you think?

I'll accept putting the member outside the TAG_variant_part; it's pedantically incorrect but there's a good case for doing it the other way.

Ok; but I'll also look to see if it's easy to move the discriminant into the variant with my current patch. If so, why not, I guess. I can contact the ACT folks to see what they think of changing GCC.

I'd argue against a new tag, because it raises the barrier to debugger support for Rust.

In practice the barrier is the same. Neither gdb nor lldb read variant parts, either. I'm writing the gdb support as part of my Rust work there, and elsewhere I'm also working on an lldb plugin to support Rust.

Maybe there are proprietary debuggers or something, but ...

In fact Sony has its own debugger for PS4, the tech lead is even more pedantic about DWARF than I am (hard to imagine, but true), and we have had interest from licensees about supporting Rust for PS4. So if we can do this with existing standard tags, I'm all for it.

I can see wanting to emit all Rust enums consistently, whether they have variant data or not.

Yeah. In Rust, a univariant enum won't have a discriminant at all. It is more or less just a struct (you can't access the members in the same way, but this hardly matters in a debugger). I think it might be nice to be able to distinguish univariant enums from structs in DWARF, but I couldn't really make sense of the language in DWARF:

If the variant part does not have a discriminant (tag field), the variant part entry has a DW_AT_type attribute to represent the tag type.

Does this make sense to you? What I particularly didn't understand is what the tag type might mean if there isn't a tag field. What would it be the type of?

This is for Pascal, which doesn't require an explicit discriminant. It's how Pascal does what C calls a union. The type is generally boolean or integer, and the cases are just enumerated in the source. In this situation each TAG_variant still has an AT_discr_value but there's no field to go with it. Complaints to Professor Wirth.

How about this: A Rust enum is implicitly a struct, which has one member in the "univariant" case, and a variant part if it does have additional data.

I'd like to be able to distinguish a Rust enum from an ordinary struct. Previously I was punting on this but this discussion has made me reconsider. Perhaps I could have it emit a variant part without a discriminant. What do you think?

Well, you'll want an actual enumeration_type for the type of the struct's field, and if enums don't otherwise occur naturally in Rust, that seems like a pretty good indication all by itself. Adding a TAG_variant_part that had no children could also work, although it seems a little odd and might trip up an unwary non-Rust-aware debugger.

I'll accept putting the member outside the TAG_variant_part; it's pedantically incorrect but there's a good case for doing it the other way.

Ok; but I'll also look to see if it's easy to move the discriminant into the variant with my current patch. If so, why not, I guess. I can contact the ACT folks to see what they think of changing GCC.

Thanks!

Well, you'll want an actual enumeration_type for the type of the struct's field, and if enums don't otherwise occur naturally in Rust, that seems like a pretty good indication all by itself.

A data-less Rust enum is represented in a more C-like way. That is, something like:

enum JustSomeValues { A, B, C }

This turns into DW_TAG_enumeration_type.

I don't plan to have rustc emit an enumeration type for the discriminant. This is done presently, but IME is more work for debuggers -- it has to be mostly ignored because it doesn't correspond to any actual language feature -- and with the variant approach it is also unneeded. So, this is removed in my patches.

Adding a TAG_variant_part that had no children could also work, although it seems a little odd and might trip up an unwary non-Rust-aware debugger.

I was thinking perhaps a variant part with a single child but no discriminant. The case in question here is something like

enum Univariant { TheSoleVariant(u8) }

If we're concerned about non-Rust-aware debuggers, or the wrath of the Sony person, then a new tag would be a better choice.

Another thing to consider is that this is really just the first patch in a larger project. There are other things in Rust that will require new debuginfo. Some of these can be mapped to existing DWARF (e.g., traits can probably use DW_TAG_interface pretty comfortably); but some cannot (e.g., the distinction between tuples and structs).

I filed a GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83935, and contacted a friend at AdaCore to see what they think.

Another thing to consider is that this is really just the first patch in a larger project. There are other things in Rust that will require new debuginfo. Some of these can be mapped to existing DWARF (e.g., traits can probably use DW_TAG_interface pretty comfortably); but some cannot (e.g., the distinction between tuples and structs).

Any new DWARF tags/attributes for Rust should be distinct enough from existing features to be proposed as new features for DWARF 6. Enums-with-data really do look enough like variant records that I think a new Rust_enum would be hard to justify. I don't know Rust at all (other than recent googling about enums so I can heckle you in this review) so we'll take each one as it comes. Just because one Rust feature will require a new tag doesn't mean all Rust features should get new tags.

I filed a GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83935, and contacted a friend at AdaCore to see what they think.

Awesome, thanks! Hopefully Cary Coutant will see it, I think his opinion would be worth something.

Having the DIE shape conform to DWARF seems easy in this patch -- just a reordering in DwarfUnit::constructDiscriminatorDIE. However I don't see how to nicely support univariant Rust enums; the variant part is triggered by having a discriminator, but these objects don't have enumerators.

Maybe one idea would be to have DIBuilder::createDiscriminatedUnionType use DW_TAG_variant_part as its tag, then fix things up when emitting the DIEs in DwarfUnit.cpp. Is that too gross?

I was thinking perhaps a variant part with a single child but no discriminant. The case in question here is something like

enum Univariant { TheSoleVariant(u8) }

So, a struct_type with a variant_part with only one variant? That seems fine.

If we're concerned about non-Rust-aware debuggers, or the wrath of the Sony person, then a new tag would be a better choice.

If you can construct syntactically valid DWARF with existing tags and it describes the Rust construct sensibly, there's no need for a new tag. I thought you meant having a variant_part with *no* variants, which would be weird enough to be a potential problem. Sorry about the misunderstanding!

FWIW, I had forgotten about this, but when I was first writing this patch I did find a dwarf-discuss thread about the discriminant placement issue: http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2006-August/001710.html

FWIW, I had forgotten about this, but when I was first writing this patch I did find a dwarf-discuss thread about the discriminant placement issue: http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2006-August/001710.html

Huh. Well, if *nobody* is following the spec, it becomes expedient to fix the spec. :-) Functionally I don't think it makes any real difference where the discriminant's DIE is, because the variant_part has a reference to it anyway.

tromey updated this revision to Diff 131314.Jan 24 2018, 11:02 AM

This is a new version of the discriminated union patch.

It addresses most of the comments previously made, though I still have
not added support for multiple members in a variant, because Rust does
not need this. However, I think that with the more flexible approach
taken in this patch, it should not be too hard for someone else to add
this functionality when needed.

dblaikie added inline comments.Jan 25 2018, 2:50 PM
test/DebugInfo/Generic/discriminated-union.ll
9–28 ↗(On Diff #131314)

This test should probably be more precise. Checking that DW_AT_discr contains the right offset to the discriminator member. Check that the members have the right values (at least a handful - if they're generally all the same - at least checking that the discr_values are valid/correct)

& might help to indent the tags and attributes in the CHECK lines (FileCheck ignores leading/trailing whitespace in CHECK lines anyway) to make it easier to see the structure here.

test/DebugInfo/Generic/univariant-discriminated-union.ll
9–15 ↗(On Diff #131314)

Similar test quality improvements here might be nice.

I still have
not added support for multiple members in a variant, because Rust does
not need this.

I found this example, which sure looks like multiple members in a variant, here:
https://stackoverflow.com/questions/29248665/why-does-rust-not-have-unions

enum Foo {
    Bar(i32),
    Baz,
    Quux {
        misc: A,
        ellaneous: B,
        fields: C,
    },
}

I found this example, which sure looks like multiple members in a variant, here:
https://stackoverflow.com/questions/29248665/why-does-rust-not-have-unions

In cases like this, a separate type is emitted for the contents of each variant.
So from the DWARF point of view there is just a single variant in the variant part, but this variant may be a structure.

I found this example, which sure looks like multiple members in a variant, here:
https://stackoverflow.com/questions/29248665/why-does-rust-not-have-unions

In cases like this, a separate type is emitted for the contents of each variant.
So from the DWARF point of view there is just a single variant in the variant part, but this variant may be a structure.

Ok.

I am not super comfortable with the metadata APIs but a couple of things struck me, see inline comments.

lib/Bitcode/Reader/MetadataLoader.cpp
1249 ↗(On Diff #131314)

You want || not && here?

lib/IR/Verifier.cpp
971 ↗(On Diff #131314)
if (auto *D = N.getRawDiscriminator()) {
  AssertDI(isa<DIDerivedType>(D) && ...
tromey updated this revision to Diff 131624.Jan 26 2018, 10:51 AM

Updates according to review. This then pointed out a buglet in
dwarfdump, now fixed.

I'm okay with this, but as I said I am not so familiar with the metadata API and I think someone else should give the final approval.
And thanks for taking out the mis-handling of DW_AT_discr_value in the dumper!

tromey marked 4 inline comments as done.Jan 29 2018, 6:48 AM

As far as the metadata changes are concerned this looks mostly good, what's missing is a

  • round-trip test of the assembler (similar to test/Assembler/dicompositetype-members.ll )\
  • update DICompositeTypeTest in the unittests to test the hashing of the new field.

As far as the metadata changes are concerned this looks mostly good, what's missing is a

  • round-trip test of the assembler (similar to test/Assembler/dicompositetype-members.ll )\

Oops, I thought I'd done this, but I guess I forgot to actually do it.

As far as the metadata changes are concerned this looks mostly good, what's missing is a

  • round-trip test of the assembler (similar to test/Assembler/dicompositetype-members.ll )\

I looked at the patch again, and I thought that the changes to test/Assembler/debug-info.ll were sufficient here.
If that's not the case, can you tell me what I'd have to change?

tromey updated this revision to Diff 132168.Jan 31 2018, 6:38 AM

Rebased & added hashing test.

I looked at the patch again, and I thought that the changes to test/Assembler/debug-info.ll were sufficient here.

If that's not the case, can you tell me what I'd have to change?

They are sufficent. I must have overlooked it.

aprantl accepted this revision.Jan 31 2018, 9:04 AM

The only thing I am left wondering is if we should make DICompositeType's in-memory representation have a variable number of operands, so we don't have to pay for the extra pointer, which could pay off when compiling C++. That said, this is an implementation detail that we can fix in a subsequent patch without changing the interface so I'm fine with taking this as is.

One last thing: Can you please make sure that there is a binary bitcode upgrade test in test/Bitcode that exercises the upgrade from the older shorter version of DICompositeType in MetadataLoder.cpp and if not, add one? Thanks!

This revision is now accepted and ready to land.Jan 31 2018, 9:04 AM

One last thing: Can you please make sure that there is a binary bitcode upgrade test in test/Bitcode that exercises the upgrade from the older shorter version of DICompositeType in MetadataLoder.cpp and if not, add one? Thanks!

There is - an earlier version of the patch just checked for the new length, and testing showed failures, so those tests (unchanged by this patch) are why the code accepts the shorter version as well.

Works for me, thanks!

Could someone land this for me? I don't have access to do that. Thanks!

This revision was automatically updated to reflect the committed changes.
labath added a subscriber: labath.Feb 7 2018, 3:22 AM
labath added inline comments.
llvm/trunk/lib/BinaryFormat/Dwarf.cpp
396–405

This left a dangling declaration in the header file. I've removed it in r324474.