Page MenuHomePhabricator

[DWARFYAML] Add support for referencing different abbrev tables.
ClosedPublic

Authored by Higuoxing on Jul 3 2020, 12:25 AM.

Details

Summary

This patch adds support for referencing different abbrev tables. We use
'ID' to distinguish abbrev tables and use 'AbbrevTableID' to explicitly
assign an abbrev table to compilation units.

The syntax is:

debug_abbrev:
  - ID: 0
    Table:
      ...
  - ID: 1
    Table:
      ...
debug_info:
  - ...
    AbbrevTableID: 1 ## Reference the second abbrev table.
  - ...
    AbbrevTableID: 0 ## Reference the first abbrev table.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Newly added tests are marked by inline comments since too many tests are updated πŸ˜‚.

llvm/test/ObjectYAML/MachO/DWARF-debug_info.yaml
679

Added test.

792

Added test.

llvm/test/tools/yaml2obj/ELF/DWARF/debug-info.yaml
642

Added test.

labath added a subscriber: labath.Jul 3 2020, 1:19 AM

What would you say if, instead of AbbrevTableIndex, we had a field like AbbrevTableID. The main difference would be that this "ID" field can be explicitly specified on the Abbrev table, and it does not have to be a sequentially increasing number (though it could of course be that by default).

The thing I'm trying to achieve is to make the yaml more robust against modifications/simplifications. E.g., it would be nice if deleting an abbrev table does not make all compile units suddenly refer to different tables. If the ids were present explicitly, compile units would be unaffected by this, and one would get an explicit error message if there was a compile unit left which still referred to the deleted abbrev table.

(This is one of the aspects where an assembler is better than yaml, as symbolic labels in asm have these properties.)

What would you say if, instead of AbbrevTableIndex, we had a field like AbbrevTableID. The main difference would be that this "ID" field can be explicitly specified on the Abbrev table, and it does not have to be a sequentially increasing number (though it could of course be that by default).

I think it works. I'm fine with this approach, let's see what others thinking. Thanks a lot for sharing your ideas!

debug_abbrev:
  - ID: 1
    Table:
      - Code: 1
        Tag:  DW_TAG_something
        Children: DW_CHILDREN_yes
        Attributes:
          - Attribute: DW_AT_something
            Form:      DW_FORM_something
  - ID: 2
    Table:
      - Code: 1
        Tag:  DW_TAG_something
        Children: DW_CHILDREN_yes
        Attributes:
          - Attribute: DW_AT_something
            Form:      DW_FORM_something
  - ID: 3
    Table:
      - Code: 1
        Tag:  DW_TAG_something
        Children: DW_CHILDREN_yes
        Attributes:
          - Attribute: DW_AT_something
            Form:      DW_FORM_something

debug_info:
  - Length:        0x1234
    Version:       4
    AbbrevTableID: 3 ## references table 3
    ...
    Entries:
      - AbbrevCode: 1
        Values:
          - Value: 0x1234
  - Length:        0x1234
    Version:       4
    AbbrevTableID: 2 ## references table 2
    ...
    Entries:
      - AbbrevCode: 1
        Values:
          - Value: 0x1234
  - Length:        0x1234
    Version:       4
    AbbrevTableID: 1 ## references table 1
    ...
    Entries:
      - AbbrevCode: 1
        Values:
          - Value: 0x1234

What would you say if, instead of AbbrevTableIndex, we had a field like AbbrevTableID. The main difference would be that this "ID" field can be explicitly specified on the Abbrev table, and it does not have to be a sequentially increasing number (though it could of course be that by default).

The thing I'm trying to achieve is to make the yaml more robust against modifications/simplifications. E.g., it would be nice if deleting an abbrev table does not make all compile units suddenly refer to different tables. If the ids were present explicitly, compile units would be unaffected by this, and one would get an explicit error message if there was a compile unit left which still referred to the deleted abbrev table.

(This is one of the aspects where an assembler is better than yaml, as symbolic labels in asm have these properties.)

+1 to this idea. By default it could of course be an incremental index, when emitting, but in yaml2obj, I see no reason for it to be tied in that way necessarily.

lldb/unittests/Symbol/Inputs/inlined-functions.yaml
348 β†—(On Diff #275304)

AbbrevTableIndex should be an optional value. I think by default it should pick the first table (or possibly should pick the Nth table, where N is the .debug_info table index). Thus in this and pretty much every other case, you should be able to omit the new value.

llvm/test/tools/yaml2obj/ELF/DWARF/debug-info.yaml
649

Perhaps worth including the index and table count values in this error message to make it easier for people to identify the problem.

llvm/tools/obj2yaml/dwarf2yaml.cpp
20

I'm slightly surprised you needed to add this. Did this not compile without it?

Higuoxing marked an inline comment as done.Jul 3 2020, 2:47 AM
Higuoxing added inline comments.
llvm/tools/obj2yaml/dwarf2yaml.cpp
20

I guess it was automatically added by the code completion tool.

Higuoxing updated this revision to Diff 285944.Aug 17 2020, 2:17 AM

In this revision, I added a helper struct called DWARFState. When emitting DWARF sections, we should initialize DWARFState first, it will help us make DWARF sections interlinked.

Now, we are able to use AbbrevTableID to reference abbrev tables and use ID to assign a number as abbrev table's ID.

Higuoxing updated this revision to Diff 286191.Aug 17 2020, 7:41 PM

Suppress clang-format warnings.

jhenderson added inline comments.Aug 18 2020, 3:49 AM
llvm/include/llvm/ObjectYAML/DWARFEmitter.h
34 β†—(On Diff #286191)

Would it make any sense to merge the DWARFYAML::Data class and DWARFYAML::DWARFState class at all?

35 β†—(On Diff #286191)

Can you use std::unordered_map here? Do we need deterministic ordering? Might also be worth naming it something like AbbrevTableID2Index to avoid confusing with the AbbrevCode values.

llvm/lib/ObjectYAML/DWARFEmitter.cpp
92

Maybe to avoid weird errors, it might make sense to disallow mixing the two methods, i.e. if one table has an explicit ID, the others all must do too. What do you think? It may not be worth it if the code is more complex, I don't know.

105

Could you use the return value of insert to identify whether the key already exists in the map? That way, you don't need the explicit count call.

260–267

It might make more sense to do this work in the caller of this function, and to maintain this function's interface.

llvm/lib/ObjectYAML/MachOEmitter.cpp
271 β†—(On Diff #286191)
llvm/test/ObjectYAML/MachO/DWARF-debug_info.yaml
679–681

Is there any particular reason you don't have something equivalent to this in the ELF testing, to show you can have tables out-of-order etc.

711

Something's not right here - the YAML below has four debug_info tables, but only three have check lines. What happened to the one that should be referencing abbrev table 2?

873–884

Can you omit the abbrev section entirely?

906–911

This feels like it belongs in a debug_abbrev dedicated test?

llvm/test/tools/yaml2obj/ELF/DWARF/debug-info.yaml
925–926

Same as Mach-O, this really is a debug_abbrev test, not a debug_info test.

This seems fairly straight-forward to me, though I didn't dive into all the details.

Given the amount of changed tests, it may be nice to separate the format change from the linking patch (so in the interim state, one could generate multiple abbrev tables, but there would be no (reasonable) way to reference any table except the first one).

llvm/include/llvm/ObjectYAML/DWARFEmitter.h
34 β†—(On Diff #286191)

That would definitely be nice.

llvm/lib/ObjectYAML/DWARFEmitter.cpp
89

If you'd do all this work in the factory function and then just pass in a finished map to the constructor, there'd be no need for the ErrorAsOutParameter thingy.

91

consider: for (auto &Abbr : enumerate(DI.DebugAbbrev))

93–95

ID = DI.DebugAbbrev[I].ID.getValueOr(I)

Higuoxing planned changes to this revision.Aug 18 2020, 11:08 PM
Higuoxing retitled this revision from [DWARFYAML] Add support for emitting multiple abbrev tables. to [DWARFYAML] Add support for referencing different abbrev tables..
Higuoxing edited the summary of this revision. (Show Details)
llvm/include/llvm/ObjectYAML/DWARFEmitter.h
34 β†—(On Diff #286191)

Would it make any sense to merge the DWARFYAML::Data class and DWARFYAML::DWARFState class at all?

That would definitely be nice.

I think they can be merged. But is there any particular reason to merge them? Personally, I would like to keep the DWARFYAML::DWARFState class. Here are a few concerns I would like to address:

  • If we merge the DWARFYAML::DWARFState and DWARFYAML::Data at all, we will need to add some helper variables, e.g. AbbrevTableID2Index and a method like Error DWARFYAML::link() to DWARFYAML::Data class to link DWARF sections. We will have to call this special method before DWARF sections being emitted. If we port DWARF support to other binary format in the future, e.g., wasm, we might forget to call this method. If we make DWARF emitters accept DWARFYAML::DWARFState, it's easy to figure out that we need a DWARFState before emitting DWARF sections.
  • Besides, I think DWARFYAML::Data and DWARFYAML::DWARFState have different functions. DWARFYAML::Data is used to contain original DWARF section descriptions and DWARFYAML::DWARFState contains helper structs and methods which helps link DWARF sections. It might be good not to merge them so that we can easily figure out when and where are those sections get linked?

I can be persuaded :-)

jhenderson added inline comments.Aug 20 2020, 12:45 AM
llvm/include/llvm/ObjectYAML/DWARFEmitter.h
34 β†—(On Diff #286191)

I think they can be merged. But is there any particular reason to merge them?

As things stand, it mostly seems like DWARFState merely gets in the way of accessing the Data class. It seems to me like it would be easier to use if the two were one class, as you wouldn't have to keep going through an extra function call/member access.

  • If we merge the DWARFYAML::DWARFState and DWARFYAML::Data at all, we will need to add some helper variables, e.g. AbbrevTableID2Index and a method like Error DWARFYAML::link() to DWARFYAML::Data class to link DWARF sections. We will have to call this special method before DWARF sections being emitted. If we port DWARF support to other binary format in the future, e.g., wasm, we might forget to call this method. If we make DWARF emitters accept DWARFYAML::DWARFState, it's easy to figure out that we need a DWARFState before emitting DWARF sections.

If, rather than do all the calculations up front (e.g. mapping tables to IDs), you put the Data members behind getters, you could then add the "linking" code there. For example, with the AbbrevTable ID mapping, you could have the getAbbrevTableIndexByID method, which is the only way of getting an abbrev table from outside the class. The first time this is called (or optionally every time), it works out the mapping between ID(s) and table(s). This avoids the need for an explicit link or finalize method.

  • Besides, I think DWARFYAML::Data and DWARFYAML::DWARFState have different functions. DWARFYAML::Data is used to contain original DWARF section descriptions and DWARFYAML::DWARFState contains helper structs and methods which helps link DWARF sections. It might be good not to merge them so that we can easily figure out when and where are those sections get linked?

I'm not sure I agree they have different functions. The AbbrevID2Index map is just a helper variable that is directly derived from the Data struct's members. We could get rid of it entirely, and just move the getAbbrevTableIndexByID method into Data, rewriting it to just run through the list of tables each time to find the right ID. This is obviously less efficient than instantiating a mapping up front, if done repeatedly, but I think it indicates that this extra layer is not actually doing anything distinct. Similar principles apply probably for other things, although without further concrete examples, it's hard to discuss them!

Higuoxing updated this revision to Diff 286755.Aug 20 2020, 2:45 AM
Higuoxing marked 10 inline comments as done.

Merge DWARFYAML::Data and DWARFYAML::DWARFState.

llvm/include/llvm/ObjectYAML/DWARFEmitter.h
34 β†—(On Diff #286191)

Thanks for the explanation!

llvm/lib/ObjectYAML/DWARFEmitter.cpp
89

DWARFYAML::DWARFState is merged into DWARFYAML::Data, so we don't need ErrorAsOutParameter any more.

92

I think it's a little bit difficult to distinguish these 2 situations. Besides, sometimes we might want to add one compilation unit to a test case and let it reference an existing abbrev table. We don't need to mutate the whole test case to add IDs. What do think of it?

Higuoxing edited the summary of this revision. (Show Details)Aug 20 2020, 2:51 AM
Higuoxing added inline comments.Aug 20 2020, 2:54 AM
llvm/test/ObjectYAML/MachO/DWARF-debug_abbrev.yaml
426

I will remove this blank line later.

I'm not sure if my test comments have been looked at. Could you mark them as Done if you think you have addressed them, please?

llvm/include/llvm/ObjectYAML/DWARFYAML.h
234–237

Something to consider here: the user-visible behaviour of this class is that this is a const operation, but because this method causes AbbrevTableID2Index to be populated on the first call, it technically isn't. I would say this is a reasonable use case for mutable - the cache (in this case AbbrevTableID2Index) is marked as mutable, and then you can continue to pass this class around as const.

llvm/lib/ObjectYAML/DWARFEmitter.cpp
92

Leave it as is. I wasn't convinced by my own statement, so I think what you've got is fine.

Higuoxing updated this revision to Diff 286780.Aug 20 2020, 4:45 AM
Higuoxing marked 11 inline comments as done.

Address comments.

llvm/lib/ObjectYAML/DWARFEmitter.cpp
260–267

This is discussed in D86194

llvm/test/ObjectYAML/MachO/DWARF-debug_info.yaml
679–681

Added in test/tools/yaml2obj/ELF/DWARF/debug-info.yaml (o)

711

Sorry, the length and offsets are calculated by myself, and the __debug_info section's length is miscalculated to be 60, which should be 67. I've corrected it.

llvm/test/tools/yaml2obj/ELF/DWARF/debug-info.yaml
649

I've included the abbrev table ID and compilation unit count values in test case (n). Can we prettify the error message in test case (i) in a follow-up patch?

jhenderson accepted this revision.Aug 20 2020, 5:37 AM

LGTM. Thanks!

llvm/test/tools/yaml2obj/ELF/DWARF/debug-info.yaml
649

That's fine.

This revision is now accepted and ready to land.Aug 20 2020, 5:37 AM
This revision was landed with ongoing or failed builds.Aug 20 2020, 8:45 PM
This revision was automatically updated to reflect the committed changes.
Higuoxing reopened this revision.Aug 20 2020, 9:23 PM

This change is causing build failure, I'm going to revise this change.

This revision is now accepted and ready to land.Aug 20 2020, 9:23 PM
Higuoxing updated this revision to Diff 286985.Aug 21 2020, 3:12 AM

Fix incorrect format string.

Higuoxing updated this revision to Diff 286987.Aug 21 2020, 3:16 AM

Add missing test.

Higuoxing requested review of this revision.Aug 21 2020, 3:22 AM

Hi @jhenderson

This change is causing build failure on armv7 platform and I've fixed it.

See: http://lab.llvm.org:8011/builders/clang-cmake-armv7-global-isel/builds/10400 and http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/20261

Could you please review this change again? Thanks a lot!

llvm/lib/ObjectYAML/DWARFYAML.cpp
69

The build failure is caused by the incorrect format string. AbbrevTable.index() returns size_t and it was formatted as %u. We should use %zu here.

jhenderson accepted this revision.Aug 21 2020, 3:23 AM

Looks good!

This revision is now accepted and ready to land.Aug 21 2020, 3:23 AM