This is an archive of the discontinued LLVM Phabricator instance.

DIEData, DIEWriter: introduce and begin migration.
Needs ReviewPublic

Authored by pcc on Feb 16 2016, 6:32 PM.

Details

Summary

This patch introduces the DIEData class, which stores a sequence of DWARF
DIEs directly in the format in which they will be emitted to the object file
(i.e. bytes and relocations), together with the DIEWriter class, which is
used to add DIEs to a DIEData. The goal is to decrease memory consumption
in the DWARF emitter by migrating all DIE emission to go through these
classes instead of the current memory-inefficient DIE and DIEValue classes
(see mailing list thread [1]).

This patch also takes the first step towards migrating the DWARF emitter. It
introduces the necessary scaffolding for the migration in the form of the
ability for a DIE to use a DIEData-based representation, and replaces the
existing DWARF emission code for base types with code that uses DIEData.
To avoid a regression in memory consumption, I coded my changes to DIE to
avoid increasing its size.

A few notes on this patch:

  • This doesn't add any support for relocations (other than symbol relocations) or insertions as mentioned in the thread, because they aren't needed to emit base types, and I wanted to make this first patch as simple as possible. Future patches will add this as needed.
  • The way that the current DWARF hasher code works is by walking DIE entries. This won't work in the new world, so I've added a new mechanism for generating a hash from metadata. To get from a DIE to metadata, I store a reference to the MDNode in the DIE. I reckon that hashes will eventually be computed by walking MDNodes directly starting at a compilation unit or type, perhaps supplemented by additional information collected by walking the IR, so you can see how the current code might evolve into that.

    This code isn't very DRY, but I think we can address this as needed as we migrate more tags to DIEData.
  • The old and new abbreviation emitter code don't interact except by choosing to use different abbreviation numbers (the old code uses odd numbers, the new code uses even numbers).

[1] http://lists.llvm.org/pipermail/llvm-dev/2016-February/094974.html

Diff Detail

Event Timeline

pcc updated this revision to Diff 48139.Feb 16 2016, 6:32 PM
pcc retitled this revision from to DIEData, DIEWriter: introduce and begin migration..
pcc updated this object.
pcc added reviewers: dblaikie, echristo, dexonsmith.
pcc added a subscriber: llvm-commits.
dblaikie edited edge metadata.Feb 17 2016, 2:35 PM

The hashing stuff does seem a bit worrying - not sure how that'll end up looking.

The odd/even abbrev numbering seems a bit awkward - can we just build the new infrastructure closer to the old stuff, so they can pick numbers more closely? (have you done any size analysis of this change - I realize the numbering of DIEs is probably a small component of DWARF size, but this change sounds like it'd increase DWARF size a little?)

What do you have in mind for DIEs that are incomplete for most of DWARF emission? (user defined types (classes and structs), and subprogram/function descriptions are like this - as we were discussing in previous threads) The abbreviation won't be known until the end of DWARF emission, so it seems we'll want to keep around a representation of the abbreviation that's more like what we have today. It seems like this transition approach might result in more churn than would be necessary, prehaps? But maybe not - just wonedring/thinking about it.

I haven't looked at everything in detail - more review tomorrow, etc.

include/llvm/CodeGen/DIEData.h
153

else-after-return

lib/CodeGen/AsmPrinter/DwarfUnit.cpp
685

Probably best not to name a function the same as a common library type. I think functions that use string as a verb usually end up named "stringify" or the like.. but that might not be what's being described here, not sure yet.

unittests/CodeGen/DIEDataTest.cpp
85

Use () rather than {}, this doesn't look like it should be an aggregate/implicit ctor (but we haven't really bothered to put 'explicit' on all the ctors that became implicit in C++11 with aggregate init)

pcc added a comment.Feb 17 2016, 4:02 PM

The odd/even abbrev numbering seems a bit awkward - can we just build the new infrastructure closer to the old stuff, so they can pick numbers more closely?

Maybe, but I'd like to avoid having the new code and the old code interact where possible. In particular, I think the code would be a little harder to unit test if I use the old code for abbreviations.

(have you done any size analysis of this change - I realize the numbering of DIEs is probably a small component of DWARF size, but this change sounds like it'd increase DWARF size a little?)

I'll see what the impact is on Chromium.

The abbreviation won't be known until the end of DWARF emission

Is this true? Since the abbreviation depends only on the attribute names and forms, I suspect that in many cases we can compute an abbreviation up front and fill in attribute values later.

For example, the abstract origins for concrete variables added by DwarfDebug::finishVariableDefinitions. Since we know at DbgVariable creation time whether a variable is concrete or not, we can create an abstract origin attribute with some arbitrary value for concrete variables when we create them (in the equivalent of DwarfCompileUnit::constructVariableDIEImpl), store a reference to that attribute somewhere (which could just be a DIEData+offset pair), and use that in DwarfDebug::finishVariableDefinitions to fill in the value.

If there are cases where we truly cannot know about the existence of an attribute until later, then we can probably try something like maintaining a flag which would only be relevant for those DIE tags which can have attributes inserted dynamically, and have the code in DIEData::{prepareToEmit,emit} create abbreviations on demand for those specific tags, but I'd like to try to find a way to avoid that if possible.

aprantl edited edge metadata.Feb 18 2016, 10:50 AM
aprantl added a subscriber: friss.
aprantl added a subscriber: aprantl.

I'd be very interested in the size impact. Do you have access to a Darwin machine? Could you build something large and then run llvm-dsymutil with and without the patch on it and compare the differences?

include/llvm/CodeGen/DIEData.h
65

Certainly the most important part of the review :-)
We enable autobrief in Doxygen now. In most situations \brief is redundant now.

test/DebugInfo/X86/linkage-name.ll
7

If we're not checking the FORMs matching them just adds visual noise to the test. I would just delete everything after the TAG.
(this applies to all testcases)

pcc updated this revision to Diff 48550.Feb 19 2016, 3:24 PM
pcc marked 3 inline comments as done.
  • Address code review comments
  • Restore sizeof(llvm::DIE) as intended. I misunderstood the code and thought that IntrusiveBackList was 2 pointers when in fact it is 1
include/llvm/CodeGen/DIEData.h
153

Fixed by removing the returns. I wanted to eventually make some of these functions return a reference to the attribute that could be filled in later, but that probably isn't appropriate for attributes like this one which need to know the value up front.

lib/CodeGen/AsmPrinter/DwarfUnit.cpp
685

Hm. I wanted to be concise with names in the new code, in order to distinguish visually from the old code. I think this naming might be okay if there are only a few weird cases like string.

pcc added a comment.Feb 19 2016, 3:25 PM

Could you build something large and then run llvm-dsymutil with and without the patch on it and compare the differences?

I don't think this change will make a difference in llvm-dsymutil because it isn't using the data representation yet.

dexonsmith resigned from this revision.Oct 16 2020, 7:34 PM