This is an archive of the discontinued LLVM Phabricator instance.

Abstract almost all DwarfDebug out of the classes in DIE.cpp.
ClosedPublic

Authored by clayborg on Nov 28 2016, 10:01 AM.

Details

Summary

This change removes the dependency on DwarfDebug that was used for DW_FORM_ref_addr by making a new DIEUnit class in DIE.cpp.

The DIEUnit class represents a compile or type unit and it owns the unit DIE as an instance variable. This allows anyone with a DIE, to get the unit DIE, and then get back to its DIEUnit without adding any new ivars to the DIE class. Why was this needed? The DIE class has an Offset that is always the CU relative DIE offset, not the "offset in debug info section" as was commented in the header file (the comment has been corrected). This is great for performance because most DIE references are compile unit relative and this means most code that accessed the DIE's offset didn't need to make it into a compile unit relative offset because it already was. When we needed to emit a DW_FORM_ref_addr though, we needed to find the absolute offset of the DIE by finding the DIE's compile/type unit. This class did have the absolute debug info/type offset and could be added to the CU relative offset to compute the absolute offset. With this change we can easily get back to a DIE's DIEUnit which will have this needed offset. Prior to this is required having a DwarfDebug and required calling:

DwarfCompileUnit *DwarfDebug::lookupUnit(const DIE *CU) const;

Now we can use the DIEUnit class to do so without needing DwarfDebug. All clients now use DIEUnit objects (the DwarfDebug stack and the DwarfLinker). A follow on patch for the DWARF generator will also take advantage of this.

Diff Detail

Event Timeline

clayborg updated this revision to Diff 79418.Nov 28 2016, 10:01 AM
clayborg retitled this revision from to Abstract almost all DwarfDebug out of the classes in DIE.cpp..
clayborg updated this object.
dblaikie added inline comments.Nov 28 2016, 11:49 AM
include/llvm/CodeGen/DIE.h
88–89

Aren't these already implicit?

612–616

That's a pretty subtle API contract.

Duncan tried to do something similar with StringMap maybe (granted, in that case the inner variable wasn't the first member of the outer struct so he was trying to use offset_of) & for some reason that wasn't feasible, if I recall correctly. Perhaps he can tell me/us more about what the issues were there & whether the same concerns apply here.

(in any case - due to alignment, doesn't this bool variable grow the size of the struct by a pointer size anyway? The struct has to be aligned to pointer size (due to the Parent member, and probably the Children member too) , so even in a 64 bit build - Offset, Size, AbbrevNumber, and Tag are 32 bits? So 'Children' appears on the next 64 bit boundary probably, etc... )

Could we instead change 'Parent' to be a PointerUnion<DIE, DIEUnit> & wouldn't need the offset tricks anyway?

724

This is implicit/default, right? (but I suppose if this thing is movable, then having the DIE physically point to its DIEUnit could be tricky - equally, though, all the DIE's children parent pointers would be broken by such an API as well, right?)

lib/CodeGen/AsmPrinter/DIE.cpp
469–471

Guessing this is an unrelated change - but seems OK (you could commit it separately/ahead of this patch, to reduce the diff here/simplify code review).

476–477

formatting

481

Is this an unrelated/orthogonal change?

lib/CodeGen/AsmPrinter/DwarfUnit.cpp
303

Hey Adrian - did you happen to run size numbers on whether this is important? What would happen if we just always used ref_addr when we enabled cross-unit referencing? (wondering if the complexity is worth the benefit - 4 bytes per unit reference isn't nothing, for sure - just wondering how much it is in the end)

tools/dsymutil/DwarfLinker.cpp
225–226

Moving DIEs seems problematic - any sub-DIEs parent pointers will be invalidated by this, right?

3395–3398

You could reduce indentation here in one of a couple of ways:

Remove the outer scope and nest the variable declaration in the if:

if (DIE *OutputDIE = cloneDIE(...))
  CurrentUnit.setOutputUnitDIE(std::move(*OutputDIE));
else
  continue;

or keeping the outer scope (though I don't think it's really necessary - we have moved from objects regularly - you could null out the pointer to help a bit there, if you liked - though that might trigger dead-store warnings eventually) and flipping the if condition while removing the else:

if (!OutputDIE)
  continue;
CurrentUnit.setOutputUnitDIE(...);

Will switch over to using PointerUnion to avoid the strict API contracts and fix the rest accordingly.

include/llvm/CodeGen/DIE.h
88–89

Yep, they are. There was code in DwarfLinker a while back that was manually copying a DIEAbbrev for no reason so I added these when I removed that redundant code. These aren't needed.

616

Yes it is subtle API contract in an effort to save space.

This doesn't grow this class because dwarf::Tag in 16 bits (enum class uint16_t) and the next class will be 4 or 8 byte aligned so it will sit in the gap.

Could we instead change 'Parent' to be a PointerUnion<DIE, DIEUnit> & wouldn't need the offset tricks anyway?

Great idea. That solves the problem without growing the size.

724

I will think about this as I am switching over to use a PointerUnion and make any needed modifications.

lib/CodeGen/AsmPrinter/DIE.cpp
471

This is leaving in he assert that used to be on line 448 to the right? Am I missing something?

481

Intentional change that fixes emitting DW_FORM_ref_addr for DWARF version 2 which was broken. It would always emit 4 bytes regardless of the DWARF version.

tools/dsymutil/DwarfLinker.cpp
226

I will switch to DIE to use a PointerUnion and fix accordingly.

clayborg updated this revision to Diff 79644.Nov 29 2016, 2:21 PM

Switched over to using PointerUnion and outlawed llvm::DIEUnit and llvm::DIE classes from being copied. Had to modify the DwarfLinker to not copy or move construct any DIE instances.

Friendly ping. I would like to get back to the DWARF generator patch after this if possible.

aprantl edited edge metadata.Nov 30 2016, 11:07 AM

Some nitpicky comments.

include/llvm/CodeGen/DIE.h
705

Please don't repeat the class name in a doxygen comment.

706

I think this regular comment in-between may break doxygen?

lib/CodeGen/AsmPrinter/DIE.cpp
124

We typically write this as:
if (const auto *U = getUnit())

126

alternatively:

assert(getUnit() && "DIE must belong to a DIEUnit to find the absolute offset.")
return getUnit()->getDebugSectionOffset() + getOffset();
137

Why not return a DIEUnit & instead? This would simplify all the call sites.

145

Same comment(s) apply here :-)

211

) && "message")?

214

perhaps move these trivial accessors into the class definition?

tools/dsymutil/DwarfLinker.cpp
2718

this is confusing to read. No idea how to make it better though...

dblaikie added inline comments.Nov 30 2016, 11:35 AM
tools/dsymutil/DwarfLinker.cpp
217–218

This type includes a non-copy/movable DIEUnit, so it's implicitly non-copy/movable, so you can omit these if you like ( http://blog.florianwolters.de/educational/2015/01/31/The_Rule_of_Zero/ ) I think

1096–1097

Think the code might be more legible without these typedefs (the first moreso than the second)

1097

Alternatively, this could be a std::deque if items are added but never removed & object identity is important. (may not be better that way - just suggesting it so you, Adrian, etc can consider it)

2716

Please add a message to the assertion to explain what/why this should be true.

(I personally find a block with only an assert in it to be a little weird and might tend to phrase this as "assert(!Die || !Info.Clone); if (!Die) { ... } - but yeah, it's not exactly more legible that way :/)

2718–2720

Might this be simpler as:

if (!Info.Clone)
  Info.Clone = ...;
Die = Info.Clone;
3467–3468

prefer llvm::make_unique

aprantl added inline comments.Nov 30 2016, 11:38 AM
lib/CodeGen/AsmPrinter/DwarfUnit.cpp
303

I just did some numbers on a couple of projects by doing

dwarfdump -F WebKit.framework.dSYM | grep FORM_ref4 |wc -l

               WebKit     clang   medium-sized ObjC project
===========================================================
FORM_ref_addr 7231489  15483428
FORM_ref4      189099    555085    189189
-----------------------------------------------------------
Savings         ~700K       ~2M     ~700K

It's not much in the grand scheme of things, but also not entirely negligible. Since .dSYM bundles are made to be stored indefinitely I think we should keep every optimization that makes them smaller.

clayborg updated this revision to Diff 79796.Nov 30 2016, 12:49 PM
clayborg edited edge metadata.

Fixed all issues found by previous comments.

clayborg added inline comments.Nov 30 2016, 12:50 PM
include/llvm/CodeGen/DIE.h
705

I was trying to be consistent with the other classes in this file. I can remove it.

lib/CodeGen/AsmPrinter/DIE.cpp
137

Since there are clients that need the getUnitOrNull() already, I would like to keep this consistent with that call. Also I would like the code to not crash if asserts are not enabled. So I would prefer to leave this as a pointer.

210–211

Not sure what I would say? && "It must be one of these?". The assert seems pretty clear. It was also moved from another location in DwarfUnit::DwarfUnit where this assert appeared just like this.

214

I generally don't like putting things with asserts in header files, but there are plenty of asserts in LLVM header files already, so I can move this.

tools/dsymutil/DwarfLinker.cpp
218

Sounds good, will remove.

Commented inline on the DW_FORM_ref_addr vs DW_FORM_ref4 question.

lib/CodeGen/AsmPrinter/DwarfUnit.cpp
303

There should be no size difference at all DW_FORM_ref_addr are 4 bytes in DWARF32 unless you are emitting Version 2, which we don't emit much right? There is a computation penalty for using DW_FORM_ref_addr since and attribute in a DIE will have a DIEValue that contains a DIEEntry. DIEEntry objects point to DIE objects and these have CU relative offsets stored inside of them, so in order to emit a DW_FORM_ref_addr, we will need to get the Unit DIE and then than add the compile/type unit's absolute offset to the DIE's offset to compute the absolute offset, so there will be some performance cost to using DW_FORM_ref_addr.

clayborg updated this revision to Diff 79819.Nov 30 2016, 2:36 PM

Address Adrian's comments from email by making DIE just have:

class DIE {
  const DIEUnit *getUnit() const;
  const DIE *getUnitDie() const;
};

And skip the "OrNull" variants. Also added a "expected a unit TAG" to the assert in the DIEUnit constructor that checks the incoming dwarf Tag.

Any objections to the current state of things?

aprantl accepted this revision.Dec 1 2016, 10:39 AM
aprantl edited edge metadata.

Looks good to me now. Thanks!

include/llvm/CodeGen/DIE.h
598

Not super important, but we would generally spell this like this nowadays:

/// Dwarf unit relative offset.
unsigned Offset;
This revision is now accepted and ready to land.Dec 1 2016, 10:39 AM
This revision was automatically updated to reflect the committed changes.
dblaikie edited edge metadata.Dec 1 2016, 2:17 PM

Realized a few bits of code review feedback got lost in the noise somewhere. So I applied some myself & left a few here.

llvm/trunk/include/llvm/CodeGen/DIE.h
622 ↗(On Diff #79954)

This probably shouldn't be protected - there's no inheritance happening here. DIEUnit can access 'Owner' directly and probably should do that. It gives a bit of a false sense of encapsulation, I think.

710–714 ↗(On Diff #79954)

Comment ended up out of date?

715–722 ↗(On Diff #79954)

Are these members accessed directly?

(ah, the first two have one direct access each, that could've used an accessor - and the others not, so I made them all private and used the two accessors in the two spots that were needed)

tools/dsymutil/DwarfLinker.cpp
1097

Any thoughts on using deque here? (Adrian?)

2718–2720

Made this change in r288423