This is an archive of the discontinued LLVM Phabricator instance.

Improve DWARF parsing speed by improving DWARFAbbreviationDeclaration
ClosedPublic

Authored by clayborg on Nov 11 2016, 3:22 PM.

Details

Summary

This patch gets a DWARF parsing speed improvement by having DWARFAbbreviationDeclaration instances know if they have a fixed byte size. If an abbreviation has a fixed byte size that can be calculated given a DWARFUnit, then parsing a DIE becomes two steps: parse ULEB128 abbrev code, and then add constant size to the offset.

This patch also adds a fixed byte size to each DWARFAbbreviationDeclaration::AttributeSpec so that attributes can quickly skip their values if needed without the need to lookup the fixed for size.

Notable improvements:

  • DWARFAbbreviationDeclaration::findAttributeIndex() now returns an Optional<uint32_t> instead of a uint32_t and we no longer have to look for the magic -1U return value
Optional<uint32_t> DWARFAbbreviationDeclaration::findAttributeIndex(dwarf::Attribute attr) const;
  • DWARFAbbreviationDeclaration now has a getAttributeValue() function that extracts an attribute value given a DIE offset that takes advantage of the DWARFAbbreviationDeclaration::AttributeSpec::ByteSize
bool DWARFAbbreviationDeclaration::getAttributeValue(const uint32_t DIEOffset, const dwarf::Attribute Attr, const DWARFUnit &U, DWARFFormValue &FormValue) const;
  • A DWARFAbbreviationDeclaration instance can return a fixed byte size for itself so DWARF parsing is faster:
Optional<size_t> DWARFAbbreviationDeclaration::getFixedAttributesByteSize(const DWARFUnit &U) const;
  • Any functions that used to take a "const DWARFUnit *U" that would crash if U was NULL now take a "const DWARFUnit &U" and are only called with a valid DWARFUnit

Diff Detail

Repository
rL LLVM

Event Timeline

clayborg updated this revision to Diff 77678.Nov 11 2016, 3:22 PM
clayborg retitled this revision from to Improve DWARF parsing speed by improving DWARFAbbreviationDeclaration.
clayborg updated this object.

DWARF parsing speed is improved by 7% on a large DWARF file that contains debug info for LLVM, Clang and LLDB. The test would parse all of the DIEs in all of the compile units, iterate through all DIEs by visiting their children and siblings and extract a DW_AT_name as a string and extract a DW_AT_low_pc as an address. This tests overall parsing speed and also tests accessing attributes where DWARFAbbreviationDeclaration instances need to extract attribute values.

aprantl added inline comments.Nov 11 2016, 3:47 PM
include/llvm/DebugInfo/DWARF/DWARFDebugInfoEntry.h
50 ↗(On Diff #77678)

This is pointer to an offset in the section? Would a uint32_t & be more readable here? (Not sure since it is then less clear that this is an inout argument at the call site.)

51 ↗(On Diff #77678)

Why are there two methods called extractFast, is that a transitional thing that will go away in a subsequent patch?

That's a general problem I have even with the original interface: Just from looking at the documentation here it isn't clear to me what this function does and what the alternative to extractFast would be.

tools/dsymutil/DwarfLinker.cpp
2085 ↗(On Diff #77678)

Much nicer!

probinson added inline comments.Nov 11 2016, 5:22 PM
lib/DebugInfo/DWARF/DWARFAbbreviationDeclaration.cpp
68 ↗(On Diff #77678)

Is there some reason you're only considering FORM_ref_addr here? I'd think you would want to count all of the format-dependent reference forms here (ref_addr, strp, sec_offset, and all the similar new ones in DWARF 5). I'd especially think FORM_strp would be important as that's used by pretty much every abbreviation that has a DW_AT_name.

clayborg added inline comments.Nov 11 2016, 5:24 PM
include/llvm/DebugInfo/DWARF/DWARFDebugInfoEntry.h
50 ↗(On Diff #77678)

I can change this, but I should probably do this in a separate patch as there are many many functions that take a uint32_t pointer to an offset that gets updated. So lets not fix this now and make it inconsistent with 20 other function calls. Is that ok?

51 ↗(On Diff #77678)

extractFast is used elsewhere in places I didn't write. I would like to get rid of this, and yes I will do that in another patch, but not this one.

clayborg added inline comments.Nov 11 2016, 5:30 PM
lib/DebugInfo/DWARF/DWARFAbbreviationDeclaration.cpp
68 ↗(On Diff #77678)

Yep, I need to fix this. Good catch.

68 ↗(On Diff #77678)

Actually isn't DW_FORM_ref_addr special? It is AddrSize in DWARF 2, and in DWARF 3 and later it is 4 bytes for DWARF32 and 8 for DWARF64. So I need an extra field in FixedSizeInfo that counts the DWARF32/DWARF64 only variants. Am I correct in that the following:

case DW_FORM_strp:
case DW_FORM_GNU_ref_alt:
case DW_FORM_GNU_strp_alt:
case DW_FORM_line_strp:
case DW_FORM_sec_offset:
case DW_FORM_strp_sup:
case DW_FORM_ref_sup:

They are not AddrSize in DWARF 2 right? It just switches off of DWARF32/DWARF64.

clayborg updated this revision to Diff 77705.Nov 11 2016, 5:41 PM

Fixed the constant size handling of all forms that are 4 bytes in DWARF32 and 8 bytes in DWARF64 (DW_FORM_strp, DW_FORM_GNU_ref_alt, DW_FORM_GNU_strp_alt, DW_FORM_line_strp, DW_FORM_sec_offset, DW_FORM_strp_sup, DW_FORM_ref_sup) by adding a new field named DWARFAbbreviationDeclaration::FixedSizeInfo::NumDwarfOffsets to DWARFAbbreviationDeclaration::FixedSizeInfo.

I had incorrectly placed DW_FORM_strp in the "always 4 bytes" column in a previous fix and didn't think about that when I submitted this patch as I had this patch already done at the same time as the previous patch. This is now fixed in this revision.

probinson added inline comments.Nov 14 2016, 7:47 AM
lib/DebugInfo/DWARF/DWARFAbbreviationDeclaration.cpp
68 ↗(On Diff #77678)

Yes, DW_FORM_ref_addr is special for DWARF v2. The other forms you list did not exist in DWARF 2 and so are only dependent on the 32/64 format.

probinson edited edge metadata.Nov 14 2016, 7:59 AM

So the idea is that you don't always have a Unit when you're parsing the abbreviations? That's mildly counter-intuitive. If you have a Unit then you know version and format up front, and you can just calculate the fixed sizes without having to remember how many of which special cases you found. When would you be doing this without a Unit available?

So the idea is that you don't always have a Unit when you're parsing the abbreviations? That's mildly counter-intuitive. If you have a Unit then you know version and format up front, and you can just calculate the fixed sizes without having to remember how many of which special cases you found. When would you be doing this without a Unit available?

I thought this was better than assuming all DWARFUnits would have the same version, DWARF32/DWARF64, and address size. I could always parse the first DWARFUnit and use that unit's version, address size and 32/64 to calculate things, but that seems like more of a hack.

So the idea is that you don't always have a Unit when you're parsing the abbreviations? That's mildly counter-intuitive. If you have a Unit then you know version and format up front, and you can just calculate the fixed sizes without having to remember how many of which special cases you found. When would you be doing this without a Unit available?

All DWARF that we deal with in Mach-O at Apple have one abbreviation table that is shared for all compile units. Not sure if this is how things end up being in ELF DWARF files, but our tools end up having one abbreviation table for all compile units.

One question regarding DWARF32/DWARF64: is there anything in the DWARF spec that requires all units to be 32 bit, or all must be 64 bit? Can they be intermixed where we have some 32 and some 64?

If the abbrev table is shared across units, then what you are doing makes more sense. Actually I just found where DWARF explicitly allows sharing abbrev tables (DWARF 4 section 7.5). I don't know that the committee explicitly intended sharing across 32-bit and 64-bit units, but it's not forbidden. So, my intuition was wrong. :-)

DWARF requires that 32-bit and 64-bit formats are not intermixed in the same compilation unit. (DWARF 4 section 7.4 p.142.) I take this to mean things like the .debug_info and .debug_line contributions for the same CU must use the same format. However different CUs can use different formats within the same program.

I have another upcoming patch that will add unit testing of DWARF that you generate that will be able to detect that we have a fixed size abbreviation declaration, but that patch will need to be recoded to use the AsmPrinter classes so that will be in my next patch.

I would like to get this in. Does anyone have any objections to this?

I would like to get this in. Does anyone have any objections to this?

I'm good, but I think David and/or Adrian need to chime in.

aprantl edited edge metadata.Nov 14 2016, 3:04 PM

Couple of stylistic comments inline, but otherwise this is fine from my end unless David sees anything.

lib/DebugInfo/DWARF/DWARFAbbreviationDeclaration.cpp
178 ↗(On Diff #77705)

> 0 is redundant since NumAddrs is unsigned. I would either use != 0 of if (NumAddrs).

Actually, it looks like the entire condition is redundant. If NumAddrs is 0 then the next line will be a noop.

lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
221 ↗(On Diff #77705)

roll into the condition?

248 ↗(On Diff #77705)

I don't like that U can be null here but that's for another patch :-)

aprantl added inline comments.Nov 14 2016, 3:15 PM
lib/DebugInfo/DWARF/DWARFAbbreviationDeclaration.cpp
187 ↗(On Diff #77705)

This comment should be on the declaration and a doxygen comment.

192 ↗(On Diff #77705)

Maybe it's easier to read to have this as a ?: one-liner inlined the class declaration?

Will update with changes.

lib/DebugInfo/DWARF/DWARFAbbreviationDeclaration.cpp
178 ↗(On Diff #77705)

Will change.

192 ↗(On Diff #77705)

Not sure I follow? Do you want this?:

return ByteSize ? ByteSize : DWARFFormValue::getFixedByteSize(Form, &U);
lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
221 ↗(On Diff #77705)

Will do.

248 ↗(On Diff #77705)

This will go away in my future DWARFDIE patch.

clayborg updated this revision to Diff 77911.Nov 14 2016, 4:10 PM
clayborg edited edge metadata.

Made changes Adrian asked for.

clayborg updated this revision to Diff 77913.Nov 14 2016, 4:12 PM

Removed comment that was redundant as headerdoc was already there in the definition.

clayborg marked 4 inline comments as done.Nov 14 2016, 4:13 PM

Updated things were were "Done".

aprantl accepted this revision.Nov 14 2016, 4:43 PM
aprantl edited edge metadata.

Thanks!

lib/DebugInfo/DWARF/DWARFAbbreviationDeclaration.cpp
192 ↗(On Diff #77705)

Yes. This is up to taste — but since this is just a very straightforward single line it might be a better to just move it straight into the class definition.

This revision is now accepted and ready to land.Nov 14 2016, 4:43 PM
This revision was automatically updated to reflect the committed changes.
chapuni added inline comments.
llvm/trunk/include/llvm/DebugInfo/DWARF/DWARFAbbreviationDeclaration.h
68

s/Attr/attr/
Fixed in r286954.