This is an archive of the discontinued LLVM Phabricator instance.

Improve DWARF parsing and attribute retrieval speed by improving DWARF abbreviation declarations.
AbandonedPublic

Authored by clayborg on Nov 9 2016, 1:39 PM.

Details

Summary

If you take a look at large DWARF files you will notice there are not that many abbreviation declarations. For a large binary that has debug inforamtion for LLVM, Clang and LLDB, there are only ~500 abbreviations. By adding some new data members to the DWARFAbbreviationDeclaration class, we can parse DWARF faster and also retrieve DWARF attribute values faster.

To speed up the DWARF parsing we do a little extra work when parsing the DWARFAbbreviationDeclaration. We determine if a DWARFAbbreviationDeclaration has a fixed byte size and we remember how to calculate the fixed byte size using a new DWARFAbbreviationDeclaration::FixedSizeInfo structure that is stored as a optional value inside the DWARFAbbreviationDeclaration. When parsing DWARF the DIEs in a compile unit, we must extract DWARFDebugInfoEntryMinimal objects. In the extract function we check this DWARFAbbreviationDeclaration::FixedAttributeSize optional value to see if the byte size if fixed and if so, we can skip all attributes in one operation instead of iterating through all of the attribute/form pairs and individually skipping each one.

To speed up attribute value extraction, we get rid of the DWARFFormValue::getFixedFormSizes(...) static function and store an optional byte size with each attribute/form pair. The old DWARFAbbreviationDeclaration::AttributeSpec was:

class DWARFAbbreviationDeclaration {
  struct AttributeSpec {
    dwarf::Attribute Attr;
    dwarf::Form Form;
  };
};

The new one add an optional ByteSize:

class DWARFAbbreviationDeclaration {
public:
  struct AttributeSpec {
    dwarf::Attribute Attr;
    dwarf::Form Form;
    Optional<uint8_t> ByteSize;
};

This allows us to not have to calculate fixed form sizes each time we parse a DIE. Member fucntions were added to DWARFAbbreviationDeclaration, DWARFAbbreviationDeclaration::AttributeSpec and DWARFFormValue to centralize the information for each AttributeSpec and to be able to calculate the byte size given a DWARFUnit for a DWARFAbbreviationDeclaration as a whole, and if that fails, each AttributeSpec individually. We also added a map to convert dwarf::Attribute enum values into attribute indexes.

These fixes improve DWARF parsing speed by around 7 percent. The test was done by parsing an LLDB build that contains full debug info for LLDB, Clang and LLVM where we grab all compile units, extract all DIEs, traverses each DIE in the hierachy and asking each one for its name by extracting the DW_AT_name attribute (if any) and extracting the DW_AT_low_pc attribute.

Previously there we no DWARF unittests that actually tested DWARF parsing. I have added a dwarf_gen::DWARFGenerator class that allows C++ code to easily create DWARF debug info and encode it into. Example code for generating DWARF:

using namespace dwarf_gen;
// Create a DWARF generator object
DWARFGenerator Dwarf;
// Create a compile unit with the specified DWARF version and address size
CompileUnit &CU = Dwarf.appendCompileUnit(Version, AddrSize);

// Append a few attributes to the compile unit's DIE:
CU.Die.appendAttribute({DW_AT_name, DW_FORM_strp, "/tmp/main.c"});
CU.Die.appendAttribute({DW_AT_language, DW_FORM_data2, DW_LANG_C});

// Create a DW_TAG_subprogram DIE as a child of the compile unit DIE and
// add some attributes to it
DIE &SubprogramDie = CU.Die.appendChild(DW_TAG_subprogram);
SubprogramDie.appendAttribute({DW_AT_name, DW_FORM_strp, "main"});
SubprogramDie.appendAttribute({DW_AT_low_pc, DW_FORM_addr, 0x1000U});
SubprogramDie.appendAttribute({DW_AT_high_pc, DW_FORM_addr, 0x2000U});

// Create a DW_TAG_base_type type DIE as a child of the compile unit DIE and
// add some attributes to it
DIE &IntDie = CU.Die.appendChild(DW_TAG_base_type);
IntDie.appendAttribute({DW_AT_name, DW_FORM_strp, "int"});
IntDie.appendAttribute({DW_AT_encoding, DW_FORM_data1, DW_ATE_signed});
IntDie.appendAttribute({DW_AT_byte_size, DW_FORM_data1, 4});

// Create a DW_TAG_base_type type DIE as a child of the subprogram DIE and
// add some attributes to it.
DIE &ArgcDie = SubprogramDie.appendChild(DW_TAG_formal_parameter);
ArgcDie.appendAttribute({DW_AT_name, DW_FORM_strp, "argc"});
ArgcDie.appendAttribute({DW_AT_type, DW_FORM_ref4, &IntDie});

// Generate the DWARF
DWARFSections DwarfSections;
Dwarf.generate(DwarfSections);

// Now make a DWARFContextInMemory using the given section data that was
// generated and use LLVM's DWARF API to extract info from it.
DWARFContextInMemory dwarfContext(
    LittleEndian, AddrSize, DwarfSections.getDebugAbbrevData(),
    DwarfSections.getDebugInfoData(), DwarfSections.getDebugStrData());
uint32_t NumCUs = dwarfContext.getNumCompileUnits();
DWARFCompileUnit *U = dwarfContext.getCompileUnitAtIndex(0);
DWARFDebugInfoEntryMinimal* Die = U->getUnitDIE(false);

The DWARF generator is a separate code base from the parser and that ensures that we don't end up with symmetric encode/decode errors.

A full suite of unit tests were added that test decoding all DW_FORM_XXX values that we currently support using DWARF version 2, 3, and 4. Tests we also added for parsing a known chunk of DWARF and ensuring that we can extract it, and get the children and sibling DIEs as expected.

Diff Detail

Event Timeline

clayborg updated this revision to Diff 77387.Nov 9 2016, 1:39 PM
clayborg retitled this revision from to Improve DWARF parsing and attribute retrieval speed by improving DWARF abbreviation declarations..
clayborg updated this object.
clayborg set the repository for this revision to rL LLVM.
aprantl edited edge metadata.Nov 9 2016, 2:51 PM

This patch is really too long for a thorough review. I made a first pass and added a couple of low-level comments, but it would benefit a lot if you could split this up into smaller units.

include/llvm/DebugInfo/DWARF/DWARFAbbreviationDeclaration.h
36

It would be better to use doxygen comments /// for all these definitions.

86

perhaps write this as:

/// The fixed byte size for fixed size fo
uint16_t NumBytes = 0;
/// Number of DW_FORM_address forms in this abbrevation declaration.
uint8_t NumAddrs = 0;
/// Number of DW_FORM_ref_addr forms in this abbrevation declaration.
uint8_t NumRefAddrs = 0;

and use the default constructor?

103

Why are we using a std::map instead of an llvm::DenseMap here?

include/llvm/DebugInfo/DWARF/DWARFContext.h
287

The Doxygen comment should be in the header and not in the .cpp file. And the surrounding code is also violating this rule :-(

include/llvm/DebugInfo/DWARF/DWARFDebugInfoEntry.h
51

trailing .

54

What's the point in making the by-value uint32_t const?

105

Doxygen comment?

include/llvm/DebugInfo/DWARF/DWARFFormValue.h
88

DWARF 64

92

/// NULL, then None will be returned.

97

ditto

lib/DebugInfo/DWARF/DWARFAbbreviationDeclaration.cpp
58

This loop is confusing with HasFixedByteSize being carried over into the next iteration...

121

This might be a good chance to replace this API with an Optional<uint32_t> in a follow-up commit?

136

auto, since the type is clear from the context?

139

trailing .

144

trailing .

161

why not pass in a const DWARFUnit &U?

170

trailing .

lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
215

maybe *FixedDIESize?

218

.

lib/DebugInfo/DWARF/DWARFFormValue.cpp
73

trailing .

94

maybe use returns everywhere?

117

Perhaps make this a variable?

126

redundant break.

143

Same comments as in previous function.

unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp
26

there is sys::IsBigEndianHost in Support/Endian.h

I will make the changes.

include/llvm/DebugInfo/DWARF/DWARFAbbreviationDeclaration.h
36

ok

86

I added the ///. Can't use the default constructor as it won't initialize anything.

103

DenseMap doesn't compile.

include/llvm/DebugInfo/DWARF/DWARFContext.h
287

Are you asking for me to add a Doxygen comment? What is the suggestion here? There is not header doc in the .cpp file.

include/llvm/DebugInfo/DWARF/DWARFDebugInfoEntry.h
54

It was so people wouldn't change it accidentally, but I remember with the last review we fixed these so I can remove it.

105

I will skip this for now as I will be moving all of this into the llvm::DWARFDIE class in the next revision. I will add doxygen comments then.

lib/DebugInfo/DWARF/DWARFAbbreviationDeclaration.cpp
58

It has to be this way. We are trying to see if the entire DWARFAbbreviationDeclaration has a fixed byte size. I can fix it by using an Optional around FSI maybe to make it clearer?

121

No, this is actually a good idea, I will do it as part of this commit. I am not too fond of the use of the magic -1U that is used elsewhere.

161

I thought of this but all DWARF APIs currently pass around pointers to DWARFUnits everywhere, so I would rather not change it and make it inconsistent.

lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp
215

Not sure that makes it clearer, but if that what most LLVM users do, I will change it.

lib/DebugInfo/DWARF/DWARFFormValue.cpp
117

I added an accessor in DWARFUnit:

case DW_FORM_GNU_ref_alt:
case DW_FORM_GNU_strp_alt:
case DW_FORM_line_strp:
case DW_FORM_sec_offset:
case DW_FORM_strp_sup:
case DW_FORM_ref_sup:
  // 4 bytes in DWARF 32, 8 bytes in DWARF 64.
  if (U)
    return U->getDwarfOffsetByteSize();
  return None;
unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp
26

Nice, I'll use that.

clayborg updated this revision to Diff 77440.Nov 9 2016, 9:11 PM
clayborg edited edge metadata.
clayborg removed rL LLVM as the repository for this revision.

Fixed all requests from Adrian's inline comments.

clayborg marked 20 inline comments as done.Nov 9 2016, 9:13 PM

Addressed issues.

I agree with Adrian, this is way too big to review.

So, following test-first practice, let's talk about the DWARF-test-case-generator first. There are already 3 different ways it could go.

  • A standalone generator like you've done here.
  • A driver for LLVM's DWARF generator, as @dblaikie suggested.
  • A DWARF-speaking yaml2obj, which @beanz is proposing on cfe-dev.

The standalone generator looks like it would work well for unittests but not so much for more traditional LIT tests, unless you also want to produce a tool that can parse some textual representation and produce DWARF from it.

A driver for LLVM's DwarfDebug would get nice code re-use, but limit the test space to what DwarfDebug knows how to produce. I think it would be more robust to make sure we had confidence in a reader/dumper before starting to persuade the compiler to emit some of the more finicky bits of DWARF 5. And, I personally would want a WAY nicer input syntax than having to hand-write IR metadata (oy, what a headache).

Speaking of generating binary files from text input, isn't that exactly what yaml2obj does? I don't know whether people like it or hate it, but it does appear to have worked examples of emitting objects that get used for tests. Teaching it a convenient DWARF syntax and how to emit DWARF sections into an object file seems like a more fruitful avenue than confining the generator to unittests. (Maybe you could repurpose the code here, save a lot of typing.)

clayborg abandoned this revision.Nov 10 2016, 9:14 AM

I will break this up.