This is an archive of the discontinued LLVM Phabricator instance.

Add the ability to iterate across all attributes in a DIE.
ClosedPublic

Authored by clayborg on Jan 5 2017, 4:29 PM.

Details

Summary

In order for LLDB to be able to use the LLVM DWARF parser, it needs to be able to iterate across all attributes in a DWARFDie. This change adds a DWARFDie::attributes() function that allows easy iteration:

for (auto &AttrValue : CUDie.attributes()) {
  switch (AttrValue.Attr) {
  case DW_AT_name:
    if (auto Name = AttrValue.Value.getAsCString())
      ...
    break;
  case DW_AT_declaration:
    if (auto Declaration = AttrValue.Value.getAsUnsignedConstant())
      ...
    break;
  case DW_AT_low_pc:
    if (auto LowPC = AttrValue.Value.getAsAddress())
      ...
    break;
  default:
    break;
  }
}

Diff Detail

Repository
rL LLVM

Event Timeline

clayborg updated this revision to Diff 83322.Jan 5 2017, 4:29 PM
clayborg retitled this revision from to Add the ability to iterate across all attributes in a DIE..
clayborg updated this object.
dblaikie added inline comments.Jan 5 2017, 4:57 PM
lib/DebugInfo/DWARF/DWARFDie.cpp
448–449 ↗(On Diff #83322)

Either roll the declaration into the if condition, or - more likely - invert the condition and return early to reduce indentation:

if (!AbbrDecl)
  return;
455 ↗(On Diff #83322)

Does the existing dumping code do this getULEB128Size thing? That seems like something that'd be nice to avoid (not that it's costly - it just strikes me as a bit strange to have to compute the size of the ULEB after we've parsed it to figure out which bytes to read again (are ULEBs guaranteed to be the minimal/same size for the same value? Or could DWARF have a large ULEB encoding for a small value?))

463–464 ↗(On Diff #83322)

I think we should probably error on this early, and not check it repeatedly? eg: perhaps in the begin() function we could cehck if there's an abbrev, and if not - produce an iterator that will compare == to end immediately (eg: make sure begin/end produce index 0, and ensure no work/nothing that needs the abbrdecl is done).

Any thoughts on how error handling should be done when using this API? We could have "attributes()" return ErrorOr<range>? But I suppose other errors can occur while iterating. Lang's played with some error handling during iteration for the libObject APIs - having iterators that can describe failure, but not sure how that's looking/whether there's a good idiom yet.

464 ↗(On Diff #83322)

Early return?

478 ↗(On Diff #83322)

If you return immediately from here - then you won't need the "Success" variable - and the cleanup at the end of the function can be unconditional (since the only way to get there will be if !success)

unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp
1225 ↗(On Diff #83322)

Looks like this change doesn't affect the parser - could we test it without roundtripping through DWARF, instead building the data structures directly (the same way the parser would) and then querying them? (like the unit test for, say, SmallVector or other ADTs)

probinson added inline comments.Jan 6 2017, 10:47 AM
lib/DebugInfo/DWARF/DWARFDie.cpp
455 ↗(On Diff #83322)

Re. minimal LEB encoding, the "syntax" of the encoding doesn't require that it is minimal (a too-long encoded value will still decode correctly) but DWARF section 7.6 specifies that the encoded value is minimal.

probinson added inline comments.Jan 6 2017, 11:06 AM
lib/DebugInfo/DWARF/DWARFDie.cpp
455 ↗(On Diff #83322)

Not that minimalist encoding matters here. It looks like the iterator has to keep track of the offset-within-DIE for each attribute's value as it steps through the attributes, because the caller won't necessarily extract the value for each attribute.

clayborg updated this revision to Diff 83643.Jan 9 2017, 9:52 AM

Fixed inlined comment requests.

clayborg marked 3 inline comments as done.Jan 9 2017, 9:53 AM
clayborg added inline comments.
lib/DebugInfo/DWARF/DWARFDie.cpp
455 ↗(On Diff #83322)

I forgot I had already added a "uint8_t CodeByteSize" to DWARFAbbreviationDeclaration. I will add an accessor and use it.

464 ↗(On Diff #83322)

My main goal for this API is performance and I am not sure if error checking will slow us down too much. LLDB grabs all attributes from many DIE and we know the DWARF parser is the slowest part of our debugger at the moment.

I did think about this when I was making this. I though about returning a Optional<DWARFAttribute> for each attribute, but then the code to consume it becomes a bit more messy when almost 100% of the time things are fine. So I opted to keep performance as much as possible and just have the iterator short out early if things go wrong.

When thinking about performance, I was trying to think if I should make DWARFAttribute actually lazily fetch the DWARFFormValue. There are many attributes that we just don't fetch, but then that makes the attribute consuming code a bit more involved.

So I landed on the current approach: if iteration succeeds, all is good and the values are there and ready. No need to check the attribute value with error checking, and then check if the DWARFFormValue is good, we just use the DWARFAttribute class and if iteration succeeds we are good to go.

478 ↗(On Diff #83322)

Can do. This function was refactored around during my initial coding and now that DWARFDie::attribute_iterator::extractValueForIndex() is its own function, we can easily do this.

unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp
1225 ↗(On Diff #83322)

We could do it that way as the DWARFGenerator actually does make up DIE classes with compile units. But it seems inconsistent with the other testing done here and not sure it is worth the time making sure we can build a DWARFUnit and DIEs manually and make sure they are ingestible from a parser standpoint.

dblaikie added inline comments.Jan 9 2017, 10:52 AM
include/llvm/DebugInfo/DWARF/DWARFAbbreviationDeclaration.h
62–64 ↗(On Diff #83643)

Assert that the index is valid, rather than conditionalizing? Or is this an error case (Optional<Attribute> or ErrorOr<Attribute> might be more suitable then) that callers are expecting to handle/look for?

lib/DebugInfo/DWARF/DWARFDie.cpp
464 ↗(On Diff #83322)

It seems a bit problematic to have no error handling here. If there are performance numbers we can compare & consider different error handling schemes to find options with acceptable performance, that'd be great - but without that I don't think we can/should justify a design that silently fails like this.

unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp
1267–1300 ↗(On Diff #83643)

This loop+switch based testing seems awkward - could it test linearly/directly each element in the range? (we should reasonably expect them to be in a well defined order, without any extra elements)

1225 ↗(On Diff #83322)

Seems like it should be, hopefully, strictly simpler to create the objects directly rather than generate and parse dwarf.

(arguably this is adding missing coverage to the libDebugInfo - the attribute parsing code is probably only tested by dwarfdump right now, so adding coverage here's probably good-ish, though)

clayborg updated this revision to Diff 83698.Jan 9 2017, 2:21 PM

Added optional error handling to the attributes iterators. You can now pass an "llvm::Error *" to the attributes():

attribute_iterator DWARFDie::attributes(llvm::Error *Err);

If the error is non-NULL, then the error will be filled in. This iteration error is the same method used in Archive.h/Archive.cpp after I spoke with Lang Hames.

clayborg marked 3 inline comments as done.Jan 9 2017, 2:23 PM

Marked things as done.

unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp
1271 ↗(On Diff #83698)

This might be awkward, but it is how users will use it and this verifies things come out in order. I would rather test this like users will test it rather than use the range manually since this tests everything that the iterator needs.

dblaikie added inline comments.Jan 9 2017, 2:38 PM
unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp
1271 ↗(On Diff #83698)

It'll test the same functionality - the range-for loop has a pretty narrow interaction with a range & users may use it in any number of ways - the important thing is the contract that it returns a range that is usable as a range (begin, end, dereference, etc):

auto R = CUDie.attributes(Err);
auto I = R.begin();
auto E = R.end();

ASSERT_NE(E, I);
EXPECT_EQ(DW_AT_name, I->Attr);
EXPECT_EQ(CUPath, I->Value.getAsCString()); // change CUPath to a StringRef type for this to work correctly, I expect.

ASSERT_NE(E, ++I);
EXPECT_EQ(DW_AT_declaration, I->Attr);
EXPECT_EQ(1ull, I->Value.getAsUnsignedConstant());

ASSERT_NE(E, ++I);
EXPECT_EQ(DW_AT_low_pc, I->Attr);
EXPECT_EQ(CULowPC, I->Value.getAsAddress());

EXPECT_EQ(E, ++I);
1302 ↗(On Diff #83698)

Could you use Chris Bieneman's work on yaml2obj to test error cases of this API? (at least one error case)

clayborg updated this revision to Diff 83735.Jan 9 2017, 4:28 PM

Removed error handling as we consider this an invariant currently.
Updated test case to not use a switch statement with index.

dblaikie added inline comments.Jan 10 2017, 8:02 AM
lib/DebugInfo/DWARF/DWARFDie.cpp
449 ↗(On Diff #83735)

Is it valid to call this in a situation in which 'getAbbreviationDeclarationPtr' would return a failure value?

If so, is there a test case for it? If not, change this code to assert (& change other code in this class, such as op++, to access the value unconditionally (you can include similar assertions in those other use cases - but I'd probably skip it & have only an assert in the ctor here to establish/document the invariant))

462 ↗(On Diff #83735)

Remove dead code. (remove this function if it's never meant to be called)

474–477 ↗(On Diff #83735)

Remove this code and assert (probably assert in getAttrByIndex that the index is valid - such that there's no "invalid" return value that callers should check for)

483–487 ↗(On Diff #83735)

remove the conditional and assert:

bool b = AttrValue.Value.extractValue(...);
(void)b;
assert(b && "extractValue cannot fail on fully parsed DWARF");

Sink this into the extractValue function, if possible. (if the function can't fail, why does it have a bool return value? Looks like two out of the 3 false returns are if you pass a null Unit when a unit is required, so we could remove those by having an API that takes a Unit* that can be null and a matching function that takes a Unit& that cannot be (if you call the latter, those failures can't occur) - the 3rd is the default in the switch. Should that be an assertion? (that callers cannot call this with a form that the function cannot handle?))

489–496 ↗(On Diff #83735)

Replace this with:

} else {
  assert(Index == NumAttrs && "Indexes should be [0, NumAttrs) only");
  AttrValue.clear();
}
500–501 ↗(On Diff #83735)

Roll the declaration into the condition:

if (auto AbbrDecl = Die.getAbbreviationDeclarationPtr())
lib/DebugInfo/DWARF/DWARFFormValue.cpp
401 ↗(On Diff #83735)

use llvm_unreachable, rather than assert(false)

402 ↗(On Diff #83735)

Remove code after unreachable.

unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp
1217 ↗(On Diff #83735)

Remove commented out code

1266 ↗(On Diff #83735)

Remove commented out code

1274–1276 ↗(On Diff #83735)

No need for the ASSERT_TRUE((bool)ActualCUPath) - Optional will assert if op* is called on an unset value.

1276 ↗(On Diff #83735)

Is the StringRef cast needed here? I'd expect StringRef==const char* to be valid?

clayborg updated this revision to Diff 83835.Jan 10 2017, 11:32 AM

Fixed dblaikie's issues.

clayborg marked 5 inline comments as done.Jan 10 2017, 11:34 AM
clayborg added inline comments.
lib/DebugInfo/DWARF/DWARFFormValue.cpp
401 ↗(On Diff #83835)

This is not unreachable. Newer DWARF might be passed to this at some point and it needs to fail correctly and just not parse the DIE which will cause all DIEs in a DWARFUnit to not be parsed or returned.

dblaikie added inline comments.Jan 10 2017, 11:40 AM
lib/DebugInfo/DWARF/DWARFFormValue.cpp
401 ↗(On Diff #83835)

Then there cannot be an assert(false) here. There must be a test case demonstrating that this situation is correctly handled (with a hardcoded unknown form).

But shouldn't the failure have surfaced earlier - never reaching the code that would attempt to parse the value in the DIE? The attempt to walk the DIE would fail because the size could not be determined for an unknown form.

clayborg added inline comments.Jan 12 2017, 10:55 AM
lib/DebugInfo/DWARF/DWARFFormValue.cpp
401 ↗(On Diff #83835)

I can change this to use llvm_unreachable this and make a test that a bad form fails to parse the DIEs for a compile unit. This would be caught in DWARFFormValue::skipValue() which calls through to skipFormValue in DWARFFormValue.cpp.

clayborg updated this revision to Diff 84188.Jan 12 2017, 4:12 PM

Updated with suggestions. I can't make an invalid form test yet until we can use the yam2obj.

dblaikie accepted this revision.Jan 12 2017, 4:18 PM
dblaikie edited edge metadata.
dblaikie added inline comments.
lib/DebugInfo/DWARF/DWARFFormValue.cpp
406 ↗(On Diff #84188)

Remove dead code after unreachable

This revision is now accepted and ready to land.Jan 12 2017, 4:18 PM
clayborg updated this revision to Diff 84189.Jan 12 2017, 4:22 PM
clayborg edited edge metadata.

Remove code after llvm_unreachable

This revision was automatically updated to reflect the committed changes.