This is an archive of the discontinued LLVM Phabricator instance.

[DWARF] DWARF v5: Rework of string offsets table reader
ClosedPublic

Authored by wolfgangp on Dec 12 2017, 5:22 PM.

Details

Summary

The current consumer side implementation of DWARF v5 string offsets tables is
flawed. According to the DWARF standard, contributions to the string offsets table
have their own format - either DWARF32 or DWARF64 - independent of the contributing
unit's (CU or TU) format. The current implementation incorrectly derives the string
offsets table's format from that of its contributing unit.

This patch corrects this problem in the following way: In full compilation units, full type units and
skeleton units we obtain the start of the unit's contribution from the DW_AT_str_offsets_base
attribute as before. Since the standard mandates that a contribution header has to
immediately precede this location, we attempt to detect a well-formed contribution header at
either 16 (DWARF64) or 8 (DWARF32) bytes before it. This establishes the format and length
of the unit's string offsets table contribution and enables us to validate the length
as well as any accesses through the table. Each unit's contribution is captured in a
descriptor that holds all the relevant information about the contribution. If parsing of
the header failed, the descriptor is left in an error state, which prevents extraction
of strings by any consumers. llvm-dwarfdump reports errors based on the state of the
descriptor.

llvm-dwarfdump can now also report gaps in the string offsets table as well as overlapping
contributions because dumping the table is driven by the existing units and their str_offsets_base
attributes and is no longer a simple scan as before.

For split CUs and TUs, the standard says that the DW_AT_str_offsets_table attribute is
not used. The current implementation mistakenly honors it. The patch changes the
implementation to ignore the attribute instead. Additionally, since the standard
mandates that there is only one CU (but possibly multiple TUs) in a .dwo file,
the implementation now assumes that there is a single contribution to the string offsets
table in the .debug_str_offsets.dwo section with a DWARF v5 header at offset 0. It attempts
to derive the format from the header. It also assumes that the CU and all TUs in the .dwo file
share this single contribution. The DWARF5 standard is not completely clear on this
last point but it is the only consistent interpretation I can think of.

In a .dwp file, individual contributions to the string offsets table are identified
by the index tables. There is only one CU per contributing .dwo file, so we can identify
pre-DWARF5 units and correctly parse their contributions to the string offsets table,
which is just a simple array of string offsets without a header.

The tests have been modified to add tests for

  1. Units whose string offsets tables have a different format.
  2. Recognition of gaps in the table
  3. Mixing of DWARF v5 and existing GNU split dwarf units in a dwp file
  4. Detection of overlapping contributions to the string offsets table (error)

Diff Detail

Repository
rL LLVM

Event Timeline

wolfgangp created this revision.Dec 12 2017, 5:22 PM
aprantl added inline comments.Dec 12 2017, 5:46 PM
include/llvm/DebugInfo/DWARF/DWARFUnit.h
173 ↗(On Diff #126656)

/// and on line above, so it gets picked up by Doxygen?

lib/DebugInfo/DWARF/DWARFUnit.cpp
570 ↗(On Diff #126656)

The way that the Valid flag is set as a side-efffect here is a bit difficult to follow. Could it perhaps be a return value of parseDWARF64StringOffsetsTableHeader?

wolfgangp added inline comments.Dec 13 2017, 10:41 AM
lib/DebugInfo/DWARF/DWARFUnit.cpp
570 ↗(On Diff #126656)

Sure. I'll bury the resetting of the valid bit inside the parseDWARF* functions as well to localize it a bit more.

JDevlieghere added inline comments.Dec 14 2017, 8:14 AM
lib/DebugInfo/DWARF/DWARFContext.cpp
97 ↗(On Diff #126656)

How about just returning the vector by value and relying on copy elision? This signature initially gave me the impression that we might have several calls to this function for the same ContributionCollection.

149 ↗(On Diff #126656)

I'd add a comment with the stuff you wrote in the patch description here to explain why you're subtracting the 8/16 bytes for DWARFv5 and later.

lib/DebugInfo/DWARF/DWARFUnit.cpp
80 ↗(On Diff #126656)

I think it would be nice to have this return an Optional<int64_t> rather than this bool/ref combo, but obviously that doesn't have to be part of this patch.

487 ↗(On Diff #126656)

What do you think about making this a member function of StrOffsetsContributionDescriptor?

571 ↗(On Diff #126656)

Why do you need to set the value to true here? If I understand correctly the flag is set to false when calling the parseDWARF64StringOffsetsTableHeader and I don't immediately see something that relies on it.

wolfgangp updated this revision to Diff 127049.Dec 14 2017, 5:33 PM
wolfgangp marked 7 inline comments as done.

Addressed review comments. Streamlined the parseDWARF??StringOffsetsTableHeader() functions a bit and implemented a couple of other suggested changes.

lib/DebugInfo/DWARF/DWARFContext.cpp
97 ↗(On Diff #126656)

Hmm, OK. I'm a bit old school on returning larger things from functions, but this is cleaner.

lib/DebugInfo/DWARF/DWARFUnit.cpp
80 ↗(On Diff #126656)

Right. I think this code originally predates the wider use of Optional<>. Refactoring would seem like a good idea, but not in this patch.

487 ↗(On Diff #126656)

Makes sense.

570 ↗(On Diff #126656)

Streamlined this a bit. The parse* routines now expect an invalid descriptor in all cases and return the validity. the parseDWARF32* routine technically doesn't need to do that since its return value is never examined, but it seems more consistent that way.

571 ↗(On Diff #126656)

Right, it wasn't necessary. This should look a bit more consistent now.

Forgot to use std::move() on return of a vector.

aprantl added inline comments.Dec 15 2017, 12:38 PM
lib/DebugInfo/DWARF/DWARFContext.cpp
181 ↗(On Diff #127169)

std::numeric_limits<uint32_t>::max() ? (not sure if this is actually better)

lib/DebugInfo/DWARF/DWARFUnit.cpp
494 ↗(On Diff #127169)

I think I would prefer a bitwise not ~ over - here.

494 ↗(On Diff #127169)

Do we need to check for overflow here? Otherwise we'll get a fuzzer bugreport incoming soon.

519 ↗(On Diff #127169)

It still makes me uneasy that Valid is only set on one path. (Ok, with the assert that makes more sense now).
Would it be even better for this function to return an Expected/Optional<StrOffsetsContributionDescriptor> instead of having an explicit bool Valid member? This would make it less likely that we accidentally operate on an invalid record.

551 ↗(On Diff #127169)

Do we need to guard against underflow here?

JDevlieghere added inline comments.Dec 16 2017, 4:00 AM
lib/DebugInfo/DWARF/DWARFContext.cpp
119 ↗(On Diff #127169)

Moving the ContributionCollection will actually prevent RVO (the compiler should warn you about this).

wolfgangp marked 5 inline comments as done.

Refactored the string offsets contribution descriptor using Optional<>, which removes the need for a "valid" bit. Addressed some other review comments.

lib/DebugInfo/DWARF/DWARFContext.cpp
119 ↗(On Diff #127169)

Looking at the gcc-generated code it doesn't seem to make a difference (probably due to inlining) but yes.

181 ↗(On Diff #127169)

Looks better anyway.

lib/DebugInfo/DWARF/DWARFUnit.cpp
494 ↗(On Diff #127169)

This is a real edge case (size > 0xfffffffffffffff8), but fuzzer will probably find it.

519 ↗(On Diff #127169)

I refactored this using Optional<>. You're right, the valid bit looks so last century...

Found a few more nitpicks, but otherwise I'm happy now. Thanks!

include/llvm/DebugInfo/DWARF/DWARFUnit.h
211 ↗(On Diff #127764)

s/It is initially invalid.//

lib/DebugInfo/DWARF/DWARFContext.cpp
113 ↗(On Diff #127764)

This is clever, but perhaps expanding it makes it more readable?

122 ↗(On Diff #127764)

Is this intentionally different to the previous comparator? To the casual reader (me) it's non-obvious why. Some comments or more verbose code might help.

lib/DebugInfo/DWARF/DWARFUnit.cpp
494 ↗(On Diff #127169)

And now with the new operator, this looks really like llvm::alignAddr() :-)

522 ↗(On Diff #127764)

LLVM style wants to remove extra braces

526 ↗(On Diff #127764)

same here

wolfgangp updated this revision to Diff 127790.Dec 20 2017, 3:29 PM
wolfgangp marked 5 inline comments as done.

Addressed more review comments: Made the logic on how to sort and unique string offsets table contributions a bit more readable when getting ready for dumping.

lib/DebugInfo/DWARF/DWARFContext.cpp
113 ↗(On Diff #127764)

Let me know if this is clear enough. There are several ways to write this.

122 ↗(On Diff #127764)

Let me know if this is clearer. Basically, I want to only remove duplicates when they are valid contributions. The dumper quits on the first invalid one anyway and so uniquing invalid contributions seems superfluous.

lib/DebugInfo/DWARF/DWARFUnit.cpp
494 ↗(On Diff #127169)

Didn't realize that existed...

We are aligning a uint64_t, so it's got to be its cousin alignTo().

thanks! I'll defer to Jonas to accept this.

lib/DebugInfo/DWARF/DWARFContext.cpp
113 ↗(On Diff #127790)

Not sure if this matters, but won't this be nondeterministic when both L and R are invalid?

128 ↗(On Diff #127790)

see above

probinson added inline comments.Dec 20 2017, 3:59 PM
lib/DebugInfo/DWARF/DWARFContext.cpp
113 ↗(On Diff #127790)

I think the most compact code would be

if (L && R) return L->base < R->base;
return R;

if both are valid, order depends on the base.
if L is invalid and R valid, L sorts before R.
if L is valid and R invalid, R sorts before L.
if both are invalid, they are equivalent (test(R,L) == test(L,R)).

wolfgangp added inline comments.Dec 20 2017, 5:38 PM
lib/DebugInfo/DWARF/DWARFContext.cpp
113 ↗(On Diff #127790)

Not sure if this matters, but won't this be nondeterministic when both L and R are invalid?

That's true. Several invalid contributions in the vector will probably end up in different locations when the initial order is different, but all that matters is that all the invalid ones end up at the beginning. It would only matter if we tried to be more precise about how and why they are invalid (bad start vs. invalid length etc.), but this patch doesn't try to do that. It's not clear to me if it would be worth it.

113 ↗(On Diff #127790)

I think the most compact code would be

Sleek! I can put that in, if you guys prefer.

JDevlieghere accepted this revision.Dec 21 2017, 3:05 AM

Thanks, Wolfgang. With these few nits addressed this LGTM.

lib/DebugInfo/DWARF/DWARFContext.cpp
113 ↗(On Diff #127790)

I like Paul's suggestion.

lib/DebugInfo/DWARF/DWARFUnit.cpp
512 ↗(On Diff #127790)

return StrOffsetsContributionDescriptor(Offset, Size, Version, DWARF64);

527 ↗(On Diff #127790)

return StrOffsetsContributionDescriptor(Offset, ContributionSize, Version, DWARF32);

567 ↗(On Diff #127790)

return StrOffsetsContributionDescriptor(Offset, Size, 4, DWARF32);

This revision is now accepted and ready to land.Dec 21 2017, 3:05 AM
This revision was automatically updated to reflect the committed changes.