This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Basic .debug_names dumping support
ClosedPublic

Authored by labath on Jan 19 2018, 6:27 AM.

Details

Summary

This commit renames DWARFAcceleratorTable to AppleAcceleratorTable to free up
the first name as an interface for the different accelerator tables.
Then I add a DWARFDebugNames class for the dwarf5 table.

Presently, the only common functionality of the two classes is the dump()
method, because this is the only method that was necessary to implement
dwarfdump -debug-names; and because the rest of the
AppleAcceleratorTable interface does not directly transfer to the dwarf5
tables (the main reason for that is that the present interface assumes
the tables are homogeneous, but the dwarf5 tables can have different
keys associated with each entry).

I expect to make the common interface richer as I add more functionality
to the new class (and invent a way to represent it in generic way).

In terms of sharing the implementation, I found the format of the two
tables sufficiently different to frustrate any attempts to have common
parsing or dumping code, so presently the implementations share just low
level code for formatting dwarf constants.

Diff Detail

Repository
rL LLVM

Event Timeline

labath created this revision.Jan 19 2018, 6:27 AM

I would really appreciate it if someone could compare the code with the dwarf
specification, to make sure we are reading it the same way.

The other open question for me here is the difference in dwarf-dump -apple-names
and -debug-names format. I chose to use a ScopedPrinter for dumping because the
dwarf tables have more structure to them than the apple ones, and ScopedPrinter
made dumping that much easier. However, that makes the format quite different.
If you are OK with this format, and there is no requirement to preserve the
apple-names dump, I could switch the apple format to use a ScopedPrinter as a
follow-up.

The test case input was generated by my very-experimental very-wip .debug_names
generator (with some additional massaging of the assembly).

May I ask you to split out the renaming and the adding of the new functionality into separate patches? Having smaller incremental steps will make it easier for us to adapt our yet-to-be upstreamed code.

In terms of sharing the implementation, I found the format of the two
tables sufficiently different to frustrate any attempts to have common
parsing or dumping code, so presently the implementations share just low
level code for formatting dwarf constants.

Seems fair. @JDevlieghere, do you agree with Pavel's assessment?

include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
26 ↗(On Diff #130602)

We probably still want some comment here?

+1 for doing the renaming separately (first). There's not a lot of it compared to the rest of the patch, but it is a bit of a distraction.

I would really appreciate it if someone could compare the code with the dwarf
specification, to make sure we are reading it the same way.

I didn’t find time for this today, but I will do so on Monday.

The other open question for me here is the difference in dwarf-dump -apple-names
and -debug-names format. I chose to use a ScopedPrinter for dumping because the
dwarf tables have more structure to them than the apple ones, and ScopedPrinter
made dumping that much easier. However, that makes the format quite different.
If you are OK with this format, and there is no requirement to preserve the
apple-names dump, I could switch the apple format to use a ScopedPrinter as a
follow-up.

I’m very much in favor for keeping both consistent. We might have some internal tests that would need to be updated which I’ll gladly take care of.

The test case input was generated by my very-experimental very-wip .debug_names
generator (with some additional massaging of the assembly).

May I ask you to split out the renaming and the adding of the new functionality into separate patches? Having smaller incremental steps will make it easier for us to adapt our yet-to-be upstreamed code.

Another +1 from my part. I think you can directly commit the rename.

Seems fair. @JDevlieghere, do you agree with Pavel's assessment?

I totally agree. I was afraid this was going to be the outcome when I read the standard last time but I wanted to remain optimistic. We can always reevaluate this in the future.

So I had a quick look and generally this looks good. I’ll do a more in depth review on Monday as promised.

JDevlieghere requested changes to this revision.Jan 22 2018, 6:05 AM
JDevlieghere added inline comments.
include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
160 ↗(On Diff #130602)

It would be useful to name things consistently with the standard where possible, particularly the fields in the header. I doubt anyone will object to changing the respective names in the Apple counterparts to keep things in sync. Here for example I'd go with BucketCount.

168 ↗(On Diff #130602)

Move the ::dump() method into the struct and then just dump them all in a ranged for loop. This way if you ever and up with an instance of this struct you can dump it, for example in the verifier.

170 ↗(On Diff #130602)

Let's call this NameIndex like in the standard?

lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
363 ↗(On Diff #130602)

The next remark applies to the whole diff. While we currently only read in the data for dumping, we should split off the parsing. As DebugInfo is a library we want it to be reusable, for instance for things like the verifier (dwarfdump -verify) and looking up entries in the accelerator tables by name (dwarfdump -name).

For this particular instance, I'd parse everything into an Entry struct with a ::dump() method and have the Unit cache the result. This doesn't mean parsing everything in advance though, we should always try to be as lazy as possible and don't read data until we need it. At some point in the future we want to merge LLVM's and LLDB's implementation, so that's something to keep in mind.

503 ↗(On Diff #130602)

Move this into a dump() method in the Header struct. Could even be it's own scope imho.

518 ↗(On Diff #130602)

Building on my previous comment this would then become something like for (auto& Entry : GetEntries()) { Entry.dump(W); }

This revision now requires changes to proceed.Jan 22 2018, 6:05 AM
labath updated this revision to Diff 130886.Jan 22 2018, 6:56 AM

Rebase after commiting the rename separately

This also makes the dwarfdump output more concise by moving the "unique id" into
the block header instead of a separate entry. (E.g., Bucket {\n Index: 0\n...
becomes Bucket 0 {\n...). This also brings the output format a bit closer to
the present apple tables dump.

This does not yet incorporate the latest round of feedback from Jonas. I'll
respond to that separately.

labath updated this revision to Diff 131031.Jan 23 2018, 3:33 AM
labath marked 4 inline comments as done.

Incorporate feedback from Jonas.

lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
363 ↗(On Diff #130602)

I've split off parsing from dumping as much as possible (most of the "parsing" functions are extremely simple). I started implementing Entry caching, but then I've decided against it because the memory/speed tradeoff doesn't seem right. Parsing an Entry should be a very fast process particularly if the values are fixed size, and I'm not sure if the extra memory needed to cache the entries would outweigh that. Also, neither the current llvm AppleAcceleratorTable nor the LLDB's MappedHash classes do any caching at present. I'm not outright opposed to the idea of caching, but I think this should be driven by the needs of a specific consumer.

518 ↗(On Diff #130602)

Kinda done, but I don't have a range-based for iteration. I don't think it's a good match for the dwarfdump use case, as it makes hard to convey parsing error messages to the caller, but I will consider adding it in the future for callers that don't need to distinguish between properly terminated list, and one that got cut short by a parsing error.

JDevlieghere requested changes to this revision.Jan 23 2018, 6:47 AM

Thanks Pavel, this seems to be going in the right direction!

include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
235 ↗(On Diff #131031)

I'm guessing this returns the index into the hashes array, but it's hard to tell without any information other than the return type being a uint32.

238 ↗(On Diff #131031)

What does this return exactly? Is it the hash at the given index? Is it the index for that entry in the name table?

243 ↗(On Diff #131031)

Same here. If you're not relying on it being a pair maybe we can wrap it in a struct too with a sensible name like NameTableOffset with StringOffset and EntryOffset members? I think we might be able to reuse this for what a hash points at (based on Figure 6.1 in the standard)

266 ↗(On Diff #131031)

NameIndices?

lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
395 ↗(On Diff #131031)

I was a little confused why you were copying the forms here, especially with the two lists called Indices. I see it's to create DWARFFormValues which are then set in the parser. Makes sense but maybe add a little comment here or in the struct.

513 ↗(On Diff #131031)

Can we split this too, like getCUOffsets() and getTUOffsets(), getForeignTUOffsets(), etc?

363 ↗(On Diff #130602)

Agree, it's probably not worth it here. I mentioned it because it's part of the general approach we take in the DebugInfo lib, but like you said it should be driven by the need of a consumer. Thanks for looking into it!

This revision now requires changes to proceed.Jan 23 2018, 6:47 AM

Thanks! I have a couple of stylistic suggestions inline.

include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
27 ↗(On Diff #131031)

Could you add a Doxygen comment explaining the purpose of this class please?

176 ↗(On Diff #131031)

Would a struct with named members make the code more readable than a std::pair?

183 ↗(On Diff #131031)

Could you please add some one-liner Doxygen comments explaining what these inner structs and classes are for?

lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
25 ↗(On Diff #131031)

Would it make more sense for these functions to take a raw_ostream as first argument instead of creating a temporary std::string?

440 ↗(On Diff #131031)

A pair of two uint32_t looks dangerous to my eyes. Perhaps define a struct with two named members to avoid accidental confusion?

labath updated this revision to Diff 131081.Jan 23 2018, 8:18 AM
labath marked 7 inline comments as done.

Apply suggestions from Jonas.

include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
235 ↗(On Diff #131031)

It's an index into all three arrays (hashes, string offsets, entry offsets). I have put more information in the comment, and I have made the function names more descriptive (getBucketArrayEntry). I'm not sure if that makes things better...

243 ↗(On Diff #131031)

I've called the new struct NameTableEntry (so this function becomes: getNameTableEntry(Index))

lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
395 ↗(On Diff #131031)

I've added some comments, and I've renamed the two arrays (one is now Attributes and the other Values).

363 ↗(On Diff #130602)

The biggest difference I see between this and random other debug info section is that this one is specifically designed to enable easy random access, whereas for e.g. .debug_frame you have to evaluate a state machine to get a reasonable representation of the data.

Looking at how we currently query the accelerator tables for names, I wonder if we'd be able to share the ValueIterator so that we can move the equal_range method into the base class. At first sight it looks feasible as most of the work is done in the equal_range function itself, which can potentially be different between the two tables. Pavel, what do you think?

labath marked 14 inline comments as done.Jan 24 2018, 2:55 AM

Looking at how we currently query the accelerator tables for names, I wonder if we'd be able to share the ValueIterator so that we can move the equal_range method into the base class. At first sight it looks feasible as most of the work is done in the equal_range function itself, which can potentially be different between the two tables. Pavel, what do you think?

I was hoping to leave that for a separate patch. The main reason the ValueIterator class cannot be used out-of-the-box is because it assumes the accelerator table is homogeneous, but in the dwarf5 case, each abbreviation can describe a different set of index attributes. So we will need to come up with a more generic iterator for the dwarf tables (and that one can then be used for the apple tables as well). (And even the tiny bit of parsing that happens in the iterator (Next() function) is incorrect for dwarf5).

lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
25 ↗(On Diff #131031)

I did it to streamline the usage (just << everything into OS). But I guess the same thing can be achieved by having formatXXX return a struct and defining an operator<< for that struct.

labath updated this revision to Diff 131214.Jan 24 2018, 2:56 AM
labath marked an inline comment as done.

Respond to comments from Adrian (which were made concurrently with my last patch).

I was hoping to leave that for a separate patch.

Doing this in a separate patch sounds like the right thing to do.

The main reason the ValueIterator class cannot be used out-of-the-box is because it assumes the accelerator table is homogeneous, but in the dwarf5 case, each abbreviation can describe a different set of index attributes. So we will need to come up with a more generic iterator for the dwarf tables (and that one can then be used for the apple tables as well). (And even the tiny bit of parsing that happens in the iterator (Next() function) is incorrect for dwarf5).

Alright, I have some questions but maybe it's better so save this discussion for the other patch :-)

A few more comments, nothing major I think.

include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
190 ↗(On Diff #131214)

Nit: I saw a few patches were these style of comments were being replaced with comments before the variable. Not sure if this is a global change in coding style but personally I prefer that aesthetically, especially if the comments grows in the future. Feel free to ignore if you disagree!

266 ↗(On Diff #131214)

Without a way to access the number of CUs in the header this function doesn't provide a very useful public interface. What do you think about returning a list (this way you can still index) and we don't need to expose the internals of the header and whatnot?

274 ↗(On Diff #131214)

Reads an entry in the Bucket Array for the given Bucket?

279 ↗(On Diff #131214)

Reads an entry in the Hash Array for the given Index?

286 ↗(On Diff #131214)

Should this be an expected too?

lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
345 ↗(On Diff #131214)

If this is only used for parsing AttributeEncodings, could we maybe make it a static member function that returns an Expected<AttributeEncoding>? That would allow us to move up the additional checks you do below in the while loop.

517 ↗(On Diff #131214)

This has been bothering me since the first revision. I understand your motivation about doing a best effort in case something is off. I wonder if it's worth the trade off if we implement a check for this in the verifier.

labath marked 2 inline comments as done.Jan 24 2018, 7:50 AM
labath added inline comments.
include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
190 ↗(On Diff #131214)

I like the short one more when it fits on a line, but I would definitely change it if the comment does not fit on a line anymore.

266 ↗(On Diff #131214)

Right now this *isn't* a public interface (it's private within NameIndex and DwarfDebugNames doesn't even allow you to obtain the NameIndex objects it holds).

I am not sure how I could return a list here. The values themselves need to be accessed through DWARFDataExtractor::getRelocatedValue so none of the existing containers would work without materializing the elements somewhere. For a public interface (when we add one) I would consider writing a tiny pseudo-iterator that delegates to this method, but that's something I think we need for a private interface.

Of course always delegating to getRelocatedValue may turn out to be very slow actually (I have not looked at the complexity of that function), but I am trying to make this class copy-free because of the lldb use-case. In lldb we will be generally processing a fully linked shared object, the getRelocatedValue will be a no-op and we *should* be able to index these arrays directly without any intermediate copies (that's how lldb's MappedHash works for apple tables). Of course, this assumes I will manage to plug in this class into lldb.

286 ↗(On Diff #131214)

The extract() function verifies that the section contains enough data for all of the fixed-size tables (which reminds me to add asserts to this function to verify they only access in-range values). It is up to the caller to verify that e.g. the returned String Offset points to a valid string in the string table.

So, no Expected is necessary, as this cannot fail.

lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
345 ↗(On Diff #131214)

I stared at this for about an hour, and I couldn't figure out a way to make it look nice (the issue being that the function would need to give three different kinds of result -- valid attribute encoding, parse error, sentinel. SentinelError would do the trick but it made the caller extremely ugly). I eventually settled for something else: extractAttributeEncoding*s*, which returns a parsed vector of encodings or an error.

517 ↗(On Diff #131214)

I don't think there any tradeoff here. The result of the dumpEntry function is basically "are there more entries after this one", and we keep dumping until we reach the last entry or encounter a parse error.

(Unlike the apple tables, the dwarf table does not give us the count of entries up-front but instead relies on a terminator entry to denote end of list.)

labath updated this revision to Diff 131269.Jan 24 2018, 7:50 AM

One more batch of small tweaks.

JDevlieghere added inline comments.Jan 25 2018, 2:21 AM
include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
266 ↗(On Diff #131214)

Sounds fair, thanks for the explanation!

286 ↗(On Diff #131214)

Alright. Adding asserts is fine as long as they only detect misuse of the API. If they can trigger for invalid input we should have a different mechanism for notifying the caller.

lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
345 ↗(On Diff #131214)

One approach could be to create a static (constexpr) AttributeEncoding for the sentinel, then comparing against it to see if we've reached the end. I don't know if that helps with your original problem? I think it might still be useful here as it clearly conveys what you're doing, i.e. comparing against the sentinel.

517 ↗(On Diff #131214)

The alternative which I was thinking of (but clearly forgot to write down in the comment, sorry!) was to read all the entries into a list and then have the dump method print them one by one. The trade-off being that you read all entries in advance, rather than one at a time. If you do this with an Expcected<vector> you lose the ability to display the valid entries up to the error, and if you do a vector<Expected> you increase the memory footprint while you know all elements will be fine expect for the last one.

labath marked 13 inline comments as done.Jan 25 2018, 4:00 AM
labath added inline comments.
include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
286 ↗(On Diff #131214)

I'ts an internal API, and it's up to the caller to make sure it only accesses compilations units up to CompUnitCount.

lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
345 ↗(On Diff #131214)

I liked that idea, and I wanted to use it for the Abbrev sentinel as well, but creating an constexpr Abbrev is impossible. So, I modified this idea to use a factory function for the sentinels (and the AttributeEncoding factory is constexpr).

517 ↗(On Diff #131214)

Ok, I see what you meant now.

I don't think it's worth doing anything about this now. Maybe later when we have a better iterating mechanism for external users (they would probably also want to know that the list was cut short due to a parse error), we can change this to use that.

labath updated this revision to Diff 131412.Jan 25 2018, 4:00 AM
labath marked an inline comment as done.

use sentinel values to report list ends

JDevlieghere accepted this revision.Jan 25 2018, 6:55 AM

I'm happy with the current state of this patch. Unless anyone objects I think it is ready to go in, so LGTM. Thanks Pavel!

include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
286 ↗(On Diff #131214)

Fair enough

lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
345 ↗(On Diff #131214)

Right, I got a deja-vu feeling while writing that comment, because I wanted to suggest it too for the Abbrevs but indeed, the vector makes that impossible. I'm happy with the factory.

517 ↗(On Diff #131214)

Sound reasonable

This revision is now accepted and ready to land.Jan 25 2018, 6:55 AM
This revision was automatically updated to reflect the committed changes.
labath added inline comments.Jan 29 2018, 6:02 AM
llvm/trunk/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
311

I fixed an error here in a follow-up (r323648), which I noticed when some buildbots started failing:
Originally Header, contained only the fixed-size part, but during review it grew into something more. So, I split the fixed-size part into a separate struct to enable the sizeof() expression here to work.

Let me know if you want me to handle this differently.