This is an archive of the discontinued LLVM Phabricator instance.

DWARFDebugLoc: Add a function to get the address range of an entry
AbandonedPublic

Authored by labath on Oct 1 2019, 7:21 AM.

Details

Summary

Interpreting a .debug_loclists entry is not completely trivial [citation
needed]. This patch creates a function which can be used by any
libDebugInfo user (thinking of LLDB mainly) to get the range of an
entry.

The debug_loclists parser already contained a partial implementation of
that in the dump function. This implementation is replaced by a call to
the new "getRange" function, and it falls back to printing of raw data
in case we fail to get the address range.

Because LLDB is not fully converted to llvm's debug info parser, I
provide two getRange signatures: one takes a DWARFUnit*, which is used
to resolve .debug_addr references; and one which delegates this job to a
user-supplied callback.

I add a more thorough test of debug_loclists dumping capabilities. In
writing this test, I discovered that we're not able to handle
relocations in the debug_loclists section. However, fixing this was not
completely straight-forward, so I left a TODO, and will address that in
a separate patch.

Diff Detail

Event Timeline

labath created this revision.Oct 1 2019, 7:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 1 2019, 7:21 AM
Herald added a subscriber: aprantl. · View Herald Transcript
dblaikie added inline comments.Oct 1 2019, 8:04 AM
include/llvm/DebugInfo/DWARF/DWARFDebugLoc.h
89–104

I think using an error to express base address selection entries probably is a bit much.

I think this API would be more suited to a higher level abstraction over the whole list, rather than one entry in it. (same way we do with ranges - that API has been more fleshed out because of the presence of more users (symbolizers, etc) that want to abstract over the different representations (v4, split, v5, v5-split) & I think shows a fairly good direction this should go in too)

One possible caveat: it might make more sense to provide either a lazy iterator or a callback for entries, rather than (as the ranges API does currently) a full in-memory vector of the computed/finalized ranges, for efficiency's sake.

Also - would be super great if we could generalize both the range and loclist printing to use some common infrastructure for printing both the half-open range in non-verbose form and the verbose printing with underlying forms including RLE/LLE encodings, old-style base address selection entries (in v4), and also being able to print the section details (like we do for ranges when they're printed in the debug_info section, but we don't do it when they're printed in the actual debug_ranges/rnglists section... ). I realize this is a bit of a big feature request, but figured I'd mention it in case it helps inform your design direction/makes sense to address together, etc. (for instance, it probably means that a pair of uint64_t is insufficient, since we'll want to carry SectionIndex too - and maybe the start/end/section triple could be the same data structure (with the same printing support) as used for ranges, nested inside the data structure that includes the expression and can print that too, etc)

lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
284–286

LLVM style suggests avoiding "else after return" ( https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return )

labath planned changes to this revision.Oct 2 2019, 5:56 AM
labath marked an inline comment as done.

Thanks for the quick feedback. I didn't realize that the range list code handles some of this stuff already (I didn't look at it -- I guess I should've). I'll try to play around with this a bit, and then figure out what to do.

Regarding the iterator api: it has occurred to me that such an api would be more suitable (in fact, lldb's current api already has that), but since the location lists are generally small, I did not want to make a big deal out of it. But now that you mention it, I'll definitely consider it...

labath updated this revision to Diff 223193.Oct 4 2019, 5:22 AM

Upload a new version of the patch.

This isn't fully ready for submission, but I am putting it up anyway, to get
some feedback on the direction I am taking this, and ask some questions.

First I tried to do a complete rewrite of the loclists class in a manner similar
to the rnglists parser, but then I ran into the problem called .debug_loc.dwo
(v4 extension vaguely similar to DWARF5 loclists). Right now, it is possible to
share the parsing code between this format and .debug_loclists. That would be
pretty tricky to do with the rnglists approach.

So, instead I went for a bottom-up approach and tried to rewrite/reuse/make
similar the lower level classes, which can be shared more easily with the
rnglists stuff. This patch creates a DWARFLocation class, which is based on the
existing DWARFAddressRange class. The next step would be (or maybe I'll land it
before this patch) a LocationListEntry class akin to the existing
RangeListEntry.

labath marked 8 inline comments as done.Oct 4 2019, 5:45 AM
labath added inline comments.
include/llvm/DebugInfo/DWARF/DWARFDebugLoc.h
91

I went for an iterator-like approach (instead of a callback or direct materialization) because it's easier to use. In particular it makes it possible to reuse this stuff in the dumping code, which would have been pretty hard with callbacks.

138

Using a callback is consistent with what rnglists code does. This uses a std::function instead of a function_ref because it's easier for iterators to escape out of scopes. However, I've wondered if we shouldn't define an AddrOffsetResolver interface that llvm, lldb DWARFUnits and anyone else who wishes so (unit tests ?) can implement.

lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
164

A significant deviation from the rnglists code: Here I return an error if it is not possible to compute the address range correctly. The rnglists parser cannot compute the value, it will just return something, which can be very far from the correct range -- for instance, it's happy to use the base_addressx index as the offset, if the index cannot be resolved correctly. And it doesn't provide any indication that it has done so, which doesn't seem like a very useful behavior.

If this is the behavior we want, I can also try to make the rnglists parser do something similar.

291–295

This parallel iteration is not completely nice, but I think it's worth being able to reuse the absolute range computation code. I'm open to ideas for improvement though.

test/CodeGen/X86/debug-loclists.ll
16

This tries to follow the RLE format as closely as possible, but I think something like

[DW_LLE_offset_pair,  0x0000000000000000, 0x0000000000000004] => [0x0000000000000000, 0x0000000000000004): DW_OP_breg5 RDI+0

would make more sense (both here and for RLE).

test/DebugInfo/X86/fission-ranges.ll
48

This is somewhat annoying, because the entries printed through the loclists section will always have this error (as we don't have the DWARFUnit). I'll have to figure out a way to suppress those, while still keeping them around when printing from DWARFDie (as there a failure means a real error).

dblaikie added inline comments.Oct 4 2019, 1:57 PM
lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
291–295

Ah, I see - this is what you meant about "In particular it makes it possible to reuse this stuff in the dumping code, which would have been pretty hard with callbacks.".

I'm wondering if that might be worth revisiting somewhat. A full iterator abstraction for one user here (well, two once you include lldb - but I assume it's likely going to build its own data structure from the iteration anyway, right? (it's not going to keep the iterator around, do anything interesting like partial iterations, re-iterate/etc - such that a callback would suffice))

I could imagine two callback APIs for this - one that gets entries and locations and one that only gets locations by filtering on the entry version.

eg:

// for non-verbose output:
LL.forEachEntry([&](const Entry &E, Expected<DWARFLocation> L) {
  if (Verbose && actually dumping debug_loc)
    print(E) // print any LLE_*, raw parameters, etc
  if (L)
    print(*L) // print the resulting address range, section name (if verbose), 
  else
    print(error stuff)
});

One question would be "when/where do we print the DWARF expression" - if there's an error computing the address range, we can still print the expression, so maybe that happens unconditionally at the end of the callback, using the expression in the Entry? (then, arguably, the expression doesn't need to be in the DWARFLocation - and I'd say make the DWARFLocation a sectioned range, exactly the same type as for ranges so that part of the dumping code, etc, can be maximally reused)

test/CodeGen/X86/debug-loclists.ll
16

Yep, that'd make more sense to me - are you planning to unify the codepaths for this? I think that'd be for the best.

If I were picking a printing from scratch, I might go with:

DW_LLE_offset_pair(0x0000, 0x0004) => [0x0000, 0x0004): DW_OP_breg5 RDI+0

Making it look a bit more like a function call and function arguments. Though the () might be confusing with the range notation.

I'm also undecided on the " => " separator. Whether a ':' might be better/fine, etc.

Totally open to ideas, but mostly I'd really love these to use loclist and ranges to use the same code as much as possible, so we can get consistency and any readability benefits, etc in both.

test/DebugInfo/X86/dwarfdump-debug-loclists.test
7

I don't think the inline dumping should print the encoding - I'd borrow a lot from/try to unify with the ranges printing, which doesn't. I think verbose ranges print the same as non-verbose except they also add the section name/number.

test/DebugInfo/X86/fission-ranges.ll
48

IMHO we may want to move to a model where we don't try to create/parse any content except by finding a reference from a CU (or the DWARFv5 stanfdalone line tables).

In theory, it's perfectly find to have random garbage in debug sections other than debug_info (or the standalone line table) - because the only parts that should be parsed are those referenced from debug_info.

This came up in the form of a bug in location list dumping when the binary is linked with bfd ld. It doesn't update any addresses to discarded sections, leaving them as zero (whereas gold and lld write the addend to the relocation - which generally makes sure any range pair doesn't end up as "zero zero" which marks the end of a list) which terminates a list early and leads to the following location expression to be parsed as the start of a new list... which is totally bogus.
Now, granted, the resulting debug info from bfd ld is wrong (if you had a location list spanning multiple functions (eg: a global variable had been put in a register for the duration of a function, etc) then resolving any one of those location entries to zero-zero would terminate the list early even though there might be non-dropped functions in the list after that point) - but I still think there's something to be said for it.

There's a fair counterargument too - that we might want to be able to make a best-effort to dump content that isn't complete (eg: if a section was emitted alone - or there was some hunk of unreferenced location list in the debug_loc section, it might be interesting to know what's in that hunk - might give you hints about where it /should/ have been referenced from)

Apparently binutils objdump when printing debug info only dumps those referenced pieces and prints info about "holes" when there's unreferenced chunks.

Ah, here's the bug context on that: https://bugs.llvm.org/show_bug.cgi?id=43290

But, yeah, all that aside - given the architecture of libDebugInfoDWARF/llvm-dwarfdump right now, yes, it'd be good to omit those error messages.

Also note that address indexes wouldn't be resolvable when dumping .dwo files - since the debug_addr would be in the .o file instead. So it'd be good to not print lots of error messages there either.

labath marked 3 inline comments as done.Oct 7 2019, 11:27 AM
labath added inline comments.
lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
291–295

Actually, what lldb currently does is that it does not build any data structures at all (except storing the pointer to the right place in the debug_loc section. Then, whenever it wants to do something to the loclist, it parses it afresh. I don't know why it does this exactly, but I assume it has something to do with most locations never being used, or being only a couple of times, and the actual parsing being fairly fast. What this means is that lldb is not really a single "user", but there are like four or five places where it iterates through the list, depending on what does it actually want to do with it. It also does partial iteration where it stops as soon as it find the entry it was interested in.
Now, all of that is possible with a callback (though I am generally trying to avoid them), but it does resurface the issue of what should be the value of the second argument for DW_LLE_base_address entries (the thing which I originally used a error type for).
Maybe this should be actually one callback API, taking two callback functions, with one of them being invoked for base_address entries, and one for others? However, if we stick to the current approaches in both LLE and RLE of making the address pool resolution function a parameter (which I'd like to keep, as it makes my job in lldb easier), then this would actually be three callbacks, which starts to get unwieldy. Though one of those callbacks could be removed with the "DWARFUnit implementing a AddrOffsetResolver interface" idea, which I really like. :)

test/CodeGen/X86/debug-loclists.ll
16

I like the function call format. I hoping to get some code reuse, though it's still not fully clear to me how to achieve that..

test/DebugInfo/X86/dwarfdump-debug-loclists.test
7

Sure, I can do that, though I think that means there won't be a single place where one can see both the raw encodings and their interpretation -- section-based dumping will not show the interpretation (would you want me to show still show them I they happen to be interpretable without the base address or the address pool?), and the debug_info dumping will not show the encoding. Is that bad? -- I don't know...

dblaikie added inline comments.Oct 7 2019, 6:31 PM
lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
291–295

Ah, thanks for the details on LLDB's location parsing logic. That's interesting indeed!

I can appreciate an iterator-based API if that's the sort of usage we've got, though I expect it doesn't have any interest in the low-level encoding & just wants the fully processed address ranges/locations - it doesn't want base_address or end_of_list entries? & I think the dual-iteration is a fairly awkward API design, trying to iterate them in lock-step, etc. I'd rather avoid that if reasonably possible.

Either having an iterator API that gives only the fully processed data/semantic view & a completely different API if you want to access the low level primitives (LLE, etc) (this is how ranges works - there's an API that gives a collection of ranges & abstracts over v4/v5/rnglists/etc - though that's partly motivated by a strong multi-client need for that functionality for symbolizing, etc - but I think it's a good abstraction/model anyway (& one of the reasons the inline range list printing doesn't include encoding information, the API it uses is too high level to even have access to it))

Now, all of that is possible with a callback (though I am generally trying to avoid them), but it does resurface the issue of what should be the value of the second argument for DW_LLE_base_address entries (the thing which I originally used a error type for).

Sorry, my intent in the above API was for the second argument to be Optional's "None" state when... oh, I see, I did use Expected there, rather than Optional, because there are legit error cases.

I know it's sort of awkward, but I might be inclined to use Optional<Expected<AddressRange>> there. I realize two layers of wrapping is a bit weird, but I think it'd be nicer than having an error state for what, I think, isn't erroneous.

Maybe this should be actually one callback API, taking two callback functions, with one of them being invoked for base_address entries, and one for others? However, if we stick to the current approaches in both LLE and RLE of making the address pool resolution function a parameter (which I'd like to keep, as it makes my job in lldb easier), then this would actually be three callbacks, which starts to get unwieldy.

Don't mind three callbacks too much.

Though one of those callbacks could be removed with the "DWARFUnit implementing a AddrOffsetResolver interface" idea, which I really like. :)

Sorry, I haven't really looked at where the address resolver callback is registered and alternative designs being discussed - but yeah, going off just the one-sentence, it seems reasonable to have the DWARFUnit own an address resolver/be the thing you consult when you want to resolve an address (just through a normal function call in DWARFUnit, perhaps - which might, internally, use a callback registered when it was constructed).

test/CodeGen/X86/debug-loclists.ll
16

I've posted my unification of range/loc/v4/v5 emission here: https://reviews.llvm.org/D68620 - & I'd imagine something similar in the parsing side.

test/DebugInfo/X86/dwarfdump-debug-loclists.test
7

Fair - that comes back to the issue I mentioned in a previous comment about potentially limiting dumping of non-debug_info sections based on the presence of a CU that references it (& only dumping it that way, rather than trying to parse it without a CU). DWARF isn't really designed to be parsed without the CU anyway. (could leave it in as best-effort to parse things without a referencing CU for debugging, etc).

Mostly I'm interested in unification perhaps more/primarily, than feature improvements - then we can make feature improvements to both ranges and locs without having to duplicate things.

labath marked 2 inline comments as done.Oct 8 2019, 8:24 AM
labath added inline comments.
lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
291–295

I know it's sort of awkward, but I might be inclined to use Optional<Expected<AddressRange>> there. I realize two layers of wrapping is a bit weird, but I think it'd be nicer than having an error state for what, I think, isn't erroneous.

Actually, my very first attempt at this patch used an Expected<Optional<Whatever>>, but then I scrapped it because I didn't think you'd like it. It's not the friendliest of APIs, but I think we can go with that.

Sorry, I haven't really looked at where the address resolver callback is registered and alternative designs being discussed - but yeah, going off just the one-sentence, it seems reasonable to have the DWARFUnit own an address resolver/be the thing you consult when you want to resolve an address (just through a normal function call in DWARFUnit, perhaps - which might, internally, use a callback registered when it was constructed).

I think you got that backwards. I don't want the DWARFUnit to be the source of truth for address pool resolutions, as that would make it hard to use from lldb (it's far from ready to start using the llvm version right now). What I wanted was to replace the lambda/function_ref with a single-method interface. Then both DWARFUnits could implement that interface so that passing a DWARFUnit& would "just work" (but you wouldn't be limited to DWARFUnits as anyone could implement that interface, just like anyone can write a lambda).

test/CodeGen/X86/debug-loclists.ll
16

cool. I'll see what I can do with that.

Do we care whether llvm-dwarfdump's output bears any similarities to the output from GNU readelf or objdump? There has been a push lately to get the LLVM "binutils" to behave more like GNU's, although AFAIK it hasn't gotten to the DWARF dumping part.

Do we care whether llvm-dwarfdump's output bears any similarities to the output from GNU readelf or objdump? There has been a push lately to get the LLVM "binutils" to behave more like GNU's, although AFAIK it hasn't gotten to the DWARF dumping part.

I am not too fond of the readelf output. At least for the .debug_info dumping. I like the see indentation. But I do see the appeal of consistent output.

Do we care whether llvm-dwarfdump's output bears any similarities to the output from GNU readelf or objdump? There has been a push lately to get the LLVM "binutils" to behave more like GNU's, although AFAIK it hasn't gotten to the DWARF dumping part.

Generally I hope not to deal with that until there's a user with a need for it who wants to do the work & has a specific use-case that can help motivate which similarities are desirable and which ones don't matter (& perhaps if there's enough that they start to tradeoff usability - maybe the "compatibility mode" is a separate tool or separate flag to the existing tool).

My broader hope is probably that llvm-dwarfdump is more for interactive uses than other dumpers, so fewer people might try to build automated things on top of it & thus expect specific output (this gives us both the freedom not to match the GNU tools, and the freedom not to match previous llvm-dwarfdump behavior (which we've done a fair bit in the past - which seems to support the theory that people don't seem to be building much on top of this))

lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
291–295

As for Expected<Optional<Whatever>> (or Optional<Expected<>>) - yeah, I think this is a non-obvious API (both the general problem and this specific solution). I think it's probably worth discussing this design a bit more to save you time writing/rewriting things a bit. I guess there are a few layers of failure here.

There's the possibility that the iteration itself could fail - even for debug_loc style lists (if we reached the end of the section before encountering a terminating {0,0}). That would suggest a fallible iterator idiom: http://llvm.org/docs/ProgrammersManual.html#building-fallible-iterators-and-iterator-ranges

But then, yes, when looking at the "processed"/semantic view, that could fail too in the case of an invalid address index, etc.

The generic/processed/abstracted-over-ranges-and-rnglists API for ranges produces a fully computer vector (& then returns Expected<vector> of that range) - is that reasonable? (this does mean manifesting a whole location in memory, which may not be needed so I could understand avoiding that even without fully implementing & demonstrating the vector solution is inadequate).

But I /think/ maybe the we could/should have two APIs - one generic API that abstracts over loc/loclists and only provides the fully processed view, and another that is type specific for dumping the underlying representation (only used in dumping debug_loclists).

labath marked an inline comment as done.Oct 11 2019, 5:30 AM
labath added inline comments.
lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
291–295

If we were computing the final address ranges from scratch (which would be the best match for the current lldb usage, but which I am not considering now for fear of changing too many things), then I agree that we would need the fallible_iterator iterator thingy. But in this case we are "interpreting" the already parsed ranges, so we can assume some level of correctness here, and the thing that can fail is only the computation of a single range, which does not affect our ability to process the next entry.
This indicates to me that either each entry in the list should be an Expected<>, or that the invalid entries should be just dropped (possibly accompanied by some flag which would tell the caller that the result was not exhaustive).

This is connected to one of the issues I have with the debug ranges API -- it tries _really_ hard to return *something* -- if resolving the indirect base address entry fails, it is perfectly happy to use the address _index_ as the base address. This makes sense for dumping, where you want to show something (though it would still be good to indicate that you're not showing a real address), but it definitely does not help consumers which then need to make decisions based on the returned data.

Anyway, yes, I agree that we need to APIs, and probably callbacks are the easiest way to achieve that. We could have a "base" callback that is not particularly nice to use, but provides the full information via a combination of UnparsedLL and Optional<Expected<ParsedLL>> arguments. The dumper could use that to print out everything it needs. And then we could have a second API, built on top of the first one, which ignores base address entries and the raw data and returns just a bunch of Expected<ParsedLL>. This could be used by users like lldb, who just want to see the final data. The ParsedLL type would be independent of the location list type, so that the debug_loc parser could provide the same kind of API (but implemented on top of something else, as the UnparsedLL types would differ). Also, under the hood, the location list dumper for debug_loclists (but not debug_loc) could reuse some implementation details with the debug_rnglists dumper via a suitable combination of templates and callbacks.

How does that sound?

dblaikie added inline comments.Oct 11 2019, 1:29 PM
lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
291–295

What sort of things are you concerned about with deeper API changes here? I think it's probably worth building the "right" thing now - as good a time as any. LLVM's debug info APIs, as you've pointed out, aren't exactly "sturdy" (treating address indexes as offsets, etc, etc), so no time like the present to clean it up.

I think if we had an abstraction over v4 and v5 location descriptions, parsing from scratch, fallible iterators, etc - that'd be the ideal thing to use in the inline dumping code (that dumps inside debug_info) - which currently uses "parseOneLocationList" - so it is parsing from scratch and dumping.

But equally I understand not wanting to make you/me/anyone fix everything when just trying to get something actually done.

labath marked an inline comment as done.Oct 14 2019, 6:58 AM
labath added inline comments.
lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
291–295

I think I am mainly concerned with the scope explosion of doing changes like that. I don't know how well founded is that concern, because I don't know all the use cases for this information right now (but that's a part of the problem). But anyway, let's continue the discussion.

The main problem I see with "inline" dumping using some higher-level abstraction is what should one do in case of incomplete data (e.g. dwo files). If this abstraction returns "cooked" data, then the inline dump could not show anything for dwo files dumped standalone. This is perfectly fine for lldb and maybe other tools (which never look at dwo files separately, and mostly don't care about the reason why the "cooking" failed -- if they just need to get the data it can rely on), but for something like llvm-dwarfdump, we'd probably want to display _something_. I guess this is why the ranges api returns indexes as offsets, but I think we agree we don't want that.

Then the question is what should that "something" be? If it's going to be a raw dump of the location list entry, then solution would not be fully generic, as they raw entry type will depend on the location list kind. Though, we could still arrange it so that the various location lists can be processed uniformly via a template if they e.g. have a dump() function with the same signature...

Suppose we go down that path (this is the path I wanted to go down in my previous comment, modulo the parse-from-scratch part). The question then is what to do with the section-based dumping. Should it use the same mechanism? It probably should, because the first callback/iterator will provide all data it can possibly want. But then, should it also build the parsed representation like it does now?
If yes, then what for? Should we also build some mechanism to "cook" that data too.
If not, are we ok with having all users reparse from scratch always? (There are only two users I found right now, and they look like they'd be fine with it.)
Or should the DWARFDebugLoclists object cache the cooked data instead? llvm-dwarfdump --statistics would actually prefer that, but that might make the DWARF verifier sad (though I guess it could always reparse from scratch if needed).

JDevlieghere added inline comments.Oct 31 2019, 9:00 AM
include/llvm/DebugInfo/DWARF/DWARFDebugLoc.h
138

I like the idea, +1 from me.

147

Most dump methods give a default argument for DIDumpOptions (DumpOpts = {}). Should we do the same here?

lib/BinaryFormat/Dwarf.cpp
479

Maybe this should go in in DWARF.h?

lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
291–295

I'm very late to the discussion and I'm not as familiar as both of you with the details and the API uses, so please ignore my suggestion if it doesn't make any sense...
Could we have an API where we parse a *list* of Uncooked (to reuse Pavel's nomenclature) and then have the ability to resolve each Uncooked entry into a Cooked entry? Then both LLDB and dwarfdump could get the list of Uncooked entries and try to get the cooked variant. If that works, great, we dump/use that, if not we move on and have separate failure modes for errors originating from parsing and from cooking.

labath abandoned this revision.Oct 31 2019, 9:26 AM
labath marked an inline comment as done.

Abandoning. I'll create separate patches with the new implementation.

lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
291–295

We've discussed this with David last week, and we have hopefully agreed on the rough direction forward (and I think it's going to roughly correspond to what you had in mind). I'm preparing a patch (first of many) to implement that and I'm hoping I'll be able to upload something today.