Page MenuHomePhabricator

DWARFDebugLoclists: Move to a incremental parsing model

Authored by labath on Oct 31 2019, 9:58 AM.



This patch stems from the discussion D68270 (including some offline
talks). The idea is to provide an "incremental" api for parsing location
lists, which will avoid caching or materializing parsed data. An
additional goal is to provide a high level location list api, which
abstracts the differences between different encoding schemes, and can be
used by users which don't care about those (such as LLDB).

This patch implements the first part. It implements a call-back based
"visitLocationList" api. This function parses a single location list,
calling a user-specified callback for each entry. This is going to be
the base api, which other location list functions (right now, just the
dumping code) are going to be based on.

Future patches will do something similar for the v4 location lists, and
add a mechanism to translate raw entries into concrete address ranges.

Diff Detail

Event Timeline

labath created this revision.Oct 31 2019, 9:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 31 2019, 9:58 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
labath marked 3 inline comments as done.Oct 31 2019, 10:09 AM

It looks like there will be some conflicts between this patch and D69462, but it does not seem they should be hard to resolve, as that patch mostly deals with the generation side of things.


Something which isn't obvious from this patch is that our idea was to make the Entry class be shared between the two location list implementations. The v4 entries will get "upgraded" to v5 during parsing. This will make it possible to share the code translating the location list entries into actual address ranges. Dumping will be handled specially, in order to make the v4 dump come out as it was originally in the file.


Parsing twice just to get the maximum name length is somewhat wasteful. A different option would be to get the maximum of *all* entry types, not just of those that happen to be used in this location lists.


This is +/- the only functional change in this patch. It is unfortunate because the range list code does not print the extra colon. It wouldn't be *too* hard to avoid that. I'd just need to move the offset printing code out of the "dumpLocationList" function and into the individual (about 3 of them) callers. However, that did not seem worth it, particularly as I am hoping that this can be rectified once we have a more high-level method of printing "inline" location lists.

labath updated this revision to Diff 227306.Oct 31 2019, 10:18 AM

Add some comments for the new functions.

dblaikie accepted this revision.Oct 31 2019, 5:25 PM

Looks good!


Yeah, I'd be happy if you switch that to max of all entry types.


isValidOffset might be unnecessary given "Offset < EndOffset"? (maybe check that EndOffset is smaller than Data's actual end ofset to start with, then skip the isValidOffset check? (maybe need a lower bound check too))

This revision is now accepted and ready to land.Oct 31 2019, 5:25 PM
SouraVX added inline comments.Nov 3 2019, 9:37 AM

@labath , Not sure about this piece of code, is this patch has some relationship with D68271 ??
I noticed this piece of code their, and which is uncommitted ??

labath marked an inline comment as done.Nov 4 2019, 9:56 AM

Will address @dblaikie's comments tomorrow.


Yes, it is based on that patch, mainly because I was too lazy to create a new branch for this work. What is it that you are unsure of? (Is it just it's presence, or something more?)

SouraVX marked an inline comment as done.Nov 4 2019, 10:10 AM
SouraVX added inline comments.

Just the presence here(when I was rebaseing), I wasn't sure it's stemming / related from that patch.
I believe you must've forgotten to specified it's dependence / related to D68271. ??
Thanks for clarifying this.

labath updated this revision to Diff 227846.Nov 5 2019, 4:28 AM
labath marked 4 inline comments as done.
  • change how maximum length is computed
  • tweak end offset handling

This is debatable, I think. In case the EndOffset is wrong (corrupted header?), it may make sense to still dump as much as we are able to do instead of bailing out straight away. However, that is probably not a likely scenario, and it does make things cleaner, so I've gone with your suggestion. I've also changed the EndOffset argument into a "Size", as that seemed a bit cleaner.

The thing I would really want to have some day is the ability to "slice" a data extractor. That would make a lot of these offset calculations simpler and/or safer. For instance, this code has the bug that if one location list crosses the "end offset" boundary (and there is more data after it), then we will only detect that once we've finished dumping that location list. If the size was encoded in the data extractor, then this could be detected immediately, as each contribution would be handled as if it was the last one in the section.

JDevlieghere accepted this revision.Nov 5 2019, 9:39 AM

Personally I'm not a big fan of callbacks, but it seems like this was well thought through. My original suggestion of doing a two-step parse also wouldn't allow you to abort parsing, which is possible here by returning false form the callback.


labath added a comment.Nov 6 2019, 7:15 AM

Personally I'm not a big fan of callbacks, but it seems like this was well thought through. My original suggestion of doing a two-step parse also wouldn't allow you to abort parsing, which is possible here by returning false form the callback.

Ah, I now fully understand what you meant on the other diff. Yes, being able to abort parsing part-way (along avoiding materializing the full list) is one of the benefits of this approach -- or at least, it makes it fit well into the way it is currently used from lldb. Though in practice, I don't expect that to matter much, as individual location lists are small [citation needed]. I'm not a fan of callbacks either, and if this had/will have more users, I'd go for turning these into iterators -- it's just that writing iterators is cumbersome (when can we have coroutines?), so it's not clear if it's worth the trouble.



This revision was automatically updated to reflect the committed changes.