This is an archive of the discontinued LLVM Phabricator instance.

[LLDB] Dump valid ranges of variables
ClosedPublic

Authored by zequanwu on Feb 16 2022, 11:23 AM.

Details

Summary

This allows image lookup -a ... -v to print variables only if the given
address is covered by the valid ranges of the variables. Since variables created
in dwarf plugin always has empty scope range, print the variable if it has
empty scope.

Diff Detail

Event Timeline

zequanwu requested review of this revision.Feb 16 2022, 11:23 AM
zequanwu created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2022, 11:23 AM
jingham added a subscriber: jingham.EditedFeb 16 2022, 11:52 AM

It's really useful to be able to see these variable ranges, so the feature seems good to me, but I think the valid range output should be governed by another flag, like --show-variable-ranges. In optimized code, there can be a whole lot of ranges for each variable, so this could get really noisy and make the whole output hard to look at, so it would be good to have a way to turn it off. Unfortunately -v doesn't work for that since the non -v version isn't so useful.

It's really useful to be able to see these variable ranges, so the feature seems good to me, but I think the valid range output should be governed by another flag, like --show-variable-ranges. In optimized code, there can be a whole lot of ranges for each variable, so this could get really noisy and make the whole output hard to look at, so it would be good to have a way to turn it off.

Using a flag to control it seems good to me.

Unfortunately -v doesn't work for that since the non -v version isn't so useful.

I don't quite understand what you mean here. I suppose we want a flag --show-variable-ranges to have effect only when -v is given in image lookup -v -a .... Is this what you mean?

It's really useful to be able to see these variable ranges, so the feature seems good to me, but I think the valid range output should be governed by another flag, like --show-variable-ranges. In optimized code, there can be a whole lot of ranges for each variable, so this could get really noisy and make the whole output hard to look at, so it would be good to have a way to turn it off.

Using a flag to control it seems good to me.

Unfortunately -v doesn't work for that since the non -v version isn't so useful.

I don't quite understand what you mean here. I suppose we want a flag --show-variable-ranges to have effect only when -v is given in image lookup -v -a .... Is this what you mean?

I was addressing another unasked question: It seemed odd to say "add another flag to reduce the amount of output for something that is only printed when the user asks for verbose output". So I was trying to explain why that made sense. The reason being that when we made the "image lookup" command, the non-verbose version wasn't actually terribly useful, but the verbose version was. So pretty much everybody uses the -v flag when the command is used. The other option would be to do the horrible thing some tools do and have multiple -v's meaning more and more verbose. You could do it that way too (the option parser can count the number of -v's it has seen). But I think it's a lot better to have an option to dial up the specific extra info you want in this case.

But you are right, we don't print the variable information at all without -v, so while it's odd to have --show-variable-ranges only take effect when -v is supplied, I think that's the reasonable way to do it.

zequanwu updated this revision to Diff 409440.Feb 16 2022, 3:38 PM

Add -show-variables-ranges/-R flag to enable it with -v.

labath added a comment.EditedFeb 17 2022, 12:56 AM
This comment has been deleted.
lldb/source/Commands/Options.td
960
961

A not widely known fact: It is possible to produce an long option without a short version by using an non-printable character in place of the short option. You can see and example of that in disassemble --force, defined in this file.

If we are considering this a niche option, then maybe we don't need to use a random character just to have a short version.

lldb/test/Shell/SymbolFile/DWARF/x86/debug_loc.s
29

I don't think that the information itself is out of place here, but I think it can get confusing when one sees it like this, next to the location field. So, if we had an output like location = RDI, valid ranges = [0, 20), I think one would/could read it as "the variable is in RDI from PC=0 to PC=20", but what it _really_ is telling you is "the variable (or its value -- I'm not sure) exists between PC=0 and PC=20, *and* for the exact PC that you are querying (0 in this case), it's value happens to be in RDI".

We could try to come up with a way to make this less confusing, but maybe we could just sidestep this problem and just print this information in the image dump symfile output.

39

If you printed the information here, then there would be no (or less) confusion as the location field contains explicit range information. Additionally, the image dump symfile output is so verbose, than an extra field does not make a difference.

(I'm sorry, I know it was I who suggested using image lookup, but I did not realize you would be adding an extra argument because of that.)

There was a question on the dev list a while ago about to print out all the valid ranges of a variable. That's a useful bit of info if you're trying to figure out where you could break to actually see or change a variable's value in an optimized function. I think that's the motivation for the change, so just showing the range that encompasses the address wouldn't satisfy that initial impetus for the change.

It is a little awkward to put it in "image lookup -va" however, and odder to have it be part of the printing of an Address. It seems more like something you should ask a function?

Jim

zequanwu added inline comments.Feb 17 2022, 11:34 AM
lldb/test/Shell/SymbolFile/DWARF/x86/debug_loc.s
29

I think this confusion can be solved by adding an argument for --show-variable-ranges. When only --show-variable-ranges is given, dump only the range that covers the querying address. When it comes with all argument, dump all valid ranges.

40–41

image dump symfile already prints valid ranges for variables along with where the value is at each range.

labath added inline comments.Feb 17 2022, 12:14 PM
lldb/test/Shell/SymbolFile/DWARF/x86/debug_loc.s
40–41

Are you sure it does?

I was under the impression that there are two distinct range concepts being combined here. One is the range list member of the Variable object (as given by GetScopeRange -- that's the one you're printing now), and the other is the list of ranges hidden in the DWARFExpression object, which come from the debug_loc(lists) section (that's the one we've been printing so far). And that the root cause of the confusion is the very existence of these two concepts.

If I got it wrong, then do let me know, cause it would make things a lot simpler if there is only one validity concept to think about.

zequanwu added a comment.EditedFeb 17 2022, 1:03 PM
This comment has been deleted.
lldb/test/Shell/SymbolFile/DWARF/x86/debug_loc.s
40–41

Dwarf plugin is supposed to construct the m_scope_range member of an Variable, but it doesn't. scope_ranges is empty at https://github.com/llvm/llvm-project/blob/main/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp#L3468.
image dump symfile dumps the dwarf location list in m_location in Variable.
The dwarf location list has more information than m_scope_range as it contains info about where the value is during each range. (e.g. which register the variable lives in).

So, I think we need to use similar logic to construct m_scope_range when creating Variable in dwarf plugin like this https://github.com/llvm/llvm-project/blob/main/lldb/source/Expression/DWARFExpression.cpp#L145.

There was a question on the dev list a while ago about to print out all the valid ranges of a variable. That's a useful bit of info if you're trying to figure out where you could break to actually see or change a variable's value in an optimized function. I think that's the motivation for the change, so just showing the range that encompasses the address wouldn't satisfy that initial impetus for the change.

I'm pretty sure the immediate motivation is testing of D119508, but yes, if there was another feature request for something like this, then it makes sense to implement it more fully.

lldb/test/Shell/SymbolFile/DWARF/x86/debug_loc.s
40–41

Ok, I see where you're coming from. You're essentially saying that the fact that the dwarf plugin does not fill this out is a bug.

I don't think that's the case. My interpretation was (and this comment confirms it) that an empty range here means the entire enclosing block. (Also, DWARF was for a long time the only symbol file plugin, so what it does is kinda "correct by definition").

I don't think we want to change that interpretation, as forcing a copy of the range in the location list would be wasteful (it would be different if this was an interface that one could query, and that the dwarf plugin could implement by consulting the location list). However, since the dwarf class does not actually make use of this functionality (it was added to support DW_AT_start_scope, then broken at some point, and eventually removed), we do have some freedom in defining the interactions of the two fields (if you still want to pursue this, that is).

So how about this: if the user passes the extra flag, then we print both the range field (if it exists), and the *full* location list (in that order, ideally). That way the output will be either range = [a, b), [c, d), location = DW_OP_reg47 or location = [a,b) -> DW_OP_reg4, [c,d) -> DW_OP_reg7. If the dwarf plugin starts using the range field again then the output will contain both fields, which will be slightly confusing, but at least not misleading (and we can also change the format then).

zequanwu added inline comments.Feb 18 2022, 5:55 PM
lldb/test/Shell/SymbolFile/DWARF/x86/debug_loc.s
40–41

Oh, I think I misunderstood m_scope_range. It's the range list where the variable is valid regardless whether its value is accessible or not (valid range). As for m_location in Variable, it's describing the ranges where the value is (value range). They are not the same.

So, currently how NativePDB creates local Variable's range is not correct. That only works when it's not optimized build such that the valid range is the same as the value range. It still need to create dwarf location lists to correctly represent the value range, but as mentioned here, we need to choose a generic "variable location provider" interface for that.

labath added inline comments.Feb 20 2022, 11:34 PM
lldb/test/Shell/SymbolFile/DWARF/x86/debug_loc.s
40–41

Oh, I think I misunderstood m_scope_range. It's the range list where the variable is valid regardless whether its value is accessible or not (valid range). As for m_location in Variable, it's describing the ranges where the value is (value range).

Yes, that was my initial assumption as well, and I think that is the only interpretation in which it makes sense to have two sources of range information for a variable. However, I've done some research since then, and I haven't found any compiler or debugger which would model the program sufficiently precisely to be able to make that distinction.

There are definite limits as to how far you can go with pdb using these abstractions, but given they (the m_scope_range) exist, I think you could make use of them (as you've done now), if they are sufficient for your current use case. That said, I would definitely encourage you to create a better abstraction for providing the location information for a variable.

zequanwu updated this revision to Diff 410679.Feb 22 2022, 5:08 PM
  • Use -show-variable-ranges/-R single/all to dump single addres range that contains the querying address or all address ranges. It seeme like I can't use non-printable character like \\x01 to omit short option when it takes a parameter, so -R is used as abbrev "range".
  • Refactor DWARFExpression::GetLocationExpression.
zequanwu added inline comments.Feb 22 2022, 5:29 PM
lldb/test/Shell/SymbolFile/DWARF/x86/debug_loc.s
40–41

I think you meant to replace DWARFExpression with a more generic interface which has the same functionalities as DWARFExpression. That seems a lot work, especially on DWARFExpression::Evaluate.

I'm not sure about ModuleLookupShowRange::Single option. It feels like it just complicates things. Did anyone request that?

FWIW, I don't think that dumping a /single/ range is particularly verbose, so I could imagine even doing that by default.

I think it should be sufficient to have just two modes, either None+All, or Single+All. Having three seems like overkill.

lldb/source/Core/Address.cpp
734–740

Use DumpAddressRange from Utility/Stream.h

766–777

Can we get rid of this lambda by inlining it into DumpLocations. I don't think we need that much flexibility.

lldb/test/Shell/SymbolFile/DWARF/x86/debug_loc.s
20

print something like valid ranges = <block>, or maybe don't print anything at all?

40–41

Well.. not exactly "replace". You know how they say there's no software engineering problem that can't be solved by adding a layer of indirection. So, I thought we could create a new VariableLocationProvider (for lack of a better name) interface, and one of the implementations of that interface would be backed by a DWARFExpression class. Theoretically you may not need to touch the DWARFExpression class at all -- just wrap it so that it conforms to the new interface.

This is still pretty hand-wavy, so I don't know how much work would it be, but it does not seem like it should be _that_ hard..

zequanwu updated this revision to Diff 410882.Feb 23 2022, 11:25 AM
zequanwu marked 3 inline comments as done.
  • Remove short option. -show-variable-ranges prints all ranges. If not given, only print single range.
  • Address comments.
lldb/test/Shell/SymbolFile/DWARF/x86/debug_loc.s
40–41

Thanks, I see what you mean. I thought it would be each symbol file plugin needs to convert their debug formats to a universal one and the actual evaluation will happens in that one.

labath added inline comments.Feb 24 2022, 5:33 AM
lldb/source/Core/Address.cpp
739

This place was the only caller of Variable::DumpLocationForAddress. What if we, instead of duplicating its logic here, just change the function to do what we want?
The interface could be as simple as Variable::DumpLocations(Stream&, Address) where, if one provides an invalid Address, then all locations get dumped, and one can request a specific location by providing a concrete address.
We might even do something similar for the DWARFExpression class, and avoid the filter callbacks completely.

lldb/source/Expression/DWARFExpression.cpp
2735–2738

Could the sliding happen inside GetLocationExpressions, so that we don't have to repeat it in each callback?

lldb/test/Shell/SymbolFile/DWARF/x86/debug_loc.s
40–41

It's true that how most of our debug parsing works by converting the debug info into a generic format, but that is something I would want to change. It is really easy to construct the generic format from dwarf debug info (because it was designed that way), but the other symbol file plugins are often struggling to fit into this model. They often have the data that lldb needs easily accessible, just not in the format that lldb can accept, and so they often have to create copies just to put it into the required format. To some extent, this is also what happened with the dwarf, as we started using some of the llvm libraries to parse the debug info.

So overall, I think it would be better to just prescribe an interface through which one can query e.g. the line tables, and leave it up to the plugins to choose a suitable data structure to answer those queries. And I can't think of a better example for why is this needed than dwarf expression evaluation.

zequanwu updated this revision to Diff 411203.Feb 24 2022, 11:48 AM
zequanwu marked an inline comment as done.

Update.

lldb/source/Core/Address.cpp
739

I just moved the filter inside Variable::DumpLocations, because we still need to pass the filter callback to DWARFExpression::DumpLocaitons inside Variable::DumpLocaitons to choose dumping all ranges + locations or just one range location.

labath added inline comments.Feb 25 2022, 1:40 AM
lldb/source/Core/Address.cpp
739

Looking at it now, I think it'd be better to inline the callback all the way into DWARFExpression::DumpLocations. The callback is more flexible, but I'm not convinced that we need that flexibility, and the interface of the callback is pretty complex (two bool return values). I believe it would be sufficient if the function accepted an extra address argument (as an addr_t probably), and used an invalid address (LLDB_INVALID_ADDRESS) to mean "dump all locations".

lldb/source/Expression/DWARFExpression.cpp
2683

llvm::ListSeparator separator;

2691–2692

s->AsRawOstream() << separator;

lldb/source/Symbol/Variable.cpp
448

Can we keep abi and func_load_addr computations in this function? (It probably was easier to make them arguments starting from the last version of your patch, but putting the code here would reduce the diff from HEAD and also make the interface simpler).

zequanwu updated this revision to Diff 411446.Feb 25 2022, 9:59 AM
zequanwu marked 5 inline comments as done.

update.

zequanwu updated this revision to Diff 411898.Feb 28 2022, 2:18 PM
zequanwu marked 2 inline comments as done.

Remove unnecessary location_list_base_addr parameter from GetDescription() as it doesn't do filtering when dumping.

labath accepted this revision.Mar 1 2022, 7:38 AM

Looks good, I'm sorry this took so long.

Jim, do you have anything to add?

lldb/source/Symbol/Variable.cpp
439–440

Maybe move into a header?

This revision is now accepted and ready to land.Mar 1 2022, 7:38 AM

Yeah, just a couple little things in the comments. Otherwise this looks good.

lldb/include/lldb/Core/Address.h
250

You should document all_ranges.

lldb/source/Commands/Options.td
962

I missed where you return an error if somebody specifies show-variable-ranges == true w/o specifying verbose. Did I miss that?

lldb/source/Expression/DWARFExpression.cpp
2741

Local variables in lldb are all lower case and describe what they are.

Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 5:50 PM
zequanwu updated this revision to Diff 412532.Mar 2 2022, 1:40 PM
zequanwu marked 3 inline comments as done.

Address comments

lldb/source/Commands/Options.td
962

If show-variable-ranges is given but not -v, then show-variable-ranges has no effect. I don't know if there is way to make it gives an error in that case.
I tried to make show-variable-ranges group 7 and verbose GroupRange <1,7>, then -v and show-variable-ranges combination is invalid.

This revision was automatically updated to reflect the committed changes.

Address comments

This sort of thing is hard to check in the option groups w/o blowing out the number of groups you need to have.

The way to check after all the option parsing is done that the options are coherent is to have the CommandOptions implement OptionParsingFinished, check the if the all_ranges is set verbose is also set, and if it isn't return a suitable error.

Can you do this (even though I see you've checked it in already.) Silently having some combination of options not work isn't a good thing.

Address comments

This sort of thing is hard to check in the option groups w/o blowing out the number of groups you need to have.

The way to check after all the option parsing is done that the options are coherent is to have the CommandOptions implement OptionParsingFinished, check the if the all_ranges is set verbose is also set, and if it isn't return a suitable error.

Can you do this (even though I see you've checked it in already.) Silently having some combination of options not work isn't a good thing.

Thanks. I added it at https://reviews.llvm.org/rG302d7179e101ebbf48c386a632828a1150a18769.

Excellent, thanks!