This is an archive of the discontinued LLVM Phabricator instance.

Handle the case when a variable is only valid in part of the enclosing scope
ClosedPublic

Authored by tberghammer on Feb 19 2016, 8:01 AM.

Details

Summary

Handle the case when a variable is only valid in part of the enclosing scope

DWARF stores this information in the DW_AT_start_scope attribute. This
CL add support for this attribute and also changes the functions
displaying frame variables to only display the variables currently in
scope.

Note: This CL is part of an effort to make LLDB capable of debugging Java code JIT-ed by the android runtime

Diff Detail

Repository
rL LLVM

Event Timeline

tberghammer retitled this revision from to Handle the case when a variable is only valid in part of the enclosing scope.
tberghammer updated this object.
tberghammer added reviewers: ovyalov, clay.chang.
tberghammer added a subscriber: lldb-commits.
tberghammer edited reviewers, added: clayborg; removed: clay.chang.Feb 19 2016, 8:01 AM
clayborg requested changes to this revision.Feb 19 2016, 10:54 AM
clayborg edited edge metadata.

DWARF already has support for this using location lists. I would rather see this done by having the DWARF parser modify the DWARF expression to be a location list based. I don't mean have the compiler change, but just how we store the location in the variable. This would mean the ONLY change will be in the DWARF parser. We don't need to track any ranges or anything in the variable this way.

So if you have a variable:

DW_TAG_subprogram:
DW_AT_name("main")
DW_AT_low_pc(0x1000)
DW_AT_high_pc(0x2000)

    DW_TAG_variable:
    DW_AT_name("a")
    DW_AT_start_scope (0x20)
    DW_AT_location (0x11 0x22 0x33) (location is described by a block of DW_OP_expression bytes)

Then we would make a location list expression where the DWARFExpression's type is switched from:

NonLocationList with bytes of 0x11, 0x22, 0x33

over to a DWARFExpression whose type is:

RegularLocationList with [0x1020-0x2000): 0x11, 0x22, 0x33

If that doesn't make sense let me know. If we do this, then the ONLY change is in the DWARF parser since we already support location list based expressions.

This revision now requires changes to proceed.Feb 19 2016, 10:54 AM
tberghammer requested a review of this revision.Feb 22 2016, 3:26 AM
tberghammer edited edge metadata.

There is a (minor) difference between a variable being out of scope and having no location information. Being out of scope means that the variable hasn't declared yet while having no location information means that the variable is declared, it is in scope, but it is optimized out so the debugger can't display it's value. I introduced this new dimension of data to properly represent the difference between the 2 scenario.

In case of C/C++ the difference is very small (and I am not aware of any C/C++ compiler emitting DW_AT_start_scope) but based on our understanding of the dwarf spec it is valid if a compiler emits no lexical blocks at all and specifies the scope of all variable inside a function body using DW_AT_start_scope attributes (we use this method in the android java runtime because the full scoping information is not available). In this scenario the difference between being out of scope and being optimized out is significant because we can have two variable with the same name inside a function (same lexical block) with different scoping information what will ensure that the variable names are unique at any given point of the function (assuming the dwarf information is correct).

If we want to represent this information with using a dwarf expression then we need to create a new expression value what I think is a much more intrusive change as it will modify we handle the standard dwarf expressions. In my opinion it is a much better option to introduce the scoping information as a high level concept the way I implemented it so all dwarf information (possibly) generated by the compilers can be handled correctly.

Let me know what do you think.

I would rather do this by modifying the expression locations if at all possible. I would like to not having to modify each variable to contain a RangeList as this is extra data per variable. I would be fine if we just add a new enumeration to the DWARFExpression like StartScopeLocationList which is the same as "RegularLocationList". The contents would be the same (start address + end address followed by location expression block), but it would mean that the locations indicate that the variable is in scope (instead of just not available). Then we just need to modify Variable::IsInScope() to check what it did before, and if the DWARFExpression is StartScopeLocationList, just make sure we have a location that contains the current address. Does that sound acceptable? I am also OK with modifying the DWARFExpression class to be able to return an enumeration when the DWARFExpression is evaluated so it can return things like:

eDWARFExpressionSuccess
eDWARFExpressionVariableNotAvailable
eDWARFExpressionOutOfScope
eDWARFExpressionError

Then RegularLocationList would return eDWARFExpressionVariableNotAvailable when there is no location that the variable is available, and StartScopeLocationList would return eDWARFExpressionOutOfScope.

We have 2 (+1) independent dimension here:

  1. List of ranges where the variable is in scope
  2. List of ranges where the variable is available
  3. Format of the location list (it is more of an implementation details because of a format change in dwarf5)

To represent all of the possible combination for a variable we have to handle all dimension separately. I think we have several options:

  • Treat the scope as a top level parameter of a variable (same way as we treat the enclosing lexical block) the way I implemented it and if we are worrying about the memory implication then a possible optimization is to say that an empty location list means that the scope is the same as the enclosing block (This way we will add 2 pointer to each Variable in the general case what will be a negligible size increase).
  • Make DWARFExpression store 2 list of ranges (1 for the locations and 1 for the scope). I think it is a very similar implementation then the current one but with storing the data in a wrong location so I am strongly against this option.
  • Try to make DWARFExpression smart enough to handle the 2 layer of the location expressions (I think this is you suggestion). I think to implement this we need the following changes:
    • Change DWARFExpression to store a RangeMap<lldb::addr_t, lldb::addr_t, DataExtractor> for the location list instead of just a DataExtractor because we need some data format what can be modified instead of the current implementation where DWARFExpression stores only a DataExtractor referencing a blob of data.
    • Teach SymbolFileDWARF about the details of the DWARFExpression format so it can modify it.
    • Abuse the DWARFExpression a bit so it can return some additional value next to the result of the evaluation for the DWARFExpression.

Based on the steps I listed above I don't think it is reasonable to merge the scope information into the DWARFExpression especially if we consider that dwarf expressions are used for several other thing then describing the location of variables. If we don't want to add the scoping information to the Variable class then we can go with a plan like this (I don't really like it):

  • Create a new DWARFVariableLocationDescription class
  • Remove all location list support from the DWARFExpression class so it really just evaluates dwarf expressions
  • In the new class store a map from address range to (dwarf expression + bitfield) where the bitfield value tell if the variable is in/out of scope and if it has/hasn't have location information

I think in one side this would be a fairly clean abstraction and it would also simplify the DWARFExpression class but on the other side I am almost certain then it would have more memory and performance impact then my current implementation (especially if the memory optimization I mentioned is applied) because it will have to create a range map from each dwarf expression and store it in memory.

All in all I don't really share your concern regarding memory impact, but I am happy to optimize it to the point where we use only 3 extra pointer (1 std::vector) compared to the original situation in the common (C/C++) case. Looking into the other options I don't think any of them is any better in terms of memory/performance and they will complicate the use of dwarf expressions all over lldb (the DWARFVariableLocationDescription class would make things cleaner but I don't see any good way to implement it without a higher memory penalty then my current implementation)

We have 2 (+1) independent dimension here:

  1. List of ranges where the variable is in scope
  2. List of ranges where the variable is available
  3. Format of the location list (it is more of an implementation details because of a format change in dwarf5)

To represent all of the possible combination for a variable we have to handle all dimension separately. I think we have several options:

  • Treat the scope as a top level parameter of a variable (same way as we treat the enclosing lexical block) the way I implemented it and if we are worrying about the memory implication then a possible optimization is to say that an empty location list means that the scope is the same as the enclosing block (This way we will add 2 pointer to each Variable in the general case what will be a negligible size increase).
  • Make DWARFExpression store 2 list of ranges (1 for the locations and 1 for the scope). I think it is a very similar implementation then the current one but with storing the data in a wrong location so I am strongly against this option.
  • Try to make DWARFExpression smart enough to handle the 2 layer of the location expressions (I think this is you suggestion). I think to implement this we need the following changes:
    • Change DWARFExpression to store a RangeMap<lldb::addr_t, lldb::addr_t, DataExtractor> for the location list instead of just a DataExtractor because we need some data format what can be modified instead of the current implementation where DWARFExpression stores only a DataExtractor referencing a blob of data.
    • Teach SymbolFileDWARF about the details of the DWARFExpression format so it can modify it.
    • Abuse the DWARFExpression a bit so it can return some additional value next to the result of the evaluation for the DWARFExpression.

Based on the steps I listed above I don't think it is reasonable to merge the scope information into the DWARFExpression especially if we consider that dwarf expressions are used for several other thing then describing the location of variables. If we don't want to add the scoping information to the Variable class then we can go with a plan like this (I don't really like it):

  • Create a new DWARFVariableLocationDescription class
  • Remove all location list support from the DWARFExpression class so it really just evaluates dwarf expressions
  • In the new class store a map from address range to (dwarf expression + bitfield) where the bitfield value tell if the variable is in/out of scope and if it has/hasn't have location information

I think in one side this would be a fairly clean abstraction and it would also simplify the DWARFExpression class but on the other side I am almost certain then it would have more memory and performance impact then my current implementation (especially if the memory optimization I mentioned is applied) because it will have to create a range map from each dwarf expression and store it in memory.

All in all I don't really share your concern regarding memory impact, but I am happy to optimize it to the point where we use only 3 extra pointer (1 std::vector) compared to the original situation in the common (C/C++) case. Looking into the other options I don't think any of them is any better in terms of memory/performance and they will complicate the use of dwarf expressions all over lldb (the DWARFVariableLocationDescription class would make things cleaner but I don't see any good way to implement it without a higher memory penalty then my current implementation)

I can see your point. I don't want to change the DWARFExpression class to just contain a list of ranges and have DWARFExpression _only_ parse single DWARF location expressions because this will cause us to have to parse the information and pull out the addresses just so we can put it into a variable that may never get displayed and this is work we don't need to do. It is actually stored in the DWARF (if the DW_AT_locations has a reference it becomes an offset into the .debug_loc section which contains data already encoded with start address, end address, block, repeat until terminate), so lets not change DWARFExpression.

How about we add the in scope range list that was previously added to the variable to the DWARFExpression class instead? Just in case we have things other that a variable that has a DW_AT_location or DW_AT_data_member_location that also use the DW_AT_start_scope. Then you can ask your DWARFExpression if it is in scope before evaluating it. Then if the in scope range list is empty, it is in scope. Then we check if we have a location list, if we do, make sure the address is in one of those ranges and if not the variable is not available. Else find the right location expression and evaluate it. How does that sound?

I can live with that but I still think the scope information belongs to the variable and not to the DWARF expression representing it's location. As far as I see keeping the information inside the variable have lower memory impact (assuming an empty list means full scope) because we have less Variable object then DWARFExpression object and it isn't complicate other use cases of DWARFExpression where scope isn't used at all.

So all in all I can move the information to the DWARFExpression but I think keeping them in the Variable class is better. Let me know what do you think.

I can live with that but I still think the scope information belongs to the variable and not to the DWARF expression representing it's location. As far as I see keeping the information inside the variable have lower memory impact (assuming an empty list means full scope) because we have less Variable object then DWARFExpression object and it isn't complicate other use cases of DWARFExpression where scope isn't used at all.

So all in all I can move the information to the DWARFExpression but I think keeping them in the Variable class is better. Let me know what do you think.

I would vote for modifying the DWARFExpression only because we might eventually have other DIEs that have locations and start scope attributes that might use DWARFExpressions. But I don't see a problem with either if you really feel strongly...

tberghammer edited edge metadata.

I added the memory optimization to represent the case when no DW_AT_start_scope is specified as an empty range map. This way the memory impact for C/C++ will be only an empty std::vector per variable.

Regarding the location of the code I don't want to add it to DWARFExpression because dwarf expressions are used for much more thing then variable locations on their own. If an other DIE needs this information in the future then I would suggest to introduce a new class (don't have any good name for it) containing a DWARFExpression representing the location and a RangeMap representing the scope information.

In long term (don't have time from for it) I would also want to split DWARFExpression to 2 class where one of them know how to evaluate a dwarf expression and can represent only a single dwarf expression. Next to it I want to introduce a new DWARFLocationList class what knows about the location list part with no knowledge about the DWARF opcodes. The 2 class can be closely integrated so we don't need to parse anything upfront while we are keeping the 2 concept separated.

clayborg accepted this revision.Feb 24 2016, 9:47 AM
clayborg edited edge metadata.
This revision is now accepted and ready to land.Feb 24 2016, 9:47 AM
This revision was automatically updated to reflect the committed changes.