This is an archive of the discontinued LLVM Phabricator instance.

Support unscoped enumeration members in the expression evaluator.
Needs ReviewPublic

Authored by werat on Jan 5 2021, 3:22 AM.

Details

Summary

Add unscoped enumeration members to the "globals" manual dwarf index. This
effectively makes them discoverable as global variables (which they
essentially are).

Before expression evaluator failed to lookup enumerators unless the
enumeration type has been already completed.

Consider the example:

enum MyEnum { eFoo = 1 };
MyEnum my_enum = eFoo;

(lldb) p eFoo
error: <user expression 2>:1:1: use of undeclared identifier 'eFoo'
eFoo
^
(lldb) p my_enum + eFoo
(int) $0 = 2

With this patch all unscoped enumerators can be looked up same as the global
variables and the expression evaluation works as expected.
SBTarget::FindGlobalVariables() now returns unscoped enumerators as well.

Diff Detail

Event Timeline

werat created this revision.Jan 5 2021, 3:22 AM
werat requested review of this revision.Jan 5 2021, 3:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2021, 3:22 AM
werat updated this revision to Diff 314558.Jan 5 2021, 3:24 AM

Fix formatting

werat added a subscriber: jarin.
shafik added a subscriber: shafik.Jan 5 2021, 10:30 AM

We can have unscoped enums in namespace and class scope and the enumerators won't leak out from those scopes. Thus we can have shadowing going on e.g.:

#include <iostream>

enum GEnum {eOne=2,};

namespace A {
   enum AEnum {eOne=0,};

  void g() {std::cout << eOne;}
};

struct B {
  enum CEnum {eOne=1,};

  void f() {std::cout << eOne;}
};

int main() {
   std::cout << eOne ;
   A::g() ;

   B b;
   b.f() ;
}

godbolt: https://godbolt.org/z/h3r76n

How does this change deal with those cases?

labath added a comment.Jan 6 2021, 5:31 AM

This suffers from the same problem as the other patch, where the other index classes (apple_names and debug_names) will essentially never be able to implement this feature. (Unless they re-index the dwarf themselves, that is, but this would defeat the purpose of having the index in the first place.)

werat updated this revision to Diff 315319.Jan 8 2021, 2:08 AM

Handle enum constants similar to global variables, support scoped lookup in the expression evaluator.
Add more test cases.

werat added a comment.Jan 8 2021, 2:14 AM

We can have unscoped enums in namespace and class scope and the enumerators won't leak out from those scopes. Thus we can have shadowing going on e.g.:

...

How does this change deal with those cases?

Thanks for pointing this out! My first patch didn't really take this into account, so it didn't work properly. I've made some changes to make it work, although I'm not very familiar with these parts of LLDB yet, so I'm not sure whether this approach is how it should be implemented.

Also I've noticed that LLDB's expression evaluator is not perfect with these lookups, e.g. it can find global variables from other scopes if there's no better candidate (not sure if this is a bug of a feature):

Process 3768045 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x0000000000401116 a.out`main at global.cpp:2:13
   1    namespace Foo { int x = 1; }
-> 2    int main() {};
(lldb) p x
(int) $0 = 1
werat added a comment.Jan 8 2021, 2:22 AM

This suffers from the same problem as the other patch, where the other index classes (apple_names and debug_names) will essentially never be able to implement this feature. (Unless they re-index the dwarf themselves, that is, but this would defeat the purpose of having the index in the first place.)

I'm not sure I completely understand. Why does changing the manual_dwarf_index mean that the other index classes can't ever implement this feature? If, let's say, debug_names decides to support enum constants, then its data layout should be changed to either include enumerators as globals, or add a new section, or something else. Then it can be properly handled in LLDB too (maybe changing the way manual_dwarf_index works too to keep things consistent). As I understand, manual_dwarf_index can be changed anytime, since we're building it ourselves.

Do I misunderstand how these indexes work and interact with each other?

werat updated this revision to Diff 315343.Jan 8 2021, 4:23 AM

Generate fully qualified names for enum constants.

We can have unscoped enums in namespace and class scope and the enumerators won't leak out from those scopes. Thus we can have shadowing going on e.g.:

...

How does this change deal with those cases?

Thanks for pointing this out! My first patch didn't really take this into account, so it didn't work properly. I've made some changes to make it work, although I'm not very familiar with these parts of LLDB yet, so I'm not sure whether this approach is how it should be implemented.

Also I've noticed that LLDB's expression evaluator is not perfect with these lookups, e.g. it can find global variables from other scopes if there's no better candidate (not sure if this is a bug of a feature):

Process 3768045 stopped
* thread #1, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x0000000000401116 a.out`main at global.cpp:2:13
   1    namespace Foo { int x = 1; }
-> 2    int main() {};
(lldb) p x
(int) $0 = 1

That's a feature, not a bug. lldb tries to find the "closest" definition to the frame in which the expression is evaluated, it isn't bound by what the current frame can see. If it can find a unique closest definition for an identifier working out from the current frame, it will that.

This seems the most useful behavior. If your program has a shared library that has a static singleton which stores some interesting information about the library's state, debugger users want to be able to access that variable in the expression evaluator even if they aren't currently stopped in a frame of that library. That's also important for type lookups. If I'm in a frame of library A, which only sees a forward declaration of some type, but library B has the full debug information for that type, it's super useful to be able to keep looking past library A for a full type definition.

This suffers from the same problem as the other patch, where the other index classes (apple_names and debug_names) will essentially never be able to implement this feature. (Unless they re-index the dwarf themselves, that is, but this would defeat the purpose of having the index in the first place.)

I'm not sure I completely understand. Why does changing the manual_dwarf_index mean that the other index classes can't ever implement this feature? If, let's say, debug_names decides to support enum constants, then its data layout should be changed to either include enumerators as globals, or add a new section, or something else. Then it can be properly handled in LLDB too (maybe changing the way manual_dwarf_index works too to keep things consistent). As I understand, manual_dwarf_index can be changed anytime, since we're building it ourselves.

Do I misunderstand how these indexes work and interact with each other?

Maybe "(n)ever" was too strong a word. Still, there is a fundamental difference between the "manual" index and the other two indexes:

  • the manual index is entirely an lldb concept. We can change it whenever we want. We don't need to ask anyone for permission to do that, coordinate with anyone, or care about what anyone else is doing.
  • the contents of the other two indexes is given by a specification (it case of apple_names, it's a local llvm spec, but it still suffers from similar problems, albeit less pronounced). We cannot unilaterally change anything about them. We first need to change the relevant specification, which requires gathering consensus and convincing people that this use case is really worth the extra amount of space that the new entries will introduce. Then we need to wait until a new version of the standard is published and producers (compilers) to start using it. And even them our job is not finished, because we still need to figure out what do we want to do with debug info produced by older producers which still adhere to the old version of the spec.

This is why I am pushing back against the "simple" solution of having the manual index index more things -- it fragments the feature set for users who have the index vs. those who not.

This doesn't mean we cannot change the manual index at all -- it's possible some feature cannot be implemented differently. But it's also possible the feature can be implemented differently, in a way that makes works with all indexes. Or it's possible the feature is not worth the divergence, or the increase in the manual index memory footprint that it produces (i think it would be interesting to get some measurements on that, particularly as one can also expect a similar increase in the on-disk indexes, if they are to support this). So, I think this deserves a wider discussion and exploration of other alternatives.

werat added a comment.Jan 10 2021, 6:25 AM

Thanks for the explanation, this makes sense. I've checked the mailing list archives and it seems there was already a discussion about the enumerators in the .debug_names index back in 2018 -- http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2018-April/004443.html. You were the one to bring it up and the consensus was that the enumerators should go into the index too.

But it seems this was never actually implemented, the latest version of the toolchain doesn't add DW_TAG_enumerator entries to the .debug_names index. Is there a reason for that or it just slipped through the cracks? Should I bring it up again on the mailing list or we can assume the consensus is still the same and this should be just implemented?

Regarding the changes in the manual index. Do you think there's a better way to implement enumerators lookup? To be honest, I don't see another way. Assuming .debug_names and apple_names are updated to include the enumerators, manual_index should be updated as well to match the behaviour.

Thanks for the explanation, this makes sense. I've checked the mailing list archives and it seems there was already a discussion about the enumerators in the .debug_names index back in 2018 -- http://lists.dwarfstd.org/pipermail/dwarf-discuss-dwarfstd.org/2018-April/004443.html. You were the one to bring it up and the consensus was that the enumerators should go into the index too.

But it seems this was never actually implemented, the latest version of the toolchain doesn't add DW_TAG_enumerator entries to the .debug_names index. Is there a reason for that or it just slipped through the cracks? Should I bring it up again on the mailing list or we can assume the consensus is still the same and this should be just implemented?

As I recall, I myself started having doubts about the correctness of that approach, as I started discovering other interesting named objects that were not in the index:

  • using bar = foo; // another name for foo
  • using foo = bar::foo; // same name but in a different namespace
  • using namespace foo; // another name for _everything_ in foo

I started being unsure where should the line be drawn, and (coupled with insufficient time to investigate this in depth) decided to stick to what the spec says. I am also unsure whether that discussion would be sufficiently authoritative to decide on the matter, since this is pretty clearly in contradiction with the wording of the spec. And there's still the question of how much extra space will this feature consume (there's no way to "home" these declarations, so they index entries would have to be emitted for each compile unit that uses them)...

Regarding the changes in the manual index. Do you think there's a better way to implement enumerators lookup?

Maybe? It depends on what kinds of contexts we'd want these enumerators to be recognised. If we're fine with seeing just the enumerators visible in the current scope, then we could parse the current compile unit, looking for enum declarations -- I believe "using namespace" declarations are handled in this way. If we'd want them to be visible globally, then things get trickier. Though there's always the possibility to say that the feature is not worth the cost it brings...

Assuming .debug_names and apple_names are updated to include the enumerators, manual_index should be updated as well to match the behaviour.

Yep, that much we can agree on. :)

werat edited the summary of this revision. (Show Details)Dec 3 2021, 6:06 AM