This is an archive of the discontinued LLVM Phabricator instance.

Fix global data symbol resolution
AbandonedPublic

Authored by xiaobai on Oct 25 2017, 3:24 PM.

Details

Summary

With this patch I want to do the following:
When searching for global data symbols, only consider non-externally visible
symbols if we are in the module where they are visible.

I was investigating bug 35043
(https://bugs.llvm.org/show_bug.cgi?id=35043)
and found out I was able to reproduce this on a Linux x86-64 machine I have
access to. I was doing some digging, and found out that even when we're not in
the module with visibility, we would still consider internal symbols from other
modules when clang was asking us to find external declarations by name.
So in TestLambdas, we want to define a lambda with parameters (int a, int b).
Clang wants to ensure we don't redeclare our parameters. Through a long chain of
function calls, the call chain ends up
at SymbolContext::FindBestGlobalDataSymbol(), which (on my machine) will find
that there is are symbols named "a" and "b" in another module, but they are not
externally visible. lldb will complain that there are multiple defined internal
symbols, but these shouldn't matter for our lambda.

I don't have as much context as some of y'all about whether or not this kind of
change will present significant problems elsewhere, or if we should tackle this
problem another way. Feedback is very much appreciated.

Event Timeline

xiaobai created this revision.Oct 25 2017, 3:24 PM
clayborg edited edge metadata.Oct 25 2017, 3:42 PM

It is a nice idea. I would still rather fix this in clang so it doesn't ask us about a variable it already knows about. Though this might be a good solution until we can fix it for real though. Sean or Jim?

It is a nice idea. I would still rather fix this in clang so it doesn't ask us about a variable it already knows about. Though this might be a good solution until we can fix it for real though. Sean or Jim?

From what I can tell, in clang it looks like it's looking for a redeclaration of parameters (something like int foo(int x, int x)). Not entirely sure why it does that, and I'm not entirely sure why it would need to look beyond the Function Declaration scope to do that. When I get a chance, I can look into it more.

davide added a subscriber: davide.Oct 25 2017, 4:08 PM

Thanks, I'll try this patch tomorrow.
I know this may be a little off, but how hard is to write a test for this so that it doesn't regress?

jingham requested changes to this revision.Oct 25 2017, 4:13 PM

If I have a library that uses some singleton class that's not exported, and I'm debugging code in a client of my library, and the state of my library is looking odd based on what it is returning to the client, it seems not unnatural to want to call:

(lldb) expr my_library_singleton->WFTDude()

accessing some debugging function I've put in for that purpose. This change would make that not work.

Even worse, it would work when I stepped into the code of my library, but then when I stepped out to the client it would stop working. And the error would be just "unknown symbol my_library_singleton". We wouldn't have a good way to help explain this failure.

If the general vote is "Jim is nuts, no programmer would ever do this" then I guess it is okay, but this seems to me a not-implausible debugging scenario, and I'd really rather not break it unless we can't think of any other way around the problem this is actually trying to solve.

This revision now requires changes to proceed.Oct 25 2017, 4:13 PM

If I have a library that uses some singleton class that's not exported, and I'm debugging code in a client of my library, and the state of my library is looking odd based on what it is returning to the client, it seems not unnatural to want to call:

(lldb) expr my_library_singleton->WFTDude()

accessing some debugging function I've put in for that purpose. This change would make that not work.

While not implausible, it seems as though making my_library_singleton visible from a scope that it is not intended to be visible from is causing the problem I am trying to fix. If your library is acting strange while interacting with your client, it seems like it's your library that you're really interested in debugging (or rather, your library interacting with the client), in which case you should probably enter the library's scope and then try to query its internals. While what you want to do is quite convenient, it possibly introduces more problems than it aims to alleviate.

However, I don't see a problem with adding a flag to expr to say "I want visibility from the scope of this module/library/whatever". In that case, we could create an expression in some dummy function within the context of the module we care about and use that to interact with clang. I'm not sure how feasible this would be, nor how much work it would involve. It would be a less convenient than what exists now, but it would still be more convenient than trying to put a breakpoint somewhere in your library and potentially waiting until its too late to gather any meaningful information.

What do you think?

Even worse, it would work when I stepped into the code of my library, but then when I stepped out to the client it would stop working. And the error would be just "unknown symbol my_library_singleton". We wouldn't have a good way to help explain this failure

I totally agree. The diagnostics of failure here are in poor, at best. I could modify what we do to say "hey, I found the symbol you're talking about in this module, but it's not visible from this context." If my idea about the flag above sounds good, we could even say "you can use this flag if want to see it here and now".

Thanks, I'll try this patch tomorrow.
I know this may be a little off, but how hard is to write a test for this so that it doesn't regress?

I don't think this would be terribly difficult, and I absolutely will write a test so we don't regress on this behavior. I wanted to get feedback on my change before I try to test its behavior.

Thanks for taking care of the test.

About the libmath question. I thought libmath exported a symbol named a ,but I haven't checked the readelf output.
This is what I saw for the symbols:

(lldb) image lookup --address 0x00007ffff77ec290
Address: libm.so.6[0x00000000000a4290] (libm.so.6..rodata + 186832)
Summary: libm.so.6`a

But I think there may be another latent bug because what I see here is:

$ objdump -t libm.so |grep ' a'
00040040 l     O .rodata        00000008              a11.8714
00040030 l     O .rodata        00000008              a9.8712
00040020 l     O .rodata        00000008              a7.8710
00040010 l     O .rodata        00000008              a5.8708
00040008 l     O .rodata        00000008              aa5.8709
00040000 l     O .rodata        00000008              a3.8706
0003fff8 l     O .rodata        00000008              aa3.8707
00040080 l     O .rodata        00000008              a27.8724
00040088 l     O .rodata        00000008              a25.8723
00040078 l     O .rodata        00000008              a23.8722
00040070 l     O .rodata        00000008              a21.8721
00040068 l     O .rodata        00000008              a19.8720
00040060 l     O .rodata        00000008              a17.8719
00040058 l     O .rodata        00000008              a15.8718
00040050 l     O .rodata        00000008              a13.8716
00040048 l     O .rodata        00000008              aa13.8717
00040038 l     O .rodata        00000008              aa11.8715
00040028 l     O .rodata        00000008              aa9.8713
00040018 l     O .rodata        00000008              aa7.8711
00010ee0  w    F .text  000000e5              atan2
0002eb60  w    F .text  00000009              atanl
00021ea0  w    F .text  00000032              atanf
00010fd0  w    F .text  0000007b              atanh
00030090  w    F .text  00000068              acosl
00010e10  w    F .text  00000054              acosh
000230d0  w    F .text  00000063              acosf
00010da0  w    F .text  00000063              acos
00009570  w    F .text  00000032              atan
00010e70  w    F .text  00000063              asin
00023300  w    F .text  0000007b              atanhf
000302a0  w    F .text  0000007d              atanhl
00021da0  w    F .text  000000f1              asinhf
0002ea60  w    F .text  000000f1              asinhl
00023140  w    F .text  00000057              acoshf
00030100  w    F .text  00000059              acoshl
00009470  w    F .text  000000f1              asinh
00030160  w    F .text  00000068              asinl
000231a0  w    F .text  00000063              asinf
000301d0  w    F .text  000000cd              atan2l
00023210  w    F .text  000000e5              atan2f

So, there' no a exported, but a bunch of symbols starting with a and then followed by a decimal.
If you want, I may provide the shared object (or probably it's easier for you to try on Fedora yourself).

Sorry, my bad, I haven't grepped properly, there are 3 internal symbols named a in libm.so.

$ nm /lib64/libm.so.6 | grep " a$"
0000000000093bb0 r a
00000000000c6a80 r a
0000000000093bb0 r a

This is what I get, which seems to match what you said. These are definitely internal to libm. It's a similar story for the symbol b.

So one of the expectations of people using the debugger is that their whole program is available to them wherever they happen to be stopped. People get really upset when we break that fiction. I've experienced this both in Radar and in person... But in this case, it seems like a reasonable expectation:

Imagine the debugging scenario where I had stepped into my library, run my debugging function, stepped out to the client and what got returned didn't make sense with the output of the debugging function. So I want to run it again to see what happened. I certainly don't want to have to navigate back to my library to do that, I don't want the program to change state at all and there may be no functions of my library still on the stack somewhere.

I think if you make me say:

(lldb) expr --pretend-module MyLib.dyld -- my_library_singleton->WFTDude()

you will not make me happy. lldb should not force users to disambiguate things that are not ambiguous. If there were multiple visible my_library_singleton symbols floating around, that would be my fault for choosing a bad name, and as the debugger user I wouldn't feel too bad about having to do something to disambiguate. But if it isn't lldb shouldn't make me have to do more work to specify it.

Note, BTW, we absolutely need some way to say "this symbol from this library". But first of all, if we're going to do this you need to be able to mix & match within an expression which you can't do with a flag to expr. Instead you need something like:

(lldb) expr $$MyDylib$my_symbol + $$MyOtherDylib$my_other_symbol

That syntax is ugly, we should try to think of something better. But the main point is this should only be necessary when lldb can't find a unique symbol. When we can no intervention should be required.

labath edited edge metadata.Oct 25 2017, 5:07 PM

I thought https://reviews.llvm.org/D33083 was supposed to resolve the conflicting-symbol-in-another-library issue by giving precedence to the current module. Why doesn't that apply here?

The problem here is that the name that started the whole query is a local variable in the code we're currently compiling. So clang doesn't need to ask us about it, and in fact since we're still in mid-compilation lldb doesn't know anything about the name we're actually trying to look up.

Actually clang shouldn't be asking us about it at all. It already knows what it is, and it is a local variable so it should know that it takes priority over anything else we might find. But it does, by bad luck we find ONE unambiguous symbol with the same name. We return that to clang and then it says that symbol and the local variable it should never have looked elsewhere for collide.

This fix would only fix one particular version of the problem. For instance if the symbol that was causing problems happened to be external and not internal, this fix wouldn't work. It's just that libm happens to have an internal symbol with a quite unfortunate name: 'a'...

Note, BTW, we absolutely need some way to say "this symbol from this library". But first of all, if we're going to do this you need to be able to mix & match within an expression which you can't do with a flag to expr. Instead you need something like:

(lldb) expr $$MyDylib$my_symbol + $$MyOtherDylib$my_other_symbol

That syntax is ugly, we should try to think of something better. But the main point is this should only be necessary when lldb can't find a unique symbol. When we can no intervention should be required.

I see what you mean. I think I'd agree with you here, being able to mix and match would be a useful thing to be able to do while debugging. The syntax is something we can work on.

Actually clang shouldn't be asking us about it at all. It already knows what it is, and it is a local variable so it should know that it takes priority over anything else we might find.

Right. We get into this whole mess when clang starts to check for redeclaration of parameters in Sema::ActOnParamDeclaration() in $clang_root/lib/Sema/SemaDecl.cpp. It basically takes your parameter and calls LookupName with it. I'm not sure why it would ever need to look beyond the function declaration scope, but that's what it does. It seems to recurse through scopes until it hits the TU scope, and from there it tries to find the symbol a. You can get a rough idea of what it tries to do with this backtrace:

(lldb) bt
* thread #1, name = 'lldb', stop reason = breakpoint 1.1
  * frame #0: 0x00007fada91245a1 liblldb.so.6`lldb_private::SymbolContext::FindBestGlobalDataSymbol(this=0x00000000007e69b0, name=0x00007ffe8e8c7810, error=0x00007ffe8e8c73c0) at SymbolContext.cpp:804
    frame #1: 0x00007fada930b81d liblldb.so.6`lldb_private::ClangExpressionDeclMap::FindExternalVisibleDecls(this=0x0000000000852b60, context=0x00007ffe8e8c7e40, module_sp=nullptr, namespace_decl=0x00007ffe8e8c7cb0, current_id=16) at ClangExpressionDeclMap.cpp:1545
    frame #2: 0x00007fada9308a2b liblldb.so.6`lldb_private::ClangExpressionDeclMap::FindExternalVisibleDecls(this=0x0000000000852b60, context=0x00007ffe8e8c7e40) at ClangExpressionDeclMap.cpp:843
    frame #3: 0x00007fada92d3d03 liblldb.so.6`lldb_private::ClangASTSource::FindExternalVisibleDeclsByName(this=0x0000000000852b60, decl_ctx=0x0000000000934078, clang_decl_name=(Ptr = 9898096)) at ClangASTSource.cpp:261
    frame #4: 0x00007fada90c5c2b liblldb.so.6`lldb_private::ClangASTSource::ClangASTSourceProxy::FindExternalVisibleDeclsByName(this=0x0000000000675040, DC=0x0000000000934078, Name=(Ptr = 9898096)) at ClangASTSource.h:246
    frame #5: 0x00007fadad10a7f1 liblldb.so.6`clang::DeclContext::lookup(this=0x0000000000934078, Name=(Ptr = 9898096)) const at DeclBase.cpp:1542
    frame #6: 0x00007fadac1c2ffa liblldb.so.6`::LookupDirect(S=0x000000000093d7d0, R=0x00007ffe8e8c89c0, DC=0x0000000000934078) at SemaLookup.cpp:843
    frame #7: 0x00007fadac1c35eb liblldb.so.6`::CppNamespaceLookup(S=0x000000000093d7d0, R=0x00007ffe8e8c89c0, Context=0x0000000000926ec0, NS=0x0000000000934078, UDirs=0x00007ffe8e8c8650)::UnqualUsingDirectiveSet &const) at SemaLookup.cpp:942
    frame #8: 0x00007fadac1c4490 liblldb.so.6`clang::Sema::CppLookupName(this=0x000000000093d7d0, R=0x00007ffe8e8c89c0, S=0x00000000009464a0) at SemaLookup.cpp:1322
    frame #9: 0x00007fadac1c5d44 liblldb.so.6`clang::Sema::LookupName(this=0x000000000093d7d0, R=0x00007ffe8e8c89c0, S=0x000000000095aa60, AllowBuiltinCreation=false) at SemaLookup.cpp:1826
    frame #10: 0x00007fadabd285b7 liblldb.so.6`clang::Sema::ActOnParamDeclarator(this=0x000000000093d7d0, S=0x000000000095aa60, D=0x00007ffe8e8c8e30) at SemaDecl.cpp:11775
    frame #11: 0x00007fadab823c62 liblldb.so.6`clang::Parser::ParseParameterDeclarationClause(this=0x0000000000942260, D=0x00007ffe8e8c9b50, FirstArgAttrs=0x00007ffe8e8ca2a0, ParamInfo=0x00007ffe8e8c9930, EllipsisLoc=0x00007ffe8e8ca230) at ParseDecl.cpp:6351
    frame #12: 0x00007fadab866d73 liblldb.so.6`clang::Parser::ParseLambdaExpressionAfterIntroducer(this=0x0000000000942260, Intro=0x00007ffe8e8ca530) at ParseExprCXX.cpp:1127
    frame #13: 0x00007fadab8655fe liblldb.so.6`clang::Parser::ParseLambdaExpression(this=0x0000000000942260) at ParseExprCXX.cpp:685
xiaobai added a comment.EditedOct 25 2017, 5:43 PM

So one of the expectations of people using the debugger is that their whole program is available to them wherever they happen to be stopped. People get really upset when we break that fiction. I've experienced this both in Radar and in person... But in this case, it seems like a reasonable expectation:

Imagine the debugging scenario where I had stepped into my library, run my debugging function, stepped out to the client and what got returned didn't make sense with the output of the debugging function. So I want to run it again to see what happened. I certainly don't want to have to navigate back to my library to do that, I don't want the program to change state at all and there may be no functions of my library still on the stack somewhere.

I think if you make me say:

(lldb) expr --pretend-module MyLib.dyld -- my_library_singleton->WFTDude()

you will not make me happy. lldb should not force users to disambiguate things that are not ambiguous. If there were multiple visible my_library_singleton symbols floating around, that would be my fault for choosing a bad name, and as the debugger user I wouldn't feel too bad about having to do something to disambiguate. But if it isn't lldb shouldn't make me have to do more work to specify it.

I think that the problem here is that lldb is disambiguating and grabbing internal symbols from libraries I don't particularly care about. It seems like clang shouldn't even be asking us about this in the first place, but I don't fully understand why it's trying to do what it's trying to do. Perhaps somebody with more understanding of what clang should be doing can shine some light?

davide added a subscriber: rsmith.Oct 25 2017, 5:45 PM

Richard Smith (@rsmith) owns clang and is familiar with this.

It seems like clang shouldn't even be asking us about this in the first place, but I don't fully understand why it's trying to do what it's trying to do.

It looks like there are two issues here: clang is doing seemingly-unnecessary work as part of a redeclaration lookup, and lldb is reacting badly to clang performing those lookups. The former part is possibly a bug, but we sometimes really do want to do a full lookup for redeclaration checking, particularly for -Wshadow. In general we reserve the right to look up names even when it might not be obvious why, so a clang API consumer shouldn't be assuming otherwise.

It would seem reasonable to me for lldb to pick the symbol from the current module in the case of an ambiguity, and to otherwise make all symbols visible. I think that's generally what gdb does. That said... dynamically changing how a name resolves might cause some problems, particularly in C++, since Clang caches lookup results from ExternalASTSources in the DeclContext lookup tables. If that turns out to be a problem, I think we'll either need to push some part of this disambiguation down into Clang itself or make LLDB invalidate those caches when the current module is switched.

xiaobai abandoned this revision.Jan 31 2018, 4:49 PM

It's clear this is not the way forward. The problem I am trying to solve should be attacked differently. Thanks everyone for your input!

@rsmith: lldb does as you suggest, if it can find a name in the local context, it returns that, if it can find it in the current CU, it returns that, etc... It only returns multiple matches if it can't find a single instance along that chain.

The problem with this case is that from lldb's standpoint the symbol wasn't ambiguous, the only match lldb knew about was the internal libc symbol. The collision was between that libc symbol and a local variable that was defined in the expression that lldb was submitting to clang. At present, we don't know anything about those in flight definitions, we just reply with names from outside the current compilation. So we didn't know there was a closer definition. Is it possible for lldb to peek at those in flight definitions ourselves and return that as the most proximate instance? I think that's the behavior we want. The other thing to do is have clang know to prioritize name matches from decls it made out of the expression over those provided to it by an externalAST source, if that's a thing it is possible to do.

It would be nice if we can try and disable the redeclaration lookups from clang when in debugger mode. I can't remember if we already have such a flag in the compiler options. If we do, we should try to just disable this when in debug expression mode and see how the test suite fares. Unless we are able to lookup the name within the AST that is currently being built so we can re-find the exact same variable declaration and hand it back, disabling this seems to be the correct thing to do for debug expression. Comments?