Page MenuHomePhabricator

Limit LLDB_EXPORT_ALL_SYMBOLS to additionally export only the lldb_private namespace symbols
ClosedPublic

Authored by tfiala on Oct 28 2016, 12:13 PM.

Details

Summary

My original implementation of this last year was to just export everything from liblldb.so. As lldb-mi recently started including a separate copy of llvm, exporting everything from liblldb.so has the side effect of allowing the Linux dynamic linker to resolve liblldb.so and lldb-mi's two copies of llvm to the same underlying storage. This means any global constructors in LLVM (e.g. the 'debug' command line option that triggered this change when NDEBUG is not defined) have side-effects applied twice to the same underlying data.

This change adjusts the -DLLDB_EXPORT_ALL_SYMBOLS flag's implementation, limiting exports to the normal exports plus the lldb_private namespace symbols. This filters out exporting any of the LLVM symbols for the copy inside liblldb.so, maintaining the firewall between the separate copies of llvm. preventing two sets of global constructors from both occurring on the same underlying data.

This is a fix for:
https://llvm.org/bugs/show_bug.cgi?id=30822

Event Timeline

tfiala updated this revision to Diff 76229.Oct 28 2016, 12:13 PM
tfiala retitled this revision from to Limit LLDB_EXPORT_ALL_SYMBOLS to additionally export only the lldb_private namespace symbols.
tfiala updated this object.
tfiala added reviewers: labath, tberghammer.
tfiala added a subscriber: lldb-commits.
tfiala added inline comments.Oct 28 2016, 12:15 PM
API/CMakeLists.txt
106 ↗(On Diff #76229)

I'll change this to:

  • Symbols (liblldb): exporting symbols in the lldb namespace

in the commit to normalize with the message I modified below.

tfiala updated this object.Oct 28 2016, 12:21 PM
labath added a subscriber: beanz.

I am a bit surprised that this fixes the problem. I would have expected that the problem would occur in case when we *do* restrict exports, as in that case, it is much easier to pull in copies of stuff twice. I wonder if in this case we could actually avoid having two copies of llvm.

What would happen if you just remove the call to llvm_config in lldb-mi's CMakeLists.txt. Will you get some unresolved symbols?

Adding @beanz, in case he has some ideas.

I am a bit surprised that this fixes the problem. I would have expected that the problem would occur in case when we *do* restrict exports,

This matches my expectations. When I was using the flag the exported all symbols indiscriminately from liblldb.so, it meant that the llvm symbols for the copy inside liblldb.so were public and known. That public element allows the dynamic linker to coalesce the data storage locations with the ones for the llvm in lldb-mi. Hiding them (as per the normal case, when we only export the 'lldb' public namespace) prevents the dynamic linker from doing the coalesce.

as in that case, it is much easier to pull in copies of stuff twice. I wonder if in this case we could actually avoid having two copies of llvm.

What would happen if you just remove the call to llvm_config in lldb-mi's CMakeLists.txt. Will you get some unresolved symbols?

Adding @beanz, in case he has some ideas.

I am a bit surprised that this fixes the problem. I would have expected that the problem would occur in case when we *do* restrict exports,

This matches my expectations. When I was using the flag the exported all symbols indiscriminately from liblldb.so, it meant that the llvm symbols for the copy inside liblldb.so were public and known. That public element allows the dynamic linker to coalesce the data storage locations with the ones for the llvm in lldb-mi. Hiding them (as per the normal case, when we only export the 'lldb' public namespace) prevents the dynamic linker from doing the coalesce.

So, the global constructor initialization sequence for liblldb.so, which includes a copy of llvm with Debug.cpp and therefore runs the global constructor for adding 'debug' to the command line, does so, but against *shared* locations for the backing storage. And lldb-mi, which also contains a separate copy of the llvm code, also adds the constructor to the init sequence, using the shared data location that was coalesced by the dynamic linker. So two separate copies of the code, using the same coalesced storage, end up blowing up because both instances of the global constructor think they're the only one operating on that data.

The reason why it doesn't blow up in the hidden case is that liblldb.so has already internally resolved the storage, doesn't involve dynamic resolution, and has its own data location. Ditto for lldb-mi's copy. So they are separate islands.

There are two totally separate other issues here that I see, that I think are best handled by other discrete changes:

  1. Fix lldb-mi to not need a separate copy of llvm.
  2. Fix lldb on Linux to not require private symbols to be exposed to get access to them during debugging, when the debug info is available.

Item 1 was recently introduced. Item 2 appears to be a flaw in Linux LLDB, at least in the configurations we use. I plan to tackle #2 in the near future. I seem to recall #1 happened when some refactoring was taking place that ended up requiring private details from llvm?

The reason why it doesn't blow up in the hidden case is that liblldb.so has already internally resolved the storage, doesn't involve dynamic resolution, and has its own data location. Ditto for lldb-mi's copy. So they are separate islands.

There are two totally separate other issues here that I see, that I think are best handled by other discrete changes:

  1. Fix lldb-mi to not need a separate copy of llvm.

lldb-mi has needed bits of llvm since forever, I think. But we have changed which parts exactly we pull, which may have triggered this problem. Fixing this is going to be hard.

  1. Fix lldb on Linux to not require private symbols to be exposed to get access to them during debugging, when the debug info is available.

Item 1 was recently introduced. Item 2 appears to be a flaw in Linux LLDB, at least in the configurations we use. I plan to tackle #2 in the near future.

I take it that by "debugging" you mean "having symbols appear in backtrace(3)" output. That is a unfortunate, but I don't know what can we do about that. I am curious, could you explain what you have in mind?

I seem to recall #1 happened when some refactoring was taking place that ended up requiring private details from llvm?

It was the other way around. lldb-mi stopped using internal llvm API, and switched to the public ones. The issue is that we consider all llvm APIs to be private as far as liblldb is concerned.

Feel free to check this in, if it unblocks you. I don't think it makes things any worse. I'll take a look next week if I can come up with something better,

beanz edited edge metadata.Oct 28 2016, 3:49 PM

This patch ends up hiding the problem, not fixing it. If it unblocks something it might be ok, but it doesn't actually fix the problem.

The underlying problem is that liblldb and lldb-mi have their own copies of LLVM with their own copies of the GlobalParser static (llvm/lib/Support/CommandLine.cpp), and the corresponding static for the debug cl::opt (llvm/lib/Support/Debug.cpp).

When the library exports the LLVM symbols, the dynamic loader loads lldb-mi and it makes the cl::opt constructor point at the same instance of the GlobalParser (following link-once ODR rules).

When the library doesn't export the LLVM symbols, the cl::opt variables inside liblldb resolve to the GlobalParser inside liblldb (because it is local), and the cl::opt variables inside lldb-mi resolve inside lldb-mi because the loader cannot see the GlobalParser from liblldb because the symbol isn't exported. This makes two instances of the GlobalParser exist at the same time.

This does work around the problem Todd was seeing. Unfortunately, it doesn't actually fix the problem, and if lldb-mi is using cl::opt across the library boundary it will cause subtle and difficult to diagnose bugs.

tfiala added a comment.EditedOct 28 2016, 4:07 PM

This patch ends up hiding the problem, not fixing it.

Yes.

If it unblocks something it might be ok, but it doesn't actually fix the problem.

It allows debugging of lldb with lldb on Ubuntu. This broke recently, and the CL here re-enables the ability to do that.

The underlying problem is that liblldb and lldb-mi have their own copies of LLVM with their own copies of the GlobalParser static (llvm/lib/Support/CommandLine.cpp), and the corresponding static for the debug cl::opt (llvm/lib/Support/Debug.cpp).

When the library exports the LLVM symbols, the dynamic loader loads lldb-mi and it makes the cl::opt constructor point at the same instance of the GlobalParser (following link-once ODR rules).

When the library doesn't export the LLVM symbols, the cl::opt variables inside liblldb resolve to the GlobalParser inside liblldb (because it is local), and the cl::opt variables inside lldb-mi resolve inside lldb-mi because the loader cannot see the GlobalParser from liblldb because the symbol isn't exported. This makes two instances of the GlobalParser exist at the same time.

Right. I think I said all that in comments above, but perhaps that is a succinct summary :-)

This does work around the problem Todd was seeing. Unfortunately, it doesn't actually fix the problem, and if lldb-mi is using cl::opt across the library boundary it will cause subtle and difficult to diagnose bugs.

As lldb-mi doesn't use CommandLine to the best of my understanding by searching for cl usages in the code, I don't see that as critical. The current usage of CommandLine via global static constructors in Debug.cpp seems like a poor choice, as it forces all users of llvm with NDEBUG not defined to get some command line behavior that they may never use. Maybe some aspect of that can be made lazy?

Note that this change doesn't introduce the issue of having two islands of CommandLine data --- that is already there in standard builds of lldb-mi --- it simply prevents the "please allow me to debug lldb with lldb" case from getting clobbered by the dynamic linker coalescing which the old implementation of LLDB_EXPORT_ALL_SYMBOLS did not avoid.

Unless there is strong opposition, I'd like to put this in.

beanz added a comment.Oct 28 2016, 4:38 PM

The current usage of CommandLine via global static constructors in Debug.cpp seems like a poor choice, as it forces all users of llvm with NDEBUG not defined to get some command line behavior that they may never use. Maybe some aspect of that can be made lazy?

It can't be made lazy without completely re-designing how cl::opt works. Which is something I've had lengthy discussions with Chandler about. Currently the biggest issue with fixing it relates to how the new pass manager works. Specifically in order to provide meaningful -help output we need a way to register command line options in passes, which we have today, but won't with the new pass manager.

It is an open problem that we're trying to come up with a solution to.

Unless there is strong opposition, I'd like to put this in.

No objection from me.

tfiala added a comment.EditedOct 28 2016, 5:25 PM

The current usage of CommandLine via global static constructors in Debug.cpp seems like a poor choice, as it forces all users of llvm with NDEBUG not defined to get some command line behavior that they may never use. Maybe some aspect of that can be made lazy?

It can't be made lazy without completely re-designing how cl::opt works. Which is something I've had lengthy discussions with Chandler about. Currently the biggest issue with fixing it relates to how the new pass manager works. Specifically in order to provide meaningful -help output we need a way to register command line options in passes, which we have today, but won't with the new pass manager.

It is an open problem that we're trying to come up with a solution to.

Cool - glad to see some thought is going into it.

On other projects where there are firewalls between linkage domains, I've had one subsystem on one side to be initialized with a pointer to the storage location(s) on the other, by way of an init() routine that is called by a third party passing it over the barrier.

Something like this, in terms of our problem here (pseudo-code and maybe wrong naming convention!):

// in lldb-mi::main(), main binary
int main() {
  // Ask lldb-mi's local llvm's CommandLine to return a handle to its internal data structures,
  // and pass that to the agent that controls the llvm embedded in liblldb.
  liblldbInitCommandLine(cl::CommandLine::GetHandle());
  ...
}

// in liblldb so/dylib/dll, exported method:

void liblldbInitCommandLine(void *commandLineHandle) {
  // Pass the given opaque handle to the LLVM CommandLine embedded in liblldb.
  cl::CommandLine::SetHandle(commandLineHandle);
}

// in CommandLine

void *CommandLine::GetHandle() {
  return static_cast<void*>(my_internal_pimpl);
}

void CommandLine::SetHandle(void *external_pimpl_to_use_here) {
  // Make concurrency safe if desired...
  // Merge existing results with those present in the passed in version, if desired (see below)
  delete my_internal_pimpl;
  my_internal_pimpl = (some_type*)(external_pimpl_to_use_here);
}

The trick to this working, though, is to not have uncontrolled, global static initializers permuting these on both sides until after the dance of synchronizing their storage explicitly. And that isn't the case here. We can probably work around it with 'debug' since we know both sides will do it, and so there is no race to avoid. But if other clients on either side do the same thing, we can lose them. Unless, we have some kind of merge semantics on the SetHandle(void*) call, which I suppose could be done to collate any items that were initialized on either side.

Anyways, food for thought... I'm not thrilled with the manual aspect of it, but I have used it in production.

Unless there is strong opposition, I'd like to put this in.

No objection from me.

Okay - Pavel seemed alright enough with it for the time being as well, so I'll put this in, with the one wording correction I called out above.

This revision was automatically updated to reflect the committed changes.

If it unblocks something it might be ok, but it doesn't actually fix the problem.

It allows debugging of lldb with lldb on Ubuntu. This broke recently, and the CL here re-enables the ability to do that.

Could you elaborate on this? I've been debugging lldb without this switch and everything seems to work fine (modulo problems evaluating fancy expressions, but I don't think that is related to this. Or is it?). I thought this only affected which symbols you get in the backtrace output, as it only parses .dynsym sections.

If it unblocks something it might be ok, but it doesn't actually fix the problem.

It allows debugging of lldb with lldb on Ubuntu. This broke recently, and the CL here re-enables the ability to do that.

Could you elaborate on this?

Sure, I'll see if I can reproduce the lack of backtrace symbol lookup that I was experiencing without exporting those symbols back in roughly Oct/Nov 2015. We've been using this flag ever since to address it.

I've been debugging lldb without this switch and everything seems to work fine

That definitely was not my experience when I added the flag. I am seeing if the old scenario holds true on 16.04 x86_64 with the current LLDB codebase. It is possible that the issue we saw manifest has been addressed.) I will make a best effort attempt this weekend, but it may be more like Monday when I will have dedicated time to repro it. If it is no longer an issue, I'd consider this flag deprecated as its initial purpose is no longer valid. If it is still an issue, it could be a config difference between stock Ubuntu and your setup (as we know is at least the case with kernel settings).

(modulo problems evaluating fancy expressions, but I don't think that is related to this. Or is it?).

At the time the issue was symbol lookup - we were somewhere short-circuiting so that we didn't consider debug symbols properly. While I didn't try it at the time as I wasn't aware of it, it is possible the issue may be related to needing the -fno-limit-debug-info flag. When I get to reproducing the original issue, if it does surface, I will try adding that flag as a second step to see if that addresses it.

For expressions that need to evaluate any symbols we were failing to resolve, I could imagine that being at least one factor. Not sure though without a specific case to look at.

More later when I have some results.

Could you elaborate on this?

Sure, I'll see if I can reproduce the lack of backtrace symbol lookup that I was experiencing without exporting those symbols back in roughly Oct/Nov 2015. We've been using this flag ever since to address it.

I've been debugging lldb without this switch and everything seems to work fine

That definitely was not my experience when I added the flag. I am seeing if the old scenario holds true on 16.04 x86_64 with the current LLDB codebase. It is possible that the issue we saw manifest has been addressed.) I will make a best effort attempt this weekend, but it may be more like Monday when I will have dedicated time to repro it. If it is no longer an issue, I'd consider this flag deprecated as its initial purpose is no longer valid. If it is still an issue, it could be a config difference between stock Ubuntu and your setup (as we know is at least the case with kernel settings).

(modulo problems evaluating fancy expressions, but I don't think that is related to this. Or is it?).

At the time the issue was symbol lookup - we were somewhere short-circuiting so that we didn't consider debug symbols properly. While I didn't try it at the time as I wasn't aware of it, it is possible the issue may be related to needing the -fno-limit-debug-info flag. When I get to reproducing the original issue, if it does surface, I will try adding that flag as a second step to see if that addresses it.

For expressions that need to evaluate any symbols we were failing to resolve, I could imagine that being at least one factor. Not sure though without a specific case to look at.

More later when I have some results.

Here is an example of where symbolic lookup of backtraces from within liblldb.so utterly fail without using this flag. In the example below, I'm using:
(lldb) log enable -S lldb process
(lldb) target create /bin/ls
(lldb) run

And here is my output:

Process::SetPublicState (exited) -- unlocking run lock
0  liblldb.so.40   0x00007fb6041f6cd0
1  liblldb.so.40   0x00007fb600834e22
2  liblldb.so.40   0x00007fb600834b3f
3  liblldb.so.40   0x00007fb600a5f601
4  liblldb.so.40   0x00007fb600a5f047
5  liblldb.so.40   0x00007fb6007b897e
6  libpthread.so.0 0x00007fb60841570a
7  libc.so.6       0x00007fb5ffa2882d clone + 109
8  libc.so.6       000000000000000000 clone + 6125632
9  liblldb.so.40   0x00007fb6007e7a68
10 liblldb.so.40   0x00007fb6007e7e96
11 liblldb.so.40   0x00007fb6007b897e
12 libpthread.so.0 0x00007fb60841570a
13 libc.so.6       0x00007fb5ffa2882d clone + 109
Process 31919 exited with status = 0 (0x00000000)
Process::RunPrivateStateThread (arg = 0x1d05580, pid = 31919) thread exiting...
0  liblldb.so.40   0x00007fb6041f6cd0
1  liblldb.so.40   0x00007fb600834e22
2  liblldb.so.40   0x00007fb600834b3f
3  liblldb.so.40   0x00007fb600a5f601
4  liblldb.so.40   0x00007fb600a5f047
5  liblldb.so.40   0x00007fb6007b897e
6  libpthread.so.0 0x00007fb60841570a
7  libc.so.6       0x00007fb5ffa2882d clone + 109
(lldb-superior) MonitorChildProcessThreadFunction (arg = 0x1d25390) thread exiting because pid received exit signal...
0  liblldb.so.40   0x00007fb6041f6cd0
1  liblldb.so.40   0x00007fb600834e22
2  liblldb.so.40   0x00007fb600834b3f
3  liblldb.so.40   0x00007fb60079fd23
4  liblldb.so.40   0x00007fb6007b897e
5  libpthread.so.0 0x00007fb60841570a
6  libc.so.6       0x00007fb5ffa2882d clone + 109
...

The exact cmake line was:

cmake -GNinja -DCMAKE_BUILD_TYPE=Debug ../llvm

where the build tree is a sibling directory to the llvm source root directory.

I see very similar behavior when trying to set breakpoints and step into lldb's internals with this same lldb. Those all change to expected behavior when I build and run with the -DLLDB_EXPORT_ALL_SYMBOLS:BOOL=YES setting.

Let me know if you get different behavior without with the same setup. If so, we should try to find out what about your Ubuntu load out is different than a stock Ubuntu, since this clearly affects LLDB developers using Linux.

Here's what you get with the same codebase, but using this cmake line instead:

cmake -GNinja ../llvm -DCMAKE_BUILD_TYPE=Debug -DLLDB_EXPORT_ALL_SYMBOLS:BOOL=YES
bin/lldb
(lldb) log enable -S lldb process
(lldb) target create /bin/ls
Current executable set to '/bin/ls' (x86_64).
(lldb) run
...
Process::SetPublicState (state = launching, restarted = 0)
0  liblldb.so.40 0x00007f8f75b0c970
1  liblldb.so.40 0x00007f8f7214aac2 lldb_private::Log::VAPrintf(char const*, __va_list_tag*) + 712
2  liblldb.so.40 0x00007f8f7214a7df lldb_private::Log::Printf(char const*, ...) + 195
3  liblldb.so.40 0x00007f8f7236c66e lldb_private::Process::SetPublicState(lldb::StateType, bool) + 100
4  liblldb.so.40 0x00007f8f72370ae5 lldb_private::Process::Launch(lldb_private::ProcessLaunchInfo&) + 667
5  liblldb.so.40 0x00007f8f7262cbaf lldb_private::platform_linux::PlatformLinux::DebugProcess(lldb_private::ProcessLaunchInfo&, lldb_private::Debugger&, lldb_private::Target*, lldb_private::Error&) + 1839
6  liblldb.so.40 0x00007f8f723bd932 lldb_private::Target::Launch(lldb_private::ProcessLaunchInfo&, lldb_private::Stream*) + 1288
7  liblldb.so.40 0x00007f8f7281ffd6
8  liblldb.so.40 0x00007f8f7221ffa0 lldb_private::CommandObjectParsed::Execute(char const*, lldb_private::CommandReturnObject&) + 808
9  liblldb.so.40 0x00007f8f722102d2 lldb_private::CommandInterpreter::HandleCommand(char const*, lldb_private::LazyBool, lldb_private::CommandReturnObject&, lldb_private::ExecutionContext*, bool, bool) + 2468
10 liblldb.so.40 0x00007f8f72214748 lldb_private::CommandInterpreter::IOHandlerInputComplete(lldb_private::IOHandler&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) + 324
11 liblldb.so.40 0x00007f8f7212b71d lldb_private::IOHandlerEditline::Run() + 545
12 liblldb.so.40 0x00007f8f720fb32e lldb_private::Debugger::ExecuteIOHandlers() + 110
13 liblldb.so.40 0x00007f8f72215488 lldb_private::CommandInterpreter::RunCommandInterpreter(bool, bool, lldb_private::CommandInterpreterRunOptions&) + 176
14 liblldb.so.40 0x00007f8f71eb314b lldb::SBDebugger::RunCommandInterpreter(bool, bool) + 117
15 lldb          0x0000000000407045
16 lldb          0x0000000000407408
17 libc.so.6     0x00007f8f7111f830 __libc_start_main + 240
18 lldb          0x00000000004038e9
MonitorChildProcessThreadFunction (arg = 0x12883e0) thread starting...
0  liblldb.so.40   0x00007f8f75b0c970
1  liblldb.so.40   0x00007f8f7214aac2 lldb_private::Log::VAPrintf(char const*, __va_list_tag*) + 712
2  liblldb.so.40   0x00007f8f7214a7df lldb_private::Log::Printf(char const*, ...) + 195
3  liblldb.so.40   0x00007f8f720b553f
4  liblldb.so.40   0x00007f8f720ce61e lldb_private::HostNativeThreadBase::ThreadCreateTrampoline(void*) + 236
5  libpthread.so.0 0x00007f8f79d3770a
6  libc.so.6       0x00007f8f7120582d clone + 109
MonitorChildProcessThreadFunction ::waitpid (pid = 46995, &status, options = 1073741824)...
0  liblldb.so.40   0x00007f8f75b0c970
1  liblldb.so.40   0x00007f8f7214aac2 lldb_private::Log::VAPrintf(char const*, __va_list_tag*) + 712
2  liblldb.so.40   0x00007f8f7214a7df lldb_private::Log::Printf(char const*, ...) + 195
3  liblldb.so.40   0x00007f8f720b56a4
4  liblldb.so.40   0x00007f8f720ce61e lldb_private::HostNativeThreadBase::ThreadCreateTrampoline(void*) + 236
5  libpthread.so.0 0x00007f8f79d3770a
6  libc.so.6       0x00007f8f7120582d clone + 109
started monitoring child process.
0  liblldb.so.40   0x00007f8f75b0c970
1  liblldb.so.40   0x00007f8f7214aac2 lldb_private::Log::VAPrintf(char const*, __va_list_tag*) + 712
2  liblldb.so.40   0x00007f8f7214a7df lldb_private::Log::Printf(char const*, ...) + 195
3  liblldb.so.40   0x00007f8f720b553f
4  liblldb.so.40   0x00007f8f720ce61e lldb_private::HostNativeThreadBase::ThreadCreateTrampoline(void*) + 236
5  libpthread.so.0 0x00007f8f79d3770a
6  libc.so.6       0x00007f8f7120582d clone + 109
7  libc.so.6       000000000000000000 clone + 2397022272
8  liblldb.so.40   0x00007f8f725afe6c lldb_private::process_gdb_remote::ProcessGDBRemote::EstablishConnectionIfNeeded(lldb_private::ProcessInfo const&) + 212
9  liblldb.so.40   0x00007f8f725a4364 lldb_private::process_gdb_remote::ProcessGDBRemote::DoLaunch(lldb_private::Module*, lldb_private::ProcessLaunchInfo&) + 1072
10 liblldb.so.40   0x00007f8f72370b3d lldb_private::Process::Launch(lldb_private::ProcessLaunchInfo&) + 755
11 liblldb.so.40   0x00007f8f7262cbaf lldb_private::platform_linux::PlatformLinux::DebugProcess(lldb_private::ProcessLaunchInfo&, lldb_private::Debugger&, lldb_private::Target*, lldb_private::Error&) + 1839
12 liblldb.so.40   0x00007f8f723bd932 lldb_private::Target::Launch(lldb_private::ProcessLaunchInfo&, lldb_private::Stream*) + 1288
13 liblldb.so.40   0x00007f8f7281ffd6
14 liblldb.so.40   0x00007f8f7221ffa0 lldb_private::CommandObjectParsed::Execute(char const*, lldb_private::CommandReturnObject&) + 808
15 liblldb.so.40   0x00007f8f722102d2 lldb_private::CommandInterpreter::HandleCommand(char const*, lldb_private::LazyBool, lldb_private::CommandReturnObject&, lldb_private::ExecutionContext*, bool, bool) + 2468
16 liblldb.so.40   0x00007f8f72214748 lldb_private::CommandInterpreter::IOHandlerInputComplete(lldb_private::IOHandler&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) + 324
17 liblldb.so.40   0x00007f8f7212b71d lldb_private::IOHandlerEditline::Run() + 545
18 liblldb.so.40   0x00007f8f720fb32e lldb_private::Debugger::ExecuteIOHandlers() + 110
19 liblldb.so.40   0x00007f8f72215488 lldb_private::CommandInterpreter::RunCommandInterpreter(bool, bool, lldb_private::CommandInterpreterRunOptions&) + 176
20 liblldb.so.40   0x00007f8f71eb314b lldb::SBDebugger::RunCommandInterpreter(bool, bool) + 117
21 lldb            0x0000000000407045
22 lldb            0x0000000000407408
23 libc.so.6       0x00007f8f7111f830 __libc_start_main + 240
24 lldb            0x00000000004038e9
Process::SetPrivateState (stopped)
...

We get real symbolication in this case. Setting breakpoints and doing '(lldb) bt' and the like show similar results.

The snippets you showed are pretty-much expected behavior. The backtrace that gets printed as a part of the log messages is coming from the backtrace(3) library, which has pretty limited backtracing capabilities -- it only looks at the symbols in the .dynsym section (because that's the only thing that is loaded into memory). I am pretty sure you would have problems backtracing if you compiled with -fomit-frame-pointer as well.

I think you fix of just making sure that all symbols show up in the .dynsym section would be fine, were it not for that fact that we have an inconsistent linking policy, which means you get wildly different links depending on whether you export something or not. If we get that straight that this would just be a size optimization (that was the reason we introduced it), with no impact on the behavior, that we could even consider turning off by default (I personally have never used the -S option of logging, but I can see how it could be useful in some workflows). I'll get back to the link policy in a separate email.

That said, you mentioned you also had some problems with setting breakpoints and stuff. Now, if this was true, that would be extremely worrying, but am not able to reproduce that on my side -- breakpoints everything else works fine. The reason for that is that lldb reads the .symtab section (if it is present, which it will be unless you strip the executable), and these linker options do not affect that section. If that is not the case, then we definitely need to figure that out (I suggest stepping through ObjectFileELF::GetSymtab() to get an initial idea of what is going on).

FWIW, this is how a debug session looks like for me. The backtrace() output has addresses only, but lldb is able to symbolicate it with no problem:

$ bin/lldb -- bin/lldb /bin/ls
(lldb) target create "bin/lldb"
Current executable set to 'bin/lldb' (x86_64).
(lldb) settings set -- target.run-args  "/bin/ls"
(lldb) br set -n Log::VAPrintf
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) pr la
Process 87438 launched: '/usr/local/google/home/labath/ll/build/dbg/bin/lldb' (x86_64)
1 location added to breakpoint 1
Process 87438 stopped and restarted: thread 1 received signal: SIGCHLD
(lldb) target create "/bin/ls"
Current executable set to '/bin/ls' (x86_64).
(lldb) log enable -S lldb process
(lldb) pr la
Process 87438 stopped
* thread #1: tid = 87438, 0x00007ffff0d1cfeb liblldb.so.40`lldb_private::Log::VAPrintf(this=0x000000000059a560, format="ProcessLaunchInfo::%s at least one of stdin/stdout/stderr was not set, evaluating default handling", args=0x00007fffffffc1a0) + 27 at Log.cpp:73, name = 'lldb', stop reason = breakpoint 1.1
    frame #0: 0x00007ffff0d1cfeb liblldb.so.40`lldb_private::Log::VAPrintf(this=0x000000000059a560, format="ProcessLaunchInfo::%s at least one of stdin/stdout/stderr was not set, evaluating default handling", args=0x00007fffffffc1a0) + 27 at Log.cpp:73
   70  	void Log::VAPrintf(const char *format, va_list args) {
   71  	  // Make a copy of our stream shared pointer in case someone disables our
   72  	  // log while we are logging and releases the stream
-> 73  	  StreamSP stream_sp(m_stream_sp);
   74  	  if (stream_sp) {
   75  	    static uint32_t g_sequence_id = 0;
   76  	    StreamString header;
(lldb) c
Process 87438 resuming
ProcessLaunchInfo::FinalizeFileActions at least one of stdin/stdout/stderr was not set, evaluating default handling
0  liblldb.so.40 0x00007ffff4d5a36f
1  liblldb.so.40 0x00007ffff0d1d49f
2  liblldb.so.40 0x00007ffff0d1cfbb
3  liblldb.so.40 0x00007ffff0fff4c4
4  liblldb.so.40 0x00007ffff103ad63
5  liblldb.so.40 0x00007ffff15c4f9f
6  liblldb.so.40 0x00007ffff0e33fe0
7  liblldb.so.40 0x00007ffff0e20317
8  liblldb.so.40 0x00007ffff0e25c0d
9  liblldb.so.40 0x00007ffff0e265a7
10 liblldb.so.40 0x00007ffff0cf5a60
11 liblldb.so.40 0x00007ffff0cb5c3c
12 liblldb.so.40 0x00007ffff0e26fdf
13 liblldb.so.40 0x00007ffff09d8d21 lldb::SBDebugger::RunCommandInterpreter(bool, bool) + 129
14 lldb          0x0000000000407c15
15 lldb          0x0000000000408257
16 libc.so.6     0x00007fffeeeb8f45 __libc_start_main + 245
17 lldb          0x00000000004038d9
Process 87438 stopped
* thread #1: tid = 87438, 0x00007ffff0d1cfeb liblldb.so.40`lldb_private::Log::VAPrintf(this=0x000000000059a560, format="ProcessLaunchInfo::%s target stdin='%s', target stdout='%s', stderr='%s'", args=0x00007fffffffc1a0) + 27 at Log.cpp:73, name = 'lldb', stop reason = breakpoint 1.1
    frame #0: 0x00007ffff0d1cfeb liblldb.so.40`lldb_private::Log::VAPrintf(this=0x000000000059a560, format="ProcessLaunchInfo::%s target stdin='%s', target stdout='%s', stderr='%s'", args=0x00007fffffffc1a0) + 27 at Log.cpp:73
   70  	void Log::VAPrintf(const char *format, va_list args) {
   71  	  // Make a copy of our stream shared pointer in case someone disables our
   72  	  // log while we are logging and releases the stream
-> 73  	  StreamSP stream_sp(m_stream_sp);
   74  	  if (stream_sp) {
   75  	    static uint32_t g_sequence_id = 0;
   76  	    StreamString header;
(lldb) bt
* thread #1: tid = 87438, 0x00007ffff0d1cfeb liblldb.so.40`lldb_private::Log::VAPrintf(this=0x000000000059a560, format="ProcessLaunchInfo::%s target stdin='%s', target stdout='%s', stderr='%s'", args=0x00007fffffffc1a0) + 27 at Log.cpp:73, name = 'lldb', stop reason = breakpoint 1.1
  * frame #0: 0x00007ffff0d1cfeb liblldb.so.40`lldb_private::Log::VAPrintf(this=0x000000000059a560, format="ProcessLaunchInfo::%s target stdin='%s', target stdout='%s', stderr='%s'", args=0x00007fffffffc1a0) + 27 at Log.cpp:73
    frame #1: 0x00007ffff0d1cfbb liblldb.so.40`lldb_private::Log::Printf(this=0x000000000059a560, format="ProcessLaunchInfo::%s target stdin='%s', target stdout='%s', stderr='%s'") + 347 at Log.cpp:61
    frame #2: 0x00007ffff0fff970 liblldb.so.40`lldb_private::ProcessLaunchInfo::FinalizeFileActions(this=0x00000000004a0170, target=0x000000000058bea0, default_to_use_pty=true) + 1376 at ProcessLaunchInfo.cpp:260
    frame #3: 0x00007ffff103ad63 liblldb.so.40`lldb_private::Target::Launch(this=0x000000000058bea0, launch_info=0x00000000004a0170, stream=0x00007fffffffcaa8) + 1315 at Target.cpp:2835
    frame #4: 0x00007ffff15c4f9f liblldb.so.40`CommandObjectProcessLaunch::DoExecute(this=0x00000000004a0030, launch_args=0x00007fffffffcd68, result=0x00007fffffffd4d8) + 1647 at CommandObjectProcess.cpp:233
    frame #5: 0x00007ffff0e33fe0 liblldb.so.40`lldb_private::CommandObjectParsed::Execute(this=0x00000000004a0030, args_string="", result=0x00007fffffffd4d8) + 1344 at CommandObject.cpp:1008
    frame #6: 0x00007ffff0e20317 liblldb.so.40`lldb_private::CommandInterpreter::HandleCommand(this=0x0000000000482750, command_line="pr la", lazy_add_to_history=eLazyBoolCalculate, result=0x00007fffffffd4d8, override_context=0x0000000000000000, repeat_on_empty_command=true, no_context_switching=false) + 3527 at CommandInterpreter.cpp:1679
    frame #7: 0x00007ffff0e25c0d liblldb.so.40`lldb_private::CommandInterpreter::IOHandlerInputComplete(this=0x0000000000482750, io_handler=0x0000000000583190, line="pr la") + 301 at CommandInterpreter.cpp:2713
    frame #8: 0x00007ffff0e265a7 liblldb.so.40`non-virtual thunk to lldb_private::CommandInterpreter::IOHandlerInputComplete(this=0x0000000000482750, io_handler=0x0000000000583190, line="pr la") + 55 at CommandInterpreter.cpp:2689
    frame #9: 0x00007ffff0cf5a60 liblldb.so.40`lldb_private::IOHandlerEditline::Run(this=0x0000000000583190) + 608 at IOHandler.cpp:552
    frame #10: 0x00007ffff0cb5c3c liblldb.so.40`lldb_private::Debugger::ExecuteIOHandlers(this=0x0000000000480e90) + 140 at Debugger.cpp:899
    frame #11: 0x00007ffff0e26fdf liblldb.so.40`lldb_private::CommandInterpreter::RunCommandInterpreter(this=0x0000000000482750, auto_handle_events=true, spawn_thread=false, options=0x00007fffffffd7d8) + 223 at CommandInterpreter.cpp:2910
    frame #12: 0x00007ffff09d8d21 liblldb.so.40`lldb::SBDebugger::RunCommandInterpreter(this=0x00007fffffffdb48, auto_handle_events=true, spawn_thread=false) + 129 at SBDebugger.cpp:814
    frame #13: 0x0000000000407c15 lldb`Driver::MainLoop(this=0x00007fffffffdb28) + 4341 at Driver.cpp:1156
    frame #14: 0x0000000000408257 lldb`main(argc=2, argv=0x00007fffffffdd28) + 471 at Driver.cpp:1254
    frame #15: 0x00007fffeeeb8f45 libc.so.6`__libc_start_main(main=(lldb`main at Driver.cpp:1218), argc=2, argv=0x00007fffffffdd28, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffdd18) + 245 at libc-start.c:287
    frame #16: 0x00000000004038d9 lldb`_start + 41

The snippets you showed are pretty-much expected behavior. The backtrace that gets printed as a part of the log messages is coming from the backtrace(3) library, which has pretty limited backtracing capabilities -- it only looks at the symbols in the .dynsym section (because that's the only thing that is loaded into memory). I am pretty sure you would have problems backtracing if you compiled with -fomit-frame-pointer as well.

I think you fix of just making sure that all symbols show up in the .dynsym section would be fine, were it not for that fact that we have an inconsistent linking policy, which means you get wildly different links depending on whether you export something or not. If we get that straight that this would just be a size optimization (that was the reason we introduced it), with no impact on the behavior, that we could even consider turning off by default (I personally have never used the -S option of logging, but I can see how it could be useful in some workflows). I'll get back to the link policy in a separate email.

That said, you mentioned you also had some problems with setting breakpoints and stuff. Now, if this was true, that would be extremely worrying, but am not able to reproduce that on my side -- breakpoints everything else works fine. The reason for that is that lldb reads the .symtab section (if it is present, which it will be unless you strip the executable), and these linker options do not affect that section. If that is not the case, then we definitely need to figure that out (I suggest stepping through ObjectFileELF::GetSymtab() to get an initial idea of what is going on).

FWIW, this is how a debug session looks like for me. The backtrace() output has addresses only, but lldb is able to symbolicate it with no problem:

$ bin/lldb -- bin/lldb /bin/ls
(lldb) target create "bin/lldb"
Current executable set to 'bin/lldb' (x86_64).
(lldb) settings set -- target.run-args  "/bin/ls"
(lldb) br set -n Log::VAPrintf
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) pr la
Process 87438 launched: '/usr/local/google/home/labath/ll/build/dbg/bin/lldb' (x86_64)
1 location added to breakpoint 1
Process 87438 stopped and restarted: thread 1 received signal: SIGCHLD
(lldb) target create "/bin/ls"
Current executable set to '/bin/ls' (x86_64).
(lldb) log enable -S lldb process
(lldb) pr la
Process 87438 stopped
* thread #1: tid = 87438, 0x00007ffff0d1cfeb liblldb.so.40`lldb_private::Log::VAPrintf(this=0x000000000059a560, format="ProcessLaunchInfo::%s at least one of stdin/stdout/stderr was not set, evaluating default handling", args=0x00007fffffffc1a0) + 27 at Log.cpp:73, name = 'lldb', stop reason = breakpoint 1.1
    frame #0: 0x00007ffff0d1cfeb liblldb.so.40`lldb_private::Log::VAPrintf(this=0x000000000059a560, format="ProcessLaunchInfo::%s at least one of stdin/stdout/stderr was not set, evaluating default handling", args=0x00007fffffffc1a0) + 27 at Log.cpp:73
   70  	void Log::VAPrintf(const char *format, va_list args) {
   71  	  // Make a copy of our stream shared pointer in case someone disables our
   72  	  // log while we are logging and releases the stream
-> 73  	  StreamSP stream_sp(m_stream_sp);
   74  	  if (stream_sp) {
   75  	    static uint32_t g_sequence_id = 0;
   76  	    StreamString header;
(lldb) c
Process 87438 resuming
ProcessLaunchInfo::FinalizeFileActions at least one of stdin/stdout/stderr was not set, evaluating default handling
0  liblldb.so.40 0x00007ffff4d5a36f
1  liblldb.so.40 0x00007ffff0d1d49f
2  liblldb.so.40 0x00007ffff0d1cfbb
3  liblldb.so.40 0x00007ffff0fff4c4
4  liblldb.so.40 0x00007ffff103ad63
5  liblldb.so.40 0x00007ffff15c4f9f
6  liblldb.so.40 0x00007ffff0e33fe0
7  liblldb.so.40 0x00007ffff0e20317
8  liblldb.so.40 0x00007ffff0e25c0d
9  liblldb.so.40 0x00007ffff0e265a7
10 liblldb.so.40 0x00007ffff0cf5a60
11 liblldb.so.40 0x00007ffff0cb5c3c
12 liblldb.so.40 0x00007ffff0e26fdf
13 liblldb.so.40 0x00007ffff09d8d21 lldb::SBDebugger::RunCommandInterpreter(bool, bool) + 129
14 lldb          0x0000000000407c15
15 lldb          0x0000000000408257
16 libc.so.6     0x00007fffeeeb8f45 __libc_start_main + 245
17 lldb          0x00000000004038d9
Process 87438 stopped
* thread #1: tid = 87438, 0x00007ffff0d1cfeb liblldb.so.40`lldb_private::Log::VAPrintf(this=0x000000000059a560, format="ProcessLaunchInfo::%s target stdin='%s', target stdout='%s', stderr='%s'", args=0x00007fffffffc1a0) + 27 at Log.cpp:73, name = 'lldb', stop reason = breakpoint 1.1
    frame #0: 0x00007ffff0d1cfeb liblldb.so.40`lldb_private::Log::VAPrintf(this=0x000000000059a560, format="ProcessLaunchInfo::%s target stdin='%s', target stdout='%s', stderr='%s'", args=0x00007fffffffc1a0) + 27 at Log.cpp:73
   70  	void Log::VAPrintf(const char *format, va_list args) {
   71  	  // Make a copy of our stream shared pointer in case someone disables our
   72  	  // log while we are logging and releases the stream
-> 73  	  StreamSP stream_sp(m_stream_sp);
   74  	  if (stream_sp) {
   75  	    static uint32_t g_sequence_id = 0;
   76  	    StreamString header;
(lldb) bt
* thread #1: tid = 87438, 0x00007ffff0d1cfeb liblldb.so.40`lldb_private::Log::VAPrintf(this=0x000000000059a560, format="ProcessLaunchInfo::%s target stdin='%s', target stdout='%s', stderr='%s'", args=0x00007fffffffc1a0) + 27 at Log.cpp:73, name = 'lldb', stop reason = breakpoint 1.1
  * frame #0: 0x00007ffff0d1cfeb liblldb.so.40`lldb_private::Log::VAPrintf(this=0x000000000059a560, format="ProcessLaunchInfo::%s target stdin='%s', target stdout='%s', stderr='%s'", args=0x00007fffffffc1a0) + 27 at Log.cpp:73
    frame #1: 0x00007ffff0d1cfbb liblldb.so.40`lldb_private::Log::Printf(this=0x000000000059a560, format="ProcessLaunchInfo::%s target stdin='%s', target stdout='%s', stderr='%s'") + 347 at Log.cpp:61
    frame #2: 0x00007ffff0fff970 liblldb.so.40`lldb_private::ProcessLaunchInfo::FinalizeFileActions(this=0x00000000004a0170, target=0x000000000058bea0, default_to_use_pty=true) + 1376 at ProcessLaunchInfo.cpp:260
    frame #3: 0x00007ffff103ad63 liblldb.so.40`lldb_private::Target::Launch(this=0x000000000058bea0, launch_info=0x00000000004a0170, stream=0x00007fffffffcaa8) + 1315 at Target.cpp:2835
    frame #4: 0x00007ffff15c4f9f liblldb.so.40`CommandObjectProcessLaunch::DoExecute(this=0x00000000004a0030, launch_args=0x00007fffffffcd68, result=0x00007fffffffd4d8) + 1647 at CommandObjectProcess.cpp:233
    frame #5: 0x00007ffff0e33fe0 liblldb.so.40`lldb_private::CommandObjectParsed::Execute(this=0x00000000004a0030, args_string="", result=0x00007fffffffd4d8) + 1344 at CommandObject.cpp:1008
    frame #6: 0x00007ffff0e20317 liblldb.so.40`lldb_private::CommandInterpreter::HandleCommand(this=0x0000000000482750, command_line="pr la", lazy_add_to_history=eLazyBoolCalculate, result=0x00007fffffffd4d8, override_context=0x0000000000000000, repeat_on_empty_command=true, no_context_switching=false) + 3527 at CommandInterpreter.cpp:1679
    frame #7: 0x00007ffff0e25c0d liblldb.so.40`lldb_private::CommandInterpreter::IOHandlerInputComplete(this=0x0000000000482750, io_handler=0x0000000000583190, line="pr la") + 301 at CommandInterpreter.cpp:2713
    frame #8: 0x00007ffff0e265a7 liblldb.so.40`non-virtual thunk to lldb_private::CommandInterpreter::IOHandlerInputComplete(this=0x0000000000482750, io_handler=0x0000000000583190, line="pr la") + 55 at CommandInterpreter.cpp:2689
    frame #9: 0x00007ffff0cf5a60 liblldb.so.40`lldb_private::IOHandlerEditline::Run(this=0x0000000000583190) + 608 at IOHandler.cpp:552
    frame #10: 0x00007ffff0cb5c3c liblldb.so.40`lldb_private::Debugger::ExecuteIOHandlers(this=0x0000000000480e90) + 140 at Debugger.cpp:899
    frame #11: 0x00007ffff0e26fdf liblldb.so.40`lldb_private::CommandInterpreter::RunCommandInterpreter(this=0x0000000000482750, auto_handle_events=true, spawn_thread=false, options=0x00007fffffffd7d8) + 223 at CommandInterpreter.cpp:2910
    frame #12: 0x00007ffff09d8d21 liblldb.so.40`lldb::SBDebugger::RunCommandInterpreter(this=0x00007fffffffdb48, auto_handle_events=true, spawn_thread=false) + 129 at SBDebugger.cpp:814
    frame #13: 0x0000000000407c15 lldb`Driver::MainLoop(this=0x00007fffffffdb28) + 4341 at Driver.cpp:1156
    frame #14: 0x0000000000408257 lldb`main(argc=2, argv=0x00007fffffffdd28) + 471 at Driver.cpp:1254
    frame #15: 0x00007fffeeeb8f45 libc.so.6`__libc_start_main(main=(lldb`main at Driver.cpp:1218), argc=2, argv=0x00007fffffffdd28, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffdd18) + 245 at libc-start.c:287
    frame #16: 0x00000000004038d9 lldb`_start + 41

Okay, thanks for the details on that.

At the time, both the log backtrace functionality (via backtrace(3) as you mentioned) and the actual backtracing were suffering IIRC. I wouldn't be surprised if the latter was fixed in the last year given the number of symbolic lookup changes that took place. I may also be mistaken on that. I'll have to go back and find the particulars. I may need to go back to the old version (circa August-Oct 2015) to see what it looked like back then. It was pretty much a show stopper for getting any debugging of lldb to work on Linux.

I'm glad to hear backtracing in general is working.

FWIW on macOS and related platforms, our atos and backtrace(3)-like functionality do not require public symbols to work in the presence of debug info. It is unfortunate, from a feature-parity perspective, that we require exporting private symbols on Linux to get similar functionality on the macOS platform side, at least as far as log stacktraces go. Now that I think of it, I recall digging into that a bit back then, and musing that we might be able to replace the backtrace(3) strategy for stacktraces with our own backtracing logic.