This is an archive of the discontinued LLVM Phabricator instance.

[lldb-server/linux] Use waitpid(-1) to collect inferior events
ClosedPublic

Authored by labath on Mar 27 2023, 8:48 AM.

Details

Summary

This is a follow-up to D116372, which had a rather unfortunate side
effect of making the processing of a single SIGCHLD quadratic in the
number of threads -- which does not matter for simple applications, but
can get really bad for applications with thousands of threads.

This patch fixes the problem by implementing the other possibility
mentioned in the first patch -- doing waitpid(-1) centrally and then
routing the events to the correct process instance. The "uncollected"
threads are held in the process factory class -- which I've renamed to
Manager for this purpose, as it now does more than creating processes.

Diff Detail

Event Timeline

labath created this revision.Mar 27 2023, 8:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2023, 8:48 AM
Herald added a subscriber: emaste. · View Herald Transcript
labath requested review of this revision.Mar 27 2023, 8:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2023, 8:48 AM
yinghuitan added inline comments.Mar 27 2023, 3:16 PM
lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
376–386

Maybe I misunderstood the purpose -- the code stores pid into m_unowned_threads for later collection if none of the process handle it. Instead, I thought we want to store into m_unowned_threads if at least one process did not handle it?

Can you add some comment to clarify the intention? Thanks.

538

Can you add back the original comment which I find useful? Thanks

// The process exited.  We're done monitoring.  Report to delegate.
lldb/source/Plugins/Process/Linux/NativeProcessLinux.h
70–72

I assume no lock is needed?

72

Add a comment for this field.

labath marked 2 inline comments as done.Mar 28 2023, 4:05 AM
labath added inline comments.
lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
376–386

The code is correct. Each event should be handled by exactly one process, but it may not happen immediately (if the process doesn't know about the thread). I've added some comments to clarify.

538

Yeah sure. It got lost because I changed the code a couple of times while working on it.

lldb/source/Plugins/Process/Linux/NativeProcessLinux.h
70–72

Yes. This is all single-threaded.

labath updated this revision to Diff 508957.Mar 28 2023, 4:05 AM
labath marked an inline comment as done.

Address review comments

yinghuitan accepted this revision.Mar 28 2023, 9:27 AM

LGTM.

I have patched and verified this diff indeed fixed the performance issue in our internal service with 3000~4000 threads. Thanks for fixing it!

This revision is now accepted and ready to land.Mar 28 2023, 9:27 AM

@labath I believe this broke the lldb bots, because the declaration of some virtual functions had the const qualifier removed, but not the definition:

/Users/buildslave/jenkins/workspace/lldb-cmake@2/llvm-project/lldb/tools/lldb-server/lldb-gdbserver.cpp:79:36: error: non-virtual member function marked 'override' hides virtual member function
         MainLoop &mainloop) const override {
                                   ^
/Users/buildslave/jenkins/workspace/lldb-cmake@2/llvm-project/lldb/include/lldb/Host/common/NativeProcessProtocol.h:302:5: note: hidden overloaded virtual function 'lldb_private::NativeProcessProtocol::Manager::Launch' declared here: different number of parameters (2 vs 3)
    Launch(ProcessLaunchInfo &launch_info,
    ^
/Users/buildslave/jenkins/workspace/lldb-cmake@2/llvm-project/lldb/tools/lldb-server/lldb-gdbserver.cpp:84:36: error: non-virtual member function marked 'override' hides virtual member function
         MainLoop &mainloop) const override {
                                   ^
/Users/buildslave/jenkins/workspace/lldb-cmake@2/llvm-project/lldb/include/lldb/Host/common/NativeProcessProtocol.h:324:5: note: hidden overloaded virtual function 'lldb_private::NativeProcessProtocol::Manager::Attach' declared here: different number of parameters (2 vs 3)
    Attach(lldb::pid_t pid, NativeDelegate &native_delegate) = 0;
    ^
/Users/buildslave/jenkins/workspace/lldb-cmake@2/llvm-project/lldb/tools/lldb-server/lldb-gdbserver.cpp:434:24: error: variable type '(anonymous namespace)::NativeProcessManager' is an abstract class
  NativeProcessManager manager(mainloop);
                       ^
/Users/buildslave/jenkins/workspace/lldb-cmake@2/llvm-project/lldb/include/lldb/Host/common/NativeProcessProtocol.h:302:5: note: unimplemented pure virtual method 'Launch' in 'NativeProcessManager'
    Launch(ProcessLaunchInfo &launch_info,
    ^
/Users/buildslave/jenkins/workspace/lldb-cmake@2/llvm-project/lldb/include/lldb/Host/common/NativeProcessProtocol.h:324:5: note: unimplemented pure virtual method 'Attach' in 'NativeProcessManager'
    Attach(lldb::pid_t pid, NativeDelegate &native_delegate) = 0;

Could you have a look? https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/53015/console

thakis added a subscriber: thakis.Mar 30 2023, 8:49 AM

Looks like this breaks building on Mac: http://45.33.8.238/macm1/57676/step_4.txt

Please take a look and revert for now if it takes a while to fix.

Thanks Jonas.