Page MenuHomePhabricator

Fix for lldb-mi crash in Listener code if -exec-abort MI command was invoked without getting process stopped
Needs RevisionPublic

Authored by KLapshin on Sep 18 2015, 8:54 AM.

Details

Summary

This patch fixes lldb core crash in Listener waiting for process ended and operates with already invalid data if lldb-mi requested process destroy in -exec-abort command handler without getting process stopped at first.

To avoid crash in existing code MI user should do process stop explicitly:

-exec-interrupt
-exec-abort

After investigation it was revealed what cause was in Process::HaltForDestroyOrDetach() - WaitForProcessToStop() used improperly - without hijacked listener created. Default listener argument is NULL (see WaitForProcessToStop default params). So initial workaround patch in lldb-MI module -exec-abort handler not needed anymore with patched Target\Process.cpp:

-exec-run
....
-exec-abort
^done
(gdb)
=thread-exited,id="1",group-id="i1"
=thread-exited,id="2",group-id="i1"
=thread-exited,id="3",group-id="i1"
=thread-exited,id="4",group-id="i1"
=thread-exited,id="5",group-id="i1"
=thread-exited,id="6",group-id="i1"
(gdb)
(gdb)
=thread-exited,id="1",group-id="i1"
=thread-group-exited,id="i1",exit-code="0"
*stopped,reason="exited-normally"
(gdb)
-gdb-exit
^exit
=thread-group-exited,id="i1"
(gdb)

Backtrace corresponding to crash if process running and MI user invoked -exec-abort without getting target stop:

Process:               lldb-mi-3.8.0 [13039]
Path:                  /Users/USER/*/lldb-mi-3.8.0
Identifier:            lldb-mi-3.8.0
Version:               0
Code Type:             X86-64 (Native)
Parent Process:        ??? [13038]
Responsible:           lldb-mi-3.8.0 [13039]
User ID:               501

Date/Time:             2015-07-24 15:08:07.807 +0300
OS Version:            Mac OS X 10.11 (15A226f)
Report Version:        11

Time Awake Since Boot: 170000 seconds

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000008200000092
Exception Note:        EXC_CORPSE_NOTIFY

VM Regions Near 0x8200000092:
    Process Corpse Info    0000000135abc000-0000000135cbc000 [ 2048K] rw-/rwx SM=COW  
--> 
    STACK GUARD            0000700000000000-0000700000001000 [    4K] ---/rwx SM=NUL  stack guard for thread 1

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   liblldb.3.8.0.dylib           	0x0000000104b952d4 EventMatcher::operator()(std::__1::shared_ptr<lldb_private::Event> const&) const + 52 (memory:3954)
1   liblldb.3.8.0.dylib           	0x0000000104b90dc4 lldb_private::Listener::FindNextEventInternal(lldb_private::Broadcaster*, lldb_private::ConstString const*, unsigned int, unsigned int, std::__1::shared_ptr<lldb_private::Event>&, bool) + 1188 (algorithm:878)
2   liblldb.3.8.0.dylib           	0x0000000104b9173a lldb_private::Listener::GetNextEventInternal(lldb_private::Broadcaster*, lldb_private::ConstString const*, unsigned int, unsigned int, std::__1::shared_ptr<lldb_private::Event>&) + 74 (Listener.cpp:373)
3   liblldb.3.8.0.dylib           	0x0000000104b91a5b lldb_private::Listener::WaitForEventsInternal(lldb_private::TimeValue const*, lldb_private::Broadcaster*, lldb_private::ConstString const*, unsigned int, unsigned int, std::__1::shared_ptr<lldb_private::Event>&) + 571 (Listener.cpp:419)
4   liblldb.3.8.0.dylib           	0x0000000104b91ee3 lldb_private::Listener::WaitForEventForBroadcasterWithType(lldb_private::TimeValue const*, lldb_private::Broadcaster*, unsigned int, std::__1::shared_ptr<lldb_private::Event>&) + 83 (Listener.cpp:469)
5   liblldb.3.8.0.dylib           	0x0000000104f8391a lldb_private::Process::WaitForStateChangedEvents(lldb_private::TimeValue const*, std::__1::shared_ptr<lldb_private::Event>&, lldb_private::Listener*) + 202 (Process.cpp:1328)
6   liblldb.3.8.0.dylib           	0x0000000104f834cc lldb_private::Process::WaitForProcessToStop(lldb_private::TimeValue const*, std::__1::shared_ptr<lldb_private::Event>*, bool, lldb_private::Listener*, lldb_private::Stream*) + 908 (Process.cpp:1012)
7   liblldb.3.8.0.dylib           	0x0000000104f93acd lldb_private::Process::HaltForDestroyOrDetach(std::__1::shared_ptr<lldb_private::Event>&) + 333 (Process.cpp:3936)
8   liblldb.3.8.0.dylib           	0x0000000104f820e2 lldb_private::Process::Destroy(bool) + 322 (Process.cpp:4060)
9   liblldb.3.8.0.dylib           	0x00000001031e9e49 lldb::SBProcess::Destroy() + 169 (SBProcess.cpp:793)
10  lldb-mi-3.8.0                 	0x0000000102f89539 CMICmdCmdExecAbort::Execute() + 777 (MICmdCmdExec.cpp:1249)
11  lldb-mi-3.8.0                 	0x0000000102f898cc non-virtual thunk to CMICmdCmdExecAbort::Execute() + 28 (MICmdCmdExec.cpp:1239)
12  lldb-mi-3.8.0                 	0x0000000102fcec0f CMICmdInvoker::CmdExecute(CMICmdBase&) + 255 (MICmdInvoker.cpp:204)
13  lldb-mi-3.8.0                 	0x0000000102fd3b55 CMICmdMgr::CmdExecute(SMICmdData const&) + 805 (MICmdMgr.cpp:199)
14  lldb-mi-3.8.0                 	0x0000000103023326 CMIDriver::ExecuteCommand(SMICmdData const&) + 38 (MIDriver.cpp:990)
15  lldb-mi-3.8.0                 	0x00000001030212ea CMIDriver::InterpretCommandThisDriver(CMIUtilString const&, bool&) + 234 (MIDriver.cpp:942)
16  lldb-mi-3.8.0                 	0x00000001030208ee CMIDriver::InterpretCommand(CMIUtilString const&) + 62 (MIDriver.cpp:828)
17  lldb-mi-3.8.0                 	0x000000010301f7c7 CMIDriver::DoMainLoop() + 1431 (MIDriver.cpp:575)
18  lldb-mi-3.8.0                 	0x00000001030209cc non-virtual thunk to CMIDriver::DoMainLoop() + 28 (MIDriver.cpp:526)
19  lldb-mi-3.8.0                 	0x000000010302e094 CMIDriverMgr::DriverMainLoop() + 68 (MIDriverMgr.cpp:294)
20  lldb-mi-3.8.0                 	0x000000010302bf9d main + 237 (MIDriverMain.cpp:174)
21  libdyld.dylib                 	0x00007fff8d93a5ad start + 1
...

Diff Detail

Repository
rL LLVM

Event Timeline

KLapshin updated this revision to Diff 35089.Sep 18 2015, 8:54 AM
KLapshin retitled this revision from to Fix for lldb-mi crash in Listener code if -exec-abort MI command was invoked without getting process stopped.
KLapshin updated this object.
KLapshin added reviewers: dawn, ki.stfu, abidh.
KLapshin set the repository for this revision to rL LLVM.
KLapshin added a subscriber: lldb-commits.
KLapshin updated this object.Sep 18 2015, 9:13 AM
KLapshin updated this object.
KLapshin updated this object.Sep 18 2015, 9:15 AM
ki.stfu edited edge metadata.EditedSep 18 2015, 10:26 AM
ki.stfu added a subscriber: clayborg.

For me it looks like a workaround because here is assumed the SBProcess::Destroy will do all required work.

@clayborg, is it permissible to do Process::Destroy(force_kill=false) for a running process?

If so, it's a bug in lldb core, or we use SBProcess by the wrong way.

You should be able to just call Destroy(), we shouldn't crash. We will need to fix the core it allow this since people can/will call this.

To work around it you might be able to call SBProcess::Halt() first?

dawn resigned from this revision.Sep 18 2015, 12:46 PM
dawn edited reviewers, added: clayborg; removed: dawn.

I'll watch and learn from Ilia and Greg :)

KLapshin updated this revision to Diff 35390.Sep 22 2015, 10:30 AM
KLapshin updated this object.
KLapshin edited edge metadata.
KLapshin updated this object.
KLapshin updated this revision to Diff 35392.EditedSep 22 2015, 10:39 AM

@clayborg, I reworked initial workaround solution, now this is exact fix.

Due to @labath reworked and replaced HaltForDestroyOrDetach to StopHaltForDestroyOrDetach method (see http://reviews.llvm.org/D13056) and his patch already approved by @clayborg and crash still reproducible with just race condition fix patch from @labath I suggest to apply this patch first, then patch for race condition - merge should be fine.

ki.stfu added a subscriber: jingham.

+@jingham for changes in Target.

source/Target/Process.cpp
3934 ↗(On Diff #35392)

btw: please update this line when Process::HaltForDestroyOrDetach will be renamed to StopForDestroyOrDetach

labath added inline comments.Sep 23 2015, 3:23 AM
source/Target/Process.cpp
3934 ↗(On Diff #35392)

The renaming patch has landed in r248371.

I am not sure about this, as I don't understand the full context of this change, so take this comment with a grain of salt. However, this seems to be introducing a race here... If I understand correctly, you are trying to make sure that someone else (sitting in another thread, waiting for public process events) does not snatch the process stop event from under you. If that is the case then this can still happen if the process stops very quickly (i.e., the event is broadcast after SendAsyncInterrupt(), but before HijackProcessEvents()). To avoid this, I think you should hijack the events before you interrupt the process, possibly even before StopForDestroyOrDetach is called (to handle the case when the process stops naturally just as you are about to kill it). Have you figured out what is the other thread doing that is causing the crash (i.e., where is the memory WaitForProcessToStop is referencing freed?)

I'd be interested to hear what Greg and Jim have to say about this..

KLapshin updated this revision to Diff 35491.Sep 23 2015, 6:55 AM
KLapshin removed rL LLVM as the repository for this revision.
KLapshin added a comment.EditedSep 23 2015, 7:06 AM

@labath,

I updated patch against actual source in SVN - i.e. with taking into account your change (Halt*->Stop*).

Regarding matter of this patch - I just tried to make process stop synchronous, see Process::ResumeSynchronous() - same technique was applied months ago (@ki.stfu).

Agreed what crash may be related to freed memory reuse, but not investigated it yet in deep.

KLapshin set the repository for this revision to rL LLVM.Sep 23 2015, 7:07 AM
KLapshin marked an inline comment as done.Sep 23 2015, 7:11 AM

Done.

Regarding matter of this patch - I just tried to make process stop synchronous, see Process::ResumeSynchronous() - same technique was applied months ago (@ki.stfu).

Agreed what crash may be related to freed memory reuse, but not investigated it yet in deep.

Fair enough, your patch definitely makes things better. I'm just trying to understand the root cause so we can fix this definitively.

If you look at ResumeSynchronous, you see that the hijacking happens there before we attempt to resume the process (PrivateResume()). This makes it race-free because we grab the events before it gets a chance to generate any. In your case, things are a bit more complicated, since the process is already running and can come to a stop at any moment, and we need to make sure we don't miss those events.

Do you have the ability to run lldb-mi under valgrind or msan? If you can, I'd be interested in taking a look at the output they produce in this case. Or if you have a simple repro case, I can try to do it myself.

pl

Regarding matter of this patch - I just tried to make process stop synchronous, see Process::ResumeSynchronous() - same technique was applied months ago (@ki.stfu).

Agreed what crash may be related to freed memory reuse, but not investigated it yet in deep.

Fair enough, your patch definitely makes things better. I'm just trying to understand the root cause so we can fix this definitively.

If you look at ResumeSynchronous, you see that the hijacking happens there before we attempt to resume the process (PrivateResume()). This makes it race-free because we grab the events before it gets a chance to generate any. In your case, things are a bit more complicated, since the process is already running and can come to a stop at any moment, and we need to make sure we don't miss those events.

Do you have the ability to run lldb-mi under valgrind or msan? If you can, I'd be interested in taking a look at the output they produce in this case. Or if you have a simple repro case, I can try to do it myself.

pl

@labath,

No, I didn't tried to use valgrind yet, as testcase you can create some Cocoa app in Xcode (actually Empty Form template is enough - just to make sure what app will not ended by itself), then start debug session on iOS device or OSX using lldb-mi (i.e. - via MI) and just do -exec-run, then do -exec-abort while app running - without this patch you will be able to reproduce crash easily. I checked if crash may be reproduced with your race condition patch - yes, reproducible.

clayborg requested changes to this revision.Sep 23 2015, 10:53 AM
clayborg edited edge metadata.

If you are going to hijack you do need to do it before you call Halt().

This revision now requires changes to proceed.Sep 23 2015, 10:53 AM

@clayborg, @labath,

After more deep investigation I think setting listener for hijacking also looks like workaround. Setting additional listener just change code flow path and buggy path not executed, thus no crash. At top level - crash appeared in Process::m_listener involved - as no hijacked listener was set in Destroy().

See code below:

StateType
Process::WaitForProcessToStop (const TimeValue *timeout,
                               EventSP *event_sp_ptr,
                               bool wait_always,
                               Listener *hijack_listener,
                               Stream *stream)
{
    // We can't just wait for a "stopped" event, because the stopped event may have restarted the target.
    // We have to actually check each event, and in the case of a stopped event check the restarted flag
    // on the event.
    if (event_sp_ptr)
        event_sp_ptr->reset();
    StateType state = GetState();
    // If we are exited or detached, we won't ever get back to any
    // other valid state...
    if (state == eStateDetached || state == eStateExited)
        return state;

    Log *log(lldb_private::GetLogIfAllCategoriesSet (LIBLLDB_LOG_PROCESS));
    if (log)
        log->Printf ("Process::%s (timeout = %p)", __FUNCTION__,
                     static_cast<const void*>(timeout));

    if (!wait_always &&
        StateIsStoppedState(state, true) &&
        StateIsStoppedState(GetPrivateState(), true))
    {
        if (log)
            log->Printf("Process::%s returning without waiting for events; process private and public states are already 'stopped'.",
                        __FUNCTION__);
        // We need to toggle the run lock as this won't get done in
        // SetPublicState() if the process is hijacked.
        if (hijack_listener)
            m_public_run_lock.SetStopped();
        return state;
    }

    while (state != eStateInvalid)
    {
        EventSP event_sp;
        state = WaitForStateChangedEvents (timeout, event_sp, hijack_listener);  <---
        if (event_sp_ptr && event_sp)
            *event_sp_ptr = event_sp;
....
StateType
Process::WaitForStateChangedEvents (const TimeValue *timeout, EventSP &event_sp, Listener *hijack_listener)
{
    Log *log(lldb_private::GetLogIfAllCategoriesSet (LIBLLDB_LOG_PROCESS));

    if (log)
        log->Printf ("Process::%s (timeout = %p, event_sp)...", __FUNCTION__,
                     static_cast<const void*>(timeout));

    Listener *listener = hijack_listener;
    if (listener == NULL)
        listener = &m_listener;  <--- what if m_listener was set as unitialized or deallocated Listener instance, "dummy" listener ?

    StateType state = eStateInvalid;
    if (listener->WaitForEventForBroadcasterWithType (timeout,
                                                      this,
                                                      eBroadcastBitStateChanged | eBroadcastBitInterrupt,
                                                      event_sp))
    {
        if (event_sp && event_sp->GetType() == eBroadcastBitStateChanged)
            state = Process::ProcessEventData::GetStateFromEvent(event_sp.get());
        else if (log)
            log->Printf ("Process::%s got no event or was interrupted.", __FUNCTION__);
    }
...

You can reproduce crash even without lldb-MI driver - just with remote iOS target and "process launch", then "process kill".

Corresponding log ("log enable lldb events"):

(lldb) process launch
...
Process::ShouldBroadcastEvent (0x7ff1fa4a1260) => new state: running, last broadcast state: running - NO
0x7ff1fad1b110 Listener::WaitForEventsInternal (timeout = { 0x0 }) for lldb.process.internal_state_listener
(lldb) process kill
0x7ff1fad1afa0 Broadcaster("lldb.process.internal_state_broadcaster")::BroadcastEvent (event_sp = {0x7ff1fce160d0 Event: broadcaster = 0x7ff1fad1afa0 (lldb.process.internal_state_broadcaster), type = 0x00000002, data = <NULL>}, unique =0) hijack = 0x0
0x7ff1fad1b110 Listener('lldb.process.internal_state_listener')::AddEvent (event_sp = {0x7ff1fce160d0})
0x7ff1fa48e300 Listener::WaitForEventsInternal (timeout = { 0x7fff57d47240 }) for  <-- WHAT ?! Listener without name - Process::m_listener may be ?
0x7ff1fad1b110 'lldb.process.internal_state_listener' Listener::FindNextEventInternal(broadcaster=0x0, broadcaster_names=0x0[0], event_type_mask=0x00000000, remove=1) event 0x7ff1fce160d0
0x7ff1fad1ae38 Broadcaster("lldb.process")::HijackBroadcaster (listener("lldb.process.halt_listener")=0x7ff1fe01ca00)
CRASH !

Will continue investigation.

emaste added a subscriber: emaste.Sep 28 2015, 8:49 AM
KLapshin updated this revision to Diff 36005.EditedSep 29 2015, 11:05 AM
KLapshin edited edge metadata.

I realized what m_events.empty check in Listener::FindNextEventInternal() method returned event collection is NOT empty - probably because list.empty() just compare begin != end pointers, but pointers may be absolutely invalid.

So, as another workaround - switch to m_events.size() == 0 check.

I know - this looks funny because, IN GENERAL CASE (depends on STL vendor implementation), list.size() has O(N) complexity - i.e. - linear and time consuming, but works in this case because it do real empty list check.

I consider latest diff at moment just as some hint.

Actually two threads may modify events collection - one associated with Process and another one with GDBProcess module.

Full log (process and events) below corresponding to m_events.empty replaced with m_events collection size checking:

0x7f9f3d535110 Listener::WaitForEventsInternal (timeout = { 0x0 }) for lldb.process.internal_state_listener
-exec-abort
0x7f9f3d534fa0 Broadcaster("lldb.process.internal_state_broadcaster")::HijackBroadcaster (listener("lldb.process.halt_listener")=0x7fff5ac24580)
Process::SetPrivateState (stopped)
0x7fff5ac24580 Listener::WaitForEventsInternal (timeout = { 0x7fff5ac244e0 }) for lldb.process.halt_listener
Process::SetPrivateState (stopped) stop_id = 16
0x7f9f3d534fa0 Broadcaster("lldb.process.internal_state_broadcaster")::BroadcastEvent (event_sp = {0x7f9f39f16ff0 Event: broadcaster = 0x7f9f3d534fa0 (lldb.process.internal_state_broadcaster), type = 0x00000001, data = { process = 0x7f9f3d534e00 (pid = 258), state = stopped}}, unique =0) hijack = 0x7fff5ac24580
0x7fff5ac24580 Listener('lldb.process.halt_listener')::AddEvent (event_sp = {0x7f9f39f16ff0})
0x7f9f3d536660 Listener::WaitForEventsInternal (timeout = { 0x0 }) for lldb.process.gdb-remote.async-listener
0x7fff5ac24580 'lldb.process.halt_listener' Listener::FindNextEventInternal(broadcaster=0x0, broadcaster_names=0x0[0], event_type_mask=0x00000000, remove=1) event 0x7f9f39f16ff0
0x7f9f3d534fa0 Broadcaster("lldb.process.internal_state_broadcaster")::RestoreBroadcaster (about to pop listener("lldb.process.halt_listener")=0x7fff5ac24580)
0x7f9f3d534fa0 Broadcaster("lldb.process.internal_state_broadcaster")::BroadcastEvent (event_sp = {0x7f9f39f16ff0 Event: broadcaster = 0x7f9f3d534fa0 (lldb.process.internal_state_broadcaster), type = 0x00000001, data = { process = 0x7f9f3d534e00 (pid = 258), state = stopped}}, unique =0) hijack = 0x0
0x7f9f3d535110 Listener('lldb.process.internal_state_listener')::AddEvent (event_sp = {0x7f9f39f16ff0})
0x7f9f3d535110 'lldb.process.internal_state_listener' Listener::FindNextEventInternal(broadcaster=0x0, broadcaster_names=0x0[0], event_type_mask=0x00000000, remove=1) event 0x7f9f39f16ff0
Process::StopForDestroyOrDetach() About to stop.
0x700000428840 Listener::StartListeningForEvents (broadcaster = 0x7f9f3d5356a0, mask = 0x00000020) acquired_mask = 0x00000020 for Communication::SyncronizeWithReadThread
0x7f9f3d534fa0 Broadcaster("lldb.process.internal_state_broadcaster")::BroadcastEvent (event_sp = {0x7f9f39e16fa0 Event: broadcaster = 0x7f9f3d534fa0 (lldb.process.internal_state_broadcaster), type = 0x00000002, data = <NULL>}, unique =0) hijack = 0x0
0x7f9f3d535110 Listener('lldb.process.internal_state_listener')::AddEvent (event_sp = {0x7f9f39e16fa0})
Process::WaitForProcessToStop (timeout = 0x7fff5ac25b40)
Process::WaitForStateChangedEvents (timeout = 0x7fff5ac25b40, event_sp)...
0x7f9f3b0ed920 Listener::WaitForEventsInternal (timeout = { 0x7fff5ac25b40 }) for
0x7f9f3b0ed920 Listener::WaitForEventsInternal() unknown error for
Process::WaitForStateChangedEvents (timeout = 0x7fff5ac25b40, event_sp) => invalid
Process::StopForDestroyOrDetach() failed to stop, state is: invalid
0x7f9f3d535db8 Broadcaster("gdb-remote.client")::HijackBroadcaster (listener("lldb.NotifyHijacker")=0x109d0bfa8)
Process::ShouldBroadcastEvent (0x7f9f39f16ff0) stopped due to an interrupt, state: stopped
Process::ShouldBroadcastEvent (0x7f9f39f16ff0) => new state: stopped, last broadcast state: stopped - YES
Process::HandlePrivateEvent (pid = 258) broadcasting new state stopped (old state running) to public
0x7f9f3d534e38 Broadcaster("lldb.process")::BroadcastEvent (event_sp = {0x7f9f39f16ff0 Event: broadcaster = 0x7f9f3d534e38 (lldb.process), type = 0x00000001 (state-changed), data = { process = 0x7f9f3d534e00 (pid = 258), state = stopped}}, unique =0) hijack = 0x0
0x7f9f3a058278 Listener('lldb.Debugger')::AddEvent (event_sp = {0x7f9f39f16ff0})
Process::WaitForEventsPrivate (timeout = 0x0, event_sp)...
0x7f9f3d535110 Listener::WaitForEventsInternal (timeout = { 0x0 }) for lldb.process.internal_state_listener
0x7f9f3d535110 'lldb.process.internal_state_listener' Listener::FindNextEventInternal(broadcaster=0x0, broadcaster_names=0x0[0], event_type_mask=0x00000000, remove=1) event 0x7f9f39e16fa0
0x7f9f3d534fa0 Broadcaster("lldb.process.internal_state_broadcaster")::HijackBroadcaster (listener("lldb.process.halt_listener")=0x700000428aa0)
0x700000428aa0 Listener::WaitForEventsInternal (timeout = { 0x700000428a00 }) for lldb.process.halt_listener
0x7f9f3a058278 'lldb.Debugger' Listener::FindNextEventInternal(broadcaster=0x0, broadcaster_names=0x0[0], event_type_mask=0x00000000, remove=1) event 0x7f9f39f16ff0
Process::SetPublicState (state = stopped, restarted = 0)
Process::SetPublicState (stopped) -- unlocking run lock
0x7f9f3d535db8 Broadcaster("gdb-remote.client")::RestoreBroadcaster (about to pop listener("lldb.NotifyHijacker")=0x109d0bfa8)
Process::SetExitStatus (status=9 (0x00000009), description="")
Process::SetPrivateState (exited)
Process::SetPrivateState (exited) stop_id = 17
0x7f9f3d534fa0 Broadcaster("lldb.process.internal_state_broadcaster")::BroadcastEvent (event_sp = {0x7f9f39f0e470 Event: broadcaster = 0x7f9f3d534fa0 (lldb.process.internal_state_broadcaster), type = 0x00000001, data = { process = 0x7f9f3d534e00 (pid = 258), state = exited}}, unique =0) hijack = 0x700000428aa0
0x700000428aa0 Listener('lldb.process.halt_listener')::AddEvent (event_sp = {0x7f9f39f0e470})
0x700000428aa0 'lldb.process.halt_listener' Listener::FindNextEventInternal(broadcaster=0x0, broadcaster_names=0x0[0], event_type_mask=0x00000000, remove=1) event 0x7f9f39f0e470
0x7f9f3d534fa0 Broadcaster("lldb.process.internal_state_broadcaster")::RestoreBroadcaster (about to pop listener("lldb.process.halt_listener")=0x700000428aa0)
0x7f9f3d534fa0 Broadcaster("lldb.process.internal_state_broadcaster")::BroadcastEvent (event_sp = {0x7f9f39f0e470 Event: broadcaster = 0x7f9f3d534fa0 (lldb.process.internal_state_broadcaster), type = 0x00000001, data = { process = 0x7f9f3d534e00 (pid = 258), state = exited}}, unique =0) hijack = 0x0
0x7f9f3d535110 Listener('lldb.process.internal_state_listener')::AddEvent (event_sp = {0x7f9f39f0e470})
0x7f9f3d5365a8 Broadcaster("lldb.process.gdb-remote.async-broadcaster")::BroadcastEvent (event_sp = {0x7f9f3b026b80 Event: broadcaster = 0x7f9f3d5365a8 (lldb.process.gdb-remote.async-broadcaster), type = 0x00000002 (async thread should exit), data = <NULL>}, unique =0) hijack = 0x0
Process::WaitForEventsPrivate (timeout = 0x0, event_sp)...
0x7f9f3d535110 Listener::WaitForEventsInternal (timeout = { 0x0 }) for lldb.process.internal_state_listener
0x7f9f3d535110 'lldb.process.internal_state_listener' Listener::FindNextEventInternal(broadcaster=0x0, broadcaster_names=0x0[0], event_type_mask=0x00000000, remove=1) event 0x7f9f39f0e470
0x700000428840 Listener::StartListeningForEvents (broadcaster = 0x7f9f3d5356a0, mask = 0x00000020) acquired_mask = 0x00000020 for Communication::SyncronizeWithReadThread
Process::ShouldBroadcastEvent (0x7f9f39f0e470) => new state: exited, last broadcast state: exited - YES
Process::HandlePrivateEvent (pid = 258) broadcasting new state exited (old state stopped) to public
0x7f9f3d536660 Listener('lldb.process.gdb-remote.async-listener')::AddEvent (event_sp = {0x7f9f3b026b80})
0x7f9f3d534e38 Broadcaster("lldb.process")::BroadcastEvent (event_sp = {0x7f9f39f0e470 Event: broadcaster = 0x7f9f3d534e38 (lldb.process), type = 0x00000001 (state-changed), data = { process = 0x7f9f3d534e00 (pid = 258), state = exited}}, unique =0) hijack = 0x0
0x7f9f3a058278 Listener('lldb.Debugger')::AddEvent (event_sp = {0x7f9f39f0e470})
0x7f9f3d536660 'lldb.process.gdb-remote.async-listener' Listener::FindNextEventInternal(broadcaster=0x0, broadcaster_names=0x0[0], event_type_mask=0x00000000, remove=1) event 0x7f9f3b026b80
Went to stop the private state thread, but it was already invalid.
^done
(gdb)
=thread-exited,id="1",group-id="i1"
=thread-exited,id="2",group-id="i1"
=thread-exited,id="3",group-id="i1"
=thread-exited,id="4",group-id="i1"
=thread-exited,id="5",group-id="i1"
=thread-exited,id="6",group-id="i1"
=thread-exited,id="7",group-id="i1"
(gdb)
0x7f9f3a058278 'lldb.Debugger' Listener::FindNextEventInternal(broadcaster=0x0, broadcaster_names=0x0[0], event_type_mask=0x00000000, remove=0) event 0x7f9f39f0e470
0x7f9f3a058278 'lldb.Debugger' Listener::FindNextEventInternal(broadcaster=0x0, broadcaster_names=0x0[0], event_type_mask=0x00000000, remove=1) event 0x7f9f39f0e470
Process::SetPublicState (state = exited, restarted = 0)
(gdb)
=thread-exited,id="1",group-id="i1"
=thread-group-exited,id="i1",exit-code="0"
*stopped,reason="exited-normally"
(gdb)

This is definitely not a proper fix for this problem, it merely sweeps the problem under the carpet. If something like this makes a difference then it means something has gone horribly wrong a long time ago. I haven't been able to reproduce this locally, but I'd recommend trying to set up some memory-allocation/thread-race checker and see what it produces.

I still think the original solution of hijacking the listener was correct, but we need to figure out what is the right point at which to hijack it. I'll try to look into this when I get a bit of time.

KLapshin added a comment.EditedSep 30 2015, 10:55 AM

This is definitely not a proper fix for this problem, it merely sweeps the problem under the carpet. If something like this makes a difference then it means something has gone horribly wrong a long time ago. I haven't been able to reproduce this locally, but I'd recommend trying to set up some memory-allocation/thread-race checker and see what it produces.

I still think the original solution of hijacking the listener was correct, but we need to figure out what is the right point at which to hijack it. I'll try to look into this when I get a bit of time.

@labath,

Yes, I even noticed what latest "fix" patch is just hint to fact what whole Process m_listener object looks like freed or unitialized memory - m_events collection size in Listener before crash happened looks invalid absolutely - random value, in my case about 8 millions events (!).

This looks about threads race condition, so still investigating - slow because was busy a bit with other issues, excuse me.

@labath,

Regarding how to reproduce - this problem looks remote target specific only - in my case gdb-remote modules involved.

clayborg requested changes to this revision.Oct 5 2015, 11:31 AM
clayborg edited edge metadata.

This can't be the real fix for this. We must be able to trust the empty() function call. We must have someone playing with the collection without locking the mutex. Please find the real bug.

This revision now requires changes to proceed.Oct 5 2015, 11:31 AM
labath added a comment.Oct 7 2015, 8:48 AM

I have committed D13056 with the event hijacking portion from your patch. I think your use case should be working now. I am planning to return to this later, as I believe there are still some edge cases lurking here.

I have committed D13056 with the event hijacking portion from your patch. I think your use case should be working now. I am planning to return to this later, as I believe there are still some edge cases lurking here.

Yes, of course - hijacked listener setup solves (or hides) issue.

Problem on my side (if no hijacking listener used and without your patch for race condition) is - halt listener often just got timeout (10 secs), then target process killed without crash. Or may work without timeout - this is really threads conflict issue - m_listener in process object unusable because operates with invalid data - I even saw strings in log belongs to RSP module - just random memory areas, memory corrupted. If you set hijacked listener - no problem because it looks has non-shared m_events collection.

Again - this is remote debugging issue only. Also I checked on Linux host debugging - process killed just fine.

I will try pure public lldb build with your patch and will see then, thanks.

This can't be the real fix for this. We must be able to trust the empty() function call. We must have someone playing with the collection without locking the mutex. Please find the real bug.

Hello @clayborg,

I will continue investigation why it works unstable without hijacked listener setup.