Page MenuHomePhabricator

Fix deadlock in operation thread in NativeProcessLinux

Authored by tberghammer on Mar 3 2015, 4:05 AM.



Fix deadlock in operation thread in NativeProcessLinux

The deadlock occurred when the Attach or the Launch operation failed for any reason.

Diff Detail


Event Timeline

tberghammer updated this revision to Diff 21094.Mar 3 2015, 4:05 AM
tberghammer retitled this revision from to Fix deadlock in operation thread in NativeProcessLinux.
tberghammer updated this object.
tberghammer edited the test plan for this revision. (Show Details)
tberghammer added a reviewer: ovyalov.
tberghammer added a subscriber: Unknown Object (MLST).
labath added a subscriber: labath.Mar 3 2015, 4:33 AM
labath added inline comments.
3591 ↗(On Diff #21094)

Enumerating the states when the thread could have exited on its own seems quite fragile. I would propose a more principled treatment. For example we could make DoOperation not wait on the acknowledgement semaphore in case of an exit operation (it is supposed to be the last operation, and the fact that it was processed is indicated by the fact that Join() returns).

PS: instead of accompanying every use of nullptr with a comment "nullptr is a code for the exit operation", I would recommend declaring a constant "static void * const EXIT_OPERATION = nullptr" (or something along those lines) and using it.

tberghammer updated this revision to Diff 21098.Mar 3 2015, 5:25 AM

Update based on suggestion in comment

3591 ↗(On Diff #21094)

I don't like my solution either, but haven't managed to find a better way. Thank you for your suggestion, I updated the CL based on it.

labath added a comment.Mar 3 2015, 5:42 AM

Looks much better to me now. I'll leave the final signoff to ovyalov.

ovyalov accepted this revision.Mar 3 2015, 9:41 AM
ovyalov edited edge metadata.

Looks good - please see my comments.

3400 ↗(On Diff #21098)

You may have early return if m_operation == EXIT_OPERATION

3563 ↗(On Diff #21098)

Monitoring thread uses Cancel as well - should we redesign it for Android?

This revision is now accepted and ready to land.Mar 3 2015, 9:41 AM
tberghammer added inline comments.Mar 4 2015, 3:02 AM
3563 ↗(On Diff #21098)

In long run possibly yes, but it already have a lot of logic to stop it in different cases and I haven't seen a case when the cancel is actually have effect yet. I have a design to replace it on android but it is a bit nasty so I hold it back until it isn't cause any issue with the hope that we will find a better solution.

This revision was automatically updated to reflect the committed changes.
ovyalov added inline comments.Mar 6 2015, 6:45 PM
3563 ↗(On Diff #21098)

Monitoring thread cancellation definitely affects Detach request on Android - NativeProcessLinux::Detach after issuing PTRACE_DETACH calls StopMonitor and eventually becomes deadlocked on m_monitor_thread.Join(nullptr);
We should take care of this case since lldb on host side hangs in such situation - it's another issue why host lldb hangs forever in this case because detach seems to be timeout-based operation:

(lldb) exit
Quitting LLDB will detach from one or more processes. Do you really want to proceed: [Y/n] Y
Process::SetExitStatus (status=6 (0x00000006), description="failed to send the k packet")
Process::ControlPrivateStateThread (signal = 1)
Sending control event of type: 1.
Timed out responding to the control event, cancel got error: "(null)".

Created Cl to stop the monitor thread on android as D8205