This is an archive of the discontinued LLVM Phabricator instance.

Add code to exit the NativeProcessLinux Monitor thread on android
ClosedPublic

Authored by tberghammer on Mar 10 2015, 8:24 AM.

Details

Summary

Add code to exit the NativeProcessLinux Monitor thread on android

This CL change the logic used to terminate the monitor thread of NativeProcessLinux to use a signal instead of pthread_cancel as pthread_cancel is not supported on android.

Diff Detail

Repository
rL LLVM

Event Timeline

tberghammer retitled this revision from to Add code to exit the NativeProcessLinux Monitor thread on android.
tberghammer updated this object.
tberghammer edited the test plan for this revision. (Show Details)
tberghammer added a reviewer: ovyalov.
tberghammer added a subscriber: Unknown Object (MLST).
labath added a subscriber: labath.Mar 10 2015, 9:16 AM

Please see my inline comments on the signal handler.

A thought for general discussion: I am wondering whether we could avoid having two methods for shutting down a thread. If the code in question is not likely to be used on windows (which I think it's not as it is just a glorified waitpid() wrapper and waitpid is very posix-y), then all other platforms should support the pthread_kill method and we could use a single approach everywhere. Thoughts?

source/Host/common/Host.cpp
176 ↗(On Diff #21583)

Be bold and specify the signal behavior that you expect, don't rely on them being what you expect by default.
If the previous signal handler has SA_RESTART set, you would achieve nothing, as the waitpid call would get restarted. If the previous handler had SA_RESETHAND set, your trick would only work once.

178 ↗(On Diff #21583)

There is nothing special about SIGCHLD - any signal can interrupt a system call. Please use a signal which is not overloaded with other meanings, e.g. SIGUSR1.

198 ↗(On Diff #21583)

Be careful about race conditions here. What if your signal is not received in the waitpid call, but slightly before? The signal will be handled, but you will still end up blocked forever.
I don't think there is a completely race-free solution to this problem (apart from resending the signal until it works), but we should at least try to decrease the race window by testing the exit condition before and after the call.

ovyalov edited edge metadata.Mar 10 2015, 11:33 AM

Please see my comments.

source/Host/common/Host.cpp
178 ↗(On Diff #21583)

+1 to use SIGUSR1

198 ↗(On Diff #21583)

We may use sigtimedwait instead of waitpid on Android with some conservative timeout (to avoid a possibility to be blocked forever in case of thread race) and if the function timeouts -(callback (callback_baton, -1,..).

source/Plugins/Process/Android/NativeProcessAndroid.cpp
11 ↗(On Diff #21583)

Please check whether you do need this include.

64 ↗(On Diff #21583)

Could you make it unconditional - m_monitor_thread_running = error.Success () ?

72 ↗(On Diff #21583)

ditto.

95 ↗(On Diff #21583)

There is a chance that m_monitor_thread_running will updated and read concurrently from different threads - please consider an option to use atomic type here.

source/Plugins/Process/Linux/NativeProcessLinux.h
224 ↗(On Diff #21583)

s/DoLaunch/DoAttach

233 ↗(On Diff #21583)

I'd rather named this method StopMonitorThread in order to make it complement to StartMonitorThread.

vharron requested changes to this revision.Mar 10 2015, 6:10 PM
vharron added a reviewer: vharron.
vharron added a subscriber: vharron.

Thanks for the patch! Looking forward to v2!

source/Host/common/Host.cpp
146 ↗(On Diff #21583)

Rename to "emptySignalHandler"?

Comment that the signal is only to interrupt the thread from waitpid

178 ↗(On Diff #21583)

+1 to use SIGUSR1

194 ↗(On Diff #21583)

Please remove all pthread_cancel code paths. Linux and Android should follow the same code paths.

201 ↗(On Diff #21583)

Why are we calling a function? It seems like we should just check m_monitor_thread_running? If you want to log something, do it after the check, before the break. Reduces complexity

source/Plugins/Process/Android/NativeProcessAndroid.cpp
95 ↗(On Diff #21583)

I don't think that his current usage is a problem but it might cause noise in ThreadSanitizer and an atomic can't hurt.

This revision now requires changes to proceed.Mar 10 2015, 6:10 PM
tberghammer updated this object.
tberghammer added a reviewer: labath.

Thanks for the comments. Based on them I created a new patch which use the same (signal based) logic for Linux and Android. I still left the pthread_cancel logic in place in MonitorChildProcessThreadFunction because it is used in several other case also and I don't have the environment set up to update and test the effect of the change in those cases.

source/Host/common/Host.cpp
178 ↗(On Diff #21583)

Done

194 ↗(On Diff #21583)

Changed Linux to use the same logic used by Android, but other architectures (e.g.: FreeBSD) and Local debugging also use this function, so I still have to leave the pthread_cancel there.

198 ↗(On Diff #21583)

We can't specify the pid of the child we waiting for in sigtimedwait what is required in this case. Especially as sometimes we use group pid instead of a process pid what is handled by waitpid, but handling it manually with sigtimedwait would be tricky.

201 ↗(On Diff #21583)

We don't have explicit access to the instance of NativeProcessAndroid (we are in a static function in an other module). I can cast callback_baton into NativeProcessAndroid and then check for the flag, but I think that one is a bit nasty solution.

source/Plugins/Process/Linux/NativeProcessLinux.h
224 ↗(On Diff #21583)

Done

233 ↗(On Diff #21583)

Done

ovyalov added inline comments.Mar 11 2015, 10:38 AM
source/Host/common/Host.cpp
194 ↗(On Diff #21701)

I'd rather put this snippet into a separate function since it's called twice:

bool CheckForCancellation()
{
#ifdef __linux__
        if (g_usr1_signal_thread == ::pthread_self())
        {
            g_usr1_signal_thread = 0;
            return true;
        }
        return false;
#else
        ::pthread_testcancel ();
#endif
}
262 ↗(On Diff #21701)

Unrelated but looks weird - could you remove this log re-initialization? It looks unnecessary.

labath edited edge metadata.Mar 11 2015, 10:59 AM

I have an issue with one of the comments, but otherwise looks fine.

source/Host/common/Host.cpp
146 ↗(On Diff #21701)

Please remove the std::set reference, as it is not a valid solution. There are very few things that can be done safely in a signal handler and std::set manipulation is not one of them. If we ever need to cancel multiple threads, we will need to do something more clever, probably set a flag on NativeProcessAndroid (outside of the signal handler), as @vharron suggseted.

labath added inline comments.Mar 12 2015, 2:56 AM
source/Host/common/Host.cpp
146 ↗(On Diff #21701)

It just occurred to me that this would be a good use case for a thread_local variable. I'm not sure what is the official policy on thread local variables, as I can't find any occurrences in the codebase, but if that's ok then the killer solution would be

static thread_local volatile sig_atomic_t g_usr1_called;

If not then you can at least make your existing variable volatile for some added safety.

tberghammer edited edge metadata.
tberghammer added inline comments.
source/Host/common/Host.cpp
146 ↗(On Diff #21701)

Done

194 ↗(On Diff #21701)

Done

262 ↗(On Diff #21701)

It is necessary to let the user enable the logging after starting the monitor thread.

labath accepted this revision.Mar 12 2015, 6:45 AM
labath edited edge metadata.

lgtm

ovyalov accepted this revision.Mar 12 2015, 8:59 AM
ovyalov edited edge metadata.

LGTM

This revision was automatically updated to reflect the committed changes.