This is an archive of the discontinued LLVM Phabricator instance.

<WIP> [lldb] [Process/NetBSD] Multithread support
AbandonedPublic

Authored by mgorny on Jul 12 2019, 10:23 AM.

Details

Summary

This is my WIP on fixing multithreaded process support.

So far includes support for reporting new and exited threads, and resuming/stepping individual threads.

Diff Detail

Event Timeline

mgorny created this revision.Jul 12 2019, 10:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2019, 10:23 AM
mgorny retitled this revision from [lldb] [Process/NetBSD] Implement per-thread executation control to [lldb] [Process/NetBSD] Implement per-thread execution control.Jul 12 2019, 10:27 AM

I think it looks OK. there are some nits that could be optimized in future or handled additionally.. but for now it should be fine.

Shouldn't there be some tests that come along with this patch?

lldb/source/Plugins/Process/NetBSD/NativeThreadNetBSD.cpp
56

Is this really meant to be negative?

mgorny marked 2 inline comments as done.Jul 12 2019, 11:06 AM
mgorny added inline comments.
lldb/source/Plugins/Process/NetBSD/NativeThreadNetBSD.cpp
56

Good catch, it's leftover from PT_STEP usage…

mgorny updated this revision to Diff 209537.Jul 12 2019, 11:06 AM
mgorny marked an inline comment as done.

Shouldn't there be some tests that come along with this patch?

I was actually hoping that the test suite already covers what needs to be covered, and buildbot would tell me which tests were fixed ;-).

Shouldn't there be some tests that come along with this patch?

I was actually hoping that the test suite already covers what needs to be covered, and buildbot would tell me which tests were fixed ;-).

That is possible, though it would be good to know that before landing the patch. But I would expect that at least some aspect of this functionality is not covered by existing tests (e.g. the handling of multiple concurrent signals).

lldb/source/Plugins/Process/NetBSD/NativeProcessNetBSD.cpp
354

Is this "passing multiple signals simultaneously", or "passing multiple *distinct* signals simultaneously". (E.g,. thread 1 gets a SIGALRM, thread 2 gets SIGIO, etc.).

mgorny marked an inline comment as done.Jul 12 2019, 11:43 AM
mgorny added inline comments.
lldb/source/Plugins/Process/NetBSD/NativeProcessNetBSD.cpp
354

The former. Basically there's one siginfo slot, so you can either put a signal for whole process, or for one LWP.

krytarowski added inline comments.Jul 12 2019, 12:51 PM
lldb/source/Plugins/Process/NetBSD/NativeProcessNetBSD.cpp
354

Once we emit a single signal to all threads, it's still technically a single signal that hits the process.

krytarowski added a comment.EditedJul 12 2019, 12:57 PM

Something that we do not cover here is that once a tracee reports a signal (like someone poked it with SIGUSR1) and we want to pass it over to the tracee, we will reset siginfo.

This scenario should be covered by a test and we should handle it properly..

The solution in NetBSD for passing over signals without changing siginfo, is to not calling PT_SET_SIGINFO as the default one will be kept by the kernel. Optionally pick old siginfo with PT_GET_SIGINFO and optionally change destination lwp.

labath added inline comments.Jul 12 2019, 12:57 PM
lldb/source/Plugins/Process/NetBSD/NativeProcessNetBSD.cpp
354

Ok, that makes sense. But I don't think that's what the code does right now (IIUC, this line will only fire if the current signal is different that the signal sent to the previous thread).

mgorny marked an inline comment as done.Jul 12 2019, 11:28 PM

Something that we do not cover here is that once a tracee reports a signal (like someone poked it with SIGUSR1) and we want to pass it over to the tracee, we will reset siginfo.

This scenario should be covered by a test and we should handle it properly..

The solution in NetBSD for passing over signals without changing siginfo, is to not calling PT_SET_SIGINFO as the default one will be kept by the kernel. Optionally pick old siginfo with PT_GET_SIGINFO and optionally change destination lwp.

How is 'properly'? Should we reject resuming with a signal when there's already another signal pending?

lldb/source/Plugins/Process/NetBSD/NativeProcessNetBSD.cpp
354

There's a second if on line 400 that verifies the number of threads signaled.

The idea is that we have either:

  • exactly one thread signaled,
  • all threads signaled with the same signal.

Here we check for the 'same signal' condition.

krytarowski added a comment.EditedJul 13 2019, 2:25 AM

Something that we do not cover here is that once a tracee reports a signal (like someone poked it with SIGUSR1) and we want to pass it over to the tracee, we will reset siginfo.

This scenario should be covered by a test and we should handle it properly..

The solution in NetBSD for passing over signals without changing siginfo, is to not calling PT_SET_SIGINFO as the default one will be kept by the kernel. Optionally pick old siginfo with PT_GET_SIGINFO and optionally change destination lwp.

How is 'properly'? Should we reject resuming with a signal when there's already another signal pending?

We need to pass the same signal (with unchanged siginfo) to the tracee.

The easiest way is to not changing it, however as we can emit a signal over to tracee specifying LWP. we can go for the sequence of PT_GET_SIGINFO, change lwp, PT_SET_SIGINFO, PT_CONTINUE with a signal.

krytarowski added inline comments.Jul 13 2019, 2:35 AM
lldb/source/Plugins/Process/NetBSD/NativeProcessNetBSD.cpp
394

For extra completeness, all basic signals (without specified siginfo) from debugger (that weren't intercepted by a debugger as they were routed into debuggee) that are emitted should be of type SI_USER with filled si_pid and si_uid of the debugger.

We thought have the power to set it to whatever value we want, but LLDB probably doesn't allow to set defailed siginfo.

labath added inline comments.Jul 13 2019, 7:01 AM
lldb/source/Plugins/Process/NetBSD/NativeProcessNetBSD.cpp
354

Ah, I see. That makes sense.

It might be more readable to move all the checks up front, perhaps into some function like Expected< ptrace_siginfo_t> ComputeSignalInfo(const ResumeActionList&). Apart from readability, doing this will make sure you don't issue ptrace commands before you before realize that the resume packet was bogus (and leave the inferior in a weird state in the process).

mgorny planned changes to this revision.Jul 31 2019, 9:17 PM

It seems that the patch is wrong with the current logic, and this is somehow blurry because of Linux implementation limitations. FWIU, eStateSuspended will never happen and instead we should keep threads without explicit action suspsended. It's unclear yet whether this is desirable long-term, or if we should change the logic to explicitly indicate state for each thread.

It seems that the patch is wrong with the current logic, and this is somehow blurry because of Linux implementation limitations. FWIU, eStateSuspended will never happen and instead we should keep threads without explicit action suspsended. It's unclear yet whether this is desirable long-term, or if we should change the logic to explicitly indicate state for each thread.

Yes, eStateSuspended will never happen on linux. It wouldn't be too hard to make use of that state -- we could set unresumed threads to "suspened" in NPL::Resume, and then when we go to stop all threads (NPL::StopRunningThreads), we would just avoid sending signals to suspended threads and just silently set them back to "stopped". However, I don't see a reason to do that now, as it would just add code for jumping between the states.

However, if it makes your life easier, I think you should be able to use eStateSuspended inside NetBSD code for anything you want. Nobody will inspect the state of threads while the process is running, so that could just be an implementation detail.

In fact I suspect the nobody will inspect the thread state whatsoever because once the process is stopped, all threads are assumed to be stopped too.

Not sure if we're taking about the same thing. I meant the status in resume actions. I think it'd be logical if 'no action' meant 'resume in same state as prior to stopping the process', and explicit eStateS* meant 'keep this thread stopped after resuming'.

Not sure if we're taking about the same thing. I meant the status in resume actions. I think it'd be logical if 'no action' meant 'resume in same state as prior to stopping the process', and explicit eStateS* meant 'keep this thread stopped after resuming'.

Ah, you're right, we weren't talking about the same thing. I see what you mean now...

I can see how explicit modelling of suspension in resume actions would make sense. And it would definitely map better to how netbsd controls threads. OTOH, I think the current semantics (no action => keep stopped) makes sense too, and it maps better to how the gdb protocol represents resume actions. So either way, you have to do the conversion somewhere...

mgorny updated this revision to Diff 220871.Sep 19 2019, 8:16 AM
mgorny marked 5 inline comments as done.
mgorny retitled this revision from [lldb] [Process/NetBSD] Implement per-thread execution control to <WIP> [lldb] [Process/NetBSD] Multithread support.
mgorny edited the summary of this revision. (Show Details)

So I've made some progress on this. However, I'm stuck and I'd use some help going forward.

I've made a trivial two-threaded program:

#include <stdio.h>
#include <assert.h>
#include <unistd.h>
#include <pthread.h>

void* thread_func(void* x) {
	printf("thread 1\n");
	int i;
	for (i = 0; i < 100; i++)
	{
		printf("t2 %d\n", i);
		sleep(3);
	}
	return NULL;
}

int main() {
	pthread_t t;

	printf("step 1\n");
	assert(!pthread_create(&t, NULL, thread_func, NULL));
	printf("step 2\n");
	int i;
	for (i = 0; i < 100; i++)
	{
		printf("t1 %d\n", i);
		sleep(3);
	}
	assert(!pthread_join(t, NULL));
	printf("step 3\n");

	return 0;
}

Now, if I start it, then issue process interrupt, then thread continue 1, everything works as expected. However, if I interrupt it again and then thread continue 2 (i.e. try to change the thread running), nothing happens.

According to the gdb-remote log, vCont is received but nothing is sent back (though normally stdout should happen). If I send any other command, lldb-server dies:

1568906517.952658892 <  16> read packet: $vCont;c:0002#a4
1568906874.978140354 <   1> read packet: 
lldb-server exiting...

According to the posix log, Resume is working as expected.

1568906976.934143305 NativeProcessNetBSD.cpp:Resume                               no action specified for pid 12124 tid 1
1568906976.934377670 NativeProcessNetBSD.cpp:Resume                               processing resume action state suspended signal 2147483647 for pid 12124 tid 1
1568906976.934622765 NativeProcessNetBSD.cpp:Resume                               processing resume action state running signal 2147483647 for pid 12124 tid 2

I suspect I'm not setting some state correctly but I so far haven't managed to figure out which.

Nevermind that long report, it actually turned out to be a bug in NetBSD kernel that resulted in the thread not being resumed.

mgorny updated this revision to Diff 227295.Oct 31 2019, 9:24 AM

Another WIP update.

mgorny abandoned this revision.Nov 8 2019, 11:51 AM

I'm going to upload a new series shortly.