This is an archive of the discontinued LLVM Phabricator instance.

Fixes to lldb's eLaunchFlagLaunchInTTY feature on macOS
AbandonedPublic

Authored by jasonmolenda on Jan 15 2020, 4:29 PM.

Details

Reviewers
clayborg
jingham
Summary

On macOS lldb has an option that can be added to SBLaunchInfo to create a new Terminal window and run the inferior in that new window. lldb uses some AppleScript to open the new window and run the process there. It runs darwin-debug in the window to set up the architecture / current-working-directory / environment variables before launching the actual inferior process. lldb opens a socket on the local filesystem and passes the name to darwin-debug; darwin-debug sends back its pid to lldb over the socket, so lldb can attach to the inferior process once it has been exec'ed. AppleScript sits between lldb and the inferior, so the normal way we control processes at the start does not work. We want to attach to the inferior binary once it has been started, and is sitting at its entry point in dyld_start, stopped, waiting for lldb to connect.

Today, darwin-debug sends its pid over the socket, then parses / constructs the environment variables to be passed to the inferior, then calls posix_spawn to start the inferior. lldb gets the pid, then calls WaitForProcessToSIGSTOP which is intended to poll repeatedly for 5 seconds to detect when the inferior process has stopped in dyld_start. Unfortunately this function has a few bugs - the main loop never runs, the return value from proc_pidinfo is incorrectly handled so none of the intended code will ever run, and finally the condition that it's looking for -- pbi_status==SSTOP -- is not going to indicate that the inferior has been suspended.

The only way to detect that the inferior has been started, and is sitting at dyld_start suspended, is to use mach calls (task_info()) which requires that lldb task_for_pid the inferior, which lldb doesn't have permissions for. I haven't found a kernel API that doesn't require the task port which I can query the suspend count with yet.

The bug being fixed is that lldb attaches before the inferior has started when we have a lot of environment variables (darwin-debug is still processing the env vars when lldb attaches to it) and the UI layer isn't clear what is going on.

This patch changes darwin-debug so it sends its pid just before it calls posix_spawn. It changes lldb to wait up to 5 seconds to receive the pid, then it adds an extra 0.1 seconds of sleep in lldb before it tries to attach to the inferior (plus the time it takes to construct the attach packet and send it to debugserver, and debugserver to decode that packet and try to attach).

Fred suggested using the closing of the socket between lldb and darwin-debug as a way of telling when the exec() is actually happening, but the Read methods in lldb don't detect that closing via close-on-exec in my testing and I didn't dig in much further on this - it's straightforward to get the pid before we call posix_spawn which is close enough.

rdar://problem/29760580

Diff Detail

Event Timeline

jasonmolenda created this revision.Jan 15 2020, 4:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 15 2020, 4:29 PM
clayborg added inline comments.Jan 15 2020, 5:07 PM
lldb/source/Host/macosx/objcxx/Host.mm
154–155

This seems racy still?

159

So if we fix this line to be:

const int num_retries = timeout_in_seconds * USEC_PER_SEC / time_delta_usecs;

This doesn't work? I am assuming you tried this?

lldb/tools/darwin-debug/darwin-debug.cpp
161

"close(socket_fd)" here if we fail and are going to exit?

jasonmolenda marked 3 inline comments as done.Jan 15 2020, 5:23 PM

Thanks for the review Greg. To really do this reliably, we probably either need to use a different kernel API, or push this down into debugserver where we can get the process' task port and inspect what its status is. This patch is more a preliminary one to get it working much of the time -- getting it to always work is going to require a slightly different approach.

lldb/source/Host/macosx/objcxx/Host.mm
154–155

It is. darwin-debug is going to call posix_spawn after this, which exec's the app binary, which is launched and stopped at dyld_start. At the same time, lldb is going to send the pid to debugserver which is going to attach to it. In my own testing, the lldb+debugserver work was much slower than the app binary launch even without the sleep.

I'm talking with the kernel folks to see if there's any way we could do better. The best approach may be to push all this logic down into debugserver which can task_for_pid and check the task suspend count directly. There are some other reasons why a task suspend count may be nonzero, but mostly it means we're done launching.

159

That would make the loop execute more than 0 times but the return value from proc_pidinfo is the size of the struct read, not an errno value like this is treating it as. And the app will not have a pbi_status of SSTOP when the task is suspended. There's no way to detect this via the BSD side of the kernel APIs as near as I can find.

lldb/tools/darwin-debug/darwin-debug.cpp
161

Good point.

clayborg accepted this revision.Jan 16 2020, 1:21 PM

ok if this works for now that is fine. Just close the socket_fd when we fail to write the pid to the socket and this is good to go.

This revision is now accepted and ready to land.Jan 16 2020, 1:21 PM

ok if this works for now that is fine. Just close the socket_fd when we fail to write the pid to the socket and this is good to go.

I mark it as close-on-exec so we don't need to close it explicitly (I was going to try to detect the closing of the socket up in lldb to get us closer to the new process being launched, but the existing Read method doesn't detect it)

I'm looking at whether we can special case the "darwin-debug execs a process, and that process is started suspended" handling in debugserver/lldb. If we attach while darwin-debug is executing, then we get the exec mach exception, and the inferior has a suspend count of two -- one, because we stopped for the mach exception we just got, but a second suspend because darwin-debug asked for the inferior to be started suspended. We need that second suspend count if we don't attach until after the inferior has been started. So I might toss this entire approach; working on this some more.

labath added a subscriber: labath.Jan 17 2020, 12:30 AM

If we attach while darwin-debug is executing, then we get the exec mach exception, ....

Not really my thing, but couldn't you just ensure that you *always* attach while darwin-debug is executing? E.g., the binary could do something like:

send_pid();
while(!am_i_being_debugged()) usleep(1000);
posix_spawn(...);

I had sort of assumed this wasn't possible because of SIP, but your comment makes it sounds that this does work at least sometimes...

BTW, is quite unfortunate that the CLOEXEC trick doesn't work. We use that for launching on linux, and it's pretty neat. The best part about it is that if the execve() fails, you still get to keep the fd and can send an error message about what went wrong. This bit of code uses the libc api directly, so I don't know if the problem here is with the lldb Read function or the darwin system apis...

jasonmolenda abandoned this revision.Jan 17 2020, 3:27 PM

Taking a different approach to fixing this.