This is an archive of the discontinued LLVM Phabricator instance.

Fixes to lldb's eLaunchFlagLaunchInTTY feature on macOS
Abandoned · Public

Authored by jasonmolenda on Jan 15 2020, 4:29 PM.

Details

Reviewers

clayborg
jingham

Summary

On macOS lldb has an option that can be added to SBLaunchInfo to create a new Terminal window and run the inferior in that new window. lldb uses some AppleScript to open the new window and run the process there. It runs darwin-debug in the window to set up the architecture / current-working-directory / environment variables before launching the actual inferior process. lldb opens a socket on the local filesystem and passes the name to darwin-debug; darwin-debug sends back its pid to lldb over the socket, so lldb can attach to the inferior process once it has been exec'ed. AppleScript sits between lldb and the inferior, so the normal way we control processes at the start does not work. We want to attach to the inferior binary once it has been started, and is sitting at its entry point in dyld_start, stopped, waiting for lldb to connect.

Today, darwin-debug sends its pid over the socket, then parses / constructs the environment variables to be passed to the inferior, then calls posix_spawn to start the inferior. lldb gets the pid, then calls WaitForProcessToSIGSTOP which is intended to poll repeatedly for 5 seconds to detect when the inferior process has stopped in dyld_start. Unfortunately this function has a few bugs - the main loop never runs, the return value from proc_pidinfo is incorrectly handled so none of the intended code will ever run, and finally the condition that it's looking for -- pbi_status==SSTOP -- is not going to indicate that the inferior has been suspended.

The only way to detect that the inferior has been started, and is sitting at dyld_start suspended, is to use mach calls (task_info()), which require the inferior's task port. lldb would have to task_for_pid the inferior, and it doesn't have permission to do that. I haven't yet found a kernel API that can query the suspend count without the task port.

The bug being fixed is that lldb attaches before the inferior has started when we have a lot of environment variables (darwin-debug is still processing the env vars when lldb attaches to it), and it isn't clear to the UI layer what is going on.

This patch changes darwin-debug so it sends its pid just before it calls posix_spawn. It changes lldb to wait up to 5 seconds to receive the pid, then it adds an extra 0.1 seconds of sleep in lldb before it tries to attach to the inferior (plus the time it takes to construct the attach packet and send it to debugserver, and debugserver to decode that packet and try to attach).
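The receiving side of that handshake can be sketched with plain POSIX calls. This is only an illustration under stated assumptions, not the patch itself: lldb reads the pid through ConnectionFileDescriptor::Read with a 5 second timeout, whereas the sketch substitutes poll() and recv(), and the helper names send_pid and receive_pid are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical sender: write the pid as decimal text, the same wire format
// darwin-debug uses. Returns true if the whole string was sent.
bool send_pid(int fd, pid_t pid) {
  char buf[64];
  const int len = std::snprintf(buf, sizeof(buf), "%i", static_cast<int>(pid));
  assert(len > 0 && len < static_cast<int>(sizeof(buf)));
  return ::send(fd, buf, len, 0) == len;
}

// Hypothetical receiver: wait up to timeout_ms for the pid to arrive.
// Returns the pid, or -1 on timeout, error, or a closed connection.
pid_t receive_pid(int fd, int timeout_ms) {
  struct pollfd pfd = {fd, POLLIN, 0};
  if (::poll(&pfd, 1, timeout_ms) <= 0)
    return -1; // timed out or poll failed
  char buf[64] = {0};
  const ssize_t n = ::recv(fd, buf, sizeof(buf) - 1, 0);
  if (n <= 0)
    return -1; // peer closed the socket or recv failed
  return static_cast<pid_t>(std::atoi(buf));
}
```

Receiving the pid only shows that the sender is about to spawn; it says nothing about whether the new process has reached dyld_start, which is why the patch still adds a short sleep afterwards.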

Fred suggested using the closing of the socket between lldb and darwin-debug as a way of telling when the exec() is actually happening, but the Read methods in lldb don't detect that closing via close-on-exec in my testing and I didn't dig in much further on this - it's straightforward to get the pid before we call posix_spawn which is close enough.

rdar://problem/29760580

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jasonmolenda created this revision. Jan 15 2020, 4:29 PM

Herald added a project: Restricted Project. Jan 15 2020, 4:29 PM

clayborg added inline comments. Jan 15 2020, 5:07 PM

lldb/source/Host/macosx/objcxx/Host.mm
154–155	This seems racy still?
159	So if we fix this line to be: const int num_retries = timeout_in_seconds * USEC_PER_SEC / time_delta_usecs; This doesn't work? I am assuming you tried this?
lldb/tools/darwin-debug/darwin-debug.cpp
161	"close(socket_fd)" here if we fail and are going to exit?

jasonmolenda added inline comments.

Thanks for the review, Greg. To really do this reliably, we probably either need to use a different kernel API, or push this down into debugserver where we can get the process' task port and inspect what its status is. This patch is more a preliminary one to get it working much of the time; getting it to always work is going to require a slightly different approach.

lldb/source/Host/macosx/objcxx/Host.mm
154–155	It is. darwin-debug is going to call posix_spawn after this, which exec's the app binary, which is launched and stopped at dyld_start. At the same time, lldb is going to send the pid to debugserver which is going to attach to it. In my own testing, the lldb+debugserver work was much slower than the app binary launch even without the sleep. I'm talking with the kernel folks to see if there's any way we could do better. The best approach may be to push all this logic down into debugserver which can task_for_pid and check the task suspend count directly. There are some other reasons why a task suspend count may be nonzero, but mostly it means we're done launching.
159	That would make the loop execute more than 0 times but the return value from proc_pidinfo is the size of the struct read, not an errno value like this is treating it as. And the app will not have a pbi_status of SSTOP when the task is suspended. There's no way to detect this via the BSD side of the kernel APIs as near as I can find.
lldb/tools/darwin-debug/darwin-debug.cpp
161	Good point.
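The arithmetic behind the line 159 exchange can be checked in isolation. The sketch below is an illustration only: the function names are hypothetical, and kUsecPerSec stands in for the USEC_PER_SEC constant mentioned in the review so that the code builds on any platform.

```cpp
#include <cassert>

// Microseconds per second. macOS headers call this USEC_PER_SEC; it is
// spelled out here so the sketch does not depend on those headers.
const int kUsecPerSec = 1000000;

// Retry count as written in WaitForProcessToSIGSTOP: seconds divided by
// microseconds. Integer division truncates the result toward zero.
int retries_as_written(int timeout_in_seconds, int time_delta_usecs) {
  assert(time_delta_usecs > 0);
  return timeout_in_seconds / time_delta_usecs;
}

// Retry count with the timeout converted to microseconds first, as
// suggested in the inline comment.
int retries_with_unit_fix(int timeout_in_seconds, int time_delta_usecs) {
  assert(time_delta_usecs > 0);
  return timeout_in_seconds * kUsecPerSec / time_delta_usecs;
}
```

With the values used in the function (a 5 second timeout and a 100000 microsecond delta), the first version yields 0 retries and the second yields 50. As the reply notes, fixing the count alone would not make the check work, because the proc_pidinfo return value and the pbi_status test are also wrong for this purpose.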

clayborg added a comment.

ok if this works for now that is fine. Just close the socket_fd when we fail to write the pid to the socket and this is good to go.

This revision is now accepted and ready to land. Jan 16 2020, 1:21 PM

jasonmolenda added a comment.

In D72813#1825006, @clayborg wrote:

ok if this works for now that is fine. Just close the socket_fd when we fail to write the pid to the socket and this is good to go.

I mark it as close-on-exec so we don't need to close it explicitly (I was going to try to detect the closing of the socket up in lldb to get us closer to the new process being launched, but the existing Read method doesn't detect it)

I'm looking at whether we can special case the "darwin-debug execs a process, and that process is started suspended" handling in debugserver/lldb. If we attach while darwin-debug is executing, then we get the exec mach exception, and the inferior has a suspend count of two -- one, because we stopped for the mach exception we just got, but a second suspend because darwin-debug asked for the inferior to be started suspended. We need that second suspend count if we don't attach until after the inferior has been started. So I might toss this entire approach; working on this some more.

In D72813#1825027, @jasonmolenda wrote:

If we attach while darwin-debug is executing, then we get the exec mach exception, ....

Not really my thing, but couldn't you just ensure that you *always* attach while darwin-debug is executing? E.g., the binary could do something like:

send_pid();
while(!am_i_being_debugged()) usleep(1000);
posix_spawn(...);

I had sort of assumed this wasn't possible because of SIP, but your comment makes it sound like this does work at least sometimes...

BTW, it is quite unfortunate that the CLOEXEC trick doesn't work. We use that for launching on Linux, and it's pretty neat. The best part about it is that if the execve() fails, you still get to keep the fd and can send an error message about what went wrong. This bit of code uses the libc API directly, so I don't know if the problem here is with the lldb Read function or the darwin system APIs...
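For readers unfamiliar with the CLOEXEC trick, here is a minimal sketch of the general pattern. It is not the code lldb uses on Linux, and it differs from darwin-debug in that it uses fork() and execv() so it builds on any POSIX system; the helper name spawn_and_report_exec is hypothetical. Note that close-on-exec is a file descriptor flag, so the sketch sets it with F_SETFD and FD_CLOEXEC.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork, exec `path` in the child, and tell the parent
// whether the exec happened. Returns 0 once the exec has succeeded, the
// child's errno if the exec failed, or -1 if the setup itself failed.
// On return *child_out holds the child's pid; the caller must reap it.
int spawn_and_report_exec(const char *path, char *const argv[],
                          pid_t *child_out) {
  assert(path && argv && child_out);
  int fds[2];
  if (::pipe(fds) != 0)
    return -1;

  // Mark the write end close-on-exec so a successful exec closes it.
  if (::fcntl(fds[1], F_SETFD, FD_CLOEXEC) == -1) {
    ::close(fds[0]);
    ::close(fds[1]);
    return -1;
  }

  const pid_t child = ::fork();
  if (child < 0) {
    ::close(fds[0]);
    ::close(fds[1]);
    return -1;
  }
  if (child == 0) {
    ::close(fds[0]);
    ::execv(path, argv);
    // Only reached if the exec failed. The pipe is still open, so the
    // reason can be reported to the parent.
    const int err = errno;
    const ssize_t ignored = ::write(fds[1], &err, sizeof(err));
    (void)ignored;
    ::_exit(127);
  }

  *child_out = child;
  // The parent must drop its own copy of the write end, otherwise read()
  // would never see end-of-file.
  ::close(fds[1]);

  int err = 0;
  ssize_t n;
  do {
    n = ::read(fds[0], &err, sizeof(err));
  } while (n == -1 && errno == EINTR);
  ::close(fds[0]);

  if (n == 0)
    return 0; // EOF: the write end was closed by a successful exec
  if (n == static_cast<ssize_t>(sizeof(err)))
    return err; // the exec failed and the child sent its errno
  return -1;
}
```

The parent blocks in read() until one of two things happens: end-of-file means the exec succeeded, and a payload means it failed and carries the reason. That is the property described in the comment above.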

Taking a different approach to fixing this.

Revision Contents

Path	Size
lldb/source/Host/macosx/objcxx/Host.mm	44 lines
lldb/tools/darwin-debug/darwin-debug.cpp	45 lines

Diff 238387

lldb/source/Host/macosx/objcxx/Host.mm

	[91 lines hidden]
	const char *connect_url = (const char *)arg;			const char *connect_url = (const char *)arg;
	ConnectionFileDescriptor file_conn;			ConnectionFileDescriptor file_conn;
	Status error;			Status error;
	if (file_conn.Connect(connect_url, &error) == eConnectionStatusSuccess) {			if (file_conn.Connect(connect_url, &error) == eConnectionStatusSuccess) {
	char pid_str[256];			char pid_str[256];
	::memset(pid_str, 0, sizeof(pid_str));			::memset(pid_str, 0, sizeof(pid_str));
	ConnectionStatus status;			ConnectionStatus status;
	const size_t pid_str_len = file_conn.Read(			const size_t pid_str_len = file_conn.Read(
	pid_str, sizeof(pid_str), std::chrono::seconds(0), status, NULL);			pid_str, sizeof(pid_str), std::chrono::seconds(5), status, NULL);
	if (pid_str_len > 0) {			if (pid_str_len > 0) {
	int pid = atoi(pid_str);			int pid = atoi(pid_str);

				// The inferior sends its pid right before it calls posix_spawn.
				// We want to wait a tiny bit for the next process to get
				// started, and stop at dyld_start, before we attach.
				const int short_sleep = 100000; // 0.1 seconds
				::usleep(short_sleep);

	return (void *)(intptr_t)pid;			return (void *)(intptr_t)pid;
	}			}
	}			}
	return NULL;			return NULL;
	}			}

	static bool WaitForProcessToSIGSTOP(const lldb::pid_t pid,
	const int timeout_in_seconds) {
	const int time_delta_usecs = 100000;
	const int num_retries = timeout_in_seconds / time_delta_usecs;
	for (int i = 0; i < num_retries; i++) {
	struct proc_bsdinfo bsd_info;
	int error = ::proc_pidinfo(pid, PROC_PIDTBSDINFO, (uint64_t)0, &bsd_info,
	PROC_PIDTBSDINFO_SIZE);

	switch (error) {
	case EINVAL:
	case ENOTSUP:
	case ESRCH:
	case EPERM:
	return false;

	default:
	break;

	case 0:
	if (bsd_info.pbi_status == SSTOP)
	return true;
	}
	::usleep(time_delta_usecs);
	}
	return false;
	}
	#if !defined(__arm__) && !defined(__arm64__) && !defined(__aarch64__)			#if !defined(__arm__) && !defined(__arm64__) && !defined(__aarch64__)

	const char *applscript_in_new_tty = "tell application \"Terminal\"\n"			const char *applscript_in_new_tty = "tell application \"Terminal\"\n"
	" activate\n"			" activate\n"
	" do script \"/bin/bash -c '%s';exit\"\n"			" do script \"/bin/bash -c '%s';exit\"\n"
	"end tell\n";			"end tell\n";

	const char *applscript_in_existing_tty = "\			const char *applscript_in_existing_tty = "\
	[127 lines hidden]

	if (!accept_thread)			if (!accept_thread)
	return Status(accept_thread.takeError());			return Status(accept_thread.takeError());

	[applescript executeAndReturnError:nil];			[applescript executeAndReturnError:nil];

	thread_result_t accept_thread_result = NULL;			thread_result_t accept_thread_result = NULL;
	lldb_error = accept_thread->Join(&accept_thread_result);			lldb_error = accept_thread->Join(&accept_thread_result);
	if (lldb_error.Success() && accept_thread_result) {			if (lldb_error.Success() && accept_thread_result)
	pid = (intptr_t)accept_thread_result;			pid = (intptr_t)accept_thread_result;

	// Wait for process to be stopped at the entry point by watching
	// for the process status to be set to SSTOP which indicates it it
	// SIGSTOP'ed at the entry point
	WaitForProcessToSIGSTOP(pid, 5);
	}

	llvm::sys::fs::remove(unix_socket_name);			llvm::sys::fs::remove(unix_socket_name);
	[applescript release];			[applescript release];
	if (pid != LLDB_INVALID_PROCESS_ID)			if (pid != LLDB_INVALID_PROCESS_ID)
	launch_info.SetProcessID(pid);			launch_info.SetProcessID(pid);
	return error;			return error;
	}			}

	#endif // #if !defined(__arm__) && !defined(__arm64__) && !defined(__aarch64__)			#endif // #if !defined(__arm__) && !defined(__arm64__) && !defined(__aarch64__)
	[91 lines hidden]

lldb/tools/darwin-debug/darwin-debug.cpp

[17 lines hidden]
// flag to stop the program at the entry point.		// flag to stop the program at the entry point.
//		//
// Since it uses darwin specific flags this code should not be compiled		// Since it uses darwin specific flags this code should not be compiled
// on other systems.		// on other systems.
#if defined(__APPLE__)		#if defined(__APPLE__)

#include <crt_externs.h>		#include <crt_externs.h>
#include <getopt.h>		#include <getopt.h>
		#include <fcntl.h>
#include <limits.h>		#include <limits.h>
#include <mach/machine.h>		#include <mach/machine.h>
#include <signal.h>		#include <signal.h>
#include <spawn.h>		#include <spawn.h>
#include <stdio.h>		#include <stdio.h>
#include <stdlib.h>		#include <stdlib.h>
#include <string.h>		#include <string.h>
		#include <sys/errno.h>
#include <sys/socket.h>		#include <sys/socket.h>
#include <sys/stat.h>		#include <sys/stat.h>
#include <sys/types.h>		#include <sys/types.h>
#include <sys/un.h>		#include <sys/un.h>

#include <string>		#include <string>

#ifndef _POSIX_SPAWN_DISABLE_ASLR		#ifndef _POSIX_SPAWN_DISABLE_ASLR
[51 lines hidden]
static void exit_with_errno(int err, const char *prefix) {		static void exit_with_errno(int err, const char *prefix) {
if (err) {		if (err) {
fprintf(stderr, "%s%s", prefix ? prefix : "", strerror(err));		fprintf(stderr, "%s%s", prefix ? prefix : "", strerror(err));
exit(err);		exit(err);
}		}
}		}

pid_t posix_spawn_for_debug(char *const *argv, char *const *envp,		pid_t posix_spawn_for_debug(char *const *argv, char *const *envp,
const char *working_dir, cpu_type_t cpu_type,		const char *working_dir, cpu_type_t cpu_type,
int disable_aslr) {		int disable_aslr, int socket_fd) {
pid_t pid = 0;		pid_t pid = 0;

const char *path = argv[0];		const char *path = argv[0];

posix_spawnattr_t attr;		posix_spawnattr_t attr;

exit_with_errno(::posix_spawnattr_init(&attr),		exit_with_errno(::posix_spawnattr_init(&attr),
"::posix_spawnattr_init (&attr) error: ");		"::posix_spawnattr_init (&attr) error: ");
[33 lines hidden]
// the inferior process we will spawn, but there currently isn't. If there		// the inferior process we will spawn, but there currently isn't. If there
// ever is a better way to do this, we should use it. I would rather not		// ever is a better way to do this, we should use it. I would rather not
// manually fork, chdir in the child process, and then posix_spawn with exec		// manually fork, chdir in the child process, and then posix_spawn with exec
// as the whole reason for doing posix_spawn is to not hose anything up		// as the whole reason for doing posix_spawn is to not hose anything up
// after the fork and prior to the exec...		// after the fork and prior to the exec...
if (working_dir)		if (working_dir)
::chdir(working_dir);		::chdir(working_dir);

		// We were able to connect to the socket, now write our PID so whomever
		// launched us will know this process's ID
		char pid_str[64];
		const int pid_str_len =
		::snprintf(pid_str, sizeof(pid_str), "%i", ::getpid());
		const int bytes_sent = ::send(socket_fd, pid_str, pid_str_len, 0);

		if (pid_str_len != bytes_sent) {
		perror("error: send (socket_fd, pid_str, pid_str_len, 0)");
		exit(1);
		}

		// Set the socket socket_fd marked as close-on-exec, leave
		// it open for now. lldb might use this to detect when the
		// exec has happened, and we can attach to the inferior
		// safely.
		errno = 0;
		int opts = fcntl (socket_fd, F_GETFL);
		if (errno == 0) {
		opts = opts | O_CLOEXEC;
		fcntl (socket_fd, F_SETFL, opts);
		}

exit_with_errno(::posix_spawnp(&pid, path, NULL, &attr, (char *const *)argv,		exit_with_errno(::posix_spawnp(&pid, path, NULL, &attr, (char *const *)argv,
(char *const *)envp),		(char *const *)envp),
"posix_spawn() error: ");		"posix_spawn() error: ");

// This code will only be reached if the posix_spawn exec failed...		// This code will only be reached if the posix_spawn exec failed...
::posix_spawnattr_destroy(&attr);		::posix_spawnattr_destroy(&attr);

return pid;		return pid;
[127 lines hidden]
saddr_un.sun_path[sizeof(saddr_un.sun_path) - 1] = '\0';		saddr_un.sun_path[sizeof(saddr_un.sun_path) - 1] = '\0';
saddr_un.sun_len = SUN_LEN(&saddr_un);		saddr_un.sun_len = SUN_LEN(&saddr_un);

if (::connect(s, (struct sockaddr *)&saddr_un, SUN_LEN(&saddr_un)) < 0) {		if (::connect(s, (struct sockaddr *)&saddr_un, SUN_LEN(&saddr_un)) < 0) {
perror("error: connect (socket, &saddr_un, saddr_un_len)");		perror("error: connect (socket, &saddr_un, saddr_un_len)");
exit(1);		exit(1);
}		}

// We were able to connect to the socket, now write our PID so whomever
// launched us will know this process's ID
char pid_str[64];
const int pid_str_len =
::snprintf(pid_str, sizeof(pid_str), "%i", ::getpid());
const int bytes_sent = ::send(s, pid_str, pid_str_len, 0);

if (pid_str_len != bytes_sent) {
perror("error: send (s, pid_str, pid_str_len, 0)");
exit(1);
}

// We are done with the socket
close(s);

system("clear");		system("clear");
printf("Launching: '%s'\n", argv[0]);		printf("Launching: '%s'\n", argv[0]);
if (working_dir.empty()) {		if (working_dir.empty()) {
char cwd[PATH_MAX];		char cwd[PATH_MAX];
const char *cwd_ptr = getcwd(cwd, sizeof(cwd));		const char *cwd_ptr = getcwd(cwd, sizeof(cwd));
printf("Working directory: '%s'\n", cwd_ptr);		printf("Working directory: '%s'\n", cwd_ptr);
} else {		} else {
printf("Working directory: '%s'\n", working_dir.c_str());		printf("Working directory: '%s'\n", working_dir.c_str());
}		}
printf("%i arguments:\n", argc);		printf("%i arguments:\n", argc);

for (int i = 0; i < argc; ++i)		for (int i = 0; i < argc; ++i)
printf("argv[%u] = '%s'\n", i, argv[i]);		printf("argv[%u] = '%s'\n", i, argv[i]);

// Now we posix spawn to exec this process into the inferior that we want		// Now we posix spawn to exec this process into the inferior that we want
// to debug.		// to debug.
posix_spawn_for_debug(		posix_spawn_for_debug(
argv,		argv,
pass_env ? *_NSGetEnviron() : NULL, // Pass current environment as we may		pass_env ? *_NSGetEnviron() : NULL, // Pass current environment as we may
// have modified it if "--env" options		// have modified it if "--env" options
// was used, do NOT pass "envp" here		// was used, do NOT pass "envp" here
working_dir.empty() ? NULL : working_dir.c_str(), cpu_type, disable_aslr);		working_dir.empty() ? NULL : working_dir.c_str(), cpu_type, disable_aslr,
		s);

return 0;		return 0;
}		}

#endif // #if defined (__APPLE__)		#endif // #if defined (__APPLE__)