
Fix a race between lldb's packet timeout and killing the profile thread
ClosedPublic

Authored by jingham on Feb 21 2020, 5:01 PM.

Details

Summary

The debugserver profile thread used to suspend itself between samples with
a usleep. When you detach or kill, MachProcess::Clear would delay replying
to the incoming packet until pthread_join of the profile thread returned.
If you are unlucky or the suspend delay is long, it could take longer than
the packet timeout for pthread_join to return. Then you would get an error
about detach not succeeding from lldb - even though in fact the detach was
successful...

I replaced the usleep with a PThreadEvent entity. Now we just call a timed
WaitForEventBits, and when debugserver wants to stop the profile thread, it
can set the event bit and the sleep will exit immediately.

Note, you have to get fairly unlucky, because when lldb times out a packet it
then sends a qEcho and tries to get back in sync, which adds some extra delay
that might give the detach a chance to succeed. For years I've had occasional
mysterious reports of Detach failing like this - only under Xcode, which is
currently the only client of the profiling info I know of - but didn't get to
chasing it down till now.

Diff Detail

Event Timeline

jingham created this revision. Feb 21 2020, 5:01 PM
Herald added a project: Restricted Project. Feb 21 2020, 5:01 PM

Lots of change for something that might be fixed much more easily:

Alt way: why not just set m_profile_enabled to false in StopProfileThread() and let the loop exit on its next iteration? The code changes would be much smaller. The inline comments below marked "alt way:" all refer to this solution.
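A minimal sketch of this flag-based shutdown (names are illustrative, not the actual debugserver code). Its drawback, which comes up later in the thread, is that the loop only observes the cleared flag after its current sleep completes, so shutdown can stall for up to one full sample interval:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Illustrative flag-based profile loop: the thread exits on the iteration
// after g_profile_enabled is cleared -- but only once the current sleep
// finishes, so a shutdown request can wait out the whole sample interval.
std::atomic<bool> g_profile_enabled{true};

void ProfileLoop(std::chrono::milliseconds interval) {
  while (g_profile_enabled.load()) {
    // ... collect and report a sample here ...
    std::this_thread::sleep_for(interval); // uninterruptible sleep
  }
}
```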

lldb/test/API/macosx/profile_vrs_detach/TestDetachVrsProfile.py
28

fix comment?

lldb/tools/debugserver/source/MacOSX/MachProcess.h
341–346

Alt way (see main comment): remove these lines

385

Alt way: remove

lldb/tools/debugserver/source/MacOSX/MachProcess.mm
28

remove

34

remove

490

alt way: remove

1326

alt way:

m_profile_enabled = false;
1329

alt way: remove

2511–2538

Alt way: revert all changes here; when we call StopProfileThread, it will set m_profile_enabled = false and this loop will naturally exit.

Your way: Move the conversion of profile interval out of the loop?

timespec ts;
using namespace std::chrono;
microseconds dur(proc->ProfileInterval());
const auto dur_secs = duration_cast<seconds>(dur);
const auto dur_usecs = dur % seconds(1);
while (proc->IsProfilingEnabled()) {
  nub_state_t state = proc->GetState();
  if (state == eStateRunning) {
    std::string data =
        proc->Task().GetProfileData(proc->GetProfileScanType());
    if (!data.empty()) {
      proc->SignalAsyncProfileData(data.c_str());
    }
  } else if ((state == eStateUnloaded) || (state == eStateDetached)) {
    // Done. Get out of this thread.
    break;
  }
  DNBTimer::OffsetTimeOfDay(&ts, dur_secs.count(), dur_usecs.count());
  // Exit if requested.
  if (proc->m_profile_events.WaitForSetEvents(eMachProcessProfileCancel, &ts))
    break;
}
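The interval-splitting arithmetic in the hoisted code can be checked in isolation. `SplitProfileInterval` is an illustrative helper, not part of the patch:

```cpp
#include <chrono>

// A microsecond count is split into whole seconds plus the sub-second
// remainder -- the (secs, usecs) pair that OffsetTimeOfDay-style deadline
// APIs expect.  SplitProfileInterval is an illustrative name only.
struct SplitInterval {
  long long secs;
  long long usecs;
};

SplitInterval SplitProfileInterval(unsigned long long interval_usec) {
  using namespace std::chrono;
  microseconds dur(interval_usec);
  const auto dur_secs = duration_cast<seconds>(dur);
  const auto dur_usecs = dur % seconds(1);
  return {dur_secs.count(), dur_usecs.count()};
}
```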
clayborg accepted this revision. Feb 24 2020, 11:10 AM

Jim Ingham said in email:

I don’t understand your suggestion. The point of this change was to get the profile loop to exit without waiting the full sleep time. That time is generally pretty long (1 second or thereabouts) and so if you have the main debugserver thread wait on the profile thread’s timeout before replying to the detach packet, then lldb might have already timed out waiting for the packet reply by the time it does so.

Gotcha, then yes, ignore my comments for the alt way...

You could also fix this by not doing the pthread_join but instead have the detaching finish independently and then let the profile thread exit after the fact, but that seems like asking for trouble. Or you could reply to the detach right away, and then wait on the profile thread to exit as debugserver is going down. But the current code is not at all set up to do that.

Anyway, this solution has the benefit of being exactly what you want to have happen - the profile thread stops its wait sleep immediately when told to exit. And it relies on well tested mechanisms, and doesn’t seem terribly intrusive.

That is true.

BTW, I thought about doing the time calculation outside the profile thread loop, but I couldn’t see anything that said you have to stop and restart the profile loop when you change the timeout. So it didn’t seem safe to only calculate the timeout once. However, if we want to require that you stop and start the profile thread when you change its wait interval it would be fine to do that.

Fair point. All objections removed then, as this is the most performant and bulletproof solution.

This revision is now accepted and ready to land. Feb 24 2020, 11:10 AM
This revision was automatically updated to reflect the committed changes.