This is an archive of the discontinued LLVM Phabricator instance.


Differential D86388

Fix use-after-free in ThreadPlan, and add test.
Needs Review · Public

Authored by comex on Aug 21 2020, 5:07 PM.


Details

Reviewers

jingham
clayborg

Summary

Background:

ThreadPlan objects store a cached pointer to the associated Thread. To quote
the code:

// We don't cache the thread pointer over resumes. This
// Thread might go away, and another Thread represent
// the same underlying object on a later stop.

This can happen only when using an operating system plugin with
os-plugin-reports-all-threads = false (a new feature); otherwise, the
ThreadPlan will be wiped away when the Thread is.

Previously, this cached pointer was unowned, and ThreadPlan attempted to
prevent it from becoming stale by invalidating it in WillResume, reasoning that
the list of threads would only change when the target is running. However, it
turns out that the pointer can be re-cached after it's invalidated but before
the target actually starts running. At least one path where this happens is
ThreadPlan::ShouldReportRun -> GetPreviousPlan -> GetThread.
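
The ordering described above can be replayed in a small model. This is only a sketch: `Thread`, `ThreadList` and `OldThreadPlan` are simplified stand-ins rather than the real LLDB classes, and the `observer` weak pointer and `CacheIsDangling()` are hypothetical helpers added so the stale state can be detected without dereferencing a dead pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>

// Simplified stand-ins; the names echo the patch, but this is not the LLDB API.
struct Thread {
  explicit Thread(uint64_t tid) : tid(tid) {}
  uint64_t tid;
};

struct ThreadList {
  std::map<uint64_t, std::shared_ptr<Thread>> threads;

  std::shared_ptr<Thread> FindThreadByID(uint64_t tid) const {
    auto it = threads.find(tid);
    if (it == threads.end())
      return nullptr;
    return it->second;
  }
};

// Old scheme: an unowned cached pointer that is cleared only in WillResume().
struct OldThreadPlan {
  ThreadList &list;
  uint64_t tid;
  Thread *cached = nullptr;
  std::weak_ptr<Thread> observer; // sketch-only: detects staleness safely

  OldThreadPlan(ThreadList &l, uint64_t t) : list(l), tid(t) {}

  Thread *GetThread() {
    if (!cached) {
      std::shared_ptr<Thread> sp = list.FindThreadByID(tid);
      cached = sp.get();
      observer = sp;
    }
    return cached;
  }

  void WillResume() {
    cached = nullptr;
    observer.reset();
  }

  // True when the raw pointer is set but the Thread it named is gone.
  bool CacheIsDangling() const {
    return cached != nullptr && observer.expired();
  }
};

// Replays the ordering from the summary and reports whether the plan ends up
// holding a pointer to a destroyed Thread.
bool ReplayBuggyOrdering() {
  ThreadList list;
  list.threads[1] = std::make_shared<Thread>(1);
  OldThreadPlan plan(list, 1);

  plan.GetThread();     // cached at a stop
  plan.WillResume();    // cache invalidated before resuming
  plan.GetThread();     // re-cached before the target actually runs
  list.threads.clear(); // the Thread goes away while the target runs
  return plan.CacheIsDangling();
}
```

In this model the second `GetThread()` call plays the role of the `ShouldReportRun` path: nothing clears the cache again before the thread list changes.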

It might be possible to fix this by invalidating the pointer from other places,
but that seems unnecessarily risky and complicated. Instead, just keep around
a ThreadSP and check IsValid(), which becomes false when Thread::DestroyThread()
is called.

Note: This does not create a retain cycle because Thread does not own
ThreadPlans. (Even if it did, Thread::DestroyThread resets all of the thread's
owned pointers.)
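
A minimal sketch of the approach taken by the patch, under stated assumptions: the classes below are simplified stand-ins, `destroy_called` models the flag that makes `IsValid()` return false, and `ThreadList::Replace` is a hypothetical helper that simulates a Thread being destroyed and a new object taking over the same thread ID. Unlike the patched `GetThread()`, which returns a `Thread &`, the model returns the shared pointer so the sketch never dereferences null.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>

// Simplified stand-ins; not the real LLDB classes.
struct Thread {
  explicit Thread(uint64_t tid) : tid(tid) {}
  uint64_t tid;
  bool destroy_called = false;

  void DestroyThread() { destroy_called = true; }
  bool IsValid() const { return !destroy_called; }
};

struct ThreadList {
  std::map<uint64_t, std::shared_ptr<Thread>> threads;

  std::shared_ptr<Thread> FindThreadByID(uint64_t tid) const {
    auto it = threads.find(tid);
    if (it == threads.end())
      return nullptr;
    return it->second;
  }

  // Hypothetical helper: destroy any existing Thread for tid and install a
  // fresh object representing the same thread ID.
  void Replace(uint64_t tid) {
    auto it = threads.find(tid);
    if (it != threads.end())
      it->second->DestroyThread();
    threads[tid] = std::make_shared<Thread>(tid);
  }
};

// New scheme: the plan owns a shared pointer and revalidates it on every use.
struct ThreadPlan {
  ThreadList &list;
  uint64_t tid;
  std::shared_ptr<Thread> thread_sp;

  ThreadPlan(ThreadList &l, uint64_t t)
      : list(l), tid(t), thread_sp(l.FindThreadByID(t)) {}

  std::shared_ptr<Thread> GetThread() {
    if (thread_sp && thread_sp->IsValid())
      return thread_sp;
    // The cached Thread was destroyed; look the thread ID up again.
    thread_sp = list.FindThreadByID(tid);
    return thread_sp;
  }
};

bool FixRefreshesStaleCache() {
  ThreadList list;
  list.Replace(1); // creates the first Thread for tid 1
  ThreadPlan plan(list, 1);

  std::shared_ptr<Thread> first = plan.GetThread();
  list.Replace(1); // old Thread destroyed, new one takes its place
  std::shared_ptr<Thread> second = plan.GetThread();

  return first != second && !first->IsValid() && second->IsValid() &&
         second == list.FindThreadByID(1);
}
```

Because the plan holds a shared pointer, the old `Thread` object stays alive long enough for `IsValid()` to be checked safely, which is the property the raw pointer lacked.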

As for testing, I made a small change to the existing reports-all-threads test
which causes it to trigger the use-after-free without the rest of the commit.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

comex created this revision. Aug 21 2020, 5:07 PM

Herald added a project: Restricted Project. Aug 21 2020, 5:07 PM

Herald added a subscriber: aaron.ballman.

comex requested review of this revision. Aug 21 2020, 5:07 PM

Herald added a subscriber: JDevlieghere. Aug 21 2020, 5:07 PM

comex retitled this revision from "Fix use-after-free in ThreadPlan, and add test. Background: ThreadPlan objects store a cached pointer to the associated Thread. To quote the code: // We don't cache the thread pointer over resumes. This // Thread might go away, and another..." to "Fix use-after-free in ThreadPlan, and add test.". Aug 21 2020, 5:09 PM

comex edited the summary of this revision.

Harbormaster completed remote builds in B69199: Diff 287139. Aug 21 2020, 5:33 PM

JDevlieghere added inline comments. Aug 21 2020, 11:11 PM

lldb/include/lldb/Target/ThreadPlan.h
Line 603: drive-by nit: since you're touching this, can you make it a Doxygen comment (`///`) above the variable?

I'm confused as to how this patch actually fixes the problem. When the thread gets removed from the thread list, it should get Destroy called on it, which should set m_destroy_called, causing IsValid to return false. So I am not clear under what circumstances FindThreadByID will fail, but the cached thread shared pointer's IsValid is still true? If IsValid is holding true over the thread's removal from the thread list, then I'm worried that this change will keep us using the old ThreadSP that was reported the next time we stopped and this thread ID was represented by a different ThreadSP.

In D86388#2234418, @jingham wrote:

I'm confused as to how this patch actually fixes the problem. When the thread gets removed from the thread list, it should get Destroy called on it, which should set m_destroy_called, causing IsValid to return false. So I am not clear under what circumstances FindThreadByID will fail, but the cached thread shared pointer's IsValid is still true? If IsValid is holding true over the thread's removal from the thread list, then I'm worried that this change will keep us using the old ThreadSP that was reported the next time we stopped and this thread ID was represented by a different ThreadSP.

Hmm… I’m confused by your comment. This patch isn’t meant to address a situation where IsValid is true but FindThreadByID fails. It addresses the situation where a ThreadPlan already has a cached thread pointer (and would like to reuse it without calling FindThreadByID at all), but IsValid is false because the thread was removed from the thread list since it was cached.

The existing code doesn’t check IsValid; it assumes that the thread list can’t be changed until a resume happens, at which point WillResume will be called and reset the cached thread pointer to null. However, in the buggy case I found, GetThread is called again after WillResume has already run. GetThread sets the cached thread pointer back to non-null, and then later when the thread list actually changes, the pointer becomes dangling.

With this patch, the pointer never gets reset to null, so it can end up pointing to a thread that has been removed from the list. But now it’s a shared_ptr, so it at least keeps the thread object alive. And every time GetThread is called, it checks (using IsValid) whether the thread has been removed. If so, GetThread throws out the cached pointer and falls back to calling FindThreadByID.

What's calling into the thread plans when WillResume has already been called? That seems wrong, since the thread list is in an uncertain state, having been cleared out for resume and not reset after the stop. It seems to me it would be a better fix to ensure that we aren't doing that.

The sequence I found is:

WillResume
DoResume sends eBroadcastBitAsyncContinue to ProcessGDBRemote::AsyncThread
AsyncThread calls process->SetPrivateState(eStateRunning);
…which sends eBroadcastBitStateChanged back to the main thread, handled by Process::HandlePrivateEvent
…which ends up in this stack, when it tries to figure out whether to report the state change to the user:
- Process::HandlePrivateEvent ->
- Process::ShouldBroadcastEvent ->
- ThreadList::ShouldReportRun ->
- Thread::ShouldReportRun ->
- ThreadPlan::ShouldReportRun ->
- ThreadPlan::GetPreviousPlan ->
- ThreadPlan::GetThread

We should be able to calculate ShouldReportRun before we actually set the run going. That's better than just querying potentially stale threads. It would also be good to find a way to prevent ourselves from consulting the thread list after we've decided to invalidate it for run, but that's a second order consideration.
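
One way to read this suggestion is sketched below. Everything in it is hypothetical: `ResumeModel`, its `Vote` enum, and the `stale_queries` counter are illustrative names, not LLDB code. The model computes the run vote while the thread list is still valid and answers the later "running" event from the cached value.

```cpp
#include <cassert>

// Hypothetical stand-in; not LLDB's Vote type.
enum class Vote { No, NoOpinion, Yes };

struct ResumeModel {
  bool thread_list_valid = true; // false between resume and the next stop
  int stale_queries = 0;         // times plans were asked while the list was invalid
  Vote plan_vote = Vote::Yes;    // what the thread plans would answer
  Vote cached_vote = Vote::NoOpinion;

  // Models asking the thread plans whether the run should be reported.
  Vote AskPlans() {
    if (!thread_list_valid)
      ++stale_queries;
    return plan_vote;
  }

  // Compute the vote first, then invalidate the thread list for the run.
  void Resume() {
    cached_vote = AskPlans();
    thread_list_valid = false;
  }

  // The "state changed to running" event uses the cached answer only.
  Vote OnRunningEvent() const { return cached_vote; }

  void OnStop() { thread_list_valid = true; }
};

bool VoteIsComputedBeforeResume() {
  ResumeModel model;
  model.Resume();
  return model.OnRunningEvent() == Vote::Yes && model.stale_queries == 0;
}
```

The trade-off discussed in the thread still applies: this ordering avoids querying plans after invalidation, but it changes where the vote is computed, whereas the patch under review only changes how `GetThread()` validates its cache.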

Disabling the thread list while the target is running sounds like a pretty complex change. For example, what should happen if a Python script calls lldb.process.GetThreadAtIndex(n) while the target is running, which currently works?

And is it really the right direction to be moving in? Long-term, wouldn't it be better if user-facing commands like thread list worked while the target is running? Or extra-long-term, one of the ideas on the LLDB projects page [1] is non-stop debugging (like GDB supports), where one thread is paused and can be inspected while other threads are still running. That would require a model where threads are created and destroyed on the fly, without a rigid sequence of "resume and thread list goes away; stop and thread list comes back".

FWIW, this bug causes intermittent crashes whenever I try to debug xnu, so I'd like to get it fixed relatively quickly if possible. Even just calculating ShouldReportRun earlier, while certainly doable, would be a considerably more complex change than this.

[1] https://lldb.llvm.org/status/projects.html#non-stop-debugging

I want to separate out two things here. One is whether lldb should internally ask questions of a thread once we've invalidated the thread list before running, the other is how we present threads to the user while the process is running.

I was only suggesting restricting the former. Once we've put the internal data structure that handles the thread list in an undefined state because we are about to resume, we shouldn't turn around and ask questions of it. That just seems like a bad practice.

The other question is what does it mean to hand out a thread list when the process is running. It has to be a historical artifact, right? We don't actually know that the threads we are handing out still exist, so I'm not really sure what handing them out would mean. For some platforms we might be able to track threads coming and going, but I'm pretty sure that we can't require that everywhere, so it can't be an essential part of lldb's design. In general, when the process is "continuing" lldb would like to change its behavior as little as possible, so unless there's a way to keep the thread list accurate while running that doesn't involve stopping and starting the process, we would rather not track this live. I know that both the Darwin user space and Darwin kernel don't provide such support.

And we can't answer any meaningful questions about the thread without pausing at least that thread. In non-stop mode, the threads that are running are still in an uncertain state, and only the threads that are stopped are known. So even when lldb supports keeping some threads running while others are stopped, we'll have to maintain a distinction between the stopped and the running threads, and be clear that to know something about a running thread we'll have to stop it. In fact, because the point of non-stop debugging is that there are some parts of the system that you really don't want to stop, in that mode we need to be even more careful to be clear about what operations do and don't require stopping the target.

But again, I don't think we need to have that discussion w.r.t. the current problem. This is all about how we treat our internal state, not what we show to users.

Revision Contents

Path    Size
lldb/include/lldb/Target/ThreadPlan.h    5 lines
lldb/source/Target/ThreadPlan.cpp    23 lines
lldb/test/API/functionalities/plugins/python_os_plugin/stepping_plugin_threads/TestOSPluginStepping.py    7 lines

Diff 287139

lldb/include/lldb/Target/ThreadPlan.h

//===-- ThreadPlan.h --------------------------------------------- C++ --===//		//===-- ThreadPlan.h --------------------------------------------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLDB_TARGET_THREADPLAN_H		#ifndef LLDB_TARGET_THREADPLAN_H
#define LLDB_TARGET_THREADPLAN_H		#define LLDB_TARGET_THREADPLAN_H

#include <mutex>		#include <mutex>
#include <string>		#include <string>

#include "lldb/Target/Process.h"		#include "lldb/Target/Process.h"
#include "lldb/Target/StopInfo.h"		#include "lldb/Target/StopInfo.h"
#include "lldb/Target/Target.h"		#include "lldb/Target/Target.h"
#include "lldb/Target/Thread.h"		#include "lldb/Target/Thread.h"
#include "lldb/Target/ThreadPlanTracer.h"		#include "lldb/Target/ThreadPlanTracer.h"
#include "lldb/Utility/UserID.h"		#include "lldb/Utility/UserID.h"
#include "lldb/lldb-private.h"		#include "lldb/lldb-private.h"

namespace lldb_private {		namespace lldb_private {
[571 lines hidden; context: protected:]
bool m_takes_iteration_count;		bool m_takes_iteration_count;
bool m_could_not_resolve_hw_bp;		bool m_could_not_resolve_hw_bp;
int32_t m_iteration_count = 1;		int32_t m_iteration_count = 1;

private:		private:
// For ThreadPlan only		// For ThreadPlan only
static lldb::user_id_t GetNextID();		static lldb::user_id_t GetNextID();

Thread *m_thread; // Stores a cached value of the thread, which is set to		lldb::ThreadSP m_thread_sp; // Stores a cached value of the thread. Don't use
// nullptr when the thread resumes. Don't use this anywhere		// use this anywhere but ThreadPlan::GetThread().
// but ThreadPlan::GetThread().
ThreadPlanKind m_kind;		ThreadPlanKind m_kind;
std::string m_name;		std::string m_name;
std::recursive_mutex m_plan_complete_mutex;		std::recursive_mutex m_plan_complete_mutex;
LazyBool m_cached_plan_explains_stop;		LazyBool m_cached_plan_explains_stop;
bool m_plan_complete;		bool m_plan_complete;
bool m_plan_private;		bool m_plan_private;
bool m_okay_to_discard;		bool m_okay_to_discard;
bool m_is_master_plan;		bool m_is_master_plan;
[47 lines hidden]

lldb/source/Target/ThreadPlan.cpp

//===-- ThreadPlan.cpp ----------------------------------------------------===//		//===-- ThreadPlan.cpp ----------------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "lldb/Target/ThreadPlan.h"		#include "lldb/Target/ThreadPlan.h"
#include "lldb/Core/Debugger.h"		#include "lldb/Core/Debugger.h"
#include "lldb/Target/Process.h"		#include "lldb/Target/Process.h"
#include "lldb/Target/RegisterContext.h"		#include "lldb/Target/RegisterContext.h"
#include "lldb/Target/Target.h"		#include "lldb/Target/Target.h"
#include "lldb/Target/Thread.h"		#include "lldb/Target/Thread.h"
#include "lldb/Utility/Log.h"		#include "lldb/Utility/Log.h"
#include "lldb/Utility/State.h"		#include "lldb/Utility/State.h"

using namespace lldb;		using namespace lldb;
using namespace lldb_private;		using namespace lldb_private;

// ThreadPlan constructor		// ThreadPlan constructor
ThreadPlan::ThreadPlan(ThreadPlanKind kind, const char *name, Thread &thread,		ThreadPlan::ThreadPlan(ThreadPlanKind kind, const char *name, Thread &thread,
Vote stop_vote, Vote run_vote)		Vote stop_vote, Vote run_vote)
: m_process(*thread.GetProcess().get()), m_tid(thread.GetID()),		: m_process(*thread.GetProcess().get()), m_tid(thread.GetID()),
m_stop_vote(stop_vote), m_run_vote(run_vote),		m_stop_vote(stop_vote), m_run_vote(run_vote),
m_takes_iteration_count(false), m_could_not_resolve_hw_bp(false),		m_takes_iteration_count(false), m_could_not_resolve_hw_bp(false),
m_thread(&thread), m_kind(kind), m_name(name), m_plan_complete_mutex(),		m_thread_sp(thread.shared_from_this()), m_kind(kind), m_name(name),
m_cached_plan_explains_stop(eLazyBoolCalculate), m_plan_complete(false),		m_plan_complete_mutex(), m_cached_plan_explains_stop(eLazyBoolCalculate),
m_plan_private(false), m_okay_to_discard(true), m_is_master_plan(false),		m_plan_complete(false), m_plan_private(false), m_okay_to_discard(true),
m_plan_succeeded(true) {		m_is_master_plan(false), m_plan_succeeded(true) {
SetID(GetNextID());		SetID(GetNextID());
}		}

// Destructor		// Destructor
ThreadPlan::~ThreadPlan() = default;		ThreadPlan::~ThreadPlan() = default;

Target &ThreadPlan::GetTarget() { return m_process.GetTarget(); }		Target &ThreadPlan::GetTarget() { return m_process.GetTarget(); }

const Target &ThreadPlan::GetTarget() const { return m_process.GetTarget(); }		const Target &ThreadPlan::GetTarget() const { return m_process.GetTarget(); }

Thread &ThreadPlan::GetThread() {		Thread &ThreadPlan::GetThread() {
if (m_thread)		if (m_thread_sp && m_thread_sp->IsValid())
return *m_thread;		return *m_thread_sp;

ThreadSP thread_sp = m_process.GetThreadList().FindThreadByID(m_tid);		m_thread_sp = m_process.GetThreadList().FindThreadByID(m_tid);
m_thread = thread_sp.get();		return *m_thread_sp;
return *m_thread;
}		}

bool ThreadPlan::PlanExplainsStop(Event *event_ptr) {		bool ThreadPlan::PlanExplainsStop(Event *event_ptr) {
if (m_cached_plan_explains_stop == eLazyBoolCalculate) {		if (m_cached_plan_explains_stop == eLazyBoolCalculate) {
bool actual_value = DoPlanExplainsStop(event_ptr);		bool actual_value = DoPlanExplainsStop(event_ptr);
m_cached_plan_explains_stop = actual_value ? eLazyBoolYes : eLazyBoolNo;		m_cached_plan_explains_stop = actual_value ? eLazyBoolYes : eLazyBoolNo;
return actual_value;		return actual_value;
} else {		} else {
[72 lines hidden; context: if (log) {]
", sp = 0x%8.8" PRIx64 ", fp = 0x%8.8" PRIx64 ", "		", sp = 0x%8.8" PRIx64 ", fp = 0x%8.8" PRIx64 ", "
"plan = '%s', state = %s, stop others = %d",		"plan = '%s', state = %s, stop others = %d",
__FUNCTION__, GetThread().GetIndexID(),		__FUNCTION__, GetThread().GetIndexID(),
static_cast<void *>(&GetThread()), m_tid, static_cast<uint64_t>(pc),		static_cast<void *>(&GetThread()), m_tid, static_cast<uint64_t>(pc),
static_cast<uint64_t>(sp), static_cast<uint64_t>(fp), m_name.c_str(),		static_cast<uint64_t>(sp), static_cast<uint64_t>(fp), m_name.c_str(),
StateAsCString(resume_state), StopOthers());		StateAsCString(resume_state), StopOthers());
}		}
}		}
bool success = DoWillResume(resume_state, current_plan);		return DoWillResume(resume_state, current_plan);
m_thread = nullptr; // We don't cache the thread pointer over resumes. This
// Thread might go away, and another Thread represent
// the same underlying object on a later stop.
return success;
}		}

lldb::user_id_t ThreadPlan::GetNextID() {		lldb::user_id_t ThreadPlan::GetNextID() {
static uint32_t g_nextPlanID = 0;		static uint32_t g_nextPlanID = 0;
return ++g_nextPlanID;		return ++g_nextPlanID;
}		}

void ThreadPlan::DidPush() {}		void ThreadPlan::DidPush() {}
[139 lines hidden]

lldb/test/API/functionalities/plugins/python_os_plugin/stepping_plugin_threads/TestOSPluginStepping.py

[45 lines hidden; context: def run_python_os_step_missing_thread(self, do_prune):]
# Our OS plugin does NOT report all threads:		# Our OS plugin does NOT report all threads:
result = self.dbg.HandleCommand("settings set process.experimental.os-plugin-reports-all-threads false")		result = self.dbg.HandleCommand("settings set process.experimental.os-plugin-reports-all-threads false")

python_os_plugin_path = os.path.join(self.getSourceDir(),		python_os_plugin_path = os.path.join(self.getSourceDir(),
"operating_system.py")		"operating_system.py")
(target, self.process, thread, thread_bkpt) = lldbutil.run_to_source_breakpoint(		(target, self.process, thread, thread_bkpt) = lldbutil.run_to_source_breakpoint(
self, "first stop in thread - do a step out", self.main_file)		self, "first stop in thread - do a step out", self.main_file)

		# Disabling the thread breakpoint here ensures that we don't have an
		# unreported stop (to step over that breakpoint), which ensures that
		# ThreadPlan::ShouldReportRun is called in between ThreadPlan::WillResume
		# and when the thread actually disappears. This previously triggered
		# a use-after-free.
		thread_bkpt.SetEnabled(False)

main_bkpt = target.BreakpointCreateBySourceRegex('Stop here and do not make a memory thread for thread_1',		main_bkpt = target.BreakpointCreateBySourceRegex('Stop here and do not make a memory thread for thread_1',
self.main_file)		self.main_file)
self.assertEqual(main_bkpt.GetNumLocations(), 1, "Main breakpoint has one location")		self.assertEqual(main_bkpt.GetNumLocations(), 1, "Main breakpoint has one location")

# There should not be an os thread before we load the plugin:		# There should not be an os thread before we load the plugin:
self.assertFalse(self.get_os_thread().IsValid(), "No OS thread before loading plugin")		self.assertFalse(self.get_os_thread().IsValid(), "No OS thread before loading plugin")

# Now load the python OS plug-in which should update the thread list and we should have		# Now load the python OS plug-in which should update the thread list and we should have
[55 lines hidden]