This is an archive of the discontinued LLVM Phabricator instance.

Prevent LLGS from crashing when exiting - make NativeProcessLinux to wait until ThreadStateCoordinator is fully stopped before entering ~NativeProcessLinux.
ClosedPublic

Authored by ovyalov on Feb 16 2015, 5:33 PM.

Download Raw Diff

Details

Reviewers

vharron
tberghammer

Summary

ninja-check-lldb with LLGS_LOCAL on produces plenty of core dumps - SIGSEGVs happen in llgs by following reasons:

NativeProcessLinux doesn't join on ThreadStateCoordinator thread;
NativeProcessLinux initiates ThreadStateCoordinator termination from its own destructor, but in the same time ThreadStateCoordinator references NativeProcessLinux by itself.

Added NativeProcessProtocol::Terminate - stop method that should be called prior to termination of NativeProcessProtocol instance.

Diff Detail

Event Timeline

ovyalov updated this revision to Diff 20063.Feb 16 2015, 5:33 PM

ovyalov retitled this revision from to Prevent LLGS from crashing when exiting - make NativeProcessLinux to wait until ThreadStateCoordinator is fully stopped before entering ~NativeProcessLinux..

ovyalov updated this object.

ovyalov edited the test plan for this revision. (Show Details)

ovyalov added reviewers: vharron, tberghammer.

ovyalov added a subscriber: Unknown Object (MLST).

I don't understand this:

NativeProcessLinux initiates ThreadStateCoordinator termination from its own destructor, but in the same time ThreadStateCoordinator references NativeProcessLinux by itself.

Where is the reference from ThreadStateCoordinator to NativeProcessLinux?

I also don't see what difference the Terminate method do on NativeProcessLinux. As far as I understand, the compiler generated destructor of GDBRemoteCommunicationServerLLGS did exactly the same what you written now using the Terminate function. (In both case StopMonitor will be called by the destructor of GDBRemoteCommunicationServerLLGS)

source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
102–106	Are you sure you don't have to lock the m_debugged_process_mutex here?

Added mutex lock in ~GDBRemoteCommunicationServerLLGS.

ThreadStateCoordinator references NativeProcessLinux indirectly:

TSC thread function (NativeProcessLinux::CoordinatorThread, https://github.com/llvm-mirror/lldb/blob/master/source/Plugins/Process/Linux/NativeProcessLinux.cpp#L3605) uses a process pointer
NativeProcessLinux calls TSC methods passing lambda with this pointer - for example, https://github.com/llvm-mirror/lldb/blob/master/source/Plugins/Process/Linux/NativeProcessLinux.cpp#L2116

I removed StopMonitor(); from ~NativeProcessLinux and moved this call to NativeProcessLinux::Terminate but with next differences:

Now StopMonitor() is called from NativeProcessLinux::Terminate before ~NativeProcessLinux - i.e., NativeProcessLinux is alive and can be safely referenced by TSC
Without my change StopMonitor() is called within ~NativeProcessLinux (so, NativeProcessLinux is the middle of destruction) after ~GDBRemoteCommunicationServerLLGS has been called.

As far as I know NativeProcessLinux is still fully functional during the execution of it's destructor (as every C++ class). The only difference is that the members in GDBRemoteCommunicationServerLLGS defined after the NativeProcessLinux smart pointer are already destructed. If this ordering is the problem then it can be solved with changing the order of the member declarations inside GDBRemoteCommunicationServerLLGS.

I am confused if the introduction of the Terminate method made any difference.

Added check for non-null return value of GetProcess in NativeThreadLinux::SetRunning - PTAL.

Thank you.

tberghammer accepted this revision.Feb 17 2015, 11:28 AM

tberghammer edited edge metadata.

This revision is now accepted and ready to land.Feb 17 2015, 11:28 AM

Greetings,

I have been following this discussion, and would like to add my 2 cents. I'll start with my thoughts on virtual functions.

On 17 February 2015 at 18:44, Oleksiy Vyalov <ovyalov@google.com> wrote:

> On Tue, Feb 17, 2015 at 10:01 AM, Tamas Berghammer <tberghammer@google.com> wrote:
> As far as I know NativeProcessLinux is still fully functional during the execution of it's destructor. The only difference is that the members in
> GDBRemoteCommunicationServerLLGS >defined after the NativeProcessLinux smart pointer are already destructed. If this ordering is the problem then it can be solved with
> changing the order of the member declarations >inside GDBRemoteCommunicationServerLLGS.

Yes, an instance is functional when its destructor is being called but there some limitations:
You may hit undefined behavior if virtual function are called when destruction is in progress - http://www.artima.com/cppsource/nevercall.html, i.e. if some virtual function of >NativeProcessLinux are called by TSC when ~NativeProcessLinux is in progress - it might be a problem.

And a more quote from the standard.

Member functions, including virtual functions (10.3), can be called during construction or destruction (12.6.2). When a virtual function is called directly or indirectly from a
constructor or from a destructor, including during the construction or destruction of the class’s non-static data members, and the object to which the call applies is the object (call it
x) under construction or destruction, the function called is the final overrider in the constructor’s or destructor’s class and not one overriding it in a more-derived class.

Judging from this, there is nothing undefined about the situation you mentioned. While ~NativeProcessLinux is in progress any call to its virtual functions will resolve "as expected". This is especially true if you are still executing the body of the destructor (as would be the case if you placed the Join() call in ~NativeProcessLinux), as all the member variables are still fully constructed, so you cannot get undefined behavior if the virtual functions do member access.

A different case would be if you were calling virtual functions of the (now non-existing) NativeProcessLinux object after its destructor has completed, and the destruction has moved on to its Base classes. In some cases the behavior would be undefined, in others "defined, but surprising". And if this were to happen (which, as far as i can see, is not the case), then I would argue that the bug is in the fact that ThreadStateCoordinator was still alive after the destruction of NativeProcessLinux -- the coordinator is owned by nativeprocess, so it should never outlive it

Therefore I think it is a bad idea to create a "destructor" function just to avoid doing something in a destructor. The presence of Terminate() defeats the purpose of having a shared pointer to the process: a shared pointer should delete an object, once the last pointer to it goes out of scope, but right now you cannot properly delete the process object in any other way except by calling ~GDBRemoteCommunicationServerLLGS(). If someone happened to be holding a NativeProcessProtocolSP expecting it to be a functional process, it will be surprised that it is non functional after GDBRemoteCommunicationServerLLGS has been gone.

Since it seems that the root of the problem was something else (a null ptr dereference in NativeThreadLinux), and this has already been addressed, I would recommend moving the Join() back to the destructor. Unless there are other issues which would require its presence... (?)

PS:
After my discussion with Tamas, I have come to think that the root cause is unclear ownership semantics between Threads, Processes and TSC: a process holds shared_ptrs to threads, which seems to imply that it is sharing the ownership of them with someone else. However, from the code it would seem that the threads should never outlive the process (or the TSC for that matter). Therefore, it would seem that a Process (or maybe the TSC) is the perfect candidate for the "owner" of it's threads, which would have the sole responsibility for destroying them (which could be represented by a unique_ptr). And then, when the lifetime of threads gets tied to the lifetime of their process, they no longer need to hold a shared (or weak) pointer, they can be happy with a regular pointer. However, this is definitely an issue far out of scope of this CL.

This completes my braindump. Sorry about the length, I did not expect it to be this long when I started. I realize some of the claims might be too strong for someone who just started and is getting to know the codebase, but I figure I might as well send it, since I spent so much time thinking about it.

Thanks Pavel.

the root cause is unclear ownership semantics between Threads, Processes

and TSC

Who is going to work on this problem? How do we solve this problem?
Document our best guess and refactor to fit? (And it that doesn't work,
try again)

Vince

Be careful about this sort of thing, there are all sorts of gnarly corner cases, like a breakpoint command that does:

(lldb) break command add
Enter your debugger command(s). Type 'DONE' to end.

settings set auto-confirm true
process kill
DONE

The "process kill" gets executed while you are running through all the threads that have stopped for some reason to figure out what they want to do, and you have to keep enough of the thread alive to successfully get out of that logic. So you can't just nuke the thread when the process dies or this won't go well.

Jim

Hi,

please see my comments inline:

In D7692#125759, @ovyalov wrote:

Hi,

please see my comments inline:

I don't see them. Please check if they are posted or still in some pending state.

Hi,

please see my comments inline:

In D7692#125375, @labath wrote:

Greetings,

I have been following this discussion, and would like to add my 2 cents. I'll start with my thoughts on virtual functions.

On 17 February 2015 at 18:44, Oleksiy Vyalov <ovyalov@google.com> wrote:

> On Tue, Feb 17, 2015 at 10:01 AM, Tamas Berghammer <tberghammer@google.com> wrote:
> As far as I know NativeProcessLinux is still fully functional during the execution of it's destructor. The only difference is that the members in
> GDBRemoteCommunicationServerLLGS >defined after the NativeProcessLinux smart pointer are already destructed. If this ordering is the problem then it can be solved with
> changing the order of the member declarations >inside GDBRemoteCommunicationServerLLGS.

Yes, an instance is functional when its destructor is being called but there some limitations:
You may hit undefined behavior if virtual function are called when destruction is in progress - http://www.artima.com/cppsource/nevercall.html, i.e. if some virtual function of >NativeProcessLinux are called by TSC when ~NativeProcessLinux is in progress - it might be a problem.

And a more quote from the standard.

Member functions, including virtual functions (10.3), can be called during construction or destruction (12.6.2). When a virtual function is called directly or indirectly from a
constructor or from a destructor, including during the construction or destruction of the class’s non-static data members, and the object to which the call applies is the object (call it
x) under construction or destruction, the function called is the final overrider in the constructor’s or destructor’s class and not one overriding it in a more-derived class.

Judging from this, there is nothing undefined about the situation you mentioned. While ~NativeProcessLinux is in progress any call to its virtual functions will resolve "as expected". This is especially true if you are still executing the body of the destructor (as would be the case if you placed the Join() call in ~NativeProcessLinux), as all the member variables are still fully constructed, so you cannot get undefined behavior if the virtual functions do member access.

A different case would be if you were calling virtual functions of the (now non-existing) NativeProcessLinux object after its destructor has completed, and the destruction has moved on to its Base classes. In some cases the behavior would be undefined, in others "defined, but surprising". And if this were to happen (which, as far as i can see, is not the case), then I would argue that the bug is in the fact that ThreadStateCoordinator was still alive after the destruction of NativeProcessLinux -- the coordinator is owned by nativeprocess, so it should never outlive it

Thank you for looking into this. You brought up good points and my attitude here for not calling virtual methods from constructor/destructor is rather maintaining a safety net - calling virtual methods on an instance after last line of constructor and before first line of destructor.

Therefore I think it is a bad idea to create a "destructor" function just to avoid doing something in a destructor. The presence of Terminate() defeats the purpose of having a shared pointer to the process: a shared pointer should delete an object, once the last pointer to it goes out of scope, but right now you cannot properly delete the process object in any other way except by calling ~GDBRemoteCommunicationServerLLGS(). If someone happened to be holding a NativeProcessProtocolSP expecting it to be a functional process, it will be surprised that it is non functional after GDBRemoteCommunicationServerLLGS has been gone.

Since it seems that the root of the problem was something else (a null ptr dereference in NativeThreadLinux), and this has already been addressed, I would recommend moving the Join() back to the destructor. Unless there are other issues which would require its presence... (?)

I propose to leave Terminate call outside of ~NativeThreadLinux while the ownership issue in Threads/Processes/TSC will be fully addressed. Putting Join() in ~NativeThreadLinux may lead to the deadlock and potential SIGSEGV:

Let's imagine NativeThreadLinux::SetRunning in the middle of execution when LLGS is about to shutdown. As part of GDBRemoteCommunicationServerLLGS destruction it tries to delete m_debugged_process_sp. Since NativeThreadLinux::SetRunning has locked this instance of NativeProcessProtocolSP, GDBRemoteCommunicationServerLLGS::m_debugged_process_sp just decrements its reference counter and eventually ~NativeProcessLinux will be called in TSC thread context from NativeThreadLinux::SetRunning - so, it will be waiting for TSC thread to join from TSC thread itself.
If ~NativeProcessLinux might be called from NativeThreadLinux call it may lead to the situation when ~NativeProcessLinux is destroying threads from NativeProcessProtocol::m_threads, i.e. destroying NativeThreadLinux instance that running SetRunning right now

Once ownership issue is resolved, it will be safe to remove Terminate method and call StopMonitor from ~NativeProcessLinux.

PS:
After my discussion with Tamas, I have come to think that the root cause is unclear ownership semantics between Threads, Processes and TSC: a process holds shared_ptrs to threads, which seems to imply that it is sharing the ownership of them with someone else. However, from the code it would seem that the threads should never outlive the process (or the TSC for that matter). Therefore, it would seem that a Process (or maybe the TSC) is the perfect candidate for the "owner" of it's threads, which would have the sole responsibility for destroying them (which could be represented by a unique_ptr). And then, when the lifetime of threads gets tied to the lifetime of their process, they no longer need to hold a shared (or weak) pointer, they can be happy with a regular pointer. However, this is definitely an issue far out of scope of this CL.

Yes, it should be enough for thread class just to keep a reference to its process and process by itself maintains sequence of threads' unique_ptrs.

This completes my braindump. Sorry about the length, I did not expect it to be this long when I started. I realize some of the claims might be too strong for someone who just started and is getting to know the codebase, but I figure I might as well send it, since I spent so much time thinking about it.

AFFECTED FILES

/lldb/trunk/include/lldb/Host/common/NativeProcessProtocol.h
/lldb/trunk/source/Host/common/NativeProcessProtocol.cpp
/lldb/trunk/source/Plugins/Process/Linux/NativeProcessLinux.cpp
/lldb/trunk/source/Plugins/Process/Linux/NativeProcessLinux.h
/lldb/trunk/source/Plugins/Process/Linux/NativeThreadLinux.cpp
/lldb/trunk/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp

USERS

ovyalov (Author)

http://reviews.llvm.org/rL229875

Revision Contents

Path

Size

include/

lldb/

Host/

common/

NativeProcessProtocol.h

4 lines

source/

Host/

common/

NativeProcessProtocol.cpp

6 lines

Plugins/

Process/

Linux/

NativeProcessLinux.h

9 lines

NativeProcessLinux.cpp

4 lines

NativeThreadLinux.cpp

16 lines

gdb-remote/

GDBRemoteCommunicationServerLLGS.cpp

7 lines

Diff 20099

include/lldb/Host/common/NativeProcessProtocol.h

Show First 20 Lines • Show All 277 Lines • ▼ Show 20 Lines	public:
/// @return Returns \b true if the NativeDelegate was		/// @return Returns \b true if the NativeDelegate was
/// successfully removed from the process, \b false otherwise.		/// successfully removed from the process, \b false otherwise.
///		///
/// @see NativeProcessProtocol::NativeDelegate		/// @see NativeProcessProtocol::NativeDelegate
//------------------------------------------------------------------		//------------------------------------------------------------------
bool		bool
UnregisterNativeDelegate (NativeDelegate &native_delegate);		UnregisterNativeDelegate (NativeDelegate &native_delegate);

		// Called before termination of NativeProcessProtocol's instance.
		virtual void
		Terminate ();

protected:		protected:
lldb::pid_t m_pid;		lldb::pid_t m_pid;

std::vector<NativeThreadProtocolSP> m_threads;		std::vector<NativeThreadProtocolSP> m_threads;
lldb::tid_t m_current_thread_id;		lldb::tid_t m_current_thread_id;
mutable Mutex m_threads_mutex;		mutable Mutex m_threads_mutex;

lldb::StateType m_state;		lldb::StateType m_state;
▲ Show 20 Lines • Show All 56 Lines • Show Last 20 Lines

source/Host/common/NativeProcessProtocol.cpp

Show First 20 Lines • Show All 429 Lines • ▼ Show 20 Lines	uint32_t NativeProcessProtocol::GetStopID () const
return m_stop_id;		return m_stop_id;
}		}

void		void
NativeProcessProtocol::DoStopIDBumped (uint32_t /* newBumpId */)		NativeProcessProtocol::DoStopIDBumped (uint32_t /* newBumpId */)
{		{
// Default implementation does nothing.		// Default implementation does nothing.
}		}

		void
		NativeProcessProtocol::Terminate ()
		{
		// Default implementation does nothing.
		}

source/Plugins/Process/Linux/NativeProcessLinux.h

Show First 20 Lines • Show All 56 Lines • ▼ Show 20 Lines	public:

static lldb_private::Error		static lldb_private::Error
AttachToProcess (		AttachToProcess (
lldb::pid_t pid,		lldb::pid_t pid,
lldb_private::NativeProcessProtocol::NativeDelegate &native_delegate,		lldb_private::NativeProcessProtocol::NativeDelegate &native_delegate,
NativeProcessProtocolSP &native_process_sp);		NativeProcessProtocolSP &native_process_sp);

// ---------------------------------------------------------------------		// ---------------------------------------------------------------------
// Public Instance Methods
// ---------------------------------------------------------------------

~NativeProcessLinux() override;

// ---------------------------------------------------------------------
// NativeProcessProtocol Interface		// NativeProcessProtocol Interface
// ---------------------------------------------------------------------		// ---------------------------------------------------------------------
Error		Error
Resume (const ResumeActionList &resume_actions) override;		Resume (const ResumeActionList &resume_actions) override;

Error		Error
Halt () override;		Halt () override;

Show All 34 Lines	public:
GetArchitecture (ArchSpec &arch) const override;		GetArchitecture (ArchSpec &arch) const override;

Error		Error
SetBreakpoint (lldb::addr_t addr, uint32_t size, bool hardware) override;		SetBreakpoint (lldb::addr_t addr, uint32_t size, bool hardware) override;

void		void
DoStopIDBumped (uint32_t newBumpId) override;		DoStopIDBumped (uint32_t newBumpId) override;

		void
		Terminate () override;

// ---------------------------------------------------------------------		// ---------------------------------------------------------------------
// Interface used by NativeRegisterContext-derived classes.		// Interface used by NativeRegisterContext-derived classes.
// ---------------------------------------------------------------------		// ---------------------------------------------------------------------

/// Reads the contents from the register identified by the given (architecture		/// Reads the contents from the register identified by the given (architecture
/// dependent) offset.		/// dependent) offset.
///		///
/// This method is provided for use by RegisterContextLinux derivatives.		/// This method is provided for use by RegisterContextLinux derivatives.
▲ Show 20 Lines • Show All 289 Lines • Show Last 20 Lines

source/Plugins/Process/Linux/NativeProcessLinux.cpp

Show First 20 Lines • Show All 1,447 Lines • ▼ Show 20 Lines	WAIT_AGAIN:
if (!m_monitor_thread.IsJoinable())		if (!m_monitor_thread.IsJoinable())
{		{
error.SetErrorToGenericError ();		error.SetErrorToGenericError ();
error.SetErrorString ("Process attach failed to create monitor thread for NativeProcessLinux::MonitorCallback.");		error.SetErrorString ("Process attach failed to create monitor thread for NativeProcessLinux::MonitorCallback.");
return;		return;
}		}
}		}

NativeProcessLinux::~NativeProcessLinux()		void
		NativeProcessLinux::Terminate ()
{		{
StopMonitor();		StopMonitor();
}		}

//------------------------------------------------------------------------------		//------------------------------------------------------------------------------
// Thread setup and tear down.		// Thread setup and tear down.

void		void
▲ Show 20 Lines • Show All 2,167 Lines • ▼ Show 20 Lines
{		{
Log *const log (GetLogIfAllCategoriesSet (LIBLLDB_LOG_THREAD));		Log *const log (GetLogIfAllCategoriesSet (LIBLLDB_LOG_THREAD));
if (log)		if (log)
log->Printf ("NativeProcessLinux::%s requesting ThreadStateCoordinator stop for pid %" PRIu64, __FUNCTION__, GetID ());		log->Printf ("NativeProcessLinux::%s requesting ThreadStateCoordinator stop for pid %" PRIu64, __FUNCTION__, GetID ());

// Tell the coordinator we're done. This will cause the coordinator		// Tell the coordinator we're done. This will cause the coordinator
// run loop thread to exit when the processing queue hits this message.		// run loop thread to exit when the processing queue hits this message.
m_coordinator_up->StopCoordinator ();		m_coordinator_up->StopCoordinator ();
		m_coordinator_thread.Join (nullptr);
}		}

bool		bool
NativeProcessLinux::HasThreadNoLock (lldb::tid_t thread_id)		NativeProcessLinux::HasThreadNoLock (lldb::tid_t thread_id)
{		{
for (auto thread_sp : m_threads)		for (auto thread_sp : m_threads)
{		{
assert (thread_sp && "thread list should not contain NULL threads");		assert (thread_sp && "thread list should not contain NULL threads");
▲ Show 20 Lines • Show All 284 Lines • Show Last 20 Lines

source/Plugins/Process/Linux/NativeThreadLinux.cpp

Show First 20 Lines • Show All 264 Lines • ▼ Show 20 Lines	NativeThreadLinux::SetRunning ()

m_stop_info.reason = StopReason::eStopReasonNone;		m_stop_info.reason = StopReason::eStopReasonNone;
m_stop_description.clear();		m_stop_description.clear();

// If watchpoints have been set, but none on this thread,		// If watchpoints have been set, but none on this thread,
// then this is a new thread. So set all existing watchpoints.		// then this is a new thread. So set all existing watchpoints.
if (m_watchpoint_index_map.empty())		if (m_watchpoint_index_map.empty())
{		{
const auto &watchpoint_map = GetProcess()->GetWatchpointMap();		const auto process_sp = GetProcess();
		if (process_sp)
		{
		const auto &watchpoint_map = process_sp->GetWatchpointMap();
if (watchpoint_map.empty()) return;		if (watchpoint_map.empty()) return;
GetRegisterContext()->ClearAllHardwareWatchpoints();		GetRegisterContext()->ClearAllHardwareWatchpoints();
for (const auto &pair : watchpoint_map)		for (const auto &pair : watchpoint_map)
{		{
const auto& wp = pair.second;		const auto& wp = pair.second;
SetWatchpoint(wp.m_addr, wp.m_size, wp.m_watch_flags, wp.m_hardware);		SetWatchpoint(wp.m_addr, wp.m_size, wp.m_watch_flags, wp.m_hardware);
}		}
}		}
}		}
		}

void		void
NativeThreadLinux::SetStepping ()		NativeThreadLinux::SetStepping ()
{		{
const StateType new_state = StateType::eStateStepping;		const StateType new_state = StateType::eStateStepping;
MaybeLogStateChange (new_state);		MaybeLogStateChange (new_state);
m_state = new_state;		m_state = new_state;

▲ Show 20 Lines • Show All 178 Lines • Show Last 20 Lines

source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp

Show First 20 Lines • Show All 93 Lines • ▼ Show 20 Lines	GDBRemoteCommunicationServerLLGS::GDBRemoteCommunicationServerLLGS(
RegisterPacketHandlers();		RegisterPacketHandlers();
}		}

//----------------------------------------------------------------------		//----------------------------------------------------------------------
// Destructor		// Destructor
//----------------------------------------------------------------------		//----------------------------------------------------------------------
GDBRemoteCommunicationServerLLGS::~GDBRemoteCommunicationServerLLGS()		GDBRemoteCommunicationServerLLGS::~GDBRemoteCommunicationServerLLGS()
{		{
		Mutex::Locker locker (m_debugged_process_mutex);

		if (m_debugged_process_sp)
		{
		m_debugged_process_sp->Terminate ();
		tberghammerUnsubmitted Not Done Reply Inline Actions Are you sure you don't have to lock the m_debugged_process_mutex here? tberghammer: Are you sure you don't have to lock the m_debugged_process_mutex here?
		m_debugged_process_sp.reset ();
		}
}		}

void		void
GDBRemoteCommunicationServerLLGS::RegisterPacketHandlers()		GDBRemoteCommunicationServerLLGS::RegisterPacketHandlers()
{		{
RegisterMemberFunctionHandler(StringExtractorGDBRemote::eServerPacketType_A,		RegisterMemberFunctionHandler(StringExtractorGDBRemote::eServerPacketType_A,
&GDBRemoteCommunicationServerLLGS::Handle_A);		&GDBRemoteCommunicationServerLLGS::Handle_A);
RegisterMemberFunctionHandler(StringExtractorGDBRemote::eServerPacketType_C,		RegisterMemberFunctionHandler(StringExtractorGDBRemote::eServerPacketType_C,
▲ Show 20 Lines • Show All 2,634 Lines • Show Last 20 Lines