This is an archive of the discontinued LLVM Phabricator instance.

[LLDB] [Windows] Don't crash if the debugged process is unable to start up
ClosedPublic

Authored by mstorsjo on Oct 28 2019, 4:25 AM.

Details

Summary

This can e.g. happen if the debugged executable exits before the initial stop, e.g. if it fails to load dependent DLLs.

Add a virtual destructor to ProcessDebugger and let it clean up the session, and make ProcessWindows::OnExitProcess call ProcessDebugger::OnExitProcess for shared parts.

Diff Detail

Event Timeline

mstorsjo created this revision.Oct 28 2019, 4:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2019, 4:25 AM

Sorry, I'm OOO today, so I can't take a precise look on this now. But I'm not sure, isn't race condition possible here on the m_session_data reset?

I don't know whether this is the right way to check for this, but at the very least is seems suspicious, as this function is called from LaunchProcess, which initializes m_session_data. This means that someone else needs to clear it in the mean time, and he probably does that asynchronously. Btw, have you read the comment in OnExitProcess() ? It seems that this code already tries to handle this case somehow...

Anyway, could you add a test for this? It should not be too hard to produce a missing dll in a test.

Yes, this definitely is a race condition. Without any changes, I triggered this condition on arm64, but not on x86_64.

The issue is in the subclass ProcessWindows, where OnExitProcess does this:

void ProcessWindows::OnDebuggerError(const Status &error, uint32_t type) {
  // [simplified]
  if (m_session_data->m_initial_stop_received) {
  } else {
    // If we haven't actually launched the process yet, this was an error
    // launching the process.  Set the internal error and signal the initial
    // stop event so that the DoLaunch method wakes up and returns a failure.
    m_session_data->m_launch_error = error;
    ::SetEvent(m_session_data->m_initial_stop_event);
    return;
  }
}   

void ProcessWindows::OnExitProcess(uint32_t exit_code) {
  // [irrelevant bits excluded]

  // If the process exits before any initial stop then notify the debugger
  // of the error otherwise WaitForDebuggerConnection() will be blocked.
  // An example of this issue is when a process fails to load a dependent DLL.
  if (m_session_data && !m_session_data->m_initial_stop_received) {
    Status error(exit_code, eErrorTypeWin32);
    OnDebuggerError(error, 0);
  }

  // Reset the session.
  m_session_data.reset();
}

So ProcessWindows::OnExitProcess signals to ProcessDebugger::WaitForDebuggerConnection to proceed, and then ProcessWindows::OnExitProcess resets m_session_data, which WaitForDebuggerConnection starts inspecting.

What's the correct course of action here? Remove the m_session_data.reset() from ProcessWindows::OnExitProcess, concluding that it's up to some other class to clear m_session_data in this case?

What's the correct course of action here? Remove the m_session_data.reset() from ProcessWindows::OnExitProcess, concluding that it's up to some other class to clear m_session_data in this case?

I'm not really familiar with this code, but removing the reset() call does not sound completely unreasonable to me. I think the only one who may have enough context here is @amccarth, so maybe let's wait if he has to say anything?

(Ideally, I'd just delete ProcessWindows entirely, and move everything to lldb-server based model -- it's much easier to reason about threads there)

(Ideally, I'd just delete ProcessWindows entirely, and move everything to lldb-server based model -- it's much easier to reason about threads there)

I believe there are people working on moving the Windows bits to the lldb-server model, but I don't think it's far enough along to get rid of ProcessWindows just yet.

The m_mutex is supposed to protect m_session_data. OnExitProcess says:

// No need to acquire the lock since m_session_data isn't accessed.

but it actually does access m_session_data.

So I think you've identified the cause of the race condition, but I'm not sure whether this is the best fix. I need a little time to look at this more closely.

amccarth requested changes to this revision.Oct 30 2019, 4:16 PM

Thanks for identifying this race condition.

After some poking around, I think there's a cleaner way to address it.

First, m_session_state's lifetime is currently managed mostly by ProcessDebugger (base class) but for one exception in ProcessWindows (derived class). There doesn't seem to be a good reason for that inconsistency, so my first step was to eliminate it and make ProcessDebugger responsible for its lifetime in all cases. This is done by moving the m_session_state.reset() to ProcessDebugger::OnExitProcess and having ProcessWindows::OnExitProcess call the other. This has the nice additional benefit of eliminating some duplicate code and comments.

But I'm not sure we need to clear m_session_state even in ProcessDebugger::OnExitProcess. There are two places where m_session_state is created: ProcessDebugger::AttachProcess and ProcessDebugger::LaunchProcess. We could put m_session_state.reset()s in those functions, where it handles failure of WaitForDebuggerConnection. But I'm not even sure if _that_ is necessary.

By adding a virtual destructor to the base class, it seems everything cleans up naturally before starting the next debugging session.

Here's what it would look like: https://reviews.llvm.org/P8168

(If I'm wrong, and it _is_ important for us to clear the session state explicitly, then we'd need to find a place after OnExitProcess. I don't think such a hook exists right now, so perhaps we'd have to create one. That hook could be the one place we reset m_session_state for both the successful and unsuccessful debug sessions.)

FYI: I'll be offline until November 4.

This revision now requires changes to proceed.Oct 30 2019, 4:16 PM
mstorsjo updated this revision to Diff 227242.Oct 31 2019, 2:10 AM
mstorsjo edited the summary of this revision. (Show Details)

Changed to use Adrian's fix suggestion, and added a testcase for it (which only is executed on x86_64, as it uses a pre-canned exe in yaml form, but a similar testcase for arm64 also runs fine).

labath accepted this revision.Oct 31 2019, 2:18 AM
This revision was not accepted when it landed; it landed in state Needs Review.Oct 31 2019, 2:30 AM
This revision was automatically updated to reflect the committed changes.