Page MenuHomePhabricator

Prevent dead locking when calling PrintAsync
AbandonedPublic

Authored by teemperor on Jun 21 2018, 4:21 PM.

Details

Reviewers
clayborg
labath
Summary

While working on the expr completion, I've encountered the issue that sometimes lldb
deadlocks when doing input/output. The underlying cause for this is that we seem to expect
that we can always call Debugger::PrintAsync from any point of lldb and that this call will
always return at some point.

However this is not always the case. For example, when calling the completion handler, we
obviously hold locks that make the console IO thread safe. The expression parsing
code then tries to parse the user expression to provide possible completions. It's possible
that while parsing our expression, we hit a case where some code decides to print
information to the debugger output (for example to inform the user that some functionality
has failed unexpectedly). This call now deadlocks the lldb prompt as PrintAsync waits on
Editline to release the mutex, while our Editline handler waits on PrintAsync to print its error
message.

To prevent this issue in the future, this patch introduces the notion of delayed output scopes,
that essentially act as safe zones in which all calls to PrintAsync for a given debugger
are safe independently of the locking inside the IOHandlers or related issues. While printing in
this scope, we queue messages and then actually print them once we leave the scope.

Diff Detail

Event Timeline

teemperor created this revision.Jun 21 2018, 4:21 PM
teemperor added subscribers: friss, jingham.

Not sure who should review this, but maybe Fred or Jim know :)

teemperor updated this revision to Diff 152402.Jun 21 2018, 4:28 PM
  • Small code style adjustment.

The IOHandlers are Greg's construct. He's likely the best person to review this.

teemperor added a subscriber: labath.

Adding Pavel because he wrote the PrintAsync code.

Also @labath: Can you tell me what variables/functionality the m_output_mutex in Editline.cpp is supposed to shield? I don't see any documentation for that.

The m_output_mutex name suggests its related to console output, but we actually take the lock also when reading *input*. Especially in EditLine::GetLine we take a guard on the lock but then somehow unlock the guarded mutex from inside Editline::GetCharacter that we call afterwards (which completely breaks this patch):

// This mutex is locked by our caller (GetLine). Unlock it while we read a
// character (blocking operation), so we do not hold the mutex
// indefinitely. This gives a chance for someone to interrupt us. After
// Read returns, immediately lock the mutex again and check if we were
// interrupted.
m_output_mutex.unlock();
int read_count = m_input_connection.Read(&ch, 1, llvm::None, status, NULL);
m_output_mutex.lock();
if (m_editor_status == EditorStatus::Interrupted) {
  while (read_count > 0 && status == lldb::eConnectionStatusSuccess)
    read_count = m_input_connection.Read(&ch, 1, llvm::None, status, NULL);
  lldbassert(status == lldb::eConnectionStatusInterrupted);
  return 0;
}
clayborg added inline comments.Jun 22 2018, 11:23 AM
source/Core/Debugger.cpp
1011–1027

Can this code become:

// We check if any user requested to delay output to a later time
// (which is designated by m_delayed_output_counter being not 0).
{
  std::lock_guard<std::mutex> guard(m_delayed_output_mutex);
  if (m_delayed_output_counter != 0) {
    // We want to delay messages, so push them to the buffer.
    m_delayed_output.emplace_back(std::string(s, len), is_stdout);
      return;
  }
}
lldb::StreamFileSP stream = is_stdout ? GetOutputFile() : GetErrorFile();
m_input_reader_stack.PrintAsync(stream.get(), s, len);
source/Core/IOHandler.cpp
358

So anytime we have a "(lldb)" prompt we won't be able to output something?

Adding Pavel because he wrote the PrintAsync code.

Also @labath: Can you tell me what variables/functionality the m_output_mutex in Editline.cpp is supposed to shield? I don't see any documentation for that.

The m_output_mutex name suggests its related to console output, but we actually take the lock also when reading *input*. Especially in EditLine::GetLine we take a guard on the lock but then somehow unlock the guarded mutex from inside Editline::GetCharacter that we call afterwards (which completely breaks this patch):

If I remember correctly (it was a long time ago), the idea was behind this was to make sure that nobody prints anything to stdout while libedit is playing around with it. That's why it's called "output" mutex. The thing that this (admittedly strange) locking strategy achieves is that the the only time the PrintAsync function can write to stdout is while we are blocked waiting for input, as that's the place we can be sure we aren't writing to stdout -- before, or after that, we could be in libedit code and it could mess with stdout to write it's prompt or whatever. IIRC, PrintAsync should eventually end up taking the same mutex, and then depending on the current state of libedit machinery, do the "right thing" wrt. erasing the prompt et al.

Hmm.. Now that I have spelled this out, it sounds like this same pattern could be used to achieve your goal too -- while you're inside the completion handler, you also shouldn't be doing any funny things with stdout. Maybe the solution would be do just drop the output mutex while the running the completion code. Could you try if something like that would work?

// This mutex is locked by our caller (GetLine). Unlock it while we read a
// character (blocking operation), so we do not hold the mutex
// indefinitely. This gives a chance for someone to interrupt us. After
// Read returns, immediately lock the mutex again and check if we were
// interrupted.

This comment refers to the second purpose of the surrounding code, which is to make sure that any interrupt request is handled exactly once.

  • We now also handle the case where a guarded mutex is manually unlocked from a nested call.
  • Refactoring based on review comments.
teemperor edited the summary of this revision. (Show Details)
  • Fixed that we only pass a nullptr as a debugger to the real editline object (which now fixes the deadlock in the code completion patch).
teemperor marked 2 inline comments as done.Jun 27 2018, 11:15 AM

@labath Thanks for the explanation! Well in theory we could poke more holes in our guard from some nested function, but this only fixes the deadlock symptom. PrintAsync would still be blocking whenever the mutex is hold by someone.

source/Core/Debugger.cpp
1011–1027

It can and did. Removed the same pattern from the TryFlushingDelayedMessages method.

source/Core/IOHandler.cpp
358

Yes, that was the unintended effect of this patch (which was caught by the test suite actually). I didn't see until after I submitted that we actually manually unlock the guarded mutex from a called function. This now has been fixed by doing something like the unlock/lock code with the DisableMessageDelayScope in EditLine::GetCharacter.

labath requested changes to this revision.Jun 28 2018, 3:43 AM

Well in theory we could poke more holes in our guard from some nested function, but this only fixes the deadlock symptom. PrintAsync would still be blocking whenever the mutex is hold by someone.

Yes, but for me the real question is do really need to be calling PrintAsync while holding the output mutex. The output mutex is only used within the editline support code, and I think it's fair to require that this code be wary of when it is printing stuff to stdout. Making sure the mutex is dropped (and the rest of the output machinery is in a sane state) when calling "foreign" code is a part of its job.

"Poking holes" in the mutex, as you have so aptly phrased it, is not nice, but neither is your solution (you're doing the exact same thing with the DisableMessageDelayScope object), and I'd rather have one sub-optimal solution than two.

Instead of coming up with another solution which does something-similar-but-not-quite, I think we should invest more time improving the existing solution. For example the DisableMessageDelayScope approach would be useful for the scoped disabling of the output mutex. This will make it easy to unlock the mutex in any editline callback where we deem it safe to print text (and any place where it isn't safe to print text shouldn't be calling random code anyway). We're never going to be able to have a nice scoped locking strategy for this because the thing we want to lock is not scoped, but this should improve the situation somewhat. It will also make the thing fully contained inside the Editline class, instead of spreading it out over Core and Host modules. That also means the solution will be unit-testable, at least to some extent (right now, you don't have any tests for this code)

source/Host/common/Editline.cpp
14

This is wrong layering. Editline should not know anything about the debugger. If we really want to proceed with this, then this needs to be abstracted and re-layered. However, I don't think that is really necessary. More on that below.

This revision now requires changes to proceed.Jun 28 2018, 3:43 AM
teemperor abandoned this revision.Aug 22 2018, 1:24 PM
teemperor marked 2 inline comments as done.

Abandon in favor of D50912