This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Simplify the is_finalized logic in process and make it thread safe.
ClosedPublic

Authored by JDevlieghere on Dec 17 2020, 11:35 AM.

Details

Summary

This is a speculative fix when looking at the finalization code in Process. It tackles the following issues:

  • Adds synchronization to prevent races between threads.
  • Marks the process as finalized/invalid as soon as Finalize is called rather than at the end.
  • Simplifies the code by using only a single instance variable to track finalization.

Diff Detail

Event Timeline

JDevlieghere created this revision.Dec 17 2020, 11:35 AM
JDevlieghere requested review of this revision.Dec 17 2020, 11:35 AM
jingham accepted this revision.Dec 18 2020, 5:58 PM

LGTM. It looks like we never made use of the distinction between "started to finalize" and "done finalizing", so just marking it at the start of finalization seems fine.

I quibble a bit with "m_finalized" because when it gets set to true we really are starting to finalize, we aren't done finalizing. And it's a little weird to have tests for "am I shutting down", if so don't do X. You might wonder, if we're really all the way done, who would ever ask me this question? I think m_finalizing is a more accurate name. But I don't feel strongly about this.

lldb/source/Target/Process.cpp
1568–1569

I don't think that comment refers to anything still in the code... Might as well ax it here.

This revision is now accepted and ready to land.Dec 18 2020, 5:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 18 2020, 6:41 PM
shafik added a subscriber: shafik.Dec 18 2020, 6:53 PM
shafik added inline comments.
lldb/include/lldb/Target/Process.h
2837
std::atomic<bool> m_finalizing{false};

honestly the whole thing should be refactored, the mem-initializer list is kind of ridiculous.

Thanks Jim, I addressed both issues in the commit.

Hi all,

Apologies for being the bearer of bad news, but I believe that this patch breaks our[1] (downstream) lldb, by introducing a deadlock when a process is killed by a parent debugging process. Specifically, I believe that this patch causes a process to fail to exit, which causes a later deadlock when a listener waits for the process to exit.

I'm in the process of trying to produce some convincing output from our backend so that I can properly file a bug, but in the meantime I wanted to raise this here in case anyone has any comments or thoughts on this. Although I managed to narrow the issue down to this commit through a git bisect, I am not confident in this patch being the *cause* of the deadlock, as it may be the case that we are relying on incorrect behaviour in lldb that this patch fixes.

[1]: https://github.com/upmem/llvm-project/tree/upmem_release_120

I wonder if instead of doing:

// Use our target to get a shared pointer to ourselves...
if (m_finalize_called && !PrivateStateThreadIsValid())
  BroadcastEvent(event_sp);
else
  m_private_state_broadcaster.BroadcastEvent(event_sp);

->

m_private_state_broadcaster.BroadcastEvent(event_sp);

we should have just replaced m_finalize_called with m_finalizing? If you tried to sent the exited event to the private event broadcaster after it was shut down, that event would never get to the public process event queue.

That's the only part of the patch that seems a little suspect to me.

Can you try making that change and see if things go better?

Jim

I wonder if instead of doing:

// Use our target to get a shared pointer to ourselves...
if (m_finalize_called && !PrivateStateThreadIsValid())
  BroadcastEvent(event_sp);
else
  m_private_state_broadcaster.BroadcastEvent(event_sp);

->

m_private_state_broadcaster.BroadcastEvent(event_sp);

we should have just replaced m_finalize_called with m_finalizing? If you tried to sent the exited event to the private event broadcaster after it was shut down, that event would never get to the public process event queue.

Hi Jim, I've tried reverting this check and replacing m_finalize_called with m_finalizing, but sadly the code still seems to deadlock.

I'm also a little suspicious of (in Process::Finalize):

if (m_finalizing.exchange(true))
  return;

rather than

m_finalize_called = true;

As it would seem (to me) to introduce the possibility of an early exit where there wasn't one before. I don't know if that was intended (to avoid things being done twice?), but it's the only other place I can see a clear change in control flow.

I'm also a little suspicious of (in Process::Finalize):

if (m_finalizing.exchange(true))
  return;

rather than

m_finalize_called = true;

I've also tried replacing this with m_finalizing.exchange(true);, which also (sadly) doesn't seem to stop the deadlock.

Huh... I fear you are going to have to debug this further on your end. I can't see anything else suspect in this patch.

Huh... I fear you are going to have to debug this further on your end. I can't see anything else suspect in this patch.

Agreed. Sadly I need to prioritise another task in the short term, but I think I'm going to try slowly re-implementing the changes in the patch to see I can find a minimal change that triggers it when I have time.

Huh... I fear you are going to have to debug this further on your end. I can't see anything else suspect in this patch.

Agreed. Sadly I need to prioritise another task in the short term, but I think I'm going to try slowly re-implementing the changes in the patch to see I can find a minimal change that triggers it when I have time.

Don't hesitate to reach out for questions, explanations, "what the heck"-s once you get back to this.

! In D93479#2850445, @jingham wrote:
Don't hesitate to reach out for questions, explanations, "what the heck"-s once you get back to this.

Thanks Jim, that's very kind of you!