Page MenuHomePhabricator

Check if thread was suspended during previous stop added.
Needs ReviewPublic

Authored by fallkrum on Mon, May 18, 2:32 AM.

Details

Reviewers
clayborg
jingham
Group Reviewers
Restricted Project
Summary

Encountered the following situation: Let we started thread T1 and it hit breakpoint on B1 location. We suspended T1 and continued the process. Then we started thread T2 which hit for example the same location B1. This time in a breakpoint callback we decided not to stop returning false.

Expected result: process continues (as if T2 did not hit breakpoint) its workflow with T1 still suspended.
Actual result: process do stops (as if T2 callback returned true).

Solution: We need invalidate StopInfo for threads that was previously suspended just because something that is already inactive can not be the reason of stop. Thread::GetPrivateStopInfo() may be appropriate place to do it, because it gets called (through Thread::GetStopInfo()) every time before process reports stop and user gets chance to change m_resume_state again i.e if we see m_resume_state == eStateSuspended it definitely means it was set during previous stop and it also means this thread can not be stopped again (cos' it was frozen during previous stop).

Diff Detail

Event Timeline

fallkrum created this revision.Mon, May 18, 2:32 AM
fallkrum updated this revision to Diff 264576.Mon, May 18, 3:36 AM
fallkrum added a reviewer: Restricted Project.
fallkrum removed a subscriber: lldb-commits.
fallkrum edited the summary of this revision. (Show Details)Mon, May 18, 1:31 PM
fallkrum added a subscriber: lldb-commits.
fallkrum edited the summary of this revision. (Show Details)Mon, May 18, 2:12 PM
fallkrum set the repository for this revision to rG LLVM Github Monorepo.

Anybody there? Do you see me?

Anybody there? Do you see me?

Unless this change is somehow urgent, the common courtesy ‘ping’ rate is once a week [1][2]. I'm sure Jim or Greg will take a look when they can.

Would it be possible to add a test for this so it doesn't regress in the future? For example a test that would fail with TSan would be sufficient.

[1] https://llvm.org/docs/Contributing.html#how-to-submit-a-patch
[2] https://llvm.org/docs/CodeReview.html#code-reviews-speed-and-reciprocity

fallkrum added a comment.EditedThu, May 21, 2:50 PM

Anybody there? Do you see me?

Unless this change is somehow urgent, the common courtesy ‘ping’ rate is once a week [1][2]. I'm sure Jim or Greg will take a look when they can.

Would it be possible to add a test for this so it doesn't regress in the future? For example a test that would fail with TSan would be sufficient.

[1] https://llvm.org/docs/Contributing.html#how-to-submit-a-patch
[2] https://llvm.org/docs/CodeReview.html#code-reviews-speed-and-reciprocity

Thanks for the answer, I don't think it is urgent.
Are there any docs on how to write tests for lldb? I found unit tests (lldb/unittests) written in c++ and tests (lldb/test) written in Python.
Tried to find tests for Thread class to add check for my editing but there is no such a tests. It is very unclear for me in which way TSan
can be helpful in this situation?

Anybody there? Do you see me?

Unless this change is somehow urgent, the common courtesy ‘ping’ rate is once a week [1][2]. I'm sure Jim or Greg will take a look when they can.

Would it be possible to add a test for this so it doesn't regress in the future? For example a test that would fail with TSan would be sufficient.

[1] https://llvm.org/docs/Contributing.html#how-to-submit-a-patch
[2] https://llvm.org/docs/CodeReview.html#code-reviews-speed-and-reciprocity

Thanks for the answer, I don't think it is urgent.

You're welcome!

Are there any docs on how to write tests for lldb? I found unit tests (lldb/unittests) written in c++ and tests (lldb/test) written in Python.
Tried to find tests for Thread class to add check for my editing but there is no such a tests. It is very unclear for me in which way TSan
can be helpful in this situation?

The lldb website [1] contains an overview of the high level structure of the test suite. For a new test I usually either start form the example or from an existing tests.

Ignore my TSan comment, I didn't pay enough attention reading the description, I thought this was fixing a race in lldb.

[1] https://lldb.llvm.org/resources/test.html

fallkrum marked an inline comment as done.Fri, May 22, 3:29 AM
fallkrum added inline comments.
lldb/source/Target/Thread.cpp
382

What is historical need for this check? How is it possible for a breakpoint to stop a thread that was already stopped second time even while stepping in multithreaded programs?

fallkrum updated this revision to Diff 267561.Mon, Jun 1, 2:13 AM

Unit tests added.

fallkrum updated this revision to Diff 267564.Mon, Jun 1, 2:24 AM

Added changes made to Thread.cpp itself to the patch.

The one scenario I can think of where this might do the wrong thing is:

  1. Hit breakpoint location 1.1 on thread A
  2. Switch to thread B
  3. Run a function on thread B, has to only run on thread B and has to actually run code in the target, like:
(lldb) expr -a 0 -- (int) printf("Hello\n")

The stop reason for thread A should still stay "breakpoint 1.1" and not "no reason".

This one is a little tricky because we have to run the target to execute the function, but if running the expression didn't cause other threads to run, we don't want to lose the program state when we stopped.

I looked for a test that ensures this result, and didn't find one, so I just added one: TestStateAfterExpression.py. Can you make sure that your change still passes this test. If it does, I can't think of anything else that might go wrong with this patch.

@jingham tried test you added, got the following output:

Ilyas-Mac-mini:Ninja ilya$ ./bin/lldb-dotest -p TestStateAfterExpression.py
/Library/Frameworks/Python.framework/Versions/3.8/bin/python3.8 /Users/ilya/Documents/Projects/llvm-project/lldb/test/API/dotest.py --arch x86_64 -s /Users/ilya/Documents/Projects/llvm-project/build/Ninja/lldb-test-traces -S nm -u CXXFLAGS -u CFLAGS --codesign-identity - --server /Users/ilya/Documents/Projects/llvm-project/build/Ninja/./bin/debugserver --build-dir /Users/ilya/Documents/Projects/llvm-project/build/Ninja/lldb-test-build.noindex --executable /Users/ilya/Documents/Projects/llvm-project/build/Ninja/./bin/lldb --compiler /Users/ilya/Documents/Projects/llvm-project/build/Ninja/./bin/clang --dsymutil /Users/ilya/Documents/Projects/llvm-project/build/Ninja/./bin/dsymutil --filecheck /Users/ilya/Documents/Projects/llvm-project/build/Ninja/./bin/FileCheck --lldb-libs-dir /Users/ilya/Documents/Projects/llvm-project/build/Ninja/./lib -p TestStateAfterExpression.py
LLDB library dir: /Users/ilya/Documents/Projects/llvm-project/build/Ninja/bin
LLDB import library dir: /Users/ilya/Documents/Projects/llvm-project/build/Ninja/./lib
lldb version 11.0.0

clang revision c0b351fea652442190b5c890cbd23297762a5d64
llvm revision c0b351fea652442190b5c890cbd23297762a5d64

libstdcxx tests will not be run because: Don't know how to build with libstdcxx on macosx
Skipping following debug info categories: ['dwo']

Session logs for test failures/errors/unexpected successes will go into directory '/Users/ilya/Documents/Projects/llvm-project/build/Ninja/lldb-test-traces'
Command invoked: /Users/ilya/Documents/Projects/llvm-project/lldb/test/API/dotest.py --arch x86_64 -s /Users/ilya/Documents/Projects/llvm-project/build/Ninja/lldb-test-traces -S nm -u CXXFLAGS -u CFLAGS --codesign-identity - --server /Users/ilya/Documents/Projects/llvm-project/build/Ninja/./bin/debugserver --build-dir /Users/ilya/Documents/Projects/llvm-project/build/Ninja/lldb-test-build.noindex --executable /Users/ilya/Documents/Projects/llvm-project/build/Ninja/./bin/lldb --compiler /Users/ilya/Documents/Projects/llvm-project/build/Ninja/./bin/clang --dsymutil /Users/ilya/Documents/Projects/llvm-project/build/Ninja/./bin/dsymutil --filecheck /Users/ilya/Documents/Projects/llvm-project/build/Ninja/./bin/FileCheck --lldb-libs-dir /Users/ilya/Documents/Projects/llvm-project/build/Ninja/./lib -p TestStateAfterExpression.py
PASS: LLDB (/Users/ilya/Documents/Projects/llvm-project/build/Ninja/bin/clang-11-x86_64) :: test_thread_state_after_expr_dsym (TestStateAfterExpression.TestStopReasonAfterExpression)
PASS: LLDB (/Users/ilya/Documents/Projects/llvm-project/build/Ninja/bin/clang-11-x86_64) :: test_thread_state_after_expr_dwarf (TestStateAfterExpression.TestStopReasonAfterExpression)
UNSUPPORTED: LLDB (/Users/ilya/Documents/Projects/llvm-project/build/Ninja/bin/clang-11-x86_64) :: test_thread_state_after_expr_dwo (TestStateAfterExpression.TestStopReasonAfterExpression) (test case does not fall in any category of interest for this run)

PASS: LLDB (/Users/ilya/Documents/Projects/llvm-project/build/Ninja/bin/clang-11-x86_64) :: test_thread_state_after_expr_gmodules (TestStateAfterExpression.TestStopReasonAfterExpression)

Ran 4 tests in 28.206s

RESULT: PASSED (3 passes, 0 failures, 0 errors, 1 skipped, 0 expected failures, 0 unexpected successes)

If I understand everything correctly the patch passes the test.

fallkrum marked an inline comment as done.Thu, Jun 4, 1:42 PM
fallkrum added inline comments.
lldb/source/Target/Thread.cpp
382

Any thoughts on this? Maybe it is better to get rid of IsStillAtLastBreakpointHit at all? In this case there will be no need to check wether thread was suspended.

Sorry, now that I'm thinking about this more, I'm confused as to why you are seeing the symptoms you describe. Thread::ShouldStop starts with:

if (GetResumeState() == eStateSuspended) {
  LLDB_LOGF(log,
            "Thread::%s for tid = 0x%4.4" PRIx64 " 0x%4.4" PRIx64
            ", should_stop = 0 (ignore since thread was suspended)",
            __FUNCTION__, GetID(), GetProtocolID());
  return false;
}

So a suspended thread should always return false from "ShouldStop" and should never make a process stop. Why isn't this effective in your case?

It bugs me a little bit that we are doing work masking the stop info when in fact the ShouldStop mechanism is the one that shouldn't be consulting threads that were suspended by the user during the last run.

As far as I see the problem lies in Process::ProcessEventData::DoOnRemoval:

StopInfoSP stop_info_sp = thread_sp->GetStopInfo();
      if (stop_info_sp && stop_info_sp->IsValid()) {
        does_anybody_have_an_opinion = true;
        bool this_thread_wants_to_stop;
        if (stop_info_sp->GetOverrideShouldStop()) {
          this_thread_wants_to_stop =
              stop_info_sp->GetOverriddenShouldStopValue();
        } else {
          stop_info_sp->PerformAction(event_ptr);
          // The stop action might restart the target.  If it does, then we
          // want to mark that in the event so that whoever is receiving it
          // will know to wait for the running event and reflect that state
          // appropriately. We also need to stop processing actions, since they
          // aren't expecting the target to be running.

          // FIXME: we might have run.
          if (stop_info_sp->HasTargetRunSinceMe()) {
            SetRestarted(true);
            break;
          }

          this_thread_wants_to_stop = stop_info_sp->ShouldStop(event_ptr);
        }

        if (!still_should_stop)
          still_should_stop = this_thread_wants_to_stop;
      }
    }

As you can see we get StopInfo from all the the threads available even suspended (note that all thread's stop_info are valid at this moment due to GetPrivateStopInfo gets called prior to DoOnRemoval). As a result we have a situation when suspended thread's stop_info tells we should stop even when the thread that is a real reason of stop says we should not. Maybe you are right and the right place for the fix is inside Process::ProcessEventData::DoOnRemoval, something like this:

if (stop_info_sp && stop_info_sp->IsValid() && thread_sp->ShouldStop()) {
.....
}
}

But you know, I don't know if it possible to apply it, semantics of Thread::ShouldStop is Thread::ShouldStop(Event *) and it is unclear what kind of event to pass in. In any case, maybe I don't see the whole picture of what's going on yet but I don't see any reason to hold on stop_info of suspended thread.

fallkrum added a comment.EditedThu, Jun 4, 4:35 PM

Didn't notice that we use in Process::ProcessEventData::DoOnRemoval:

this_thread_wants_to_stop = stop_info_sp->ShouldStop(event_ptr);

and yes it is probably correct to fix the issue by this:

if (stop_info_sp && stop_info_sp->IsValid() && thread_sp->ShouldStop(event_ptr))

But again not sure if all this correct. Because further in the code we call stop_info_sp->ShouldStop(event_ptr) what looks like a duplicate of thread_sp->ShouldStop(event_ptr). Need your further comments :)