Page MenuHomePhabricator

Simplify TestThreadSpecificBreakpoint.py
ClosedPublic

Authored by zturner on Dec 4 2015, 11:57 AM.

Details

Summary
This test would fail before if conditional breakpoints weren't
working correctly, and the nature of the test (spinning up 10
threads, etc) opens the door to raciness.

This patch vastly simplifies the test, removes the need for relying
on conditional expression evaluation, and as a result makes the
correctness of the test vastly easier to reason about and reduces
flakiness.

This test also used to pass on Windows only when the inferior was
compiled with a specific version of MSVC, even though I could verify
from inside of LLDB that thread-specific breakpoints don't work.  With
this patch, the test now correctly fails no matter what version of MSVC
the inferior was compiled with.

Jim: Since the test still fails for me even after this patch, can you verify that
it does not regress anything on OSX? I think it's a better test overall, so as
long as it doesn't regress anything anywhere I think it should probably go in.

Diff Detail

Event Timeline

zturner updated this revision to Diff 41908.Dec 4 2015, 11:57 AM
zturner retitled this revision from to Simplify TestThreadSpecificBreakpoint.py.
zturner updated this object.
zturner added reviewers: jingham, labath.
zturner added a subscriber: lldb-commits.
jingham requested changes to this revision.Dec 4 2015, 12:06 PM
jingham edited edge metadata.

This test passes for me on OS X before the patch, but fails after it with the error:

FAIL: test_python_dsym (TestThreadSpecificBreakpoint.ThreadSpecificBreakTestCase)

Test that we obey thread conditioned breakpoints.

Traceback (most recent call last):

File "/Volumes/ThePlayground/Users/jingham/Work/LLDB/llvm-svn/lldb-working/packages/Python/lldbsuite/test/lldbtest.py", line 550, in wrapper
  return func(self, *args, **kwargs)
File "/Volumes/ThePlayground/Users/jingham/Work/LLDB/llvm-svn/lldb-working/packages/Python/lldbsuite/test/lldbtest.py", line 2288, in dsym_test_method
  return attrvalue(self)
File "/Volumes/ThePlayground/Users/jingham/Work/LLDB/llvm-svn/lldb-working/packages/Python/lldbsuite/test/lldbtest.py", line 636, in wrapper
  func(*args, **kwargs)
File "/Volumes/ThePlayground/Users/jingham/Work/LLDB/llvm-svn/lldb-working/packages/Python/lldbsuite/test/lldbtest.py", line 636, in wrapper
  func(*args, **kwargs)
File "/Volumes/ThePlayground/Users/jingham/Work/LLDB/llvm-svn/lldb-working/packages/Python/lldbsuite/test/lldbtest.py", line 636, in wrapper
  func(*args, **kwargs)
File "/Volumes/ThePlayground/Users/jingham/Work/LLDB/llvm-svn/lldb-working/packages/Python/lldbsuite/test/lldbtest.py", line 781, in wrapper
  func(*args, **kwargs)
File "/Volumes/ThePlayground/Users/jingham/Work/LLDB/llvm-svn/lldb-working/packages/Python/lldbsuite/test/functionalities/thread/thread_specific_break/TestThreadSpecificBreakpoint.py", line 67, in test_python
  self.assertEqual(stopped_threads[0].GetThreadID(), main_thread_id, "thread breakpoint stopped at the wrong thread")

AssertionError: 3452873 != 3452831 : thread breakpoint stopped at the wrong thread
Config=x86_64-clang

This revision now requires changes to proceed.Dec 4 2015, 12:06 PM

Oh, wait, in your patch, the line in the test file that sets the thread on the supposedly thread specific breakpoint is commented out???

If I uncomment that line, then the test passes.

Ahh, derp. I commented that out when I was testing something locally, and forgot to uncomment it. So yea, that should be uncommented.

I was having some trouble following the logic of the original test, so it's possible this test misses an edge case that I haven't thought of. Does it look to you like it handles everything?

The logic was:

  • Set a breakpoint on some loop that will get hit multiple times in some thread worker function.
  • The first time it is hit, make it specific to the thread that hit it by setting a Thread ID on the breakpoint.
  • Then add a condition to the breakpoint that all the other threads will pass, but the thread that matches the thread specification will fail.
  • Then continue, and there should be no more breakpoint hits.

So the original test really tested that the combinations "breakpoint thread ID doesn't pass, but condition does" and "breakpoint thread ID passes, but breakpoint condition doesn't" work properly.

Making sure such combinations function as expected seems like a good thing to test, so I don't think we should remove this test.

But it would also be good to have a test JUST on the thread specification. So I'd be happy to have this as an extra test.

Wouldn't the functionality that's tested by that combination of two things at the same be equivalently tested by this test, plus a test that only tests the behavior of conditional breakpoints?

In other words, if you had two tests, one which only tests thread specific breakpoints, and one which only tests conditional breakpoints, then wouldn't the original test which tests both at the same time always pass? The two features are kind of independent of each other aren't they? So I wouldn't imagine that you could have TestConditionalBreakpoint and TestThreadspecificBreakpoint both pass, but TestConditionalBreakpointWithThreadSpecificBreakpoint fail.

Also the origianl test as written was either flaky or disabled on almost every platform, so it doesn't seem like it was providing much value to anyone.

The test that test "Only breakpoint conditions" will always have LLDB_INVALID_ID for the breakpoint thread ID. That means the thread test is a no-op. So it doesn't test the case "thread passes, condition doesn't" or "condition passes, thread doesn't". I wrote this test because I didn't do those combinations correctly at some point and wanted to make sure they didn't regress.

Do you have an example of a fail when it is flakey? This doesn't seem like a particularly sensitive test, other than it runs a bunch of threads through a bunch of breakpoints so it might just time out overall.

But it is a pretty simple test. Grab the first thread that hits a breakpoint, set a condition and thread ID on the breakpoint, and continue and you should just run to exit.

If it is just timing out over-all that's probably just because I made it pass too many times through the worker loop. You could probably either reduce the number of threads from 10 to something smaller, or reduce the number of times through the loop from 20 to something smaller.

It doesn't require any thread rendezvousing or anything fancy like that.

I don't have any examples, one of the linux guys might. But you can look at the decorators at the top, which say this:

@skipIfFreeBSD # test frequently times out or hangs
@expectedFailureFreeBSD('llvm.org/pr18522') # hits break in another thread in testrun
@expectedFlakeyLinux # this test fails 6/100 dosep runs

So the only platform it seems to be robust on is OSX, for whatever reason. Sadly I don't have any more info than that though.

For now I'll just make this a separate test in the same file I guess. But it's a bummer to have a test that's broken almost everywhere. Makes me think something is wrong with the test instead of with LLDB. I agree with you though that it's not obvious what might be wrong with the test.

The comments in llvm.org/pr18522 seem to me to be bugs that the test is uncovering, not bugs in the test itself. It looks like we hit a breakpoint on thread A, and try to run the condition on thread B. In some cases, thread B isn't really alive yet, and so the attempt to add a stack frame to it fails. But the whole code path down from StopInfoBreakpoint::PerformAction down to UserExpression::Execute etc all take an execution context that specifies the thread. So I'm not sure how this would happen. It also seems like sometimes they were getting spurious interrupts that were being interpreted as breakpoint hits. That's again clearly a bug in the platform layer.

This is a really simple test at the Python level, though it does trigger some complex behavior under the hood. If it is failing just because it is taking too long for the test timeout, then that should get fixed once Todd does his "run the tests that failed and are marked flakey one by one after the main run is over" work. But in the meantime it wouldn't hurt to reduce the amount of work it does, and see if that clears up at least the timeouts.

But some of the comments in that PR make it sound like it is uncovering actual bugs in handling breakpoint conditions in multi-threaded programs. It looked like in the discussion, the test is failing AFTER the continue, at which point all the test is doing is to wait for the program to stop running. So that must be some real bug in handling breakpoint conditions. In that case for sure the answer is to fix the code not the test. And of course, those bugs are going to make testing "many threads x ANYTHING x breakpoint conditions" fail, regardless of what ANYTHING is...

OTOH it makes sense to add a test for JUST breakpoint thread ID's, since that we do seem to be able to handle reliably on more systems.

jingham accepted this revision.Dec 4 2015, 2:57 PM
jingham edited edge metadata.

Anyway, adding a separate test is fine with me.

This revision is now accepted and ready to land.Dec 4 2015, 2:57 PM
labath edited edge metadata.Dec 7 2015, 1:17 AM

I don't have any examples, one of the linux guys might. But you can look at the decorators at the top, which say this:

@skipIfFreeBSD # test frequently times out or hangs
@expectedFailureFreeBSD('llvm.org/pr18522') # hits break in another thread in testrun
@expectedFlakeyLinux # this test fails 6/100 dosep runs

So the only platform it seems to be robust on is OSX, for whatever reason. Sadly I don't have any more info than that though.

I've been going through the flaky tests on linux, but I haven't reached this one yet, so unfortunately can't provide more information. It is marked flaky, but I
haven't seen it fail lately, so it must be running pretty well on linux (it has to fail twice it a row to be considered a failure in the flaky mode).

I don't think timeouts are affecting this test (as the flaky logic does not help there), but I can certainly see how extra load might trigger some corner cases.

I'd also be in favor of keeping this test, and I am planning to re-audit all the flaky decorators on linux, time permitting.

labath closed this revision.Mar 15 2016, 1:43 AM

This was committed ages ago.