This is an archive of the discontinued LLVM Phabricator instance.

Disable ExecControl/StopHook/stop-hook-threads.test on Linux
ClosedPublic

Authored by jgorbe on Feb 14 2019, 3:24 PM.

Details

Summary

ExecControl/StopHook/stop-hook-threads.test is flaky on Linux (it's consistently failing on my machine, but doesn't fail on a co-worker's). I'm seeing the following assertion failure:

CommandObject.cpp:145: bool lldb_private::CommandObject::CheckRequirements(lldb_private::CommandReturnObject&): Assertion `m_exe_ctx.GetTargetPtr() == NULL' failed.

Interestingly, this doesn't happen when typing the same commands in interactive mode. The cause seems to be that, in synchronous execution mode continue waits until the process stops again, and that includes running any stop-hooks for that later stop, so we end with a stack trace like this (lots of frames omitted for clarity):

abort()
CommandObject::CheckRequirements() <-- this is again the same instance of CommandObjectProcessContinue, fails assertion because the previous continue command hasn't finished.
Target::RunStopHooks()
CommandObjectProcessContinue::DoExecute()
Target::RunStopHooks()

In general, it seems like using process control commands inside stop-hooks does not have very well defined semantics. You don't even need multiple threads to make that assertion fail, you can build

#include <cstdio>
int main() {
  printf("1\n");  // break1
  printf("2\n");  // break2
}

and then on lldb

target stop-hook add -o continue
break set -f stop-hook-simple.cpp -p "break1"
break set -f stop-hook-simple.cpp -p "break2"
run

In this case it's even worse because the presence of multiple threads makes it prone to race conditions. In some tests I ran with a simpler version of this test case, I was hitting either the previous assertion failure or the following issue:

  1. Two threads reach a breakpoint
  2. First stop-hook does a process continue
  3. Threads end
  4. Second stop-hook runs, complains about process not existing.

This change disables the test on Linux. It's already marked as XFAIL on Windows, so maybe we should just delete it until it's clear what should be the expected behavior in these cases. Or maybe try to come up with a way to write a similar multithreaded test without calling continue from a stop hook, I don't know.

Diff Detail

Repository
rL LLVM

Event Timeline

jgorbe created this revision.Feb 14 2019, 3:24 PM

I haven't gotten a chance to revisit the stop hooks in a long time. They definitely need some love.

IMO, stop hooks should have the same semantics as breakpoint commands, the first command that causes the target to run should exit the stop hook. That way you would never get into a situation where you hit another stop hook while handing previous stop's hit.

And we should also add a --continue (maybe also --step & --next) option to the stop hook. Then we could just disallow step/next/continue in stop hooks. It is really hard to reason about what to do when handling stops if the contents of the stop hook can restart the target out from under you. For instance, if you have multiple stop hooks, you have to abort running all the other stop hooks after the first one continues.

labath accepted this revision.Feb 14 2019, 11:26 PM

lgtm. I think the current behavior of stop hooks with process control commands is so under-defined that it does not make sense to run this test, especially given that we're considering disallowing that altogether. If we wanted to just test that stop hooks do run in the presence of multiple threads, then it should be possible to replace the "continue" command in the hook with something else which e.g. prints a variable. Then we can just check that the variable gets displayed correctly.

And we should also add a --continue (maybe also --step & --next) option to the stop hook. Then we could just disallow step/next/continue in stop hooks. It is really hard to reason about what to do when handling stops if the contents of the stop hook can restart the target out from under you. For instance, if you have multiple stop hooks, you have to abort running all the other stop hooks after the first one continues.

I like that a lot. In fact, I was proposing something very similar when we were discussing this internally. :)

This revision is now accepted and ready to land.Feb 14 2019, 11:26 PM

BTW, I am pretty sure the assert that Jorge is hitting would be reproduced everywhere when running the test in debug build (so that LLDB_CONFIGURATION_DEBUG is defined). I normally use a Release+Asserts build, hoping that has all assertions enabled, but it turns out this is not controlled by that.

I think we should at least hook things up so that LLDB_CONFIGURATION_DEBUG is triggered by -DLLVM_ENABLE_ASSERTIONS in cmake, but ideally, I'd just remove this variable altogether and replace it with standard _(N)DEBUG macros (or, for some of the more extreme uses of LLDB_CONFIGURATION_DEBUG, with LLVM_ENABLE_EXPENSIVE_CHECKS).

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2019, 9:53 AM

Ah, I see what happened.

stop-hooks should always be run in asynchronous mode, it doesn't make sense to have a stop hook persist over a continue. The intention is still to do that, as indicated by the fact that we set StopOnContinue to true in the options we pass to handle command when we run the stop hooks. These options were added in a (nice mostly) refactor of HandleCommands. But StopOnContinue is incorrectly handled. CommandInterpreter::HandleCommands just does:

if (!options.GetStopOnContinue()) {
  m_debugger.SetAsyncExecution(false);
}

So if you pass stop on continue as true then we don't actually change the sync/async mode of the debugger. So it it was synchronous going in, it's going to stay that way. That's clearly incorrect.

BTW, I still think that we should add an explicit way to indicate that stop hooks should continue. But the current intended behavior was regular. If any stop hook caused the target to proceed, it would terminate at that point and all the other stop hooks would be jettisoned. But if the stop hook gets (incorrectly) run in sync mode, we can no longer implement that policy.