This is an archive of the discontinued LLVM Phabricator instance.

RunThreadPlan: Fix halting logic in IgnoreBreakpoints = false
ClosedPublic

Authored by labath on May 17 2017, 8:21 AM.

Details

Summary

The function had logic to handle the case when the expression terminated
while we were trying to halt the process, but it failed to take into
account the possibility that the expression stopped because it hit a
breakpoint. This was caused by the fact that the handling of the stopped
events was duplicated for the "halting" and regular cases (the regular
case handled this situation correctly). I've tried to merge these two
cases into one to make sure they stay in sync.

I should call out that the two cases were checking whether the thread
plan has completed in slightly different ways. I am not sure what is the
difference between them, but I think the check should be the same in
both cases, whatever it is, so I just took the one from the regular
case, as that is probably more tested.

For the test, I modified TestUnwindExpression to run the expression with
a smaller timeout (this is how I found this bug originally). With a 1ms
one thread timeout, the test failed consistently without this patch.

Event Timeline

labath created this revision.May 17 2017, 8:21 AM
jingham requested changes to this revision.May 17 2017, 12:00 PM

The first code site did: checking for thread plan success, then check for hit breakpoint, then for anything else (thread plan failed or other stop reason.)

The code in the first of your substitution sites retains the anything else case outside the bits you factored out, in the if clause you've retained after your function (5281 in the new version.)

But ThreadPlanDone returns true regardless of the success or failure of the plan (it gets used in a bunch of places, so it's no less well tested, just had different semantics...) So if you use your new function in the second site, the code will no longer handle the case where the plan was completed but failed. Be good to keep a check for that.

Other than that, this looks good. Thanks for catching this.

This revision now requires changes to proceed.May 17 2017, 12:00 PM

I'm not sure I understand what you're saying. Did you mean to say that I should add the "thread plan didn't successfully complete." (line 5281) block to the "Halt" branch as well ? (possibly by including it into the factored out function)

Ok, I've missed the distinction between plan completing (aka being "done") and completing sucessfully. Things make a bit more sense after that.

With that in mind, let me try to explain how I understand it the code now, and then you can tell me if it's correct :)

For our purposes we can get a Stopped event due to one of these things happening:

  • plan_success
  • breakpoint_hit
  • interrupt
  • plan_failure (I'm not really sure when can that happen)
  • other (this can include the process stopping due to a signal)

What the old code did in the non-halt case was:

if(plan_success) handle_and_stop()
else if(breakpoint_hit) handle_and_stop()
else {
  // "interrupt" (which we can still get even though we haven't sent it) , "plan_failure" or "other"
  handle_and_stop()
}

The old code in the halt case did:

if (plan_failure || plan_success) handle_and_stop()
else {
  // "interrupt", "breakpoint_hit" or "other"
  resume_with_all_threads()
}

Here, the else part is wrong because we treat two other events the same way as the interrupt.

In my mind, the two versions of the code should behave the same way for all cases except for the "interrupt" case. In the first one, we should stop because the interrupt must have been initiated externally, while in the second one, we should resume because it was (probably) our halt event.

I'll upload a new version which attempts to to just that. The logic in the first case should remain unchanged, while the "halt" logic becomes:

if(plan_success) handle_and_stop()
else if(breakpoint_hit) handle_and_stop()
else if(interrupt) resume_with_all_threads()
else {
  // "plan_failure" or "other"
  handle_and_stop()
}

Let me know what you think.

PS: I'm not in a hurry with this, so I can wait a couple of days if you're busy right now.

labath updated this revision to Diff 99435.May 18 2017, 7:54 AM
labath edited edge metadata.

New version

That looks right to me, and is much nicer to read.

I think "plan failure" once you've actually kicked off the execution of a function calling thread plan is theoretical, there are plenty of ways the plan can fail, though at present all the ways I can think of would happen either before or after you got here. But just because we haven't found a way to make that stage fail yet...

jingham accepted this revision.May 23 2017, 1:40 PM

Oh, yeah, check the checkbox, Jim...

This revision is now accepted and ready to land.May 23 2017, 1:40 PM
This revision was automatically updated to reflect the committed changes.
labath reopened this revision.May 24 2017, 8:31 AM

Reopening for a re-review of a fix.

This revision is now accepted and ready to land.May 24 2017, 8:31 AM
labath updated this revision to Diff 100099.May 24 2017, 8:39 AM

Fixed version.

The original patch caused a regression in TestLoadUnload, which has only showed
up when running the remote test suite. The problem there was that we interrupted
the target just as it has hit the rendezvous breakpoint in the dlopen call. This
meant that the stop reason was set to "breakpoint" even though the event would
not have been broadcast if we had not stopped the process. I fix this by
checking StopInfo->ShouldNotify() before stopping.

I was hoping I would be able to create a more reliable test for this bug by
calling an expression which will hit a conditional breakpoint, whose condition
will evaluate to false. However, I have found that this does not work at all --
we always stop at the breakpoint, regardless of the expression. So, I add a
(disabled) test for that instead.

labath requested review of this revision.May 24 2017, 8:41 AM
labath edited edge metadata.

Let me know what you think of the fix, and please confirm whether the ignoring of the breakpoint condition is a bug.

thanks.

jingham accepted this revision.May 24 2017, 12:40 PM

The fix seems good.

The fact that a breakpoint hit while evaluating an expression doesn't check the condition is a known limitation.

You've got to protect against artificial recursions in handling breakpoints. For instance, you could have a breakpoint condition in a function where the condition calls back into the same function, hitting the breakpoint again, calling the condition again, etc... Sounds a bit far-fetched, but it actually happened in the field, which is why I added this protection. The code that does this is in StopInfoBreakpoint.cpp::PerformAction.

We could be smarter about condition evaluation, since nesting expression evaluation does work. Maybe tracking how deeply nested the condition evaluation is and having a cutoff at some (settable?) depth. Or have a "I promise I'm not going to do anything stupid" setting - though that seems hacky... I don't think there is a bug about this, but it would be great to file one.

Note, we also prohibit breakpoint command execution in expressions. We have to do that because the current implementation of the command evaluator isn't re-entrant. We really should fix that some day, but that's a decent chunk of work. Anyway, so we have to be draconian about command evaluation, but with some care we could relax the condition evaluation prohibition.

This revision is now accepted and ready to land.May 24 2017, 12:40 PM

Thanks for the explanation. I'll continue the discussion about the conditional breakpoint thingy on bug #33164, which I've just filed.

This revision was automatically updated to reflect the committed changes.