This is an archive of the discontinued LLVM Phabricator instance.

Accurate watchpoint hit counts redux
ClosedPublic

Authored by jingham on Jul 27 2022, 4:42 PM.

Details

Summary

This is a reworking of the previous patch to get the watchpoint hit counts to be accurate on systems where you have to walk the PC past the faulting instruction before reporting the stop. The previous patch worked reliably on macOS, but failed on the Linux aarch64 bots. As I suspected from the discussion of the previous patch, the problem was that the threads that don't get to step over the faulting instruction see a new StopInfoWatchpoint on the next stop, and so they schedule a new step-over plan, and things go poorly after that...

I fixed that issue by checking whether the thread had gotten a chance to run in StopInfoWatchpoint::ShouldStopSynchronous and not doing any work in that case. In the answer to Pavel's question, I managed to get both modes to work but it is much easier for lldb to deal with debugserver's clearing of stop info's on each step, than lldb-server's preserving them. The former is easy for lldb to reason about, since it is in control of what stop-info's it decides to preserve, the latter requires a little guessing...

However, even with that fix, the "threads/concurrent_events" Linux test runs that had several watchpoints and a breakpoint or signal were flaky. The reason for that is that depending on how events arrived, a signal or breakpoint event could force a stop before the threads stopped at watchpoints all had a chance to move past their faulting instructions. Then on Linux, we would get all the unprocessed watchpoint hits reported anew when the breakpoint stopped so that it looked like the watchpoint threads had hit the watchpoints too many times.

I fixed that by adding an explicit mechanism for a thread & it's thread plans to say "I have work that needs to be done before a public stop can be declared". That way all the threads that stop on the watchpoint can force their step-instructions before the final stop is reported, and we won't end up with some having run the faulting instruction and others not, causing us to overcount the hits. That's the ShouldRunBeforePublicStop machinery.

The only other substantial change over the previous patch is that there were a handful of places where we weren't being rigorous about our rule that "LLDB preserves the StopInfo for thread that doesn't get to run". In one place, we were using GetResumeState not GetTemporaryResumeState to check whether the thread had run. GetResumeState only returns true if the User suspended the thread, not if lldb has suspended it for this one resume for its own purposes, so it wasn't the right test. And in another place, if we get thread stop info in the JSON form, we would unconditionally reset all the threads w/o checking if any of them had run or not.

With this I'm getting clean test suite runs on macOS and on a Linux guest running Ubuntu22.04 on my AS Mac.

Diff Detail

Event Timeline

jingham created this revision.Jul 27 2022, 4:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2022, 4:42 PM
jingham requested review of this revision.Jul 27 2022, 4:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2022, 4:42 PM

Note, just so it isn't confusing, there are some other bits of the ThreadPlan machinery where a ThreadPlan or the Thread itself can say "Even though the thread plan computation says I should stop, I'm going to force a run". However, those controls were all just about the one thread's vote. They didn't force the overall ThreadList::ShouldStop decision. To get this to work, however, I needed threads to be able say "You have to keep resuming till I say I'm done". So there needed to be a different mechanism, checked higher up in the should stop computation, for that behavior.

labath accepted this revision.Aug 1 2022, 4:46 AM

This makes sense to me. Thanks for doing this.

This revision is now accepted and ready to land.Aug 1 2022, 4:46 AM