This is an archive of the discontinued LLVM Phabricator instance.

Report stopped by trace if none of the watchpoint was hit
ClosedPublic

Authored by tberghammer on Mar 5 2015, 6:42 AM.

Details

Summary

Report stopped by trace if none of the watchpoint was hit

Some linux kernel reports a watchpoint hit after single stepping even when no watchpoint was hit. This CL looks for a watchpoint which was hit and reports a stop by trace if it haven't found any.

Diff Detail

Event Timeline

tberghammer updated this revision to Diff 21274.Mar 5 2015, 6:42 AM
tberghammer retitled this revision from to Expect stop reason watchpoint for the step over breakpoint thread plan.
tberghammer updated this object.
tberghammer edited the test plan for this revision. (Show Details)
tberghammer added a reviewer: jingham.
tberghammer added a subscriber: Unknown Object (MLST).
jingham edited edge metadata.Mar 5 2015, 12:40 PM

How will you handle the case where the instruction we are stepping over triggers a real watchpoint?

vharron added a subscriber: vharron.Mar 5 2015, 1:46 PM

How will you handle the case where the instruction we are stepping over triggers a real watchpoint?

Thanks for the suggestion:

"Create a test that verifies correct behavior when a watchpoint is hit during a step"
http://llvm.org/bugs/show_bug.cgi?id=22814

I didn't understand your reply. Adding a test case won't make this code work correctly.

This change should not go in till you figure out how to identify the watchpoint stop reason which corresponds to the return from the single step and only return true in that case.

Adding a test case won't make this code work correctly.

Sorry, that was confusing. I don't have a great solution at this point. The kernel is what the kernel is and it's on shipped Android systems, so we have to live with it. Hopefully that bug will eventually drive a kernel patch.

This change should not go in till you figure out how to identify the watchpoint stop reason which corresponds to the return from the single step and only return true in that case.

I see your point. Tamas, I've been assuming that this is related to your work on aarch64. Is that correct? If so, do you want to try to patch the kernel, ask Linaro for help or punt this whole problem to them?

Do you get any data back from the watchpoint hit? For instance, if the trace is implemented by a "pc != curr_pc" type watch, then you should be able to get the watched address from the watchpoint hit event. In that case, you can check whether the watched address was the current PC, and only report this as a trace stop when that is true.

Another way to do this, if you are using llgs, is to have THAT guy convert the watchpoint hit into a trace event. llgs is closer to the metal, and might have a better chance to figure out what just went on.

I'm no expert at this layer but you might want to have your kernel people look at the "ARMv8 Debug Architecture" manual from ARM, specifically section 4.2.1, "Effect of taking debug exception to an EL using AArch64 on system registers" (or the AArch32 section below that). It should be possible for the kernel to distinguish between a single step with the MDSCR_EL1 SS bit and a watchpoint debug event.

tberghammer updated this revision to Diff 21343.Mar 6 2015, 5:26 AM
tberghammer retitled this revision from Expect stop reason watchpoint for the step over breakpoint thread plan to Report stopped by trace if none of the watchpoint was hit.
tberghammer updated this object.
tberghammer added reviewers: vharron, chaoren.

Thanks for noticing this bug and for all of the suggestions. I created a new fix what address the actual root cause of the problem with reporting the right stop reason from lldb-server.

For ARMv8 the watchpoint handling is not implemented yet, but it should be possible to implement the IsWatchpointHit function.

The approach looks right. I'll let somebody who works on the NativeThread side weigh in on whether the implementation is correct.

vharron accepted this revision.Mar 13 2015, 2:32 PM
vharron edited edge metadata.
vharron added inline comments.
source/Plugins/Process/Linux/NativeThreadLinux.cpp
406 ↗(On Diff #21343)

Which kernels is it known to happen on/not happen on?

This revision is now accepted and ready to land.Mar 13 2015, 2:32 PM
chaoren accepted this revision.Mar 13 2015, 2:48 PM
chaoren edited edge metadata.

LGTM

Some nits below:

source/Plugins/Process/Linux/NativeRegisterContextLinux_x86_64.h
49 ↗(On Diff #21343)

Is this necessary?

source/Plugins/Process/Linux/NativeThreadLinux.cpp
405 ↗(On Diff #21343)

What Vince asked (which kernels). Also nit: grammar.

tberghammer added inline comments.Mar 17 2015, 7:31 AM
source/Plugins/Process/Linux/NativeRegisterContextLinux_x86_64.h
49 ↗(On Diff #21343)

Which part?

The override keyword is optional but gives us some more compile time check.

This revision was automatically updated to reflect the committed changes.
tberghammer added inline comments.Mar 17 2015, 8:29 AM
source/Plugins/Process/Linux/NativeThreadLinux.cpp
406 ↗(On Diff #21343)

Added kernel description to comment (android-arm64 platfrom-21) and fixed grammar

labath added a subscriber: labath.Mar 17 2015, 9:59 AM

It could be that the kernel reports a watchpoint hit on single stepping because on arm single stepping is actually implemented using hardware breakpoints. This is just a guess, but in any case it might be interesting to create a test case, which what happens with single stepping after we have set the maximum number of watchpoints possible. Does single stepping still work? Do some of our watchpoints get overwritten, etc.

On armv7 architecture systems, there are 4 hardware breakpoint registers (DBGBVR) and 4 hardware watchpoint registers (DBGWVR). The breakpoint registers track values in the pc register. You can also say "stop when the PC is not equal to this value" which is how single instruction stepping is done on armv7.

On armv8 architecture systems, AArch32 or AArch64, there is a single instruction step bit (MDSCR_EL1, SS bit) to accomplish this more simply. There are still the dual sets of hardware breakpoint and hardware watchpoint registers -- but the hardware breakpoint registers don't have the "stop when the PC is not equal to this value" capability.

The kernel should have no problem distinguishing between an MDSCR_EL1 SS bit stop event and a watchpoint exception (on armv8 systems). On an armv7 system, it's possible to distinguish between a breakpoint and watchpoint event. And exhausting the watchpoint register file won't prevent stepping from working.

fwiw, lldb today doesn't use the hardware breakpoints, except for single stepping on armv7 systems. breakpoint set does have the --hardware option to request that the remote gdb stub use a hardware breakpoint if it has that capability. But lldb doesn't do that on its own today.

Thank you, that was very enlightening. :)