This is an archive of the discontinued LLVM Phabricator instance.

Interpret ESR/FAR bits directly on watchpoint exceptions in debugserver, clarify how watchpoint descriptions in stop packets work
AbandonedPublic

Authored by jasonmolenda on Apr 5 2023, 6:30 PM.

Details

Summary

This patch originally started with handling the case where a watchpoint traps when a write starts before the watched region, and the trap address reported may be outside the watched region. I wanted to find the nearest watched region on these events, and relay that up to lldb so they behave correctly. On reading about how the ESR and FAR registers contain details about the watchpoint, including possibly the watchpoint index that was triggered, I added code to parse those, use the watchpoint index if it's available, and pass along the other fields to lldb in case they become interesting.

On the lldb side, in the stop packet, watchpoints can be flagged by one of two ways. One is the bog standard gdb remote serial protocol watch:/awatch:/rwatch: followed by a watchpoint address. The second is an lldb extension to do "reason:watchpoint" followed by a "description:" whose value is an asciihex encoded string of three integers base10 encoded, space separated. These fields are an address within the watchpoint that was triggered, the watchpoint index, and optionally, the actual address that caused the watchpoint fault, which may be outside the range of any watched region. ProcessGDBRemote translates watch:/awatch:/rwatch: k-v pairs into this reason:watchpoint+description format (with only a single address).

This third integer was added by Jaydeep Patil in 2015 ( https://reviews.llvm.org/D11672 ) for supporting MIPS where the hardware can only watch 8 byte granules, so if you watch 4 bytes and an access happens within that 8B granule but outside that range, execution will stop. These false positive watchpoints are intended to be hidden from the user, so when we had a third integer that is not contained by a watchpoint region on MIPS, we would silently step past the watchpoint and continue without notifying the user.

In 2016 Omair Javaid ( https://reviews.llvm.org/D21516 ) started passing the actual trap address on Linux arm targets up as a third integer, and updated the conditional that turns this into a StopInfo so that the above "watchpoint hit outside a watched region" behavior would happen on arm. Arm systems have a different behavior than the MIPS one -- if you watch 8 bytes at address 0x1008 and someone does a 16-byte STP instruction starting at address 0x1000, 0x1000-0x100f are modified but the watchpoint trap address reported is 0x1000. This address is not within the range of any watched region, so lldb can fail to know how to proceed. I believe Omair was handling this, and while passing this value in the third argument is a good idea, enabling the same "silently step past this watchpoint" behavior was a mistake. This was all took me quite a while to figure out, I'm not surprised it was misunderstood. I've spent so much time staring at this I'm pretty sure I've got it correct now.

I documented how the three fields of this reason:watchpoint description should be provided. I've kept the MIPS behavior of skipping over false positive watchpoints, but instead of passing an address which is outside of any Watchpoint range to CreateStopReasonWithWatchpointID, I pass a boolean silently_continue instead. I think it is a little easier to understand what it's being used for. Nothing actually did anything with the address passed to CreateStopReasonWithWatchpointID except to detect that it was not contained within a watchpoint region, and use this to indicate that we're doing the silent skipping of the watchpoint.

I wanted debugserver to use this reason:watchpoint description style reporting, and that dragged me into figuring out what these fields were meant to be, and what the "silently skip watchpoint" behavior was intended to be handling. By the time I figured all the intended behaviors out, I needed to start documenting and, hopefully, clarifying them for when I forget again in a year.

Outside of the documentation and debugserver changes, the real changes to generic lldb are in ProcessGDBRemote::SetThreadStopInfo where I've documented what this is doing with the fields and, hopefully, made it a little easier to understand in the future. And a minor change to StopInfoWatchpoint/CreateStopReasonWithWatchpointID to use a boolean flag for this behavior.

I added a test which has a uint8_t[8] array and I watch one member of that, cast the address to a uint64_t* and write to it. I suspect this will Just Work on non-AArch64 targets (the watchpoint trap address will be the address of the uint8_t we're watching, instead of the start of the uint64_t* write I get on our AArch64 systems).

In debugserver, I'm sending all of the ESR fields that I decode up to lldb in the stop info packets. This is mostly to aid in debugging of these in the medium term, so we can see them in packet logs we request easily if we have any reports of problems with watchpoints. My next trick I'm going to do is look into adding MASK watchpoint support to debugserver in addition to BAS watchpoints. I'll slowly be ongoing with work in this area I think. It could be argued that these should be logged to console like most debugserver debug logging, only enabled when requested, but there will be no measurable perf impact from putting these in watchpoint stop notifications so I'm starting here.

I know the MIPS support was removed c. 2021 but it may be added again, and the ability to skip a false watchpoint hit is a useful one that we may need on AArch64 in the future if we can detect it appropriately.

Diff Detail

Event Timeline

jasonmolenda created this revision.Apr 5 2023, 6:30 PM
jasonmolenda requested review of this revision.Apr 5 2023, 6:30 PM

One possible criticism I have of the current reason:watchpoint + description-up-to-three-integers is that we should add key-value pairs to the stop-info packet wp-address: wp-index: wp-hit-address and honestly, silently-continue: instead of doing an architectural hardcode in lldb. The remote stub is probably in the best position to know if this is a false positive watch trigger that needs to be silently moved past, instead of depending on "a third integer which isn't within the range of a watched region on MIPS". If I wanted to add a 4th field for "silently skip" to the current description, now I'm requiring that the stub reporting flag a false hit watchpoint needs to provide the first three always. And maybe it only has the one address or something.

jasonmolenda abandoned this revision.Apr 7 2023, 3:41 PM

Intermixing generic/linux AArch64 lldb watchpoint StopInfo changes in with a change to debugserver made this a difficult patch to ask anyone to review in total. I've separated it into one phab for the lldb watchpoint StopInfo handling https://reviews.llvm.org/D147816 and a separate phab for the debugserver changes to find the nearest watchpoint when we have a trap address outside any watched memory region and pass the ESR fields up to lldb https://reviews.llvm.org/D147820 . Closing this phab.

This comment was removed by omjavaid.