This is an archive of the discontinued LLVM Phabricator instance.

hwasan: Teach the runtime to identify the local variable being accessed in UAR reports.
ClosedPublic

Authored by pcc on Jun 17 2019, 8:36 PM.

Details

Summary

Each function's PC is recorded in the ring buffer. From there we can access
the function's local variables and reconstruct the tag of each one with the
help of the information printed by llvm-symbolizer's new FRAME command. We
can then find the variable that was likely being accessed by matching the
pointer's tag against the reconstructed tag.

Depends on D63300

Depends on D63468

Diff Detail

Repository
rL LLVM

Event Timeline

pcc created this revision.Jun 17 2019, 8:36 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 17 2019, 8:36 PM
Herald added subscribers: Restricted Project, kubamracek. · View Herald Transcript
eugenis added inline comments.Jun 21 2019, 2:51 PM
compiler-rt/lib/hwasan/hwasan_report.cpp
255 ↗(On Diff #205238)

Let's give names to the magic numbers - they are ABI details after all.

266 ↗(On Diff #205238)

This is a bit tricky. Leave a comment that underflow modulo 2^20 is expected when the address is expected when the address is below the local variable?

281 ↗(On Diff #205238)

I think it might be useful to iterate until the end of the buffer to find more candidates; if anything, that would avoid losing the information that's already there.

This is good enough for now. Some time in the future I think we should refactor this to collect all the candidates (including ex. potential buffer overflows with larger offsets, across multiple small allocation) and sort them by likelihood.

282 ↗(On Diff #205238)

This function is way too big. Do you mind splitting it?

compiler-rt/lib/sanitizer_common/sanitizer_symbolizer.h
87 ↗(On Diff #205238)

group the bools together to reduce the gaps

compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_internal.h
18 ↗(On Diff #205238)

is this include necessary here?

compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cc
424 ↗(On Diff #205238)

remove the space from "FRAME ", it can be added in the callee

compiler-rt/test/hwasan/TestCases/stack-uar.c
36 ↗(On Diff #205238)

The advantage of printing decoded records, as before, is that they can be parsed with all the old symbolization scripts. Explicitly listing the last digits of FP is also sometimes useful to match against register values in the report.

Could we change the format to

[tag], fp:0x12345 #0 PC (module+offset)

#0 is also for compatibility with symbolization scripts.

Please check what happens in "weird" cases with dynamic stack realignment or dynamic sp adjustment. If one or both of these cases can not be handled with this debug info format, can we at least avoid getting confused by them? Are we simply missing fp-offset in the debug info if the offset is not statically known? Add a test case.

pcc updated this revision to Diff 206329.Jun 24 2019, 4:43 PM
pcc marked 10 inline comments as done.
  • Address review comments
compiler-rt/lib/hwasan/hwasan_report.cpp
281 ↗(On Diff #205238)

By the time we get here, we've already visited every frame. We don't stop when we find the first match or anything like that.

Or are you saying that we should print the Previously allocated frames section in all cases (e.g. for manual identification of OOB accesses)?

compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_internal.h
18 ↗(On Diff #205238)

No, removed.

compiler-rt/test/hwasan/TestCases/stack-uar.c
36 ↗(On Diff #205238)

Do you think it's important to be compatible with old symbolization scripts here? I think we will probably need a new symbolization script anyway that does all the same things that the runtime is doing to identify stack objects. That's also why I decided to print raw values here, so that we'd be less likely to need to change the line format in the future if the entry format changes.

pcc added a comment.Jun 24 2019, 4:44 PM

Please check what happens in "weird" cases with dynamic stack realignment or dynamic sp adjustment. If one or both of these cases can not be handled with this debug info format, can we at least avoid getting confused by them? Are we simply missing fp-offset in the debug info if the offset is not statically known? Add a test case.

Forgot to address this part. I'll upload a new patch in a bit.

pcc updated this revision to Diff 206349.Jun 24 2019, 6:10 PM
  • Add tests covering UAR with dynamic alloca and stack realignment
pcc added a comment.Jun 24 2019, 6:10 PM

Please check what happens in "weird" cases with dynamic stack realignment or dynamic sp adjustment. If one or both of these cases can not be handled with this debug info format, can we at least avoid getting confused by them? Are we simply missing fp-offset in the debug info if the offset is not statically known? Add a test case.

In the case of dynamic sp adjustment, things seem to work just fine (for non-dynamic-alloca variables, which is the only kind that we instrument). I see a DW_OP_fbreg associated with each of the variables.

In the case of dynamic stack realignment, the debug info is expressed relative to another register with the post-realignment frame address (usually x19), which means we end up using e.g. DW_OP_breg19 instead of DW_OP_fbreg in the variable's DW_AT_location. We don't get confused by this sort of debug info because llvm-symbolizer only emits an fp offset if it see a DW_OP_fbreg in the location. I think it should be possible in principle to get this case to work by teaching the backend to use the post-realignment frame address register as DW_AT_frame_base, making the backend prefer DW_OP_fbreg for frames with stack realignment and creating a new intrinsic that returns the value of the DW_AT_frame_base designated register (whose value we would store in the ring buffer).

eugenis accepted this revision.Jun 27 2019, 3:51 PM

LGTM
We will need some kind of offline symbolization solution to go with this.

compiler-rt/lib/hwasan/hwasan_report.cpp
281 ↗(On Diff #205238)

I think I was confused by the amount of indentation in this function. Looks good now.

165 ↗(On Diff #206349)

the lower 20 bits of FP

Nit: they are not the lowest 20 bits. Add "significant" or something like that?

compiler-rt/test/hwasan/TestCases/stack-uar.c
36 ↗(On Diff #205238)

sgtm

This revision is now accepted and ready to land.Jun 27 2019, 3:51 PM
This revision was automatically updated to reflect the committed changes.
pcc marked an inline comment as done.
pcc added inline comments.Jun 27 2019, 4:27 PM
compiler-rt/lib/hwasan/hwasan_report.cpp
165 ↗(On Diff #206349)

Reworded to bits 4-19 of FP (bits 0-3 are guaranteed to be zero); should be a little clearer now.