This is an archive of the discontinued LLVM Phabricator instance.

Support more report types in AddressSanitizerRuntime.cpp
ClosedPublic

Authored by kubamracek on Nov 22 2016, 4:48 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

kubamracek updated this revision to Diff 78979.Nov 22 2016, 4:48 PM
kubamracek retitled this revision from to Support more report types in AddressSanitizerRuntime.cpp.
kubamracek updated this object.
kubamracek set the repository for this revision to rL LLVM.
kubamracek added a project: Restricted Project.
kubamracek added subscribers: lldb-commits, zaks.anna.
clayborg accepted this revision.Nov 23 2016, 7:50 AM
clayborg edited edge metadata.

Looks fine. One question though: do we always have a shared library that contains the ASAN runtime? If so, then we can actually put the array of short to long description into the ASAN dylib and then we read this array as a global variable. This would allow you to add as many descriptions as you want directly in your ASAN code and have LLDB detect all of the new strings. Or, if there is a function you can call with the short description that returns the long description, then we can still have the ASAN code contain all of the strings.

This revision is now accepted and ready to land.Nov 23 2016, 7:50 AM
filcab added a subscriber: filcab.Nov 23 2016, 12:53 PM

I have some minor fixes I'd like to see.
If it's prefixed by "Nit: " it's a really minor one and I'm ok with it as is if that's what you prefer.

Thank you,
Filipe

source/Plugins/InstrumentationRuntime/AddressSanitizer/AddressSanitizerRuntime.cpp
220 ↗(On Diff #78979)

Not necessarily recursion. There's also stack variables. I'd omit the stuff in parenthesis.

224 ↗(On Diff #78979)

Nit: "Wild jump" is probably better than "wild pointer jump", no?

226 ↗(On Diff #78979)

Nit: I'd probably use "Write through wild pointer ..." (same for the others)

234 ↗(On Diff #78979)

Nit: Phrasing seems off. I think "Double free detected" is easier to read and more explicit.

236 ↗(On Diff #78979)

Nit: Maybe "Deallocation size mismatch detected"? The user isn't deallocating a "size".

238 ↗(On Diff #78979)

s/address/memory region/?

242 ↗(On Diff #78979)

Maybe "Call to malloc_usable_size with not owned pointer detected"?
Unsure if it's obvious why it's "invalid". Same for the one below.

246 ↗(On Diff #78979)

I think this really needs additional detail. It doesn't seem very meaningful as a sentence.
Maybe "Overlapping memory ranges in function that doesn't allow them (detected?)" or something?
My suggestion can be improved too, for sure :-)

250 ↗(On Diff #78979)

Maybe "Invalid parameters to call to __sanitizer_annotate_contiguous_container" is more explicit?
Maybe also a good solution for the similar cases above.

252 ↗(On Diff #78979)

"One Definition Rule violation"
It's not an initialization order problem.

Thanks for these comments, I'll have them run by more people before committing this (I'm not a native English speaker). These really need to be as user-focused as possible, so I'm really against stuff like "not owned pointer", because that's not something we should expect the user to understand. Also note that we're not using these as a replacement of full reports, we're showing them as a quick description in addition to ASan reports.

source/Plugins/InstrumentationRuntime/AddressSanitizer/AddressSanitizerRuntime.cpp
220 ↗(On Diff #78979)

Multiple times have I seen that people read "stack overflow" as "stack buffer overflow" and they spend a lot of time trying to find what buffer was actually overflown... I'd like to somehow point that out. Ideas?

filcab added inline comments.Nov 23 2016, 2:57 PM
source/Plugins/InstrumentationRuntime/AddressSanitizer/AddressSanitizerRuntime.cpp
220 ↗(On Diff #78979)

Maybe instead of "recursion too deep" have "stack space exhausted" or something like that?
I've seen stack overflow errors on as few as 10 (maybe even fewer) stack frames (with big objects). ASan is also more likely to make this a problem. I think seeing "recursion too deep" but having only a dozen frames is probably confusing.
Not that "stack space exhausted" is much better, but I think it's less likely to be misleading.

And yes, please ask native speakers too, as I'm not one either. :-)

Here is another pass at this:

source/Plugins/InstrumentationRuntime/AddressSanitizer/AddressSanitizerRuntime.cpp
198 ↗(On Diff #78979)

Kuba would like to drop all of these "detected" that do not add anything and are just used as filler words in all of the description messages. This will make some of them into non-complete sentences, but keeping them short is more user friendly and gets the point across just fine.

220 ↗(On Diff #78979)

We could just say "Stack space exhausted" with no parentheses.

222 ↗(On Diff #78979)

Drop "detected" or "Dereference of null pointer".

224 ↗(On Diff #78979)

How about "Jump to a wild address". "Wild jump" sounds like a term, but I am not familiar with it.

226 ↗(On Diff #78979)

"Write through", "Access through" but "Read from" (as per the English speaker on our team :)).

234 ↗(On Diff #78979)

Double free is a bit of a jargon, I think it's better to be explicit:
"Deallocation of freed memory"

236 ↗(On Diff #78979)

Is this always about "new" and "delete"? If so, we could try to be more explicit about it "The size of deleted object does not match the size at allocation"

"Deallocation size different than allocation size"?

238 ↗(On Diff #78979)

"Deallocation of non-allocated memory"

242 ↗(On Diff #78979)

"Invalid argument to malloc_usable_size"? And "Invalid argument to __sanitizer_get_allocated_size" below.

246 ↗(On Diff #78979)

How about this: "Call to function disallowing overlapping memory ranges"

248 ↗(On Diff #78979)

Negative size used when accessing memory

250 ↗(On Diff #78979)

"Invalid arguments to __sanitizer_annotate_contiguous_container"

252 ↗(On Diff #78979)

Is it possible to use something similar to what clang uses, for example, "external variable %0 defined in multiple translation units" (from ./include//clang/Basic/DiagnosticASTKinds.td)?

Ex: "Symbol defined in multiple translation units"

254 ↗(On Diff #78979)

Comparison or arithmetic on pointers from different memory regions

I'd mostly use Anna's suggestions. Kuba: Can you remove the "detected" from the messages and refresh the patch, just so we can have one last look? Otherwise, if you think it's good enough, commit with these changes and we can do post-commit review.

source/Plugins/InstrumentationRuntime/AddressSanitizer/AddressSanitizerRuntime.cpp
198 ↗(On Diff #78979)

Agree.

220 ↗(On Diff #78979)

Yes.

222 ↗(On Diff #78979)

I'd go for the second option.

224 ↗(On Diff #78979)

"Jump to non-executable address" (basically, it's either non-readable (or unmapped) or non-executable. But I don't think we need to have those distinctions in this error message)? If not, I'm fine with Anna's suggestion too.

226 ↗(On Diff #78979)

Heh. Right.

234 ↗(On Diff #78979)

Good.

236 ↗(On Diff #78979)

"... different from..."?
Otherwise, I like it.

kubamracek added reviewers: filcab, zaks.anna, dcoughlin.
kubamracek removed rL LLVM as the repository for this revision.

Updating the patch.

kubamracek requested a review of this revision.Nov 29 2016, 11:27 AM
kubamracek edited edge metadata.
kubamracek marked 33 inline comments as done.

Looks fine. One question though: do we always have a shared library that contains the ASAN runtime? If so, then we can actually put the array of short to long description into the ASAN dylib and then we read this array as a global variable. This would allow you to add as many descriptions as you want directly in your ASAN code and have LLDB detect all of the new strings. Or, if there is a function you can call with the short description that returns the long description, then we can still have the ASAN code contain all of the strings.

We do have the dylib with the runtime any time this code is run. However, I don't think these should live in the ASan runtime, because they should be considered as more high-level user-visible strings. The ASan runtime itself produces detailed technical reports that don't use the wording used here. For example, the console report will just say "Invalid pointer pair" assuming the user knows what that means. When using LLDB (and/or Xcode) we should provide less technical descriptions.

Anyway, I think this asks for a longer discussion. Are you okay with leaving these strings here for now?

zaks.anna accepted this revision.Nov 30 2016, 10:54 AM
zaks.anna edited edge metadata.

Looks great to me. Thank you Kuba!!!

This revision is now accepted and ready to land.Nov 30 2016, 10:54 AM
filcab accepted this revision.Dec 2 2016, 10:08 AM
filcab edited edge metadata.

LGTM

(I commented on a minor nit. It might just be me, so feel free to keep the current wording if you feel it's preferred)

LGTM

(I commented on a minor nit. It might just be me, so feel free to keep the current wording if you feel it's preferred)

I'm not seeing this comment. Can you post it again?

This revision was automatically updated to reflect the committed changes.

Thanks for reviewing this!