This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt] ASan debugging API for report info extraction and locating addresses
ClosedPublic

Authored by kubamracek on Jul 15 2014, 3:00 PM.

Details

Summary

This patch is part of an effort to implement a more generic debugging API, as proposed in http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-July/074656.html, with first part reviewed at http://reviews.llvm.org/D4466.

Now adding two new APIs:

  • __asan_get_report_data which returns a 0 if no asan report happened yet, or 1 if yes and also returns the PC, BP, SP, address, access type (read/write), access size and bug description (e.g. "heap-use-after-free")
  • __asan_locate_address which takes a pointer and tries to locate it, i.e. say whether it is a heap pointer, a global or a stack, or whether it's a pointer into the shadow memory. If global or stack, tries to also return the variable name, address and size. If heap, tries to return the chunk address and size.

Generally these should serve as an alternative to "asan_describe_address", which only returns all the data in text form. Having an API to get these data could allow having debugging scripts/extensions that could show additional information about a variable/expression/pointer. The idea behind having asan_get_report_data is that we could have a special lldb stop reason that could show much more information than just "EXC_BAD_ACCESS".

Diff Detail

Event Timeline

kubamracek retitled this revision from to [compiler-rt] ASan debugging API for report info extraction and locating addresses.Jul 15 2014, 3:00 PM
kubamracek updated this object.
kubamracek edited the test plan for this revision. (Show Details)
kubamracek added reviewers: kcc, glider.
kubamracek added subscribers: Unknown Object (MLST), Unknown Object (MLST).
kubamracek updated this revision to Diff 11470.Jul 15 2014, 3:00 PM
kcc added inline comments.Jul 16 2014, 1:36 AM
include/sanitizer/asan_interface.h
69

I don't like the interface.
You may eventually need to get other kinds of data, so you will have to change this function.
Instead, I suggest to add a set of functions like __asan_get_report_foo each of which returns just one part of report if such is known.

78

Do you need this to be a enum?
Can this be just a 'const char *' containing the description, e.g.
'stack', "heap".
BTW, this is missign at least one kind of memory: fake-stack.

lib/asan/asan_debugging.cc
46

here and below, please avoid copy-paste at all costs.
You've copied this code from DescribeAddressIfStack -- don't do it.
Instead, refactor the code so that you can call it from other places.
If possible, please send the refactoring as separate patches.

lib/asan/asan_report.cc
34

wrap these into a struct.

struct ReportData {

uptr foo;
uptr bar;

};

static ReportData report_data;

947

why did you move this here?

glider added inline comments.Jul 16 2014, 2:45 AM
include/sanitizer/asan_interface.h
69

As an alternative you can make __asan_get_report_data return the ReportData struct suggested by Kostya below.
Keep in mind you'll have to keep your LLDB scripts in sync with that struct's layout which might be less convenient than having __asan_get_report_foo for every foo.
On the other hand __asan_get_report_data is more convenient to call from the user code that includes asan_interface.h

test/asan/TestCases/debug_locate.cc
9

Please fix the includes order.

kubamracek updated this revision to Diff 11530.Jul 16 2014, 1:36 PM

Updated the patch to address review comments. Sent a separate refactor patch, http://reviews.llvm.org/D4545, this patch depends on it.

kubamracek added inline comments.Jul 16 2014, 1:41 PM
lib/asan/asan_report.cc
947

Because ScopedInErrorReport's constructor does things where I'd like the report data to already be available. Namely, it calls __asan_on_error, which seems like the only sensible place for a breakpoint *before* the actual report is printed (I'm using it in the debug_report.cc test case).

samsonov added inline comments.
lib/asan/asan_report.cc
35

I believe this wouldn't compile on Windows :-/

248

static is not necessary

kubamracek added inline comments.Jul 16 2014, 2:54 PM
lib/asan/asan_report.cc
35

I don't have a Windows box nearby, would it be because of the member initialization or because of assignment of zero into a bool?

kubamracek updated this revision to Diff 11546.Jul 16 2014, 5:38 PM

Update addressing review comments.

glider edited edge metadata.Jul 29 2014, 11:50 AM

Sorry for the delay, will review tomorrow (MSK time)

Looks mostly good.

lib/asan/asan_debugging.cc
51

Single-line comments must start with a capital letter and end with a period. Please fix here and below.

lib/asan/asan_interface_internal.h
94

Please add a comment for this family of functions.

lib/asan/asan_report.cc
35

IIUC for a non-empty ReportData the value of |already_happened| is always true. I suggest to extract it into a separate flag variable to avoid copying this meaningless value when passing ReportData around.

test/asan/TestCases/debug_report.cc
7

Please mind the order of includes.

17

nit: too much vertical whitespace in this function.

glider added inline comments.Jul 30 2014, 4:12 AM
lib/asan/asan_report.cc
994

BTW why this function returns int, not bool?

Please disregard my question about bool.

kcc edited edge metadata.Jul 30 2014, 4:24 AM
  1. I still see some copy-paste
  2. Your patches are uploaded w/o context. Please make them with full context.
kubamracek updated this revision to Diff 13429.Sep 8 2014, 4:07 PM
kubamracek edited edge metadata.

Addressing the review comments. Tried to minimize any code duplication. Added full context.

samsonov added inline comments.
lib/asan/asan_globals.cc
93–94

Can you make this function do one of the following:
(1) print the address description (what it does now) if "print" input parameter is true
(2) set __asan_global output parameter to the structure describing the global.
Then you won't need GetInfoForAddressRelativeToGlobal at all.

116

See above: passing a pack of fake parameters is ugly. Instead you can call this function in two modes:

DescrineOrGetInfoForGlobal(addr, size, /*print*/true, /*global*/nullptr);

or

Global g;
DescribeOrGetInfoForGlobal(addr, 1, /*print*/false, &g);
lib/asan/asan_report.cc
306–307

Can you use the identical strings in the output and in shadow_type variable? That is, make shadow_type equal to "shadow gap", or "high shadow", or "low shadow".

311

Consider using internal_strncpy here.

994

Why have you moved ScopedInErrorReport here?

1055

Any specific reason to not expose report_data structure layout in the interface header? This pack of functions doesn't look nice.

Any specific reason to not expose report_data structure layout in the interface header? This pack of functions doesn't look nice.

It's to make it easier to use from LLDB. When we add a new function, it's easier to check whether that symbol exists than to version this struct. See the original suggestion at http://reviews.llvm.org/D4527?id=11470#inline-37585

Why have you moved ScopedInErrorReport here?

Because ScopedInErrorReport's constructor does things where I'd like the report data to already be available. Namely, it calls __asan_on_error, which seems like the only sensible place for a breakpoint *before* the actual report is printed (I'm using it in the debug_report.cc test case). I know we have to be careful about this, but it seems to me this move should be safe (the code in between doesn't do any locks, etc.). Or is there any subtle issue due to this move?

I agree with the other comments, will submit a new patch.

Kuba

kubamracek updated this revision to Diff 13498.Sep 9 2014, 2:23 PM

Addressing comments from review.

samsonov added inline comments.Sep 11 2014, 3:09 PM
lib/asan/asan_debugging.cc
38

internal_strncpy?

119

Consider introducing a struct for address description instead of passing quadruple (name, name_size, region_address, region_size) around. It would be much easier to modify it later. You can also add region_kind ("global", "stack" etc.) string there.

lib/asan/asan_globals.cc
93–94

This function can be static.

106

I believe output_global should always be nonnull in this case. Remove the if() and optionally add a CHECK.

lib/asan/asan_interface_internal.h
97

Don't duplicate the comments here.

lib/asan/asan_report.cc
279

This clearly belongs to asan_globals.cc

314

I guess that if print() is false, output_type should be nonnull.

986

Can you use aggregate assignment here?

report_data = { ... };
kubamracek updated this revision to Diff 13666.Sep 12 2014, 3:53 PM

Updated patch.

Consider introducing a struct for address description instead of
passing quadruple (name, name_size, region_address, region_size)
around. It would be much easier to modify it later. You can also add
region_kind ("global", "stack" etc.) string there.

Introduced AddressDescription. However, I still kept the external interface not to use this struct, again for LLDB convenience reasons.

samsonov edited edge metadata.Sep 12 2014, 4:42 PM

Getting closer.

lib/asan/asan_debugging.cc
35

Why not internal_strncpy(descr->name, vars[i].name_pos, Min(descr->name_size - 1, vars[i].name_len)) ?

40

s/break/return

48

See another comment about passing AddressDescription into DescribeAddressIfShadow()

55

This should be filled by GtInfoForAddressIfGlobal

69

Consider factoring out filling heap address description into a separate routine. AsanLocateAddress would then be really straightforward.

lib/asan/asan_globals.cc
105

put CHECK() under if()

126

You can (and probably should) fill descr->region_type here.

lib/asan/asan_report.h
44

Why not pass AddressDescription here as well?

kubamracek updated this revision to Diff 13669.Sep 12 2014, 5:53 PM
kubamracek edited edge metadata.

Getting closer.

Thanks for the patience :)

Why not internal_strncpy(descr->name, vars[i].name_pos, Min(descr->name_size - 1, vars[i].name_len)) ?

Because that would not write \0 after the string (we're copying just a part of the string pointed to by vars[i].name_pos). So I would have to memset the whole buffer to zero before. Should I do that, instead of the strncat? (The strncat is also used in asan_report.cc:436).

In D4527#31, @kubabrecka wrote:

Getting closer.

Thanks for the patience :)

Why not internal_strncpy(descr->name, vars[i].name_pos, Min(descr->name_size - 1, vars[i].name_len)) ?

Because that would not write \0 after the string (we're copying just a part of the string pointed to by vars[i].name_pos). So I would have to memset the whole buffer to zero before. Should I do that, instead of the strncat? (The strncat is also used in asan_report.cc:436).

Right, let's keep strncat here then.

In D4527#33, @samsonov wrote:

Why not internal_strncpy(descr->name, vars[i].name_pos, Min(descr->name_size - 1, vars[i].name_len)) ?

Because that would not write \0 after the string (we're copying just a part of the string pointed to by vars[i].name_pos). So I would have to memset the whole buffer to zero before. Should I do that, instead of the strncat? (The strncat is also used in asan_report.cc:436).

Right, let's keep strncat here then.

Ok. So, are you fine with how the patch looks like now?

samsonov added inline comments.Sep 17 2014, 3:04 PM
lib/asan/asan_debugging.cc
43

what if you haven't found a stack variable? Should you still set descr->region_kind in this case?

58

Minor nit: comments seems to be redundant in this function. Also, you can reduce the amount of vertical whitespace here.

73

Rename this function to GetInfoForStackVar to make this consistent with GetInfoForAddressIfGlobal and GetInfoForHeapAddress.

lib/asan/asan_report.cc
306–307

Get rid of this variable - just put this string inside a single Printf call.

310

else if

311

else if

985

Note that report_happened might be set to true even though report_data is not yet initialized. Have you considered setting this flag inside "ScopedInErrorReport constructor? Yes, you will have no report data for other kinds of error reports, but it still seems like the right place for it.

You can also pass ReportData into ScopedInErrorReport constructor, to make sure you initialize global "report_data" exactly once, under a lock.

kubamracek updated this revision to Diff 13963.Sep 22 2014, 7:15 PM

Addressing review comments.

kubamracek updated this revision to Diff 13964.Sep 22 2014, 7:16 PM

Uploading patch with more context.

I've got only a handful of comments left.

Kostya, are you OK with the change?

lib/asan/asan_debugging.cc
49

Do you need to at least set descr->name to empty string?

lib/asan/asan_report.cc
596

Just make nullptr default value for report.

explicit ScopedInErrorReport(ReportData *report = nullptr) {...}
test/asan/TestCases/debug_locate.cc
32

You can just use asserts instead of CHECK-lines here. It doesn't look like this test needs FileCheck.

assert(0 == strcmp(type, "global"));
kubamracek updated this revision to Diff 14060.Sep 24 2014, 7:01 PM

Addressing review comments.

samsonov accepted this revision.Sep 24 2014, 7:05 PM
samsonov edited edge metadata.

LGTM (but please let Kostya sign this off).

This revision is now accepted and ready to land.Sep 24 2014, 7:05 PM
kcc added a comment.Sep 25 2014, 3:27 PM

Almost LGTM, sorry for delay

lib/asan/asan_debugging.cc
30

If the stack corruption is really bad we may return false here. This should be handled.

kubamracek updated this revision to Diff 14088.Sep 25 2014, 3:58 PM
kubamracek edited edge metadata.

Handle the case when ParseFrameDescription returns false.

kcc accepted this revision.Sep 25 2014, 4:44 PM
kcc edited edge metadata.

LGTM, thanks!

kubamracek closed this revision.Sep 25 2014, 5:05 PM

Landed in r218481, thanks for review!

Reverted in r218501 due to test failure on the sanitizer-x86_64-linux buildbot. Committed again in r218538 with a fix, intptr_t -> uintptr_t, and an extra free(heap_ptr) to fix the LeakSanitizer report. Both changes are in the test file only (debug_locate.cc).