This is an archive of the discontinued LLVM Phabricator instance.

[tsan] Provide API for libraries for race detection on custom objects
ClosedPublic

Authored by kubamracek on Jan 17 2017, 5:45 PM.

Details

Summary

Hi,

TSan's race detection algorithm can be applied generally to anything where modification disallows other simulatenous accesses. Data races on memory locations are the common case, but we also already use the same algorithm to track races on file descriptors. This patch extends race detection and proposes to provide an interface that would allow libraries/framework to track races on objects that are exposed via API. If a mutable object is accessible via some API, it's very common that the threading rules of the API follow TSan's standard race rules. Example:

ArchiverRef ArchiverCreateNew();
void        ArchiverAddFile(ArchiverRef ref, char *filename, char *data, size_t len);
void        ArchiverRemoveAllFiles(ArchiverRef ref);
long        ArchiverGetNumberOfFiles(ArchiverRef ref);
char *      ArchiverGetFileData(ArchiverRef ref, char *filename, size_t *out_len);

A reasonable threading contract would be that "read-only" methods can be called simulatenously from multiple threads (for one ArchiverRef handle), while modifying methods must be exclusive.

One option to detect violations of this API contract is to TSanify the library itself, but this has some downsides:

  1. The detected race will be on some internal state variable, possibly several functions deep (but the real bug is that user's code is breaking the API contract).
  2. The bug might not be detected at all, because the two APIs just happen not to touch the same memory. It's still a misuse of the API, and something we want to report.
  3. It might not be feasible/easy to recompile the library with TSan instrumentation.

Another option is to add interceptors for the APIs of the library. However, interceptors introduce a dependency, which is undesirable for high-level libraries, and we want TSan to link against as few libraries as possible. Secondly, maintaining a large list of interceptors for various libraries is hard and error-prone.

The idea of this patch is to allow a non-instrumented library to call into TSan runtime, which would be weakly (dynamically) linked (if TSan is not loaded into the process, this won't do anything). The library would tell TSan about "readonly" and "modifying" accesses to an "object" (which simply translate to a read/write of the object's pointer) and provide the caller and tag (type of object):

static void *tag_Archiver = __tsan_register_tag("MyLibrary", "Archiver");

void ArchiverAddFile(ArchiverRef ref, ...) {
  __tsan_external_write(ref, CALLER_PC, tag_Archiver);
  ...
}

long ArchiverGetNumberOfFiles(ArchiverRef ref) {
  __tsan_external_read(ref, CALLER_PC, tag_Archiver);
  ...
}

The caller and tag are used to generate a user-friendly report:

==================
WARNING: ThreadSanitizer: race on a library object
  Mutating access of object Archiver from library MyLibrary at 0x7b04000000f0 by thread T6:
    #0 ArchiverAddFile (archiver.dylib+...)
    #1 UserCodeFunc1 (usercode+...)

  Previous read-only access of object Archiver from library MyLibrary at 0x7b04000000f0 by thread T5:
    #0 ArchiverAddFile (archiver.dylib+...)
    #1 UserCodeFunc2 (usercode+...)

  Location is Archiver object from library MyLibrary of size 16 at 0x7b04000000f0 allocated by main thread:
    #0 malloc (...)
    #1 ArchiverCreateNew (archiver.dylib+...)
    #2 main (usercode+...)
...

So, the proposal here is to add these functions to TSan's interface:

void *__tsan_external_register_tag(char *library_name, char *object_type);
void  __tsan_external_assign_tag(void *addr, void *tag);
void  __tsan_external_read(void *addr, void *caller_pc, void *tag);
void  __tsan_external_write(void *addr, void *caller_pc, void *tag);

The __tsan_external_register_tag function creates a new tag, __tsan_external_assign_tag marks a heap-allocated memory chunk with the specified tag (this is to identify the object in the report). The __tsan_external_read/__tsan_external_read calls are basically the same as __tsan_read8/__tsan_write8, but they identify that the call isn't from the instrumentation, but from a library.

Another common threading model in library API is that a object handle must only be used from a single thread "at any given time" (but may be passed to another thread). This can be enforced by simply treating all APIs as modifying methods.

Diff Detail

Event Timeline

kubamracek created this revision.Jan 17 2017, 5:45 PM
dvyukov edited edge metadata.Jan 24 2017, 7:26 AM

Did not notice this before somehow. Sorry.

Do you plan to this functionality? Where? How?

Please post full reports from the tests.

Do you plan to this functionality? Where? How?

Please elaborate here. I need to understand the use case.

void tsan_external_read(void *addr, void *caller_pc, void *tag);
void
tsan_external_write(void *addr, void *caller_pc, void *tag);

Are they meant to be called from instrumented code or from non-instrumented code?

dvyukov added inline comments.Jan 24 2017, 7:59 AM
lib/tsan/rtl/tsan_external.cc
37

We should not assume nor impose the two-level hierarchy of library/object_type. The less such assumptions we do the more general support we provide. There can also be library/sublibrary/object_type/object_subtype. Or, lots of libraries provide only 1 object type, then will have to do tautology like library "foobar", object type "foobar". There can also be objects with hard to identify library affiliation, for example objects provided by C/C++ language, or objects provided by OS kernel (like file descriptors). "From library OS kernel" or "from library C language" will sound quite awkward.
Just one string for object identification is enough.
It covers all of "MyLibrary::Archiver", "another_library.foo/bar", "fd", "stdout".

Please elaborate here. I need to understand the use case.

void tsan_external_read(void *addr, void *caller_pc, void *tag);
void
tsan_external_write(void *addr, void *caller_pc, void *tag);

Are they meant to be called from instrumented code or from non-instrumented code?

From non-instrumented code. We want to bake the callbacks to __tsan_external_* into regular (shipped) builds of some system libraries, framework and API suppliers. We can't ship those libraries TSanified (performance, security) nor can we maintain interceptors for them (too many APIs, changing APIs, ...). At this point, it's not clear which libraries will adopt this, but the idea is that any library that provides "objects" via opaque handles should be able to use these callbacks to detect races on the API-user side.

E.g. libcurl's curl_easy_init returns a CURL * handle, which you can only use to call other libcurl functions. You can pass the handle to a different thread, but you must only use the handle from one thread at a time (this is stated in the API docs). Even if there is no real race happening, using the same handle to call libcurl methods in two threads without synchronization is a bug -- we want to detect those. Now, I don't think libcurl should actually adopt these callbacks, but there are other libraries shipped in macOS which are very good candidates for this, because 1) they're frequently used in multithreaded ways, 2) users often misuse them, 3) we're the developers of them.

We should not assume nor impose the two-level hierarchy of library/object_type... Just one string for object identification is enough.
It covers all of "MyLibrary::Archiver", "another_library.foo/bar", "fd", "stdout".

Right. I agree.

We can't ship those libraries TSanified (performance, security) nor can we maintain interceptors for them (too many APIs, changing APIs, ...). At this point, it's not clear which libraries will adopt this

I would prefer if we get buy in from at least few libraries before committing this for 2 reasons:

  1. This code can end up being dead.
  2. Designing a good API without uses is hard.

I would prefer if we get buy in from at least few libraries before committing this...

I do already have a buy in from 2 major libraries on our side and we're in the talks with a 3rd library. The library owners did preliminary performance tests and they are fine with inserting those callbacks. Sorry for being vague here :)

I do already have a buy in from 2 major libraries on our side and we're in the talks with a 3rd library. The library owners did preliminary performance tests and they are fine with inserting those callbacks. Sorry for being vague here :)

OK, looks reasonable then.

But please post full reports for the tests.

This is first round of comments. Need to figure out what happens when this new callbacks race with the old callback and/or tag is not assigned, or old callbacks + tag.

lib/tsan/rtl/tsan_external.cc
22

Inline this into __tsan_external_register_tag, as that's the only caller and the only thing it does is calling this function.

25

This is, well, a data race.
Use an atomic for tag allocation.

32

Needs a CHECK that tag is correct. Or probably better to return NULL and let callers deal with absence of tag (treat the same as tag==0). The header potentially can be corrupted due to e.g. data races with tag assignment, or block freeing in a different thread.

42

This needs a CHECK that tag was previously allocated.

55

caller_pc is unsed
80 columns please

58

I would prefer if is_external_race/external_tag into a single variable. We already have tag==0 reserved for "no tag". So it looks reasonable to just threat external_tag!=0 as is_external_race.

64

ditto

lib/tsan/rtl/tsan_external.h
20

This does not need to be in header.

p.s. consts are static by default

27

This does not need to be in header.

TagDescription will go away as well, if we do just 1 string.

What's left is only TagDescriptionFromTag, so I would just declare it in tsan_rtl.h.

...what happens when this new callbacks race with the old callback and/or tag is not assigned, or old callbacks + tag.

I'll address the comments, but just to follow up about the tags: The patch adds tagging of heap memory (adds tag field into MBlock), basically only for the purpose of having a better "Location is..." line in the reports. This "feature" is orthogonal to the rest of the patch and I can extract it into a separate patch if you want me to.

A second tagging happens, where we tag the memory accesses themselves (external_tag field in ReportMop). This is used to identify that we're dealing with a race on a library object, rather than a regular race.

But please post full reports for the tests.

What exactly would you like to see? The TSan output for the lit test that's part of the patch? Or the TSan output for a more realistic use case?

kubamracek updated this revision to Diff 85637.Jan 24 2017, 2:55 PM
kubamracek marked 7 inline comments as done.Jan 24 2017, 3:09 PM
kubamracek added inline comments.
lib/tsan/rtl/tsan_external.cc
55

I kept this unused "for now" and only planned to use it in subsequent patches. I think it's important to have access to caller_pc, because it's the "responsible frame" (frame that contains the bug), and it might not necessarily be __builtin_return_address(1). One thing is duplicate suppression -- we might want to base dup detection on the caller rather than the library frame. Second thing is the "SUMMARY" line, which should point to the responsible frame (but it doesn't with the current patch). Third thing is knowing which module contains the responsible frame -- the module should be used for ignore_noninstrumented_modules flag (and not the library itself).

Sorry for not explaining before. Let me know if you think I need to address those in the patch already, or if you think we don't need caller_pc at all.

But please post full reports for the tests.

What exactly would you like to see? The TSan output for the lit test that's part of the patch? Or the TSan output for a more realistic use case?

For the test.

lib/tsan/rtl/tsan_external.cc
52

What is it u32?
I could understand u16 and u64, or not cast at all. But u32 looks confusing.
I would just remove the (u32) cast.

dvyukov added inline comments.Jan 27 2017, 1:05 AM
lib/tsan/rtl/tsan_external.cc
55

Why don't you call FuncEntry(caller_pc)/FuncExit(caller_pc) around the accesses?
That will use the argument, print the missing frame in reports and solve the suppressions problem.

kubamracek updated this revision to Diff 86445.Jan 31 2017, 9:19 AM

Here's the output of the testcase from the patch (the "TEST3" scenario, which is the only one interesting). The stacks have some C++'ish noise in them due to the use of std::thread.

$ ./projects/compiler-rt/test/tsan/X86_64Config/Darwin/Output/external.cc.tmp-lib-noninstrumented-callbacks
RR test done
==================
WARNING: ThreadSanitizer: race on a library object (pid=13155)
  Mutating access of object MyLibrary::MyObject at 0x7b04000000a0 by thread T4:
    #0 ObjectWrite external.cc:73 (external.cc.tmp-lib-noninstrumented-callbacks.dylib:x86_64+0xea2)
    #1 main::$_3::operator()() const external.cc:106 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x10000771f)
    #2 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, main::$_3> >(void*) thread:346 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x1000073f0)

  Previous read-only access of object MyLibrary::MyObject at 0x7b04000000a0 by thread T3:
    #0 ObjectRead external.cc:66 (external.cc.tmp-lib-noninstrumented-callbacks.dylib:x86_64+0xe5e)
    #1 main::$_2::operator()() const external.cc:105 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x100005fa8)
    #2 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, main::$_2> >(void*) thread:346 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x100005c80)

  Location is MyLibrary::MyObject object of size 16 at 0x7b04000000a0 allocated by main thread:
    #0 malloc sanitizer_malloc_mac.inc:134 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x44d27)
    #1 ObjectCreate external.cc:57 (external.cc.tmp-lib-noninstrumented-callbacks.dylib:x86_64+0xe03)
    #2 start <null>:94378568 (libdyld.dylib:x86_64+0x5234)

  Thread T4 (tid=13251813, running) created by main thread at:
    #0 pthread_create tsan_interceptors.cc:897 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x27dce)
    #1 std::__1::thread::thread<main::$_3, void>(main::$_3&&) thread:362 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x1000069e2)
    #2 std::__1::thread::thread<main::$_3, void>(main::$_3&&) thread:354 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x1000017b8)
    #3 main external.cc:106 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x100001434)

  Thread T3 (tid=13251812, finished) created by main thread at:
    #0 pthread_create tsan_interceptors.cc:897 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x27dce)
    #1 std::__1::thread::thread<main::$_2, void>(main::$_2&&) thread:362 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x100005272)
    #2 std::__1::thread::thread<main::$_2, void>(main::$_2&&) thread:354 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x100001758)
    #3 main external.cc:105 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x10000141a)

SUMMARY: ThreadSanitizer: race on a library object external.cc:73 in ObjectWrite
==================
RW test done
==================
WARNING: ThreadSanitizer: race on a library object (pid=13155)
  Mutating access of object MyLibrary::MyObject at 0x7b04000000f0 by thread T6:
    #0 ObjectWriteAnother external.cc:80 (external.cc.tmp-lib-noninstrumented-callbacks.dylib:x86_64+0xef2)
    #1 main::$_5::operator()() const external.cc:132 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x10000a5ff)
    #2 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, main::$_5> >(void*) thread:346 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x10000a2d0)

  Previous mutating access of object MyLibrary::MyObject at 0x7b04000000f0 by thread T5:
    #0 ObjectWrite external.cc:73 (external.cc.tmp-lib-noninstrumented-callbacks.dylib:x86_64+0xea2)
    #1 main::$_4::operator()() const external.cc:131 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x100008e8f)
    #2 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, main::$_4> >(void*) thread:346 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x100008b60)

  Location is MyLibrary::MyObject object of size 16 at 0x7b04000000f0 allocated by main thread:
    #0 malloc sanitizer_malloc_mac.inc:134 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x44d27)
    #1 ObjectCreate external.cc:57 (external.cc.tmp-lib-noninstrumented-callbacks.dylib:x86_64+0xe03)
    #2 start <null>:94378568 (libdyld.dylib:x86_64+0x5234)

  Thread T6 (tid=13251870, running) created by main thread at:
    #0 pthread_create tsan_interceptors.cc:897 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x27dce)
    #1 std::__1::thread::thread<main::$_5, void>(main::$_5&&) thread:362 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x1000098c2)
    #2 std::__1::thread::thread<main::$_5, void>(main::$_5&&) thread:354 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x100001878)
    #3 main external.cc:132 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x1000014fb)

  Thread T5 (tid=13251869, finished) created by main thread at:
    #0 pthread_create tsan_interceptors.cc:897 (libclang_rt.tsan_osx_dynamic.dylib:x86_64h+0x27dce)
    #1 std::__1::thread::thread<main::$_4, void>(main::$_4&&) thread:362 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x100008152)
    #2 std::__1::thread::thread<main::$_4, void>(main::$_4&&) thread:354 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x100001818)
    #3 main external.cc:131 (external.cc.tmp-lib-noninstrumented-callbacks:x86_64+0x1000014d8)

SUMMARY: ThreadSanitizer: race on a library object external.cc:80 in ObjectWriteAnother
==================
WW test done
ThreadSanitizer: reported 2 warnings

Here's the output of the testcase from the patch (the "TEST3" scenario, which is the only one interesting). The stacks have some C++'ish noise in them due to the use of std::thread.

Thanks! Looks good.

There is also a minor nit re u32 cast.
And:

Why don't you call FuncEntry(caller_pc)/FuncExit(caller_pc) around the accesses?

Is there a reason to not do this now? Looks simple to do and it should improve things.

kubamracek marked 4 inline comments as done.Feb 1 2017, 5:43 AM
  1. There is also a minor nit re u32 cast.
  2. Why don't you call FuncEntry(caller_pc)/FuncExit(caller_pc) around the accesses?

I fixed both these in the last version of the patch.

FuncEntry/FuncExit does really improve the stacks, but it still doesn't solve all the problems, and I plan to discuss them later in upcoming patches. Sounds okay?

dvyukov accepted this revision.Feb 1 2017, 6:17 AM
This revision is now accepted and ready to land.Feb 1 2017, 6:17 AM
This revision was automatically updated to reflect the committed changes.