This is an archive of the discontinued LLVM Phabricator instance.

tsan: intercept libatomic
AbandonedPublic

Authored by dvyukov on Oct 12 2016, 4:01 AM.

Details

Reviewers
kcc
eugenis
Summary

When compiler can't emit native code for atomic operations, it instead emits calls to libatomic which provides fallback atomic operation implementation. For all relevant architectures the fallback happens only for weirdly-sized atomics (e.g. 7 or 64 bytes). There are 4 relevant functions in libatomic: atomic_load, atomic_store, atomic_exchange and atomic_compare_exchange (i.e. we can't do fetch_add on a 64-byte struct). See section 1.2.4 of http://stdatomic.gforge.inria.fr for details of the interface.

Libatomic fallback causes both false positives and false negatives for tsan. Current implementation uses pthread_mutex_t to guard atomic operations, which tsan intercepts and that causes massive false negatives. However, 16-byte operations are done with CMPXCHG16B if possible and that becomes completely invisible to tsan and causes false positives.

Intercept libatomic functions for variable-sized atomics and model them precisely for tsan.

Diff Detail

Event Timeline

dvyukov updated this revision to Diff 74355.Oct 12 2016, 4:01 AM
dvyukov retitled this revision from to tsan: intercept libatomic.
dvyukov updated this object.
dvyukov added reviewers: kcc, eugenis.
dvyukov added a subscriber: jakubjelinek.
kcc added inline comments.Oct 12 2016, 6:23 PM
lib/interception/interception.h
77

"we may have been already included "
something here is redundant

78

do we include std headers?

161–162

Sounds like an intrusive change. How portable is it?
Any other alternatives?

dvyukov added inline comments.Oct 13 2016, 12:40 AM
lib/interception/interception.h
77

What is redundant here? Looks proper to me.
http://lmgtfy.com/?q=%22may+have+been+already%22

78

this change resolves the only issue that prevented us from including std headers

you can consider that for builtins compiler has preincluded some magic standard headers that define builtins in a way that precludes most of usages

161–162

I can't think of any other alternatives. One can't refer to builtins other than calling them.
More portable then the rest of magic here as this involves only compiler, while the rest also involves assembler and linker.

kcc added inline comments.Oct 13 2016, 9:07 AM
lib/interception/interception.h
77

"been" feels redundant (not a native English speaker here)
"we may have already included "?

78

Sorry, please explain in more detail.

Where are we including / not including the std headers?
In user code or in the sanitizer run-time?

One other thing that we can do with builtins is to replace them with hard calls at compile time,
we already do that with memset in asan. Is that applicable here?

This patch currently doesn’t compile on Darwin:

tsan_interceptors.cc:1347:24: error: conflicting types for '__atomic_load'
  TSAN_INTERCEPTOR(void, __atomic_load, SIZE_T n, void *a, void *v, int ord) {
tsan_interceptors.cc:1347:24: note: '__atomic_load' is a builtin with type 'void ()'
tsan_interceptors.cc:1347:24: error: builtin functions must be directly called
  TSAN_INTERCEPTOR(void, __atomic_load, SIZE_T n, void *a, void *v, int ord) {
tsan_interceptors.cc:1348:27: error: builtin functions must be directly called
  SCOPED_TSAN_INTERCEPTOR(__atomic_load, n, a, v, ord);
tsan_interceptors.cc:1348:51: error: too many arguments to function call, expected 3, have 4
  SCOPED_TSAN_INTERCEPTOR(__atomic_load, n, a, v, ord);

I noticed that lib/builtins/atomic.c uses #pragma redefine_extname to work around this. And this also seems to work for Darwin interceptors, e.g. something like:

#pragma redefine_extname __atomic_load_c ___atomic_load
TSAN_INTERCEPTOR(void, __atomic_load_c, SIZE_T n, void *a, void *v, int ord) {
   SCOPED_TSAN_INTERCEPTOR(__atomic_load_c, n, a, v, ord);
   AtomicLoad(thr, pc, n, a, v, ord);
}

Can this also be used on Linux?

test/tsan/atomic_test.cc
2

Add -std=c++11 to all compile commands.

6–12

-latomic doesn’t compile on Darwin. On Darwin, this parameter is not needed.

dvyukov added a comment.EditedOct 17 2016, 10:39 PM
In D25509#572230, @kubabrecka wrote:

This patch currently doesn’t compile on Darwin:

tsan_interceptors.cc:1347:24: error: conflicting types for '__atomic_load'
  TSAN_INTERCEPTOR(void, __atomic_load, SIZE_T n, void *a, void *v, int ord) {
tsan_interceptors.cc:1347:24: note: '__atomic_load' is a builtin with type 'void ()'
tsan_interceptors.cc:1347:24: error: builtin functions must be directly called
  TSAN_INTERCEPTOR(void, __atomic_load, SIZE_T n, void *a, void *v, int ord) {
tsan_interceptors.cc:1348:27: error: builtin functions must be directly called
  SCOPED_TSAN_INTERCEPTOR(__atomic_load, n, a, v, ord);
tsan_interceptors.cc:1348:51: error: too many arguments to function call, expected 3, have 4
  SCOPED_TSAN_INTERCEPTOR(__atomic_load, n, a, v, ord);

I noticed that lib/builtins/atomic.c uses #pragma redefine_extname to work around this. And this also seems to work for Darwin interceptors, e.g. something like:

#pragma redefine_extname __atomic_load_c ___atomic_load
TSAN_INTERCEPTOR(void, __atomic_load_c, SIZE_T n, void *a, void *v, int ord) {
   SCOPED_TSAN_INTERCEPTOR(__atomic_load_c, n, a, v, ord);
   AtomicLoad(thr, pc, n, a, v, ord);
}

Can this also be used on Linux?

redefine_extname seems to be introduced only for compatibility with solaris compiler and has a number of downsides as compared to asm labels:
https://gcc.gnu.org/onlinedocs/gcc/Symbol-Renaming-Pragmas.html

Why not use asm labels on darwin instead?

I think we need something along the lines of:

      • ../../interception/interception.h (revision 283521) +++ ../../interception/interception.h (working copy) @@ -160,9 +160,9 @@ }
    1. define ASSIGN_REAL(dst, src) REAL(dst) = REAL(src) #else // APPLE
    2. define REAL(x) x +# define REAL(x) __interceptor_fake_ ## x
    3. define DECLARE_REAL(ret_type, func, ...) \
  • extern "C" ret_type func(VA_ARGS); + extern "C" ret_type interceptor_fake_ ## func(VA_ARGS__) asm(#func);
    1. define ASSIGN_REAL(x, y) #endif // APPLE

But interception.h is total mess so probably I missed something. Can you please test this change on darwin?

lib/interception/interception.h
77

will do

78

Where are we including / not including the std headers?

we don't

Is that applicable here?

Probably.
Is it better? Why?

dvyukov added inline comments.Oct 17 2016, 10:42 PM
test/tsan/atomic_test.cc
2

Isn't -std=c++11 a part of %clangxx_tsan?
How does "#include <atomic>" work then?

I think we need something along the lines of:

Please use "View Raw" to view the patch. I've tried posting the patch as is and prefixed with spaces. This thing does not seem to be meant for showing programming stuff...

Defining the builtins directly is certainly not going to work very well.
E.g. GCC libatomic uses:

void libat_load (size_t, void *, void *, int) __asm ("__atomic_load");

or e.g.

void lbat_load (size_t, void *, void *, int);
...
typeof (libat_load) export_load __asm ("__atomic_load") __attribute__((alias ("libat_load")));

or ifuncs.

kubamracek added inline comments.Oct 18 2016, 11:13 AM
test/tsan/atomic_test.cc
2

Apparently, we only add -std=c++11 when we have an instrumented libcxx, see test/tsan/lit.cfg. I submitted https://reviews.llvm.org/D25740 to always use -std=c++11.

I think we need something along the lines of:

Please use "View Raw" to view the patch. I've tried posting the patch as is and prefixed with spaces. This thing does not seem to be meant for showing programming stuff...

Prefix code blocks with 4 spaces, then they will get displayed nicely.

Why not use asm labels on darwin instead?
I think we need something along the lines of ...

This breaks a lot of other interceptors with various errors ranging from “use of undeclared identifier” to "cannot apply asm label to function after its first use” and requires a lot of other changes to interception.h to make it work (I gave up after 3 or 4 tries to fix this). Do we really need these intrusive changes to interception.h? What about declaring a new type of interception and using it just for the libatomic wrappers (and leave the existing interceptors as they are)?

Any news here? I didn't want to block this patch, I only noticed that it doesn't build on macOS.

Any news here?

No news yet. I wanted to get a darwin machine somewhere to figure out a working solution, but did not figure out where to get a good darwin machine yet... and then was paged out by other stuff.

I didn't want to block this patch, I only noticed that it doesn't build on macOS.

Well, that's serious enough reason to not submit it as is :)

What about declaring a new type of interception and using it just for the libatomic wrappers (and leave the existing interceptors as they are)?

We can, of course, resort to this.
But generally I prefer to have a single solution that works in all cases, rather than one solution that works for A but not for B + another that works for B but not for A.

How bad are darwin breakages? I mean while testing linux build I had to do changes like:

  • int res = vswprintf(str, size, format, ap); + int res = WRAP(vswprintf)(str, size, format, ap);

Which was just incorrect code in the first place. All intercepted functions should be called via WRAP or REAL.
Can it's be the case that we just need more of that for darwin? And maybe declare some interceptrs before calling the "real" counterparts (looking at errors that you mentioned)?

Any news here?

No news yet. I wanted to get a darwin machine somewhere to figure out a working solution, but did not figure out where to get a good darwin machine yet... and then was paged out by other stuff.

I didn't want to block this patch, I only noticed that it doesn't build on macOS.

Well, that's serious enough reason to not submit it as is :)

What about declaring a new type of interception and using it just for the libatomic wrappers (and leave the existing interceptors as they are)?

We can, of course, resort to this.
But generally I prefer to have a single solution that works in all cases, rather than one solution that works for A but not for B + another that works for B but not for A.

How bad are darwin breakages? I mean while testing linux build I had to do changes like:

  • int res = vswprintf(str, size, format, ap); + int res = WRAP(vswprintf)(str, size, format, ap);

/\/\/\

You can see that I prefixed it with 4 spaces, but it's still broken when displayed....

One way to make this work on Darwin is by using the __asm trick:

extern "C" void my__atomic_load (SIZE_T, void *, void *, int) __asm ("___atomic_load");
extern "C" void my__atomic_store (SIZE_T, void *, void *, int) __asm ("___atomic_store");
extern "C" void my__atomic_exchange (SIZE_T, void *, void *, void *, int) __asm ("___atomic_exchange");
extern "C" bool my__atomic_compare_exchange (SIZE_T, void *, void *, void *, int, int) __asm ("___atomic_compare_exchange");

TSAN_INTERCEPTOR(void, my__atomic_load, SIZE_T n, void *a, void *v, int ord) {
  SCOPED_TSAN_INTERCEPTOR(my__atomic_load, n, a, v, ord);
  AtomicLoad(thr, pc, n, a, v, ord);
}

TSAN_INTERCEPTOR(void, my__atomic_store, SIZE_T n, void *a, void *v, int ord) {
  SCOPED_TSAN_INTERCEPTOR(my__atomic_store, n, a, v, ord);
  AtomicStore(thr, pc, n, a, v, ord);
}

TSAN_INTERCEPTOR(void, my__atomic_exchange, SIZE_T n, void *a, void *v, void *ret,
    int ord) {
  SCOPED_TSAN_INTERCEPTOR(my__atomic_exchange, n, a, v, ret, ord);
  AtomicExchange(thr, pc, n, a, v, ret, ord);
}

TSAN_INTERCEPTOR(bool, my__atomic_compare_exchange, SIZE_T n, void *a, void *cmp,
    void *v, int sord, int ford) {
  SCOPED_TSAN_INTERCEPTOR(my__atomic_compare_exchange, n, a, cmp, v, sord, ford);
  return AtomicCompareExchange(thr, pc, n, a, cmp, v, sord, ford);
}

I think this is a pretty clean solution (it just allows us to use the __atomic_* symbols without parse errors). Note that Darwin needs triple underscores there, I believe this is different from Linux. If I also remove -latomic from test_atomic.cc, this builds fine and passes the tests.

Let me know if/why you don't like this solution and I can test other approaches on Darwin.

dvyukov updated this revision to Diff 84141.Jan 12 2017, 10:21 AM

One way to make this work on Darwin is by using the __asm trick:

Thank you for testing.

I updated this patch to use __asm instead of asm, and added _ to asm names on darwin. Does this work on darwin?
One thing that I did not do yet is removing -latomic from tests. Linux linker requires it, I think it is wrong, but it is the way it is.
Do you have a good suggestion for how to conditionally remove it on darwin? Maybe we should make lit always add -latomic for tsan tests on linux?

The test also currently fails with libc++:
https://llvm.org/bugs/show_bug.cgi?id=31620

Does this work on darwin?

Still fails to build -- the lines you changed in interception.h aren't even used on Darwin.

tsan_interceptors.cc:1388:24: error: conflicting types for '__atomic_load'
TSAN_INTERCEPTOR(void, __atomic_load, SIZE_T n, void *a, void *v, int ord) {
                       ^
tsan_interceptors.cc:1388:24: note: '__atomic_load' is a builtin with type 'void ()'
tsan_interceptors.cc:1388:24: error: builtin functions must be directly called
TSAN_INTERCEPTOR(void, __atomic_load, SIZE_T n, void *a, void *v, int ord) {
                       ^
tsan_interceptors.cc:1389:27: error: builtin functions must be directly called
  SCOPED_TSAN_INTERCEPTOR(__atomic_load, n, a, v, ord);

My suggestion was to declare __interceptor_fake_* directly in tsan_interceptors.cc just for the few __atomic_* functions, but I guess you're looking for a general solution.

Do you have a good suggestion for how to conditionally remove it on darwin?

Adding a %link_atomic lit substitution (that would expand to nothing on Darwin) sounds good enough to me.

lib/interception/interception.h
155–159

This part is already only used on Linux. There's a #ifdef __APPLE_ above (line 99).

I don't have time to figure out how to support Mac. Filed https://github.com/google/sanitizers/issues/816 so that this is not lost.

dvyukov abandoned this revision.Jul 22 2021, 6:58 AM