This is an archive of the discontinued LLVM Phabricator instance.

tsan: make invisible test barrier portable
ClosedPublic

Authored by dvyukov on Nov 6 2015, 2:31 AM.

Details

Reviewers
kubamracek
Summary

The current implementation does not work on darwin and can have issues with other OSes in future. See http://reviews.llvm.org/D14427
Make it portable once and for all (minus usleep call).

Diff Detail

Event Timeline

dvyukov updated this revision to Diff 39502.Nov 6 2015, 2:31 AM
dvyukov retitled this revision from to tsan: make invisible test barrier portable.
dvyukov updated this object.
dvyukov added a reviewer: kubamracek.
dvyukov added a subscriber: llvm-commits.
kubamracek edited edge metadata.Nov 6 2015, 2:55 AM

Yes, this works for me, thanks!

Just a quick question: Wouldn't usleep() invoke the interceptor and establish an unwanted synchronization?

Also, this fails a single test on Linux for me, thread_name2.cc. I didn't investigate further.

Huh, changing *barrier = count; in the initialization into:

__atomic_store_n(barrier, count, __ATOMIC_RELAXED);

fixes the failing test. But I'm not going to pretend that I understand why exactly was it failing...

kubamracek accepted this revision.Nov 6 2015, 3:25 AM
kubamracek edited edge metadata.
This revision is now accepted and ready to land.Nov 6 2015, 3:25 AM

Also, this fails a single test on Linux for me, thread_name2.cc. I didn't investigate further.

How exactly does it fail? Please also run it as a stand-alone binary and post output.
FWIW I run linux tests 30 times before uploading this change. All passed.

Huh, changing *barrier = count; in the initialization into:

I don't understand how it can affect anything. Initialization happens before any threads are started.

Just a quick question: Wouldn't usleep() invoke the interceptor and establish an unwanted synchronization?

usleep is not synchronization logically. So if the interceptor introduces any synchronization, then it's a bug that we will need to fix. Hopefully the tests will catch it, there are positive tests that must detect races and they should be pretty sensitive to any parasitic synchronization.

Also, this fails a single test on Linux for me, thread_name2.cc. I didn't investigate further.

How exactly does it fail? Please also run it as a stand-alone binary and post output.
FWIW I run linux tests 30 times before uploading this change. All passed.

It doesn't detect any race, the output is empty. FYI, it's an Ubuntu Server 15.10 machine.

Also, the test (thread_name2.cc) compiles under -O1. If I change it to -O0, -O2 or -O3, it passes (!).

Does changing 'int Global;' to 'long long Global;' fix it?

Or better change:
typedef unsigned invisible_barrier_t;
to:
typedef unsigned long long invisible_barrier_t;
in this patch.

dvyukov updated this revision to Diff 39510.Nov 6 2015, 3:45 AM
dvyukov edited edge metadata.

Does changing 'int Global;' to 'long long Global;' fix it?

Yes, this fixes it.

Or better change typedef unsigned invisible_barrier_t; to: typedef unsigned long long invisible_barrier_t;

This works as well.

What's going on here, can you explain it to me?

Or better change:
typedef unsigned invisible_barrier_t;
to:
typedef unsigned long long invisible_barrier_t;
in this patch.

Reuploaded this patch with this change. Should fix the issue.

The problem was that tsan shadow has 8-byte granularity. When I changed pthread_barrier_t (which is at least 8 bytes) to unsigned, it started being collocated with the Global variable in the same 8-bytes, and that caused shadow slot evictions for Global.

Ah, that makes sense. Thanks!

Committed in 252292

dvyukov closed this revision.Nov 6 2015, 3:55 AM

Follow up question: Wouldn't it be good to simply force all globals to be at least 8-aligned, when using -fsanitize=thread?