Page MenuHomePhabricator

[TSAN] Honor failure memory orders in AtomicCAS
ClosedPublic

Authored by bruno on Mar 26 2021, 12:12 PM.

Details

Summary

LLVM has lifted strong requirements for CAS failure memory orders in 431e3138a and 819e0d105e84.

Add support for honoring them in AtomicCAS.

https://github.com/google/sanitizers/issues/970

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
bruno requested review of this revision.Mar 26 2021, 12:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 26 2021, 12:12 PM
bruno added a reviewer: yln.
delcypher added a reviewer: kubamracek.

@bruno Thanks for the patch. TSan's runtime isn't my specialty so I've added other reviewers.

compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
433

Is this something user facing code can set? If yes, then we might want to emit a warning rather than crashing the process.

yln accepted this revision.Apr 6 2021, 9:38 AM

I am happy with the mechanics and quality of the patch. Ideally @dvyukov could give a final sign-off.

compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
433

I think calls are generated by the compiler in ThreadSanitizer::instrumentAtomic():

Value *Args[] = {IRB.CreatePointerCast(Addr, PtrTy),
                 CmpOperand,
                 NewOperand,
                 createOrdering(&IRB, CASI->getSuccessOrdering()),
                 createOrdering(&IRB, CASI->getFailureOrdering())};
CallInst *C = IRB.CreateCall(TsanAtomicCAS[Idx], Args);
This revision is now accepted and ready to land.Apr 6 2021, 9:38 AM
bruno added a comment.Apr 6 2021, 5:33 PM

Thanks both of you for the review @delcypher and @yln, will wait for @dvyukov to sign-off!

compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
433

Right, but before things are expanded for instrumentation, there's some sanitization of the failure memory order in emitAtomicCmpXchgFailureSet, so an invalid combination shouldn't get to this point.

dvyukov added inline comments.Apr 14 2021, 1:40 AM
compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
434

Can't fmo be consume/acq_rel/seq_cst? If yes, please add a test.

436

I think we should not search the object and re-acquire the mutex second time for performance reasons and complexity (won't need to re-read the value).
Note that we may have already acquired due to mo. We should not acquire second time in that case.

443

I don't think this is correct. Can't this lead to false positives?
Consider that a thread does CAS-release to hand off the object to another thread, and that thread frees the object. I think this memory access can race with the free. It's generally not OK to touch memory after atomic operations.
If if leads to a false positive, please add a test that catches it.

compiler-rt/test/tsan/compare_exchange_release_relaxed.cpp
18

If this test differs only by memory order, it can make sense to use parametrized tests. It would also be good to test other mo combinations (e.g. that the CHECK(IsLoadOrder(fmo)) does not fail).
See e.g. test/tsan/ignore_lib0.cpp for an example.

Also I wonder if we could first evaluate CAS and then use either mo or fmo accordingly, so that we don't over-synchronize on failure if mo is stronger than fmo.

Not directly related to the change but we could also episodically fail weak CASes to test fmo case better.

yln requested changes to this revision.Apr 20 2021, 6:26 PM

Unapprove until Dmitry's comments are addressed.

This revision now requires changes to proceed.Apr 20 2021, 6:26 PM
bruno updated this revision to Diff 341362.Apr 28 2021, 5:11 PM
bruno marked an inline comment as done.
bruno edited the summary of this revision. (Show Details)

Address reviewer comments. To prevent false positive on release/consume, this now depends on https://reviews.llvm.org/D101501.

bruno marked 2 inline comments as done.Apr 28 2021, 5:16 PM
bruno added inline comments.
compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
434

Added several variations. The consume failing order currently fallbacks to monotonic in LLVM, just opened https://reviews.llvm.org/D101501 to be consistent with success and fallback to acquire instead, without this change it leads to false positive, thanks for bringing this up.

443

Looks like I overthought the approach, thanks for pointing out.

dvyukov added inline comments.Apr 28 2021, 11:38 PM
compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
413

I think it's better to avoid doing this if the CAS will fail and fmo == mo_relaxed. It will provide more precise race detection.
I think we could do something along the following lines:

  • if either mo or fmo != relaxed, do GetOrCreateAndLock
  • if either mo or fmo involves release, do a write lock
  • evaluate cas
  • respect mo or fmo based on the cas result
436

We did not create/obtain the sync object if mo == mo_relaxed. Is mo == mo_relaxed and fmo == mo_acquire possible? If yes, then we will fail to respect fmo.

438

We used a different condition to decide if we need to do read lock. Can't this deadlock? I think we need to check write_lock.

bruno updated this revision to Diff 342557.May 3 2021, 3:08 PM
bruno marked an inline comment as done and an inline comment as not done.
bruno edited the summary of this revision. (Show Details)

Apply review comments.

Note that we cannot test mo == relaxed and fmo == anything yet because LLVM fallbacks to relaxed/relaxed.

Remove the dep on https://reviews.llvm.org/D101501 for now, since there's even more stuff to improve in LLVM before we get this 100% right, at least we've got release/acquire working as of this patch. I'll come back to fix the other tests once LLVM is fixed.

compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
413

Sounds good, minor correction on the fact that fmo cannot be release.

436

Right, will fix.

438

Yea, somehow it slipped, thanks!

bruno updated this revision to Diff 342565.May 3 2021, 3:15 PM

Update broken comment in the testcase.

dvyukov added inline comments.May 4 2021, 12:42 AM
compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
413

Do I wait for this in this change? I see we still acquire/release before evaluating CAS. Or you want to do it later with other improvements you mentioned?

431

Better to do in the beginning of the function.

438

With the current code it makes sense to do "IsAcquireOrder(fmo) && !IsAcquireOrder(mo)", because otherwise we already acquired.
But even better what's discussed above: first evaluate CAS, then decide what order to use.

bruno updated this revision to Diff 343581.May 6 2021, 10:38 PM
bruno retitled this revision from [TSAN] Honor acquire failure mode on AtomicCAS to [TSAN] Honor failure memory orders in AtomicCAS.
bruno edited the summary of this revision. (Show Details)

Apply more reviewer suggested changes: first evaluate CAS, then decide what order to use.

Now that LLVM side of this is fixed, also cover all fmo's allowed, add more tests.

bruno updated this revision to Diff 343584.May 6 2021, 10:50 PM

Cover more cases in atomic.ll.

dvyukov added inline comments.May 6 2021, 11:48 PM
compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
405

Can it be seq_cst?
If yes, then I think write_lock below is incorrect and checking only IsAcquireOrder(fmo) is incorrect as well.

421

The CAS needs to be evaluated under the sync object mutex, otherwise we can get inconsistent value/memory visibility. If we are going to lock the mutex, we need to lock it before CAS.

bruno updated this revision to Diff 344174.May 10 2021, 1:20 PM

Thank you @dvyukov for another round of reviews! A have more questions for you.

Yes, fmo can be seq_cst. Take mo == mo_release and fmo == mo_seq_cst, we'd be acquiring the mutex with write_lock, which is fine under success. However, during failure, I don't see another solution besides s->mtx.Unlock() before honoring and calling GetOrCreateAndLock again to obtain the lock with write_lock = false. Since you said in a previous review that it would be too expensive to lock again, what do you suggest as an alternative? I wonder whether my concern is even legit (?) given that the mo == mo_release and fmo == mo_seq_cst in the testcase works regardless of this lock/unlock dance.

It's also the case that the current added testcase takes ~20s on a fast linux machine, which seems way over the bar, I haven't tried to make it better yet, but (at least) diminishing the mo/fmo combinations should help, other ideas?

In the meantime, I've updated the patch with that approach.

Thank you @dvyukov for another round of reviews! A have more questions for you.

Yes, fmo can be seq_cst. Take mo == mo_release and fmo == mo_seq_cst, we'd be acquiring the mutex with write_lock, which is fine under success. However, during failure, I don't see another solution besides s->mtx.Unlock() before honoring and calling GetOrCreateAndLock again to obtain the lock with write_lock = false. Since you said in a previous review that it would be too expensive to lock again, what do you suggest as an alternative? I wonder whether my concern is even legit (?) given that the mo == mo_release and fmo == mo_seq_cst in the testcase works regardless of this lock/unlock dance.

We don't need to re-lock in read mode. We can lock in the strongest mode: write lock if either mo or fmo requires a release, and then do acquire under write if it turned out we need only acquire. It's fine to read a data structure under write lock.

Thank you @dvyukov for another round of reviews! A have more questions for you.

Yes, fmo can be seq_cst. Take mo == mo_release and fmo == mo_seq_cst, we'd be acquiring the mutex with write_lock, which is fine under success. However, during failure, I don't see another solution besides s->mtx.Unlock() before honoring and calling GetOrCreateAndLock again to obtain the lock with write_lock = false. Since you said in a previous review that it would be too expensive to lock again, what do you suggest as an alternative? I wonder whether my concern is even legit (?) given that the mo == mo_release and fmo == mo_seq_cst in the testcase works regardless of this lock/unlock dance.

It's also the case that the current added testcase takes ~20s on a fast linux machine, which seems way over the bar, I haven't tried to make it better yet, but (at least) diminishing the mo/fmo combinations should help, other ideas?

This is too long. But I think it's just to the "atexit sleep" in tsan. You can do this, and maybe even set atexit_sleep_ms to 0:

test/tsan/fork_atexit.cpp:// RUN: %clangxx_tsan -O1 %s -o %t && %env_tsan_opts=atexit_sleep_ms=50 %run %t 2>&1 | FileCheck %s
bruno updated this revision to Diff 344614.May 11 2021, 5:43 PM

Cool, thanks for the input!

  • Update the patch to prevent unnecessary lock/unlock dance.
  • Changing atexit_sleep_ms didn't really changed the ballpark, rewrite test to include all checks in one run, this brings it down to 1.5-2s total time.
dvyukov added inline comments.May 11 2021, 10:08 PM
compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
405

If I am not missing something, I think this comment still holds:

Can it be seq_cst?
If yes, then I think write_lock below is incorrect and checking only IsAcquireOrder(fmo) is incorrect as well.

441

There is some duplication here between handling of success case and failure case: adding to trace, acquire, unlock. And taking into account that fmo can be seq_cst, we will need even more duplication to fix this, e.g. we may need to do Release and WriteUnlock here.
I would structure it along the following lines (it should both handle all cases and avoid duplication at the same time):

SyncVar *s = 0;
bool write_lock = IsReleaseOrder(mo) || IsReleaseOrder(fmo);
if (mo != mo_relaxed || fmo != mo_relaxed)
  s = ctx->metamap.GetOrCreateAndLock(thr, pc, (uptr)a, write_lock);

T cc = *c;
T pr = func_cas(a, cc, v);
bool success = pr == cc;
if (!success) {
  *c = pr;
  mo = fmo;
}
if (s) {
  thr->fast_state.IncrementEpoch();
  // Can't increment epoch w/o writing to the trace as well.
  TraceAddEvent(thr, thr->fast_state, EventTypeMop, 0);
  if (IsAcqRelOrder(mo))
    AcquireReleaseImpl(thr, pc, &s->clock);
  else if (IsReleaseOrder(mo))
    ReleaseImpl(thr, pc, &s->clock);
  else if (IsAcquireOrder(mo))
    AcquireImpl(thr, pc, &s->clock);

  if (write_lock)
    s->mtx.Unlock();
  else
    s->mtx.ReadUnlock();
}
return success;
bruno added inline comments.May 12 2021, 2:23 PM
compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
405

If I am not missing something, I think this comment still holds:

Can it be seq_cst?

Yes, I also added tests to cover that.

If yes, then I think write_lock below is incorrect and checking only IsAcquireOrder(fmo) is incorrect as well.

See my next comment.

441

fmo cannot be mo_release or mo_acq_rel, and when it's mo_seq_cst my understanding (which could be wrong) is that it has load semantics. @rjmccall @jfb does my understanding makes sense?

With that in mind, write_lock only needs to track IsReleaseOrder(mo), which is what the current patch does.

bruno updated this revision to Diff 344961.May 12 2021, 2:25 PM

Update patch to reduce code dup as suggested by @dvyukov

dvyukov accepted this revision.May 12 2021, 11:48 PM

I don't see any issues now. Thanks for bearing with me.

compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
434

"success &&" is unnecessary here right?
I see it's not harmful, but it also makes me a but nervous because we used just the "write_lock" condition when locking, but we use a different one when unlocking. This is not symmetric and have some potential to get out of sync, so I would drop "success &&" part here.

441

I think you are right. This makes sense.

bruno added inline comments.May 13 2021, 12:20 AM
compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cpp
434

No problemo, will do. Thanks for the careful review!

This revision was not accepted when it landed; it landed in state Needs Review.May 13 2021, 1:07 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMay 13 2021, 1:07 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript