This is an archive of the discontinued LLVM Phabricator instance.

Thread safety analysis: Allow exlusive/shared joins for managed and asserted capabilities
ClosedPublic

Authored by aaronpuchert on May 6 2021, 2:14 PM.

Details

Summary

Similar to how we allow managed and asserted locks to be held and not
held in joining branches, we also allow them to be held shared and
exclusive. The scoped lock should restore the original state at the end
of the scope in any event, and asserted locks need not be released.

We should probably only allow asserted locks to be subsumed by managed,
not by (directly) acquired locks, but that's for another change.

Diff Detail

Event Timeline

aaronpuchert requested review of this revision.May 6 2021, 2:14 PM
aaronpuchert created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptMay 6 2021, 2:14 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Ping @aaron.ballman, also for the parent change.

In general, I think this makes sense, but I'd like to hear from @delesley as well.

clang/lib/Analysis/ThreadSafety.cpp
2198

I have a few concerns. First, this patch introduces an inconsistency between managed and unmanaged locks. For unmanaged locks, we warn, and then assume the lock is held exclusively. For managed locks, we don't warn, but assume it is held shared. The warn/not-warn is consistent with more relaxed handling of managed locks, but exclusive/shared difference could lead to confusion. Both mechanisms are sound; they're just not consistent. Any thoughts?

Second, the mixing of AssertHeld() and Lock() on different control flow branches looks very weird to me. I can see one obvious case where you might want to do that, e.g. if (mu_.is_locked()) mu_.AssertHeld(); else mu_.Lock(). However, if a function locks mu_ on one branch, and assert mu_ on the other, then that usually indicates a problem. It means that the lock-state going into that function was unknown; the programmer didn't know whether mu_ was locked or not, and that's never good. The only valid condition in that case is to check whether mu_ is locked -- any other condition would be unsound. The correct way to deal with this situation is to mark the is_locked() method with an attribute, e.g. TEST_LOCKED_FUNCTION, rather than relying on the AssertHeld mechanism. That's a lot more work, but I would prefer to at least have a TODO comment indicating that the AssertHeld is a workaround, and restrict the mixing of assert/lock to managed locks in the mean time.

delesley added inline comments.May 24 2021, 9:15 AM
clang/test/SemaCXX/warn-thread-safety-analysis.cpp
4570–4571

This function should have a warning because is not marked with UNLOCK_FUNCTION. The lock was held on entry, and not held on exit. So the original location of the comment, on the closing curly brace, was correct.

4629

I'm not wild about having FIXMEs in test code. I would prefer to have the patch actually issue a warning here (but see below).

HOWEVER, does the current code issue a warning in this situation? If not, leave it as-is and keep the FIXME. You can't make the analysis more strict without checking with the C++ compiler team at Google (which I am no longer on), because it may break the build on our end.

The warn/not-warn is consistent with more relaxed handling of managed locks, but exclusive/shared difference could lead to confusion. Both mechanisms are sound; they're just not consistent. Any thoughts?

Essentially I'm suggesting here to treat exclusive/shared joins the same way as locked/unlocked joins: we allow them for managed locks and take the "weaker" state afterwards to prevent false negatives, but not for unmanaged locks, where we take the "stronger" state afterwards to prevent spamming the user.

Second, the mixing of AssertHeld() and Lock() on different control flow branches looks very weird to me. I can see one obvious case where you might want to do that, e.g. if (mu_.is_locked()) mu_.AssertHeld(); else mu_.Lock().

This pattern already works, but not with exclusive/shared mismatch.

However, if a function locks mu_ on one branch, and assert mu_ on the other, then that usually indicates a problem. It means that the lock-state going into that function was unknown; the programmer didn't know whether mu_ was locked or not, and that's never good.

Generally I agree with you. It's something that repeatedly came up though, people were having these functions that were called sometimes with lock, sometimes without. It's not incredibly common, but I saw it a few times. I recommended the above pattern to get rid of the warnings that popped up, when reworking this into a function with clear preconditions didn't work.

The only valid condition in that case is to check whether mu_ is locked -- any other condition would be unsound.

Well, that information might not always come from mu_ itself. It might be tracked in the class that owns mu_.

The correct way to deal with this situation is to mark the is_locked() method with an attribute, e.g. TEST_LOCKED_FUNCTION, rather than relying on the AssertHeld mechanism. That's a lot more work, but I would prefer to at least have a TODO comment indicating that the AssertHeld is a workaround,

It certainly is a workaround, but it's powerful enough to cover everything that we could do with TEST_LOCKED_FUNCTION, except that we can't find cases where the branches have been mixed up.

I've thought about such an attribute (after all it shouldn't be much different from try_acquire implementation-wise), but I've found it hard to justify because we can just use these assertions to achieve the same thing. You can even do

if (mu.isLocked())
    []() ASSERT_CAPABILITY(mu) {}();
else
    mu.Lock();

and restrict the mixing of assert/lock to managed locks in the mean time.

I think we should disallow releasing asserted locks, and then naturally we can't allow assert/lock joins for unmanaged locks. I'm planning another change for this.

It is the status quo: we allow these joins, just not with mixed mode (shared/exclusive). So this change will allow shared/exclusive - asserted/unmanaged joins until my follow-up will disallow them again, though because of the asserted/unmanaged part and not the shared/exclusive part. If you're not happy about this gap in warning coverage, we can also postpone this change until we have the asserted/unmanaged joins disallowed.

clang/test/SemaCXX/warn-thread-safety-analysis.cpp
4570–4571

Putting it on the Unlock call would some more natural to me, and also more helpful. Natural, because that's where we (should) discover the issue, and helpful, because in a longer function you wouldn't know what's going on if the unlock is somewhere in the middle.

If we didn't have AssertHeld, we would also warn on the Unlock call, and I think we should treat this no different (other than having a different warning message).

4629

The comment is that the warning should be different. It's somewhat related with the other discussion: one shouldn't be able to release asserted locks, and thus the join here should fail for that reason and not because of a shared/exclusive mismatch.

Naturally I want to do those things together in one change, that's why I'm not addressing it here.

I'm not particularly worried that this is going to have a big impact, but we've had interactions with someone at Google whom I might ask to try it out.

So this change will allow shared/exclusive - asserted/unmanaged joins until my follow-up will disallow them again, though because of the asserted/unmanaged part and not the shared/exclusive part. If you're not happy about this gap in warning coverage, we can also postpone this change until we have the asserted/unmanaged joins disallowed.

Sorry, I got confused here. The change doesn't allow that. It's still disallowed as a shared/exclusive join, but I want to disallow it because it's an asserted/unmanaged join instead. So there is no gap in warning coverage.

delesley accepted this revision.May 25 2021, 12:48 PM

Thanks for taking the time to discuss things with me. :-)

Wrt. to the TEST_LOCKED_FUNCTION, I agree that you can simulate the behavior using Assert and Lock. But that pattern is too general/powerful, because it also allows you to write nonsensical and unsound code. Philosophically, static analysis is concerned with allowing things that are sound, but preventing things that are not, so I would prefer to allow the valid case, but warn on the nonsensical/unsound code. The goal is not to provide powerful back doors so that you can squeeze anything past the checker -- doing so kind of defeats the point. :-) That being said, I'm not certainly no asking you to implement TEST_LOCKED functionality in this particular patch, and I totally understand that it may simply not be worth the effort.

Wrt. to unlocking an Assert, I see no problem. It makes perfect sense to dynamically assert that a mutex is held, then do something, and then unlock the mutex afterwards; after all, the caller asserted that it was held before unlocking. You shouldn't disallow that.

This revision is now accepted and ready to land.May 25 2021, 12:48 PM

Thanks for taking the time to discuss things with me. :-)

Thank you as well!

Wrt. to the TEST_LOCKED_FUNCTION, I agree that you can simulate the behavior using Assert and Lock. But that pattern is too general/powerful, because it also allows you to write nonsensical and unsound code. Philosophically, static analysis is concerned with allowing things that are sound, but preventing things that are not, so I would prefer to allow the valid case, but warn on the nonsensical/unsound code. The goal is not to provide powerful back doors so that you can squeeze anything past the checker -- doing so kind of defeats the point. :-)

You're right, it does allow nonsensical code. But in some sense it's just a combination of two backdoors that we already have:

  • Without -Wthread-safety-negative you don't have to declare preconditions of the kind "mutex should not be locked". We're basically assuming that locking is fine by default, thereby risking double locks.
  • The assert_capability attribute is also a bit of a backdoor. Instead of statically propagating through the code that a mutex is held, we can just get that fact "out of thin air".

That being said, I'm not certainly no asking you to implement TEST_LOCKED functionality in this particular patch, and I totally understand that it may simply not be worth the effort.

It certainly is appealing, especially because in our code we don't really have AssertHeld-like functions, but only isLocked-like functions, so our assertions are typically of the form assert(mu.isLocked()). But I haven't made up my mind yet.

Either way I think this change is just a consistent extension of the current behavior.

Wrt. to unlocking an Assert, I see no problem. It makes perfect sense to dynamically assert that a mutex is held, then do something, and then unlock the mutex afterwards; after all, the caller asserted that it was held before unlocking. You shouldn't disallow that.

Let's discuss this when I put forward the change, I'll add you as reviewer as usual. I think I have good arguments why we shouldn't be allowing it.

This revision was landed with ongoing or failed builds.May 27 2021, 8:46 AM
This revision was automatically updated to reflect the committed changes.
aaronpuchert marked 2 inline comments as done.
  • The assert_capability attribute is also a bit of a backdoor. Instead

of statically propagating through the code that a mutex is held, we can
just get that fact "out of thin air".

Assert_capability is not a back door. It is supposed to be used only on a
function which does a run-time check: if (!mu_.is_locked()) fail(). This
sort of thing is very common in static analysis. There are places in the
code where you cannot statically prove that a property holds at
compile-time, so you insert a run-time check into the code, and then
propagate that property to the static analysis on the branch where the
check succeeds.

Of course, you can use assert_capability to create a back door, by putting
it on a function that doesn't actually check anything, just like you can
declare random methods to be lock_functions, even if they don't lock
anything. :-)

-DeLesley

Assert_capability is not a back door. It is supposed to be used only on a function which does a run-time check: if (!mu_.is_locked()) fail().

Right, although assertions can turn into no-ops depending on the build profile. We discussed this on D87629.

That's why it's more like stating an assumption. We don't really know if/how that assumption is being enforced.

There are places in the code where you cannot statically prove that a property holds at compile-time, so you insert a run-time check into the code, and then propagate that property to the static analysis on the branch where the check succeeds.

That's exactly what I meant with “back door”: if you can't prove that a capability is held, you can still assert it. That shifts the burden of checking to the runtime, so in that sense it is a backdoor for static analysis.