This is an archive of the discontinued LLVM Phabricator instance.

Thread safety analysis: Don't warn about managed locks on join points
ClosedPublic

Authored by aaronpuchert on Mar 16 2021, 3:46 PM.

Details

Summary

We already did so for scoped locks acquired in the constructor, this
change extends the treatment to deferred locks and scoped unlocking, so
locks acquired outside of the constructor. Obviously this makes things
more consistent.

Originally I thought this was a bad idea, because obviously it
introduces false negatives when it comes to double locking, but these
are typically easily found in tests, and the primary goal of the Thread
safety analysis is not to find double locks but race conditions.
Since the scoped lock will release the mutex anyway when the scope ends,
the inconsistent state is just temporary and probably fine.

Diff Detail

Event Timeline

aaronpuchert requested review of this revision.Mar 16 2021, 3:46 PM
aaronpuchert created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2021, 3:46 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Negative capabilities don't need to be considered as managed.

I'd really like to hear from @delesley about these changes, specifically because of this bit:

the primary goal of the Thread safety analysis is not to find double locks but race conditions.

I believe the primary goal of TSA is to find threading-related issues that can be caught at compile time which includes both double locks and race conditions, and I don't have a good feel for whether these changes have a larger impact on the overall design or not.

As far as the changes in the patch go, they look reasonable to me.

Thanks for roping me into the conversation, Aaron, and sorry about the delay. I have mixed feelings about this patch. With respect to the purpose of thread safety analysis, finding race conditions is obviously a major concern, because race conditions are hard to find in tests. However, preventing deadlocks is almost as important, and can be even more important in certain critical applications. The consequence of a double unlock is usually a crash or an exception with an error message. The consequence of a double lock is often a deadlock, depending on the underlying thread library, which is a more serious problem that is harder to fix. Thus, I am concerned about the change in the test case on line 2895, where a potential deadlock has gone from detected, to no warning. Deadlocks are often not found in tests, because test coverage is never perfect over all possible control-flow paths.

The use cases here are also slightly different. A MutexLock object uses the RAII design pattern to ensure that the underlying mutex is unlocked on every control flow path, and presumably includes logic to prevent double releases/unlocks, as is usually the case with RAII. The RAII pattern is sufficiently robust that we can "trust the implementation", and relax static checks. A MutexUnlock object inverts the RAII pattern. Because a lock is being acquired on different control-flow paths, rather than being released, it may make sense to keep the stronger checks in order to guard against potential deadlocks. On the other hand, we are still using RAII, so the implementer of MutexUnlock could presumably insert logic to guard against deadlocks if that was a concern. Thus, I could be persuaded that deadlocks are less important in this particular case.

Is there a compelling reason for this change, other than consistency/symmetry concerns? In other words, do you have real-world code where you are getting false positives when using MutexUnlock? If so, then I am in favor of the change. If not, then I tend to err on the side of "if it ain't broke don't fix it." :-)

Not replying in order, I hope that's not confusing. Let's start with the motivation.

Is there a compelling reason for this change, other than consistency/symmetry concerns? In other words, do you have real-world code where you are getting false positives when using MutexUnlock? If so, then I am in favor of the change. If not, then I tend to err on the side of "if it ain't broke don't fix it." :-)

When I originally implemented relockable scopes in D49885 I was (not sure if deliberately or not) making it stricter than regular scopes. But then there were a couple of cases in our code like this:

RelockableMutexLock scope(&mu, DeferTraits{});
scope.Lock(); // Often also something like "if (!scope.TryLock()) return;"
if (condition) {
  scope.Unlock();
  // ... do stuff without the lock
} // warning: not all paths joining here hold the lock

The warning is somewhat surprising, because if you acquire in the constructor it's fine. (Compare e.g. Foo::test2 on line 2593 with join on line 2891. The latter has a warning on the join point, the former doesn't.)

These are usually easy to fix, you can just add an else branch that releases the lock, but then we got into cases with nested ifs and people started asking: why can't I just rely on the RAII type to do the unlocking for me? There were maybe 5 cases or so and I could fix them all, but I couldn't really explain why it's fine when you lock in the constructor, but not if you lock later. (All through the RAII object of course.)

A MutexLock object uses the RAII design pattern to ensure that the underlying mutex is unlocked on every control flow path, and presumably includes logic to prevent double releases/unlocks, as is usually the case with RAII.

Precisely, if MutexLock has an Unlock method, the destructor can't assume the lock is still held. Unlock itself though might assume it, so we might have false negatives here. But we've been accepting them before: see Foo::test5. (See below for a potential solution.)

The RAII pattern is sufficiently robust that we can "trust the implementation", and relax static checks. A MutexUnlock object inverts the RAII pattern. Because a lock is being acquired on different control-flow paths, rather than being released, it may make sense to keep the stronger checks in order to guard against potential deadlocks.

I think we can assume the same thing here. If MutexUnlock has a Lock, then the destructor can't assume the lock is not held. But Lock might assume it, which means we have essentially the same false negative, except the other way around.

When it comes to deadlocks, I'm afraid that's not new:

MutexLocker rlock(&mu_);
if (c)
  rlock.Unlock();
rlock.Lock();

might deadlock, but we don't warn currently. The reason is that we remove scoped locks when there are discrepancies at join points, because we don't want to miss race conditions. So we miss that the second Lock call might deadlock.

We could probably address all false negatives by introducing a FactEntry for "managed locks that might be held", which we introduce on discrepancies at join points. We assume that they're fine for the destructor, but we would warn if there is an explicit lock/unlock. I could start working on this if you think that's a good idea, but it would not only address false negatives introduced in this change, but also false negatives that we've had before.

We could probably address all false negatives by introducing a FactEntry for "managed locks that might be held", which we introduce on discrepancies at join points. We assume that they're fine for the destructor, but we would warn if there is an explicit lock/unlock. I could start working on this if you think that's a good idea, but it would not only address false negatives introduced in this change, but also false negatives that we've had before.

Hmm, this won't work for back edges, because we can't change the lock set after the join point for them.

But I believe we could warn on back edges if a "managed lock that might be held" joins with a managed lock that is held or not held right away. If there is a path from the loop entry that might acquire or release the lock, then we might run into this same path with our lock that might be held. Might not be easy to place the diagnostics correctly, but otherwise it seems consistent.

delesley accepted this revision.Apr 6 2021, 9:35 AM

I am convinced by your argument. I think it is reasonable to assume that if someone is using an RAII object, then the underlying object is responsible for managing double locks/unlocks, and we can restrict our efforts to detecting race conditions. Thanks for the detailed explanation! :-)

This revision is now accepted and ready to land.Apr 6 2021, 9:35 AM
This revision was landed with ongoing or failed builds.Apr 6 2021, 1:30 PM
This revision was automatically updated to reflect the committed changes.