This is an archive of the discontinued LLVM Phabricator instance.

Thread safety analysis: Support copy-elided production of scoped capabilities through arbitrary calls
ClosedPublic

Authored by aaronpuchert on Jul 14 2022, 4:48 AM.

Details

Summary

When support for copy elision was initially added in e97654b2f2807, it
was taking attributes from a constructor call, although that constructor
call is actually not involved. It seems more natural to use attributes
on the function returning the scoped capability, which is where it's
actually coming from. This would also support a number of interesting
use cases, like producing different scope kinds without the need for tag
types, or producing scopes from a private mutex.

Changing the behavior was surprisingly difficult: we were not handling
CXXConstructorExpr calls like regular calls but instead handled them
through the DeclStmt they're contained in. This was based on the
assumption that constructors are basically only called in variable
declarations (not true because of temporaries), and that variable
declarations necessitate constructors (not true with C++17 anymore).

Untangling this required separating construction from assigning a
variable name. When a call produces an object, we use a placeholder
til::LiteralPtr for this, and we collect the call expression and
placeholder in a map. Later when going through a DeclStmt, we look up
the call expression and set the placeholder to the new VarDecl.

The change has a couple of nice side effects:

  • We don't miss constructor calls not contained in DeclStmts anymore, allowing patterns like MutexLock{&mu}, requiresMutex(); The scoped lock temporary will be destructed at the end of the full statement, so it protects the following call without the need for a scope, but with the ability to unlock in case of an exception.
  • We support lifetime extension of temporaries. While unusual, one can now write const MutexLock &scope = MutexLock(&mu); and have it behave as expected.
  • Destructors used to be handled in a weird way: since there is no expression in the AST for implicit destructor calls, we instead provided a made-up DeclRefExpr to the variable being destructed, and passed that instead of a CallExpr. Then later in translateAttrExpr there was special code that knew that destructor expressions worked a bit different.
  • We were producing dummy DeclRefExprs in a number of places, this has been eliminated. We now use til::SExprs instead.

Technically this could break existing code, but the current handling
seems unexpected enough to justify this change.

Diff Detail

Event Timeline

aaronpuchert created this revision.Jul 14 2022, 4:48 AM
Herald added a project: Restricted Project. · View Herald Transcript
aaronpuchert requested review of this revision.Jul 14 2022, 4:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2022, 4:48 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
aaronpuchert added inline comments.Jul 14 2022, 5:12 AM
clang/docs/ThreadSafetyAnalysis.rst
935

The NO_THREAD_SAFETY_ANALYSIS here is due to a false positive warning because the scope is not destructed in the function body. Maybe we should postpone documentation until this is resolved.

clang/include/clang/Analysis/Analyses/ThreadSafetyTIL.h
653–654

For placeholders we're comparing identity.

clang/include/clang/Analysis/Analyses/ThreadSafetyTraverse.h
629

Perhaps no warning will ever print this, but I'm not entirely sure.

clang/lib/Analysis/ThreadSafety.cpp
1223

Placeholders are always local variables.

1532

This is essentially an std::map<const Expr *, til::LiteralPtr *>, but since it only keeps objects between construction and assignment to a variable or temporary destruction, I thought that a small vector would be more appropriate. However, there is no guarantee that this doesn't grow in size significantly: temporaries with trivial destructors (which don't show up in the CFG) would never be removed from this list. That's unlikely for properly annotated scopes, but not impossible.

1791–1793

In case you're wondering why we need this for records with ScopedLockableAttr: otherwise we get failures in SelfConstructorTest:

class SelfLock {
public:
  SelfLock()  EXCLUSIVE_LOCK_FUNCTION(mu_);
  ~SelfLock() UNLOCK_FUNCTION(mu_);

  void foo() EXCLUSIVE_LOCKS_REQUIRED(mu_);

  Mutex mu_;
};

void test() {
  SelfLock s;
  s.foo();
}

For s.foo() we need s.mu, and to see that we've acquired this we need to plugin s for the placeholder that we used for construction, although SelfLock is not a scoped lockable.

Though we could restrict this to functions that have actual thread safety attributes (instead of other attributes).

2122–2134

This is not strictly required, but it enables the lifetime_extension test below. It could also be a separate change, but having it here demonstrates that the approach can handle this case as well.

2418–2436

Scopes as temporaries are probably unusual, but as the temporary test demonstrates they could be useful. In any event, if we recognize acquisition via CXXConstructExprs not contained in a DeclStmt, we should probably also remove locks when they disappear again.

Adding @rsmith because this essentially reverts rGe97654b2f28072ad9123006c05e03efd82852982, and @efriedma because it partially reverts D76943. No need for a detailed review, but I'd like to know if this would break use cases that you're interested in.

aaronpuchert added inline comments.Jul 14 2022, 5:23 AM
clang/include/clang/Analysis/Analyses/ThreadSafetyTIL.h
435

By the way, I considered using this, but it didn't seem wholly appropriate. Our placeholder can't be lazily evaluated, we simply don't know initially. And later we do know and can just plug in the VarDecl.

clang/lib/Analysis/ThreadSafety.cpp
1791–1793

Sorry, records without ScopedLockableAttr of course.

Thank you for your patience on the review! I've taken a cursory look through and I'm still thinking about the patch. I've not seen anything that's really worrying. But this is pretty dense stuff and @delesley has way more experience with TIL, so I'm hoping he might be able to take a peek as well.

clang/docs/ThreadSafetyAnalysis.rst
935

I don't know that there's a lot of value to documenting it if we have to disable thread safety analysis, so my instinct is to hold off on the documentation for the moment.

clang/include/clang/Analysis/Analyses/ThreadSafetyTIL.h
435

Hmmm, naively, it seems like FS_pending is "we don't know initially" and FS_done is "now we do know and can plugin in the VarDecl". But I agree it seems a little different from lazy evaluation in that it's not a performance optimization.

clang/include/clang/Analysis/Analyses/ThreadSafetyTraverse.h
629

Should we assert here instead?

clang/lib/Analysis/ThreadSafety.cpp
1532

Would it make more sense to use a SmallDenseMap here?

2419
clang/lib/Analysis/ThreadSafetyCommon.cpp
339–342
aaronpuchert marked 3 inline comments as done.Sep 7 2022, 2:53 PM
aaronpuchert added inline comments.
clang/docs/ThreadSafetyAnalysis.rst
935

Ok, then let's leave it out for now.

clang/include/clang/Analysis/Analyses/ThreadSafetyTIL.h
435

Yes, it seems quite close. But my reading is that calling result is supposed to force the evaluation, even when we might not be able to provide a name yet.

clang/include/clang/Analysis/Analyses/ThreadSafetyTraverse.h
629

I played around a bit and it is actually possible to run into this in contrived examples. I'll push some tests so that you can judge for yourself how we should print this.

clang/lib/Analysis/ThreadSafety.cpp
1532

I can't find an awful lot of documentation about this, but judging from the name and a quick glance at the code I think that might be just what I'm looking for. Thanks for the tip!

2419

I was imitating lines 2405/2411, but auto is of course fine. (The code above might actually predate C++11.)

clang/lib/Analysis/ThreadSafetyCommon.cpp
339–342

No doubt referring to https://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return. I was trying to emphasize the dual nature of llvm::PointerUnion<const Expr *, til::SExpr *>, but I can certainly also drop the else.

aaronpuchert marked 3 inline comments as done.

Use SmallDenseMap plus some minor changes.

aaronpuchert added inline comments.Sep 7 2022, 2:59 PM
clang/test/SemaCXX/warn-thread-safety-analysis.cpp
4254–4260

Here we're printing <temporary>, because we don't have a name for this object.

I was under the impression that we've already switched to C++17, but the Windows pre-submit build failed with:

C:\ws\w9\llvm-project\premerge-checks\clang\lib\Analysis\ThreadSafety.cpp(2107): error C2429: language feature 'init-statements in if/switch' requires compiler flag '/std:c++17'
C:\ws\w9\llvm-project\premerge-checks\clang\lib\Analysis\ThreadSafety.cpp(2120): error C2429: language feature 'init-statements in if/switch' requires compiler flag '/std:c++17'
C:\ws\w9\llvm-project\premerge-checks\clang\lib\Analysis\ThreadSafety.cpp(2418): error C2429: language feature 'init-statements in if/switch' requires compiler flag '/std:c++17'

Perhaps I should move the init statements out again?

aaron.ballman accepted this revision.Sep 9 2022, 6:29 AM

I was under the impression that we've already switched to C++17, but the Windows pre-submit build failed with:

C:\ws\w9\llvm-project\premerge-checks\clang\lib\Analysis\ThreadSafety.cpp(2107): error C2429: language feature 'init-statements in if/switch' requires compiler flag '/std:c++17'
C:\ws\w9\llvm-project\premerge-checks\clang\lib\Analysis\ThreadSafety.cpp(2120): error C2429: language feature 'init-statements in if/switch' requires compiler flag '/std:c++17'
C:\ws\w9\llvm-project\premerge-checks\clang\lib\Analysis\ThreadSafety.cpp(2418): error C2429: language feature 'init-statements in if/switch' requires compiler flag '/std:c++17'

Perhaps I should move the init statements out again?

No, I've filed https://github.com/google/llvm-premerge-checks/issues/416 to find out what's going on with the Windows precommit CI bot.

The changes LGTM, though it'd still be great if @delesley had some time to put a second set of eyes on the changes. If we don't hear from him by Tue, I think we should go ahead and land this. Please be sure to add a release note for the changes!

clang/lib/Analysis/ThreadSafetyCommon.cpp
339–342

I don't have a strong opinion.

clang/test/SemaCXX/warn-thread-safety-analysis.cpp
4254–4260

Ah thank you, this example makes a lot of sense. I think printing <temporary> is quite reasonable.

This revision is now accepted and ready to land.Sep 9 2022, 6:29 AM

Please be sure to add a release note for the changes!

Any opinion as to what the release note might say? I'm asking since we dropped the documentation changes.

Perhaps I should add them back in, but without function bodies (thus eliminating the need to add NO_THREAD_SAFETY_ANALYSIS)? Something like this:

// Same as constructors, but without tag types. (Requires C++17 copy elision.)
static MutexLocker Lock(Mutex *mu) ACQUIRE(mu);
static MutexLocker Adopt(Mutex *mu) REQUIRES(mu);
static MutexLocker ReaderLock(Mutex *mu) ACQUIRE_SHARED(mu);
static MutexLocker AdoptReaderLock(Mutex *mu) REQUIRES_SHARED(mu);
static MutexLocker DeferLock(Mutex *mu) EXCLUDES(mu);

Then I could write in the note that this is now possible, and annotations on move constructors are without effect.

Please be sure to add a release note for the changes!

Any opinion as to what the release note might say? I'm asking since we dropped the documentation changes.

No strong opinion, but Technically this could break existing code, but the current handling seems unexpected enough to justify this change. suggests we should be mentioning to users what we changed and what code could theoretically break.

Perhaps I should add them back in, but without function bodies (thus eliminating the need to add NO_THREAD_SAFETY_ANALYSIS)? Something like this:

// Same as constructors, but without tag types. (Requires C++17 copy elision.)
static MutexLocker Lock(Mutex *mu) ACQUIRE(mu);
static MutexLocker Adopt(Mutex *mu) REQUIRES(mu);
static MutexLocker ReaderLock(Mutex *mu) ACQUIRE_SHARED(mu);
static MutexLocker AdoptReaderLock(Mutex *mu) REQUIRES_SHARED(mu);
static MutexLocker DeferLock(Mutex *mu) EXCLUDES(mu);

Then I could write in the note that this is now possible, and annotations on move constructors are without effect.

I'd be okay with that!

Add trimmed-down documentation back in and a release note.

This revision was landed with ongoing or failed builds.Oct 6 2022, 1:21 PM
This revision was automatically updated to reflect the committed changes.
chapuni added a subscriber: chapuni.Oct 6 2022, 5:27 PM
chapuni added inline comments.
clang/lib/Analysis/ThreadSafety.cpp
1794

'inserted' is used only here.

aaronpuchert added inline comments.Oct 6 2022, 5:54 PM
clang/lib/Analysis/ThreadSafety.cpp
1794

Correct, is that an issue? Perhaps -Wunused-variable in Release builds?

Otherwise I believe it's correct—we don't need the iterator and we should generally not insert expressions twice, hence the assertion.

aaron.ballman added inline comments.Oct 7 2022, 4:46 AM
clang/lib/Analysis/ThreadSafety.cpp
1794

Yeah, that's the trouble -- this breaks release builds using -Werror. You should add [[maybe_unused]] to the declaration (as an NFC commit).

hans added a subscriber: hans.Oct 7 2022, 5:31 AM

We're hitting a false positive in grpc after this:

> ../../third_party/grpc/src/src/core/lib/gprpp/ref_counted_ptr.h:335:31: error: calling function 'TlsSessionKeyLoggerCache' requires holding mutex 'g_tls_session_key_log_cache_mu' exclusively [-Werror,-Wthread-safety-analysis]
>   return RefCountedPtr<T>(new T(std::forward<Args>(args)...));
>                               ^
> ../../third_party/grpc/src/src/core/tsi/ssl/key_logging/ssl_key_logging.cc:121:26: note: in instantiation of function template specialization 'grpc_core::MakeRefCounted<tsi::TlsSessionKeyLoggerCache>' requested here
>      cache = grpc_core::MakeRefCounted<TlsSessionKeyLoggerCache>();
>                         ^


The code looks like this:

grpc_core::RefCountedPtr<TlsSessionKeyLogger> TlsSessionKeyLoggerCache::Get(
    std::string tls_session_key_log_file_path) {
  gpr_once_init(&g_cache_mutex_init, do_cache_mutex_init);
  GPR_DEBUG_ASSERT(g_tls_session_key_log_cache_mu != nullptr);
  if (tls_session_key_log_file_path.empty()) {
    return nullptr;
  }
  {
    grpc_core::MutexLock lock(g_tls_session_key_log_cache_mu);        <---------- holding the mutex
    grpc_core::RefCountedPtr<TlsSessionKeyLoggerCache> cache;
    if (g_cache_instance == nullptr) {
      // This will automatically set g_cache_instance.
      cache = grpc_core::MakeRefCounted<TlsSessionKeyLoggerCache>();      <------ line 121



lock is holding a MutexLock (I assume that's an exclusive thing) on g_tls_session_key_log_cache_mu.

See https://bugs.chromium.org/p/chromium/issues/detail?id=1372394#c4 for how to reproduce.

I've reverted this in https://github.com/llvm/llvm-project/commit/a4afa2bde6f4db215ddd3267a8d11c04367812e5 in the meantime.

We're hitting a false positive in grpc after this:

> ../../third_party/grpc/src/src/core/lib/gprpp/ref_counted_ptr.h:335:31: error: calling function 'TlsSessionKeyLoggerCache' requires holding mutex 'g_tls_session_key_log_cache_mu' exclusively [-Werror,-Wthread-safety-analysis]
>   return RefCountedPtr<T>(new T(std::forward<Args>(args)...));
>                               ^
> ../../third_party/grpc/src/src/core/tsi/ssl/key_logging/ssl_key_logging.cc:121:26: note: in instantiation of function template specialization 'grpc_core::MakeRefCounted<tsi::TlsSessionKeyLoggerCache>' requested here
>      cache = grpc_core::MakeRefCounted<TlsSessionKeyLoggerCache>();
>                         ^


The code looks like this:

grpc_core::RefCountedPtr<TlsSessionKeyLogger> TlsSessionKeyLoggerCache::Get(
    std::string tls_session_key_log_file_path) {
  gpr_once_init(&g_cache_mutex_init, do_cache_mutex_init);
  GPR_DEBUG_ASSERT(g_tls_session_key_log_cache_mu != nullptr);
  if (tls_session_key_log_file_path.empty()) {
    return nullptr;
  }
  {
    grpc_core::MutexLock lock(g_tls_session_key_log_cache_mu);        <---------- holding the mutex
    grpc_core::RefCountedPtr<TlsSessionKeyLoggerCache> cache;
    if (g_cache_instance == nullptr) {
      // This will automatically set g_cache_instance.
      cache = grpc_core::MakeRefCounted<TlsSessionKeyLoggerCache>();      <------ line 121



lock is holding a MutexLock (I assume that's an exclusive thing) on g_tls_session_key_log_cache_mu.

See https://bugs.chromium.org/p/chromium/issues/detail?id=1372394#c4 for how to reproduce.

I've reverted this in https://github.com/llvm/llvm-project/commit/a4afa2bde6f4db215ddd3267a8d11c04367812e5 in the meantime.

I'm not opposed to a revert here, but reverting on (plausible) assumptions of this being a false positive without discussion or a reproducer beyond "compile chromium" comes across as a somewhat aggressive revert. Especially given that we knew this had the potential to break code and we were okay with that code being broken (as stated in the commit message). No community bots went red from this change and it's possible this could have been fixed forward/explained as a desired break with less churn to the codebase. Nothing to be done for it now (and it wasn't wrong per se), just something to keep in mind for future reverts -- a bit of discussion before a revert like this would be appropriate.

gulfem added a subscriber: gulfem.Oct 7 2022, 9:55 AM

We also started seeing -Wthread-safety-precise error in our Fuchsia code.
https://luci-milo.appspot.com/ui/p/fuchsia/builders/ci/clang_toolchain.ci.core.x64-release/b8800959115965408001/overview
I'm trying to verify with our team whether it is a false positive, but I just wanted to give you heads up!

aaronpuchert added a comment.EditedOct 7 2022, 9:56 AM

We're hitting a false positive in grpc after this:

> ../../third_party/grpc/src/src/core/lib/gprpp/ref_counted_ptr.h:335:31: error: calling function 'TlsSessionKeyLoggerCache' requires holding mutex 'g_tls_session_key_log_cache_mu' exclusively [-Werror,-Wthread-safety-analysis]
>   return RefCountedPtr<T>(new T(std::forward<Args>(args)...));
>                               ^
> ../../third_party/grpc/src/src/core/tsi/ssl/key_logging/ssl_key_logging.cc:121:26: note: in instantiation of function template specialization 'grpc_core::MakeRefCounted<tsi::TlsSessionKeyLoggerCache>' requested here
>      cache = grpc_core::MakeRefCounted<TlsSessionKeyLoggerCache>();
>                         ^


The code looks like this:

grpc_core::RefCountedPtr<TlsSessionKeyLogger> TlsSessionKeyLoggerCache::Get(
    std::string tls_session_key_log_file_path) {
  gpr_once_init(&g_cache_mutex_init, do_cache_mutex_init);
  GPR_DEBUG_ASSERT(g_tls_session_key_log_cache_mu != nullptr);
  if (tls_session_key_log_file_path.empty()) {
    return nullptr;
  }
  {
    grpc_core::MutexLock lock(g_tls_session_key_log_cache_mu);        <---------- holding the mutex
    grpc_core::RefCountedPtr<TlsSessionKeyLoggerCache> cache;
    if (g_cache_instance == nullptr) {
      // This will automatically set g_cache_instance.
      cache = grpc_core::MakeRefCounted<TlsSessionKeyLoggerCache>();      <------ line 121



lock is holding a MutexLock (I assume that's an exclusive thing) on g_tls_session_key_log_cache_mu.

See https://bugs.chromium.org/p/chromium/issues/detail?id=1372394#c4 for how to reproduce.

I've reverted this in https://github.com/llvm/llvm-project/commit/a4afa2bde6f4db215ddd3267a8d11c04367812e5 in the meantime.

This doesn't look like a false positive to me. The documentation states that no inlining is being done, so unless grpc_core::MakeRefCounted (or a specialization) has a require_capability attribute, the lock doesn't count as locked in the function body. That has always been the case.

The only thing that changed is that we now also consider CXXConstructExpr outside of DeclStmts, which is arguably an improvement.

From the logs:

../../extensions/browser/api/content_settings/content_settings_store.cc(75,49): note: mutex acquired here
  std::unique_ptr<base::AutoLock> auto_lock(new base::AutoLock(lock_));
                                                ^

That seems to be precisely the example that the documentation says doesn't work. The constructor of std::unique_ptr has no thread safety annotations and so we have a constructor call without a matching destructor call. (The destructor is called indirectly from the destructor of unique_ptr.) We see this now because we've started looking at constructor calls that don't initialize a local variable.

The documentation also lists ways to make this work (such as having more powerful scope types).

I'll give you some time to reply before I reland, but I can't see a bug here. For the next time, as already pointed out, give the involved people some time to reply before you revert (unless it's a crash). We have to be able to continue working on the compiler without having Google revert changes all the time because we produce new warnings.

aaronpuchert reopened this revision.Oct 7 2022, 9:57 AM
This revision is now accepted and ready to land.Oct 7 2022, 9:57 AM
aaronpuchert added a comment.EditedOct 7 2022, 10:12 AM

We also started seeing -Wthread-safety-precise error in our Fuchsia code.
https://luci-milo.appspot.com/ui/p/fuchsia/builders/ci/clang_toolchain.ci.core.x64-release/b8800959115965408001/overview
I'm trying to verify with our team whether it is a false positive, but I just wanted to give you heads up!

Both places have

Cursor cursor(DiscardableVmosLock::Get(), ...);
AssertHeld(cursor.lock_ref());

At the end of the scope we get

error: calling function '~VmoCursor' requires holding mutex 'lock_' exclusively [-Werror,-Wthread-safety-precise]
note: found near match 'cursor.lock_'

Presumably Cursor is some kind of alias to VmoCursor, as we don't look at base destructors yet. Since the code is not easily searchable for me, can you look up the annotations on DiscardableVmosLock::Get, the constructor of Cursor/VmoCursor being used here, Cursor::lock_ref, and AssertHeld?

Edit: the second error seems to be same that @hans was posting. That's a true positive sadly, we didn't emit a warning previously by accident.

aaronpuchert planned changes to this revision.EditedOct 7 2022, 10:32 AM

Presumably Cursor is some kind of alias to VmoCursor, as we don't look at base destructors yet. Since the code is not easily searchable for me, can you look up the annotations on DiscardableVmosLock::Get, the constructor of Cursor/VmoCursor being used here, Cursor::lock_ref, and AssertHeld?

Nevermind, I think I've found it: lock_ref has TA_RET_CAP(lock_), so cursor.lock_ref() translates to cursor.lock_. So we have that lock. The destructor ~VmoCursor has TA_REQ(lock_) referring to the same lock. So that looks like a bug, I think we're missing the substitution somehow. Need to look into it.

gulfem added a comment.EditedOct 7 2022, 10:33 AM

We also started seeing -Wthread-safety-precise error in our Fuchsia code.
https://luci-milo.appspot.com/ui/p/fuchsia/builders/ci/clang_toolchain.ci.core.x64-release/b8800959115965408001/overview
I'm trying to verify with our team whether it is a false positive, but I just wanted to give you heads up!

Both places have

Cursor cursor(DiscardableVmosLock::Get(), ...);
AssertHeld(cursor.lock_ref());

At the end of the scope we get

error: calling function '~VmoCursor' requires holding mutex 'lock_' exclusively [-Werror,-Wthread-safety-precise]
note: found near match 'cursor.lock_'

Presumably Cursor is some kind of alias to VmoCursor, as we don't look at base destructors yet. Since the code is not easily searchable for me, can you look up the annotations on DiscardableVmosLock::Get, the constructor of Cursor/VmoCursor being used here, Cursor::lock_ref, and AssertHeld?

https://cs.opensource.google/fuchsia/fuchsia/+/main:zircon/kernel/vm/vm_cow_pages.cc
It looks like Cursor is an alias to VmoCursor.
https://cs.opensource.google/fuchsia/fuchsia/+/main:zircon/kernel/vm/include/vm/vm_cow_pages.h;drc=0f6485b1f7f6eb73f9faa4653f34def3be827d15;l=1373

Edit: the second error seems to be same that @hans was posting. That's a true positive sadly, we didn't emit a warning previously by accident.

Yes, that is the same error.

hans added a comment.Oct 7 2022, 12:47 PM

We're hitting a false positive in grpc after this:

> ../../third_party/grpc/src/src/core/lib/gprpp/ref_counted_ptr.h:335:31: error: calling function 'TlsSessionKeyLoggerCache' requires holding mutex 'g_tls_session_key_log_cache_mu' exclusively [-Werror,-Wthread-safety-analysis]
>   return RefCountedPtr<T>(new T(std::forward<Args>(args)...));
>                               ^
> ../../third_party/grpc/src/src/core/tsi/ssl/key_logging/ssl_key_logging.cc:121:26: note: in instantiation of function template specialization 'grpc_core::MakeRefCounted<tsi::TlsSessionKeyLoggerCache>' requested here
>      cache = grpc_core::MakeRefCounted<TlsSessionKeyLoggerCache>();
>                         ^


The code looks like this:

grpc_core::RefCountedPtr<TlsSessionKeyLogger> TlsSessionKeyLoggerCache::Get(
    std::string tls_session_key_log_file_path) {
  gpr_once_init(&g_cache_mutex_init, do_cache_mutex_init);
  GPR_DEBUG_ASSERT(g_tls_session_key_log_cache_mu != nullptr);
  if (tls_session_key_log_file_path.empty()) {
    return nullptr;
  }
  {
    grpc_core::MutexLock lock(g_tls_session_key_log_cache_mu);        <---------- holding the mutex
    grpc_core::RefCountedPtr<TlsSessionKeyLoggerCache> cache;
    if (g_cache_instance == nullptr) {
      // This will automatically set g_cache_instance.
      cache = grpc_core::MakeRefCounted<TlsSessionKeyLoggerCache>();      <------ line 121



lock is holding a MutexLock (I assume that's an exclusive thing) on g_tls_session_key_log_cache_mu.

See https://bugs.chromium.org/p/chromium/issues/detail?id=1372394#c4 for how to reproduce.

I've reverted this in https://github.com/llvm/llvm-project/commit/a4afa2bde6f4db215ddd3267a8d11c04367812e5 in the meantime.

This doesn't look like a false positive to me. The documentation states that no inlining is being done, so unless grpc_core::MakeRefCounted (or a specialization) has a require_capability attribute, the lock doesn't count as locked in the function body. That has always been the case.

Thanks for looking into this so quickly.

I'd still call it a false positive since the mutex is in fact being held. However I now understand that this is due to a pre-existing limitation in the analysis.

How would you suggest the developers work around this?

From the logs:

../../extensions/browser/api/content_settings/content_settings_store.cc(75,49): note: mutex acquired here
  std::unique_ptr<base::AutoLock> auto_lock(new base::AutoLock(lock_));
                                                ^

That seems to be precisely the example that the documentation says doesn't work. The constructor of std::unique_ptr has no thread safety annotations and so we have a constructor call without a matching destructor call. (The destructor is called indirectly from the destructor of unique_ptr. We see this now because of the constructor call that doesn't initialize a local variable.

The documentation also lists ways to make this work (such as having more powerful scope types).

I see. I looked through the docs but it wasn't clear what you referred too. How would one change base::AutoLock in this case? Or do the developers need to avoid unique_ptr, or is there something else?

I'll give you some time to reply before I reland, but I can't see a bug here. For the next time, as already pointed out, give the involved people some time to reply before you revert (unless it's a crash). We have to be able to continue working on the compiler without having Google revert changes all the time because we produce new warnings.

We've seen reports here of various breakages from two different projects in short time after your patch. The general policy is to revert to green, so I think that was the right call. Typically, new warnings are introduced behind new flags so that users can adopt them incrementally, though I realize that might be tricky in this instance. In any case, perhaps we should give other projects some time to assess the impact of this before relanding?

This also seems like the kind of disruptive change we'd want to announce on Discourse, as discussed in Aaron's recent RFC. Probably should be mentioned in the "Potentially Breaking Changes" section of the release notes too.

  • Pass til::LiteralPtr *Self for automatic object destructors into handling of require_capability and locks_excluded attributes.
  • Add [[maybe_unused]] attribute to silence -Wunused-variable warinng in Release builds.
  • A few more tests.
This revision is now accepted and ready to land.Oct 7 2022, 1:58 PM

We also started seeing -Wthread-safety-precise error in our Fuchsia code.
https://luci-milo.appspot.com/ui/p/fuchsia/builders/ci/clang_toolchain.ci.core.x64-release/b8800959115965408001/overview
I'm trying to verify with our team whether it is a false positive, but I just wanted to give you heads up!

This was a genuine bug which I believe to be fixed now (see the comments below). Thanks for the report!

I'd appreciate if you could test this, but it's not necessary as I could reproduce the bug and the test seems to show it fixed.

Thanks for looking into this so quickly.

I'd still call it a false positive since the mutex is in fact being held. However I now understand that this is due to a pre-existing limitation in the analysis.

It's a deliberate limitation. In the compiler we can't really look into function definitions (for a start, they might not be visible), so we have to rely on annotations.

How would you suggest the developers work around this?

Perhaps you could have the constructor call directly inline in the same function, and use move construction? Like

cache = grpc_core::MakeRefCounted<TlsSessionKeyLoggerCache>(TlsSessionKeyLoggerCache());

Then the default constructor is called where we know the lock to be held. With a bit of luck the move is optimized out.

Another solution would be to specialize grpc_core::MakeRefCounted<TlsSessionKeyLoggerCache> and add the attribute on the specialization, but then you'll have to duplicate the implementation.

Lastly, of course __attribute__((no_thread_safety_analysis)) on grpc_core::MakeRefCounted should also do the job.

I looked through the docs but it wasn't clear what you referred too.

The section "No inlining" has this example:

Thread safety analysis is strictly intra-procedural, just like ordinary type checking. It relies only on the declared attributes of a function, and will not attempt to inline any method calls. As a result, code such as the following will not work:

template<class T>
class AutoCleanup {
  T* object;
  void (T::*mp)();

public:
  AutoCleanup(T* obj, void (T::*imp)()) : object(obj), mp(imp) { }
  ~AutoCleanup() { (object->*mp)(); }
};

Mutex mu;
void foo() {
  mu.Lock();
  AutoCleanup<Mutex>(&mu, &Mutex::Unlock);
  // ...
}  // Warning, mu is not unlocked.

How would one change base::AutoLock in this case? Or do the developers need to avoid unique_ptr, or is there something else?

Composing scoped locks through generic ADTs is incompatible with Thread safety analysis, as operations on the scopes are now hidden behind the ADT. So yes, anything like unique_ptr<AutoLock> should be avoided.

I've been working on supporting more complex RAII types that allow relocking (D49885) and deferred locking (D81332). This should allow to replace something like optional<AutoLock>. In your example it's a bit more difficult, as the AutoLock* is then passed to some object that's returned, so it likely leaves the function. Such data-flow-like patterns are not supported yet and will probably not be for a long time. (@delesley pointed out in a talk years ago that this would require some kind of dependent type system.) So my advice here would simply be to add the previously mentioned no_thread_safety_analysis attribute. For now this is simply intractable, so you're not losing anything.

We've seen reports here of various breakages from two different projects in short time after your patch. The general policy is to revert to green, so I think that was the right call. Typically, new warnings are introduced behind new flags so that users can adopt them incrementally, though I realize that might be tricky in this instance. In any case, perhaps we should give other projects some time to assess the impact of this before relanding?

For conceptual changes I'd always put them behind a new flag, in fact we have -Wthread-safety-beta for this. We are aware that this might break some code, but the previous behavior was so strange that it would be a pity to leave it. (We were using annotations on a function that doesn't do anything interesting and isn't even called.)

The two cases that you reported boil down to us previously not looking into some constructor calls, which is arguably just an inconsistency. Here is what it boils down to:

struct X {
    X() ANNOTATION();
};

void f() {
  X x;       // Annotation was always taken into account.
  X x = X(); // Annotation was previously taken into account only with -std=c++17 (guaranteed copy elisision) or newer, but not before.
  X();       // Annotation was previously not taken into account.
  new X();   // Dito.
}

This also seems like the kind of disruptive change we'd want to announce on Discourse, as discussed in Aaron's recent RFC. Probably should be mentioned in the "Potentially Breaking Changes" section of the release notes too.

I'll extend the release note to mention that as a side effect this looks into more constructor calls and might thus produce additional warnings consistent with how we behave otherwise.

I'm not sure if this counts as "potentially breaking change" since it only changes an optional diagnostic. But I'd defer to @aaron.ballman on this.

I can also post to Discourse, but Thread Safety Analysis is a bit of a niche topic and previous posts that I've seen didn't attract much attention.

clang/test/SemaCXX/warn-thread-safety-analysis.cpp
1620

The previous change had

warning: calling function '~DestructorRequires' requires holding mutex 'mu' exclusively
note: found near match 'rd.mu'

here which is now gone. This is basically the issue posted by @gulfem.

1630–1631

Similar to the reported issue, here we were not reporting warnings. Underlying reason is the same missing substitution.

gulfem added a comment.Oct 7 2022, 4:40 PM

We also started seeing -Wthread-safety-precise error in our Fuchsia code.
https://luci-milo.appspot.com/ui/p/fuchsia/builders/ci/clang_toolchain.ci.core.x64-release/b8800959115965408001/overview
I'm trying to verify with our team whether it is a false positive, but I just wanted to give you heads up!

This was a genuine bug which I believe to be fixed now (see the comments below). Thanks for the report!

I'd appreciate if you could test this, but it's not necessary as I could reproduce the bug and the test seems to show it fixed.

I verified that it fixes the following error in our zircon kernel, and thanks for the fix!

[89054/268252] CXX kernel_x64/obj/zircon/kernel/vm/vm.vm_cow_pages.cc.o
FAILED: kernel_x64/obj/zircon/kernel/vm/vm.vm_cow_pages.cc.o
../../../recipe_cleanup/clang72pd81kv/bin/clang++ -MD -MF kernel_x64/obj/zircon/kernel/vm/vm.vm_cow_pages.cc.o.d -o kernel_x64/obj/zircon/kernel/vm/vm.vm_cow_pages.cc.o -D_LIBCPP_DISABLE_VISIBILITY...
../../zircon/kernel/vm/vm_cow_pages.cc:6400:3: error: calling function '~VmoCursor' requires holding mutex 'lock_' exclusively [-Werror,-Wthread-safety-precise]
  }
  ^
../../zircon/kernel/vm/vm_cow_pages.cc:6400:3: note: found near match 'cursor.lock_'
../../zircon/kernel/vm/vm_cow_pages.cc:6483:10: error: calling function '~VmoCursor' requires holding mutex 'lock_' exclusively [-Werror,-Wthread-safety-precise]
  return total_pages_discarded;
         ^
../../zircon/kernel/vm/vm_cow_pages.cc:6483:10: note: found near match 'cursor.lock_'

I'll give you some time to reply before I reland, but I can't see a bug here. For the next time, as already pointed out, give the involved people some time to reply before you revert (unless it's a crash). We have to be able to continue working on the compiler without having Google revert changes all the time because we produce new warnings.

We've seen reports here of various breakages from two different projects in short time after your patch. The general policy is to revert to green, so I think that was the right call.

I don't think it was the wrong call per se, but it was an aggressive revert IMO. Chromium tends to revert changes very quickly and without discussion, and I'm hoping we can change that slightly to include more up-front discussion before a revert. It's one thing to revert when the patch author isn't responsive, but reverting immediately and without discussion when the commit message explicitly states there is a change in behavior isn't how I would expect the revert policy to be interpreted.

Typically, new warnings are introduced behind new flags so that users can adopt them incrementally, though I realize that might be tricky in this instance. In any case, perhaps we should give other projects some time to assess the impact of this before relanding?

Other projects won't have the time to assess the impact of these changes because the changes were reverted (almost nobody applies one-off patches to try them out without there being something else driving that experiment). I think the changes should be re-landed so people can get us that feedback.

This also seems like the kind of disruptive change we'd want to announce on Discourse, as discussed in Aaron's recent RFC. Probably should be mentioned in the "Potentially Breaking Changes" section of the release notes too.

+1!

aaron.ballman added a reviewer: Restricted Project.Oct 11 2022, 7:39 AM

Adding the clang-vendors group for awareness of potential disruptions from these changes.

hans added a comment.Oct 11 2022, 10:59 AM

I'll give you some time to reply before I reland, but I can't see a bug here. For the next time, as already pointed out, give the involved people some time to reply before you revert (unless it's a crash). We have to be able to continue working on the compiler without having Google revert changes all the time because we produce new warnings.

We've seen reports here of various breakages from two different projects in short time after your patch. The general policy is to revert to green, so I think that was the right call.

I don't think it was the wrong call per se, but it was an aggressive revert IMO. Chromium tends to revert changes very quickly and without discussion, and I'm hoping we can change that slightly to include more up-front discussion before a revert. It's one thing to revert when the patch author isn't responsive, but reverting immediately and without discussion when the commit message explicitly states there is a change in behavior isn't how I would expect the revert policy to be interpreted.

In general I think us reverting fast is a good thing. Reverts and relands are cheap, and reverting early prevents more people from being exposed to breakages, which are expensive to investigate. Minimizing the amount time where the source tree is in a bad state seems like a good thing to me, and reverts also tend to become harder and more disruptive with time as more work lands on top.

I certainly don't want us to come across as aggressive though, and since for this patch it turned out that the new false positives were expected, in hindsight I probably shouldn't have been so quick to revert.

Since the bug that Gulfem reported is fixed, relanding sounds good to me.

I'll give you some time to reply before I reland, but I can't see a bug here. For the next time, as already pointed out, give the involved people some time to reply before you revert (unless it's a crash). We have to be able to continue working on the compiler without having Google revert changes all the time because we produce new warnings.

We've seen reports here of various breakages from two different projects in short time after your patch. The general policy is to revert to green, so I think that was the right call.

I don't think it was the wrong call per se, but it was an aggressive revert IMO. Chromium tends to revert changes very quickly and without discussion, and I'm hoping we can change that slightly to include more up-front discussion before a revert. It's one thing to revert when the patch author isn't responsive, but reverting immediately and without discussion when the commit message explicitly states there is a change in behavior isn't how I would expect the revert policy to be interpreted.

In general I think us reverting fast is a good thing. Reverts and relands are cheap, and reverting early prevents more people from being exposed to breakages, which are expensive to investigate. Minimizing the amount time where the source tree is in a bad state seems like a good thing to me, and reverts also tend to become harder and more disruptive with time as more work lands on top.

In general, yes. Quick reverts for real problems are definitely appreciated (and our general policy to boot)!

I certainly don't want us to come across as aggressive though, and since for this patch it turned out that the new false positives were expected, in hindsight I probably shouldn't have been so quick to revert.

Yup! I'm mostly just asking to raise awareness on the review before reverting in cases where it's not a build error (or other stop-the-world-this-is-obviously-wrong situation) and none of our community bots have gone red. Chromium is a downstream and we definitely want to make sure the ToT is usable for downstreams regardless of when they pull code, but downstreams should not have an expectation that any disturbance to their code base qualifies as something to revert without discussion. (We've had similar tension from in-tree downstreams as well, such as libc++ and lldb, so we're trying to find a happy medium between "instant revert of conforming changes without warning" and "leaving the ToT in a broken state too long and building up frustrating debt from that.")

Since the bug that Gulfem reported is fixed, relanding sounds good to me.

Thanks!

Add sentence to release notes that we now look into more constructor calls and that this could lead to additional warnings being emitted.

aaronpuchert marked 5 inline comments as done.Oct 11 2022, 2:28 PM

@hans, where you able to fix or work around the warnings? I'd like to land this again, but if you need more time it can also wait a bit.

hans added a comment.Oct 11 2022, 11:33 PM

@hans, where you able to fix or work around the warnings? I'd like to land this again, but if you need more time it can also wait a bit.

The work-around for gRPC is still pending, but we can turn off the warning for that library if we need to, so no need to wait.

@aaron.ballman, would like some feedback on the release notes. Should I additionally write something under "Potentially Breaking Changes", or is it enough to mention this under "Improvements to Clang's diagnostics"? Though I guess we could also add this later on if we get more complaints that this breaks things.

@aaron.ballman, would like some feedback on the release notes. Should I additionally write something under "Potentially Breaking Changes", or is it enough to mention this under "Improvements to Clang's diagnostics"? Though I guess we could also add this later on if we get more complaints that this breaks things.

I think it's sufficient in "Improvements to Clang's diagnostics". We're not looking to document every externally observable change in "Potentially Breaking Changes", IMO. Thanks for asking. We could definitely shift it later if needed.

@aaron.ballman, would like some feedback on the release notes. Should I additionally write something under "Potentially Breaking Changes", or is it enough to mention this under "Improvements to Clang's diagnostics"? Though I guess we could also add this later on if we get more complaints that this breaks things.

I think it's sufficient in "Improvements to Clang's diagnostics". We're not looking to document every externally observable change in "Potentially Breaking Changes", IMO. Thanks for asking. We could definitely shift it later if needed.

+1 to this -- I think it's defensible either way. It's not that this is a *breaking* change, but it is a potentially disruptive one. If we get many more folks raising questions about code changes, we can move the release note then. I think what you have now is fine by me.

This revision was landed with ongoing or failed builds.Oct 13 2022, 10:37 AM
This revision was automatically updated to reflect the committed changes.

I'm seeing a fair number of breakages from this patch (not really sure how many we truly have, I've hit ~5-10 so far in widely used libraries, but I suspect we have far more in the long tail). They're all valid (most are just adding missing thread safety annotations, etc., so far one breakage caught a real bug), but it would help if we could disable just these new ones. How invasive/annoying would it be to carve out a subwarning for this category of bugs and call it something like -Wthread-safety-elision?

I'm seeing a fair number of breakages from this patch (not really sure how many we truly have, I've hit ~5-10 so far in widely used libraries, but I suspect we have far more in the long tail).

That might be an argument to move up our release note to "Potentially breaking changes", but we can also decide that around the next release.

How invasive/annoying would it be to carve out a subwarning for this category of bugs and call it something like -Wthread-safety-elision?

Are you seeing warnings because of the different treatment of copy-elided construction, or because we've started to consider CXXConstructorCalls outside of the initializer of a DeclStmt? In any event, this change is quite fundamental and doesn't make it easy to selectively switch back to the old behavior:

  • For copy elision we'd need to stop adding the scoped capability in BuildLockset::handleCall, then in BuildLockset::VisitDeclStmt add back in the code building the fake constructor call. Then there might be inconsistencies.
  • If it's the additional constructor calls, we'd have to change BuildLockset::VisitCXXConstructExpr and BuildLockset::VisitDeclStmt so that we continue to only consider some constructor calls. This would be easier, but would result in a rather odd warning flag.

Are you seeing warnings because of the different treatment of copy-elided construction, or because we've started to consider CXXConstructorCalls outside of the initializer of a DeclStmt?

I'm not familiar with internal Clang APIs so I'm not entirely sure how to answer that, but I can share sketches of the breakages I've addressed so far:

  1. (2x) Returning a MutexLock-like structure, e.g.
MutexLock Lock() {
  return MutexLock(&mu);
}

this is documented in the test as a known false positive. Is that something you plan to address some day, or is it beyond the limits of thread safety analysis? (Or theoretically possible, but then compile times would be too much if you have to do more analysis?) Anyway, I applied __attribute__((no_thread_safety_analysis)) to the method.

  1. (3x) deferred/offloaded unlocking, e.g.
void Func() {
  auto *lock = new MutexLock(&mu);
  auto cleanup = [lock]() { delete lock; };
  cleanup();
} // Error that mu is not unlocked

I assume this is beyond the scope of thread safety analysis -- cleanup() in the internal case actually happens in a different TU (the "cleanup" gets passed to some generic scheduler thingy), which is even more complex than this trivial example. In one of the cases the unlocking would happen on some other thread, which I guess is a little suspicious, but should still be guaranteed to execute (I didn't dig _too_ deep there). Anyway, I also suppressed analysis here.

  1. Missed lock annotation on the constructor, take 1
struct Foo {
  explicit Foo(Mutex *mu) EXCLUSIVE_LOCKS_REQUIRED(mu);
};

Something Func() {
  SomethingLikeMutexLock lock(&mu);
  lock.Release(); // !!!
  return Get(Foo(&mu));
}

Glaringly obvious bug that wasn't being caught before this patch. Moving the call to Foo(&mu) earlier while the lock is still held fixes the bug.

  1. Missed lock annotation on the constructor, take 2
struct Foo {
  explicit Foo(Mutex *mu) EXCLUSIVE_LOCKS_REQUIRED(mu);
};
void X() {
  auto lock = MutexLock(&mu);
  auto f = MakeFoo();
}
Foo Y() {
  return Foo(&mu);
}

X grabs the lock, but Y warns that we don't have mu held. In this case, mu->AssertHeld() suffices (the mutex pattern is a little more complicated, and the pattern is used elsewhere, but was missed in this method).

  1. Alternate std::make_unique implementation (this one still has me puzzled)
template <typename T, typename... Args>
inline std::unique_ptr<T> MyMakeUnique(Args &&...args) {
  return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

static Mutex y_mu;

class Z {
 public:
  Z() EXCLUSIVE_LOCKS_REQUIRED(y_mu) {}
};

std::unique_ptr<Z> Y() {
  auto lock = MutexLock(&y_mu);
  return MyMakeUnique<Z>();
}

Returning std::make_unique<Z>() instead of the custom helper works. Why? No idea. MyMakeUnique is a c++11 compatibility helper -- std::unique_ptr is in c++11 but std::make_unique is only in c++14 -- and fortunately this code doesn't need it anymore. The implementation seems to be identical to the one in libc++, so my best guess is threading analysis has some special handling for some std:: methods.

I might have a better answer in a day or two of how widespread this is beyond just the core files that seem to make the world break when we enable this. We can just fix the bugs it if it's only a few of them, but it might be difficult if we have too many.

Thanks for the detailed write-up, very much appreciated.

  1. (2x) Returning a MutexLock-like structure, e.g.
MutexLock Lock() {
  return MutexLock(&mu);
}

this is documented in the test as a known false positive. Is that something you plan to address some day, or is it beyond the limits of thread safety analysis? (Or theoretically possible, but then compile times would be too much if you have to do more analysis?)

I'm planning to address this, but don't have a good idea how to solve it yet. Conceptually it should definitely be possible though.

Anyway, I applied __attribute__((no_thread_safety_analysis)) to the method.

This seems about right for now. I filed bug 58480, which you could link to in a comment or subscribe to for being notified of a fix.

  1. (3x) deferred/offloaded unlocking, e.g.
void Func() {
  auto *lock = new MutexLock(&mu);
  auto cleanup = [lock]() { delete lock; };
  cleanup();
} // Error that mu is not unlocked

I assume this is beyond the scope of thread safety analysis -- cleanup() in the internal case actually happens in a different TU (the "cleanup" gets passed to some generic scheduler thingy), which is even more complex than this trivial example. In one of the cases the unlocking would happen on some other thread, which I guess is a little suspicious, but should still be guaranteed to execute (I didn't dig _too_ deep there). Anyway, I also suppressed analysis here.

Yes, this is pretty much out of scope, at least for the foreseeable future. This is also documented: https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#no-inlining.

  1. Missed lock annotation on the constructor, take 1
struct Foo {
  explicit Foo(Mutex *mu) EXCLUSIVE_LOCKS_REQUIRED(mu);
};

Something Func() {
  SomethingLikeMutexLock lock(&mu);
  lock.Release(); // !!!
  return Get(Foo(&mu));
}

Glaringly obvious bug that wasn't being caught before this patch. Moving the call to Foo(&mu) earlier while the lock is still held fixes the bug.

Glad to hear that!

  1. Missed lock annotation on the constructor, take 2
struct Foo {
  explicit Foo(Mutex *mu) EXCLUSIVE_LOCKS_REQUIRED(mu);
};
void X() {
  auto lock = MutexLock(&mu);
  auto f = MakeFoo();
}
Foo Y() {
  return Foo(&mu);
}

X grabs the lock, but Y warns that we don't have mu held. In this case, mu->AssertHeld() suffices (the mutex pattern is a little more complicated, and the pattern is used elsewhere, but was missed in this method).

Possibly you could also "propagate" the lock requirement onto MakeFoo like this:

Foo MakeFoo() EXCLUSIVE_LOCKS_REQUIRED(mu) {
  return Foo(&mu);
}

But sometimes this isn't possible for encapsulation reasons, in which case AssertHeld is indeed a good workaround.

  1. Alternate std::make_unique implementation (this one still has me puzzled)
template <typename T, typename... Args>
inline std::unique_ptr<T> MyMakeUnique(Args &&...args) {
  return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

static Mutex y_mu;

class Z {
 public:
  Z() EXCLUSIVE_LOCKS_REQUIRED(y_mu) {}
};

std::unique_ptr<Z> Y() {
  auto lock = MutexLock(&y_mu);
  return MyMakeUnique<Z>();
}

Returning std::make_unique<Z>() instead of the custom helper works. Why? No idea. MyMakeUnique is a c++11 compatibility helper -- std::unique_ptr is in c++11 but std::make_unique is only in c++14 -- and fortunately this code doesn't need it anymore. The implementation seems to be identical to the one in libc++, so my best guess is threading analysis has some special handling for some std:: methods.

We had a similar case earlier in the thread. Basically the problem is that MyMakeUnique calls the constructor of Z without holding the lock (statically). I don't think that we have special handling for std::, but it's possible that the warning is suppressed in system headers.

Apart from the solutions mentioned above, manually inlining the make_unique/MyMakeUnique should do the job, i.e.

std::unique_ptr<Z> Y() {
  auto lock = MutexLock(&y_mu);
  return std::unique_ptr<Z>(new Z());
}

Of course this has downsides of its own, but we can't really do "inlining" in a compiler warning and so it's either the warning or (slightly) more verbose code.

I might have a better answer in a day or two of how widespread this is beyond just the core files that seem to make the world break when we enable this. We can just fix the bugs it if it's only a few of them, but it might be difficult if we have too many.

The good news is that for now we've only seen the second category of issues, for which a flag to restore the old behavior would be feasible. I can't say for certain whether that would make all the issues here disappear, but it definitely seems so.

I might have a better answer in a day or two of how widespread this is beyond just the core files that seem to make the world break when we enable this. We can just fix the bugs it if it's only a few of them, but it might be difficult if we have too many.

The good news is that for now we've only seen the second category of issues, for which a flag to restore the old behavior would be feasible. I can't say for certain whether that would make all the issues here disappear, but it definitely seems so.

We're about done with our cleanup. Our fix count is at 34, and should be final unless there are surprises. I'm not sure if others would benefit from having this warning pushed to a subflag, but we don't need it anymore.

While making some fixes, I ran across a strange false positive which I filed as https://github.com/llvm/llvm-project/issues/58535. I think it's another situation where the code is too complex for thread analysis to be feasible, but I also didn't see it documented as one of the common cases that isn't supported.