Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I updated the title to "Fix tsan problem where the per-thread shared_ptr() can be locked right before the cache is destroyed causing a race where it tries to remove an entry from a destroyed cache."
Could someone help review this patch? It's caught by tsan for jax tests and we verified the fix internally.
Thank you River for the review! I don't have commit access. @rriddle could you help us commit this patch? Thank you so much!
Hey, looks like this broke the asan-aarch64 buildbot. https://lab.llvm.org/buildbot/#/builders/239/builds/771
Sorry it took so long to bisect this down. It seems like the relevant leaks happen because of:
Indirect leak of 104 byte(s) in 1 object(s) allocated from: #0 0xaaaad7bce55c in operator new(unsigned long) /home/mitchp/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:95:3 #1 0xaaaad7e81b44 in __libcpp_operator_new<unsigned long> /home/mitchp/build/libcxx_build_asan/include/c++/v1/new:266:10 #2 0xaaaad7e81b44 in __libcpp_allocate /home/mitchp/build/libcxx_build_asan/include/c++/v1/new:292:10 #3 0xaaaad7e81b44 in allocate /home/mitchp/build/libcxx_build_asan/include/c++/v1/__memory/allocator.h:115:38 #4 0xaaaad7e81b44 in allocate /home/mitchp/build/libcxx_build_asan/include/c++/v1/__memory/allocator_traits.h:268:20 #5 0xaaaad7e81b44 in __allocation_guard<std::__1::allocator<mlir::ThreadLocalCache<llvm::DenseSet<(anonymous namespace)::ParametricStorageUniquer::HashedStorage, (anonymous namespace)::ParametricStorageUniquer::StorageKeyInfo> >::PerInstanceState> > /home/mitchp/build/libcxx_build_asan/include/c++/v1/__memory/allocation_guard.h:53:18 #6 0xaaaad7e81b44 in allocate_shared<mlir::ThreadLocalCache<llvm::DenseSet<(anonymous namespace)::ParametricStorageUniquer::HashedStorage, (anonymous namespace)::ParametricStorageUniquer::StorageKeyInfo> >::PerInstanceState, std::__1::allocator<mlir::ThreadLocalCache<llvm::DenseSet<(anonymous namespace)::ParametricStorageUniquer::HashedStorage, (anonymous namespace)::ParametricStorageUniquer::StorageKeyInfo> >::PerInstanceState>, void> /home/mitchp/build/libcxx_build_asan/include/c++/v1/__memory/shared_ptr.h:947:48 #7 0xaaaad7e81b44 in make_shared<mlir::ThreadLocalCache<llvm::DenseSet<(anonymous namespace)::ParametricStorageUniquer::HashedStorage, (anonymous namespace)::ParametricStorageUniquer::StorageKeyInfo> >::PerInstanceState, void> /home/mitchp/build/libcxx_build_asan/include/c++/v1/__memory/shared_ptr.h:957:12 #8 0xaaaad7e81b44 in ThreadLocalCache /home/mitchp/llvm-project/mlir/include/mlir/Support/ThreadLocalCache.h:123:7
And I'm guessing the rest of the leaks are cascading failures from that leak.
I've attached the rest of the logfile from ninja check-mlir inside of the sanitizer bot (https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild, the buildscript that you'd want to use is buildbot_bootstrap_asan.sh instead of buildbot_fast.sh). It was generated with the following patch, which allows us to actually see the asan report:
diff --git a/mlir/test/CAPI/pdl.c b/mlir/test/CAPI/pdl.c index 673ad8d6af7e..4dbada4060da 100644 --- a/mlir/test/CAPI/pdl.c +++ b/mlir/test/CAPI/pdl.c @@ -7,7 +7,9 @@ // //===----------------------------------------------------------------------===// -// RUN: mlir-capi-pdl-test 2>&1 | FileCheck %s +// RUN: mlir-capi-pdl-test 2>&1 +// +// | FileCheck %s #include "mlir-c/Dialect/PDL.h" #include "mlir-c/BuiltinTypes.h" diff --git a/mlir/test/CAPI/transform.c b/mlir/test/CAPI/transform.c index 1db5c2ecafe4..c69e40642c47 100644 --- a/mlir/test/CAPI/transform.c +++ b/mlir/test/CAPI/transform.c @@ -7,7 +7,9 @@ // //===----------------------------------------------------------------------===// -// RUN: mlir-capi-transform-test 2>&1 | FileCheck %s +// RUN: mlir-capi-transform-test +// +// 2>&1 | FileCheck %s #include "mlir-c/Dialect/Transform.h" #include "mlir-c/IR.h"
That patch is applied on top of bcc10817d5569172ee065015747e226280e9b698, which is the commit for this patch.
I'm not exactly sure why this doesn't reproduce on the x86_64 asan bot. I'm sorry about that.