This is an archive of the discontinued LLVM Phabricator instance.

Use pthreads for thread-local lsan allocator cache on darwin
ClosedPublic

Authored by fjricci on Feb 15 2017, 9:52 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

fjricci created this revision.Feb 15 2017, 9:52 AM
kubamracek added inline comments.Feb 15 2017, 1:40 PM
lib/lsan/lsan_allocator.cc
29 ↗(On Diff #88566)

Again, why do we need this change? We need to make sure not to break these platforms (mips, aarch64).

111 ↗(On Diff #88566)

Looks too weird to me. sizeof(AllocatorCache) should do.

lib/lsan/lsan_allocator.h
58 ↗(On Diff #88566)

Do we need to change this code here? It's not obvious to me that this doesn't change the parameters for some platforms.

lib/sanitizer_common/sanitizer_allocator_primary64.h
49–50 ↗(On Diff #88566)

Why is this necessary? I mean, the change looks reasonable, but is it related to this change? If not, please submit it as a separate patch.

fjricci added inline comments.Feb 16 2017, 11:43 AM
lib/lsan/lsan_allocator.cc
29 ↗(On Diff #88566)

This is exactly equivalent to the existing #if condition (#if (mips || aarch || i386) && !i386), but simplified because the data has been split out.

111 ↗(On Diff #88566)

sizeof(TYPENAME) fails check-sanitizer style checks.

lib/lsan/lsan_allocator.h
58 ↗(On Diff #88566)

It seemed cleaner to do this way, now that kMaxAllowedMallocSize has its own if block. However, I think you're right, this could potentially miss some edge cases for architecture.

lib/sanitizer_common/sanitizer_allocator_primary64.h
49–50 ↗(On Diff #88566)

It's necessary, at least on Darwin, because you'll overflow the type if you use a uptr to store a ULL. I'll move it to a separate patch.

kubamracek added inline comments.Feb 16 2017, 11:52 AM
lib/lsan/lsan_allocator.cc
29 ↗(On Diff #88566)

I'd still prefer not to make refactoring together with functional changes. (This particular change is okay, but it's hard to immediately see that the change is NFC.)

111 ↗(On Diff #88566)

Does it really? Is the linter smart enough to figure out that AllocatorCache is a type and not a variable? Any other suggestions? I really don't like to have a function call inside a sizeof() (I know it's not really calling the function, but still.)

lib/sanitizer_common/sanitizer_allocator_primary64.h
49–50 ↗(On Diff #88566)

This whole class should be ifdef'd to only compile on 64-bit platforms. Isn't that so?

fjricci added inline comments.Feb 16 2017, 12:44 PM
lib/lsan/lsan_allocator.cc
111 ↗(On Diff #88566)

Ahh, yeah, the linter isn't smart enough.

lib/sanitizer_common/sanitizer_allocator_primary64.h
49–50 ↗(On Diff #88566)

On non-windows platforms, uptr is defined as an unsigned long, not an unsigned long long (sanitizer_internal_defs.h:103)

kubamracek added inline comments.Feb 16 2017, 1:02 PM
lib/sanitizer_common/sanitizer_allocator_primary64.h
49–50 ↗(On Diff #88566)

Right, and unsigned long is 64-bit on (supported) 64-bit platforms, isn't it?

fjricci added inline comments.Feb 16 2017, 1:48 PM
lib/sanitizer_common/sanitizer_allocator_primary64.h
49–50 ↗(On Diff #88566)

Turns out that __arm__ was creeping in due to a gap in the if condition. I'll fix that.

fjricci updated this revision to Diff 88780.Feb 16 2017, 1:59 PM

Fix architecture issues

fjricci updated this revision to Diff 88841.Feb 16 2017, 7:33 PM

Rebase and initialize AllocatorCache

kubamracek added inline comments.Feb 16 2017, 7:42 PM
lib/lsan/lsan_common.h
18 ↗(On Diff #88841)

Hm... Could we have it the other way around? lsan_allocator.h would include lsan_common.h and GetAllocatorCache would be declared in lsan_allocator.h only. lsan_allocator.cc would include lsan_allocator.h. The _linux.cc and _mac.cc files would also include lsan_allocator.h. Is there a dependency problem with that?

fjricci added inline comments.Feb 16 2017, 8:16 PM
lib/lsan/lsan_common.h
18 ↗(On Diff #88841)

Ahh, that's a very good idea, didn't think to move just the declaration.

fjricci updated this revision to Diff 88846.Feb 16 2017, 8:19 PM

Move declaration of GetAllocatorCache

kubamracek added inline comments.Feb 16 2017, 8:21 PM
lib/lsan/lsan_allocator.h
55 ↗(On Diff #88846)

Does the issue with __arm__ still exist now?

fjricci updated this revision to Diff 88847.Feb 16 2017, 8:30 PM

Remove arm workaround

kubamracek accepted this revision.Feb 16 2017, 8:34 PM

LGTM. Thanks for working on this and for being patient with the review!

This revision is now accepted and ready to land.Feb 16 2017, 8:34 PM

Thanks for taking the time to review! This should fix the last of the build failures, currently working on a patch to enable builds by default but disable actually running it.

This revision was automatically updated to reflect the committed changes.

Thanks for taking the time to review! This should fix the last of the build failures, currently working on a patch to enable builds by default but disable actually running it.

Okay. It should be enough to exclude "check-lsan" from "check-all", but still keep "check-lsan" so that we can run the tests manually. Look for add_lit_testsuite and EXCLUDE_FROM_ALL.

Thanks for taking the time to review! This should fix the last of the build failures, currently working on a patch to enable builds by default but disable actually running it.

Okay. It should be enough to exclude "check-lsan" from "check-all", but still keep "check-lsan" so that we can run the tests manually. Look for add_lit_testsuite and EXCLUDE_FROM_ALL.

We'll also need to change the default in asan, since asan enables lsan by default currently.

fjricci reopened this revision.Feb 17 2017, 6:51 AM

This patch causes a build failure on one test on linux, and I'm having trouble figuring out why. It appears that the test error is fixed by simply moving the definition of GetAllocatorCache() from lsan_common_linux.cc to lsan_allocator.cc. Not sure why this would change the behavior of the test, but investigating now. In the meantime, I reverted the patch.

This revision is now accepted and ready to land.Feb 17 2017, 6:51 AM

The failing test is Linux-x86_64 stack-use-after-return.cc, failing with: AddressSanitizer: nested bug in the same thread, aborting.

Can you post the backtrace of the crash/failure?

FAIL: AddressSanitizer-x86_64-linux :: TestCases/Posix/stack-use-after-return.cc (1102 of 1258)

  • TEST 'AddressSanitizer-x86_64-linux :: TestCases/Posix/stack-use-after-return.cc' FAILED ****

Script:

/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -O0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/projects/compiler-rt/test/asan/TestCases/Posix/stack-use-after-return.cc -pthread -o /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/stack-use-after-return.cc.tmp && env ASAN_OPTIONS=detect_stack_use_after_return=1 not /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/stack-use-after-return.cc.tmp 2>&1 | FileCheck /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/projects/compiler-rt/test/asan/TestCases/Posix/stack-use-after-return.cc
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -O1 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/projects/compiler-rt/test/asan/TestCases/Posix/stack-use-after-return.cc -pthread -o /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/stack-use-after-return.cc.tmp && env ASAN_OPTIONS=detect_stack_use_after_return=1 not /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/stack-use-after-return.cc.tmp 2>&1 | FileCheck /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/projects/compiler-rt/test/asan/TestCases/Posix/stack-use-after-return.cc
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -O2 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/projects/compiler-rt/test/asan/TestCases/Posix/stack-use-after-return.cc -pthread -o /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/stack-use-after-return.cc.tmp && env ASAN_OPTIONS=detect_stack_use_after_return=1 not /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/stack-use-after-return.cc.tmp 2>&1 | FileCheck /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/projects/compiler-rt/test/asan/TestCases/Posix/stack-use-after-return.cc
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -O3 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/projects/compiler-rt/test/asan/TestCases/Posix/stack-use-after-return.cc -pthread -o /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/stack-use-after-return.cc.tmp && env ASAN_OPTIONS=detect_stack_use_after_return=1 not /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/stack-use-after-return.cc.tmp 2>&1 | FileCheck /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/projects/compiler-rt/test/asan/TestCases/Posix/stack-use-after-return.cc
env ASAN_OPTIONS=detect_stack_use_after_return=0 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/stack-use-after-return.cc.tmp
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -O3 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/projects/compiler-rt/test/asan/TestCases/Posix/stack-use-after-return.cc -pthread -o /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/stack-use-after-return.cc.tmp -DkSize=10000 -DUseThread -DkStackSize=65536 && env ASAN_OPTIONS=detect_stack_use_after_return=1 not /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/stack-use-after-return.cc.tmp 2>&1 | FileCheck --check-prefix=THREAD /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/projects/compiler-rt/test/asan/TestCases/Posix/stack-use-after-return.cc
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/./bin/clang --driver-mode=g++ -fsanitize=address -mno-omit-leaf-frame-pointer -fno-omit-frame-pointer -fno-optimize-sibling-calls -gline-tables-only -m64 -DUseThread -O2 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/projects/compiler-rt/test/asan/TestCases/Posix/stack-use-after-return.cc -pthread -o /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/stack-use-after-return.cc.tmp && env ASAN_OPTIONS=detect_stack_use_after_return=1 not /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/stack-use-after-return.cc.tmp 2>&1 | FileCheck --check-prefix=THREAD /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/projects/compiler-rt/test/asan/TestCases/Posix/stack-use-after-return.cc
env ASAN_OPTIONS=detect_stack_use_after_return=1:max_uar_stack_size_log=20:verbosity=1 not /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/stack-use-after-return.cc.tmp 2>&1 | FileCheck --check-prefix=CHECK-20 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/projects/compiler-rt/test/asan/TestCases/Posix/stack-use-after-return.cc

env ASAN_OPTIONS=detect_stack_use_after_return=1:min_uar_stack_size_log=24:max_uar_stack_size_log=24:verbosity=1 not /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm_build_ninja/projects/compiler-rt/test/asan/X86_64LinuxConfig/TestCases/Posix/Output/stack-use-after-return.cc.tmp 2>&1 | FileCheck --check-prefix=CHECK-24 /mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/projects/compiler-rt/test/asan/TestCases/Posix/stack-use-after-return.cc

Exit Code: 1

Command Output (stderr):

1: 0x7fff258045e0
2: 0x7fff258045e0
/mnt/b/sanitizer-buildbot1/sanitizer-x86_64-linux/build/llvm/projects/compiler-rt/test/asan/TestCases/Posix/stack-use-after-return.cc:52:13: error: expected string not found in input
// THREAD: WRITE of size 1 {{.*}} thread T{{[1-9]}}

^

<stdin>:1:1: note: scanning from here
1: 0x7f922b0cf820
^
<stdin>:5:24: note: possible intended match here
AddressSanitizer: nested bug in the same thread, aborting.

^

--

Right, but can you run this in a debugger and see what the actual backtrace is?

I believe the issue is that we're using too small of a stack in that test, and perhaps we're just pushed over the limit by this new AllocatorCache. If I change -DkStackSize=65536 to be a number >=75392, the test passes. Is this an acceptable solution? If so, what would be a good number to use?

I believe the issue is that we're using too small of a stack in that test, and perhaps we're just pushed over the limit by this new AllocatorCache. If I change -DkStackSize=65536 to be a number >=75392, the test passes. Is this an acceptable solution? If so, what would be a good number to use?

Ok, that's even weirder. Why should this change the local stack overhead? Do we overflow the stack? Can you actually post the backtrace from the debugger? Why is there a nested ASan failure?

The stack trace from the debugger isn't particularly helpful:

 thread #2, name = 'stack-use-after', stop reason = signal SIGSEGV: address access protected (fault address: 0x7ffff7e72020)
  frame #0: stack-use-after-return.cc.tmp`Func1() at stack-use-after-return.cc:39
   36  	}
   37  	
   38  	__attribute__((noinline))
-> 39  	char *Func1() {
   40  	  char local[kSize];
   41  	  return Ident(local);
   42  	}
(lldb) bt all
  thread #1, name = 'stack-use-after'
    frame #0: libpthread.so.0`pthread_join(threadid=140737352574720, thread_return=0x0000000000000000) at pthread_join.c:92
    frame #1: stack-use-after-return.cc.tmp`main at stack-use-after-return.cc:74
    frame #2: libc.so.6`__libc_start_main(main=(stack-use-after-return.cc.tmp`main), argc=1, argv=0x00007fffffffdb78, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffdb68) at libc-start.c:287
    frame #3: 0x000000000041ba7c stack-use-after-return.cc.tmp`_start + 41
* thread #2, name = 'stack-use-after', stop reason = signal SIGSEGV: address access protected (fault address: 0x7ffff7e72020)
  * frame #0: stack-use-after-return.cc.tmp`Func1() at stack-use-after-return.cc:39
    frame #1: stack-use-after-return.cc.tmp`Thread(void*) at stack-use-after-return.cc:61
    frame #2: libpthread.so.0`start_thread(arg=0x00007ffff7e82700) at pthread_create.c:312
    frame #3: libc.so.6`__clone at clone.S:111

The crash error also goes away if I add a CHECK(0); to GetAllocatorCache(), which is similarly strange.

That does look like it could be a stack-overflow seg fault though.

I found something very odd and very relevant. I added a line in test-use-after-return.cc to call pthread_attr_getstacksize immediately after pthread_attr_setstacksize. On master, the returned stack size is 8388608. With my change, it's 65536. I assume this means that we're intercepting something inside call to pthread_attr_setstacksize. But regardless, this means that 65536 was not large enough to begin with, unless we expect the sanitizers to intercept that call and increase the stack size. I assume we don't, since there's no direct interceptor for pthread_attr_setstacksize.

My diff is:

int main(int argc, char **argv) {
   pthread_attr_init(&attr);
   if (kStackSize > 0)
     pthread_attr_setstacksize(&attr, kStackSize);
+  size_t stacksize = 0;
+  pthread_attr_getstacksize(&attr, &stacksize);
+  printf("stacksize: %lu\n", stacksize);
   pthread_t t;
   pthread_create(&t, &attr, Thread, 0);
   pthread_attr_destroy(&attr);

Ok, then the test is just flaky. Let's simply double the stack size value in the test (as a separate commit).

This revision was automatically updated to reflect the committed changes.
fjricci reopened this revision.Mar 13 2017, 2:00 PM

Reverted due to failure with D30267

This revision is now accepted and ready to land.Mar 13 2017, 2:00 PM
This revision was automatically updated to reflect the committed changes.
This revision is now accepted and ready to land.Mar 20 2017, 6:56 AM

Okay, this is getting ridiculous. I've disabled the test on AArch64 in r298267. The test was not testing the right thing on AArch64 anyway (it wasn't setting the stack size).

This revision was automatically updated to reflect the committed changes.

This patch is having a real rough time. @vitalybuka thinks it probably caused this failure, and I wouldn't be too surprised: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/1117/steps/test%20standalone%20compiler-rt/logs/stdio

It doesn't repro on my CentOS or my Ubuntu machine though...

Actually, looks like the buildbot isn't failing currently. I won't revert the patch for now.

fjricci reopened this revision.Mar 21 2017, 8:58 AM

Going to try to get the failure to repro locally.

This revision is now accepted and ready to land.Mar 21 2017, 8:58 AM

I've tried everything I can think of in terms of a repro here - different gcc/clang versions, etc. I also looked up the zorg config and ran exactly the way it was run on the buildbot (standalone build + i386 test), and still no failure. Perhaps I can split this into two changes, to try to see where the failure is springing up.

My guess is that this is still caused simply by moving the GetAllocatorCache() to a different file, so I can try to commit the whole change minus that bit first, and make sure tests pass. If just moving the function definition to a different file causes a failure, it's very unlikely to me that the failure is legitimate and not a flake.

Given that the code review on this is fine, I'll go ahead with merging the partial commit (the bit that moves the definitions to the header and adds a function for GetAllocatorCache() in the same file), provided I don't hear any objection. I'll then upload another review for the second half.

fjricci updated this revision to Diff 92665.Mar 22 2017, 11:34 AM

Split revision into smaller pieces

fjricci updated this revision to Diff 92669.Mar 22 2017, 11:41 AM

Actually, let's make it even smaller.

This revision was automatically updated to reflect the committed changes.