This is an archive of the discontinued LLVM Phabricator instance.

tsan: capture shadow map start/end on init and reuse in reset
ClosedPublic

Authored by thanm on Jun 30 2022, 6:35 AM.

Details

Summary

Capture the computed shadow begin/end values at the point where
the shadow is first created and reuse those values on reset.

See https://github.com/golang/go/issues/53539#issuecomment-1168778740
for context; intended to help with updating the syso for
Go's windows/amd64 race detector.

DO NOT SUBMIT -- still experimental.

Diff Detail

Event Timeline

thanm created this revision.Jun 30 2022, 6:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2022, 6:35 AM
Herald added a subscriber: Enna1. · View Herald Transcript
thanm requested review of this revision.Jun 30 2022, 6:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2022, 6:35 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
dvyukov added inline comments.Jun 30 2022, 7:02 AM
compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
570–571

MapShadow is called multiple times by the Go runtime (as it grows heap), so we need to memorize superset of all mapped ranges.
It's somewhat problematic if the heap is not continuous.
Previously Go heap was continuous, but now I think it's not.
But I also see that below we still assume it's continuous for the meta_begin/end logic.

571

I think this needs some synchronization with DoReset.
Since MapShadow is called directly from Go, I think we can take any mutex that's locked by DoReset, e.g. thread registry mutex.

thanm added inline comments.Jun 30 2022, 7:49 AM
compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
570–571

MapShadow is called multiple times by the Go runtime (as it grows heap), so we need to memorize superset of all mapped ranges.
It's somewhat problematic if the heap is not continuous.
Previously Go heap was continuous, but now I think it's not.
But I also see that below we still assume it's continuous for the meta_begin/end logic.

570–571

I consulted with Austin.

According to them the Go runtime is trying to keep things contiguous when race mode is on, e.g.

https://cs.opensource.google/go/go/+/d3ffff2790059226ae3aa90856d687e138701b5c:src/runtime/malloc.go;l=456?q=malloc.go&ss=go%2Fgo

and it will error if this invariant can't be enforced:

https://cs.opensource.google/go/go/+/d3ffff2790059226ae3aa90856d687e138701b5c:src/runtime/malloc.go;l=613?q=malloc.go&ss=go%2Fgo

571

Not sure if I understand -- maybe you could be a bit more specific about what you had in mind?

From looking at https://github.com/llvm/llvm-project/blob/13fb97d68821e1948d176057ebee94f50cb05b62/compiler-rt/lib/tsan/rtl/tsan_rtl.cpp#L136 it appears that DoResetImpl already holds the thread registry mutex.

dvyukov added inline comments.Jul 1 2022, 12:14 AM
compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
570–571

This is good! Then we just need to track union of all ranges as a single start/end range (as we do for meta).

571

it appears that DoResetImpl already holds the thread registry mutex.

Yes, but MapShadow does not, thus there is race on the new global variables you added. To protect accesses to these variables, we need MapShadow to lock e.g. thread registry mutex.

thanm updated this revision to Diff 441766.Jul 1 2022, 12:05 PM

Added synchronization between MapShadow and DoReset.
Add logic to update mapped shadow range variables, modeled
after the corresponding code for the meta shadow.

thanm added a comment.Jul 1 2022, 12:08 PM

I've made an attempt to incorporate your suggestions.

Still doesn't seem to work with -buildmode=pie on windows-- first call to MapShadow fails on the first call. Here's a run with GORACE=verbosity=2 of a small himom-ish program:

C:\workdir\xxx>extcgohimom.exe
***** Running under ThreadSanitizer v3 (pid 1336) *****
=-= mapping shadow for (0x7ff618ffb1a0-0x7ff61905f1a0) at (0x100ec31ff0000-0x100ec320c0000)
==1336==ERROR: ThreadSanitizer failed to allocate 0x0000000d0000 (851968) bytes at 0x100ec31ff0000 (error code: 87)

I've made an attempt to incorporate your suggestions.

Still doesn't seem to work with -buildmode=pie on windows-- first call to MapShadow fails on the first call. Here's a run with GORACE=verbosity=2 of a small himom-ish program:

C:\workdir\xxx>extcgohimom.exe
***** Running under ThreadSanitizer v3 (pid 1336) *****
=-= mapping shadow for (0x7ff618ffb1a0-0x7ff61905f1a0) at (0x100ec31ff0000-0x100ec320c0000)
==1336==ERROR: ThreadSanitizer failed to allocate 0x0000000d0000 (851968) bytes at 0x100ec31ff0000 (error code: 87)

Oh, now I see what happens from this error message.
The memory address where the executable is loaded is just outside of the supported range:
https://github.com/llvm/llvm-project/blob/b2e9684fe4d10dbdd7679a29c6f971dc59ede3dd/compiler-rt/lib/tsan/rtl/tsan_platform.h#L388-L416
So when we calculate the corresponding shadow address 0x100ec31ff0000, it's simply outside of the canonical address range/address space. You can see it has 49-th bit set, while the address space is only 47 bits.

So, yes, the simplest solution for now is to disable PIE binaries.
You can leave a comment there that theoretically it's possible to support b/c e.g. for Linux/C we support that:
https://github.com/llvm/llvm-project/blob/b2e9684fe4d10dbdd7679a29c6f971dc59ede3dd/compiler-rt/lib/tsan/rtl/tsan_platform.h#L40-L51

dvyukov added inline comments.Jul 2 2022, 1:42 AM
compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
135

Please move this to the Context class in tsan_rtl.h and surround with #if SANITIZER_GO/#endif.
(it may require surrounding MapShadow with #if SANITIZER_GO as well, for some reason it's not yet).

206

This may work, but logically it's somewhat confusing. If mapped_shadow_captured is not set for Go, then we will still use ShadowBeg/ShadowEnd which will fail on windows. So it's not really about mapped_shadow_captured, it's about SANITIZER_GO or not.

575

Drop the comment if it's not relevant.

585

Shouldn't this set mapped_shadow_captured? Not sure how it works without it...

Logically, first_complete is the same as mapped_shadow_captured, right? Do we need both?

But also I would consider removing it entirely b/c mapped_shadow_begin/end are sufficient. I think we could also reduce both cases to a single case with:

if (!mapped_shadow_begin)
  mapped_shadow_begin = mapped_shadow_end = shadow_begin;
626

Use CHECK_GT(meta_end, mapped_meta_end)
Both more useful message and clear that it comes from tsan runtime.

thanm marked 2 inline comments as done.Jul 7 2022, 6:07 AM
thanm added inline comments.
compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
135

I've moved mapped_shadow_{begin,end} to fields of Context and guarded with #if SANITIZER_GO.

Just to be clear, you're saying that we only want the machinery tracking shadow mappings to kick in for Go, correct? E.g. in MapShadow for the !SANITIZER_GO case, we want the original unmodified shadow mapping behavior, e.g.

// Global data is not 64K aligned, but there are no adjacent mappings,
// so we can get away with unaligned mapping.
// CHECK_EQ(addr, addr & ~((64 << 10) - 1));  // windows wants 64K alignment
const uptr kPageSize = GetPageSizeCached();
uptr shadow_begin = RoundDownTo((uptr)MemToShadow(addr), kPageSize);
uptr shadow_end = RoundUpTo((uptr)MemToShadow(addr + size), kPageSize);
if (!MmapFixedNoReserve(shadow_begin, shadow_end - shadow_begin, "shadow"))
  Die();

Thanks.

206

OK, I've changed this to:

auto shadow_begin = ShadowBeg();
auto shadow_end = ShadowEnd();
#if SANITIZER_GO
CHECK_NEQ(0, ctxt->mapped_shadow_begin);
shadow_begin = ctxt->mapped_shadow_begin;
shadow_end = ctxt->mapped_shadow_end;
#endif
585

I'll see about removing mapped_shadow_captured, that makes sense. Thanks.

dvyukov added inline comments.Jul 7 2022, 11:08 PM
compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
135

Just to be clear, you're saying that we only want the machinery tracking shadow mappings to kick in for Go, correct?

Yes.

Apologies; had to move away from this to work on other things for a while (too many pots on the stove). I will rebase and post my latest version.

thanm updated this revision to Diff 446232.Jul 20 2022, 12:16 PM

Incorporate code review suggestions, add some
additional VPrintf(2, ...) calls.

This latest version is still not working as it should. Here is a run of "go test -race bytes" with GORACE=verbosity=2:

C:\workdir\xxx>go test -race bytes 
***** Running under ThreadSanitizer v3 (pid 484) *****
MapShadow for (0x140353080-0x1403dd080), begin/end: (0x102806a0000-0x102807c0000)
mapped meta shadow for (0x140353080-0x1403dd080) at (0x700a01a0000-0x700a01f0000)
MapShadow for (0xc000000000-0xc000400000), begin/end: (0x28000000000-0x28000800000)
ctx->mapped_shadow_{begin,end} = (0x0-0x0)
MapShadow begin/end = (0x28000000000-0x28000800000)
mapped meta shadow for (0xc000000000-0xc000400000) at (0x76000000000-0x76000200000)
ThreadSanitizer: growing sync allocator: 0 out of 1048576*1024
shadow_begin-shadow_end: (0x28000000000-0x28000800000)
==484==ERROR: ThreadSanitizer failed to allocate 0x000000800000 (8388608) bytes at 0x028000000000 (error code: 487)
failed to reset shadow memory
FAIL	bytes	1.000s
FAIL

The MmapFixedSuperNoReserve in DoResetImpl is failing with windows error code 487 ("Attempt to access invalid address"), not sure why.

The range looks correct, right?
Maybe something with MEM_RESERVE/COMMIT flags. Is it OK to COMMIT the same memory twice? Maybe it's only OK on UNIX systems.

thanm added a comment.Jul 21 2022, 5:11 AM

The range looks correct, right?
Maybe something with MEM_RESERVE/COMMIT flags. Is it OK to COMMIT the same memory twice? Maybe it's only OK on UNIX systems.

Yes, range seems ok. I will play with the flags and see what happens there (although I am just grasping at straws, don't have much windows expertise).

thanm added a comment.Jul 21 2022, 8:53 AM

Maybe something with MEM_RESERVE/COMMIT flags. Is it OK to COMMIT the same memory twice? Maybe it's only OK on UNIX systems.

Yes, range seems ok. I will play with the flags and see what happens there (although I am just grasping at straws, don't have much windows expertise).

Passing a zero value (instead of MEM_RESERVE|MEM_COMMIT) for that single Mmap call (the one from DoResetImpl) also results in an error, but this time with windows error code 87 ("The parameter is incorrect").

Passing just MEM_RESERVE (no MEM_COMMIT) gives the same error again (487).

Passing only MEM_COMMIT does seem to work (runtime/race package test looks ok, and race.bat is passing so far).

Looking at the docs (https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc):

"VirtualAlloc cannot reserve a reserved page. It can commit a page that is already committed. This means you can commit a range of pages, regardless of whether they have already been committed, and the function will not fail."

Seems like the patch forward is to change DoResetImpl call a new Mmap entry point that passes MEM_COMMIT but not MEM_RESERVE. Things are a little weird in that the function it currently invokes is named MmapFixedSuperNoReserve, but in spite of the "NoReserve" tag, MmapFixedSuperNoReserve does indeed pass MEM_RESERVE on windows(?). I am assuming that the NoReserve naming refers to MMAP_NORESERVE as opposed to windows MEM_RESERVE.

How about if I call a new entry point RepeatMmapFixed (indicating that this is a repeated mmap/VirtualAlloc), and the VirtualAlloc in RepeatMmapFixed can avoid passing MEM_RESERVE. Sound OK?

Yes, it's about MMAP_NORESERVE.
And I agree we need a new function.
The question is: does MEM_COMMIT alone _zero_ memory? Because that's the purpose of that mmap call. Alternatively we can just call something like internal_memset(0) I think. Not sure how slow/fast it is.
Re naming... not sure. We could also use "zero" in the name b/c that's the real intention.

thanm added a comment.Jul 22 2022, 6:48 AM

The question is: does MEM_COMMIT alone _zero_ memory? Because that's the purpose of that mmap call.

As far as I can tell, the answer seems to be "no". Which is a bit surprising, I would have thought this would have caused failures in the race detector tests.

Here is a toy program I wrote:

$ cat tryva.c

#include <stdio.h>
#include <memoryapi.h>

#define SZ 0x40000

void dump(const char *tag, int *imem) {
  fprintf(stderr, "%s\n", tag);
  fprintf(stderr, "i[0] is %d\n", imem[0]);
  fprintf(stderr, "i[16384] is %d\n", imem[16384]);
}

void scribble(int *imem) {
  imem[0] = 1;
  imem[16384] = 2;
}

int main() {
  LPVOID addr = 0xc000000000ull;
  SIZE_T siz = 0x40000;
  void *mem = VirtualAlloc(addr, siz, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
  fprintf(stderr, "mem is 0x%zx\n", mem);
  int *imem = ((int*) mem);
  dump("initial", imem);
  scribble(imem);
  dump("after write", imem);
  void *xmem = VirtualAlloc(addr, siz, MEM_COMMIT, PAGE_READWRITE);
  fprintf(stderr, "xmem is 0x%zx\n", xmem);
  dump("after second VirtualAlloc", imem);
  return 0;
}

and here's what happens when I run this on my windows machine:

C:\workdir\xxx>gcc -o tryva.exe -Wl,--no-dynamicbase tryva.c 
tryva.c:19:10: warning: incompatible integer to pointer conversion initializing 'LPVOID' (aka 'void *') with an expression of type 'unsigned long long' [-Wint-conversion]
  LPVOID addr = 0xc000000000ull;
         ^      ~~~~~~~~~~~~~~~
1 warning generated.

C:\workdir\xxx>tryva.exe
mem is 0xc000000000
initial
i[0] is 0
i[16384] is 0
after write
i[0] is 1
i[16384] is 2
xmem is 0xc000000000
after second VirtualAlloc
i[0] is 1
i[16384] is 2
C:\workdir\xxx>

Looks like internal_memset() is the way to go. I will pursue that.

thanm added a comment.Jul 22 2022, 7:01 AM

Looks like internal_memset() is the way to go. I will pursue that.

Actually, reading the docs a bit more I think we can get a memory clear just using

VirtualFree(addr, siz, MEM_DECOMMIT);
VirtualAlloc(addr, siz, MEM_COMMIT, PAGE_READWRITE);

I tested that with my C example program and it seems to work.

thanm updated this revision to Diff 446867.Jul 22 2022, 9:10 AM

Introduce new windows-specific function "ZeroMmapFixedRegion" for
zeroing out an address space region previously returned by one of the
MmapFixed* routines; call this function (on windows) from DoResetImpl
tsan_rtl.cpp instead of MmapFixedSuperNoReserve.

thanm added a comment.Jul 22 2022, 9:13 AM

OK, I've switched to using the VirtualFree/VirtualAlloc strategy for shadow reset on windows. PTAL.

What should I be using to test this other than Go's race detector tests? I'll run
'ninja check-tsan' on linux and windows; let me know if there are other tests I should include.

thanm updated this revision to Diff 446993.Jul 22 2022, 3:01 PM

Bug fixes.

OK, I've switched to using the VirtualFree/VirtualAlloc strategy for shadow reset on windows. PTAL.

This unfortunately won't work. There may be concurrent accesses during this operations. They will cause paging faults if we do VirtualFree.

What should I be using to test this other than Go's race detector tests? I'll run
'ninja check-tsan' on linux and windows; let me know if there are other tests I should include.

ninja check-tsan is always good.
I also always copy the new runtime into Go and then run 'go test -race -short std cmd'. And then optionally either run all benchmarks as well, or sync/runtime tests to stress tsan more.

compiler-rt/lib/sanitizer_common/sanitizer_win.cpp
241 ↗(On Diff #446993)

This unfortunately won't

compiler-rt/lib/tsan/go/build.bat
60 ↗(On Diff #446993)

There is already SANITIZER_WINDOWS.
There is also SANITIZER_WORDSIZE == 64, but I don't think it's necessary. Tsan supports only 64 bits, and that code does not look 64-bit specific anyway.

thanm updated this revision to Diff 447279.Jul 25 2022, 4:56 AM

Switch from VirtualAlloc/VirtualFree to internal_memset when
zeroing shadow memory.

thanm added inline comments.Jul 25 2022, 5:27 AM
compiler-rt/lib/sanitizer_common/sanitizer_win.cpp
241 ↗(On Diff #446993)

OK, sounds like internal_memset is the way to go. I will revise the patch.

compiler-rt/lib/tsan/go/build.bat
60 ↗(On Diff #446993)

Switched to SANITIZER_WINDOWS.

thanm added a comment.Jul 25 2022, 6:03 AM

FWIW looks like "tsan" and "check-tsan" are not legal targets for Windows (I assume this is WAI). cmake runs, but ninja cmd fails:

...
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE  
-- Configuring done
-- Generating done
-- Build files have been written to: C:/workdir/build

C:\workdir\build>C:\ProgramData\chocolatey\bin\ninja.exe tsan 
ninja: error: unknown target 'tsan', did you mean 'asan'?
Error running run: exit status 1

"go test -race -test.short std cmd" is clean.

Unfortunately it looks like I am getting a timeout for "go test -test.bench=. -race -test.v -test.count=1 context". Here is the run output at the point of the hang:

goos: windows
goarch: amd64
pkg: context
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkCommonParentCancel-4       	   21184	      5300 ns/op
BenchmarkWithTimeout/concurrency=40-4         	    6099	     23371 ns/op
BenchmarkWithTimeout/concurrency=4000-4       	    4008	     25585 ns/op

The next line should be

BenchmarkWithTimeout/concurrency=400000
BenchmarkWithTimeout/concurrency=400000-4     	   NNN	     MMM ns/op

Not sure what my strategy should be for debugging this.

thanm added a comment.Jul 25 2022, 6:55 AM

For the BenchmarkWithTimeout/concurrency=400000 problem, I tried running with GORACE=verbosity=2. Output looks like

...
ThreadSanitizer: growing sync allocator: 1171 out of 1048576*1024
ThreadSanitizer: growing sync allocator: 1172 out of 1048576*1024
ThreadSanitizer: growing sync allocator: 1173 out of 1048576*1024
ThreadSanitizer: growing sync allocator: 1174 out of 1048576*1024
ThreadSanitizer: suppressing report as doubled (addr)
ThreadSanitizer: suppressing report as doubled (addr)
<about 2500 more of this same message>

At which point it times out and the test is killed.

For the BenchmarkWithTimeout/concurrency=400000 problem, I tried running with GORACE=verbosity=2. Output looks like

...
ThreadSanitizer: growing sync allocator: 1171 out of 1048576*1024
ThreadSanitizer: growing sync allocator: 1172 out of 1048576*1024
ThreadSanitizer: growing sync allocator: 1173 out of 1048576*1024
ThreadSanitizer: growing sync allocator: 1174 out of 1048576*1024
ThreadSanitizer: suppressing report as doubled (addr)
ThreadSanitizer: suppressing report as doubled (addr)
<about 2500 more of this same message>

At which point it times out and the test is killed.

Coincidentally I just deleted the code that printed "suppressing report as doubled (addr)":
https://github.com/llvm/llvm-project/commit/7505cc301f718c3a4015595729609aadcd702890
Maybe try to re-sync. There is also another fix for some pathological slowdowns, though these are supposed to be extremely rare.

First question: does it pass with the current runtime?

Does it hang dead, or just very slow with the new runtime? Bots should run benchmarks with -benchtime=1ms. If you add prints to every iteration, does it progress?

Not sure what debugging tools you have on windows. On Linux I would first run it under perf to see where it's looping. Then gdb to capture more precise stacks and see what happens. Is there a way to SIGABRT on windows to see stacks? Then maybe import net/http/pprof.

thanm added a comment.Jul 26 2022, 5:16 AM

Coincidentally I just deleted the code that printed "suppressing report as doubled (addr)":
https://github.com/llvm/llvm-project/commit/7505cc301f718c3a4015595729609aadcd702890
Maybe try to re-sync. There is also another fix for some pathological slowdowns, though these are supposed to be extremely rare.

OK, I'll try syncing/rebasing and see if that helps.

First question: does it pass with the current runtime?

Yes.

Does it hang dead, or just very slow with the new runtime? Bots should run benchmarks with -benchtime=1ms. If you add prints to every iteration, does it progress?

Not sure what debugging tools you have on windows. On Linux I would first run it under perf to see where it's looping. Then gdb to capture more precise stacks and see what happens. Is there a way to SIGABRT on windows to see stacks? Then maybe import net/http/pprof.

Debugging tools: perf and gdb are unfortunately not available (working on windows... "the crime is its own punishment" as the saying goes). I will see if I can get some sort of profile via pprof.

From a logistics perspective, it would be great to get the current patch checked in and then I can work on the context problem as a follow on. Are you ok with that?

dvyukov accepted this revision.Jul 26 2022, 5:21 AM

From a logistics perspective, it would be great to get the current patch checked in and then I can work on the context problem as a follow on. Are you ok with that?

Yes.

Looks good to me with few nits.

compiler-rt/lib/sanitizer_common/sanitizer_win.cpp
239 ↗(On Diff #447279)

Update the comment.

compiler-rt/lib/tsan/go/build.bat
60 ↗(On Diff #446993)

Is this necessary? I think SANITIZER_WINDOWS must be defined by sanitizer_common headers already.

This revision is now accepted and ready to land.Jul 26 2022, 5:21 AM
thanm marked an inline comment as done.Jul 26 2022, 6:09 AM
thanm added inline comments.
compiler-rt/lib/sanitizer_common/sanitizer_win.cpp
239 ↗(On Diff #447279)

Done.

compiler-rt/lib/tsan/go/build.bat
60 ↗(On Diff #446993)

It definitely is needed. The SANITIZER_WINDOWS def is in sanitizer_platform.h, and the build.bat script does this:

type ^

tsan_go.cpp ^
..\rtl\tsan_interface_atomic.cpp ^
..\rtl\tsan_flags.cpp ^
..\rtl\tsan_md5.cpp ^
..\rtl\tsan_report.cpp ^
..\rtl\tsan_rtl.cpp ^
..\rtl\tsan_rtl_access.cpp ^

so at the point where gcc sees the tsan_rtp.cpp source, sanitizer_platform.h has not yet been included.

thanm marked an inline comment as done.Jul 26 2022, 6:11 AM

OK, I'll try syncing/rebasing and see if that helps.

Good news -- after the rebase the hang on the context benchmark is gone... whew. I'm getting a clean run of race.bat (benchmarks and all) now.

thanm updated this revision to Diff 447665.Jul 26 2022, 6:23 AM

Rebase to tip of main branch (fixes context package benchmark hang with -race).
Update comments.

This revision was landed with ongoing or failed builds.Jul 26 2022, 7:04 AM
This revision was automatically updated to reflect the committed changes.

OK, I'll try syncing/rebasing and see if that helps.

Good news -- after the rebase the hang on the context benchmark is gone... whew. I'm getting a clean run of race.bat (benchmarks and all) now.

Phew indeed!

dvyukov added inline comments.Jul 26 2022, 7:10 AM
compiler-rt/lib/tsan/go/build.bat
60 ↗(On Diff #446993)

sanitizer_platform.h should be included into tsan_rtl.h via normal C++ includes.
build.bat only chains source files together, but all headers are included as usual.

thanm added a comment.Jul 26 2022, 8:00 AM

Dmitry thanks for all your code review comments + suggestions, I really appreciate your help.

Dmitry thanks for all your code review comments + suggestions, I really appreciate your help.

You are welcome, Than.