This is an archive of the discontinued LLVM Phabricator instance.

tsan: fix bug in shadow reset introduced in D128909
ClosedPublic

Authored by thanm on Aug 5 2022, 5:37 AM.

Details

Summary

Correct a bug in the code that resets shadow memory introduced as part
of a previous change for the Go race detector (D128909). The bug was
that only the most recently added shadow segment was being reset, as
opposed to the entire extent of the segment created so far. This
fixes a bug identified in Google internal testing (b/240733951).

Diff Detail

Event Timeline

thanm created this revision.Aug 5 2022, 5:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 5 2022, 5:37 AM
Herald added a subscriber: Enna1. · View Herald Transcript
thanm requested review of this revision.Aug 5 2022, 5:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 5 2022, 5:37 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
dvyukov added inline comments.Aug 5 2022, 5:53 AM
compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
203–204

Do we need the ctx->mapped_shadow_begin variable at all?
It's used in MapShadow, but only to update itself. So it looks we can remove it now.

But also what if the mapped region (ctx->mapped_shadow_begin) is far from ShadowBeg? Won't we mmap/memset potentially huge region (terabytes)? Maybe we need to fix how we update ctx->mapped_shadow_begin instead?

E.g. instead of:

if (ctx->mapped_shadow_begin < shadow_begin)
  ctx->mapped_shadow_begin = shadow_begin;

do something like:

if (!ctx->mapped_shadow_begin || ctx->mapped_shadow_begin > shadow_begin)
   ctx->mapped_shadow_begin = shadow_begin;

?

thanm added inline comments.Aug 5 2022, 7:01 AM
compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
203–204

You are the expert here, I am arriving late on the scene. Happy to try your second suggestion.

Won't we mmap/memset potentially huge region (terabytes)?

What confuses me here is that for regular C/C++ tsan (e.g. not SANITIZER_GO) it's going to default to clearing ShadowBegin() through ShadowEnd(), which presumably is a very large region. Wouldn't your concern apply there too?

dvyukov added inline comments.Aug 5 2022, 7:42 AM
compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
203–204

You are the expert here, I am arriving late on the scene. Happy to try your second suggestion.

It just looks like mapped_shadow_begin is currently maintained incorrectly (does not really point to the beginning of the used shadow region). Otherwise I don't know how to explain that extreme slowdown in the test we observed.

Wouldn't your concern apply there too?

C++ tsan does not work on Windows. And Windows is the reason we added these mapped_shadow_begin/end. On unixes it's fine to mmap terabytes of virtual address space.

thanm added inline comments.Aug 5 2022, 7:46 AM
compiler-rt/lib/tsan/rtl/tsan_rtl.cpp
203–204

Right, of course.

OK, this strategy:

if (!ctx->mapped_shadow_begin || ctx->mapped_shadow_begin > shadow_begin)
   ctx->mapped_shadow_begin = shadow_begin;

looks good-- fixes the google test timeout and seems to be clean with race.bat on windows. I will update the patch.

thanm updated this revision to Diff 450289.Aug 5 2022, 7:48 AM

Switch shadow clear back to [mapped_shadow_begin, mapped_shadow_end]
but revise recipe for updating mapped_shadow_begin.

dvyukov accepted this revision.Aug 5 2022, 7:53 AM
This revision is now accepted and ready to land.Aug 5 2022, 7:53 AM
This revision was landed with ongoing or failed builds.Aug 5 2022, 8:38 AM
This revision was automatically updated to reflect the committed changes.