This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt] [tsan] Unify aarch64 mapping
ClosedPublic

Authored by zatrazz on Oct 30 2015, 6:42 AM.

Details

Summary

This patch unifies the 39-bit and 42-bit VMA support for AArch64 by using indirect
calls instead of inlining the address transformation functions. The already
defined mappings for both VMAs are used. Although slower, this allows the same
instrumented binary to be independent of the kernel configuration. It also makes
it easier to enable the remaining AArch64 VMA (48-bit).
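
For readers unfamiliar with the tsan mapping code, a minimal sketch of the indirect-call idea described above (the names below are hypothetical illustrations, not the patch's actual symbols):

typedef unsigned long uptr;

extern int vma_size;         // detected once at startup: 39 or 42

uptr MemToShadow39(uptr x);  // built from the 39-bit mapping constants
uptr MemToShadow42(uptr x);  // built from the 42-bit mapping constants

uptr MemToShadow(uptr x) {
  // Runtime dispatch instead of a compile-time selected inline expression,
  // so one instrumented binary works on both 39-bit and 42-bit kernels.
  return vma_size == 39 ? MemToShadow39(x) : MemToShadow42(x);
}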

Along with this change, the patch also fixes some 42-bit failures seen with
ASLR disabled by increasing the upper high app memory threshold and adjusting
the 42-bit madvise value for non-large-page configurations.

Regarding performance, I did a run of speccpu2006 using the 'test' input set
(mainly to evaluate faster), and the results show the new indirect calls are
roughly 4.6% slower than the inline version:

Benchmark        DIFF (%)
401.bzip2        5.12
403.gcc          5.24
429.mcf          1.00
445.gobmk        3.36
456.hmmer        6.72
458.sjeng        6.15
462.libquantum   4.55
464.h264ref      6.82
473.astar        5.68
483.xalancbmk    6.77
GEOMEAN          4.62

PS: 400.perlbench fails due to missing longjmp-family instrumentation for
aarch64, and 471.omnetpp fails due to the known issue with user-defined
new/delete operators.

Diff Detail

Event Timeline

zatrazz updated this revision to Diff 38804.Oct 30 2015, 6:42 AM
zatrazz retitled this revision from to [compiler-rt] [tsan] Unify aarch64 mapping.
zatrazz updated this object.
zatrazz added a subscriber: llvm-commits.
zatrazz updated this object.Oct 30 2015, 6:47 AM
rengolin updated this object.Oct 30 2015, 6:51 AM
rengolin updated this object.
dvyukov added inline comments.Nov 4 2015, 4:11 AM
lib/tsan/rtl/tsan_platform.h
159

You introduce both indirect functions and these non-const variables. This looks excessive; let's do one or the other. I would prefer global variables, as that looks less intrusive.

You can do something along the lines of:

Here declare the consts:

extern uptr kLoAppMemBeg;
...

And then in InitializePlatformEarly initialize them:

kLoAppMemEnd = vma_39_42(0x0000400000ull, 0x00000400000ull);

And that's it.
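
A compact sketch of this suggestion (vma_39_42 here is a hypothetical helper that returns its first argument on a 39-bit kernel and its second on a 42-bit kernel):

// tsan_platform.h: declare the mapping values (no longer compile-time consts).
extern uptr kLoAppMemBeg;
extern uptr kLoAppMemEnd;
// ... remaining mapping values.

// tsan_platform_linux.cc: initialize them once, before anything uses them.
void InitializePlatformEarly() {
  kLoAppMemEnd = vma_39_42(0x0000400000ull, 0x00000400000ull);
  // ... likewise for kLoAppMemBeg, the kHiAppMem* and kShadow* values.
}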

175

All identifiers with double underscores are reserved for the language implementation. TSAN_ is a fine prefix and is also consistent with the other macros in the codebase.

580

Please rename this to InitializePlatformEarly.
I don't understand what "Modules" refers to (dynamic libraries?). And "Specific" is redundant: "Platform" already implies platform-specific behavior.

zatrazz updated this revision to Diff 39185.Nov 4 2015, 4:23 AM

Changes from previous version:

  • Increased the application VMA range for both the 39-bit and 42-bit VMAs. The 39-bit range is now 0x7D00000000-0x7FFFFFFFFF (previously 0x7F00000000-0x7FFFFFFFFF) and the 42-bit range is 0x3F000000000-0x3FFFFFFFFFF (previously 0x3FF00000000-0x3FFFFFFFFFF);
  • Fixed a printf format specifier in test/tsan/test.h.
zatrazz added inline comments.Nov 4 2015, 1:02 PM
lib/tsan/rtl/tsan_platform.h
159

I declared them in the header because I used the constants in the function definitions in lib/tsan/rtl/tsan_platform_linux.cc. I can move them to the .cc file if that is desirable, but removing the definitions would require hardcoding their values in the function definitions somehow (and I think the current approach is simpler than replicating the hexadecimal values in several places).

Also, I use them so that adding another VMA (48-bit) would not require much work: instead of adding 'vma_39_42' now and later renaming it to 'vma_39_42_48', the patch would only need to check for and add the 48-bit definitions.

175

I will change it.

580

I will change it (I used 'modules' and 'specific' because I could not come up with better names).

zatrazz updated this revision to Diff 39338.Nov 5 2015, 3:37 AM

This is an updated version based on comments:

  • Moved the aarch64 const VMA definitions from the platform header to the platform file.
  • Renamed InitializePlatformInitModules to InitializePlatformEarly.
dvyukov added inline comments.Nov 5 2015, 4:33 AM
lib/tsan/rtl/tsan_platform.h
157

But can we have just global vars? That seems to be enough: the same generic functions would just operate on different values stored in the vars.

I am just frightened by the size of this change and the amount of logical indirection and macros.

zatrazz added inline comments.Nov 5 2015, 5:52 AM
lib/tsan/rtl/tsan_platform.h
157

We can, but it will be slower: every address transformation would then require not only an indirect call but also memory reads to fetch the values of the global variables. The idea of creating multiple functions in this patch is to minimize the performance hit of that indirection by keeping each function as fast as possible.
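
A rough illustration of the trade-off being argued here (kShadowCell and kShadowCnt are existing tsan constants; everything else is a placeholder name):

// (a) One generic function reading non-const globals: every call must first
//     load kAppMemMsk and kAppMemXor from memory before transforming x.
extern uptr kAppMemMsk;
extern uptr kAppMemXor;

uptr MemToShadowGeneric(uptr x) {
  return ((x & ~(kAppMemMsk | (kShadowCell - 1))) ^ kAppMemXor) * kShadowCnt;
}

// (b) One function per VMA size: the same expression, but with the mask and
//     xor values as compile-time constants folded into the code, so the only
//     runtime cost is choosing which function to call.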

zatrazz updated this revision to Diff 39353.Nov 5 2015, 5:58 AM

Small update on the previous patch to initialize kMadviseRangeBeg and kMadviseRangeSize.

dvyukov added inline comments.Nov 5 2015, 8:20 AM
lib/tsan/rtl/tsan_platform.h
143–144

Won't it then be faster to inline all 3 variants?

E.g.

struct Mapping39 {
  static const uptr kLoAppMemBeg = 0x...;
  static const uptr kLoAppMemEnd = 0x...;
  static const uptr kShadowBeg = 0x...;
  ...
};

struct Mapping42 {
  static const uptr kLoAppMemBeg = 0x...;
  static const uptr kLoAppMemEnd = 0x...;
  static const uptr kShadowBeg = 0x...;
  ...
};

template<typename Mapping>
uptr MemToShadowImpl(uptr x) {
  return (((x) & ~(Mapping::kAppMemMsk | (kShadowCell - 1)))
      ^ Mapping::kAppMemXor) * kShadowCnt;
}

uptr MemToShadow(uptr x) {
  if (vma_size == 39)
    return MemToShadowImpl<Mapping39>(x);
  else if (vma_size == 42)
    return MemToShadowImpl<Mapping42>(x);
  DCHECK(0);
  return 0;
}

I suspect it can be faster since it is inlinable. And the compiler will probably be able to do some common-subexpression merging, so the code won't be too large.

zatrazz added inline comments.Nov 5 2015, 8:28 AM
lib/tsan/rtl/tsan_platform.h
143–144

I will check this out, thanks!

zatrazz updated this revision to Diff 39514.Nov 6 2015, 5:11 AM

The mapping using inline functions is indeed slightly faster than the previous attempt (I used the same benchmark as before, speccpu2006 with the test input size). Changes from the previous version:

  • Use a different strategy for the mapping functions: instead of indirect functions, use inline versions selected by the VMA size detected at runtime (see the sketch after this list).
  • Fixed the 39-bit shadow memory upper bound.
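
A hedged sketch of what the runtime VMA detection could look like, assuming the MostSignificantSetBitIndex and GET_CURRENT_FRAME helpers from sanitizer_common (this is an illustration of the approach, not necessarily the committed code):

void InitializePlatformEarly() {
#if defined(__aarch64__)
  // Derive the VMA size from the highest set bit of a current stack address
  // (the stack lives near the top of the usable address space).
  vmaSize = MostSignificantSetBitIndex(GET_CURRENT_FRAME()) + 1;
  if (vmaSize != 39 && vmaSize != 42) {
    Printf("FATAL: ThreadSanitizer: unsupported VMA range\n");
    Printf("FATAL: Found %zd - Supported 39 and 42\n", vmaSize);
    Die();
  }
#endif
}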
dvyukov edited edge metadata.Nov 9 2015, 3:00 AM

Much nicer now. Thanks.

Do we still need these extern vars in the header:
extern uptr kLoAppMemBeg
?
I would prefer to get rid of any remaining usages of these and introduce more functions instead if needed.
They introduce a significant amount of boilerplate. And they are no longer necessarily const, so it can be dangerous (performance-wise) to use them, even occasionally, on some performance-critical path.

Also, we now have duplication of complex code between MemToShadow for x86 and MemToShadowImpl for power.
Please move the x86 consts into a Mapping struct as well. Then we can have a single MemToShadowImpl implementation and define MemToShadow as:

uptr MemToShadow(uptr x) {
#ifdef power
  if (vmaSize == 39)
    return MemToShadowImpl<Mapping39>(x);
  else
    return MemToShadowImpl<Mapping42>(x);
#else
  return MemToShadowImpl<Mapping>(x);
#endif
}

zatrazz updated this revision to Diff 39903.Nov 11 2015, 5:48 AM
zatrazz edited edge metadata.

Changes from previous patch:

  • Unify all implementations using the Mapping scheme.
  • Remove redundant code in the Go implementation by unifying the Windows and non-Windows transformations.
  • All accesses to the mapping const definitions now go through functions. For architectures with only one mapping this does not change code generation.

For architectures with a static mapping I still had to define kHeapMemBeg and kHeapMemEnd because they are used in the SizeClassAllocator64 definition. I think a further cleanup is possible (probably when I adjust tsan for the aarch64 48-bit VMA).

Sorry for the long delays.
I've looked at it several times and it still feels somewhat awkward, but at the same time I cannot come up with anything better (we could remove a lot of the boilerplate and replication with advanced C++ templates, but I doubt it is worth it; advanced C++ templates are an issue in themselves). So probably some amount of duplication and boilerplate here is the way to go.
I will take a final look today.

dvyukov added inline comments.Nov 17 2015, 8:44 AM
lib/tsan/rtl/tsan_platform.h
224

UserRegions is used only once during startup, so there is no point in having both a const and a dynamic version of it. Also, you initialize the dynamic variable only on Linux, but it actually needs to be initialized whenever TSAN_RUNTIME_VMA is set (e.g. on mac/aarch64 in the future).

Please leave only one version and define it in this header.
Something along the lines of:

bool GetUserRegion(int i, uptr *start, uptr *end) {
  switch (i) {
  default:
    return false;
  case 0:
    *start = LoAppMemBeg();
    *end = LoAppMemEnd();
    return true;
  case 1:
    ...
  case 2:
    ...
  }
}

And loop in CheckShadowMapping until this function returns false.
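
The loop shape being asked for would be roughly the following (the body is abbreviated; the real CheckShadowMapping does more per region):

void CheckShadowMapping() {
  uptr beg, end;
  for (int i = 0; GetUserRegion(i, &beg, &end); i++) {
    // For each application region, verify that the MemToShadow()/
    // ShadowToMem() transformations round-trip as expected.
  }
}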

272

Why !SANITIZER_WINDOWS?
There is also no windows-Go-specific code, so these definitions must be necessary either on all Go platforms or on no Go platforms.

lib/tsan/rtl/tsan_platform_posix.cc
50–60

vmaSize should already be set by now, so use it.
Add a

} else {
  CHECK(0);
}

branch below. It will catch both the case where vmaSize is somehow not initialized by now and the case where support for a new VMA size is added.
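
A hedged sketch of the requested structure in InitializeShadowMemory (kMadviseRangeBeg/kMadviseRangeSize are the names used earlier in this thread; the per-VMA values here are placeholders, not the patch's actual constants):

uptr kMadviseRangeBeg = 0;
uptr kMadviseRangeSize = 0;
if (vmaSize == 39) {
  kMadviseRangeBeg = /* 39-bit range start */ 0;
  kMadviseRangeSize = /* 39-bit range size */ 0;
} else if (vmaSize == 42) {
  kMadviseRangeBeg = /* 42-bit range start */ 0;
  kMadviseRangeSize = /* 42-bit range size */ 0;
} else {
  // Catches an uninitialized vmaSize as well as a newly added VMA size
  // whose branch has not been written yet.
  CHECK(0);
}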

zatrazz added inline comments.Nov 19 2015, 7:02 AM
lib/tsan/rtl/tsan_platform.h
224

I will change that.

272

This is wrong, I will fix it.

lib/tsan/rtl/tsan_platform_posix.cc
50–60

I will change that.

zatrazz updated this revision to Diff 40643.Nov 19 2015, 7:04 AM

Changes from previous version:

  • Remove the UserRegions struct and change the interface to call GetUserRegion.
  • Remove a wrong Windows SANITIZER_GO define.
  • Use vmaSize in InitializeShadowMemory for aarch64.
dvyukov accepted this revision.Nov 20 2015, 11:40 AM
dvyukov edited edge metadata.

LGTM with nits

sorry for delays and thanks for bearing with me

lib/tsan/rtl/tsan_platform.h
248

#ifndef SANITIZER_GO
that's more common throughout tsan

256

#else

315

drop !SANITIZER_WINDOWS part

lib/tsan/rtl/tsan_rtl.cc
291

please replace this with:

uptr beg, end;
for (int i = 0; GetUserRegion(i, &beg, &end); i++) {

it is shorter and more idiomatic for loops.

This revision is now accepted and ready to land.Nov 20 2015, 11:40 AM
zatrazz closed this revision.Nov 26 2015, 5:33 AM
eugenis added inline comments.Dec 9 2015, 4:00 PM
lib/tsan/rtl/tsan_platform.h
129

I've noticed that this mapping does not have an application region at 0x55...
That's where PIE executables are mapped by the newer kernels.
Did you test this on an older kernel? (I don't know exactly when the transition happened; it was around 4.1 for x86 and AFAIK a bit earlier for AArch64.)

See http://reviews.llvm.org/D15308 for more context.

zatrazz added inline comments.Dec 10 2015, 8:30 AM
lib/tsan/rtl/tsan_platform.h
129

I did check PIE executables with a full TSAN test built with '-pie -fpic' and I saw no issue. Analyzing the process mappings for 39-bit using mutexset1.cc (I picked it at random) I see:

  • Non-pie build:

00400000-004d1000 r-xp 00000000 08:02 21008174 /home/adhemerval.zanella/llvm/llvm-git-aarch64-build-release/mutexset1
004e0000-004e3000 r--p 000d0000 08:02 21008174 /home/adhemerval.zanella/llvm/llvm-git-aarch64-build-release/mutexset1
004e3000-004e6000 rw-p 000d3000 08:02 21008174 /home/adhemerval.zanella/llvm/llvm-git-aarch64-build-release/mutexset1

  • PIE build:

7fae119000-7fae1ee000 r-xp 00000000 08:02 21008174 /home/adhemerval.zanella/llvm/llvm-git-aarch64-build-release/mutexset1
7fae1ee000-7fae1fb000 rw-p 00000000 00:00 0
7fae1fd000-7fae200000 r--p 000d4000 08:02 21008174 /home/adhemerval.zanella/llvm/llvm-git-aarch64-build-release/mutexset1
7fae200000-7fae203000 rw-p 000d7000 08:02 21008174 /home/adhemerval.zanella/llvm/llvm-git-aarch64-build-release/mutexset1

So at least for 39-bit, PIE addresses are being covered. I will check again on the 42-bit VMA.

zatrazz added inline comments.Dec 10 2015, 8:53 AM
lib/tsan/rtl/tsan_platform.h
129

And I checked on a fairly recent kernel, 3.19. I am not aware whether the PIE mappings have changed in more recent kernels, nor have I checked on them. Do you have more information?

eugenis added inline comments.Dec 10 2015, 11:14 AM
lib/tsan/rtl/tsan_platform.h
129

See this bug for the linux 4.1.2 problem with MSan:
https://llvm.org/bugs/show_bug.cgi?id=24155

I see 0x55... mappings with a 3.10.40 kernel on Android/AArch64.

git tag --contains in the torvalds repo tells me that this change appears in v4.1 and newer. Android must have cherry-picked it in 3.10.

What happens if you disable ASLR? We used to get the main executable at 0x55... under gdb (i.e. with disabled randomization) long before 3.9.

zatrazz added inline comments.Dec 11 2015, 3:33 AM
lib/tsan/rtl/tsan_platform.h
129

My previous analysis was in fact wrong; even with the kernels I am currently using (3.19 for 39-bit and 3.17 for 42-bit) I am seeing:

  • 0x00000000000-0x00010000000: program's own segments (39-bit and 42-bit)
  • 0x05500000000-0x05600000000: 39-bit PIE program segments
  • 0x07f80000000-0x07fffffffff: 39-bit library segments
  • 0x2aa00000000-0x2ab00000000: 42-bit PIE program segments
  • 0x3ff00000000-0x3ffffffffff: 42-bit library segments

Fortunately I could adjust the aarch64 mapping to include these segments within the current scheme. I am preparing a patch.