After patch https://lkml.org/lkml/2015/12/21/340 is introduced in
linux kernel, the random gap between stack and heap is increased
from 128M to 36G on 39-bit aarch64. As it is hard to cover this
big range, I think we need to detect heap range dynamically.
Details
Diff Detail
Event Timeline
I only tested this on android with 39-bit aarch64 platform. Let me know if it breaks on other 39-bit aarch64 platforms, or if we have other ways to make things work.
Adhemerval, CC'd, should know better, as he was the one implementing VMA options for AArch64.
Checking on kernel patch it looks like it is not specific to 39-bit VMA, but to all kernel configuration.
So instead of just setting it to 39-bit VMA I would rather make it default for aarch64 on all current
supported VMAs.
lib/tsan/rtl/tsan_platform_linux.cc | ||
---|---|---|
274 | llvm-lit is accusing "Line ends in whitespace. Consider deleting these extra spaces." for lini 272. |
lib/tsan/rtl/tsan_platform.h | ||
---|---|---|
667 | This sounds odd... I thought we had moved TSAN to dynamic VMA detection, too... |
The mapping become more and more messy. But I don't see any simple solution, so I guess I am OK with this.
Are there any objections from ARM people?
We also have problems on x86, modules have moved to 0x5555 without ASRL.
If we agree to have 1 memory access in mem->shadow translation, we could split address space into, say, 256 parts and have arbitrary mapping between these parts. Something long the lines of:
uptr mapping[256];
uptr MemToShadow(uptr x) {
return (x << 2) ^ mapping[x >> kAddressShift];
}
This would allow a platform to have up to 50 regions with user memory arbitrary mapped onto 50*4 shadow regions.
Just thinking aloud.
lib/tsan/rtl/tsan_platform.h | ||
---|---|---|
610 | It should be faster to use [kHeapMemBeg, kHeapMemBeg + kHeapMemSize) instead of [kHeapMemBeg, kHeapMemEnd). Because kHeapMemSize is a const. |
I think using uptr mapping[256]; is a better solution to remove current complex logic.
I think using uptr mapping[256]; is a better solution to remove current complex logic.
Just in case, don't start working on it yet :)
That was just a quick idea, we need at least kcc@ input and benchmark numbers to decide (because currently x86 does not do any memory accesses during translation).
mapping[256];
I have not touched tsan for a while, but if my understanding is still correct an extra load on fast path may cost us 10% of CPU.
That's too much.
Don't do it unless you really have to, and get perf numbers before sending such a CL.
This change also introduces more instructions on the main path .
From reading the patch it seems to be AArch64-specific, so I am fine with that,
but I'd like to have the new functions like MemToShadowImplWithDynamicHeap to be under ifdef AArch64,
so that they can not be possibly used for x86
lib/tsan/rtl/tsan_platform.h | ||
---|---|---|
609 | Here and below, can we use if (SANITIZER_GO) instead of #ifdef? |
I'm not. :)
AArch64 already suffers from bloat from the VMA selection, adding more main path run time slow down is not the way to go, especially when that involved division and multiplication.
Also, these sources are outdated, the AArch64 part is not like that any more for a good number of months.
cheers,
--renato
profiling results on N5X(aarch64) as below:
mini_bench_local
without tsan: 2.50s
tsan before: 131s
tsan after change: 138s
mini_bench_shared
without tsan: 3.60s
tsan before: 126s
tsan after change: 130s
start_many_threads
without tsan: 0.05s
tsan before: 0.39s
tsan after change: 0.47s
vts_many_threads
without tsan: 1.64s
tsan before: 165s
tsan after change: 163s
According to the results, the performance lose when accessing
memory may be about 5%, which seems acceptable.
Sorry for not updating this so long, I was struggling with other stuff.
I don't understand "these sources are outdated, the AArch64 part is not like that any more for a good number of months." When I downloaded the latest linux kernel, the code of arch/arm64/mm/mmap.c is not changed:
unsigned long arch_mmap_rnd(void)
{
unsigned long rnd;
#ifdef CONFIG_COMPAT
if (test_thread_flag(TIF_32BIT))
rnd = (unsigned long)get_random_int() & ((1 << mmap_rnd_compat_bits) - 1);
else
#endif
rnd = (unsigned long)get_random_int() & ((1 <<** mmap_rnd_bits**) - 1);
return rnd << PAGE_SHIFT;
}
And I think at least we should support it on android. The division and multiplication are in current code like MemToShadowImpl as well, and they should be compiled into shift instructions.
lib/tsan/rtl/tsan_platform.h | ||
---|---|---|
609 | I use #if !defined(SANITIZER_GO) instead of #if !(SANITIZER_GO), because other places make compilation decision based on whether SANITIZER_GO is defined. |
Ouch!
According to the results, the performance lose when accessing
memory may be about 5%, which seems acceptable.
5% of 5200% is 262% of the initial value!!
I personally don't think that 52x slow down on TSAN is acceptable as it is. This may be AArch64 specific, as I heard from Kostya 2 years ago that it was around 10x for x86_64, but it's definitely not usable.
Thread analysis depends on timing issues too much to make any analysis that runs 52x slower either bogus or very noisy.
This patch is adding a huge list of new instructions to the hot path, and I don't think this will help in any way.
As it happened before multiple times, I think we need to sit down and discuss this issue with kernel and sanitizer engineers. There will be a Linaro Connect in two weeks, and I'll bring that up with the kernel folks this topic, as well as try to reduce the already almost useless slow down.
cheers,
--renato
There will be a Linaro Connect in two weeks, and I'll bring that up with the kernel folks this topic, as well as try to reduce the already almost useless slow down.
It would be great if kernel provided a "sanitizer-friendly" mapping.
FWIW, some time ago we had support for so called "COMPAT" mapping for x86. In that mode kernel mapped modules around 0x2a0000000000. The shadow mapping for that mode was not a bijection, instead it was a injection. Several user regions mapped onto the same shadow, but during runtime only one of the user regions was occupied so no clashes happened. This allowed us to support huge user memory range (like in this case) and map it onto smaller shadow. The problem was with the reverse mapping, though (e.g. when we print racy address in report). You can find that code in history.
As recommended by dvyukov, I find that we can use personality(ADDR_NO_RANDOMIZE) to disable randomized virtual address space. The patch is uploaded at http://reviews.llvm.org/D18003
This patch is uploaded because I misunderstood kHeapMem as mapped area. And the correct fix has been submitted in http://reviews.llvm.org/D18003.
Here and below, can we use if (SANITIZER_GO) instead of #ifdef?