Page MenuHomePhabricator

[tsan] Dynamically detect heap range for aarch64.
AbandonedPublic

Authored by yabinc on Jan 28 2016, 12:23 PM.

Details

Summary

After patch https://lkml.org/lkml/2015/12/21/340 is introduced in
linux kernel, the random gap between stack and heap is increased
from 128M to 36G on 39-bit aarch64. As it is hard to cover this
big range, I think we need to detect heap range dynamically.

Diff Detail

Event Timeline

yabinc updated this revision to Diff 46304.Jan 28 2016, 12:23 PM
yabinc retitled this revision from to [tsan] Dynamically detect heap range for 39-bit aarch64..
yabinc updated this object.

I only tested this on android with 39-bit aarch64 platform. Let me know if it breaks on other 39-bit aarch64 platforms, or if we have other ways to make things work.

Adhemerval, CC'd, should know better, as he was the one implementing VMA options for AArch64.

zatrazz edited edge metadata.Feb 1 2016, 5:41 AM

Checking on kernel patch it looks like it is not specific to 39-bit VMA, but to all kernel configuration.
So instead of just setting it to 39-bit VMA I would rather make it default for aarch64 on all current
supported VMAs.

lib/tsan/rtl/tsan_platform_linux.cc
274

llvm-lit is accusing "Line ends in whitespace. Consider deleting these extra spaces." for lini 272.

yabinc updated this revision to Diff 46604.Feb 1 2016, 5:54 PM
yabinc edited edge metadata.

Add support for 42-bit aarch64.

yabinc retitled this revision from [tsan] Dynamically detect heap range for 39-bit aarch64. to [tsan] Dynamically detect heap range for aarch64..Feb 1 2016, 5:56 PM

Done. I didn't add 42-bit version because I don't have a machine to test it.

rengolin added inline comments.Feb 2 2016, 2:00 AM
lib/tsan/rtl/tsan_platform.h
669

This sounds odd... I thought we had moved TSAN to dynamic VMA detection, too...

dvyukov edited edge metadata.Feb 3 2016, 7:02 AM

The mapping become more and more messy. But I don't see any simple solution, so I guess I am OK with this.
Are there any objections from ARM people?

We also have problems on x86, modules have moved to 0x5555 without ASRL.
If we agree to have 1 memory access in mem->shadow translation, we could split address space into, say, 256 parts and have arbitrary mapping between these parts. Something long the lines of:

uptr mapping[256];

uptr MemToShadow(uptr x) {

return (x << 2) ^ mapping[x >> kAddressShift];

}

This would allow a platform to have up to 50 regions with user memory arbitrary mapped onto 50*4 shadow regions.
Just thinking aloud.

lib/tsan/rtl/tsan_platform.h
610

It should be faster to use [kHeapMemBeg, kHeapMemBeg + kHeapMemSize) instead of [kHeapMemBeg, kHeapMemEnd). Because kHeapMemSize is a const.

yabinc updated this revision to Diff 46834.Feb 3 2016, 1:57 PM
yabinc edited edge metadata.

use kHeapMemBeg + kHeapMemSize for efficiency.

yabinc marked an inline comment as done.Feb 3 2016, 2:03 PM

I think using uptr mapping[256]; is a better solution to remove current complex logic.

I think using uptr mapping[256]; is a better solution to remove current complex logic.

Just in case, don't start working on it yet :)
That was just a quick idea, we need at least kcc@ input and benchmark numbers to decide (because currently x86 does not do any memory accesses during translation).

yabinc added a reviewer: enh.Feb 5 2016, 5:36 PM
srhines added a subscriber: srhines.Feb 5 2016, 5:58 PM
kcc edited edge metadata.Feb 8 2016, 5:16 PM

mapping[256];

I have not touched tsan for a while, but if my understanding is still correct an extra load on fast path may cost us 10% of CPU.
That's too much.
Don't do it unless you really have to, and get perf numbers before sending such a CL.

This change also introduces more instructions on the main path .
From reading the patch it seems to be AArch64-specific, so I am fine with that,
but I'd like to have the new functions like MemToShadowImplWithDynamicHeap to be under ifdef AArch64,
so that they can not be possibly used for x86

lib/tsan/rtl/tsan_platform.h
609

Here and below, can we use if (SANITIZER_GO) instead of #ifdef?

rengolin requested changes to this revision.Feb 9 2016, 1:45 AM
rengolin added a reviewer: rengolin.
In D16689#346952, @kcc wrote:

This change also introduces more instructions on the main path .
From reading the patch it seems to be AArch64-specific, so I am fine with that,

I'm not. :)

AArch64 already suffers from bloat from the VMA selection, adding more main path run time slow down is not the way to go, especially when that involved division and multiplication.

Also, these sources are outdated, the AArch64 part is not like that any more for a good number of months.

cheers,
--renato

This revision now requires changes to proceed.Feb 9 2016, 1:45 AM
yabinc updated this revision to Diff 48752.Feb 22 2016, 4:20 PM
yabinc edited edge metadata.

profiling results on N5X(aarch64) as below:
mini_bench_local
without tsan: 2.50s
tsan before: 131s
tsan after change: 138s

mini_bench_shared
without tsan: 3.60s
tsan before: 126s
tsan after change: 130s

start_many_threads
without tsan: 0.05s
tsan before: 0.39s
tsan after change: 0.47s

vts_many_threads
without tsan: 1.64s
tsan before: 165s
tsan after change: 163s

According to the results, the performance lose when accessing
memory may be about 5%, which seems acceptable.

Sorry for not updating this so long, I was struggling with other stuff.
I don't understand "these sources are outdated, the AArch64 part is not like that any more for a good number of months." When I downloaded the latest linux kernel, the code of arch/arm64/mm/mmap.c is not changed:

unsigned long arch_mmap_rnd(void)
{
unsigned long rnd;

#ifdef CONFIG_COMPAT
if (test_thread_flag(TIF_32BIT))

		rnd = (unsigned long)get_random_int() & ((1 << mmap_rnd_compat_bits) - 1);

else
#endif

		rnd = (unsigned long)get_random_int() & ((1 <<** mmap_rnd_bits**) - 1);

return rnd << PAGE_SHIFT;
}

And I think at least we should support it on android. The division and multiplication are in current code like MemToShadowImpl as well, and they should be compiled into shift instructions.

lib/tsan/rtl/tsan_platform.h
609

I use #if !defined(SANITIZER_GO) instead of #if !(SANITIZER_GO), because other places make compilation decision based on whether SANITIZER_GO is defined.

profiling results on N5X(aarch64) as below:
mini_bench_local
without tsan: 2.50s
tsan before: 131s
tsan after change: 138s

Ouch!

According to the results, the performance lose when accessing
memory may be about 5%, which seems acceptable.

5% of 5200% is 262% of the initial value!!

I personally don't think that 52x slow down on TSAN is acceptable as it is. This may be AArch64 specific, as I heard from Kostya 2 years ago that it was around 10x for x86_64, but it's definitely not usable.

Thread analysis depends on timing issues too much to make any analysis that runs 52x slower either bogus or very noisy.

This patch is adding a huge list of new instructions to the hot path, and I don't think this will help in any way.

As it happened before multiple times, I think we need to sit down and discuss this issue with kernel and sanitizer engineers. There will be a Linaro Connect in two weeks, and I'll bring that up with the kernel folks this topic, as well as try to reduce the already almost useless slow down.

cheers,
--renato

There will be a Linaro Connect in two weeks, and I'll bring that up with the kernel folks this topic, as well as try to reduce the already almost useless slow down.

It would be great if kernel provided a "sanitizer-friendly" mapping.

FWIW, some time ago we had support for so called "COMPAT" mapping for x86. In that mode kernel mapped modules around 0x2a0000000000. The shadow mapping for that mode was not a bijection, instead it was a injection. Several user regions mapped onto the same shadow, but during runtime only one of the user regions was occupied so no clashes happened. This allowed us to support huge user memory range (like in this case) and map it onto smaller shadow. The problem was with the reverse mapping, though (e.g. when we print racy address in report). You can find that code in history.

As recommended by dvyukov, I find that we can use personality(ADDR_NO_RANDOMIZE) to disable randomized virtual address space. The patch is uploaded at http://reviews.llvm.org/D18003

yabinc abandoned this revision.Mar 22 2016, 10:24 AM

This patch is uploaded because I misunderstood kHeapMem as mapped area. And the correct fix has been submitted in http://reviews.llvm.org/D18003.