This is an archive of the discontinued LLVM Phabricator instance.

[Asan, Tsan] Generalize means the sanitizers work with memory
AbandonedPublic

Authored by kosarev on Mar 3 2017, 9:01 AM.

Details

Summary

This diff is here to give an overview of the patch that suggests moving toward adding support for MMU-less target platforms. This diff is not to be committed.

Kostya, thanks for your reply. Re: the "platforms that lack such support": we believe that this proposal affects many custom platforms based on modern IP cores. In some cases such platforms intentionally lack the support for virtual memory, but in all other respects are capable of and can benefit from using sanitizers. In our case we are targeting an ARM-based platform, which is well supported by other parts of LLVM already, so it's not about rare and/or exotic targets.


Here's the link to the original RFC message:
https://groups.google.com/d/msg/llvm-dev/Xi5Jl4s_Ofc/oboR4zkgAAAJ

Diff Detail

Repository
rL LLVM

Event Timeline

kosarev created this revision.Mar 3 2017, 9:01 AM
jlebar added a subscriber: jlebar.Mar 3 2017, 10:05 AM
dvyukov added inline comments.Mar 8 2017, 5:01 AM
lib/tsan/rtl/tsan_platform.h
636

I would suggest to keep the existing name as is.

Why do we need the second MemToPShadow? It is used in 2 places in tsan code which looks somewhat strange. Can't we have a single MemToShadow which maps user address to the address where shadow for that address is stored?

kosarev added inline comments.Mar 10 2017, 8:51 AM
lib/tsan/rtl/tsan_platform.h
636

Hello Dmitry. Thanks for reviewing. We keep both the P (for physical addresses) and V (for virtual addresses) variants of this function as on some platforms the backward P-to-V translation may be nontrivial/expensive or not available at all.

I had some success with elimination of ShadowToMem() for Asan, but for Tsan it doesn't seem to be that simple. So whenever we want to be able to restore application memory addresses by given shadow addresses, we probably have to maintain both the concepts of virtual and physical shadow addresses.

kosarev updated this revision to Diff 91568.Mar 13 2017, 9:06 AM

This update implements a simple software memory manager that allocates physical shadow pages with a usual heap malloc()/free()-like API. Thanks to eliminated calls to msync() and mmap() that we used to do on every shadow access, with this patch we pass Asan and Tsan tests about 12 times faster comparing to the previous version and just about 3 times slower than we do with the original code.

dvyukov edited edge metadata.Mar 13 2017, 11:49 AM

Few suggestions if you want to upstream this:

  • remove unchanged files, unnecessary ifdefs and todo's
  • start with just asan (tsan is more complex)
  • use the existing asan compiler instrumentation that inserts function calls, and do the rest in runtime (at least in the first version, this will greatly simplify the change)
lib/tsan/rtl/tsan_platform.h
636

I am still not following completely.
Both asan and tsan runtime has only 2 notions: applications memory (let's call it A) and shadow memory (let's call it S). Consequently they care only about 2 transitions: A->S and S->A.
Now a platform can do these transitions in an arbitrary complex way, potentially involving a third concept (say, a virtual memory V), or maybe a fourth or whatever. But the point is that is an implementation detail of a particular platform, the rest of the runtimes can still manipulate with only A and S and know nothing about V/P/whatever.
Now the question is: why do you want to disclose these implementation details about P/V to the rest of runtime? Do you want to store V addresses somewhere? Where?
Note that tsan seems to use ShadowToMem only in 1 place -- during race reporting. And you could just remember the application (A) address in thr->racy_shadow_addr in the new mode.

Will do. Thanks Dmitry.

lib/tsan/rtl/tsan_platform.h
636

Well, we cannot rely exclusively on virtual shadow addresses as sanitizers do not compute the virtual address of every shadow cell they access. Instead, they often translate the base address of a shadow memory block that corresponds to a given application memory address range and call a memory-block function like memset(). This means sanitizers expect adjacent application addresses to be mapped to the same or adjacent shadow memory addresses, which is not a quality that physical shadow addresses hold. So every time an RTL function takes a base shadow address and a size, that base address cannot be a physical one.

Furthermore, there are functions like FastPoisonShadowPartialRightRedzone() and GetStackFrameAccessByAddr() that potentially access lots of shadow cells, but do this in ways that are more complex than what we can do with the usual memory-block functions. Still, in terms of performance it would be expensive to translate the address of every shadow cell these functions access, hence the necessity to introduce the concept of the physical shadow page size.

dvyukov added inline comments.Apr 18 2017, 5:35 AM
lib/tsan/rtl/tsan_platform.h
636

Still, in terms of performance it would be expensive to translate the address of every shadow cell these functions access, hence the necessity to introduce the concept of the physical shadow page size.

This is the part that I did not understand.
So this is required to preserve performance of the current MMU mode and support NOMMU mode at the same time, right? Initially I read it as it is a performance optimization for the new NOMMU mode. But later I realized that it's hardly the case because VShadowToPShadow is the expensive part.

There is a trick that we use in kernel sanitizers (KMSAN/KTSAN) to solve a similar problem (inability to do continuous pre-allocated shadow). I am not completely sure if it works here, but I think we should try hard to employ it because (1) it will eliminate these pervasive changes throughout the runtimes, (2) it will give you way better performance.
The insight is that when runtime accesses a range of memory (whether it's a 16-byte access, memcpy, stack poisoning or whatever), the range always belongs to a single memory region (either a heap block, a thread stack, a mmap-ed region, etc). If such range crosses boundary of a single memory block then this a bug in user code, so crashing is OK. So we intercept all memory allocations (heap allocator requesting a new superblock, mmap, thread creation, etc) and map a continuous region of shadow for that user memory block. This gives us: (1) shadow memory is always continuous for ranges runtime accesses (so current code will work as is), (2) we don't need lazy init checks in MemToShadow as shadow is always pre-allocated (faster).

What do you think? Will it work in your case? Please give it a try.
Maybe we will need to change something in this approach or adapt it otherwise, but I see great value in making something like this work rather than introducing 2-phase shadow (great burden on all sanitizer developers).

kosarev added inline comments.Apr 24 2017, 7:03 AM
lib/tsan/rtl/tsan_platform.h
636

So this is required to preserve performance of the current MMU mode and support NOMMU mode at the same time, right? Initially I read it as it is a performance optimization for the new NOMMU mode. But later I realized that it's hardly the case because VShadowToPShadow is the expensive part.

Right, VShadowToPShadow() is expensive, since it's responsible for allocation of physical shadow memory. This is why we want to call it as little as possible. In addition to this, a VShadowToPShadow(vs) call only guarantees that the physical page that corresponds to the vs address is in place. So if we do not know where physical shadow pages begin and end, then we cannot say if accesses to shadow cells at (vs - 1) and (vs + 1) hit the same physical page and so we have to call VShadowToPShadow() for each of them, resulting in an unacceptable slowdown.

So we intercept all memory allocations (heap allocator requesting a new superblock, mmap, thread creation, etc) and map a continuous region of shadow for that user memory block.

Right, we actually do try to get things working this way. As of right now, it's not completely clear if this approach is viable in our situation, but we certainly will give it a try.

However, chances are we will need to proceed with per-access allocations, so I'm looking for ways to support it in mainline without making too much stress to sanitizer developers. One thing I'm thinking of is wrapping virtual shadow addresses that are currently presented as uptr variables with a class that hides the V-to-P translation and physical pages mechanics. It might be somewhat less efficient comparing to the originally proposed patch set, though still worth to try. Will let you know how it goes. Thanks.

kosarev abandoned this revision.Nov 28 2017, 8:53 AM
test/asan/TestCases/Linux/nohugepage_test.cc