This is an archive of the discontinued LLVM Phabricator instance.

[scudo] Add Android support
ClosedPublic

Authored by cryptoad on Apr 28 2017, 10:25 AM.

Details

Summary

This change adds Android support to the allocator (but doesn't yet enable it in
the cmake config), and should be the last fragment of the rewritten change
D31947.

Android has more memory constraints than other platforms, so the idea of a
unique context per thread would not have worked. The alternative chosen is to
allocate a set of contexts based on the number of cores on the machine, and
share those contexts within the threads. Contexts can be dynamically reassigned
to threads to prevent contention, based on a scheme suggested by @dvyuokv in
the initial review.

Additionally, given that Android doesn't support ELF TLS (only emutls for now),
we use the TSan TLS slot to make things faster: Scudo is mutually exclusive
with other sanitizers so this shouldn't cause any problem.

An additional change made here, is replacing thread_local by THREADLOCAL
and using the initial-exec thread model in the non-Android version to prevent
extraneous weak definition and checks on the relevant variables.

Event Timeline

cryptoad created this revision.Apr 28 2017, 10:25 AM
dvyukov added inline comments.Apr 29 2017, 9:39 AM
lib/scudo/scudo_tls.h
38

This one liner is used only once, just inline it there. Additional levels of indirection do not help understanding code.

lib/scudo/scudo_tls_android.h
46 ↗(On Diff #97121)

This is duplicated from tsan, right? Please move this bulky chunk of tricky assembly into sanitizer_common (along with TLS_SLOT_TSAN). No reason to have 2 copies of a complex code, and we can potentially use it in asan and later in msan. Something like:

uptr *get_android_tls_ptr();
84 ↗(On Diff #97121)

By sanitizer convention functions ending with Locked mean that function call normally needs to be protected with a mutex, but this is version for the case when caller already holds the mutex. For such case we generally use AndLock suffix (e.g. GetIfExistsAndLock).
Please rename this to getThreadContextAndLock. It will also make relation between Lock and Unlock much clearer in the following code:

ScudoThreadContext *ThreadContext = getThreadContextAndLock();
if (LIKELY(ThreadContext)) {
  ...
  ThreadContext->unlock();
}
91 ↗(On Diff #97121)

I would move everything starting from this line to getThreadContextLockedSlow. I think this is where slow path starts.
Then we also don't need ThreadContexts and NumberOfContexts in header.

Thanks Dmitry. Additional question in the comment before making the code modifications.

lib/scudo/scudo_tls.h
38

Will do.
Additionally, UnusedSince and Mutex are only used for Android. Is worth adding an #if to not have them in the non-Android version, or is the clutter not worth it?

lib/scudo/scudo_tls_android.h
46 ↗(On Diff #97121)

Correct. TSan uses only the AArch64 & x64 versions, and I needed them all (mips being bordeline but still included).
I will move it to sanitizer_common.

84 ↗(On Diff #97121)

Will do.

91 ↗(On Diff #97121)

Will do.

dvyukov added inline comments.Apr 29 2017, 2:07 PM
lib/scudo/scudo_tls.h
38

I am fine either way.
I've noticed the android-specific data and methods, but couldn't think of an elegant, non-overdesigned solution...

cryptoad updated this revision to Diff 97287.May 1 2017, 8:05 AM

Addressing Dmitry's comments, except for the the __get_tls() part which is
being worked on in a separate patch. Once it's done, I will update this patch
again.

cryptoad marked 7 inline comments as done.May 1 2017, 8:06 AM
cryptoad updated this revision to Diff 97482.May 2 2017, 12:01 PM

Modified to leverage the new get_android_tls_ptr() function introduced in
sanitizer_common with D32705.

cryptoad marked 2 inline comments as done.May 2 2017, 12:02 PM
cryptoad updated this revision to Diff 97650.May 3 2017, 8:21 AM

Removing debug Printf line that mistakenly made its way in the patch.

cryptoad updated this revision to Diff 97878.May 4 2017, 2:23 PM

Getting rid of some extra spaces.

alekseyshl added inline comments.May 5 2017, 11:06 AM
lib/scudo/scudo_tls_android.h
28 ↗(On Diff #97878)

Can we rename UnusedSince to something else? It's a bit confusing to read the code, it is set when the context is, in fact, in use (by some other thread). How about InSlowLockQueueSince or SlowLockOrder or SlowLockPrecedence? Two latter ones abstract the fact that we use timespamp, which, to me, is a good thing,

lib/scudo/scudo_tls_linux.h
0

Is the explicit model necessary?

cryptoad added inline comments.May 5 2017, 11:17 AM
lib/scudo/scudo_tls_android.h
28 ↗(On Diff #97878)

Good idea. Will do.

lib/scudo/scudo_tls_linux.h
0

I talked to Dmitry offline about this.
When looking at the assembly, the portion of codes using ScudoThreadState looked like (on all archs):

                mov     r15, cs:_ZTHN7__scudo16ScudoThreadStateE_ptr
                test    r15, r15
                jz      short loc_7FD57
                call    __ZTHN7__scudo16ScudoThreadStateE
loc_7FD57:                              ; CODE XREF: __scudo::scudoMalloc(ulong,__scudo::AllocType)+20 j
                mov     r13, 0FFFFFFFFFFFFFFC0h
                cmp     byte ptr fs:[r13+0], 0
                jz      loc_80020
loc_7FD6A:                              ; CODE XREF: __scudo::scudoMalloc(ulong,__scudo::AllocType)+2F5 j

The compiler creates a weak symbol for the extern thread_local variables, looks them up, and if they exist, calls them, then proceeds with the regular flow of code.
The overhead has no use (those weak symbols are never defined), and goes away by explicitly specifying the model: it just read it from the TLS.

cryptoad updated this revision to Diff 97994.EditedMay 5 2017, 12:01 PM

Renaming UnusedSince to SlowLockPrecedence.

Also, since I am not 100% sure that the case where all the contexts would have
a 0 SlowLockPrecedence cannot happen, change the CHECK_NE to a fallback to
the context currently assigned to the thread.

alekseyshl added inline comments.May 5 2017, 12:13 PM
lib/scudo/scudo_tls.h
32

You don't need public here.

46

How about structuring it this way:

namespace __scudo {

#include "scudo_tls_context_android.inc"
#include "scudo_tls_context_linux.inc"

struct ALIGNED(64) ScudoThreadContext : public ScudoThreadContextPlatform {
  AllocatorCache Cache;
  Xorshift128Plus Prng;
  uptr QuarantineCachePlaceHolder[4];
};

void initThread();

// Fastpath functions are defined in the following platform specific headers.
#include "scudo_tls_android.inc"
#include "scudo_tls_linux.inc"

}  // namespace __scudo

You can define lock/unlock/etc functions inline on the struct and Linux version might not need them defined at all (hard to tell for me off the bat).

lib/scudo/scudo_tls_linux.h
0

Good to know. My understanding was that compiler is supposed to select the most restricted model which works for the code being compiled. Apparently, it's not always the case.

cryptoad updated this revision to Diff 98014.May 5 2017, 1:49 PM

Splitting the thread context definition based on the underlying platform.

cryptoad updated this revision to Diff 98015.May 5 2017, 2:02 PM

Reshuffling public & private, to leverage the default public-ness of structs.

cryptoad updated this revision to Diff 98016.May 5 2017, 2:05 PM

Forgot one last public.

alekseyshl accepted this revision.May 5 2017, 2:24 PM

LGTM

lib/scudo/scudo_tls_android.cpp
74

I'd call it just 'Precedence', to match ThreadContext. 'LowestPrecedence' usually means 'the last to pick' and here it's the other way.

This revision is now accepted and ready to land.May 5 2017, 2:24 PM
cryptoad updated this revision to Diff 98023.May 5 2017, 2:28 PM
cryptoad marked 6 inline comments as done.

Addressing Aleksey's comment (s/LowestPrecedence/Precedence).

cryptoad closed this revision.May 5 2017, 2:51 PM