This is an archive of the discontinued LLVM Phabricator instance.

hwasan: Implement lazy thread initialization for the interceptor ABI.
Closed (Public)

Authored by pcc on Dec 21 2018, 4:52 PM.

Details

Summary

The problem is similar to D55986 but for threads: a process with the
interceptor hwasan library loaded might have some threads started by
instrumented libraries and some by uninstrumented libraries, and we
need to be able to run instrumented code on the latter.

The solution is to perform per-thread initialization lazily. If a
function needs to access shadow memory or add itself to the per-thread
ring buffer, its prologue checks whether the value in the sanitizer
TLS slot is null; if so, it calls __hwasan_thread_enter and reloads
from the TLS slot. The runtime does the same thing if it needs to
access this data structure.
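The check-and-reload sequence described above can be sketched in plain C++. This is only an illustration, not the code the pass emits: sanitizer_tls_slot, slow_path_calls, fake_thread_enter and load_thread_state are hypothetical names, and fake_thread_enter merely stands in for __hwasan_thread_enter.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the sanitizer TLS slot (hypothetical name). Zero means
// "this thread has not been initialized yet".
thread_local std::uintptr_t sanitizer_tls_slot = 0;

// Counts slow-path entries on the current thread, for illustration only.
thread_local int slow_path_calls = 0;

// Hypothetical stand-in for __hwasan_thread_enter: sets up per-thread
// state and publishes a non-null value in the TLS slot.
void fake_thread_enter() {
  static thread_local char ring_buffer[64];
  sanitizer_tls_slot = reinterpret_cast<std::uintptr_t>(ring_buffer);
  ++slow_path_calls;
}

// Conceptual shape of an instrumented prologue: load the slot, test for
// null, initialize on the cold path, then reload.
inline std::uintptr_t load_thread_state() {
  std::uintptr_t value = sanitizer_tls_slot;  // hot path: a single load
  if (value == 0) {                           // cold path: first use on this thread
    fake_thread_enter();
    value = sanitizer_tls_slot;               // reload after initialization
  }
  assert(value != 0);
  return value;
}
```

Calling load_thread_state() twice on the same thread takes the slow path only on the first call; the second call sees the non-null slot and returns immediately.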

This change means that the code generator needs to know whether we
are targeting the interceptor runtime, since we don't want to pay
the cost of lazy initialization when targeting a platform with native
hwasan support. A flag -fsanitize-hwaddress-abi={interceptor,platform}
has been introduced for selecting the runtime ABI to target. The
default ABI is interceptor, on the assumption that users compile
application code more often than platform code.
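As a rough model of how the ABI choice gates the extra check, the sketch below parses the two flag values from the summary and decides whether a lazy-initialization check is needed. HwasanAbi, kDefaultAbi, parse_hwasan_abi and needs_lazy_thread_init are hypothetical names chosen for this example; the actual option handling in clang and the instrumentation pass may be structured differently.

```cpp
#include <string>

// Hypothetical model of the two runtime ABIs named by the flag.
enum class HwasanAbi { Interceptor, Platform };

// Interceptor is the default described in the summary.
const HwasanAbi kDefaultAbi = HwasanAbi::Interceptor;

// Parses the value given to -fsanitize-hwaddress-abi=. Returns false for
// anything other than the two documented spellings and leaves *out untouched.
bool parse_hwasan_abi(const std::string &value, HwasanAbi *out) {
  if (value == "interceptor") {
    *out = HwasanAbi::Interceptor;
    return true;
  }
  if (value == "platform") {
    *out = HwasanAbi::Platform;
    return true;
  }
  return false;
}

// Only the interceptor ABI has to tolerate threads whose TLS slot was
// never set up, so only it pays for the null check in the prologue.
bool needs_lazy_thread_init(HwasanAbi abi) {
  return abi == HwasanAbi::Interceptor;
}
```

With this model, code built for the platform ABI skips the null check entirely, which matches the stated goal of not paying for lazy initialization where hwasan support is native.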

Because instrumented code can no longer assume that the TLS slot is
initialized and now initializes it on demand, the pthread_create
interceptor is no longer necessary, so it has been removed.

Ideally, lazy initialization should only cost one instruction in the
hot path, but at present the call may cause us to spill arguments
to the stack, which means more instructions in the hot path (or
theoretically in the cold path if the spills are moved with shrink
wrapping). With an appropriately chosen calling convention for
the per-thread initialization function (TODO) the hot path should
always need just one instruction and the cold path should need two
instructions with no spilling required.

Event Timeline

pcc created this revision. Dec 21 2018, 4:52 PM
eugenis accepted this revision. Dec 27 2018, 2:38 PM

LGTM

llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp
827

This could jump back to the earlier load to save an instruction (at the cost of one extra not-taken branch on the slow path).

Or even better, return the new value from the custom calling convention variant of __hwasan_thread_enter. Not in this change, of course.

This revision is now accepted and ready to land. Dec 27 2018, 2:38 PM
pcc added inline comments. Jan 4 2019, 10:50 AM
llvm/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp
827

You're right that it could jump back, but I think that I'd prefer not to add this complexity just to remove it later when we switch to the new calling convention.

And yes, the function with the new calling convention would return the thread long value. On ARM64 the code would look like this:

    ldr x9, [x9, #64]
    cbz x9, .Linit
.Lcont:
    ...

.Linit:
    bl __hwasan_lazy_thread_enter // returns in x9
    b .Lcont

This revision was automatically updated to reflect the committed changes.