
[MSAN] Cache stack traces and chained origins
Needs Review · Public

Authored by guiand on Aug 4 2020, 10:35 AM.

Details

Summary

Prevent excessive, expensive stack unwinding and depot lookups by introducing smaller caches external to the depots.

The stack trace depot is cached by "stack trace hash" -- a TLS value constantly updated by instrumentation in the MSAN LLVM pass. The origin depot is cached by a cheap 32-bit hash over the two IDs comprising the edge in the graph.

Each depot is accessed only after missing in both an L1 and an L2 cache. The L1 is small and per-thread; the L2 is a few times larger and is global to all threads. Both are direct-mapped using the hash. On collision, the old entry is evicted.
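A minimal sketch of the two-level direct-mapped cache described above. The sizes, names, and promotion behavior here are assumptions for illustration; the real caches live in compiler-rt and the L2 would need atomic accesses to be shared safely across threads.

```cpp
#include <cstdint>

constexpr uint32_t kL1Size = 64;    // small, per-thread (assumed size)
constexpr uint32_t kL2Size = 1024;  // a few times larger, global (assumed size)

struct Entry {
  uint32_t Hash = 0;
  uint32_t Id = 0;  // depot handle / origin id; 0 means empty
};

thread_local Entry L1[kL1Size];
Entry L2[kL2Size];  // the real global cache would use atomics

// Direct-mapped lookup: try L1, then L2, then fall back to the expensive
// depot access (SlowLookup). On a miss, the old entries are simply evicted.
template <typename F>
uint32_t Lookup(uint32_t Hash, F SlowLookup) {
  Entry &E1 = L1[Hash % kL1Size];
  if (E1.Hash == Hash && E1.Id) return E1.Id;
  Entry &E2 = L2[Hash % kL2Size];
  if (E2.Hash == Hash && E2.Id) {
    E1 = E2;  // promote to L1
    return E2.Id;
  }
  uint32_t Id = SlowLookup(Hash);  // expensive depot access
  E1 = E2 = Entry{Hash, Id};       // direct-mapped: overwrite on collision
  return Id;
}
```

With this shape, repeated lookups of the same hash never touch the depot after the first miss.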

To avoid stack unwinding entirely on a cache hit, MSan no longer passes StackTrace objects among its internal functions. Instead, I use a new StackUnwindCtx object that stores the parameters needed to unwind on demand.
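The deferred-unwinding idea can be sketched as follows. The cache, the depot insert, and the StackUnwindCtx fields here are simplified stand-ins, not the actual compiler-rt types; the point is only that unwinding happens solely on a cache miss.

```cpp
#include <cstdint>
#include <unordered_map>

// Stand-ins for the MSan cache and stack depot (assumed interfaces).
std::unordered_map<uint32_t, uint32_t> Cache;
int UnwindCount = 0;  // instrumentation for the example only

struct StackUnwindCtx {
  uintptr_t pc, bp;  // everything needed to unwind later, on demand
};

// Simulated slow path: unwind the stack and insert the trace into the depot.
uint32_t UnwindAndPutInDepot(const StackUnwindCtx &Ctx) {
  ++UnwindCount;
  return static_cast<uint32_t>(Ctx.pc);  // pretend depot id
}

uint32_t GetStackId(uint32_t TraceHash, const StackUnwindCtx &Ctx) {
  auto It = Cache.find(TraceHash);
  if (It != Cache.end())
    return It->second;  // cache hit: no unwinding performed at all
  uint32_t Id = UnwindAndPutInDepot(Ctx);
  Cache[TraceHash] = Id;
  return Id;
}
```

Passing the lightweight context instead of a materialized StackTrace is what makes the hit path cheap.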

Diff Detail

Event Timeline

guiand created this revision. Aug 4 2020, 10:35 AM
Herald added projects: Restricted Project, Restricted Project. Aug 4 2020, 10:35 AM
Herald added subscribers: llvm-commits, Restricted Project, jfb, hiraditya.
guiand requested review of this revision. Aug 4 2020, 10:35 AM
guiand added inline comments. Aug 4 2020, 10:37 AM
compiler-rt/lib/msan/msan_origin.h
128–129

I'm not really sure what to do about these use counts. Updating them in the depot somewhat defeats the purpose of trying to prevent depot accesses through caching.

eugenis added inline comments. Aug 6 2020, 5:26 PM
compiler-rt/lib/msan/msan_origin.h
128–129

The purpose of this is to limit the number of new chained origins that one access trace can create. We could cache this value in the L1 and L2, and when we are about to create a chained origin, double-check against the global stack depot and refresh the caches.

llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
1301

KMSan uses this SplitBlock thing to avoid visiting the prologue setup instructions, but it has a cost: it turns all allocas into dynamic allocas, as they are no longer in the entry basic block.

Try to avoid this by turning ActualFnStart into an instruction pointer; that should help with code size and maybe performance a little.

This could be what is causing the local shadow addresses to be pre-calculated.

3672

The same should be done on ResumeInst and probably more; otherwise this would keep generating unique hash values in exception-enabled code.

guiand updated this revision to Diff 284422. Aug 10 2020, 10:14 AM

Addressed comments. For handling the number of uses per stack trace, this uses a bit of a heuristic:

For some unique origin count O, some unique stack trace count T, and some origins/trace limit L:

check_limit = (O > T * L * 7/8)
if (check_limit) {
  hnd = FindHandle()
  if (hnd.uses() > L * 1/8)
    return early
}
inserted = PutChainedOrigin()
if (check_limit && inserted)
  hnd.uses()++
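The heuristic above can be rendered as a hedged C++ sketch. The depot interfaces (the handle's use count, PutChainedOrigin) are simplified placeholders for the real StackDepot API, and the concrete limit value is an assumption; the 7/8 and 1/8 thresholds are from the pseudocode.

```cpp
#include <cstdint>

constexpr uint64_t kLimit = 16384;  // origins-per-trace limit L (assumed value)

struct Handle {
  uint64_t Uses = 0;  // stand-in for the depot's per-trace use count
};

// Stand-in for the depot insert; returns true if the chained origin is new.
bool PutChainedOrigin() { return true; }

// O = unique origin count, T = unique stack trace count.
// Returns false when the early-return branch of the heuristic fires.
bool MaybeChainOrigin(uint64_t O, uint64_t T, Handle &Hnd) {
  // Only pay for the use-count check once we are close to the cap (7/8 of it).
  bool CheckLimit = O > T * kLimit / 8 * 7;
  if (CheckLimit && Hnd.Uses > kLimit / 8)
    return false;  // this trace has already chained enough origins; bail early
  bool Inserted = PutChainedOrigin();
  if (CheckLimit && Inserted)
    ++Hnd.Uses;  // count only when near the limit and the origin is new
  return true;
}
```

The design keeps the common case (far from the limit) free of any depot use-count traffic, which is the concern raised in the earlier inline comment.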
eugenis added inline comments. Aug 13 2020, 4:19 PM
llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp
1109

You need to move ActualFnStart below the new prologue; otherwise, e.g., the TLS store will get instrumented with an address check and a shadow store.

That's why I suggested making it an Instruction* and tweaking the visitor loop to skip earlier instructions in the starting BB.
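The suggested visitor tweak can be illustrated with toy stand-ins for LLVM's BasicBlock and Instruction (the real code would iterate `llvm::BasicBlock` instructions): start instrumenting at ActualFnStart instead of splitting the block, so prologue setup stays uninstrumented and static allocas remain in the entry block.

```cpp
#include <string>
#include <vector>

// Toy stand-ins, purely for illustration of the skip logic.
using Instruction = std::string;
using BasicBlock = std::vector<Instruction>;

std::vector<Instruction> Visited;

void visit(const Instruction &I) { Visited.push_back(I); }

// Visit a block, skipping everything before ActualFnStart (a pointer to an
// instruction inside the starting BB), so the new prologue is not
// instrumented with address checks or shadow stores.
void visitBlock(const BasicBlock &BB, const Instruction *ActualFnStart) {
  bool Started = false;
  for (const Instruction &I : BB) {
    if (!Started && &I != ActualFnStart)
      continue;  // prologue setup: leave uninstrumented
    Started = true;
    visit(I);
  }
}
```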

guiand updated this revision to Diff 285703. Aug 14 2020, 11:29 AM
guiand retitled this revision from [Draft][MSAN] Cache stack traces and chained origins to [MSAN] Cache stack traces and chained origins.

Updated to depend on https://reviews.llvm.org/D85985 (ActualFnStart becomes an Instruction *).

guiand updated this revision to Diff 285719. Aug 14 2020, 11:44 AM

Added a test for hash codegen.