Avoid excessive stack unwinding and depot lookups, both of which are expensive, by introducing smaller caches external to the depots.
The stack trace depot is cached by "stack trace hash" -- a TLS value constantly updated by instrumentation in the MSan LLVM pass. The origin depot is cached by a cheap 32-bit hash over the two IDs that make up the edge in the graph.
Each depot is accessed only after a miss in both an L1 and an L2 cache. The L1 is small and per-thread; the L2 is a few times larger and is shared by all threads. Both are direct-mapped, indexed by the hash. On a collision, the old entry is evicted.
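A minimal sketch of the two-level lookup described above, not the code in this patch. The slot counts, the names CachedLookup and SlowDepotLookup, the packed 64-bit entry layout, and the convention that id 0 means "empty slot" are all assumptions made for illustration.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical sizes; the real caches may use different ones.
constexpr std::size_t kL1Slots = 64;   // per-thread
constexpr std::size_t kL2Slots = 256;  // shared, a few times larger

// Stand-in for the expensive depot access. Counts calls so that
// cache hits and misses can be observed.
static std::atomic<int> g_depot_calls{0};
static uint32_t SlowDepotLookup(uint32_t hash) {
  g_depot_calls.fetch_add(1, std::memory_order_relaxed);
  return hash * 2u + 1u;  // dummy id; always odd, so never 0
}

// An entry packs the key (hash) in the high 32 bits and the id in the
// low 32 bits. Assumption: id 0 is never valid, so it marks an empty slot.
static uint64_t Pack(uint32_t key, uint32_t id) {
  return (static_cast<uint64_t>(key) << 32) | id;
}
static bool Matches(uint64_t entry, uint32_t key) {
  return static_cast<uint32_t>(entry) != 0 &&
         static_cast<uint32_t>(entry >> 32) == key;
}

uint32_t CachedLookup(uint32_t hash) {
  thread_local std::array<uint64_t, kL1Slots> l1{};
  static std::array<std::atomic<uint64_t>, kL2Slots> l2;  // zero-initialized

  // L1: direct-mapped, thread-local, no synchronization needed.
  uint64_t &s1 = l1[hash % kL1Slots];
  if (Matches(s1, hash)) return static_cast<uint32_t>(s1);

  // L2: direct-mapped, shared. A single atomic word per slot keeps
  // key and id consistent without a lock.
  std::atomic<uint64_t> &s2 = l2[hash % kL2Slots];
  uint64_t e = s2.load(std::memory_order_relaxed);
  if (Matches(e, hash)) {
    s1 = e;  // promote to L1, evicting whatever was there
    return static_cast<uint32_t>(e);
  }

  // Miss in both levels: go to the depot and overwrite both slots.
  uint32_t id = SlowDepotLookup(hash);
  e = Pack(hash, id);
  s2.store(e, std::memory_order_relaxed);
  s1 = e;
  return id;
}
```

In this sketch, two hashes that collide in the L1 but not in the L2 evict each other from the L1 only, so re-querying the first one is still served without touching the depot.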
To avoid stack unwinding on a cache hit, MSan no longer passes StackTrace objects among its internal functions. Instead, I use a new StackUnwindCtx object, which stores the parameters needed to unwind on demand.
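The idea of deferring the unwind can be sketched as follows. Only the name StackUnwindCtx comes from this change; its fields (pc, bp), the Unwind() method, the helper functions, and the single-slot cache are hypothetical stand-ins chosen to keep the example short.

```cpp
#include <cstdint>
#include <vector>

static int g_unwinds = 0;  // counts how often the expensive unwind runs

// Hypothetical stand-in for the real unwinder.
static std::vector<uintptr_t> UnwindStack(uintptr_t pc, uintptr_t bp) {
  ++g_unwinds;
  return {pc, bp};
}

// Holds only what is needed to unwind later; constructing it is cheap.
// The field set is an assumption, not the actual definition.
struct StackUnwindCtx {
  uintptr_t pc;
  uintptr_t bp;
  std::vector<uintptr_t> Unwind() const { return UnwindStack(pc, bp); }
};

// Hypothetical stand-in for inserting a trace into the depot.
static uint32_t g_next_id = 1;
static uint32_t StoreInDepot(const std::vector<uintptr_t> &trace) {
  (void)trace;
  return g_next_id++;
}

// Single-slot cache keyed by the stack trace hash, for illustration only.
// The unwind happens only on a miss.
uint32_t GetStackId(uint32_t stack_hash, const StackUnwindCtx &ctx) {
  static uint32_t cached_hash = 0;
  static uint32_t cached_id = 0;  // 0 means "empty" in this sketch
  if (cached_id != 0 && cached_hash == stack_hash) return cached_id;
  cached_id = StoreInDepot(ctx.Unwind());
  cached_hash = stack_hash;
  return cached_id;
}
```

Because callers hand over a context rather than an already-built trace, a hit returns before any frames are walked.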