This patch is adding the support for a shadow memory with
dynamically allocated address range.
The compiler-rt needs to export a symbol containing the shadow
memory range.
This is required to support ASAN on windows 64-bits.
Differential D23354
[asan] Support dynamic shadow address instrumentation etienneb on Aug 10 2016, 8:35 AM. Authored by
Details This patch is adding the support for a shadow memory with The compiler-rt needs to export a symbol containing the shadow This is required to support ASAN on windows 64-bits.
Diff Detail
Event Timeline
Comment Actions FYI: This should be landed with the patch I'm completing (compiler-rt changes to expose the shadow address).
Comment Actions Wait, wait, this utterly sucks. Comment Actions Windows memory implementation is not compatible with a static shadow address space without using ugly hacks. The shadow address spaces currently used for win64 is working on windows 10 but not on windows 7. When using a static address within the 8TB, there is no way to be sure the memory space is available. see: https://msdn.microsoft.com/en-us/library/jj835761.aspx /HIGHENTROPYVA By default, the linker sets this option for 64-bit executable images. I was able to make it work with a static shadow (within the 8TB limit), but executables need to turn off the high-entropy, which is a hack. After trying to find different way to hack the code to keep a static shadow, we decided to try the dynamic one and determine if it's working The only added cost is a load at every instrumented-function entry. Benchmark will come. I wanted to share this solution to start discussion. Comment Actions I'm pretty confident that Etienne has done a lot of research, and we have found that on Windows, DLLs are spread out widely across the 48-bit address space. We'll be lucky if its possible to satisfy a 16TB allocation, let alone at a fixed address. On top of that, Windows is a hostile environment where malware and AVs will come and inject DLLs into your process at unpredictable addresses. They don't stay clumped in the upper half like on Linux. We will need dynamic shadow, one way or another, if anyone ever wants to run an asan-ified executable to a machine that they don't control with any degree of reliability. Also, as he mentioned, having a dynamic shadow makes it easier to run the same ASan binary on pre-Win8.1 machines (44-bit address space) and Win8.1+ machines (48-bit address space). We could technically use one fixed offset for both address space layouts, but the size would be dynamic, so we would still need a lot of runtime complexity. I'm also skeptical that one load in the prologue is really that expensive. Use-after-return does a conditional branch in the prologue, and it's on by default now. Safestack does more work than that in the prologue, and your team found it had a negligible impact on performance. Anyway, Ettiene said he'd do a benchmark soon, we can how it goes. We really did spend a while thinking about this, and I really do think we want to bring back dynamic shadow mapping. Comment Actions All I've heard so far are very sad things, but they don't convince me. We must find a way to use static shadow. Comparing this change to whatever we do in the prologue is not correct since for dynamic shadow we BTW, why do we have to support all older windows versions? Comment Actions We're not trying to support all versions of Windows. But a significant portion of our user population is still on Win7/8/8.1, and not on Win10. We'd like to support them for as long as Chrome supports those platforms. No matter what static address you choose, there's a non-zero chance that something else will be there. Especially since all sorts of things out of control (third party code) likes to inject itself into Chrome's address space. Since we're trying to grab 12.5% of the address space, that chance is actually pretty high. In a lab it's fine for occasional failures to start due to address space collisions. On a users machine, not so much. SyzyASAN chose a dynamic shadow for this reason. We're gearing up to ship 64-bit ASAN to the canary Windows users, and shipping a product that can randomly fail to start is simply not going to happen. If the ability to run with a non-dynamic shadow is preserved as an instrumentation time choice, why can't we support both worlds? If you really want to tweak out the last small slice of performance in a lab scenario then feel free to use a static shadow. For shipping to users where the ability to reliably start trumps performance, then use a dynamic shadow. Comment Actions
Maybe that's not the right goal?
I perfectly understand that, but I have no expertise to know if there is no way to workaround this.
Of course.
We can. I am not convinced we've done everything to avoid it.
On the contrary. When testing Chrome in a lab, 10% of speed is "just" money. Comment Actions We performed benchmarking of static and dynamic instrumentation on windows and on linux 64-bits to compare speed and code size impact. Benchmarks used for the analysis are:
On windows 64-bits instrumented executables, we are observing a reduction in code size of about 3%. This can be explained by the fact that the shadow address is kept in a register, loaded in each function prologue. Thus, subsequent load instructions are smaller because they are not using 8-bytes for a fixed constant. The same gains are observed when using the dynamic instrumentation on linux 64-bits, even if the shadow address constant is able to fit in 4-bytes. Using different levels of optimisation (O0, O1 and O2) on linux 64-bits shows that there is a code size increase of about 20% on clang debug builds, but not on release builds. This loss is not observed on CPU2006 executables. By looking manually at a few functions, we figured out that functions with lots of local variables are creating high register pressure and the shadow address is not kept in a register and it is spilled on the stack, leading to an increase in function size. On the speed aspect we are observing a loss of about 3% on windows CPU2006 benchmarks. On the other benchmarks the speed variation is lost in the noise. Assembly code of static instrumentation: shr $0x3,%rax mov %rax,0x70(%rbx) cmpb $0x0,0x7fff8000(%rax) # bytes: 80 b8 00 80 ff 7f 00 (7-bytes) jne 591a90 <main+0x1140> Assembly code of dynamic instrumentation: prologue: lea 0x301d40(%rip),%rax # <__asan_shadow_memory_dynamic_address> mov (%rax),%rdx # %rdx contains shadow address [...] Instrumentation size: shr $0x3,%rax mov %rax,0x78(%rbx) cmpb $0x0,(%rax,%rdx,1) # bytes: 80 3c 10 00 (4-bytes, smaller) jne 58ecf9 <main+0x1139> We can observed the gain in code size on the cmpb instruction. Chromium code size benchmark results: CHROMIUM linux base static dynamic S/B D/B D/S bro 000311c6 001934c6 0018fb06 821% 814% 99% flatc 000635a6 002269f6 00220f66 554% 549% 99% chrome 04a49f06 110865d6 10ab2686 367% 359% 98% windows base static dynamic S/B D/B D/S bro 6A000 100000 FE000 242% 240% 99% flatc D6000 2BE000 2B1000 328% 322% 98% chrome 3E5B000 10FAD000 109FA000 436% 427% 98% c_child 3A09000 F69A000 F184000 425% 416% 98% On chromium benchmark, the code size is always slightly improving. CPU2006 codesize benchmark results: CPU2026 base static dynamic S/B D/B D/S windows /Od 470.lbm 21000 A3000 A3000 494% 494% 100% 482.sphinx3 55000 15A000 157000 407% 404% 99% 401.bzip2 2C000 EE000 EC000 541% 536% 99% /O1 470.lbm 1D000 97000 97000 521% 521% 100% 482.sphinx3 38000 F3000 F2000 434% 432% 100% 401.bzip2 1E000 AA000 AA000 567% 567% 100% /O2 470.lbm 1D000 97000 97000 521% 521% 100% 482.sphinx3 3C000 109000 107000 442% 438% 99% 401.bzip2 22000 BC000 BC000 553% 553% 100% On CPU2006 benchmarks, the code size is always slightly improving or about the same. Clang code size metrics: CLANG (Release) linux base static dynamic S/B D/B D/S clang-4.0 01df0c9a 0666a30a 063cebea 342% 333% 97% opt 00d48372 02d436aa 02c3fd3a 341% 333% 98% clang-format 000f3582 0053526a 00518e0a 548% 536% 98% llvm-link 001a34b2 00a4030a 00a04f1a 626% 612% 98% clang-tidy 00f682fa 0387f9ba 03700e2a 367% 357% 97% CLANG (Debug) linux base static dynamic S/B D/B D/S clang-4.0 043bbd30 04fa5b80 063cebea 118% 147% 125% opt 01aab360 02648e80 02c3fd3a 144% 166% 116% clang-format 0047aff0 0073a340 00518e0a 161% 114% 71% llvm-link 007618f0 00b2e8f0 00a04f1a 151% 136% 90% clang-tidy 026478d0 02d32dd0 03700e2a 118% 144% 122% On clang-benchmark, the code size is improving with Release builds and decreasing with Debug builds. Comment Actions Impressive measurement results, thank you.
Comment Actions thanks, could you also review this one: https://reviews.llvm.org/D23363 Comment Actions Re-opening this issue. [compiler-rt] r282096 - revert 282085, 281909, they broke 32-bit dynamic ASan and the sanitizer-windows bot I'm gonna fix it and re-land it. Comment Actions When testing with this patch on the Mac, I ran into a problem using -asan-force-dynamic-shadow. Specifically, Mapping.OrShadowOffset was not inferred correctly because it relies on checking kDynamicShadowSentinel and not the ClForceDynamicShadow. How about using ClForceDynamicShadow when setting the Mapping.Offset and gating the maybeInsertDynamicShadowAtFunctionEntry solely on the value of the Mapping.Offset? @@ -434,13 +434,19 @@ static ShadowMapping getShadowMapping(Triple &TargetTriple, int LongSize, ... else Mapping.Offset = kDefaultShadowOffset64; } + if (ClForceDynamicShadow) { + Mapping.Offset = kDynamicShadowSentinel; + } + Mapping.Scale = kDefaultShadowScale; if (ClMappingScale.getNumOccurrences() > 0) { Mapping.Scale = ClMappingScale; @@ -1802,7 +1808,7 @@ bool AddressSanitizer::maybeInsertAsanInitAtFunctionEntry(Function &F) { void AddressSanitizer::maybeInsertDynamicShadowAtFunctionEntry(Function &F) { // Generate code only when dynamic addressing is needed. - if (!ClForceDynamicShadow && Mapping.Offset != kDynamicShadowSentinel) + if (Mapping.Offset != kDynamicShadowSentinel) return; IRBuilder<> IRB(&F.front().front()); If you'd like, I can commit this as a separate patch. Comment Actions Thanks Anna. I modified this patch to include your changes. If you have some cycles, can you run the unittests with both patches on iOS. Tests are fine on windows 32/64 and linux 64. Comment Actions I've already tested this on iOS and Mac after applying additional patches to compiler-rt to get the FindAvailableMemoryRange working. I did not run into any regressions. Hope the commits stick this time around! |
Can we make a kDynamicShadowSentinel for this?