This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/Transforms/Instrumentation/
-
Transforms/
-
Instrumentation/
9/10
AddressSanitizer.cpp
-
test/Instrumentation/AddressSanitizer/
-
Instrumentation/
-
AddressSanitizer/
-
basic-msvc64.ll

Differential D23354

[asan] Support dynamic shadow address instrumentation
ClosedPublic

Authored by etienneb on Aug 10 2016, 8:35 AM.

Download Raw Diff

Details

Reviewers

kcc
vitalybuka
rnk

Commits

rG0ca05686049d: [asan] Support dynamic shadow address instrumentation
rG6ba5176862dc: [asan] Support dynamic shadow address instrumentation
rL282881: [asan] Support dynamic shadow address instrumentation
rL281908: [asan] Support dynamic shadow address instrumentation

Summary

This patch is adding the support for a shadow memory with
dynamically allocated address range.

The compiler-rt needs to export a symbol containing the shadow
memory range.

This is required to support ASAN on windows 64-bits.

Diff Detail

Build Status

Buildable 25
Build 25: arc lint + arc unit

Event Timeline

etienneb updated this revision to Diff 67526.Aug 10 2016, 8:35 AM

etienneb retitled this revision from to [compiler-rt] Suport dynamic shadow address instrumentation.

etienneb updated this object.

etienneb added reviewers: rnk, kcc.

etienneb added subscribers: • chrisha, llvm-commits.

Herald added a subscriber: dberris. · View Herald TranscriptAug 10 2016, 8:35 AM

nits

more nits

moar nits

rnk added inline comments.Aug 10 2016, 9:53 AM

lib/Transforms/Instrumentation/AddressSanitizer.cpp
464	Can we make a kDynamicShadowSentinel for this?
945	LLVM typically uses right leaning pointers and references.
1822	Why create a local alloca and then load from it? This might end up being really slow in -O0 if we reload the shadow base from the stack before every user memory access. If we know we're in the entry block, we should be able to skip the alloca and change this to: Value *GlobalDynamicAddress = F.getParent()->getOrInsertGlobal( kAsanShadowMemoryDynamicAddress, IntptrTy); LocalDynamicShadow = IRB.CreateLoad(GlobalDynamicAddress); We can then skip the extra load in memToShadow and use LocalDynamicShadow directly.

fix rnk@ comments

FYI: This should be landed with the patch I'm completing (compiler-rt changes to expose the shadow address).

lib/Transforms/Instrumentation/AddressSanitizer.cpp
1822	Turns out that it's more efficient to put it on the stack for -O1 and -O2. The alloca-slot is promoted to register for the scope of the function. This is more efficient than having a load from memory for every instrumented load/store. Even for /O0, the code should be better. For every instrumented load/store, the code is slighty better (loading from the stack instead of global address). The only added cost is the required instructions to copy the shadow address on the stack.

rnk added inline comments.Aug 10 2016, 10:42 AM

lib/Transforms/Instrumentation/AddressSanitizer.cpp
1822	I'm not proposing loading from the global on every shadow memory access, I'm saying you should load once from the global in the entry block and use that load everywhere, rather than spilling it to the stack immediately and reloading before every shadow check. With O1 and higher, mid-level optimizers will immediately do this transform for you, but it's simpler to emit the efficient code in the first place. I'm not sure what will happen at -O0 since the fast register allocator is pretty bad.

etienneb added inline comments.Aug 10 2016, 10:50 AM

lib/Transforms/Instrumentation/AddressSanitizer.cpp
1822	As discussed offline, as soon as the entry block dominates all other basic block.... using the result from "load" in the entry block should be enough. I'm gonna make a patch to try it. thx rnk for the explanation.

Wait, wait, this utterly sucks.
I need to know that we've exhausted all other possibilities before reviewing this.
Did we?

etienneb retitled this revision from [compiler-rt] Suport dynamic shadow address instrumentation to [compiler-rt] Support dynamic shadow address instrumentation.Aug 10 2016, 11:53 AM

etienneb edited edge metadata.

In D23354#511569, @kcc wrote:

Wait, wait, this utterly sucks.
I need to know that we've exhausted all other possibilities before reviewing this.
Did we?

Windows memory implementation is not compatible with a static shadow address space without using ugly hacks.

The shadow address spaces currently used for win64 is working on windows 10 but not on windows 7.
On different windows version, the address space size is not the same (8 TB vs 128 TB).
see: https://msdn.microsoft.com/en-ca/library/windows/desktop/aa366778(v=vs.85).aspx

When using a static address within the 8TB, there is no way to be sure the memory space is available.
The DLLs are loaded at random location. The high-entropy randomisation is activated and DLLs are loaded everywhere in the memory space.

see: https://msdn.microsoft.com/en-us/library/jj835761.aspx

/HIGHENTROPYVA
By default, the linker sets this option for 64-bit executable images.

I was able to make it work with a static shadow (within the 8TB limit), but executables need to turn off the high-entropy, which is a hack.
Turning off high-entropy is not recommended and even if it's working I can't guaranty it will always work. We observed that DLLs where
loaded in the lower address space.

After trying to find different way to hack the code to keep a static shadow, we decided to try the dynamic one and determine if it's working
and what is the extra cost added.

The only added cost is a load at every instrumented-function entry.
There is an added loaded and a register holding the shadow address is used;
which I believe won't make a huge difference.

Benchmark will come. I wanted to share this solution to start discussion.

In D23354#511569, @kcc wrote:

Wait, wait, this utterly sucks.
I need to know that we've exhausted all other possibilities before reviewing this.
Did we?

I'm pretty confident that Etienne has done a lot of research, and we have found that on Windows, DLLs are spread out widely across the 48-bit address space. We'll be lucky if its possible to satisfy a 16TB allocation, let alone at a fixed address.

On top of that, Windows is a hostile environment where malware and AVs will come and inject DLLs into your process at unpredictable addresses. They don't stay clumped in the upper half like on Linux. We will need dynamic shadow, one way or another, if anyone ever wants to run an asan-ified executable to a machine that they don't control with any degree of reliability.

Also, as he mentioned, having a dynamic shadow makes it easier to run the same ASan binary on pre-Win8.1 machines (44-bit address space) and Win8.1+ machines (48-bit address space). We could technically use one fixed offset for both address space layouts, but the size would be dynamic, so we would still need a lot of runtime complexity.

I'm also skeptical that one load in the prologue is really that expensive. Use-after-return does a conditional branch in the prologue, and it's on by default now. Safestack does more work than that in the prologue, and your team found it had a negligible impact on performance. Anyway, Ettiene said he'd do a benchmark soon, we can how it goes.

We really did spend a while thinking about this, and I really do think we want to bring back dynamic shadow mapping.

All I've heard so far are very sad things, but they don't convince me.
I believe we have Microsoft folks in the community now. Please summon some of them to this discussion.

We must find a way to use static shadow.
My estimate is that dynamic shadow will cost us 10% in CPU and 5% in code size at the very least.
These are just guesses, so benchmark results are more than welcome.

Comparing this change to whatever we do in the prologue is not correct since for dynamic shadow we
a) steal a register for the entire function
b) complicate the address arithmetic for every memory access.

BTW, why do we have to support all older windows versions?
For me it would be fine to only support whatever the current version is and the future ones.
At least as the first step.

In D23354#511712, @kcc wrote:

All I've heard so far are very sad things, but they don't convince me.
I believe we have Microsoft folks in the community now. Please summon some of them to this discussion.

We must find a way to use static shadow.
My estimate is that dynamic shadow will cost us 10% in CPU and 5% in code size at the very least.
These are just guesses, so benchmark results are more than welcome.

Comparing this change to whatever we do in the prologue is not correct since for dynamic shadow we
a) steal a register for the entire function
b) complicate the address arithmetic for every memory access.

BTW, why do we have to support all older windows versions?
For me it would be fine to only support whatever the current version is and the future ones.
At least as the first step.

We're not trying to support all versions of Windows. But a significant portion of our user population is still on Win7/8/8.1, and not on Win10. We'd like to support them for as long as Chrome supports those platforms.

No matter what static address you choose, there's a non-zero chance that something else will be there. Especially since all sorts of things out of control (third party code) likes to inject itself into Chrome's address space. Since we're trying to grab 12.5% of the address space, that chance is actually pretty high.

In a lab it's fine for occasional failures to start due to address space collisions. On a users machine, not so much. SyzyASAN chose a dynamic shadow for this reason. We're gearing up to ship 64-bit ASAN to the canary Windows users, and shipping a product that can randomly fail to start is simply not going to happen.

If the ability to run with a non-dynamic shadow is preserved as an instrumentation time choice, why can't we support both worlds? If you really want to tweak out the last small slice of performance in a lab scenario then feel free to use a static shadow. For shipping to users where the ability to reliably start trumps performance, then use a dynamic shadow.

We're not trying to support all versions of Windows. But a significant portion of our user population is still on Win7/8/8.1, and not on Win10. We'd like to support them for as long as Chrome supports those platforms.

Maybe that's not the right goal?
Win7/8/8.1 tend to have slower machines and giving them asan-ified binaries *might* be a bad idea anyway.

No matter what static address you choose, there's a non-zero chance that something else will be there. Especially since all sorts of things out of control (third party code) likes to inject itself into Chrome's address space. Since we're trying to grab 12.5% of the address space, that chance is actually pretty high.

I perfectly understand that, but I have no expertise to know if there is no way to workaround this.
This is why I am asking someone from Microsoft to help.

In a lab it's fine for occasional failures to start due to address space collisions. On a users machine, not so much. SyzyASAN chose a dynamic shadow for this reason. We're gearing up to ship 64-bit ASAN to the canary Windows users, and shipping a product that can randomly fail to start is simply not going to happen.

Of course.

If the ability to run with a non-dynamic shadow is preserved as an instrumentation time choice, why can't we support both worlds?

We can. I am not convinced we've done everything to avoid it.

If you really want to tweak out the last small slice of performance in a lab scenario then feel free to use a static shadow. For shipping to users where the ability to reliably start trumps performance, then use a dynamic shadow.

On the contrary. When testing Chrome in a lab, 10% of speed is "just" money.
On a user machine it might be a difference between tolerable and intolerable UI speed.
Same for binary size.

address rnk@ comment

We performed benchmarking of static and dynamic instrumentation on windows and on linux 64-bits to compare speed and code size impact.

Benchmarks used for the analysis are:

Chromium release build,
Clang debug and release build,
and some executables of the CPU2006 suite.

On windows 64-bits instrumented executables, we are observing a reduction in code size of about 3%. This can be explained by the fact that the shadow address is kept in a register, loaded in each function prologue. Thus, subsequent load instructions are smaller because they are not using 8-bytes for a fixed constant.

The same gains are observed when using the dynamic instrumentation on linux 64-bits, even if the shadow address constant is able to fit in 4-bytes.

Using different levels of optimisation (O0, O1 and O2) on linux 64-bits shows that there is a code size increase of about 20% on clang debug builds, but not on release builds. This loss is not observed on CPU2006 executables. By looking manually at a few functions, we figured out that functions with lots of local variables are creating high register pressure and the shadow address is not kept in a register and it is spilled on the stack, leading to an increase in function size.

On the speed aspect we are observing a loss of about 3% on windows CPU2006 benchmarks. On the other benchmarks the speed variation is lost in the noise.

Assembly code of static instrumentation:

shr    $0x3,%rax
mov    %rax,0x70(%rbx)
cmpb   $0x0,0x7fff8000(%rax)        # bytes: 80 b8 00 80 ff 7f 00 (7-bytes)
jne    591a90 <main+0x1140>

Assembly code of dynamic instrumentation:

prologue:
  lea    0x301d40(%rip),%rax        # <__asan_shadow_memory_dynamic_address>
  mov    (%rax),%rdx                # %rdx contains shadow address

[...]
Instrumentation size:
  shr    $0x3,%rax
  mov    %rax,0x78(%rbx)
  cmpb   $0x0,(%rax,%rdx,1)         # bytes: 80 3c 10 00 (4-bytes, smaller)
  jne    58ecf9 <main+0x1139>

We can observed the gain in code size on the cmpb instruction.

Chromium code size benchmark results:

CHROMIUM

linux
       base      static   dynamic     S/B    D/B   D/S
bro    000311c6	001934c6 0018fb06   821%   814%  99%
flatc  000635a6	002269f6 00220f66   554%   549%  99%
chrome 04a49f06	110865d6 10ab2686   367%   359%  98%

windows
       base      static   dynamic   S/B    D/B   D/S
bro       6A000   100000    FE000   242%   240%  99%
flatc     D6000   2BE000   2B1000   328%   322%  98%
chrome  3E5B000 10FAD000 109FA000   436%   427%  98%
c_child 3A09000  F69A000  F184000   425%   416%  98%

On chromium benchmark, the code size is always slightly improving.

CPU2006 codesize benchmark results:

CPU2026              
                  base  static dynamic  S/B   D/B    D/S
windows              
/Od 470.lbm      21000   A3000  A3000   494%  494%  100%
    482.sphinx3  55000  15A000  157000  407%  404%   99%
    401.bzip2    2C000   EE000  EC000   541%  536%   99%
              
/O1 470.lbm      1D000   97000  97000   521%  521%  100%
    482.sphinx3  38000   F3000  F2000   434%  432%  100%
    401.bzip2    1E000   AA000  AA000   567%  567%  100%
              
/O2 470.lbm      1D000   97000  97000   521%  521%  100%
    482.sphinx3  3C000  109000  107000  442%  438%   99%
    401.bzip2    22000   BC000  BC000   553%  553%  100%

On CPU2006 benchmarks, the code size is always slightly improving or about the same.

Clang code size metrics:

CLANG (Release) linux
                  base    static   dynamic  S/B   D/B   D/S  
clang-4.0     01df0c9a  0666a30a  063cebea  342%  333%  97%
opt           00d48372  02d436aa  02c3fd3a  341%  333%  98%
clang-format  000f3582  0053526a  00518e0a  548%  536%  98%
llvm-link     001a34b2  00a4030a  00a04f1a  626%  612%  98%
clang-tidy    00f682fa  0387f9ba  03700e2a  367%  357%  97%

CLANG (Debug) linux
                  base    static   dynamic  S/B   D/B   D/S  
clang-4.0     043bbd30  04fa5b80  063cebea  118%  147%  125%
opt           01aab360  02648e80  02c3fd3a  144%  166%  116%
clang-format  0047aff0  0073a340  00518e0a  161%  114%  71%
llvm-link     007618f0  00b2e8f0  00a04f1a  151%  136%  90%
clang-tidy    026478d0  02d32dd0  03700e2a  118%  144%  122%

On clang-benchmark, the code size is improving with Release builds and decreasing with Debug builds.

Impressive measurement results, thank you.
Let's go forward, Vitaly, please help review this further.

lib/Transforms/Instrumentation/AddressSanitizer.cpp
1861	Why is this here? It looks like it belongs to another change. If so, please make a separate CL.

etienneb added inline comments.Sep 13 2016, 10:52 AM

lib/Transforms/Instrumentation/AddressSanitizer.cpp
1861	Since the code is using a dynamic shadow, I've found a flaky bug hidden below layers. IIRC, there is a call to `__asan_default_options` at the beginning of __asan_init, BEFORE the shadow initialization. To avoid the bug, users should add the appropriate attribute to avoid instrumentation of that function, which is error prone. This is a way to avoid instrumenting these functions. I'm not proud of that fix, and willing to other proposition. Like, maybe only banning a fixed set.

kcc added inline comments.Sep 13 2016, 2:26 PM

lib/Transforms/Instrumentation/AddressSanitizer.cpp
1861	Sure, but please make it a separate change. I am totally fine to just ignore all the functions that start with __asan_

vitalybuka added inline comments.Sep 14 2016, 10:18 PM

lib/Transforms/Instrumentation/AddressSanitizer.cpp
1007	I is not null, so just if (LocalDynamicShadow == I)

rebased + cleanup

kubamracek added a subscriber: kubamracek.Sep 15 2016, 9:53 AM

fix incorrect merge

Please replace in title [compiler-rt] -> [asan]
LGTM

This revision is now accepted and ready to land.Sep 15 2016, 11:11 AM

thanks, could you also review this one: https://reviews.llvm.org/D23363
Both patches are needed.

etienneb closed this revision.Sep 19 2016, 9:07 AM

Re-opening this issue.

[compiler-rt] r282096 - revert 282085, 281909, they broke 32-bit dynamic ASan and the sanitizer-windows bot

I'm gonna fix it and re-land it.

This revision is now accepted and ready to land.Sep 30 2016, 9:00 AM

etienneb added a subscriber: zaks.anna.Sep 30 2016, 9:01 AM

When testing with this patch on the Mac, I ran into a problem using -asan-force-dynamic-shadow. Specifically, Mapping.OrShadowOffset was not inferred correctly because it relies on checking kDynamicShadowSentinel and not the ClForceDynamicShadow. How about using ClForceDynamicShadow when setting the Mapping.Offset and gating the maybeInsertDynamicShadowAtFunctionEntry solely on the value of the Mapping.Offset?

@@ -434,13 +434,19 @@ static ShadowMapping getShadowMapping(Triple &TargetTriple, int LongSize,
...
     else
       Mapping.Offset = kDefaultShadowOffset64;
   }
 
+  if (ClForceDynamicShadow) {
+    Mapping.Offset = kDynamicShadowSentinel;
+  }
+
   Mapping.Scale = kDefaultShadowScale;
   if (ClMappingScale.getNumOccurrences() > 0) {
     Mapping.Scale = ClMappingScale;
@@ -1802,7 +1808,7 @@ bool AddressSanitizer::maybeInsertAsanInitAtFunctionEntry(Function &F) {
 
 void AddressSanitizer::maybeInsertDynamicShadowAtFunctionEntry(Function &F) {
   // Generate code only when dynamic addressing is needed.
-  if (!ClForceDynamicShadow && Mapping.Offset != kDynamicShadowSentinel)
+  if (Mapping.Offset != kDynamicShadowSentinel)
     return;
 
   IRBuilder<> IRB(&F.front().front());

If you'd like, I can commit this as a separate patch.

apply anna's patch.

Thanks Anna. I modified this patch to include your changes.

If you have some cycles, can you run the unittests with both patches on iOS.
I can't test on that platform.

Tests are fine on windows 32/64 and linux 64.

In D23354#557270, @zaks.anna wrote:
When testing with this patch on the Mac, I ran into a problem using -asan-force-dynamic-shadow. Specifically, Mapping.OrShadowOffset was not inferred correctly because it relies on checking kDynamicShadowSentinel and not the ClForceDynamicShadow. How about using ClForceDynamicShadow when setting the Mapping.Offset and gating the maybeInsertDynamicShadowAtFunctionEntry solely on the value of the Mapping.Offset?
@@ -434,13 +434,19 @@ static ShadowMapping getShadowMapping(Triple &TargetTriple, int LongSize,
...
     else
       Mapping.Offset = kDefaultShadowOffset64;
   }
 
+  if (ClForceDynamicShadow) {
+    Mapping.Offset = kDynamicShadowSentinel;
+  }
+
   Mapping.Scale = kDefaultShadowScale;
   if (ClMappingScale.getNumOccurrences() > 0) {
     Mapping.Scale = ClMappingScale;
@@ -1802,7 +1808,7 @@ bool AddressSanitizer::maybeInsertAsanInitAtFunctionEntry(Function &F) {
 
 void AddressSanitizer::maybeInsertDynamicShadowAtFunctionEntry(Function &F) {
   // Generate code only when dynamic addressing is needed.
-  if (!ClForceDynamicShadow && Mapping.Offset != kDynamicShadowSentinel)
+  if (Mapping.Offset != kDynamicShadowSentinel)
     return;
 
   IRBuilder<> IRB(&F.front().front());
If you'd like, I can commit this as a separate patch.

fix rnk@ comments

etienneb closed this revision.Sep 30 2016, 10:55 AM

I've already tested this on iOS and Mac after applying additional patches to compiler-rt to get the FindAvailableMemoryRange working. I did not run into any regressions. Hope the commits stick this time around!

Revision Contents

Path

Size

lib/

Transforms/

Instrumentation/

AddressSanitizer.cpp

57 lines

test/

Instrumentation/

AddressSanitizer/

basic-msvc64.ll

35 lines

Diff 73074

lib/Transforms/Instrumentation/AddressSanitizer.cpp

Show First 20 Lines • Show All 61 Lines • ▼ Show 20 Lines

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "asan"		#define DEBUG_TYPE "asan"

static const uint64_t kDefaultShadowScale = 3;		static const uint64_t kDefaultShadowScale = 3;
static const uint64_t kDefaultShadowOffset32 = 1ULL << 29;		static const uint64_t kDefaultShadowOffset32 = 1ULL << 29;
static const uint64_t kDefaultShadowOffset64 = 1ULL << 44;		static const uint64_t kDefaultShadowOffset64 = 1ULL << 44;
		static const uint64_t kDynamicShadowSentinel = ~(uint64_t)0;
static const uint64_t kIOSShadowOffset32 = 1ULL << 30;		static const uint64_t kIOSShadowOffset32 = 1ULL << 30;
static const uint64_t kIOSShadowOffset64 = 0x120200000;		static const uint64_t kIOSShadowOffset64 = 0x120200000;
static const uint64_t kIOSSimShadowOffset32 = 1ULL << 30;		static const uint64_t kIOSSimShadowOffset32 = 1ULL << 30;
static const uint64_t kIOSSimShadowOffset64 = kDefaultShadowOffset64;		static const uint64_t kIOSSimShadowOffset64 = kDefaultShadowOffset64;
static const uint64_t kSmallX86_64ShadowOffset = 0x7FFF8000; // < 2G.		static const uint64_t kSmallX86_64ShadowOffset = 0x7FFF8000; // < 2G.
static const uint64_t kLinuxKasan_ShadowOffset64 = 0xdffffc0000000000;		static const uint64_t kLinuxKasan_ShadowOffset64 = 0xdffffc0000000000;
static const uint64_t kPPC64_ShadowOffset64 = 1ULL << 41;		static const uint64_t kPPC64_ShadowOffset64 = 1ULL << 41;
static const uint64_t kSystemZ_ShadowOffset64 = 1ULL << 52;		static const uint64_t kSystemZ_ShadowOffset64 = 1ULL << 52;
static const uint64_t kMIPS32_ShadowOffset32 = 0x0aaa0000;		static const uint64_t kMIPS32_ShadowOffset32 = 0x0aaa0000;
static const uint64_t kMIPS64_ShadowOffset64 = 1ULL << 37;		static const uint64_t kMIPS64_ShadowOffset64 = 1ULL << 37;
static const uint64_t kAArch64_ShadowOffset64 = 1ULL << 36;		static const uint64_t kAArch64_ShadowOffset64 = 1ULL << 36;
static const uint64_t kFreeBSD_ShadowOffset32 = 1ULL << 30;		static const uint64_t kFreeBSD_ShadowOffset32 = 1ULL << 30;
static const uint64_t kFreeBSD_ShadowOffset64 = 1ULL << 46;		static const uint64_t kFreeBSD_ShadowOffset64 = 1ULL << 46;
static const uint64_t kWindowsShadowOffset32 = 3ULL << 28;		static const uint64_t kWindowsShadowOffset32 = 3ULL << 28;
// TODO(wwchrome): Experimental for asan Win64, may change.		// The shadow memory space is dynamically allocated.
static const uint64_t kWindowsShadowOffset64 = 0x1ULL << 45; // 32TB.		static const uint64_t kWindowsShadowOffset64 = kDynamicShadowSentinel;

static const size_t kMinStackMallocSize = 1 << 6; // 64B		static const size_t kMinStackMallocSize = 1 << 6; // 64B
static const size_t kMaxStackMallocSize = 1 << 16; // 64K		static const size_t kMaxStackMallocSize = 1 << 16; // 64K
static const uintptr_t kCurrentStackFrameMagic = 0x41B58AB3;		static const uintptr_t kCurrentStackFrameMagic = 0x41B58AB3;
static const uintptr_t kRetiredStackFrameMagic = 0x45E0360E;		static const uintptr_t kRetiredStackFrameMagic = 0x45E0360E;

static const char *const kAsanModuleCtorName = "asan.module_ctor";		static const char *const kAsanModuleCtorName = "asan.module_ctor";
static const char *const kAsanModuleDtorName = "asan.module_dtor";		static const char *const kAsanModuleDtorName = "asan.module_dtor";
Show All 26 Lines
static const char *const kAsanUnpoisonStackMemoryName =		static const char *const kAsanUnpoisonStackMemoryName =
"__asan_unpoison_stack_memory";		"__asan_unpoison_stack_memory";
static const char *const kAsanGlobalsRegisteredFlagName =		static const char *const kAsanGlobalsRegisteredFlagName =
"__asan_globals_registered";		"__asan_globals_registered";

static const char *const kAsanOptionDetectUseAfterReturn =		static const char *const kAsanOptionDetectUseAfterReturn =
"__asan_option_detect_stack_use_after_return";		"__asan_option_detect_stack_use_after_return";

		static const char *const kAsanShadowMemoryDynamicAddress =
		"__asan_shadow_memory_dynamic_address";

static const char *const kAsanAllocaPoison = "__asan_alloca_poison";		static const char *const kAsanAllocaPoison = "__asan_alloca_poison";
static const char *const kAsanAllocasUnpoison = "__asan_allocas_unpoison";		static const char *const kAsanAllocasUnpoison = "__asan_allocas_unpoison";

// Accesses sizes are powers of two: 1, 2, 4, 8, 16.		// Accesses sizes are powers of two: 1, 2, 4, 8, 16.
static const size_t kNumberOfAccessSizes = 5;		static const size_t kNumberOfAccessSizes = 5;

static const unsigned kAllocaRzSize = 32;		static const unsigned kAllocaRzSize = 32;

Show All 16 Lines
static cl::opt<bool> ClInstrumentAtomics(		static cl::opt<bool> ClInstrumentAtomics(
"asan-instrument-atomics",		"asan-instrument-atomics",
cl::desc("instrument atomic instructions (rmw, cmpxchg)"), cl::Hidden,		cl::desc("instrument atomic instructions (rmw, cmpxchg)"), cl::Hidden,
cl::init(true));		cl::init(true));
static cl::opt<bool> ClAlwaysSlowPath(		static cl::opt<bool> ClAlwaysSlowPath(
"asan-always-slow-path",		"asan-always-slow-path",
cl::desc("use instrumentation with slow path for all accesses"), cl::Hidden,		cl::desc("use instrumentation with slow path for all accesses"), cl::Hidden,
cl::init(false));		cl::init(false));
		static cl::opt<bool> ClForceDynamicShadow(
		"asan-force-dynamic-shadow",
		cl::desc("Load shadow address into a local variable for each function"),
		cl::Hidden, cl::init(false));

// This flag limits the number of instructions to be instrumented		// This flag limits the number of instructions to be instrumented
// in any given BB. Normally, this should be set to unlimited (INT_MAX),		// in any given BB. Normally, this should be set to unlimited (INT_MAX),
// but due to http://llvm.org/bugs/show_bug.cgi?id=12652 we temporary		// but due to http://llvm.org/bugs/show_bug.cgi?id=12652 we temporary
// set it to 10000.		// set it to 10000.
static cl::opt<int> ClMaxInsnsToInstrumentPerBB(		static cl::opt<int> ClMaxInsnsToInstrumentPerBB(
"asan-max-ins-per-bb", cl::init(10000),		"asan-max-ins-per-bb", cl::init(10000),
cl::desc("maximal number of instructions to instrument in any given BB"),		cl::desc("maximal number of instructions to instrument in any given BB"),
cl::Hidden);		cl::Hidden);
▲ Show 20 Lines • Show All 259 Lines • ▼ Show 20 Lines	else if (IsIOS)
// If we're targeting iOS and x86, the binary is built for iOS simulator.		// If we're targeting iOS and x86, the binary is built for iOS simulator.
Mapping.Offset = IsX86_64 ? kIOSSimShadowOffset64 : kIOSShadowOffset64;		Mapping.Offset = IsX86_64 ? kIOSSimShadowOffset64 : kIOSShadowOffset64;
else if (IsAArch64)		else if (IsAArch64)
Mapping.Offset = kAArch64_ShadowOffset64;		Mapping.Offset = kAArch64_ShadowOffset64;
else		else
Mapping.Offset = kDefaultShadowOffset64;		Mapping.Offset = kDefaultShadowOffset64;
}		}

		if (ClForceDynamicShadow) {
		Mapping.Offset = kDynamicShadowSentinel;
		}

Mapping.Scale = kDefaultShadowScale;		Mapping.Scale = kDefaultShadowScale;
if (ClMappingScale.getNumOccurrences() > 0) {		if (ClMappingScale.getNumOccurrences() > 0) {
Mapping.Scale = ClMappingScale;		Mapping.Scale = ClMappingScale;
}		}

if (ClMappingOffset.getNumOccurrences() > 0) {		if (ClMappingOffset.getNumOccurrences() > 0) {
Mapping.Offset = ClMappingOffset;		Mapping.Offset = ClMappingOffset;
}		}

// OR-ing shadow offset if more efficient (at least on x86) if the offset		// OR-ing shadow offset if more efficient (at least on x86) if the offset
// is a power of two, but on ppc64 we have to use add since the shadow		// is a power of two, but on ppc64 we have to use add since the shadow
// offset is not necessary 1/8-th of the address space. On SystemZ,		// offset is not necessary 1/8-th of the address space. On SystemZ,
// we could OR the constant in a single instruction, but it's more		// we could OR the constant in a single instruction, but it's more
// efficient to load it once and use indexed addressing.		// efficient to load it once and use indexed addressing.
Mapping.OrShadowOffset = !IsAArch64 && !IsPPC64 && !IsSystemZ		Mapping.OrShadowOffset = !IsAArch64 && !IsPPC64 && !IsSystemZ
&& !(Mapping.Offset & (Mapping.Offset - 1));		&& !(Mapping.Offset & (Mapping.Offset - 1))
		&& Mapping.Offset != kDynamicShadowSentinel;
		rnkUnsubmitted Done Reply Inline Actions Can we make a kDynamicShadowSentinel for this? rnk: Can we make a kDynamicShadowSentinel for this?

return Mapping;		return Mapping;
}		}

static size_t RedzoneSizeForScale(int MappingScale) {		static size_t RedzoneSizeForScale(int MappingScale) {
// Redzone used for stack and globals is at least 32 bytes.		// Redzone used for stack and globals is at least 32 bytes.
// For scales 6 and 7, the redzone has to be 64 and 128 bytes respectively.		// For scales 6 and 7, the redzone has to be 64 and 128 bytes respectively.
return std::max(32U, 1U << MappingScale);		return std::max(32U, 1U << MappingScale);
}		}

/// AddressSanitizer: instrument the code in module to find memory bugs.		/// AddressSanitizer: instrument the code in module to find memory bugs.
struct AddressSanitizer : public FunctionPass {		struct AddressSanitizer : public FunctionPass {
explicit AddressSanitizer(bool CompileKernel = false, bool Recover = false,		explicit AddressSanitizer(bool CompileKernel = false, bool Recover = false,
bool UseAfterScope = false)		bool UseAfterScope = false)
: FunctionPass(ID), CompileKernel(CompileKernel \|\| ClEnableKasan),		: FunctionPass(ID), CompileKernel(CompileKernel \|\| ClEnableKasan),
Recover(Recover \|\| ClRecover),		Recover(Recover \|\| ClRecover),
UseAfterScope(UseAfterScope \|\| ClUseAfterScope) {		UseAfterScope(UseAfterScope \|\| ClUseAfterScope),
		LocalDynamicShadow(nullptr) {
initializeAddressSanitizerPass(*PassRegistry::getPassRegistry());		initializeAddressSanitizerPass(*PassRegistry::getPassRegistry());
}		}
const char *getPassName() const override {		const char *getPassName() const override {
return "AddressSanitizerFunctionPass";		return "AddressSanitizerFunctionPass";
}		}
void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<DominatorTreeWrapperPass>();		AU.addRequired<DominatorTreeWrapperPass>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
Show All 31 Lines	Value createSlowPathCmp(IRBuilder<> &IRB, Value AddrLong,
Value *ShadowValue, uint32_t TypeSize);		Value *ShadowValue, uint32_t TypeSize);
Instruction generateCrashCode(Instruction InsertBefore, Value *Addr,		Instruction generateCrashCode(Instruction InsertBefore, Value *Addr,
bool IsWrite, size_t AccessSizeIndex,		bool IsWrite, size_t AccessSizeIndex,
Value *SizeArgument, uint32_t Exp);		Value *SizeArgument, uint32_t Exp);
void instrumentMemIntrinsic(MemIntrinsic *MI);		void instrumentMemIntrinsic(MemIntrinsic *MI);
Value memToShadow(Value Shadow, IRBuilder<> &IRB);		Value memToShadow(Value Shadow, IRBuilder<> &IRB);
bool runOnFunction(Function &F) override;		bool runOnFunction(Function &F) override;
bool maybeInsertAsanInitAtFunctionEntry(Function &F);		bool maybeInsertAsanInitAtFunctionEntry(Function &F);
		void maybeInsertDynamicShadowAtFunctionEntry(Function &F);
void markEscapedLocalAllocas(Function &F);		void markEscapedLocalAllocas(Function &F);
bool doInitialization(Module &M) override;		bool doInitialization(Module &M) override;
bool doFinalization(Module &M) override;		bool doFinalization(Module &M) override;
static char ID; // Pass identification, replacement for typeid		static char ID; // Pass identification, replacement for typeid

DominatorTree &getDominatorTree() const { return *DT; }		DominatorTree &getDominatorTree() const { return *DT; }

private:		private:
void initializeCallbacks(Module &M);		void initializeCallbacks(Module &M);

bool LooksLikeCodeInBug11395(Instruction *I);		bool LooksLikeCodeInBug11395(Instruction *I);
bool GlobalIsLinkerInitialized(GlobalVariable *G);		bool GlobalIsLinkerInitialized(GlobalVariable *G);
bool isSafeAccess(ObjectSizeOffsetVisitor &ObjSizeVis, Value *Addr,		bool isSafeAccess(ObjectSizeOffsetVisitor &ObjSizeVis, Value *Addr,
uint64_t TypeSize) const;		uint64_t TypeSize) const;

/// Helper to cleanup per-function state.		/// Helper to cleanup per-function state.
struct FunctionStateRAII {		struct FunctionStateRAII {
AddressSanitizer *Pass;		AddressSanitizer *Pass;
FunctionStateRAII(AddressSanitizer *Pass) : Pass(Pass) {		FunctionStateRAII(AddressSanitizer *Pass) : Pass(Pass) {
assert(Pass->ProcessedAllocas.empty() &&		assert(Pass->ProcessedAllocas.empty() &&
"last pass forgot to clear cache");		"last pass forgot to clear cache");
		assert(!Pass->LocalDynamicShadow);
		}
		~FunctionStateRAII() {
		Pass->LocalDynamicShadow = nullptr;
		Pass->ProcessedAllocas.clear();
}		}
~FunctionStateRAII() { Pass->ProcessedAllocas.clear(); }
};		};

LLVMContext *C;		LLVMContext *C;
Triple TargetTriple;		Triple TargetTriple;
int LongSize;		int LongSize;
bool CompileKernel;		bool CompileKernel;
bool Recover;		bool Recover;
bool UseAfterScope;		bool UseAfterScope;
Type *IntptrTy;		Type *IntptrTy;
ShadowMapping Mapping;		ShadowMapping Mapping;
DominatorTree *DT;		DominatorTree *DT;
Function *AsanCtorFunction = nullptr;		Function *AsanCtorFunction = nullptr;
Function *AsanInitFunction = nullptr;		Function *AsanInitFunction = nullptr;
Function *AsanHandleNoReturnFunc;		Function *AsanHandleNoReturnFunc;
Function AsanPtrCmpFunction, AsanPtrSubFunction;		Function AsanPtrCmpFunction, AsanPtrSubFunction;
// This array is indexed by AccessIsWrite, Experiment and log2(AccessSize).		// This array is indexed by AccessIsWrite, Experiment and log2(AccessSize).
Function *AsanErrorCallback[2][2][kNumberOfAccessSizes];		Function *AsanErrorCallback[2][2][kNumberOfAccessSizes];
Function *AsanMemoryAccessCallback[2][2][kNumberOfAccessSizes];		Function *AsanMemoryAccessCallback[2][2][kNumberOfAccessSizes];
// This array is indexed by AccessIsWrite and Experiment.		// This array is indexed by AccessIsWrite and Experiment.
Function *AsanErrorCallbackSized[2][2];		Function *AsanErrorCallbackSized[2][2];
Function *AsanMemoryAccessCallbackSized[2][2];		Function *AsanMemoryAccessCallbackSized[2][2];
Function AsanMemmove, AsanMemcpy, *AsanMemset;		Function AsanMemmove, AsanMemcpy, *AsanMemset;
InlineAsm *EmptyAsm;		InlineAsm *EmptyAsm;
		Value *LocalDynamicShadow;
GlobalsMetadata GlobalsMD;		GlobalsMetadata GlobalsMD;
DenseMap<const AllocaInst *, bool> ProcessedAllocas;		DenseMap<const AllocaInst *, bool> ProcessedAllocas;

friend struct FunctionStackPoisoner;		friend struct FunctionStackPoisoner;
};		};

class AddressSanitizerModule : public ModulePass {		class AddressSanitizerModule : public ModulePass {
public:		public:
▲ Show 20 Lines • Show All 347 Lines • ▼ Show 20 Lines	static bool GlobalWasGeneratedByCompiler(GlobalVariable *G) {
return false;		return false;
}		}

Value AddressSanitizer::memToShadow(Value Shadow, IRBuilder<> &IRB) {		Value AddressSanitizer::memToShadow(Value Shadow, IRBuilder<> &IRB) {
// Shadow >> scale		// Shadow >> scale
Shadow = IRB.CreateLShr(Shadow, Mapping.Scale);		Shadow = IRB.CreateLShr(Shadow, Mapping.Scale);
if (Mapping.Offset == 0) return Shadow;		if (Mapping.Offset == 0) return Shadow;
// (Shadow >> scale) \| offset		// (Shadow >> scale) \| offset
		Value *ShadowBase;
		rnkUnsubmitted Done Reply Inline Actions LLVM typically uses right leaning pointers and references. rnk: LLVM typically uses right leaning pointers and references.
		if (LocalDynamicShadow)
		ShadowBase = LocalDynamicShadow;
		else
		ShadowBase = ConstantInt::get(IntptrTy, Mapping.Offset);
if (Mapping.OrShadowOffset)		if (Mapping.OrShadowOffset)
return IRB.CreateOr(Shadow, ConstantInt::get(IntptrTy, Mapping.Offset));		return IRB.CreateOr(Shadow, ShadowBase);
else		else
return IRB.CreateAdd(Shadow, ConstantInt::get(IntptrTy, Mapping.Offset));		return IRB.CreateAdd(Shadow, ShadowBase);
}		}

// Instrument memset/memmove/memcpy		// Instrument memset/memmove/memcpy
void AddressSanitizer::instrumentMemIntrinsic(MemIntrinsic *MI) {		void AddressSanitizer::instrumentMemIntrinsic(MemIntrinsic *MI) {
IRBuilder<> IRB(MI);		IRBuilder<> IRB(MI);
if (isa<MemTransferInst>(MI)) {		if (isa<MemTransferInst>(MI)) {
IRB.CreateCall(		IRB.CreateCall(
isa<MemMoveInst>(MI) ? AsanMemmove : AsanMemcpy,		isa<MemMoveInst>(MI) ? AsanMemmove : AsanMemcpy,
Show All 36 Lines
/// and set IsWrite/Alignment. Otherwise return nullptr.		/// and set IsWrite/Alignment. Otherwise return nullptr.
Value AddressSanitizer::isInterestingMemoryAccess(Instruction I,		Value AddressSanitizer::isInterestingMemoryAccess(Instruction I,
bool *IsWrite,		bool *IsWrite,
uint64_t *TypeSize,		uint64_t *TypeSize,
unsigned *Alignment) {		unsigned *Alignment) {
// Skip memory accesses inserted by another instrumentation.		// Skip memory accesses inserted by another instrumentation.
if (I->getMetadata("nosanitize")) return nullptr;		if (I->getMetadata("nosanitize")) return nullptr;

		// Do not instrument the load fetching the dynamic shadow address.
		if (LocalDynamicShadow == I)
		vitalybukaUnsubmitted Done Reply Inline Actions I is not null, so just if (LocalDynamicShadow == I) vitalybuka: I is not null, so just if (LocalDynamicShadow == I)
		return nullptr;

Value *PtrOperand = nullptr;		Value *PtrOperand = nullptr;
const DataLayout &DL = I->getModule()->getDataLayout();		const DataLayout &DL = I->getModule()->getDataLayout();
if (LoadInst *LI = dyn_cast<LoadInst>(I)) {		if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
if (!ClInstrumentReads) return nullptr;		if (!ClInstrumentReads) return nullptr;
*IsWrite = false;		*IsWrite = false;
*TypeSize = DL.getTypeStoreSizeInBits(LI->getType());		*TypeSize = DL.getTypeStoreSizeInBits(LI->getType());
*Alignment = LI->getAlignment();		*Alignment = LI->getAlignment();
PtrOperand = LI->getPointerOperand();		PtrOperand = LI->getPointerOperand();
▲ Show 20 Lines • Show All 788 Lines • ▼ Show 20 Lines	bool AddressSanitizer::maybeInsertAsanInitAtFunctionEntry(Function &F) {
if (F.getName().find(" load]") != std::string::npos) {		if (F.getName().find(" load]") != std::string::npos) {
IRBuilder<> IRB(&F.front(), F.front().begin());		IRBuilder<> IRB(&F.front(), F.front().begin());
IRB.CreateCall(AsanInitFunction, {});		IRB.CreateCall(AsanInitFunction, {});
return true;		return true;
}		}
return false;		return false;
}		}

		void AddressSanitizer::maybeInsertDynamicShadowAtFunctionEntry(Function &F) {
		// Generate code only when dynamic addressing is needed.
		if (Mapping.Offset != kDynamicShadowSentinel)
		return;

		IRBuilder<> IRB(&F.front().front());
		Value *GlobalDynamicAddress = F.getParent()->getOrInsertGlobal(
		kAsanShadowMemoryDynamicAddress, IntptrTy);
		LocalDynamicShadow = IRB.CreateLoad(GlobalDynamicAddress);
		rnkUnsubmitted Done Reply Inline Actions Why create a local alloca and then load from it? This might end up being really slow in -O0 if we reload the shadow base from the stack before every user memory access. If we know we're in the entry block, we should be able to skip the alloca and change this to: Value GlobalDynamicAddress = F.getParent()->getOrInsertGlobal( kAsanShadowMemoryDynamicAddress, IntptrTy); LocalDynamicShadow = IRB.CreateLoad(GlobalDynamicAddress); We can then skip the extra load in memToShadow and use LocalDynamicShadow directly. rnk:* Why create a local alloca and then load from it? This might end up being really slow in -O0 if…
		etiennebAuthorUnsubmitted Not Done Reply Inline Actions Turns out that it's more efficient to put it on the stack for -O1 and -O2. The alloca-slot is promoted to register for the scope of the function. This is more efficient than having a load from memory for every instrumented load/store. Even for /O0, the code should be better. For every instrumented load/store, the code is slighty better (loading from the stack instead of global address). The only added cost is the required instructions to copy the shadow address on the stack. etienneb: Turns out that it's more efficient to put it on the stack for -O1 and -O2. The alloca-slot is…
		rnkUnsubmitted Done Reply Inline Actions I'm not proposing loading from the global on every shadow memory access, I'm saying you should load once from the global in the entry block and use that load everywhere, rather than spilling it to the stack immediately and reloading before every shadow check. With O1 and higher, mid-level optimizers will immediately do this transform for you, but it's simpler to emit the efficient code in the first place. I'm not sure what will happen at -O0 since the fast register allocator is pretty bad. rnk: I'm not proposing loading from the global on every shadow memory access, I'm saying you should…
		etiennebAuthorUnsubmitted Done Reply Inline Actions As discussed offline, as soon as the entry block dominates all other basic block.... using the result from "load" in the entry block should be enough. I'm gonna make a patch to try it. thx rnk for the explanation. etienneb: As discussed offline, as soon as the entry block dominates all other basic block.... using the…
		}

void AddressSanitizer::markEscapedLocalAllocas(Function &F) {		void AddressSanitizer::markEscapedLocalAllocas(Function &F) {
// Find the one possible call to llvm.localescape and pre-mark allocas passed		// Find the one possible call to llvm.localescape and pre-mark allocas passed
// to it as uninteresting. This assumes we haven't started processing allocas		// to it as uninteresting. This assumes we haven't started processing allocas
// yet. This check is done up front because iterating the use list in		// yet. This check is done up front because iterating the use list in
// isInterestingAlloca would be algorithmically slower.		// isInterestingAlloca would be algorithmically slower.
assert(ProcessedAllocas.empty() && "must process localescape before allocas");		assert(ProcessedAllocas.empty() && "must process localescape before allocas");

// Try to get the declaration of llvm.localescape. If it's not in the module,		// Try to get the declaration of llvm.localescape. If it's not in the module,
Show All 20 Lines
bool AddressSanitizer::runOnFunction(Function &F) {		bool AddressSanitizer::runOnFunction(Function &F) {
if (&F == AsanCtorFunction) return false;		if (&F == AsanCtorFunction) return false;
if (F.getLinkage() == GlobalValue::AvailableExternallyLinkage) return false;		if (F.getLinkage() == GlobalValue::AvailableExternallyLinkage) return false;
if (!ClDebugFunc.empty() && ClDebugFunc == F.getName()) return false;		if (!ClDebugFunc.empty() && ClDebugFunc == F.getName()) return false;
if (F.getName().startswith("__asan_")) return false;		if (F.getName().startswith("__asan_")) return false;

bool FunctionModified = false;		bool FunctionModified = false;

// If needed, insert __asan_init before checking for SanitizeAddress attr.		// If needed, insert __asan_init before checking for SanitizeAddress attr.
		kccUnsubmitted Done Reply Inline Actions Why is this here? It looks like it belongs to another change. If so, please make a separate CL. kcc: Why is this here? It looks like it belongs to another change. If so, please make a separate CL.
		etiennebAuthorUnsubmitted Done Reply Inline Actions Since the code is using a dynamic shadow, I've found a flaky bug hidden below layers. IIRC, there is a call to `__asan_default_options` at the beginning of __asan_init, BEFORE the shadow initialization. To avoid the bug, users should add the appropriate attribute to avoid instrumentation of that function, which is error prone. This is a way to avoid instrumenting these functions. I'm not proud of that fix, and willing to other proposition. Like, maybe only banning a fixed set. etienneb: Since the code is using a dynamic shadow, I've found a flaky bug hidden below layers. IIRC…
		kccUnsubmitted Done Reply Inline Actions Sure, but please make it a separate change. I am totally fine to just ignore all the functions that start with __asan_ kcc: Sure, but please make it a separate change. I am totally fine to just ignore all the functions…
// This function needs to be called even if the function body is not		// This function needs to be called even if the function body is not
// instrumented.		// instrumented.
if (maybeInsertAsanInitAtFunctionEntry(F))		if (maybeInsertAsanInitAtFunctionEntry(F))
FunctionModified = true;		FunctionModified = true;

// Leave if the function doesn't need instrumentation.		// Leave if the function doesn't need instrumentation.
if (!F.hasFnAttribute(Attribute::SanitizeAddress)) return FunctionModified;		if (!F.hasFnAttribute(Attribute::SanitizeAddress)) return FunctionModified;

DEBUG(dbgs() << "ASAN instrumenting:\n" << F << "\n");		DEBUG(dbgs() << "ASAN instrumenting:\n" << F << "\n");

initializeCallbacks(*F.getParent());		initializeCallbacks(*F.getParent());
DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();		DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();

FunctionStateRAII CleanupObj(this);		FunctionStateRAII CleanupObj(this);

		maybeInsertDynamicShadowAtFunctionEntry(F);

// We can't instrument allocas used with llvm.localescape. Only static allocas		// We can't instrument allocas used with llvm.localescape. Only static allocas
// can be passed to that intrinsic.		// can be passed to that intrinsic.
markEscapedLocalAllocas(F);		markEscapedLocalAllocas(F);

// We want to instrument every address only once per basic block (unless there		// We want to instrument every address only once per basic block (unless there
// are calls between uses).		// are calls between uses).
SmallSet<Value *, 16> TempsToInstrument;		SmallSet<Value *, 16> TempsToInstrument;
SmallVector<Instruction *, 16> ToInstrument;		SmallVector<Instruction *, 16> ToInstrument;
▲ Show 20 Lines • Show All 703 Lines • Show Last 20 Lines

test/Instrumentation/AddressSanitizer/basic-msvc64.ll

This file was added.

				; Test basic address sanitizer instrumentation.
				;
				; RUN: opt -asan -asan-module -S < %s \| FileCheck %s

				target triple = "x86_64-pc-windows-msvc"
				; CHECK: @llvm.global_ctors = {{.*}}@asan.module_ctor

				define i32 @test_load(i32* %a) sanitize_address {
				; First instrumentation in the function must be to load the dynamic shadow
				; address into a local variable.
				; CHECK-LABEL: @test_load
				; CHECK: entry:
				; CHECK-NEXT: %[[SHADOW:[^ ]]] = load i64, i64 @__asan_shadow_memory_dynamic_address

				; Shadow address is loaded and added into the whole offset computation.
				; CHECK add i64 %{{.*}}, %[[SHADOW] ]

				entry:
				%tmp1 = load i32, i32* %a, align 4
				ret i32 %tmp1
				}

				define i32 @__asan_options(i32* %a) sanitize_address {
				; Asan functions are not instrumented. Asan function may be called by
				; __asan_init before the shadow initialisation, which may lead to incorrect
				; behavior of the instrumented code.
				; CHECK-LABEL: @__asan_options
				; CHECK: entry:
				; CHECK-NEXT: %tmp1 = load i32, i32* %a, align 4
				; CHECK-NEXT: ret i32 %tmp1

				entry:
				%tmp1 = load i32, i32* %a, align 4
				ret i32 %tmp1
				}