Index: llvm/trunk/lib/Target/X86/AsmParser/X86AsmInstrumentation.cpp =================================================================== --- llvm/trunk/lib/Target/X86/AsmParser/X86AsmInstrumentation.cpp +++ llvm/trunk/lib/Target/X86/AsmParser/X86AsmInstrumentation.cpp @@ -30,6 +30,70 @@ #include #include +// Following comment describes how assembly instrumentation works. +// Currently we have only AddressSanitizer instrumentation, but we're +// planning to implement MemorySanitizer for inline assembly too. If +// you're not familiar with AddressSanitizer algorithm, please, read +// https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm. +// +// When inline assembly is parsed by an instance of X86AsmParser, all +// instructions are emitted via EmitInstruction method. That's the +// place where X86AsmInstrumentation analyzes an instruction and +// decides, whether the instruction should be emitted as is or +// instrumentation is required. The latter case happens when an +// instruction reads from or writes to memory. Now instruction opcode +// is explicitly checked, and if an instruction has a memory operand +// (for instance, movq (%rsi, %rcx, 8), %rax) - it should be +// instrumented. There're also exist instructions that modify +// memory but don't have an explicit memory operands, for instance, +// movs. +// +// Let's consider at first 8-byte memory accesses when an instruction +// has an explicit memory operand. In this case we need two registers - +// AddressReg to compute address of a memory cells which are accessed +// and ShadowReg to compute corresponding shadow address. So, we need +// to spill both registers before instrumentation code and restore them +// after instrumentation. Thus, in general, instrumentation code will +// look like this: +// PUSHF # Store flags, otherwise they will be overwritten +// PUSH AddressReg # spill AddressReg +// PUSH ShadowReg # spill ShadowReg +// LEA MemOp, AddressReg # compute address of the memory operand +// MOV AddressReg, ShadowReg +// SHR ShadowReg, 3 +// # ShadowOffset(AddressReg >> 3) contains address of a shadow +// # corresponding to MemOp. +// CMP ShadowOffset(ShadowReg), 0 # test shadow value +// JZ .Done # when shadow equals to zero, everything is fine +// MOV AddressReg, RDI +// # Call __asan_report function with AddressReg as an argument +// CALL __asan_report +// .Done: +// POP ShadowReg # Restore ShadowReg +// POP AddressReg # Restore AddressReg +// POPF # Restore flags +// +// Memory accesses with different size (1-, 2-, 4- and 16-byte) are +// handled in a similar manner, but small memory accesses (less than 8 +// byte) require an additional ScratchReg, which is used for shadow value. +// +// If, suppose, we're instrumenting an instruction like movs, only +// contents of RDI, RDI + AccessSize * RCX, RSI, RSI + AccessSize * +// RCX are checked. In this case there're no need to spill and restore +// AddressReg , ShadowReg or flags four times, they're saved on stack +// just once, before instrumentation of these four addresses, and restored +// at the end of the instrumentation. +// +// There exist several things which complicate this simple algorithm. +// * Instrumented memory operand can have RSP as a base or an index +// register. So we need to add a constant offset before computation +// of memory address, since flags, AddressReg, ShadowReg, etc. were +// already stored on stack and RSP was modified. +// * Debug info (usually, DWARF) should be adjusted, because sometimes +// RSP is used as a frame register. So, we need to select some +// register as a frame register and temprorary override current CFA +// register. + namespace llvm { namespace {