The implementation uses the int_asan_check_memaccess intrinsic to instrument the code. The intrinsic is replaced by a call to a function which performs the access check. The generated function names encode the input register name as a number using Reg - X86::NoRegister formula.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Started using R8 instead of RAX beacuse of some test failures. Also fixed the "Yes means No and No means Yes. Delete all files [Y]? " bug :-) for the load/store functions.
This can be split into 3 patches in the following order:
- x86
- Instrumentation
- clang
Each of them needs tests, Instrumentation changes are not tested.
llvm/include/llvm/Transforms/Instrumentation/AddressSanitizer.h | ||
---|---|---|
150 ↗ | (On Diff #366141) | It's not how enums described here https://llvm.org/docs/CodingStandards.html#id43 However this one does not need to be enum and just "constexpr size_t" |
Never mind, this won't compile. Let me fix it and I will let you know when it is ready for review.
Instead of packing and bit shifting arguments passing them as individual intrinsic arguments.
Can you please use "Edit related revisions" and link other patches of the features into the stack?
llvm/lib/Target/X86/X86InstrCompiler.td | ||
---|---|---|
272–274 | shadowbase, mappingscale and orshadowoffset are known to runtime, we should not have them as part of intrinsic | |
llvm/lib/Target/X86/X86MCInstLower.cpp | ||
1330 | This does not look right | |
1334 | Why this called reg when it's address? | |
1340 | Why shadow is an argument? Both runtime and instrumented code know the shadow. | |
1340–1341 | could you please unpack these to into local variables for consistency? | |
1369 | I assumed the goal to move this shadow math into runtime code Also MappingScale and OrShadowOffset are constants. |
llvm/lib/Target/X86/X86MCInstLower.cpp | ||
---|---|---|
1369 | My mistake, runtime has nothing to do with that, we have implementation of callbacks here in object file. |
Please don't do [PATCH*] tags and just organize Phabricator "stack"
Just in case, it should look like this random image from search: https://firefox-source-docs.mozilla.org/_images/example-stack.png
I know it seems redundant to pass the constants with every function call, but otherwise the code generation code will have to take dependency on AddressSanitizerCommon.h, which I am not sure is ideal. I spent some time looking into it and becomes quite messy. For example the tests now need a way to set the values for those variables. I think the main downside of this approach is the bloat in the IR code, but it is cleaner than exposing internal asan details.
One other option I have looked into is to pass this info using the Module. For example using the addModuleFlag function, but again there is a problem of the tests. We could have a quick chat to discuss this if you want.
llvm/lib/Target/X86/X86MCInstLower.cpp | ||
---|---|---|
1334 | Because it is the id of the register which is used to pass the address. For example for X86::RDI it will be 53. |
Bloat in IR is bigger concert than include of AddressSanitizerCommon.h. And I don't see anything messy about that.
Similar AArch64/AArch64AsmPrinter.cpp does not hesitate to #include "llvm/Transforms/Instrumentation/HWAddressSanitizer.h"
One other option I have looked into is to pass this info using the Module. For example using the addModuleFlag function, but again there is a problem of the tests. We could have a quick chat to discuss this if you want.
It will extend the language, we should try to avoid that if possible. So far all values directly calculated from target triple.
Added getAddressSanitizerParams to pull the constants out. Let me know if this is what you had in mind. Also struggled a little bit with git and rebase, but I think I have figured it out now.
llvm/lib/Target/X86/X86MCInstLower.cpp | ||
---|---|---|
1340–1341 | Removed the variables as we have discussed. |
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
1640 | PTAL at lvm.read_register.i32 How about: llvm.asan.check.memaccess -> lvm.asan.check_read lvm.asan.check_write lvm.asan.kernel.check_read lvm.asan.kernel.check_write Even better lvm.asan.check_read.{i8, i16, i32, ...} lvm.asan.check_write.{i8, i16, i32, ...} lvm.asan.kernel.check_read.{i8, i16, i32, ...} lvm.asan.kernel.check_write.{i8, i16, i32, ...} |
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
1640 | Looks like underscore is not used in intrinsic names, so essentially the same with dots. |
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
1640 | Sounds good to me. I do the full expansion so there will be 20 intrinsics altogether. I will update the code and ping you when done. |
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
1640 | don't have a strong opinion, but sometimes I wish that hwasan outlined function names were more readable. The magic number in the names takes effort to decode. |
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
1640 | I think I am gonna go with int_asan_check(?_kernel)_(load|store) and pass the size as parameter. What do you think? |
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
1640 | You mean in a register? I think that could mean more register pressure -> higher code size. The magic numbers are unfortunate but they aren't that hard to decode (maybe we should be printing them as hex to make it a bit easier). I suppose we could pretty print the access info into the symbol name but only a few people will be looking at these so I'm not sure it's worth the effort. |
llvm/test/CodeGen/X86/asan-check-memaccess-or.ll | ||
---|---|---|
48 | Is this test out of date? Code has fewer arguments now. |
All tests are passing locally. Should be good to review now.
llvm/include/llvm/Transforms/Instrumentation/AddressSanitizer.h | ||
---|---|---|
150 ↗ | (On Diff #366141) | I wanted to be as close as possible to the HWASan style. Please let me know if you want me to change it. |
llvm/test/CodeGen/X86/asan-check-memaccess-or.ll | ||
48 | Should be updated now. |
llvm/include/llvm/Transforms/Instrumentation/AddressSanitizer.h | ||
---|---|---|
150 ↗ | (On Diff #366141) | They are quite unrelated, no need to violate coding style to match them. Also packing and unpacking implementation unnecessary separated. struct AsanAccessInfo { uint8_t AccessSize; bool CompileKernel; bool IsWrite; ... explicit AsanAccessInfo(int64_t packed); int64_t Pack() const; } Also math a little bit simpler if 4bit field is the last |
llvm/lib/Target/X86/X86MCInstLower.cpp | ||
1424 | what is in ECX register here? | |
1551–1553 | Tests do not check these attributes. | |
1559–1562 | as-is shifts are in the header, and masks are here, which is not nice. | |
llvm/test/CodeGen/X86/asan-check-memaccess-add.ll | ||
7 | should we also check how we load registers before callq? |
llvm/include/llvm/Transforms/Instrumentation/AddressSanitizer.h | ||
---|---|---|
150 ↗ | (On Diff #366141) | I like the idea. Keeps things well encapsulated in the implementation file. Done. |
llvm/lib/Target/X86/X86MCInstLower.cpp | ||
1424 | Should be NoRegister. Done. | |
1559–1562 | Encapsulated in the implementation file as you have recommended. | |
llvm/test/CodeGen/X86/asan-check-memaccess-add.ll | ||
7 | The issue is that there is no load instruction, because the registers are implied. |
Fixed a crash.
llvm/lib/Target/X86/X86MCInstLower.cpp | ||
---|---|---|
1424 | The instruction template for AND32ri8 expects 2 registers for some reason, which I am not sure why. If I provide only one register I get a clang crash. The emitted instruction is what I want so I didn't dig much deeper into it. | |
llvm/test/CodeGen/X86/asan-check-memaccess-or.ll | ||
95 | ECX is the lower 3 bits of the address. Is is the second part of the check "((Addr & 7) + AccessSize > k)". |
And if it's straightforward fix for the revert you can "reuse" approvals and land without additional review.
PTAL at lvm.read_register.i32
How about:
llvm.asan.check.memaccess ->
Even better