This is an archive of the discontinued LLVM Phabricator instance.

[asan] Implemented intrinsic for the custom calling convention similar used by HWASan for X86.
ClosedPublic

Authored by kstoimenov on Aug 10 2021, 11:12 AM.

Details

Summary

The implementation uses the int_asan_check_memaccess intrinsic to instrument the code. The intrinsic is replaced by a call to a function which performs the access check. The generated function names encode the input register name as a number using Reg - X86::NoRegister formula.

Diff Detail

Event Timeline

kstoimenov created this revision.Aug 10 2021, 11:12 AM
kstoimenov requested review of this revision.Aug 10 2021, 11:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2021, 11:12 AM

Added an IR test.

Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2021, 2:31 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Fixed the test.

Address lint comments.

kstoimenov updated this revision to Diff 366069.EditedAug 12 2021, 12:17 PM

Started using R8 instead of RAX beacuse of some test failures. Also fixed the "Yes means No and No means Yes. Delete all files [Y]? " bug :-) for the load/store functions.

Added code generation tests and fixed some bugs along the way.

kstoimenov retitled this revision from [WIP][asan] Implemented custom calling convention similar used by HWASan for X86. The feature should be code complete. The tests are coming in a later revision. to [asan] Implemented custom calling convention similar to the one used by HWASan for X86. .Aug 12 2021, 5:08 PM
kstoimenov edited the summary of this revision. (Show Details)
kstoimenov added reviewers: vitalybuka, kda.
vitalybuka added a comment.EditedAug 17 2021, 10:44 AM

This can be split into 3 patches in the following order:

  1. x86
  2. Instrumentation
  3. clang

Each of them needs tests, Instrumentation changes are not tested.

llvm/include/llvm/Transforms/Instrumentation/AddressSanitizer.h
150 ↗(On Diff #366141)

It's not how enums described here https://llvm.org/docs/CodingStandards.html#id43
Also it's common to use enum class nowerdays.

However this one does not need to be enum and just "constexpr size_t"

kstoimenov retitled this revision from [asan] Implemented custom calling convention similar to the one used by HWASan for X86. to [asan] Implemented custom calling convention similar to the one used by HWASan for X86..

X86 only patch.

Amended this patch to be the x86 only patch. PTAL.

This can be split into 3 patches in the following order:

  1. x86
  2. Instrumentation
  3. clang

Each of them needs tests, Instrumentation changes are not tested.

kstoimenov edited the summary of this revision. (Show Details)Aug 17 2021, 11:43 AM

Never mind, this won't compile. Let me fix it and I will let you know when it is ready for review.

kstoimenov edited the summary of this revision. (Show Details)

Instead of packing and bit shifting arguments passing them as individual intrinsic arguments.

Removed unused include.

kstoimenov retitled this revision from [asan] Implemented custom calling convention similar to the one used by HWASan for X86. to [asan] [Patch 1/3] Implemented custom calling convention similar to the one used by HWASan for X86..Aug 17 2021, 3:22 PM

Compiles and tests are passing now. PTAL.

Can you please use "Edit related revisions" and link other patches of the features into the stack?

llvm/lib/Target/X86/X86InstrCompiler.td
272–274

shadowbase, mappingscale and orshadowoffset are known to runtime, we should not have them as part of intrinsic

llvm/lib/Target/X86/X86MCInstLower.cpp
1331

This does not look right

1335

Why this called reg when it's address?

1341

Why shadow is an argument? Both runtime and instrumented code know the shadow.

1341–1342

could you please unpack these to into local variables for consistency?

1370

I assumed the goal to move this shadow math into runtime code
so we have less instructions in the instrumented code.

Also MappingScale and OrShadowOffset are constants.
It's a little bit not nice that ShadowBase can be dynamic, so each __asan_check_ will have to load it,
but let's for now care about fixed shadow which is constant.

vitalybuka added inline comments.Aug 17 2021, 4:33 PM
llvm/lib/Target/X86/X86MCInstLower.cpp
1370

My mistake, runtime has nothing to do with that, we have implementation of callbacks here in object file.
Still those parameters are not needed to be exposed to IR layers. They must be the same for AddressSanitizer.cpp and here. So please just expose some functions e.g. in llvm/Transforms/Instrumentation/AddressSanitizerCommon.h and use them here and there to calculate these values.

Please don't do [PATCH*] tags and just organize Phabricator "stack"
Just in case, it should look like this random image from search: https://firefox-source-docs.mozilla.org/_images/example-stack.png

kstoimenov retitled this revision from [asan] [Patch 1/3] Implemented custom calling convention similar to the one used by HWASan for X86. to [asan] Implemented custom calling convention similar to the one used by HWASan for X86..Aug 17 2021, 5:22 PM

I know it seems redundant to pass the constants with every function call, but otherwise the code generation code will have to take dependency on AddressSanitizerCommon.h, which I am not sure is ideal. I spent some time looking into it and becomes quite messy. For example the tests now need a way to set the values for those variables. I think the main downside of this approach is the bloat in the IR code, but it is cleaner than exposing internal asan details.

One other option I have looked into is to pass this info using the Module. For example using the addModuleFlag function, but again there is a problem of the tests. We could have a quick chat to discuss this if you want.

llvm/lib/Target/X86/X86MCInstLower.cpp
1335

Because it is the id of the register which is used to pass the address. For example for X86::RDI it will be 53.

I know it seems redundant to pass the constants with every function call, but otherwise the code generation code will have to take dependency on AddressSanitizerCommon.h, which I am not sure is ideal. I spent some time looking into it and > becomes quite messy. For example the tests now need a way to set the values for those variables. I think the main downside of this approach is the bloat in the IR code, but it is cleaner than exposing internal asan details.

Bloat in IR is bigger concert than include of AddressSanitizerCommon.h. And I don't see anything messy about that.
Similar AArch64/AArch64AsmPrinter.cpp does not hesitate to #include "llvm/Transforms/Instrumentation/HWAddressSanitizer.h"

One other option I have looked into is to pass this info using the Module. For example using the addModuleFlag function, but again there is a problem of the tests. We could have a quick chat to discuss this if you want.

It will extend the language, we should try to avoid that if possible. So far all values directly calculated from target triple.

Pulled constants like shadow base, etc out of the intrinsic.

Pulled constants like shadow base, etc out of the intrinsic.

Fixed after rebase.

kstoimenov marked 7 inline comments as done.Aug 19 2021, 8:26 AM

Added getAddressSanitizerParams to pull the constants out. Let me know if this is what you had in mind. Also struggled a little bit with git and rebase, but I think I have figured it out now.

llvm/lib/Target/X86/X86MCInstLower.cpp
1341–1342

Removed the variables as we have discussed.

vitalybuka added inline comments.Aug 19 2021, 11:06 AM
llvm/include/llvm/IR/Intrinsics.td
1640

PTAL at lvm.read_register.i32

How about:

llvm.asan.check.memaccess ->

lvm.asan.check_read
lvm.asan.check_write
lvm.asan.kernel.check_read
lvm.asan.kernel.check_write

Even better

lvm.asan.check_read.{i8, i16, i32, ...}
lvm.asan.check_write.{i8, i16, i32, ...}
lvm.asan.kernel.check_read.{i8, i16, i32, ...}
lvm.asan.kernel.check_write.{i8, i16, i32, ...}
vitalybuka added inline comments.Aug 19 2021, 11:09 AM
llvm/include/llvm/IR/Intrinsics.td
1640

Looks like underscore is not used in intrinsic names, so essentially the same with dots.

kstoimenov marked an inline comment as done.Aug 19 2021, 11:37 AM
kstoimenov added inline comments.
llvm/include/llvm/IR/Intrinsics.td
1640

Sounds good to me. I do the full expansion so there will be 20 intrinsics altogether. I will update the code and ping you when done.

vitalybuka added inline comments.
llvm/include/llvm/IR/Intrinsics.td
1640

@pcc @eugenis
WDYT, I think later we can do the same for HWASAN?

Update before expanding intrinsics.

eugenis added inline comments.Aug 19 2021, 12:00 PM
llvm/include/llvm/IR/Intrinsics.td
1640

I don't see what these multiple intrinsics give us that a single memaccess one does not provide?

As long as access type and similar arguments are immediates.

1642

We've just removed IntrInaccessibleMemOnly from hwasan. This needs to alias shadow updates.

kstoimenov retitled this revision from [asan] Implemented custom calling convention similar to the one used by HWASan for X86. to [asan] Implemented intrinsic for the custom calling convention similar used by HWASan for X86..Aug 19 2021, 12:04 PM
pcc added inline comments.Aug 19 2021, 1:06 PM
llvm/include/llvm/IR/Intrinsics.td
1640

Agree with @eugenis - these sorts of intrinsic variants are typically used for distinguishing different pointer element types and we're in the process of getting rid of those anyway.

vitalybuka added inline comments.Aug 19 2021, 1:22 PM
llvm/include/llvm/IR/Intrinsics.td
1640

@pcc @eugenis Then do you prefer to encode is_write+size+kernel into non-human unreadable AccessInfo, like hwasan, or separate 0/1 arguments.
I probably prefer AccessInfo, as they both unreadable, but the hwasan version is shorter.

eugenis added inline comments.Aug 19 2021, 1:25 PM
llvm/include/llvm/IR/Intrinsics.td
1640

don't have a strong opinion, but sometimes I wish that hwasan outlined function names were more readable. The magic number in the names takes effort to decode.

kstoimenov retitled this revision from [asan] Implemented intrinsic for the custom calling convention similar used by HWASan for X86. to [asan] Implemented intrinsic for the custom calling convention similar used by HWASan for X86..

Removed IntrInaccessibleMemOnly.

kstoimenov marked an inline comment as done.Aug 19 2021, 1:44 PM
kstoimenov added inline comments.
llvm/include/llvm/IR/Intrinsics.td
1640

I think I am gonna go with int_asan_check(?_kernel)_(load|store) and pass the size as parameter. What do you think?

pcc added inline comments.Aug 19 2021, 2:09 PM
llvm/include/llvm/IR/Intrinsics.td
1640

You mean in a register? I think that could mean more register pressure -> higher code size.

The magic numbers are unfortunate but they aren't that hard to decode (maybe we should be printing them as hex to make it a bit easier). I suppose we could pretty print the access info into the symbol name but only a few people will be looking at these so I'm not sure it's worth the effort.

vitalybuka added inline comments.Aug 19 2021, 2:09 PM
llvm/include/llvm/IR/Intrinsics.td
1640

AccessInfo

1640

I think I am gonna go with int_asan_check(?_kernel)_(load|store) and pass the size as parameter. What do you think?

Looks like both @pcc and @eugenis do not like multiple intrinsic.

Also I chatted with @eugenis to clarify the last comment. He is OK with AccessInfo like in hwasan.

1640

I think I am gonna go with int_asan_check(?_kernel)_(load|store) and pass the size as parameter. What do you think?

vitalybuka added inline comments.Aug 19 2021, 2:11 PM
llvm/test/CodeGen/X86/asan-check-memaccess-or.ll
47

Is this test out of date? Code has fewer arguments now.

Moved back to using AccessInfo and fixed the tests.

Upload after updating AddressSanitizer.cpp.

All tests are passing locally. Should be good to review now.

llvm/include/llvm/Transforms/Instrumentation/AddressSanitizer.h
150 ↗(On Diff #366141)

I wanted to be as close as possible to the HWASan style. Please let me know if you want me to change it.

llvm/test/CodeGen/X86/asan-check-memaccess-or.ll
47

Should be updated now.

vitalybuka added inline comments.Aug 20 2021, 5:16 PM
llvm/include/llvm/Transforms/Instrumentation/AddressSanitizer.h
150 ↗(On Diff #366141)

They are quite unrelated, no need to violate coding style to match them.

Also packing and unpacking implementation unnecessary separated.
Maybe something like this and keep bit arithmetic in a single cpp file:

struct AsanAccessInfo {
  uint8_t AccessSize;
  bool CompileKernel;
  bool IsWrite;
...

  explicit AsanAccessInfo(int64_t packed);
  int64_t Pack() const;
}

Also math a little bit simpler if 4bit field is the last

llvm/lib/Target/X86/X86MCInstLower.cpp
1425

what is in ECX register here?

1552–1554

Tests do not check these attributes.

1560–1563

as-is shifts are in the header, and masks are here, which is not nice.

llvm/test/CodeGen/X86/asan-check-memaccess-add.ll
7

should we also check how we load registers before callq?

kstoimenov marked 4 inline comments as done.

Addressed the comments.

kstoimenov added inline comments.Aug 23 2021, 11:52 AM
llvm/include/llvm/Transforms/Instrumentation/AddressSanitizer.h
150 ↗(On Diff #366141)

I like the idea. Keeps things well encapsulated in the implementation file. Done.

llvm/lib/Target/X86/X86MCInstLower.cpp
1425

Should be NoRegister. Done.

1560–1563

Encapsulated in the implementation file as you have recommended.

llvm/test/CodeGen/X86/asan-check-memaccess-add.ll
7

The issue is that there is no load instruction, because the registers are implied.

Fixed code generation bug.

vitalybuka added inline comments.Aug 23 2021, 5:34 PM
llvm/include/llvm/Transforms/Instrumentation/AddressSanitizer.h
166 ↗(On Diff #368236)

.cpp file is missing

llvm/lib/Target/X86/X86MCInstLower.cpp
1425

Why NoRegister update is not reflected in tests?

llvm/test/CodeGen/X86/asan-check-memaccess-or.ll
95

What is ECX here?

Attempt to split AddressSanitizer.cpp.

Second attempt to split AddressSanitizer.cpp.

kstoimenov marked an inline comment as done.

Fixed a crash.

llvm/lib/Target/X86/X86MCInstLower.cpp
1425

The instruction template for AND32ri8 expects 2 registers for some reason, which I am not sure why. If I provide only one register I get a clang crash. The emitted instruction is what I want so I didn't dig much deeper into it.

llvm/test/CodeGen/X86/asan-check-memaccess-or.ll
95

ECX is the lower 3 bits of the address. Is is the second part of the check "((Addr & 7) + AccessSize > k)".

This revision is now accepted and ready to land.Aug 24 2021, 12:14 PM
This revision was landed with ongoing or failed builds.Aug 24 2021, 12:35 PM
This revision was automatically updated to reflect the committed changes.
vitalybuka reopened this revision.Aug 24 2021, 5:22 PM
This revision is now accepted and ready to land.Aug 24 2021, 5:22 PM

Updated with the fix for Bazel build.

vitalybuka accepted this revision.Aug 24 2021, 5:46 PM

And if it's straightforward fix for the revert you can "reuse" approvals and land without additional review.

Thanks! I will submit tomorrow morning.

Updated from the revert commit.

This revision was landed with ongoing or failed builds.Aug 25 2021, 8:32 AM
This revision was automatically updated to reflect the committed changes.