This is an archive of the discontinued LLVM Phabricator instance.

Implement variable-sized alloca instrumentation.
ClosedPublic

Authored by m.ostapenko on Oct 31 2014, 7:43 AM.

Details

Summary

This patch implements variable-sized alloca instrumentation (https://code.google.com/p/address-sanitizer/issues/detail?id=138).

Diff Detail

Event Timeline

m.ostapenko retitled this revision from to Implement variable-sized alloca instrumentation..
m.ostapenko updated this object.
m.ostapenko edited the test plan for this revision. (Show Details)
m.ostapenko added reviewers: kcc, samsonov, eugenis.
m.ostapenko set the repository for this revision to rL LLVM.
m.ostapenko added a project: lld.
m.ostapenko added subscribers: ygribov, Unknown Object (MLST).
kcc edited edge metadata.Oct 31 2014, 3:23 PM

This does not handle stack-use-after-return for alloca, right?
I don't insist you implement that now, but consider for the next patch.

lib/Transforms/Instrumentation/AddressSanitizer.cpp
585

Please do it under a flag, off by default for now.

lib/asan/asan_interface_internal.h
181 ↗(On Diff #15616)

indent

lib/asan/asan_internal.h
140

I think kAsanAllocaPartialMagic is redundant, just use kAsanAllocaRightMagic
(we may want to get rid of kAsanStackPartialRedzoneMagic separately)

lib/asan/asan_report.cc
990

Maybe dynamic-stack-buffer-overflow (for both left and right cases)?

ygribov added inline comments.Oct 31 2014, 10:22 PM
lib/asan/asan_report.cc
990

I think Max did underflow to match messages for ordinary stack.

kcc added inline comments.Nov 3 2014, 11:21 AM
lib/asan/asan_report.cc
990

Yea... I think the "underflow" was not very useful.
For this new thing I'd just go with a single dynamic-stack-buffer-overflow

m.ostapenko edited edge metadata.

Updated according to Konstantin's notes.

kcc added inline comments.Nov 6 2014, 1:24 PM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
161

Please add a test for this flag in test/Instrumentation/AddressSanitizer/
to check asan-instrument-alloca=0, asan-instrument-alloca=1, and default setting
Similar to test/Instrumentation/AddressSanitizer/instrumentation-with-call-threshold.ll

540

This will not create a left red zone, right?
And even if it will (due to alignment) it will not poison it.

I would prefer to create both left and right redzones and [un]poison them inline with one 4-byte store for the left rz and one or two 4-byte stores for the right one.
Make sure to make the new size 0 mod 32

550

do you really need to always align this by 32?
Maybe use max(ClRealignStack, AllocaAlign)?

579

A cleaner way is to create a new AllocaInst, just like we do in another place in this file.
Then do eraseFromParent on the old one.

lib/asan/asan_fake_stack.cc
240 ↗(On Diff #15842)

We probably don't need these at all if we inline the poisoning and unpoisoning.

ygribov added inline comments.Nov 6 2014, 1:45 PM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
540

I think Max's idea was to create thricely left, right and partial. As for inlining, it would be a mess for partial redzone - it's size is unknown until runtime so we won't be able to use 4-byte stores and will have to use an ugly loop instead.

kcc added inline comments.Nov 6 2014, 2:11 PM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
540

A loop? Come on, I am sure you can construct the appropriate 32-bit constant to poison the partial 32-byte zon just using arithmetic (masks and shifts)

ygribov added inline comments.Nov 6 2014, 11:12 PM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
540

I wonder if this can be less ugly? We don't want to emit this mess in codegen, do we?

Tail = OldSize & 5; // Get length of 32-byte unaligned part
if (Tail >= 24) {

sh = 24;
shadow_word = 0;

} else if (Tail >= 16) {

sh = 16;
shadow_word = 0xcb000000;  // 0xcb is right magic

} else if (Tail >= 8) {

sh = 8;
shadow_word = 0xcbcb0000;  // 0xcb is right magic

} else {

sh = 0;
shadow_word = 0xcbcbcb00;  // 0xcb is right magic

}
Tail8 = Tail - sh;
shadow_byte = Tail8 == 8 ? 0 : Tail8 ? Tail8 : 0xcb;
shadow_word |= shadow_byte << sh;

m.ostapenko added inline comments.Nov 7 2014, 3:36 AM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
540

Hm, perhaps something like this would be preferable:

padding = OldSize & (Align - 1) // get padding
if (padding) {
  shift = padding & ~7; // the number of bits we need to shift to access first chunk in shadow memory, containing nonzero bytes
  // Example:
  // padding = 21                       padding = 16
  // Shadow:  |00|00|05|cb|          Shadow:  |00|00|cb|cb|         
  //                ^                               ^
  //                |                               |
  // shift = 21 & ~7 = 16            shift = 16 & ~7 = 16
  val1 = 0xcbcbcbcb << (shift + 8);
  partialBits = padding & 7;
  if (!partialBits) partialBits = 0xcb;
  val2 = partialBits << shift;
  result = val1 | val2;
}

if (!partialBits) partialBits = 0xcb; looks ugly, but right now I don't see any convenient way to avoid it.

m.ostapenko added inline comments.Nov 7 2014, 4:42 AM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
540

Oh, Align is 32, of course.

kcc added inline comments.Nov 7 2014, 11:23 AM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
540

Yep, something along these lines. Cool.

Please make sure to have a test for all 32 values of padding.
You may use __asan_region_is_poisoned call in such test

Don't forget about little- vs big- endian

if (!partialBits) partialBits = 0xcb;

This should be fine actually, you can use SelectInst to avoid creating new BB

Updated patch. Major changes are:

  1. (Un)poisoning is inline now.
  2. Added test to check instrumentation (with/without alloca instrumentation, default behavior).
  3. Add test to check that all 32 values of padding are handled correctly.
  4. Big-endian is now "supported", but I unable to test it, sorry.
kcc added inline comments.Nov 14 2014, 2:00 PM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
476

what is the middle rz?

480

add a comment describing the 3 redzones.
What is PartialRz, is it always non-empty?

I think you should store the pointers to the shadow here, not the pointers to the app memory.

484

space before =

563

no constants here, please.
Define kAsanAllocaLeftMagic/kAsanAllocaRightMagic at the top of the file

589

why long?
If it has to be 64-bit use uint64_t

654

no constants here.

660

instead of creating a new BB for partial RZ, I would do this:
make sure that PartialSize is never zero, i.e. instead of being in 0..31 it is in 1..32
This is better as we will not need to keep both pointers (PartialRz and RightRz) alive throughout the procedure.

test/Instrumentation/AddressSanitizer/instrument-dynamic-allocas.ll
7

use CHECK-NOALLOCA here

test/asan/TestCases/alloca_instruments_all_paddings.cc
12

Can you replace "str+index" with &str[0]?
This wya we will ensure that the valid memory is unpoisoned.

17

run this loop twice to ensure that we properly unpoison the stack

Updated according to last review, major changes are:

  1. Removed redundant new BB creation.
  2. Added small fixes for tests.
  3. Defined new constants on top of the AddressSanitizer.cpp.
m.ostapenko added inline comments.Nov 18 2014, 5:39 AM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
660

Yes, this is a good idea to avoid new BB creation. But we still need both PartialRz and RightRz, because in case of PartialSize == 32 PartialRz points the same address as RightRz, otherwise RightRz == PartialSize + 32, isn't it?

Or maybe I've misunderstood something?

kcc added inline comments.Nov 18 2014, 1:06 PM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
660

But we still need both PartialRz and RightRz, because in case of PartialSize == 32 PartialRz points the same address as RightRz, otherwise RightRz == PartialSize + 32, isn't it?

There are two ways to implement this:

  1. in case of PartialSize == 32 PartialRz points the same address as RightRz, otherwise RightRz == PartialSize + 32

In this case, we need to poison both PartialRz and RightRz and we need to keep this value in the register for the entire function.

  1. PartialRz is always strictly before RightRz.

In this case we always unpoison RightRz and RightRz-32 and so we don't need to keep PartialRz around.

It's hard to tell which is better w/o measuring on a good benchmark, also there is a memory-vs-cpu tradeoff (larger redzones vs larger register pressure).
But the second way sounds slightly better to me.

kcc added inline comments.Nov 18 2014, 1:18 PM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
108

static

535

Constant::getNullValue

554

This deserves a comment: what exactly you are computing.

578

These new functions are probably too big and should be placed outside of the class decl.

580

write a function-level comment with the expression you are computing

586

Constant::getNullValue

596

two lines should be enough here

ygribov added inline comments.Nov 19 2014, 12:06 AM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
660

in case of PartialSize == 32 PartialRz points the same address as RightRz, otherwise RightRz == PartialSize + 32

In this case ... we need to keep this value in the register for the entire function

Not really, we can do the same as in your case 2 i.e. zero out RightRz - 32. Worst case (when PartialRz = RightRz) we'll do a redundant write.

As for benchmarking I'm not sure it matters that much - allocas are quite rare anyway.

Updated according to last review.

Now we don't keep PartialRzAddr until the end of function, we can just unpoison RightRzAddr and RightRzAddr - 32 with two stores into RightRzAddrShadow and RightRzAddrShadow - 4.
If PartialRzAddr == RightRzAddr we will perform one redundant store into user's memory shadow during unpoisoning, but perhaps this should be actually fine.

kcc added inline comments.Nov 19 2014, 2:13 PM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
476

remove "partial" from comment

559

rephrase somehow, e.g.

... it would contain the value that we will use to poison the partial redzone

1816

please add a comment similar to the comment before handleDynamicAllocaCall
explaining what exactly and how you compute here, because this part is nto completely trivial.

1837

I tried hard, but I don't understand this :(
Try to avoid setOperand, instead create new Instruction objects when needed.

1914

two lines, please.
(You will like clang-format if you try it)

Updated. Added new comments, fixed code style.

kcc added a comment.Nov 20 2014, 10:14 AM

LGTM
Thanks for working on this!
If you don't yet have commit access ask Yuri to commit (mentioning you as the author).

Next steps would be to enable the flag by default (we'll do our part of testing too).
And then it may be interesting to enable use-after-return for allocas.

kcc accepted this revision.Nov 20 2014, 10:15 AM
kcc edited edge metadata.
This revision is now accepted and ready to land.Nov 20 2014, 10:15 AM

Next steps would be to enable the flag by default (we'll do our part of testing too).
And then it may be interesting to enable use-after-return for allocas.

And we also need to store metadata (probably just variable name) in redzone for user-friendlier reports.

Done in r222519 and r222520.

kcc added a comment.Nov 21 2014, 12:59 PM
In D6055#35, @ygribov wrote:

Next steps would be to enable the flag by default (we'll do our part of testing too).
And then it may be interesting to enable use-after-return for allocas.

And we also need to store metadata (probably just variable name) in redzone for user-friendlier reports.

Indeed so, the diagnostics could be improved.
Let's do it before enabling the feature by default

kcc added a comment.Nov 21 2014, 1:31 PM

I've run the new feature on the chromium sources and it produced a compiler failure:

reduced test:
% cat a.c
int a;
int b;
int c;
void fn3(int *, int);
void fn1 () {

int d = b && c;
int e[a];
int f;
if (d)
  fn3 (&f, sizeof 0 * (&c - e));

}
% clang -fsanitize=address -mllvm -asan-instrument-allocas -O2 a.c
Instruction does not dominate all uses!

%54 = add i64 %53, 2147450880
%68 = sub i64 %54, 4

Instruction does not dominate all uses!

%35 = add i64 %34, 2147450880
%69 = inttoptr i64 %35 to i32*

Instruction does not dominate all uses!

%54 = add i64 %53, 2147450880
%71 = inttoptr i64 %54 to i32*

fatal error: error in backend: Broken function found, compilation aborted!
clang-3.6: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 3.6.0 (trunk 222567)
Target: x86_64-unknown-linux-gnu
Thread model: posix
clang-3.6: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ and include the crash backtrace, preprocessed source, and associated run script.
clang-3.6: note: diagnostic msg:


PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-3.6: note: diagnostic msg: /tmp/a-2b96e4.c
clang-3.6: note: diagnostic msg: /tmp/a-2b96e4.sh
clang-3.6: note: diagnostic msg:


kcc added a comment.Nov 21 2014, 1:38 PM

Interesting.
The dynamic alloca in this case is moved to a basic block outside of main path and hence it does not dominate
all exits. So, we can not unpoison it at the RET statements, instead we should do it in the end of the alloca's scope.
This is starting to resemble use-after-scope... Alexey, any comments?

So we basically need to find all blocks dominated by alloca, then all exits from those to non-dominated blocks and then insert unpoison calls prior to these exits?

Or maybe use dominance frontiers to make whole process more efficient: insert unpoisons at those predecessors of blocks from alloca dominance frontier which are dominated by alloca.

kcc added a comment.Nov 24 2014, 8:49 AM

I am afraid this is not that simple.
It is legal to call alloca manually inside an if-statement and then use it until the function exit.
/me pondering...
(And will be OOO most of this week, don't expect prompt replies until next Mon)

Right. Frankly I'm mostly interested in VLAs so worst case we can simply remove such pathological cases. I think they can be detected by checking if alloca is argument of some phi?

I think they can be detected by checking if alloca is argument of some phi?

Hm, not really - alloca result could also escape.

kcc added a comment.Nov 24 2014, 1:38 PM

We may start from checking that alloca dominates all exits (the most common case, probably).

Handling nontrivial cases may be quite tricky. Perhaps we can implement something like linked list of "bad " allocas, storing the address and size of next/previous alloca in the left redzone and marking the last/first one with some magic value. Then, before each ret instruction, we can iterate over this list and unpoison these allocas.

Or we could just memset shadow for dynamic part of stack to 0. This wouldn't work with use-after-return though.

kcc added a comment.Dec 2 2014, 4:39 PM

I see you've committed r222991 which checks that alloca dominates the exits.
Let's polish this thing first and enable it by default, then we may return to more complicated cases.

Actually exit domination wouldn't save the day. E.g.

void f() {
  char *p;
  ...
  do {
    p = alloca(100);
    g(p);
  } while(whatever);
  ...
}
m.ostapenko closed this revision.Oct 22 2015, 5:28 AM