This is an archive of the discontinued LLVM Phabricator instance.

Implement variable-sized alloca instrumentation (take 2).
ClosedPublic

Authored by m.ostapenko on Jan 21 2015, 9:52 AM.

Details

Summary

This patch introduces new way for dynamic allocas unpoisoning. Here, all dynamic allocas are linked in a linked list (each alloca stores pointer to previous one in left redzone), pointer to last alloca is stored in a special memory cell called DynamicAllocaLayout (however we are going to get rid of it and locate this pointer on register).

We introduce new __asan_unroll_alloca function, which unpoisons dynamic allocas before each Ret and StackRestore instructions (this probably should be inlined thought). This implementation does not support systems with upgrowing stack now.

Diff Detail

Repository
rL LLVM

Event Timeline

m.ostapenko retitled this revision from to Implement variable-sized alloca instrumentation (take 2)..
m.ostapenko updated this object.
m.ostapenko edited the test plan for this revision. (Show Details)
m.ostapenko added reviewers: kcc, samsonov, eugenis.
m.ostapenko set the repository for this revision to rL LLVM.
m.ostapenko added subscribers: ygribov, m.ostapenko.
samsonov added a subscriber: Unknown Object (MLST).Jan 22 2015, 11:17 PM
kcc edited edge metadata.Jan 27 2015, 2:33 PM

OMG. I am curious, if you really have targets for which testing alloca is important...
Do we really need a list here?
Can't we handle the task with an integer indicating the combined sizes of allocas?

OMG. I am curious, if you really have targets for which testing alloca is important...

Well, bugs do pop up in various OSS projects. Alloca is surprisingly widespread.

Do we really need a list here?
Can't we handle the task with an integer indicating the combined sizes of allocas?

We could simply memset(0) all shadow memory corresponding to dynamic part of stack. But this wouldn't scale to use-after-return (as I understand alloca regions will not be consecutive in this case).

ygribov added inline comments.Jan 27 2015, 9:53 PM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
1951 ↗(On Diff #18526)

Why not unsigned btw?

Kostya, so what's your call on this? The code isn't that complicated and I'm not sure there is an easier complete solution to dynamic stack variables. And dynamic stack arrays are indeed frequent.

kcc added a comment.Feb 19 2015, 5:19 PM

I did not give much thought to dynamic allocas, but what if we simply replace them with run-time calls somehow?
Maybe we could reuse the FakeStack here somehow?

Alexey, you've been touching the Alloca's recently, and was going to touch them more. Thoughts?

I did not give much thought to dynamic allocas, but what if we simply replace them with run-time calls somehow?

We could hide Max's linked lists behind an internal API, something like

// Poison redzones and store metadata for new_alloca
void asan_poison_alloca(uptr new_alloca, uptr size, uptr prev_alloca);

// Unpoison all allocas below bound
uptr asan_unpoison_allocas(uptr prev_alloca, uptr bound);

The ugly bound parameter is necessary to model the equally ugly alloca/VLA interwork. From https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html :

If you use both variable-length arrays and alloca in the same function, deallocation of a variable-length array also deallocates anything more recently allocated with alloca.

I did not give much thought to dynamic allocas, but what if we simply replace them with run-time calls somehow?

We could hide Max's linked lists behind an internal API, something like

// Poison redzones and store metadata for new_alloca
void asan_poison_alloca(uptr new_alloca, uptr size, uptr prev_alloca);

// Unpoison all allocas below bound
uptr asan_unpoison_allocas(uptr prev_alloca, uptr bound);

I think this is exactly what we want to do in UAR case. Since we don't know number of allocated regions until runtime (in general), we still need some dynamically growing structure (e.g. linked list) to store allocated regions to be unpoisoned before each ret instruction. We can hide this structure behind internal API, in runtime library, thought. For non-UAR case we can use memset for dynamic section indeed, but I'm not sure this is preferable way.

kcc added a comment.Feb 25 2015, 2:07 PM

I'd prefer to hide as much as possible inside the run-time and have simplest possible compiler changes.
Also, whenever possible, I prefer not to have linked lists. I am pretty sure we can solve this problem with an array.

rnk added a subscriber: rnk.Feb 25 2015, 3:06 PM

+1

ASan should replace dynamic allocas with runtime calls that delegate to
malloc. Poisoning will work out of the box with no changes.

We will need additional runtime handling to deal with VLAs, which use
llvm.stacksave / llvm.stackrestore.

kcc added a comment.Feb 25 2015, 3:13 PM

Well, you clearly can not just use malloc, because then you have to call free() somewhere.
But something like asan's fake stack -- yes.

rnk added a comment.Feb 25 2015, 3:15 PM

It occurs to me that ASan would need to add exceptional cleanups, if we
want this to cleanup after exception.

kcc added a comment.Feb 25 2015, 3:22 PM
In D7098#130041, @rnk wrote:

It occurs to me that ASan would need to add exceptional cleanups, if we
want this to cleanup after exception.

Not necessary.
E.g. fake stack does garbage collection in the run-time lib.

Also, whenever possible, I prefer not to have linked lists. I am pretty sure we can solve this problem with an array.

Note that number of allocas is not known statically so in general you'll need a dynamically growing array.

kcc added a comment.Feb 26 2015, 10:29 AM

Note that number of allocas is not known statically so in general you'll need a dynamically growing array.

Haha, no.
While you don't know the number of allocas, their total size is limited by the size of the stack,
so you can pre-allocate an array at the thread start up.

dynamically growing array in a hot spot in asan is not wise.
We better allocate a bit extra space (maybe with MAP_NORESERVE)

So, maybe we simply use the existing fake stack (in use-after-return mode, or maybe even by default?) to simulate dynamic alloca?

While you don't know the number of allocas, their total size is limited by the size of the stack,
so you can pre-allocate an array at the thread start up.
dynamically growing array in a hot spot in asan is not wise.

Right, that's why we originally proposed lists. Wouldn't this be much simpler?

So, maybe we simply use the existing fake stack (in use-after-return mode, or maybe even by default?) to simulate dynamic alloca?

I'd oppose to that - fake stacks pose unacceptable RAM overheads for mobile devices.

kcc added a comment.Feb 27 2015, 5:01 PM

Right, that's why we originally proposed lists. Wouldn't this be much simpler?

There is nothing simpler than a non-resizable array :)
I still urge you to find a solution w/o lists, I think it's possible.
If nothing good shows up, ok, let's do lists, but in the asan-runtime (as opposed to compiler module)

try to minimize the amount of compiler changes. something like

  1. all dynamic alloca() are replaced with __asan_dynamic_alloca which will use fake stack in use-after-return mode and real stack (with redzones) in base mode.
  2. at all RET instructions __asan_release_dynamic_allocas is called, but in presence of exceptions/longjmp there is a backup recovery mechanism.

So, maybe we simply use the existing fake stack (in use-after-return mode, or maybe even by default?) to simulate dynamic alloca?

I'd oppose to that - fake stacks pose unacceptable RAM overheads for mobile devices.

Fair enough.

m.ostapenko edited edge metadata.

Hi!
I'm sorry for delay.
Here are some approaches we can follow for dynamic allocas:

  1. We can instrument all allocas with runtime call for both UAR/non-UARcases (for non-UAR we'll just perform poisoning, for UAR allocate allocas using fake stacks). But this will lead to complex logic into unpoisoning functionality, since allocated memory chunks for UAR will not be consecutive.
  2. We probably can allocate single memory chunk for dynamic stack area via asan_stack_malloc similarly to static stack area and use memset for unpoisoning, but since we don't know total size of dynamic area, this might be unfriendly for memory consumption.
  3. We can ignore UAR detection for now and always allocate dynamic allocas on stack for both cases. This is the simplest solution, the only trick here is how to get a size parameter for memset, that would perform unpoisoning stuff before each ret and llvm.stackrestore instructions.

Generally, I don't see a convenient way to deal with UAR detection without code mess, perhaps we can skip it for now? Right now, I'm testing a patch for non-UAR case, it seems to be quite simple.

I would like to ping the patch. Updated with small nits fixed. All tests passed for x86_64-unknown-linux-gnu.

kcc added a comment.May 4 2015, 10:28 AM

I am sorry for the delay -- I was OOO for the last 4 weeks. Will try to respond this week.

kcc accepted this revision.May 8 2015, 4:34 PM
kcc edited edge metadata.

LGTM
Looks simple enough.
This is still off by default, right? Did you test it on something huge?

test/asan/TestCases/alloca_vla_interact.cc
12 ↗(On Diff #24385)

Does it have to be a macro?

This revision is now accepted and ready to land.May 8 2015, 4:34 PM

Konstantin, thank you for review!

In D7098#169544, @kcc wrote:

LGTM
Looks simple enough.
This is still off by default, right? Did you test it on something huge?

Yes, it's off by default. I've managed to build Firefox with dynamic allocas instrumentation enabled, is it big enough? I'm also found a "bug" there and I'm trying to understand if it's a real bug (not just unpoisoning error or something like that).

kcc added a comment.May 14 2015, 1:12 PM

I've managed to build Firefox with dynamic allocas instrumentation enabled, is it big enough?

Yea, it's big, and I expect it to have quite a few allocas/VLAs

I'm also found a "bug" there and I'm trying to understand if it's a real bug (not just unpoisoning error or something like that).

Let us know how it goes.

I think you may commit this in the meantime.

I'm also found a "bug" there and I'm trying to understand if it's a real bug (not just unpoisoning error or something like that).

Let us know how it goes.

That one turned out to be false positive - JS engine was allocating frame for asm stub via alloca, then freeing and re-using it inside the stub.

Sorry for delay, I was on vacation last week.

Proceeding Firefox testing, I've ran to several tests failures due to reasons pretty much similar to described above by Yura (allocating stack frame via alloca and some asm magic with deallocating after). After I'd worked around these allocas (asan_blacklist.txt was very helpful here), none of new test failures were popped up.

So, Konstantin, if you don't mind, I'll ask Yura to land this revision after retesting complete.

kcc added a comment.May 27 2015, 9:54 AM

Yes, go ahead.
We'll need to also test it on chromium before enabling the flag by default.
Thanks!

This revision was automatically updated to reflect the committed changes.
This comment was removed by LeoneChen.
Herald added a project: Restricted Project. · View Herald TranscriptMar 29 2022, 4:33 AM
Herald added a subscriber: delcypher. · View Herald Transcript