This is an archive of the discontinued LLVM Phabricator instance.

Implement variable-sized alloca instrumentation (take 2).
ClosedPublic

Authored by m.ostapenko on Jan 21 2015, 9:52 AM.

Details

Reviewers

kcc
samsonov
eugenis

Commits

rG98b18599a6b4: [ASan] New approach to dynamic allocas unpoisoning. Patch by Max Ostapenko!
rG63d97645852d: [ASan] New approach to dynamic allocas unpoisoning. Patch by Max Ostapenko!
rCRT238401: [ASan] New approach to dynamic allocas unpoisoning. Patch by Max Ostapenko!
rL238402: [ASan] New approach to dynamic allocas unpoisoning. Patch by Max Ostapenko!
rL238401: [ASan] New approach to dynamic allocas unpoisoning. Patch by Max Ostapenko!

Summary

This patch introduces a new way to unpoison dynamic allocas. All dynamic allocas are linked in a list (each alloca stores a pointer to the previous one in its left redzone), and a pointer to the last alloca is stored in a special memory cell called DynamicAllocaLayout (we plan to get rid of it and keep this pointer in a register).

We introduce a new __asan_unroll_alloca function, which unpoisons dynamic allocas before each Ret and StackRestore instruction (this should probably be inlined, though). This implementation does not yet support systems whose stack grows upward.
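The scheme in the summary can be pictured with a toy model. The sketch below is not the patch's code: `AllocaNode`, `ToyShadow` and `UnrollAllocas` are hypothetical names, the shadow is a plain vector of flags rather than real shadow memory, and each node stands in for the record the summary says is kept in an alloca's left redzone.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for shadow memory: one flag per byte of a pretend stack.
struct ToyShadow {
  std::vector<bool> poisoned;
  explicit ToyShadow(std::size_t bytes) : poisoned(bytes, false) {}
  void Set(std::size_t begin, std::size_t end, bool value) {
    for (std::size_t i = begin; i < end; ++i) poisoned[i] = value;
  }
};

// Hypothetical per-alloca record (the summary stores it in the left redzone).
struct AllocaNode {
  std::size_t begin;  // start of the left redzone
  std::size_t end;    // one past the right redzone
  AllocaNode *prev;   // previously created alloca, or nullptr
};

// Unpoisons every alloca from `last` back to (but excluding) `stop` and
// returns the new list head. `stop` must be nullptr or a node on the chain.
AllocaNode *UnrollAllocas(ToyShadow &shadow, AllocaNode *last,
                          AllocaNode *stop) {
  while (last != stop) {
    assert(last != nullptr && "stop is not on the chain");
    shadow.Set(last->begin, last->end, false);
    last = last->prev;
  }
  return last;
}
```

In this model, passing `nullptr` as `stop` plays the role of a function return (everything is unrolled), while passing an older node plays the role of a stack restore that keeps earlier allocas alive.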

Diff Detail

Repository: rL LLVM

Event Timeline

m.ostapenko updated this revision to Diff 18526. Jan 21 2015, 9:52 AM

m.ostapenko retitled this revision to Implement variable-sized alloca instrumentation (take 2).

m.ostapenko updated this object.

m.ostapenko edited the test plan for this revision.

m.ostapenko added reviewers: kcc, samsonov, eugenis.

m.ostapenko set the repository for this revision to rL LLVM.

m.ostapenko added subscribers: ygribov, m.ostapenko.

samsonov added a subscriber: Unknown Object (MLST). Jan 22 2015, 11:17 PM

OMG. I am curious, if you really have targets for which testing alloca is important...
Do we really need a list here?
Can't we handle the task with an integer indicating the combined sizes of allocas?

OMG. I am curious, if you really have targets for which testing alloca is important...

Well, bugs do pop up in various OSS projects. Alloca is surprisingly widespread.

Do we really need a list here?
Can't we handle the task with an integer indicating the combined sizes of allocas?

We could simply memset(0) all shadow memory corresponding to the dynamic part of the stack. But this wouldn't scale to use-after-return (as I understand it, alloca regions will not be consecutive in that case).

ygribov added inline comments. Jan 27 2015, 9:53 PM

lib/Transforms/Instrumentation/AddressSanitizer.cpp
1951 ↗	(On Diff #18526)	Why not unsigned btw?

Kostya, so what's your call on this? The code isn't that complicated and I'm not sure there is an easier complete solution to dynamic stack variables. And dynamic stack arrays are indeed frequent.

I did not give much thought to dynamic allocas, but what if we simply replace them with run-time calls somehow?
Maybe we could reuse the FakeStack here somehow?

Alexey, you've been touching the allocas recently, and were going to touch them more. Thoughts?

I did not give much thought to dynamic allocas, but what if we simply replace them with run-time calls somehow?

We could hide Max's linked lists behind an internal API, something like

// Poison redzones and store metadata for new_alloca
void asan_poison_alloca(uptr new_alloca, uptr size, uptr prev_alloca);

// Unpoison all allocas below bound
uptr asan_unpoison_allocas(uptr prev_alloca, uptr bound);

The ugly bound parameter is necessary to model the equally ugly alloca/VLA interwork. From https://gcc.gnu.org/onlinedocs/gcc/Variable-Length.html :

If you use both variable-length arrays and alloca in the same function, deallocation of a variable-length array also deallocates anything more recently allocated with alloca.

I did not give much thought to dynamic allocas, but what if we simply replace them with run-time calls somehow?

We could hide Max's linked lists behind an internal API, something like
// Poison redzones and store metadata for new_alloca
void asan_poison_alloca(uptr new_alloca, uptr size, uptr prev_alloca);

// Unpoison all allocas below bound
uptr asan_unpoison_allocas(uptr prev_alloca, uptr bound);

I think this is exactly what we want to do in the UAR case. Since in general we don't know the number of allocated regions until runtime, we still need some dynamically growing structure (e.g. a linked list) to store the allocated regions that must be unpoisoned before each ret instruction. We can hide this structure behind an internal API in the runtime library, though. For the non-UAR case we can indeed use memset for the dynamic section, but I'm not sure this is the preferable way.

I'd prefer to hide as much as possible inside the run-time and have simplest possible compiler changes.
Also, whenever possible, I prefer not to have linked lists. I am pretty sure we can solve this problem with an array.

kubamracek added a subscriber: kubamracek. Feb 25 2015, 2:15 PM

ASan should replace dynamic allocas with runtime calls that delegate to
malloc. Poisoning will work out of the box with no changes.

We will need additional runtime handling to deal with VLAs, which use
llvm.stacksave / llvm.stackrestore.

Well, you clearly can not just use malloc, because then you have to call free() somewhere.
But something like asan's fake stack -- yes.

It occurs to me that ASan would need to add exceptional cleanups, if we
want this to cleanup after exception.

In D7098#130041, @rnk wrote:

It occurs to me that ASan would need to add exceptional cleanups, if we
want this to cleanup after exception.

Not necessary.
E.g. fake stack does garbage collection in the run-time lib.

Also, whenever possible, I prefer not to have linked lists. I am pretty sure we can solve this problem with an array.

Note that number of allocas is not known statically so in general you'll need a dynamically growing array.

Note that number of allocas is not known statically so in general you'll need a dynamically growing array.

Haha, no.
While you don't know the number of allocas, their total size is limited by the size of the stack,
so you can pre-allocate an array at the thread start up.

A dynamically growing array in a hot spot in asan is not wise.
We'd better allocate a bit of extra space (maybe with MAP_NORESERVE).
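The fixed-size array idea can be sketched as follows. This is only an illustration of the argument, not code from the patch or the runtime: `AllocaLog` and its methods are hypothetical, a `std::vector` sized once at construction stands in for memory the runtime would reserve at thread start, and the minimum per-alloca footprint is an assumed parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity log of live dynamic allocas for one thread.
class AllocaLog {
 public:
  // min_footprint: assumed smallest stack footprint of one instrumented
  // alloca; the stack size then bounds how many records can ever be live.
  AllocaLog(std::size_t stack_bytes, std::size_t min_footprint)
      : slots_(stack_bytes / CheckedNonZero(min_footprint)), size_(0) {}

  std::size_t capacity() const { return slots_.size(); }
  std::size_t size() const { return size_; }

  // Records a new alloca without ever reallocating; false if the log is full.
  bool Push(std::uintptr_t addr) {
    if (size_ == slots_.size()) return false;
    slots_[size_++] = addr;
    return true;
  }

  // Drops the most recent records whose address is below `bound` (the stack
  // grows down, so newer allocas have lower addresses). Returns the count.
  std::size_t PopBelow(std::uintptr_t bound) {
    std::size_t dropped = 0;
    while (size_ > 0 && slots_[size_ - 1] < bound) {
      --size_;
      ++dropped;
    }
    return dropped;
  }

 private:
  static std::size_t CheckedNonZero(std::size_t v) {
    assert(v > 0);
    return v;
  }
  std::vector<std::uintptr_t> slots_;  // sized once, never resized
  std::size_t size_;
};
```

With an 8 MiB stack and an assumed 64-byte minimum footprint, the log needs 131072 slots, which is the kind of one-off reservation the comment above has in mind.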

So, maybe we simply use the existing fake stack (in use-after-return mode, or maybe even by default?) to simulate dynamic alloca?

While you don't know the number of allocas, their total size is limited by the size of the stack,
so you can pre-allocate an array at the thread start up.
dynamically growing array in a hot spot in asan is not wise.

Right, that's why we originally proposed lists. Wouldn't this be much simpler?

So, maybe we simply use the existing fake stack (in use-after-return mode, or maybe even by default?) to simulate dynamic alloca?

I'd oppose that: fake stacks impose unacceptable RAM overhead on mobile devices.

Right, that's why we originally proposed lists. Wouldn't this be much simpler?

There is nothing simpler than a non-resizable array :)
I still urge you to find a solution w/o lists, I think it's possible.
If nothing good shows up, ok, let's do lists, but in the asan-runtime (as opposed to compiler module)

Try to minimize the amount of compiler changes. Something like:

All dynamic alloca() calls are replaced with __asan_dynamic_alloca, which uses the fake stack in use-after-return mode and the real stack (with redzones) in base mode.
At all RET instructions __asan_release_dynamic_allocas is called, but in the presence of exceptions/longjmp there is a backup recovery mechanism.

So, maybe we simply use the existing fake stack (in use-after-return mode, or maybe even by default?) to simulate dynamic alloca?

I'd oppose that: fake stacks impose unacceptable RAM overhead on mobile devices.

Fair enough.

Hi!
I'm sorry for the delay.
Here are some approaches we can follow for dynamic allocas:

1. We can instrument all allocas with a runtime call in both the UAR and non-UAR cases (for non-UAR we'll just perform poisoning; for UAR we'll allocate allocas using fake stacks). But this will complicate the unpoisoning logic, since the memory chunks allocated for UAR will not be consecutive.
2. We can probably allocate a single memory chunk for the dynamic stack area via asan_stack_malloc, as is done for the static stack area, and use memset for unpoisoning. But since we don't know the total size of the dynamic area, this might waste memory.
3. We can ignore UAR detection for now and always allocate dynamic allocas on the stack in both cases. This is the simplest solution; the only trick is how to get the size parameter for the memset that performs the unpoisoning before each ret and llvm.stackrestore instruction.

Generally, I don't see a convenient way to deal with UAR detection without making a mess of the code; perhaps we can skip it for now? Right now I'm testing a patch for the non-UAR case, and it seems to be quite simple.
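The memset-based unpoisoning in the last option boils down to clearing the shadow bytes for one contiguous address range. A minimal sketch, assuming one shadow byte per 8 application bytes and using a vector in place of real shadow memory; `ToyShadow` and `UnpoisonRange` are hypothetical names, not runtime functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kGranule = 8;  // assumption: 8 app bytes per shadow byte

// Toy shadow covering application addresses [base, base + 8 * bytes.size()).
struct ToyShadow {
  std::uintptr_t base;
  std::vector<std::uint8_t> bytes;
  std::uint8_t *At(std::uintptr_t addr) {
    assert(addr >= base && (addr - base) / kGranule < bytes.size());
    return &bytes[(addr - base) / kGranule];
  }
};

// Clears the shadow for [top, bottom). The stack grows down, so `top` is the
// lowest address of the dynamic area and `bottom` the highest.
// Returns the number of shadow bytes cleared.
std::size_t UnpoisonRange(ToyShadow &shadow, std::uintptr_t top,
                          std::uintptr_t bottom) {
  if (top == 0 || top > bottom) return 0;  // nothing was allocated
  const std::size_t n = (bottom - top) / kGranule;
  if (n == 0) return 0;
  std::memset(shadow.At(top), 0, n);
  return n;
}
```

The open question in the comment, where the size comes from, shows up here as the two arguments: the caller has to remember the lowest dynamic alloca address and the stack position to restore to.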

I would like to ping the patch. Updated with small nits fixed. All tests passed for x86_64-unknown-linux-gnu.

I am sorry for the delay -- I was OOO for the last 4 weeks. Will try to respond this week.

LGTM
Looks simple enough.
This is still off by default, right? Did you test it on something huge?

test/asan/TestCases/alloca_vla_interact.cc
12 ↗	(On Diff #24385)	Does it have to be a macro?

This revision is now accepted and ready to land. May 8 2015, 4:34 PM

Konstantin, thank you for review!

In D7098#169544, @kcc wrote:

LGTM
Looks simple enough.
This is still off by default, right? Did you test it on something huge?

Yes, it's off by default. I've managed to build Firefox with dynamic alloca instrumentation enabled; is it big enough? I also found a "bug" there and I'm trying to understand whether it's a real bug (and not just an unpoisoning error or something like that).

I've managed to build Firefox with dynamic allocas instrumentation enabled, is it big enough?

Yea, it's big, and I expect it to have quite a few allocas/VLAs

I also found a "bug" there and I'm trying to understand whether it's a real bug (and not just an unpoisoning error or something like that).

Let us know how it goes.

I think you may commit this in the meantime.

I also found a "bug" there and I'm trying to understand whether it's a real bug (and not just an unpoisoning error or something like that).

Let us know how it goes.

That one turned out to be a false positive: the JS engine was allocating a frame for an asm stub via alloca, then freeing and reusing it inside the stub.

Sorry for the delay, I was on vacation last week.

Continuing the Firefox testing, I ran into several test failures for reasons much like the one Yura described above (allocating a stack frame via alloca, followed by some asm magic that deallocates it). After I worked around these allocas (asan_blacklist.txt was very helpful here), no new test failures popped up.

So, Konstantin, if you don't mind, I'll ask Yura to land this revision once retesting is complete.

Yes, go ahead.
We'll need to also test it on chromium before enabling the flag by default.
Thanks!

Closed by commit rL238401: [ASan] New approach to dynamic allocas unpoisoning. Patch by Max Ostapenko! (authored by ygribov). May 28 2015, 12:53 AM

This revision was automatically updated to reflect the committed changes.

LeoneChen added a subscriber: LeoneChen. Mar 29 2022, 4:33 AM

This comment was removed by LeoneChen.

Herald added a project: Restricted Project. Mar 29 2022, 4:33 AM

Herald added a subscriber: delcypher.

MaskRay mentioned this in rGc875567af352: [asan,test] Disable alloca_loop_unpoisoning.cpp on s390{{.*}}. Fri, Jan 19, 12:39 AM

Revision Contents

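Path (Size)

compiler-rt/trunk/lib/asan/asan_fake_stack.cc (23 lines)
compiler-rt/trunk/lib/asan/asan_interface_internal.h (4 lines)
compiler-rt/trunk/test/asan/TestCases/Linux/interface_symbols_linux.c (2 lines)
compiler-rt/trunk/test/asan/TestCases/alloca_loop_unpoisoning.cc (32 lines)
compiler-rt/trunk/test/asan/TestCases/alloca_vla_interact.cc (39 lines)
compiler-rt/trunk/test/asan/TestCases/vla_chrome_testcase.cc (30 lines)
compiler-rt/trunk/test/asan/TestCases/vla_condition_overflow.cc (21 lines)
compiler-rt/trunk/test/asan/TestCases/vla_loop_overfow.cc (21 lines)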

Diff 26664

compiler-rt/trunk/lib/asan/asan_fake_stack.cc

(16 lines not shown)

namespace __asan {		namespace __asan {

static const u64 kMagic1 = kAsanStackAfterReturnMagic;		static const u64 kMagic1 = kAsanStackAfterReturnMagic;
static const u64 kMagic2 = (kMagic1 << 8) | kMagic1;		static const u64 kMagic2 = (kMagic1 << 8) | kMagic1;
static const u64 kMagic4 = (kMagic2 << 16) | kMagic2;		static const u64 kMagic4 = (kMagic2 << 16) | kMagic2;
static const u64 kMagic8 = (kMagic4 << 32) | kMagic4;		static const u64 kMagic8 = (kMagic4 << 32) | kMagic4;

		static const u64 kAllocaRedzoneSize = 32UL;
		static const u64 kAllocaRedzoneMask = 31UL;

// For small size classes inline PoisonShadow for better performance.		// For small size classes inline PoisonShadow for better performance.
ALWAYS_INLINE void SetShadow(uptr ptr, uptr size, uptr class_id, u64 magic) {		ALWAYS_INLINE void SetShadow(uptr ptr, uptr size, uptr class_id, u64 magic) {
CHECK_EQ(SHADOW_SCALE, 3); // This code expects SHADOW_SCALE=3.		CHECK_EQ(SHADOW_SCALE, 3); // This code expects SHADOW_SCALE=3.
u64 *shadow = reinterpret_cast<u64*>(MemToShadow(ptr));		u64 *shadow = reinterpret_cast<u64*>(MemToShadow(ptr));
if (class_id <= 6) {		if (class_id <= 6) {
for (uptr i = 0; i < (1U << class_id); i++) {		for (uptr i = 0; i < (1U << class_id); i++) {
shadow[i] = magic;		shadow[i] = magic;
SanitizerBreakOptimization(0); // Make sure this does not become memset.		SanitizerBreakOptimization(0); // Make sure this does not become memset.
(215 lines not shown)
FakeFrame *frame = reinterpret_cast<FakeFrame *>(fs->AddrIsInFakeStack(		FakeFrame *frame = reinterpret_cast<FakeFrame *>(fs->AddrIsInFakeStack(
reinterpret_cast<uptr>(addr), &frame_beg, &frame_end));		reinterpret_cast<uptr>(addr), &frame_beg, &frame_end));
if (!frame) return 0;		if (!frame) return 0;
if (frame->magic != kCurrentStackFrameMagic)		if (frame->magic != kCurrentStackFrameMagic)
return 0;		return 0;
if (beg) *beg = reinterpret_cast<void*>(frame_beg);		if (beg) *beg = reinterpret_cast<void*>(frame_beg);
if (end) *end = reinterpret_cast<void*>(frame_end);		if (end) *end = reinterpret_cast<void*>(frame_end);
return reinterpret_cast<void*>(frame->real_stack);		return reinterpret_cast<void*>(frame->real_stack);
}		}

		SANITIZER_INTERFACE_ATTRIBUTE
		void __asan_alloca_poison(uptr addr, uptr size) {
		uptr LeftRedzoneAddr = addr - kAllocaRedzoneSize;
		uptr PartialRzAddr = addr + size;
		uptr RightRzAddr = (PartialRzAddr + kAllocaRedzoneMask) & ~kAllocaRedzoneMask;
		uptr PartialRzAligned = PartialRzAddr & ~(SHADOW_GRANULARITY - 1);
		FastPoisonShadow(LeftRedzoneAddr, kAllocaRedzoneSize, kAsanAllocaLeftMagic);
		FastPoisonShadowPartialRightRedzone(
		PartialRzAligned, PartialRzAddr % SHADOW_GRANULARITY,
		RightRzAddr - PartialRzAligned, kAsanAllocaRightMagic);
		FastPoisonShadow(RightRzAddr, kAllocaRedzoneSize, kAsanAllocaRightMagic);
		}

		SANITIZER_INTERFACE_ATTRIBUTE
		void __asan_allocas_unpoison(uptr top, uptr bottom) {
		if ((!top) || (top > bottom)) return;
		REAL(memset)(reinterpret_cast<void*>(MemToShadow(top)), 0,
		(bottom - top) / SHADOW_GRANULARITY);
		}
} // extern "C"		} // extern "C"
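The address arithmetic in __asan_alloca_poison can be checked by hand. The sketch below repeats the four computations from the diff outside the runtime; `AllocaRedzones` and `ComputeRedzones` are hypothetical names, and an 8-byte shadow granularity is assumed (a shadow scale of 3, which is what SetShadow above checks for).

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kAllocaRedzoneSize = 32;
constexpr std::uint64_t kAllocaRedzoneMask = 31;
constexpr std::uint64_t kShadowGranularity = 8;  // assumption: shadow scale 3

// Hypothetical holder for the addresses computed in __asan_alloca_poison.
struct AllocaRedzones {
  std::uint64_t left_rz;             // start of the 32-byte left redzone
  std::uint64_t partial_rz;          // first byte past the user region
  std::uint64_t partial_rz_aligned;  // partial_rz rounded down to a granule
  std::uint64_t right_rz;            // partial_rz rounded up to 32 bytes
};

inline AllocaRedzones ComputeRedzones(std::uint64_t addr, std::uint64_t size) {
  // The tests in this revision assert that instrumented allocas are
  // 32-byte aligned, so the sketch requires the same.
  assert((addr & kAllocaRedzoneMask) == 0);
  AllocaRedzones rz;
  rz.left_rz = addr - kAllocaRedzoneSize;
  rz.partial_rz = addr + size;
  rz.right_rz = (rz.partial_rz + kAllocaRedzoneMask) & ~kAllocaRedzoneMask;
  rz.partial_rz_aligned = rz.partial_rz & ~(kShadowGranularity - 1);
  return rz;
}
```

For an alloca of 13 bytes at 0x1000, this gives a left redzone starting at 0xfe0, a user region ending at 0x100d, a partially addressable granule at 0x1008 with 5 valid bytes, and a right redzone starting at 0x1020.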

compiler-rt/trunk/lib/asan/asan_interface_internal.h

(189 lines not shown)
extern "C" {		extern "C" {
SANITIZER_INTERFACE_ATTRIBUTE		SANITIZER_INTERFACE_ATTRIBUTE
void __asan_poison_cxx_array_cookie(uptr p);		void __asan_poison_cxx_array_cookie(uptr p);
SANITIZER_INTERFACE_ATTRIBUTE		SANITIZER_INTERFACE_ATTRIBUTE
uptr __asan_load_cxx_array_cookie(uptr *p);		uptr __asan_load_cxx_array_cookie(uptr *p);
SANITIZER_INTERFACE_ATTRIBUTE		SANITIZER_INTERFACE_ATTRIBUTE
void __asan_poison_intra_object_redzone(uptr p, uptr size);		void __asan_poison_intra_object_redzone(uptr p, uptr size);
SANITIZER_INTERFACE_ATTRIBUTE		SANITIZER_INTERFACE_ATTRIBUTE
void __asan_unpoison_intra_object_redzone(uptr p, uptr size);		void __asan_unpoison_intra_object_redzone(uptr p, uptr size);
		SANITIZER_INTERFACE_ATTRIBUTE
		void __asan_alloca_poison(uptr addr, uptr size);
		SANITIZER_INTERFACE_ATTRIBUTE
		void __asan_allocas_unpoison(uptr top, uptr bottom);
} // extern "C"		} // extern "C"

#endif // ASAN_INTERFACE_INTERNAL_H		#endif // ASAN_INTERFACE_INTERNAL_H

compiler-rt/trunk/test/asan/TestCases/Linux/interface_symbols_linux.c

	(32 lines not shown)
	// RUN: echo __asan_report_exp_store2 >> %t.interface			// RUN: echo __asan_report_exp_store2 >> %t.interface
	// RUN: echo __asan_report_exp_store4 >> %t.interface			// RUN: echo __asan_report_exp_store4 >> %t.interface
	// RUN: echo __asan_report_exp_store8 >> %t.interface			// RUN: echo __asan_report_exp_store8 >> %t.interface
	// RUN: echo __asan_report_exp_store16 >> %t.interface			// RUN: echo __asan_report_exp_store16 >> %t.interface
	// RUN: echo __asan_report_exp_load_n >> %t.interface			// RUN: echo __asan_report_exp_load_n >> %t.interface
	// RUN: echo __asan_report_exp_store_n >> %t.interface			// RUN: echo __asan_report_exp_store_n >> %t.interface
	// RUN: echo __asan_get_current_fake_stack >> %t.interface			// RUN: echo __asan_get_current_fake_stack >> %t.interface
	// RUN: echo __asan_addr_is_in_fake_stack >> %t.interface			// RUN: echo __asan_addr_is_in_fake_stack >> %t.interface
				// RUN: echo __asan_alloca_poison >> %t.interface
				// RUN: echo __asan_allocas_unpoison >> %t.interface
	// RUN: cat %t.interface | sort -u | diff %t.symbols -			// RUN: cat %t.interface | sort -u | diff %t.symbols -

	// FIXME: nm -D on powerpc somewhy shows ASan interface symbols residing			// FIXME: nm -D on powerpc somewhy shows ASan interface symbols residing
	// in "initialized data section".			// in "initialized data section".
	// REQUIRES: x86_64-supported-target,i386-supported-target,asan-static-runtime			// REQUIRES: x86_64-supported-target,i386-supported-target,asan-static-runtime

	int main() { return 0; }			int main() { return 0; }

compiler-rt/trunk/test/asan/TestCases/alloca_loop_unpoisoning.cc

				// RUN: %clangxx_asan -O0 -mllvm -asan-instrument-allocas %s -o %t
				// RUN: %run %t 2>&1
				//

				// This testcase checks that allocas and VLAs inside loop are correctly unpoisoned.

				#include <assert.h>
				#include <alloca.h>
				#include <stdint.h>
				#include "sanitizer/asan_interface.h"

				void *top, *bot;

				__attribute__((noinline)) void foo(int len) {
				char x;
				top = &x;
				char array[len]; // NOLINT
				assert(!(reinterpret_cast<uintptr_t>(array) & 31L));
				alloca(len);
				for (int i = 0; i < 32; ++i) {
				char array[i]; // NOLINT
				bot = alloca(i);
				assert(!(reinterpret_cast<uintptr_t>(bot) & 31L));
				}
				}

				int main(int argc, char **argv) {
				foo(32);
				void *q = __asan_region_is_poisoned(bot, (char *)top - (char *)bot);
				assert(!q);
				return 0;
				}

compiler-rt/trunk/test/asan/TestCases/alloca_vla_interact.cc

				// RUN: %clangxx_asan -O0 -mllvm -asan-instrument-allocas %s -o %t
				// RUN: %run %t 2>&1
				//

				// This testcase checks correct interaction between VLAs and allocas.

				#include <assert.h>
				#include <alloca.h>
				#include <stdint.h>
				#include "sanitizer/asan_interface.h"

				#define RZ 32

				__attribute__((noinline)) void foo(int len) {
				char *top, *bot;
				// This alloca call should live until the end of foo.
				char *alloca1 = (char *)alloca(len);
				assert(!(reinterpret_cast<uintptr_t>(alloca1) & 31L));
				// This should be first poisoned address after loop.
				top = alloca1 - RZ;
				for (int i = 0; i < 32; ++i) {
				// Check that previous alloca was unpoisoned at the end of iteration.
				if (i) assert(!__asan_region_is_poisoned(bot, 96));
				// VLA is unpoisoned at the end of iteration.
				volatile char array[i];
				assert(!(reinterpret_cast<uintptr_t>(array) & 31L));
				// Alloca is unpoisoned at the end of iteration,
				// because dominated by VLA.
				bot = (char *)alloca(i) - RZ;
				}
				// Check that all allocas from loop were unpoisoned correctly.
				void *q = __asan_region_is_poisoned(bot, (char *)top - (char *)bot + 1);
				assert(q == top);
				}

				int main(int argc, char **argv) {
				foo(32);
				return 0;
				}

compiler-rt/trunk/test/asan/TestCases/vla_chrome_testcase.cc

				// RUN: %clangxx_asan -O0 -mllvm -asan-instrument-allocas %s -o %t
				// RUN: not %run %t 2>&1 | FileCheck %s
				//

				// This is reduced testcase based on Chromium code.
				// See http://reviews.llvm.org/D6055?vs=on&id=15616&whitespace=ignore-all#toc.

				#include <stdint.h>
				#include <assert.h>

				int a = 7;
				int b;
				int c;
				int *p;

				__attribute__((noinline)) void fn3(int *first, int second) {
				}

				int main() {
				int d = b && c;
				int e[a]; // NOLINT
				assert(!(reinterpret_cast<uintptr_t>(e) & 31L));
				int f;
				if (d)
				fn3(&f, sizeof 0 * (&c - e));
				e[a] = 0;
				// CHECK: ERROR: AddressSanitizer: dynamic-stack-buffer-overflow on address [[ADDR:0x[0-9a-f]+]]
				// CHECK: WRITE of size 4 at [[ADDR]] thread T0
				return 0;
				}

compiler-rt/trunk/test/asan/TestCases/vla_condition_overflow.cc

				// RUN: %clangxx_asan -O0 -mllvm -asan-instrument-allocas %s -o %t
				// RUN: not %run %t 2>&1 | FileCheck %s
				//

				#include <assert.h>
				#include <stdint.h>

				__attribute__((noinline)) void foo(int index, int len) {
				if (index > len) {
				char str[len]; //NOLINT
				assert(!(reinterpret_cast<uintptr_t>(str) & 31L));
				str[index] = '1'; // BOOM
				// CHECK: ERROR: AddressSanitizer: dynamic-stack-buffer-overflow on address [[ADDR:0x[0-9a-f]+]]
				// CHECK: WRITE of size 1 at [[ADDR]] thread T0
				}
				}

				int main(int argc, char **argv) {
				foo(33, 10);
				return 0;
				}

compiler-rt/trunk/test/asan/TestCases/vla_loop_overfow.cc

				// RUN: %clangxx_asan -O0 -mllvm -asan-instrument-allocas %s -o %t
				// RUN: not %run %t 2>&1 | FileCheck %s
				//

				#include <assert.h>
				#include <stdint.h>

				void foo(int index, int len) {
				for (int i = 1; i < len; ++i) {
				char array[len]; // NOLINT
				assert(!(reinterpret_cast<uintptr_t>(array) & 31L));
				array[index + i] = 0;
				// CHECK: ERROR: AddressSanitizer: dynamic-stack-buffer-overflow on address [[ADDR:0x[0-9a-f]+]]
				// CHECK: WRITE of size 1 at [[ADDR]] thread T0
				}
				}

				int main(int argc, char **argv) {
				foo(9, 21);
				return 0;
				}