Page MenuHomePhabricator

[asan] add primitives that allow coroutine implementations
ClosedPublic

Authored by blastrock on Jun 2 2016, 8:30 AM.

Details

Reviewers
kcc
filcab
dvyukov
Summary

This patch adds the sanitizer_start_switch_fiber and
sanitizer_finish_switch_fiber methods inspired from what can be found here
https://github.com/facebook/folly/commit/2ea64dd24946cbc9f3f4ac3f6c6b98a486c56e73 .

These methods are needed when the compiled software needs to implement
coroutines, fibers or the like. Without a way to annotate them, when the program
jumps to a stack that is not the thread stack, __asan_handle_no_return shows a
warning about that, and the fake stack mechanism may free fake frames that are
still in use.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
filcab added inline comments.Jun 8 2016, 7:16 AM
lib/asan/asan_thread.cc
14

Isn't this one of the files that shouldn't include system headers?

125

I'd prefer something like "starting fiber switch while in fiber switch".
I'd also make it an assert unless we have actual uses cases of this.

lib/asan/asan_thread.h
151

Do we need this or can we use the regular stack_{top,bottom}?
Do we need to keep track of the old stack bounds?

155

What about fiber vs non-fiber? How do we know when to pick each of {fiber_,}stack_{top,bottom}?

For stack bounds:
How about if __sanitizer_start_enter_fiber takes a void** where we can stash the current FakeStack pointer, and then __sanitizer_finish_enter_fiber can overwrite it with a new one if needed?
On __sanitizer_finish_exit_fiber we could do the opposite (runtime would need to pass a void*, of course), and recover the old FakeStack (disposing of the fiber one (there's no documentation, but I'm assuming the exit_fiber function is to be called when actually exiting (for the last time) a fiber, not simply when returning to the main stack)).
Without the fake stack handling, we'll lose use-after-return errors.

Oh, that what they are used for! I think what you said could work, I must give it more thought.

My understanding is that we're calling __sanitizer_*_enter_fiber on an anything->fiber transition (be it the first one into that fiber or any other), and calling __sanitizer_*_exit_fiber on the *last* (for a given fiber) fiber->anything transition.
Let me know if that's not the case. And if it's not, why it's not generalized like that.

No, the second part is not exact (or I'm misunderstanding you). __sanitizer_*_exit_fiber are used when returning from the fiber to the thread's stack. The fiber may be re-entered later.
To be clear:

  • you call enter_fiber when you are about to switch to a fiber
  • you call exit_fiber when you are about to switch back to the thread's stack

These are not related to the fiber's lifetime.

As for the stack bounds: Would it be a problem to figure them out dinamycally in __sanitizer_finish_enter_fiber? Possibly with a callback from the fiber library, since we need to know if a fiber is running (Our system's fiber library has a function that queries TLS and tells you if you're running in a fiber. I have no idea about other implementations).

I'm not sure I understand. You want to avoid giving the stack bounds as arguments to __sanitizer_start_enter_fiber? We actually need to know the stack bounds we are switching to beforehand, in case a signal handler runs on the fiber, before __sanitizer_finish_enter_fiber is called.

@dvyokov I tried the signal triggering stuff with setitimer, but the resolution (on my machine at least) seems to low. The signal triggers 0 to 1 time, never close to critical parts. I even tried letting it run for 5min with a while (true) of the test, it didn't trigger any bad behavior.
Instead, I tried adding calls to ThrowAndCatch() in between the lines of StartEnterFiber() and the others functions directly in the library (of course I can't commit that), and I found a few bugs with that, not fixed yet.
My point is that using setitimer to trigger these bugs seems useless, do you still want me to do it? I have no other solution to unit-test that though...

lib/asan/asan_thread.cc
14

I don't know about that. I use it for size_t to implement the functions exposed in common_interface_defs.h. I can move these implementations in another file if needed, or use uptr if that makes sense, but I'm not sure I can use it in common_interface_defs.

125

No use case comes to my mind, I'm ok with making this an assert.

128

see my response to your comment on asan_thread.hh:149

163

It says that a semicolon after a closing brace is useless, it probably thinks of this as a block...

174

I think I did run clang-format, I will do it again. I just need to use the default llvm style, right?

442

I don't know if this can actually happen, so I don't know what would be the consequences. I guess that it would only trigger the warning in __asan_handle_no_return later, so this one can be made a verbose log, since nothing really bad has happened yet.

lib/asan/asan_thread.h
151

I think we do. This is there to handle the case where the user does:

__start_enter_fiber();
// a signal triggers here and the handler throws which triggers __asan_handle_no_return
swapcontext();

// in the fiber we just switched to
// a signal may trigger here too
__finish_enter_fiber();

On each call of __asan_handle_no_return between the __start and __finish_enter_fiber, we need to check on which stack we actually are running on, so we need both of them. This is what is done in GetStackBounds()

155

If fiber_stack_* are not null then we are on the fiber, unless fiber_switching_ is true which means that we don't know if we are on fiber stack or thread stack.

@dvyokov I tried the signal triggering stuff with setitimer, but the resolution (on my machine at least) seems to low. The signal triggers 0 to 1 time, never close to critical parts. I even tried letting it run for 5min with a while (true) of the test, it didn't trigger any bad behavior.

Instead, I tried adding calls to ThrowAndCatch() in between the lines of StartEnterFiber() and the others functions directly in the library (of course I can't commit that), and I found a few bugs with that, not fixed yet.
My point is that using setitimer to trigger these bugs seems useless, do you still want me to do it? I have no other solution to unit-test that though...

Absolutely.
My OS is well able to trigger signal every 100us per core. We also have a bunch of test bots that run these tests continuously. Even if this test will find an issue after 100 runs, it's still very useful. Debugging such issue in production is costly.

filcab added a comment.Jun 9 2016, 8:15 AM

My understanding is that we're calling __sanitizer_*_enter_fiber on an anything->fiber transition (be it the first one into that fiber or any other), and calling __sanitizer_*_exit_fiber on the *last* (for a given fiber) fiber->anything transition.
Let me know if that's not the case. And if it's not, why it's not generalized like that.

No, the second part is not exact (or I'm misunderstanding you). __sanitizer_*_exit_fiber are used when returning from the fiber to the thread's stack. The fiber may be re-entered later.
To be clear:

  • you call enter_fiber when you are about to switch to a fiber
  • you call exit_fiber when you are about to switch back to the thread's stack These are not related to the fiber's lifetime.

Do we need to care that we're doing a Fiber->Thread, though? Does it need to be handled differently, in general?

The way I see it, you could do something like (I'm giving vars/functions/members totally different names to avoid ambiguity with the current names. Feel free to keep something closer to your current ones):
__sanitizer_fiber_switch_start(void *ctx, uptr size);
This function is passed the fiber's or the thread's stack bounds, and sets next_stack_{top,bottom}.

__sanitizer_fiber_switch_finish();
This function is run in the new stack that you switched to and removes the current stack_{top,bottom}, replacing with what was in next_stack_{top,bottom} (setting the latter ones to 0, after).

That way, you only have one primitive to deal with (This follows from our simple fiber implementation. Yours might have functionality that requires the current implementation. Please let me know).

The FakeStacks might be more annoying and require the fiber implementation to provide an easy way for us to store a random pointer-sized object. I don't think that should be a problem.

Do we want to keep track of fibers in the same way we keep track of threads, and be able to diagnose which fiber we were in on allocation/deallocation?
(I guess the answer might be "yes", but I mention this as a "future work" thing. We should try to take it into account, but shouldn't add it to this patch (or complicate it too much)).

As for the stack bounds: Would it be a problem to figure them out dinamycally in __sanitizer_finish_enter_fiber? Possibly with a callback from the fiber library, since we need to know if a fiber is running (Our system's fiber library has a function that queries TLS and tells you if you're running in a fiber. I have no idea about other implementations).

I'm not sure I understand. You want to avoid giving the stack bounds as arguments to __sanitizer_start_enter_fiber? We actually need to know the stack bounds we are switching to beforehand, in case a signal handler runs on the fiber, before __sanitizer_finish_enter_fiber is called.

Fair enough. The signal handler case is interesting and I think it's worth it to get the bounds ahead because of it.

filcab added inline comments.Jun 9 2016, 8:21 AM
lib/asan/asan_thread.cc
14

OK. Remove the stddef.h include.
On the external header, stddef.h is already included (it's external, so we don't have a problem with that) and you can keep size_t there.
Internally, use uptr instead of size_t. We know that on the platforms we support we're safe doing it.

163

Unfortunate, but not much to do here.

174

You can probably use Google-style for ASan. I'm not sure if we're trying to transition ASan to LLVM-style, but I don't think so.

The way I see it, you could do something like (I'm giving vars/functions/members totally different names to avoid ambiguity with the current names. Feel free to keep something closer to your current ones):
__sanitizer_fiber_switch_start(void *ctx, uptr size);
This function is passed the fiber's or the thread's stack bounds, and sets next_stack_{top,bottom}.

__sanitizer_fiber_switch_finish();
This function is run in the new stack that you switched to and removes the current stack_{top,bottom}, replacing with what was in next_stack_{top,bottom} (setting the latter ones to 0, after).

That way, you only have one primitive to deal with (This follows from our simple fiber implementation. Yours might have functionality that requires the current implementation. Please let me know).

My implementation does not know about the thread's stack bounds. It could ask them through pthread and store them in some thread_local storage to avoid asking them each time.

Also, this interface is based on what I saw in facebook's folly, here https://github.com/facebook/folly/commit/2ea64dd24946cbc9f3f4ac3f6c6b98a486c56e73 , so they decided to do it this way too.

This interface currently makes it easier to annotate code, but if you think simplifying asan's internals is worth it, I don't mind changing it.

blastrock marked 12 inline comments as done.Jun 11 2016, 3:12 AM
blastrock added inline comments.
lib/asan/asan_thread.cc
125

Hm... should I use the default C assert macro? I couldn't find if you have your own macro. Or should I leave the if and the log as they are and just add a Die() call?

lib/asan/asan_thread.h
138

I can move the struct in the .cc file, but if I move GetStackBounds(), I must make it friend of AsanThread so that it can access fiber_stack_*. Another solution would be to make it receive all that data as argument, but the prototype would be GetStackBounds(stacktop, stackbottom, fiberstacktop, fiberstackbottom, fiberswitching) (if I keep only two stack in memory as you suggested in your last comment).
What do you prefer?

blastrock updated this revision to Diff 60476.Jun 12 2016, 12:08 PM
blastrock updated this object.

I changed the interface to what @filcab suggested. Another downside is that getting the current thread's stack bounds is done through a non-portable pthread API, but it does simplify asan code.

I made my implementation more signal-proof by using volatile. I saw later that in some other parts of the code, you prefer using atomic access, I can change the volatiles to that if needed.

I split the tests and left the swapcontext test as it is currently. I also added a test that spams signals while it runs that @dvyukov asked for. As I said, the timer as a poor resolution on my machine, so it doesn't test anything more than the test without the signals.

I removed the stddef include and changed some logs to verbose reports. I changed my clang-format to google style and reformatted a few things I wrote.

About the FakeStack stuff, I wrote a new test, planning to do the void** thing @filcab asked for, but the test is already passing on my machine. Could you help me write a failing test before I implement that feature?

Another downside is that getting the current thread's stack bounds is done through a non-portable pthread API, but it does simplify asan code.

I only see the APIs in test code. Is it correct?
Tests are OK. swapcontext is also non-portable.

lib/asan/asan_thread.cc
133

If you want to tolerate some user errors here, then I think it's better to return here. next_stack_top/bottom are most likely 0, so we will end up with no stack at all otherwise.

147

Why?
If I am reading core correctly, this must never happen.

I made my implementation more signal-proof by using volatile. I saw later that in some other parts of the code, you prefer using atomic access, I can change the volatiles to that if needed.

Yes, please.
Current ordering of memory accesses if very tricky, esp in FinishSwitchFiber and I see that that is actually important for GetStackBounds correctness.

I think it is enough to make only stack_switching_ atomic:

atomic<int> stack_switching_;

void AsanThread::StartSwitchFiber(uptr bottom, uptr size) {
  next_stack_bottom_ = bottom;
  next_stack_top_ = bottom + size;
  atomic_store(&stack_switching_, 1, memory_order_release);
}
  
void AsanThread::FinishSwitchFiber() {
  stack_bottom_ = next_stack_bottom_;
  stack_top_ = next_stack_top_;
  atomic_store(&stack_switching_, 0, memory_order_release);
  next_stack_top_ = 0;
  next_stack_bottom_ = 0;
}

inline AsanThread::StackBounds AsanThread::GetStackBounds() const {
  if (!atomic_load(&stack_switching_, memory_order_acquire))
    return StackBounds{stack_bottom_, stack_top_};  // NOLINT
  char local;
  const uptr cur_stack = (uptr)&local;
  // Note: need to check next stack first, because FinishSwitchFiber
  // may be in process of overwriting stack_top_/bottom_. But in such case
  // we are already on the next stack.
  if (cur_stack >= next_stack_bottom_ && cur_stack < next_stack_top_)
    return StackBounds{next_stack_bottom_, next_stack_top_};  // NOLINT
  return StackBounds{stack_bottom_, stack_top_};  // NOLINT
}
test/asan/TestCases/Linux/swapcontext_annotation.cc
6

I would expect that we do all that -O0/-O1/-O2 automatically on a higher level. At least we used to as far as I remember. Now all that cmake/lit in incomprehensible, so I am not sure... But I don't see any -O flags in other asan tests.

Does anybody know is we run asan lit tests with different optimization levels?

68

s/0/1/
otherwise test can silently break and continue passing on bots

90

s/0/1/

102

s/0/1/

137

please reformat tests with clang-format
{ should be on the previous line

156

Is it useful to run the test without the signal ('x' above)?
I would expect that the signal variant is a strict superset of no-signal. I can't imagine how signals can prevent some failure from happening.
If you agree, then remove the 'x' runs.

176

Run all the following 10000 times (or whatever makes it run for 100ms or so). That will allow to stress interaction with signals better.

Another downside is that getting the current thread's stack bounds is done through a non-portable pthread API, but it does simplify asan code.

I only see the APIs in test code. Is it correct?
Tests are OK. swapcontext is also non-portable.

Yes, I didn't add non-portable code to the library itself. I said that because it forces users to write non-portable code to annotate their code, but it's acceptable.

lib/asan/asan_thread.cc
133

@filcab suggested an assertion instead, I think it would indeed be better since it shows a mis-use of annotations.

147

Oh yes, you're right, this can't happen, I'll remove it.

As for your implementation, I think it would work, and it is indeed simpler.

test/asan/TestCases/Linux/swapcontext_annotation.cc
6

I took this snippet from the preivous test swapcontext_test.cc which was already there and did all the optimization levels explicitly. I can check if the test framework already runs all optimization levels by itself.

68

Indeed, though it would be caught anyway because the string "TestX passed" would not appear in output.

I changed the interface to what @filcab suggested. Another downside is that getting the current thread's stack bounds is done through a non-portable pthread API, but it does simplify asan code.

Like @dvyukov said, having that in test code is fine. The test is already Linux-only anyway :-)
And this exports a simpler API, which is a plus for maintainability of a low-level library ;-) (fewer things to track)

About the FakeStack stuff, I wrote a new test, planning to do the void** thing @filcab asked for, but the test is already passing on my machine. Could you help me write a failing test before I implement that feature?

I'm unsure what you mean here. Do you mean you wrote a test+implementation to swap the fakestack pointer? Or something else?
Can you share the test?

lib/asan/asan_thread.cc
125

If we're going to error, I'd probably go with a Die() call, since it's a bad use of the library (two start switch without an end in the middle).

lib/asan/asan_thread.h
138

Indeed. Probably best to leave it as is, then. It's clean enough and we wouldn't hide that much complexity due to the additional friend declarations.

test/asan/TestCases/Linux/swapcontext_annotation.cc
118

Are the ss_sp and ss_size properties cross-arch? If not, please write getters for them (so we can #ifdef different platforms there) since most of this should be "portable enough".

test/asan/TestCases/Linux/swapcontext_use_after_return.cc
95 ↗(On Diff #60476)

Same as swapcontext_annotation.cc (but with setters).

filcab added inline comments.Jun 13 2016, 8:00 AM
test/asan/TestCases/Linux/swapcontext_annotation.cc
6

It doesn't really. Unless I missed a big thing, we don't really re-do lit's test handling, so it will just get the run lines and do them. It won't try several different opt levels.

blastrock marked 17 inline comments as done.Jun 13 2016, 10:54 AM

About the FakeStack stuff, I wrote a new test, planning to do the void** thing @filcab asked for, but the test is already passing on my machine. Could you help me write a failing test before I implement that feature?

I'm unsure what you mean here. Do you mean you wrote a test+implementation to swap the fakestack pointer? Or something else?
Can you share the test?

The test is in this diff, the file named swapcontext_use_after_return.cc. I didn't write any implementation as you can see and the test passes on my machine. I don't know how the fakestack mechanism works, so I'm not sure what problem I should be trying to solve by adding the void** argument to __asan_start_switch_fiber().

Just to explain what the test does, it is very similar to the previous test and calls UseAfterReturn() at some tricky points. Moreover, it stores the addresses of function-local dead variables in global pointers so that they can be accessed from outside the fiber. All of these cases correctly trigger a use-after-return error.

I also tried accessing a still-alive function-local variable from outside the fiber which still ran correctly. I didn't commit that part because I think I would need to write a separate file to test for that automatically, since this file tests for the error-case.

test/asan/TestCases/Linux/swapcontext_annotation.cc
118

The ucontext struct is described in man getcontext and the uc_stack field is of type stack_t which is described in man sigaltstack. Both say they are part of POSIX and I see no note about architecture-dependency, so I think this is ok.

blastrock marked 10 inline comments as done.Jun 13 2016, 12:01 PM

@dvyukov I wrote that loop to run the tests longer while the signal triggers, and I have found... a deadlock :(

I am not sure how exceptions work, but it seems that there is a mutex that allow only one stack unwinding to occur at the same time... So when the signal triggers while the mutex is locked, the program is completely deadlocked. Here is the stack I got:

* thread #1: tid = 11670, 0x00007f1dbfa79ccc libpthread.so.0`__lll_lock_wait + 28, name = 'swapcontext_ann', stop reason = signal SIGSTOP
  * frame #0: 0x00007f1dbfa79ccc libpthread.so.0`__lll_lock_wait + 28
    frame #1: 0x00007f1dbfa73b05 libpthread.so.0`__GI___pthread_mutex_lock + 117
    frame #2: 0x00007f1dbede0f3a libgcc_s.so.1`_Unwind_Find_FDE [inlined] __gthread_mutex_lock(__mutex=<unavailable>) + 42 at gthr-default.h:748
    frame #3: 0x00007f1dbede0f22 libgcc_s.so.1`_Unwind_Find_FDE at unwind-dw2-fde.c:1005
    frame #4: 0x00007f1dbede0f22 libgcc_s.so.1`_Unwind_Find_FDE(pc=0x00007f1dbeddf18d, bases=0x00007f1dbfdd08d8) + 18 at unwind-dw2-fde-dip.c:448
    frame #5: 0x00007f1dbeddda76 libgcc_s.so.1`uw_frame_state_for(context=0x00007f1dbfdd0830, fs=0x00007f1dbfdd0680) + 102 at unwind-dw2.c:1241
    frame #6: 0x00007f1dbeddecc0 libgcc_s.so.1`uw_init_context_1(context=0x00007f1dbfdd0830, outer_cfa=0x00007f1dbfdd0be0, outer_ra=0x00007f1dbf77c60c) + 80 at unwind-dw2.c:1562
    frame #7: 0x00007f1dbeddf18e libgcc_s.so.1`_Unwind_RaiseException(exc=0x000060d000039860) + 62 at unwind.inc:88
    frame #8: 0x00007f1dbf77c60c libstdc++.so.6`__cxxabiv1::__cxa_throw(obj=0x000060d000039880, tinfo=0x000000000072abc0, dest=0x0000000000000000)(void *)) + 92 at eh_throw.cc:82
    frame #9: 0x00000000004f9dd2 swapcontext_annotation.cc.tmp`Throw() + 82
    frame #10: 0x00000000004f9dee swapcontext_annotation.cc.tmp`ThrowAndCatch() + 14
    frame #11: 0x00000000004fa469 swapcontext_annotation.cc.tmp`handler(int) + 9
    frame #12: 0x00007f1dbfa7ad30 libpthread.so.0`??? + 1
    frame #13: 0x00007f1dbfa750b3 libpthread.so.0`__pthread_mutex_unlock_usercnt + 3
    frame #14: 0x00007f1dbede1074 libgcc_s.so.1`_Unwind_Find_FDE [inlined] __gthread_mutex_unlock(__mutex=<unavailable>) + 356 at gthr-default.h:778
    frame #15: 0x00007f1dbede1063 libgcc_s.so.1`_Unwind_Find_FDE + 123 at unwind-dw2-fde.c:1039
    frame #16: 0x00007f1dbede0fe8 libgcc_s.so.1`_Unwind_Find_FDE(pc=0x00007f1dbeddf18d, bases=0x00007f1dbfdd13e8) + 216 at unwind-dw2-fde-dip.c:448
    frame #17: 0x00007f1dbeddda76 libgcc_s.so.1`uw_frame_state_for(context=0x00007f1dbfdd1340, fs=0x00007f1dbfdd1190) + 102 at unwind-dw2.c:1241
    frame #18: 0x00007f1dbeddecc0 libgcc_s.so.1`uw_init_context_1(context=0x00007f1dbfdd1340, outer_cfa=0x00007f1dbfdd16f0, outer_ra=0x00007f1dbf77c60c) + 80 at unwind-dw2.c:1562
    frame #19: 0x00007f1dbeddf18e libgcc_s.so.1`_Unwind_RaiseException(exc=0x000060d000039930) + 62 at unwind.inc:88
    frame #20: 0x00007f1dbf77c60c libstdc++.so.6`__cxxabiv1::__cxa_throw(obj=0x000060d000039950, tinfo=0x000000000072abc0, dest=0x0000000000000000)(void *)) + 92 at eh_throw.cc:82
    frame #21: 0x00000000004f9dd2 swapcontext_annotation.cc.tmp`Throw() + 82
    frame #22: 0x00000000004f9dee swapcontext_annotation.cc.tmp`ThrowAndCatch() + 14
    frame #23: 0x00000000004f9fe1 swapcontext_annotation.cc.tmp`Child(int) + 129
    frame #24: 0x00007f1dbea6f0c0 libc.so.6
    frame #25: 0x000000000141d0e0 swapcontext_annotation.cc.tmp
    frame #26: 0x00000000004fa7d9 swapcontext_annotation.cc.tmp`main + 601 at swapcontext_annotation.cc:170 [opt]
    frame #27: 0x00007f1dbea4b610 libc.so.6`__libc_start_main + 240
    frame #28: 0x000000000041b699 swapcontext_annotation.cc.tmp`_start + 41

So I guess throwing exceptions from signal handlers is unsafe in the first place, but I couldn't find anything that says it's a bad thing to do.

I wrote that loop to run the tests longer while the signal triggers, and I have found... a deadlock :(

Good we caught it now.
Replace exceptions with setjmp/longjmp. Calling longjmp from a signal handler should work

setjmp/longjmp seems to work, the test is now correctly passing. The only thing is that I loop only 30 times and it already incurs a 1s total overhead (4 compilations at all optimization levels) on my machine, quite far from the 10000 you wished for. 100 iterations gives 5s overhead, ~60s for 1000 iterations.

setjmp/longjmp seems to work, the test is now correctly passing. The only thing is that I loop only 30 times and it already incurs a 1s total overhead (4 compilations at all optimization levels) on my machine, quite far from the 10000 you wished for. 100 iterations gives 5s overhead, ~60s for 1000 iterations.

OK, let's leave 30.
It already started catching bugs :)

blastrock updated this revision to Diff 60593.Jun 13 2016, 1:24 PM
blastrock marked an inline comment as done.

Here's what's new:

  • replaced warnings by asserts when calling start/finish_switch_fiber in wrong order
  • simplified code with only one atomic as suggested by @dvyukov
  • clang-formatted the two test files
  • removed the useless x/s argument from the test, the signal now always triggers
  • replaced ThrowAndCatch by a call to a noreturn function that will longjmp back
  • replaced some exit(0) by exit(1) in the test
  • loop 30 times instead of doing a single run

A few points that need checking:

  • I added some atomic_load()s because of the bool that was changed to atomic, I choose a relaxed memory order there, I think it is enough.
  • I didn't use an atomic<int> but atomic_uint8_t to save some space, it shouldn't be harmful.

What ends up happening seems to be a silent bug (and possible corruption afterwards):

  • Create two fibers: Leader and Worker
  • Leader allocates stuff on the stack (needing an asan_stack_malloc call), creating a FakeFrame
  • Switch to Worker
  • Worker does the same, then throws something, catching it too. (This sets the gc_needed_ flag on the StackFrame, since throwing calls handle_no_return)
  • Worker calls another function that sets up a FakeFrame
    • This triggers a GC on FakeStack and we collect the Leader's FakeFrame!!
  • Switch to Leader
  • We can still use the array we allocated (because we didn't poison the shadow when GCing), but the FakeStack might have reallocated that FakeFrame and we would be using invalid frames.

Here's output from a test program I have using our fibers. I added prints to ASan to help show off the bug. And added comments

Initializing jobs...
Starting jobs...
stack_malloc_6(size=d20)
Allocated FakeFrame 0x0002129b5800                   <- Leader fake frame
Started Leader
Created array
stack_malloc_6(size=d20)
Allocated FakeFrame 0x0002129b6800                   <- Worker fake frame
Started Worker
Created array
stack_malloc_6(size=d20)                             <- Setting up dummy function's fake frame
GCing
Checking FakeFrame idx 0 (flags: 1): 0x0002129b5800
Collecting!                                          <- Just collected Leader's fake frame!!
Checking FakeFrame idx 1 (flags: 1): 0x0002129b6800
Allocated FakeFrame 0x0002129b7800
dummyFunction
Created array
stack_free_6(ptr=0x0002129b7800, size=d20)
Deallocating FakeFrame 0x0002129b7800
stack_free_6(ptr=0x0002129b6800, size=d20)
Deallocating FakeFrame 0x0002129b6800
stack_free_6(ptr=0x0002129b5800, size=d20)
Deallocating FakeFrame 0x0002129b5800                <- Finally deallocating the Leader's fake frame. Proper place to "collect" it.
Exiting.

Here's some pseudo-code for the functions (sorry about the uglyness. Compile with -O0 so ASan actually inserts the asan_stack_{malloc,free}_* calls). Since I don't have easy access to a machine with Linux, I can't give you a full test-case. This should be adaptable, though.
Start by creating contexts for the fibers. Then setup a fiber that runs doLeader, and one that does doWorker. Switch to Leader.
If you want to just try swapcontext, I think we should be able to replace the switchToFiber with swapcontext (assuming we created the appropriate contexts) and it should almost work?

#include <stdio.h>
#include <assert.h>
#include <inttypes.h>

struct { void* nextFiber; } Leader, Worker;

void switchToFiber(void *, uint64_t, uint64_t*);  // Fiber, arg to "return", ptr for "returned" arg
void returnToThread(uint64_t, uint64_t*);

struct Stuff { void *S1, *S2; };
__attribute__((noinline,noreturn)) void doLeader(uint64_t argOnInitialize, uint64_t argOnRun) {
  fprintf(stderr, "Started Leader\n");
  // Sets up a FakeStack
  Stuff aaa[200] = {nullptr, nullptr};
  fprintf(stderr, "Created array\n");
  
  switchToFiber(Leader.nextFiber, 0, nullptr);
  returnToThread((uint64_t)aaa->S1, nullptr);
}

__attribute__((noinline)) int dummyFunction(int *p) {
  fprintf(stderr, "dummyFunction\n");
  Stuff bbb[200] = {nullptr, nullptr};
  fprintf(stderr, "Created array\n");
  return p[1];
}

__attribute__((noinline,noreturn)) void doWorker(uint64_t argOnInitialize, uint64_t argOnRun) {
  fprintf(stderr, "Started Worker\n");
  Stuff ccc[200] = {nullptr, nullptr};
  fprintf(stderr, "Created array\n");
  try {
    // Calls handle_no_return (which sets FakeStack::needs_gc_)
    throw 3;
  } catch (int e) {
  }
  // Will call asan_stack_malloc and GC the fake stack
  dummyFunction((int*)ccc);
  switchToFiber(Worker.nextFiber, 0, nullptr);
}

Here's the diff for lib/asan/asan_fake_stack.cc:

diff --git a/lib/asan/asan_fake_stack.cc b/lib/asan/asan_fake_stack.cc
index 91fdf0a..87f7674 100644
--- a/lib/asan/asan_fake_stack.cc
+++ b/lib/asan/asan_fake_stack.cc
@@ -140,6 +140,7 @@ void FakeStack::HandleNoReturn() {
 // We do it based on their 'real_stack' values -- everything that is lower
 // than the current real_stack is garbage.
 NOINLINE void FakeStack::GC(uptr real_stack) {
+  Printf("GCing\n");
   uptr collected = 0;
   for (uptr class_id = 0; class_id < kNumberOfSizeClasses; class_id++) {
     u8 *flags = GetFlags(stack_size_log(), class_id);
@@ -148,7 +149,9 @@ NOINLINE void FakeStack::GC(uptr real_stack) {
       if (flags[i] == 0) continue;  // not allocated.
       FakeFrame *ff = reinterpret_cast<FakeFrame *>(
           GetFrame(stack_size_log(), class_id, i));
+      Printf("Checking FakeFrame idx %lld (flags: %x): %p\n", i, flags[i], ff);
       if (ff->real_stack < real_stack) {
+        Printf("Collecting!\n");
         flags[i] = 0;
         collected++;
       }
@@ -206,12 +209,14 @@ ALWAYS_INLINE uptr OnMalloc(uptr class_id, uptr size) {
   uptr real_stack = reinterpret_cast<uptr>(&local_stack);
   FakeFrame *ff = fs->Allocate(fs->stack_size_log(), class_id, real_stack);
   if (!ff) return 0;  // Out of fake stack.
+  Printf("Allocated FakeFrame %p\n", ff);
   uptr ptr = reinterpret_cast<uptr>(ff);
   SetShadow(ptr, size, class_id, 0);
   return ptr;
 }
 
 ALWAYS_INLINE void OnFree(uptr ptr, uptr class_id, uptr size) {
+  Printf("Deallocating FakeFrame %p\n", ptr);
   FakeStack::Deallocate(ptr, class_id);
   SetShadow(ptr, size, class_id, kMagic8);
 }
@@ -220,14 +225,16 @@ ALWAYS_INLINE void OnFree(uptr ptr, uptr class_id, uptr size) {
 
 // ---------------------- Interface ---------------- {{{1
 using namespace __asan;
-#define DEFINE_STACK_MALLOC_FREE_WITH_CLASS_ID(class_id)                       \
-  extern "C" SANITIZER_INTERFACE_ATTRIBUTE uptr                                \
-      __asan_stack_malloc_##class_id(uptr size) {                              \
-    return OnMalloc(class_id, size);                                           \
-  }                                                                            \
-  extern "C" SANITIZER_INTERFACE_ATTRIBUTE void __asan_stack_free_##class_id(  \
-      uptr ptr, uptr size) {                                                   \
-    OnFree(ptr, class_id, size);                                               \
+#define DEFINE_STACK_MALLOC_FREE_WITH_CLASS_ID(class_id)                      \
+  extern "C" SANITIZER_INTERFACE_ATTRIBUTE uptr                               \
+      __asan_stack_malloc_##class_id(uptr size) {                             \
+    Printf("stack_malloc_" #class_id "(size=%llx)\n", size);                  \
+    return OnMalloc(class_id, size);                                          \
+  }                                                                           \
+  extern "C" SANITIZER_INTERFACE_ATTRIBUTE void __asan_stack_free_##class_id( \
+      uptr ptr, uptr size) {                                                  \
+    Printf("stack_free_" #class_id "(ptr=%p, size=%llx)\n", ptr, size);       \
+    OnFree(ptr, class_id, size);                                              \
   }
 
 DEFINE_STACK_MALLOC_FREE_WITH_CLASS_ID(0)

Wow, thanks for taking the time to explain all this ^^
I have a few questions about it.

So if I understand correctly, there is no way to automate that test, I'll just test manually on my machine?
And the thing I'm trying to fix is avoiding that the leader fakeframe gets gc-ed, right?

Do we really need two fibers to run that test? Can't we have just one fiber and the thread's stack and switch between them?

I will read the code in more detail, I may come back with more questions :)

Ah, right! Probably just one thread that:
allocates fakeframe
handles noreturn
allocates new fakeframe

will trigger it. I'll try changing the test on my side too.

Thank you,

Filipe

dvyukov accepted this revision.Jun 14 2016, 10:39 AM
dvyukov edited edge metadata.

Looks good to me. Any objections, anybody?

Do you have commit access? If not, I will commit it.

This revision is now accepted and ready to land.Jun 14 2016, 10:39 AM

And thanks for adding this support and bearing with us through the review!

Hm... shouldn't I fix @filcab's issue with FakeStack before?

dvyukov requested changes to this revision.Jun 15 2016, 5:31 AM
dvyukov edited edge metadata.

Hm... shouldn't I fix @filcab's issue with FakeStack before?

Yes, missed that there is something unresolved. Sorry.

This revision now requires changes to proceed.Jun 15 2016, 5:31 AM
blastrock updated this revision to Diff 61205.Jun 19 2016, 10:11 AM
blastrock edited edge metadata.

I think I got how fake stacks work. I implemented your idea @filcab, and with your test code, the FakeFrame is not collected anymore.

Now we must provide a void* to store the current FakeStack before a switch occurs. During a fiber switch, the FakeStack mechanism is disabled (I understand that it's ok to have __asan_stack_malloc() return null), I couldn't find a solution without disabling it. One consequence is that the fake frame that should be allocated when entering the fiber is not, only the functions called from there have one.

When leaving a fiber definitely, null must be passed as first argument so that instead of saving the fake stack, it gets destroyed.

I also removed the test I wrote which doesn't seem to test anything.

One consequence is that the fake frame that should be allocated when entering the fiber is not, only the functions called from there have one.

That's a consequence of manual annotations. If a program uses swapcontext, and we intercept and annotate it, then it won't happen, right?
I think it is fine. The guideline for manual annotations should then recommend to introduce an additional function that merely finishes the switch and calls real non-inlinable fiber body. Worth noting in __sanitizer_start_switch_fiber comment, and, yes, it needs some comment.

I implemented your idea @filcab, and with your test code

I don't see any new test files. Did you forget to add a file?

One consequence is that the fake frame that should be allocated when entering the fiber is not, only the functions called from there have one.

That's a consequence of manual annotations. If a program uses swapcontext, and we intercept and annotate it, then it won't happen, right?

Correct. But it's not that bad anyway I think. If the fake stack is there to find use-after-return errors, when the fiber entry function returns, it means the fiber is dead, so there is no real need to track those anymore. It's like a thread dying, and in both cases (thread or fiber) the fake stack is destroyed and it won't detect use-after-returns as such.

I think it is fine. The guideline for manual annotations should then recommend to introduce an additional function that merely finishes the switch and calls real non-inlinable fiber body. Worth noting in __sanitizer_start_switch_fiber comment, and, yes, it needs some comment.

Yes, I was planning on writing a few lines of documentation when we all agree on the interface.

I don't see any new test files. Did you forget to add a file?

I didn't add it. The test proposed by @filcab is based on printfs inside the asan library to show when fake frames are collected. We couldn't write an automatic test for that.

filcab accepted this revision.Jun 20 2016, 10:08 AM
filcab added a reviewer: filcab.

I think I got how fake stacks work. I implemented your idea @filcab, and with your test code, the FakeFrame is not collected anymore.

Yay!

Now we must provide a void* to store the current FakeStack before a switch occurs. During a fiber switch, the FakeStack mechanism is disabled (I understand that it's ok to have __asan_stack_malloc() return null), I couldn't find a solution without disabling it. One consequence is that the fake frame that should be allocated when entering the fiber is not, only the functions called from there have one.
When leaving a fiber definitely, null must be passed as first argument so that instead of saving the fake stack, it gets destroyed.

As I commented above, I think adding a way to kill a fake stack without being in the fiber (so: from outside the fiber that "has" that fake stack) would be nice. But I'm ok with doing this internally if there's no interest in it from open source.
In your fiber library, is a use case where the fiber returns to the thread and then re-enters possible? And is it possible to have the fiber go back to the thread and the thread decide to either switch back to the fiber or kill the fiber?

I also removed the test I wrote which doesn't seem to test anything.

Fair enough. I don't see a way of looking at this bug without adding those printfs.

I think this patch looks good. Just ping back with the documentation before committing. If I don't reply after one day, assume I have no complaints or will address them later.

Thanks a lot!

include/sanitizer/common_interface_defs.h
145

These still need documentation.
We probably want a way to collect fake stacks that we saved in a fiber (__sanitizer_fake_stack_kill(void *) or something), instead of just the "current" fake stack (when we're leaving the fiber).

Our fiber library doesn't really have a "I'm now leaving the fiber and won't come back" function. Execution flow can do something like thread->fiberA->fiberB->thread->fiberC->fiberA->thread->fiberB->kill(fiberA)->fiberC->...
And users can terminate a fiber at any time, as long as it's not running.
In order to support this kind of use-case, we need to be able to kill the fake stack without being inside the fiber. I can implement this internally if you think this is not very general. But depending on your fiber library, I'd guess this is likely to appear elsewhere too.

As I commented above, I think adding a way to kill a fake stack without being in the fiber (so: from outside the fiber that "has" that fake stack) would be nice. But I'm ok with doing this internally if there's no interest in it from open source.
In your fiber library, is a use case where the fiber returns to the thread and then re-enters possible? And is it possible to have the fiber go back to the thread and the thread decide to either switch back to the fiber or kill the fiber?

No, in my library, when a fiber returns, it is completely destroyed. And we never kill a fiber from outside before it is finished.
This use-case came to my mind but I thought it wasn't a real world use-case, I was wrong ^^

I let you add this function if you need it, it doesn't seem hard to implement anyway :)

blastrock updated this revision to Diff 61344.Jun 21 2016, 12:57 AM
blastrock updated this object.
blastrock edited edge metadata.
blastrock marked an inline comment as done.

I added some documentation in common_interface_defs.h.

filcab accepted this revision.Jun 21 2016, 4:27 AM
filcab edited edge metadata.

LGTM

No, in my library, when a fiber returns, it is completely destroyed. And we never kill a fiber from outside before it is finished.
This use-case came to my mind but I thought it wasn't a real world use-case, I was wrong ^^

I let you add this function if you need it, it doesn't seem hard to implement anyway :)

There's always someone :-) I'll implement it when I migrate over to this and probably upstream anyway.

Thank you!

Filipe

dvyukov accepted this revision.Jun 21 2016, 5:25 AM
dvyukov edited edge metadata.
This revision is now accepted and ready to land.Jun 21 2016, 5:25 AM

Thank you for your time reviewing this! :)

Failed on windows bot:
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/24286/steps/run%20tests/logs/stdio
Looks relevant.

It says something about:

NOTE === If you see a mismatch below, please update asan_win_dll_thunk.cc

I guess I should add INTERFACE_FUNCTION(__sanitizer_*_switch_fiber) to that file. Should I open a new review on phabricator?

I also don't understand why it is needed for these new functions and not for some other functions like __sanitizer_print_memory_profile.

I guess I should add INTERFACE_FUNCTION(__sanitizer_*_switch_fiber) to that file. Should I open a new review on phabricator?

Yes, please.

File that defines __sanitizer_print_memory_profile contains #if CAN_SANITIZE_LEAKS. Probably that's not enabled on windows.

filcab closed this revision.Dec 14 2016, 8:06 AM

This was committed in r273260.