This is an archive of the discontinued LLVM Phabricator instance.

[asan] add primitives that allow coroutine implementations
ClosedPublic

Authored by blastrock on Jun 2 2016, 8:30 AM.

Details

Reviewers
kcc
filcab
dvyukov
Summary

This patch adds the sanitizer_start_switch_fiber and
sanitizer_finish_switch_fiber methods inspired from what can be found here
https://github.com/facebook/folly/commit/2ea64dd24946cbc9f3f4ac3f6c6b98a486c56e73 .

These methods are needed when the compiled software needs to implement
coroutines, fibers or the like. Without a way to annotate them, when the program
jumps to a stack that is not the thread stack, __asan_handle_no_return shows a
warning about that, and the fake stack mechanism may free fake frames that are
still in use.

Diff Detail

Event Timeline

blastrock updated this revision to Diff 59384.Jun 2 2016, 8:30 AM
blastrock retitled this revision from to [asan] add primitives that allow coroutine implementations.
blastrock updated this object.
blastrock added a reviewer: kcc.
blastrock added a subscriber: llvm-commits.
kcc added a reviewer: dvyukov.Jun 2 2016, 3:46 PM

Please add test(s)

The code looks sane but I don't know enough about fibers to see what may go wrong.
Anyone?

lib/asan/asan_thread.cc
404

Is this going to be public interface that we want to allow users to call?
If yes, it should be __sanitizer_enter_fiber and *also* declared in include/sanitizer/common_interface_defs.h
(It's ok if we only implement it in asan for now, just explain it in comments in common_interface_defs.h)

lib/asan/asan_thread.h
98

should this be a hard failure?

156

Why do you need this as a variable? Isn't a function better?

dvyukov edited edge metadata.Jun 3 2016, 12:03 AM

Philippe, what coroutine implementation do you use?

We should intercept and annotate at least ucontext API, so that applications using ucontext work out of the box. It will also simplify testing and make it more realistic, as we will just need to write a real program using ucontext.

Unfortunately, boost::context uses own assembly to implement coroutines, so we still need to expose annotations. We can talk to boost maintainers to add annotations directly to boost::context.

Philippe, what coroutine implementation do you use?

I use boost::context as you seem to have guessed

We should intercept and annotate at least ucontext API, so that applications using ucontext work out of the box. It will also simplify testing and make it more realistic, as we will just need to write a real program using ucontext.

So you mean redefining makecontext and swapcontext in compiler-rt, annotate them and call libc's implementation after that? Is there an example of such a code for another api?

Unfortunately, boost::context uses own assembly to implement coroutines, so we still need to expose annotations. We can talk to boost maintainers to add annotations directly to boost::context.

I can try to make them a patch for that when I'm done with this. For the moment I added the annotations around boost::context calls.

lib/asan/asan_thread.cc
404

Oh, you mean that asan functions are meant to be called by the code generated by the compiler while sanitizer functions are meant to be called by the user directly? Anyway, I'll change that.

lib/asan/asan_thread.h
98

I wasn't sure. I can make this a hard failure.

Facebook's folly uses a slightly different interface which looks like enterfiber(uptr bottom, uint size), this kind of interface would remove the need for such a check. Would you prefer me to change it to that?

156

I agree, I was just copying what was done with the stack_* variables. I'll make this a function.

So you mean redefining makecontext and swapcontext in compiler-rt, annotate them and call libc's implementation after that? Is there an example of such a code for another api?

Yes.
Here are examples (the interceptors should also be added to that file):
http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/sanitizer_common/sanitizer_common_interceptors.inc?revision=270076&view=markup

You will need to define new hooks COMMON_INTERCEPTOR_ENTER/EXIT_FIBER, but then you can give them actual implementation only in asan (for starters).

I see that folly always uses main thread stack as scheduler stack: always switches main->fiber/fiber->main, but not fiber1->fiber2. Some programs avoid the excessive switch by doing fiber1->fiber2. We need to make sure that we support that pattern as well.

I wonder if it can interfere with signals (without altstack). Namely, a signal arrives when asan has setup fiber stack, but the actual switch did not happen (SP is still pointing to the old stack). Then signal handler executes something that causes __asan_handle_no_return. Seems that we can get the same crash. Need tests.

I see that folly always uses main thread stack as scheduler stack: always switches main->fiber/fiber->main, but not fiber1->fiber2. Some programs avoid the excessive switch by doing fiber1->fiber2. We need to make sure that we support that pattern as well.

I think this is easy to support, we can remove the warning about entering a fiber twice, then, for example, you could go main -> enter_fiber -> fiber 1 -> enter_fiber -> fiber 2 -> exit_fiber -> main.

I wonder if it can interfere with signals (without altstack). Namely, a signal arrives when asan has setup fiber stack, but the actual switch did not happen (SP is still pointing to the old stack). Then signal handler executes something that causes __asan_handle_no_return. Seems that we can get the same crash. Need tests.

I didn't think about that. I think you are right, it will cause the warning in __asan_handle_no_return and the false positives afterwards. However, I don't know how we could avoid that... The user doing the annotation could take care of that by deferring signals between the enter_fiber and the switch to the new stack. Do you have any other idea?

I didn't think about that. I think you are right, it will cause the warning in __asan_handle_no_return and the false positives afterwards. However, I don't know how we could avoid that... The user doing the annotation could take care of that by deferring signals between the enter_fiber and the switch to the new stack. Do you have any other idea?

Requiring users to mess with it is unpleasant. And missing signal masks is not something you will instantly notice.

Also setting signal mask can slow down programs significantly. On my old program removal of signal mask change on context switches provided 7x end-to-end speedup. And I think swapcontext should do only one sigsetmask call, while disable/enable will be 2 syscalls.

If we express the annotations as start_switch/fnish_switch, then we have more flexibility to handle it. First, we can handle signal masks internally. Second, we can remember both old and new stacks for the duration of the switch, and then check both in __asan_handle_no_return. This will eliminate the need to mess with signal masks.

If we express the annotations as start_switch/fnish_switch, then we have more flexibility to handle it. First, we can handle signal masks internally. Second, we can remember both old and new stacks for the duration of the switch, and then check both in __asan_handle_no_return. This will eliminate the need to mess with signal masks.

Oh, I see!

But I note that AsanThread::stack_*() functions are called by other functions than __asan_handle_no_return. I see one particular call in asan_thread.cc, in GetThreadRangesLocked(), it is done on the thread whose id is given as an argument. If that thread is in the middle of start_switch/finish_switch, I don't know which stack to return in AsanThread::stack_*().

I just noticed that I also forgot to change AsanThread::AddrIsInStack(), should it just return true if the address is in one of the two stacks when the thread is in the middle of a context switch? Would it have bad consequences?

Is there a way from an AsanThread to get its stack pointer? It would help to know if the thread is on one stack or the other for all usages of stack_*() and solve all these problems.

Or if the only problematic function is __asan_handle_no_return, I may have to handle the coroutine special case only there?

Is there a way from an AsanThread to get its stack pointer? It would help to know if the thread is on one stack or the other for all usages of stack_*() and solve all these problems.

You generally can do that by taking an address of any local variable.
Yes, if we are in the middle of switch we can take current SP and figure out what stack we are on. I have not looked at all usages of stack_top, but it seems to me that AddrIsInStack should use this new fiber support as well. Kostya, what do you think?

You generally can do that by taking an address of any local variable.

If we are looking for the stack of the current thread, yes. But there are functions that need to know the stack of other threads, like GetThreadRangesLocked. It is the only such function I have found but I don't know if there are others.

I think GetThreadRangesLocked needs to use main thread stack. Fiber stacks must be preserved alive by some other means (e.g. referenced from the context object), and thus transitively scanned by lsan. However, I think it may not always be the case. For example, if we switch to a fiber stack and then wipe all pointers to the stack (since we are not going to switch back to it, e.g. the fiber approaches it's end). In such situation the only reference the to fiber stack is from SP register...

blastrock updated this revision to Diff 59674.Jun 5 2016, 1:53 PM
blastrock edited edge metadata.

Ok, I think I did everything you asked for, except for the interception of the swapcontext calls, I will come back at it later.

I didn't do anything on MsanThread, should I do something on it, for the case where the only reference left to the fiber stack is the sp register?

I am still unsure about something... There is that function AsanThread::ClearShadowForThreadAndStackTLS which does something about (un?)poisoning the stack. I didn't change it, so it still operates on the thread main stack for the moment, should it be changed to also clear the fiber stack if there is one?

I have added a few NOLINT there because the linter complained about the fact that a semicolon is useless after a closing brace. Any idea about how to avoid that warning is welcome.

And one last thing, I only ran the tests on clang 3.8 and rebased the patch on trunk to submit it here. I will run the tests later on trunk if everybody is ok with this code.

dvyukov added inline comments.Jun 6 2016, 3:54 AM
lib/asan/asan_thread.cc
136

This if is excessive. Remove.

lib/asan/asan_thread.h
97

We can also switch from one fiber to another. So we need two fiber stacks (old/new).

blastrock marked 6 inline comments as done.Jun 6 2016, 8:14 AM
blastrock added inline comments.
lib/asan/asan_thread.cc
136

Right, I changed this code so much I missed the obvious :P

lib/asan/asan_thread.h
97

Indeed, this is not enough, I will fix it and add tests.

blastrock updated this revision to Diff 59763.Jun 6 2016, 12:32 PM

I added support for switching from one fiber to another and added that to the unit test. I also added a few ThrowAndCatch calls there to try to trigger the warning that this patch fixes in __asan_handle_no_return.

Btw, if I add the interceptors for swapcontext, then I won't be able to insert these ThrowAndCatch calls in this test. To keep it, do you think I should call swapcontext from a dlopen/dlsym to avoid calling the interceptor? Or do you have another idea?

Btw, if I add the interceptors for swapcontext, then I won't be able to insert these ThrowAndCatch calls in this test.

Why?

Because then I will not be able to write something like

__sanitizer_enter();
ThrowAndCatch();
swapcontext();

because __sanitizer_enter() will be called directly inside swapcontext().

Because then I will not be able to write something like

__sanitizer_enter();
ThrowAndCatch();
swapcontext();

because __sanitizer_enter() will be called directly inside swapcontext().

This can be tested by sending signals during the switch and doing ThrowAndCatch inside the handler.
That would be a very useful test on its own. Signals are known to be tricky and cause problems.
You can find some existing tests that setup periodic timer signals (at lest there are few in tsan).

I found one of these tests you are talking about in tsan/signal_sync.cc. It seems to trigger a signal every 10 microseconds, but even with that, isn't the probability that the signal triggers right in between __sanitizer_enter() and swapcontext() very low? I think it would be useful to add such a test, but not replace the current one by it, as the current one is deterministic and tests all tricky cases (except the signals).

If you want deterinistic unit tests, finding real swapcontext with dlsym if fine. dlsym(RTLD_NEXT, "swapcontext") should do.

filcab added a subscriber: filcab.Jun 8 2016, 7:16 AM

Please split the additions to the swapcontext test to a new, "just fiber-related" test case.
That will also make it easier to port that specific test to several fiber platforms, in the absence of swapcontext (or to other mechanisms for the same functionality).

More general comments:
Several things need to be updated in AsanThread to keep track of fibers. Some of them involve saving old state:

  • Stack bounds: Mostly been dealt with, which is good. I think there might be some other things we can do to avoid adding too much complexity.
  • FakeStack: Missing. It basically needs a place where we can dump a FakeStack.

For stack bounds:
How about if __sanitizer_start_enter_fiber takes a void** where we can stash the current FakeStack pointer, and then __sanitizer_finish_enter_fiber can overwrite it with a new one if needed?
On __sanitizer_finish_exit_fiber we could do the opposite (runtime would need to pass a void*, of course), and recover the old FakeStack (disposing of the fiber one (there's no documentation, but I'm assuming the exit_fiber function is to be called when actually exiting (for the last time) a fiber, not simply when returning to the main stack)).
Without the fake stack handling, we'll lose use-after-return errors.

My understanding is that we're calling __sanitizer_*_enter_fiber on an anything->fiber transition (be it the first one into that fiber or any other), and calling __sanitizer_*_exit_fiber on the *last* (for a given fiber) fiber->anything transition.
Let me know if that's not the case. And if it's not, why it's not generalized like that.

As for the stack bounds: Would it be a problem to figure them out dinamycally in __sanitizer_finish_enter_fiber? Possibly with a callback from the fiber library, since we need to know if a fiber is running (Our system's fiber library has a function that queries TLS and tells you if you're running in a fiber. I have no idea about other implementations).

include/sanitizer/common_interface_defs.h
143

Needs documentation.

lib/asan/asan_thread.cc
130

Why don't you change stack_{top,bottom} and, when exiting the fiber, get the bounds from the system?
At the very least (if the proposal above can't be done), we can probably avoid a bunch of code changes by simply having old_stack_{top,bottom} members to save the old stuff, and then set stack_{top,bottom} to the fiber's bounds. It would also avoid queries to fiber_stack_* to see if we should use those or the regular stack_*.

140

Assert?

151

Assert?

157

Assert?

162

The struct definition should be in an anonymous namespace defined here in the *.cc file.
This function should be a static function or also put in the anonymous namespace.

165

ugh, the linter warns here?

176

Line up the //NOLINT unless clang-format changes it to be unaligned.
But given the different amount of spaces in these two lines, it seems you should at least run clang-format ;-)

236

Initializing these vars should be done in the Init() function, not here.

409

If it's not a "big" problem to call these functions if we don't know about the thread (hence the warning vs. assert, I guess), then we should probably use VReport(1, ...) so we don't warn when verbosity is off, no?

lib/asan/asan_thread.h
145

Remove this. GetStackBounds (and the struct) are only used in the .cc file. You can make them have internal linkage there.

filcab added inline comments.Jun 8 2016, 7:16 AM
lib/asan/asan_thread.cc
14

Isn't this one of the files that shouldn't include system headers?

127

I'd prefer something like "starting fiber switch while in fiber switch".
I'd also make it an assert unless we have actual uses cases of this.

lib/asan/asan_thread.h
157

Do we need this or can we use the regular stack_{top,bottom}?
Do we need to keep track of the old stack bounds?

161

What about fiber vs non-fiber? How do we know when to pick each of {fiber_,}stack_{top,bottom}?

For stack bounds:
How about if __sanitizer_start_enter_fiber takes a void** where we can stash the current FakeStack pointer, and then __sanitizer_finish_enter_fiber can overwrite it with a new one if needed?
On __sanitizer_finish_exit_fiber we could do the opposite (runtime would need to pass a void*, of course), and recover the old FakeStack (disposing of the fiber one (there's no documentation, but I'm assuming the exit_fiber function is to be called when actually exiting (for the last time) a fiber, not simply when returning to the main stack)).
Without the fake stack handling, we'll lose use-after-return errors.

Oh, that what they are used for! I think what you said could work, I must give it more thought.

My understanding is that we're calling __sanitizer_*_enter_fiber on an anything->fiber transition (be it the first one into that fiber or any other), and calling __sanitizer_*_exit_fiber on the *last* (for a given fiber) fiber->anything transition.
Let me know if that's not the case. And if it's not, why it's not generalized like that.

No, the second part is not exact (or I'm misunderstanding you). __sanitizer_*_exit_fiber are used when returning from the fiber to the thread's stack. The fiber may be re-entered later.
To be clear:

  • you call enter_fiber when you are about to switch to a fiber
  • you call exit_fiber when you are about to switch back to the thread's stack

These are not related to the fiber's lifetime.

As for the stack bounds: Would it be a problem to figure them out dinamycally in __sanitizer_finish_enter_fiber? Possibly with a callback from the fiber library, since we need to know if a fiber is running (Our system's fiber library has a function that queries TLS and tells you if you're running in a fiber. I have no idea about other implementations).

I'm not sure I understand. You want to avoid giving the stack bounds as arguments to __sanitizer_start_enter_fiber? We actually need to know the stack bounds we are switching to beforehand, in case a signal handler runs on the fiber, before __sanitizer_finish_enter_fiber is called.

@dvyokov I tried the signal triggering stuff with setitimer, but the resolution (on my machine at least) seems to low. The signal triggers 0 to 1 time, never close to critical parts. I even tried letting it run for 5min with a while (true) of the test, it didn't trigger any bad behavior.
Instead, I tried adding calls to ThrowAndCatch() in between the lines of StartEnterFiber() and the others functions directly in the library (of course I can't commit that), and I found a few bugs with that, not fixed yet.
My point is that using setitimer to trigger these bugs seems useless, do you still want me to do it? I have no other solution to unit-test that though...

lib/asan/asan_thread.cc
14

I don't know about that. I use it for size_t to implement the functions exposed in common_interface_defs.h. I can move these implementations in another file if needed, or use uptr if that makes sense, but I'm not sure I can use it in common_interface_defs.

127

No use case comes to my mind, I'm ok with making this an assert.

130

see my response to your comment on asan_thread.hh:149

165

It says that a semicolon after a closing brace is useless, it probably thinks of this as a block...

176

I think I did run clang-format, I will do it again. I just need to use the default llvm style, right?

409

I don't know if this can actually happen, so I don't know what would be the consequences. I guess that it would only trigger the warning in __asan_handle_no_return later, so this one can be made a verbose log, since nothing really bad has happened yet.

lib/asan/asan_thread.h
157

I think we do. This is there to handle the case where the user does:

__start_enter_fiber();
// a signal triggers here and the handler throws which triggers __asan_handle_no_return
swapcontext();

// in the fiber we just switched to
// a signal may trigger here too
__finish_enter_fiber();

On each call of __asan_handle_no_return between the __start and __finish_enter_fiber, we need to check on which stack we actually are running on, so we need both of them. This is what is done in GetStackBounds()

161

If fiber_stack_* are not null then we are on the fiber, unless fiber_switching_ is true which means that we don't know if we are on fiber stack or thread stack.

@dvyokov I tried the signal triggering stuff with setitimer, but the resolution (on my machine at least) seems to low. The signal triggers 0 to 1 time, never close to critical parts. I even tried letting it run for 5min with a while (true) of the test, it didn't trigger any bad behavior.

Instead, I tried adding calls to ThrowAndCatch() in between the lines of StartEnterFiber() and the others functions directly in the library (of course I can't commit that), and I found a few bugs with that, not fixed yet.
My point is that using setitimer to trigger these bugs seems useless, do you still want me to do it? I have no other solution to unit-test that though...

Absolutely.
My OS is well able to trigger signal every 100us per core. We also have a bunch of test bots that run these tests continuously. Even if this test will find an issue after 100 runs, it's still very useful. Debugging such issue in production is costly.

filcab added a comment.Jun 9 2016, 8:15 AM

My understanding is that we're calling __sanitizer_*_enter_fiber on an anything->fiber transition (be it the first one into that fiber or any other), and calling __sanitizer_*_exit_fiber on the *last* (for a given fiber) fiber->anything transition.
Let me know if that's not the case. And if it's not, why it's not generalized like that.

No, the second part is not exact (or I'm misunderstanding you). __sanitizer_*_exit_fiber are used when returning from the fiber to the thread's stack. The fiber may be re-entered later.
To be clear:

  • you call enter_fiber when you are about to switch to a fiber
  • you call exit_fiber when you are about to switch back to the thread's stack

These are not related to the fiber's lifetime.

Do we need to care that we're doing a Fiber->Thread, though? Does it need to be handled differently, in general?

The way I see it, you could do something like (I'm giving vars/functions/members totally different names to avoid ambiguity with the current names. Feel free to keep something closer to your current ones):
__sanitizer_fiber_switch_start(void *ctx, uptr size);
This function is passed the fiber's or the thread's stack bounds, and sets next_stack_{top,bottom}.

__sanitizer_fiber_switch_finish();
This function is run in the new stack that you switched to and removes the current stack_{top,bottom}, replacing with what was in next_stack_{top,bottom} (setting the latter ones to 0, after).

That way, you only have one primitive to deal with (This follows from our simple fiber implementation. Yours might have functionality that requires the current implementation. Please let me know).

The FakeStacks might be more annoying and require the fiber implementation to provide an easy way for us to store a random pointer-sized object. I don't think that should be a problem.

Do we want to keep track of fibers in the same way we keep track of threads, and be able to diagnose which fiber we were in on allocation/deallocation?
(I guess the answer might be "yes", but I mention this as a "future work" thing. We should try to take it into account, but shouldn't add it to this patch (or complicate it too much)).

As for the stack bounds: Would it be a problem to figure them out dinamycally in __sanitizer_finish_enter_fiber? Possibly with a callback from the fiber library, since we need to know if a fiber is running (Our system's fiber library has a function that queries TLS and tells you if you're running in a fiber. I have no idea about other implementations).

I'm not sure I understand. You want to avoid giving the stack bounds as arguments to __sanitizer_start_enter_fiber? We actually need to know the stack bounds we are switching to beforehand, in case a signal handler runs on the fiber, before __sanitizer_finish_enter_fiber is called.

Fair enough. The signal handler case is interesting and I think it's worth it to get the bounds ahead because of it.

filcab added inline comments.Jun 9 2016, 8:21 AM
lib/asan/asan_thread.cc
14

OK. Remove the stddef.h include.
On the external header, stddef.h is already included (it's external, so we don't have a problem with that) and you can keep size_t there.
Internally, use uptr instead of size_t. We know that on the platforms we support we're safe doing it.

165

Unfortunate, but not much to do here.

176

You can probably use Google-style for ASan. I'm not sure if we're trying to transition ASan to LLVM-style, but I don't think so.

The way I see it, you could do something like (I'm giving vars/functions/members totally different names to avoid ambiguity with the current names. Feel free to keep something closer to your current ones):
__sanitizer_fiber_switch_start(void *ctx, uptr size);
This function is passed the fiber's or the thread's stack bounds, and sets next_stack_{top,bottom}.

__sanitizer_fiber_switch_finish();
This function is run in the new stack that you switched to and removes the current stack_{top,bottom}, replacing with what was in next_stack_{top,bottom} (setting the latter ones to 0, after).

That way, you only have one primitive to deal with (This follows from our simple fiber implementation. Yours might have functionality that requires the current implementation. Please let me know).

My implementation does not know about the thread's stack bounds. It could ask them through pthread and store them in some thread_local storage to avoid asking them each time.

Also, this interface is based on what I saw in facebook's folly, here https://github.com/facebook/folly/commit/2ea64dd24946cbc9f3f4ac3f6c6b98a486c56e73 , so they decided to do it this way too.

This interface currently makes it easier to annotate code, but if you think simplifying asan's internals is worth it, I don't mind changing it.

blastrock marked 12 inline comments as done.Jun 11 2016, 3:12 AM
blastrock added inline comments.
lib/asan/asan_thread.cc
127

Hm... should I use the default C assert macro? I couldn't find if you have your own macro. Or should I leave the if and the log as they are and just add a Die() call?

lib/asan/asan_thread.h
145

I can move the struct in the .cc file, but if I move GetStackBounds(), I must make it friend of AsanThread so that it can access fiber_stack_*. Another solution would be to make it receive all that data as argument, but the prototype would be GetStackBounds(stacktop, stackbottom, fiberstacktop, fiberstackbottom, fiberswitching) (if I keep only two stack in memory as you suggested in your last comment).
What do you prefer?

blastrock updated this revision to Diff 60476.Jun 12 2016, 12:08 PM
blastrock updated this object.

I changed the interface to what @filcab suggested. Another downside is that getting the current thread's stack bounds is done through a non-portable pthread API, but it does simplify asan code.

I made my implementation more signal-proof by using volatile. I saw later that in some other parts of the code, you prefer using atomic access, I can change the volatiles to that if needed.

I split the tests and left the swapcontext test as it is currently. I also added a test that spams signals while it runs that @dvyukov asked for. As I said, the timer as a poor resolution on my machine, so it doesn't test anything more than the test without the signals.

I removed the stddef include and changed some logs to verbose reports. I changed my clang-format to google style and reformatted a few things I wrote.

About the FakeStack stuff, I wrote a new test, planning to do the void** thing @filcab asked for, but the test is already passing on my machine. Could you help me write a failing test before I implement that feature?

Another downside is that getting the current thread's stack bounds is done through a non-portable pthread API, but it does simplify asan code.

I only see the APIs in test code. Is it correct?
Tests are OK. swapcontext is also non-portable.

lib/asan/asan_thread.cc
135

If you want to tolerate some user errors here, then I think it's better to return here. next_stack_top/bottom are most likely 0, so we will end up with no stack at all otherwise.

149

Why?
If I am reading core correctly, this must never happen.

I made my implementation more signal-proof by using volatile. I saw later that in some other parts of the code, you prefer using atomic access, I can change the volatiles to that if needed.

Yes, please.
Current ordering of memory accesses if very tricky, esp in FinishSwitchFiber and I see that that is actually important for GetStackBounds correctness.

I think it is enough to make only stack_switching_ atomic:

atomic<int> stack_switching_;

void AsanThread::StartSwitchFiber(uptr bottom, uptr size) {
  next_stack_bottom_ = bottom;
  next_stack_top_ = bottom + size;
  atomic_store(&stack_switching_, 1, memory_order_release);
}
  
void AsanThread::FinishSwitchFiber() {
  stack_bottom_ = next_stack_bottom_;
  stack_top_ = next_stack_top_;
  atomic_store(&stack_switching_, 0, memory_order_release);
  next_stack_top_ = 0;
  next_stack_bottom_ = 0;
}

inline AsanThread::StackBounds AsanThread::GetStackBounds() const {
  if (!atomic_load(&stack_switching_, memory_order_acquire))
    return StackBounds{stack_bottom_, stack_top_};  // NOLINT
  char local;
  const uptr cur_stack = (uptr)&local;
  // Note: need to check next stack first, because FinishSwitchFiber
  // may be in process of overwriting stack_top_/bottom_. But in such case
  // we are already on the next stack.
  if (cur_stack >= next_stack_bottom_ && cur_stack < next_stack_top_)
    return StackBounds{next_stack_bottom_, next_stack_top_};  // NOLINT
  return StackBounds{stack_bottom_, stack_top_};  // NOLINT
}
test/asan/TestCases/Linux/swapcontext_annotation.cc
5 ↗(On Diff #60476)

I would expect that we do all that -O0/-O1/-O2 automatically on a higher level. At least we used to as far as I remember. Now all that cmake/lit in incomprehensible, so I am not sure... But I don't see any -O flags in other asan tests.

Does anybody know is we run asan lit tests with different optimization levels?

67 ↗(On Diff #60476)

s/0/1/
otherwise test can silently break and continue passing on bots

89 ↗(On Diff #60476)

s/0/1/

101 ↗(On Diff #60476)

s/0/1/

136 ↗(On Diff #60476)

please reformat tests with clang-format
{ should be on the previous line

155 ↗(On Diff #60476)

Is it useful to run the test without the signal ('x' above)?
I would expect that the signal variant is a strict superset of no-signal. I can't imagine how signals can prevent some failure from happening.
If you agree, then remove the 'x' runs.

175 ↗(On Diff #60476)

Run all the following 10000 times (or whatever makes it run for 100ms or so). That will allow to stress interaction with signals better.

Another downside is that getting the current thread's stack bounds is done through a non-portable pthread API, but it does simplify asan code.

I only see the APIs in test code. Is it correct?
Tests are OK. swapcontext is also non-portable.

Yes, I didn't add non-portable code to the library itself. I said that because it forces users to write non-portable code to annotate their code, but it's acceptable.

lib/asan/asan_thread.cc
135

@filcab suggested an assertion instead, I think it would indeed be better since it shows a mis-use of annotations.

149

Oh yes, you're right, this can't happen, I'll remove it.

As for your implementation, I think it would work, and it is indeed simpler.

test/asan/TestCases/Linux/swapcontext_annotation.cc
5 ↗(On Diff #60476)

I took this snippet from the preivous test swapcontext_test.cc which was already there and did all the optimization levels explicitly. I can check if the test framework already runs all optimization levels by itself.

67 ↗(On Diff #60476)

Indeed, though it would be caught anyway because the string "TestX passed" would not appear in output.

I changed the interface to what @filcab suggested. Another downside is that getting the current thread's stack bounds is done through a non-portable pthread API, but it does simplify asan code.

Like @dvyukov said, having that in test code is fine. The test is already Linux-only anyway :-)
And this exports a simpler API, which is a plus for maintainability of a low-level library ;-) (fewer things to track)

About the FakeStack stuff, I wrote a new test, planning to do the void** thing @filcab asked for, but the test is already passing on my machine. Could you help me write a failing test before I implement that feature?

I'm unsure what you mean here. Do you mean you wrote a test+implementation to swap the fakestack pointer? Or something else?
Can you share the test?

lib/asan/asan_thread.cc
127

If we're going to error, I'd probably go with a Die() call, since it's a bad use of the library (two start switch without an end in the middle).

lib/asan/asan_thread.h
145

Indeed. Probably best to leave it as is, then. It's clean enough and we wouldn't hide that much complexity due to the additional friend declarations.

test/asan/TestCases/Linux/swapcontext_annotation.cc
117 ↗(On Diff #60476)

Are the ss_sp and ss_size properties cross-arch? If not, please write getters for them (so we can #ifdef different platforms there) since most of this should be "portable enough".

test/asan/TestCases/Linux/swapcontext_use_after_return.cc
95 ↗(On Diff #60476)

Same as swapcontext_annotation.cc (but with setters).

filcab added inline comments.Jun 13 2016, 8:00 AM
test/asan/TestCases/Linux/swapcontext_annotation.cc
5 ↗(On Diff #60476)

It doesn't really. Unless I missed a big thing, we don't really re-do lit's test handling, so it will just get the run lines and do them. It won't try several different opt levels.

blastrock marked 17 inline comments as done.Jun 13 2016, 10:54 AM

About the FakeStack stuff, I wrote a new test, planning to do the void** thing @filcab asked for, but the test is already passing on my machine. Could you help me write a failing test before I implement that feature?

I'm unsure what you mean here. Do you mean you wrote a test+implementation to swap the fakestack pointer? Or something else?
Can you share the test?

The test is in this diff, the file named swapcontext_use_after_return.cc. I didn't write any implementation as you can see and the test passes on my machine. I don't know how the fakestack mechanism works, so I'm not sure what problem I should be trying to solve by adding the void** argument to __asan_start_switch_fiber().

Just to explain what the test does, it is very similar to the previous test and calls UseAfterReturn() at some tricky points. Moreover, it stores the addresses of function-local dead variables in global pointers so that they can be accessed from outside the fiber. All of these cases correctly trigger a use-after-return error.

I also tried accessing a still-alive function-local variable from outside the fiber which still ran correctly. I didn't commit that part because I think I would need to write a separate file to test for that automatically, since this file tests for the error-case.

test/asan/TestCases/Linux/swapcontext_annotation.cc
117 ↗(On Diff #60476)

The ucontext struct is described in man getcontext and the uc_stack field is of type stack_t which is described in man sigaltstack. Both say they are part of POSIX and I see no note about architecture-dependency, so I think this is ok.

blastrock marked 10 inline comments as done.Jun 13 2016, 12:01 PM

@dvyukov I wrote that loop to run the tests longer while the signal triggers, and I have found... a deadlock :(

I am not sure how exceptions work, but it seems that there is a mutex that allow only one stack unwinding to occur at the same time... So when the signal triggers while the mutex is locked, the program is completely deadlocked. Here is the stack I got:

* thread #1: tid = 11670, 0x00007f1dbfa79ccc libpthread.so.0`__lll_lock_wait + 28, name = 'swapcontext_ann', stop reason = signal SIGSTOP
  * frame #0: 0x00007f1dbfa79ccc libpthread.so.0`__lll_lock_wait + 28
    frame #1: 0x00007f1dbfa73b05 libpthread.so.0`__GI___pthread_mutex_lock + 117
    frame #2: 0x00007f1dbede0f3a libgcc_s.so.1`_Unwind_Find_FDE [inlined] __gthread_mutex_lock(__mutex=<unavailable>) + 42 at gthr-default.h:748
    frame #3: 0x00007f1dbede0f22 libgcc_s.so.1`_Unwind_Find_FDE at unwind-dw2-fde.c:1005
    frame #4: 0x00007f1dbede0f22 libgcc_s.so.1`_Unwind_Find_FDE(pc=0x00007f1dbeddf18d, bases=0x00007f1dbfdd08d8) + 18 at unwind-dw2-fde-dip.c:448
    frame #5: 0x00007f1dbeddda76 libgcc_s.so.1`uw_frame_state_for(context=0x00007f1dbfdd0830, fs=0x00007f1dbfdd0680) + 102 at unwind-dw2.c:1241
    frame #6: 0x00007f1dbeddecc0 libgcc_s.so.1`uw_init_context_1(context=0x00007f1dbfdd0830, outer_cfa=0x00007f1dbfdd0be0, outer_ra=0x00007f1dbf77c60c) + 80 at unwind-dw2.c:1562
    frame #7: 0x00007f1dbeddf18e libgcc_s.so.1`_Unwind_RaiseException(exc=0x000060d000039860) + 62 at unwind.inc:88
    frame #8: 0x00007f1dbf77c60c libstdc++.so.6`__cxxabiv1::__cxa_throw(obj=0x000060d000039880, tinfo=0x000000000072abc0, dest=0x0000000000000000)(void *)) + 92 at eh_throw.cc:82
    frame #9: 0x00000000004f9dd2 swapcontext_annotation.cc.tmp`Throw() + 82
    frame #10: 0x00000000004f9dee swapcontext_annotation.cc.tmp`ThrowAndCatch() + 14
    frame #11: 0x00000000004fa469 swapcontext_annotation.cc.tmp`handler(int) + 9
    frame #12: 0x00007f1dbfa7ad30 libpthread.so.0`??? + 1
    frame #13: 0x00007f1dbfa750b3 libpthread.so.0`__pthread_mutex_unlock_usercnt + 3
    frame #14: 0x00007f1dbede1074 libgcc_s.so.1`_Unwind_Find_FDE [inlined] __gthread_mutex_unlock(__mutex=<unavailable>) + 356 at gthr-default.h:778
    frame #15: 0x00007f1dbede1063 libgcc_s.so.1`_Unwind_Find_FDE + 123 at unwind-dw2-fde.c:1039
    frame #16: 0x00007f1dbede0fe8 libgcc_s.so.1`_Unwind_Find_FDE(pc=0x00007f1dbeddf18d, bases=0x00007f1dbfdd13e8) + 216 at unwind-dw2-fde-dip.c:448
    frame #17: 0x00007f1dbeddda76 libgcc_s.so.1`uw_frame_state_for(context=0x00007f1dbfdd1340, fs=0x00007f1dbfdd1190) + 102 at unwind-dw2.c:1241
    frame #18: 0x00007f1dbeddecc0 libgcc_s.so.1`uw_init_context_1(context=0x00007f1dbfdd1340, outer_cfa=0x00007f1dbfdd16f0, outer_ra=0x00007f1dbf77c60c) + 80 at unwind-dw2.c:1562
    frame #19: 0x00007f1dbeddf18e libgcc_s.so.1`_Unwind_RaiseException(exc=0x000060d000039930) + 62 at unwind.inc:88
    frame #20: 0x00007f1dbf77c60c libstdc++.so.6`__cxxabiv1::__cxa_throw(obj=0x000060d000039950, tinfo=0x000000000072abc0, dest=0x0000000000000000)(void *)) + 92 at eh_throw.cc:82
    frame #21: 0x00000000004f9dd2 swapcontext_annotation.cc.tmp`Throw() + 82
    frame #22: 0x00000000004f9dee swapcontext_annotation.cc.tmp`ThrowAndCatch() + 14
    frame #23: 0x00000000004f9fe1 swapcontext_annotation.cc.tmp`Child(int) + 129
    frame #24: 0x00007f1dbea6f0c0 libc.so.6
    frame #25: 0x000000000141d0e0 swapcontext_annotation.cc.tmp
    frame #26: 0x00000000004fa7d9 swapcontext_annotation.cc.tmp`main + 601 at swapcontext_annotation.cc:170 [opt]
    frame #27: 0x00007f1dbea4b610 libc.so.6`__libc_start_main + 240
    frame #28: 0x000000000041b699 swapcontext_annotation.cc.tmp`_start + 41

So I guess throwing exceptions from signal handlers is unsafe in the first place, but I couldn't find anything that says it's a bad thing to do.

I wrote that loop to run the tests longer while the signal triggers, and I have found... a deadlock :(

Good we caught it now.
Replace exceptions with setjmp/longjmp. Calling longjmp from a signal handler should work

setjmp/longjmp seems to work, the test is now correctly passing. The only thing is that I loop only 30 times and it already incurs a 1s total overhead (4 compilations at all optimization levels) on my machine, quite far from the 10000 you wished for. 100 iterations gives 5s overhead, ~60s for 1000 iterations.

setjmp/longjmp seems to work, the test is now correctly passing. The only thing is that I loop only 30 times and it already incurs a 1s total overhead (4 compilations at all optimization levels) on my machine, quite far from the 10000 you wished for. 100 iterations gives 5s overhead, ~60s for 1000 iterations.

OK, let's leave 30.
It already started catching bugs :)

blastrock updated this revision to Diff 60593.Jun 13 2016, 1:24 PM
blastrock marked an inline comment as done.

Here's what's new:

  • replaced warnings by asserts when calling start/finish_switch_fiber in wrong order
  • simplified code with only one atomic as suggested by @dvyukov
  • clang-formatted the two test files
  • removed the useless x/s argument from the test, the signal now always triggers
  • replaced ThrowAndCatch by a call to a noreturn function that will longjmp back
  • replaced some exit(0) by exit(1) in the test
  • loop 30 times instead of doing a single run

A few points that need checking:

  • I added some atomic_load()s because of the bool that was changed to atomic, I choose a relaxed memory order there, I think it is enough.
  • I didn't use an atomic<int> but atomic_uint8_t to save some space, it shouldn't be harmful.

What ends up happening seems to be a silent bug (and possible corruption afterwards):

  • Create two fibers: Leader and Worker
  • Leader allocates stuff on the stack (needing an asan_stack_malloc call), creating a FakeFrame
  • Switch to Worker
  • Worker does the same, then throws something, catching it too. (This sets the gc_needed_ flag on the StackFrame, since throwing calls handle_no_return)
  • Worker calls another function that sets up a FakeFrame
    • This triggers a GC on FakeStack and we collect the Leader's FakeFrame!!
  • Switch to Leader
  • We can still use the array we allocated (because we didn't poison the shadow when GCing), but the FakeStack might have reallocated that FakeFrame and we would be using invalid frames.

Here's output from a test program I have using our fibers. I added prints to ASan to help show off the bug. And added comments

Initializing jobs...
Starting jobs...
stack_malloc_6(size=d20)
Allocated FakeFrame 0x0002129b5800                   <- Leader fake frame
Started Leader
Created array
stack_malloc_6(size=d20)
Allocated FakeFrame 0x0002129b6800                   <- Worker fake frame
Started Worker
Created array
stack_malloc_6(size=d20)                             <- Setting up dummy function's fake frame
GCing
Checking FakeFrame idx 0 (flags: 1): 0x0002129b5800
Collecting!                                          <- Just collected Leader's fake frame!!
Checking FakeFrame idx 1 (flags: 1): 0x0002129b6800
Allocated FakeFrame 0x0002129b7800
dummyFunction
Created array
stack_free_6(ptr=0x0002129b7800, size=d20)
Deallocating FakeFrame 0x0002129b7800
stack_free_6(ptr=0x0002129b6800, size=d20)
Deallocating FakeFrame 0x0002129b6800
stack_free_6(ptr=0x0002129b5800, size=d20)
Deallocating FakeFrame 0x0002129b5800                <- Finally deallocating the Leader's fake frame. Proper place to "collect" it.
Exiting.

Here's some pseudo-code for the functions (sorry about the uglyness. Compile with -O0 so ASan actually inserts the asan_stack_{malloc,free}_* calls). Since I don't have easy access to a machine with Linux, I can't give you a full test-case. This should be adaptable, though.
Start by creating contexts for the fibers. Then setup a fiber that runs doLeader, and one that does doWorker. Switch to Leader.
If you want to just try swapcontext, I think we should be able to replace the switchToFiber with swapcontext (assuming we created the appropriate contexts) and it should almost work?

#include <stdio.h>
#include <assert.h>
#include <inttypes.h>

struct { void* nextFiber; } Leader, Worker;

void switchToFiber(void *, uint64_t, uint64_t*);  // Fiber, arg to "return", ptr for "returned" arg
void returnToThread(uint64_t, uint64_t*);

struct Stuff { void *S1, *S2; };
__attribute__((noinline,noreturn)) void doLeader(uint64_t argOnInitialize, uint64_t argOnRun) {
  fprintf(stderr, "Started Leader\n");
  // Sets up a FakeStack
  Stuff aaa[200] = {nullptr, nullptr};
  fprintf(stderr, "Created array\n");
  
  switchToFiber(Leader.nextFiber, 0, nullptr);
  returnToThread((uint64_t)aaa->S1, nullptr);
}

__attribute__((noinline)) int dummyFunction(int *p) {
  fprintf(stderr, "dummyFunction\n");
  Stuff bbb[200] = {nullptr, nullptr};
  fprintf(stderr, "Created array\n");
  return p[1];
}

__attribute__((noinline,noreturn)) void doWorker(uint64_t argOnInitialize, uint64_t argOnRun) {
  fprintf(stderr, "Started Worker\n");
  Stuff ccc[200] = {nullptr, nullptr};
  fprintf(stderr, "Created array\n");
  try {
    // Calls handle_no_return (which sets FakeStack::needs_gc_)
    throw 3;
  } catch (int e) {
  }
  // Will call asan_stack_malloc and GC the fake stack
  dummyFunction((int*)ccc);
  switchToFiber(Worker.nextFiber, 0, nullptr);
}

Here's the diff for lib/asan/asan_fake_stack.cc:

diff --git a/lib/asan/asan_fake_stack.cc b/lib/asan/asan_fake_stack.cc
index 91fdf0a..87f7674 100644
--- a/lib/asan/asan_fake_stack.cc
+++ b/lib/asan/asan_fake_stack.cc
@@ -140,6 +140,7 @@ void FakeStack::HandleNoReturn() {
 // We do it based on their 'real_stack' values -- everything that is lower
 // than the current real_stack is garbage.
 NOINLINE void FakeStack::GC(uptr real_stack) {
+  Printf("GCing\n");
   uptr collected = 0;
   for (uptr class_id = 0; class_id < kNumberOfSizeClasses; class_id++) {
     u8 *flags = GetFlags(stack_size_log(), class_id);
@@ -148,7 +149,9 @@ NOINLINE void FakeStack::GC(uptr real_stack) {
       if (flags[i] == 0) continue;  // not allocated.
       FakeFrame *ff = reinterpret_cast<FakeFrame *>(
           GetFrame(stack_size_log(), class_id, i));
+      Printf("Checking FakeFrame idx %lld (flags: %x): %p\n", i, flags[i], ff);
       if (ff->real_stack < real_stack) {
+        Printf("Collecting!\n");
         flags[i] = 0;
         collected++;
       }
@@ -206,12 +209,14 @@ ALWAYS_INLINE uptr OnMalloc(uptr class_id, uptr size) {
   uptr real_stack = reinterpret_cast<uptr>(&local_stack);
   FakeFrame *ff = fs->Allocate(fs->stack_size_log(), class_id, real_stack);
   if (!ff) return 0;  // Out of fake stack.
+  Printf("Allocated FakeFrame %p\n", ff);
   uptr ptr = reinterpret_cast<uptr>(ff);
   SetShadow(ptr, size, class_id, 0);
   return ptr;
 }
 
 ALWAYS_INLINE void OnFree(uptr ptr, uptr class_id, uptr size) {
+  Printf("Deallocating FakeFrame %p\n", ptr);
   FakeStack::Deallocate(ptr, class_id);
   SetShadow(ptr, size, class_id, kMagic8);
 }
@@ -220,14 +225,16 @@ ALWAYS_INLINE void OnFree(uptr ptr, uptr class_id, uptr size) {
 
 // ---------------------- Interface ---------------- {{{1
 using namespace __asan;
-#define DEFINE_STACK_MALLOC_FREE_WITH_CLASS_ID(class_id)                       \
-  extern "C" SANITIZER_INTERFACE_ATTRIBUTE uptr                                \
-      __asan_stack_malloc_##class_id(uptr size) {                              \
-    return OnMalloc(class_id, size);                                           \
-  }                                                                            \
-  extern "C" SANITIZER_INTERFACE_ATTRIBUTE void __asan_stack_free_##class_id(  \
-      uptr ptr, uptr size) {                                                   \
-    OnFree(ptr, class_id, size);                                               \
+#define DEFINE_STACK_MALLOC_FREE_WITH_CLASS_ID(class_id)                      \
+  extern "C" SANITIZER_INTERFACE_ATTRIBUTE uptr                               \
+      __asan_stack_malloc_##class_id(uptr size) {                             \
+    Printf("stack_malloc_" #class_id "(size=%llx)\n", size);                  \
+    return OnMalloc(class_id, size);                                          \
+  }                                                                           \
+  extern "C" SANITIZER_INTERFACE_ATTRIBUTE void __asan_stack_free_##class_id( \
+      uptr ptr, uptr size) {                                                  \
+    Printf("stack_free_" #class_id "(ptr=%p, size=%llx)\n", ptr, size);       \
+    OnFree(ptr, class_id, size);                                              \
   }
 
 DEFINE_STACK_MALLOC_FREE_WITH_CLASS_ID(0)

Wow, thanks for taking the time to explain all this ^^
I have a few questions about it.

So if I understand correctly, there is no way to automate that test, I'll just test manually on my machine?
And the thing I'm trying to fix is avoiding that the leader fakeframe gets gc-ed, right?

Do we really need two fibers to run that test? Can't we have just one fiber and the thread's stack and switch between them?

I will read the code in more detail, I may come back with more questions :)

Ah, right! Probably just one thread that:
allocates fakeframe
handles noreturn
allocates new fakeframe

will trigger it. I'll try changing the test on my side too.

Thank you,

Filipe

dvyukov accepted this revision.Jun 14 2016, 10:39 AM
dvyukov edited edge metadata.

Looks good to me. Any objections, anybody?

Do you have commit access? If not, I will commit it.

This revision is now accepted and ready to land.Jun 14 2016, 10:39 AM

And thanks for adding this support and bearing with us through the review!

Hm... shouldn't I fix @filcab's issue with FakeStack before?

dvyukov requested changes to this revision.Jun 15 2016, 5:31 AM
dvyukov edited edge metadata.

Hm... shouldn't I fix @filcab's issue with FakeStack before?

Yes, missed that there is something unresolved. Sorry.

This revision now requires changes to proceed.Jun 15 2016, 5:31 AM
blastrock updated this revision to Diff 61205.Jun 19 2016, 10:11 AM
blastrock edited edge metadata.

I think I got how fake stacks work. I implemented your idea @filcab, and with your test code, the FakeFrame is not collected anymore.

Now we must provide a void* to store the current FakeStack before a switch occurs. During a fiber switch, the FakeStack mechanism is disabled (I understand that it's ok to have __asan_stack_malloc() return null), I couldn't find a solution without disabling it. One consequence is that the fake frame that should be allocated when entering the fiber is not, only the functions called from there have one.

When leaving a fiber definitely, null must be passed as first argument so that instead of saving the fake stack, it gets destroyed.

I also removed the test I wrote which doesn't seem to test anything.

One consequence is that the fake frame that should be allocated when entering the fiber is not, only the functions called from there have one.

That's a consequence of manual annotations. If a program uses swapcontext, and we intercept and annotate it, then it won't happen, right?
I think it is fine. The guideline for manual annotations should then recommend to introduce an additional function that merely finishes the switch and calls real non-inlinable fiber body. Worth noting in __sanitizer_start_switch_fiber comment, and, yes, it needs some comment.

I implemented your idea @filcab, and with your test code

I don't see any new test files. Did you forget to add a file?

One consequence is that the fake frame that should be allocated when entering the fiber is not, only the functions called from there have one.

That's a consequence of manual annotations. If a program uses swapcontext, and we intercept and annotate it, then it won't happen, right?

Correct. But it's not that bad anyway I think. If the fake stack is there to find use-after-return errors, when the fiber entry function returns, it means the fiber is dead, so there is no real need to track those anymore. It's like a thread dying, and in both cases (thread or fiber) the fake stack is destroyed and it won't detect use-after-returns as such.

I think it is fine. The guideline for manual annotations should then recommend to introduce an additional function that merely finishes the switch and calls real non-inlinable fiber body. Worth noting in __sanitizer_start_switch_fiber comment, and, yes, it needs some comment.

Yes, I was planning on writing a few lines of documentation when we all agree on the interface.

I don't see any new test files. Did you forget to add a file?

I didn't add it. The test proposed by @filcab is based on printfs inside the asan library to show when fake frames are collected. We couldn't write an automatic test for that.

filcab accepted this revision.Jun 20 2016, 10:08 AM
filcab added a reviewer: filcab.

I think I got how fake stacks work. I implemented your idea @filcab, and with your test code, the FakeFrame is not collected anymore.

Yay!

Now we must provide a void* to store the current FakeStack before a switch occurs. During a fiber switch, the FakeStack mechanism is disabled (I understand that it's ok to have __asan_stack_malloc() return null), I couldn't find a solution without disabling it. One consequence is that the fake frame that should be allocated when entering the fiber is not, only the functions called from there have one.
When leaving a fiber definitely, null must be passed as first argument so that instead of saving the fake stack, it gets destroyed.

As I commented above, I think adding a way to kill a fake stack without being in the fiber (so: from outside the fiber that "has" that fake stack) would be nice. But I'm ok with doing this internally if there's no interest in it from open source.
In your fiber library, is a use case where the fiber returns to the thread and then re-enters possible? And is it possible to have the fiber go back to the thread and the thread decide to either switch back to the fiber or kill the fiber?

I also removed the test I wrote which doesn't seem to test anything.

Fair enough. I don't see a way of looking at this bug without adding those printfs.

I think this patch looks good. Just ping back with the documentation before committing. If I don't reply after one day, assume I have no complaints or will address them later.

Thanks a lot!

include/sanitizer/common_interface_defs.h
145

These still need documentation.
We probably want a way to collect fake stacks that we saved in a fiber (__sanitizer_fake_stack_kill(void *) or something), instead of just the "current" fake stack (when we're leaving the fiber).

Our fiber library doesn't really have a "I'm now leaving the fiber and won't come back" function. Execution flow can do something like thread->fiberA->fiberB->thread->fiberC->fiberA->thread->fiberB->kill(fiberA)->fiberC->...
And users can terminate a fiber at any time, as long as it's not running.
In order to support this kind of use-case, we need to be able to kill the fake stack without being inside the fiber. I can implement this internally if you think this is not very general. But depending on your fiber library, I'd guess this is likely to appear elsewhere too.

As I commented above, I think adding a way to kill a fake stack without being in the fiber (so: from outside the fiber that "has" that fake stack) would be nice. But I'm ok with doing this internally if there's no interest in it from open source.
In your fiber library, is a use case where the fiber returns to the thread and then re-enters possible? And is it possible to have the fiber go back to the thread and the thread decide to either switch back to the fiber or kill the fiber?

No, in my library, when a fiber returns, it is completely destroyed. And we never kill a fiber from outside before it is finished.
This use-case came to my mind but I thought it wasn't a real world use-case, I was wrong ^^

I let you add this function if you need it, it doesn't seem hard to implement anyway :)

blastrock updated this revision to Diff 61344.Jun 21 2016, 12:57 AM
blastrock updated this object.
blastrock edited edge metadata.
blastrock marked an inline comment as done.

I added some documentation in common_interface_defs.h.

filcab accepted this revision.Jun 21 2016, 4:27 AM
filcab edited edge metadata.

LGTM

No, in my library, when a fiber returns, it is completely destroyed. And we never kill a fiber from outside before it is finished.
This use-case came to my mind but I thought it wasn't a real world use-case, I was wrong ^^

I let you add this function if you need it, it doesn't seem hard to implement anyway :)

There's always someone :-) I'll implement it when I migrate over to this and probably upstream anyway.

Thank you!

Filipe

dvyukov accepted this revision.Jun 21 2016, 5:25 AM
dvyukov edited edge metadata.
This revision is now accepted and ready to land.Jun 21 2016, 5:25 AM

Thank you for your time reviewing this! :)

Failed on windows bot:
http://lab.llvm.org:8011/builders/sanitizer-windows/builds/24286/steps/run%20tests/logs/stdio
Looks relevant.

It says something about:

NOTE === If you see a mismatch below, please update asan_win_dll_thunk.cc

I guess I should add INTERFACE_FUNCTION(__sanitizer_*_switch_fiber) to that file. Should I open a new review on phabricator?

I also don't understand why it is needed for these new functions and not for some other functions like __sanitizer_print_memory_profile.

I guess I should add INTERFACE_FUNCTION(__sanitizer_*_switch_fiber) to that file. Should I open a new review on phabricator?

Yes, please.

File that defines __sanitizer_print_memory_profile contains #if CAN_SANITIZE_LEAKS. Probably that's not enabled on windows.

filcab closed this revision.Dec 14 2016, 8:06 AM

This was committed in r273260.