This is an archive of the discontinued LLVM Phabricator instance.

[runtimes][asan] Fix swapcontext interception
ClosedPublic

Authored by itrofimow on Nov 8 2022, 9:43 AM.

Details

Summary

Resetting oucp's stack to zero in swapcontext interception is incorrect,
since it breaks ucp cleanup after swapcontext returns in some cases:

Say we have two contexts, A and B, and we swapcontext from A to B, do
some work on Bs stack and then swapcontext back from B to A. At this
point shadow memory of Bs stack is in arbitrary state, but since we
can't know whether B will ever swapcontext-ed to again we clean up it's
shadow memory, because otherwise it remains poisoned and blows in
completely unrelated places when heap-allocated memory of Bs context
gets reused later (see https://github.com/llvm/llvm-project/issues/58633
for example). swapcontext prototype is swapcontext(ucontext* oucp,
ucontext* ucp), so in this example A is oucp and B is ucp, and i refer
to the process of cleaning up Bs shadow memory as ucp cleanup.

About how it breaks:
Take the same example with A and B: when we swapcontext back from B to A
the oucp parameter of swapcontext is actually B, and current trunk
resets its stack in a way that it becomes "uncleanupable" later. It
works fine if we do A->B->A, but if we do A->B->A->B->A no cleanup is
performed for Bs stack after B "returns" to A second time. That's
exactly what happens in the test i provided, and it's actually a pretty
common real world scenario.

Instead of resetting oucp's we make use of uc_stack.ss_flags to mark
context as "cleanup-able" by storing stack specific hash. It should be
safe since this field is not used in [get|make|swap]context functions
and is hopefully never meaningfully used in real-world scenarios (and i
haven't seen any).

Fixes https://github.com/llvm/llvm-project/issues/58633

Diff Detail

Event Timeline

itrofimow created this revision.Nov 8 2022, 9:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 8 2022, 9:43 AM
Herald added a subscriber: Enna1. · View Herald Transcript
itrofimow requested review of this revision.Nov 8 2022, 9:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 8 2022, 9:43 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript

Hi! Could you please have a look at this?

This patch is an attempt to fix bugs introduced by https://reviews.llvm.org/D129219 and https://reviews.llvm.org/D130218,
and the test added is failing with current clang++-15 with -fsanitize-address-use-after-return=never.

Actually about that -fsanitize-address-use-after-return, which default value was changed to
runtime in clang-15 - doesn't that render all that cleanup machinery useless, since fake stack is used instead of the stack specified in ucontext_t?

Hi! Could you please have a look at this?

This patch is an attempt to fix bugs introduced by https://reviews.llvm.org/D129219 and https://reviews.llvm.org/D130218,
and the test added is failing with current clang++-15 with -fsanitize-address-use-after-return=never.

Actually about that -fsanitize-address-use-after-return, which default value was changed to
runtime in clang-15 - doesn't that render all that cleanup machinery useless, since fake stack is used instead of the stack specified in ucontext_t?

Yes, it's less important. UAR moves most variables to the fake, stack, but not all.
Also if UAR allocator overflows it falls back to non-UAR stack.

compiler-rt/lib/asan/asan_interceptors.cpp
259–260

so if we do:
getcontext
update fields, including flags
makecontext
we will not be able to tell a difference?

306

if you change ReadContextStack, please make it return bool: true - need cleanaup, false does not
then hide kAsanContextStackFlagsMagic into ReadContextStack
and you don't need return flags after that

319

I still think that intercepting makecontext is the better solution.

INTERCEPTOR( ... makecontext {
...
ucp->uc_stack.ss_size -= sizeof(u64);
*(ucp->uc_stack.ss_sp + ucp->uc_stack.ss_size) = kMagic.
return REAL(makecontext)
...
}

Then we need only to check: *(ucp->uc_stack.ss_sp + ucp->uc_stack.ss_size) == magic to decide

compiler-rt/lib/asan/asan_linux.cpp
230

It looks "ss_flags == kAsanContextStackFlagsMagic" only if "ss_sp == nullptr"

itrofimow added inline comments.Nov 10 2022, 11:51 AM
compiler-rt/lib/asan/asan_interceptors.cpp
259–260

Yes, but that's the point of using ss_flags for this: nobody uses them for anything meaningful,
this field is either set to 0 or left unchanged after getcontext call.

I mean, some people might use ss_flags, but for them we at least would break in a
realistically debuggable way, not just some random use-after-scope in a completely unrelated places.

319

I see a couple of problems with intercepting makecontext:

first, makecontext takes ellipsis (...) args, and i'm not sure how to pass that to real makecontext without too much hassle

second, saving magic to users stack is unreliable, and the exact approach you provided might break boost.context for instance (https://github.com/boostorg/context/blob/develop/include/boost/context/continuation_ucontext.hpp#L279)

compiler-rt/lib/asan/asan_linux.cpp
230

Not necessary: ss_sp|ss_size could be changed by user after getcontext call

The idea is that there are 3 major cases:

  1. context is initialized with getcontext for later usage with makecontext. Since nobody uses ss_flags (see my other comment for explanation) we can reliably tell that we should cleanup its stack
  2. context is some global variable that is just set to zeroes (that's what happens in tests, for example) - we can't distinguish this case from 1. but can assume that ss_sp is either implicitly zeroed or set by user, who also explicitly set ss_flags to zero
  3. context is uninitialized and used as oucp for swapcontext - then its flags would never be equal to magic, so we won't try to clean it up
itrofimow added inline comments.Nov 10 2022, 11:55 AM
compiler-rt/lib/asan/asan_interceptors.cpp
259–260

You are right, setting ss_flags in makecontext would be better, but idk how to do that

itrofimow updated this revision to Diff 474864.Nov 11 2022, 2:44 PM
  • cr fixes

Hi @vitalybuka!
Did you have a chance to make yourself familiar with my reasoning?

Using flags is to easy to find example when it fails.

With makecontext the only issues is ... handling, I can look into this a little bit later.
Also with makecontext we can handle uc_link, we need to clean the stack before moving to the next.

compiler-rt/lib/asan/asan_interceptors.cpp
319

I see a couple of problems with intercepting makecontext:

first, makecontext takes ellipsis (...) args, and i'm not sure how to pass that to real makecontext without too much hassle

Can be tricky, but we do something like this for printf interceptors

second, saving magic to users stack is unreliable, and the exact approach you provided might break boost.context for instance (https://github.com/boostorg/context/blob/develop/include/boost/context/continuation_ucontext.hpp#L279)

It should not be a problem, makecontext implementation already cut off a piece of buffer for own data.

compiler-rt/lib/asan/asan_linux.cpp
214–215

can you make it more "random", I suspect probability of this constant in unrelated place is above average.

compiler-rt/test/asan/TestCases/Linux/swapcontext_test.cpp
28

is this clang format?
if so please move out reformating into a separate CL, or just revert here

95

I can imagine someone clears poll_context.uc_stack before setting ss_sp and ss_size

itrofimow marked 3 inline comments as done.Nov 14 2022, 4:41 PM

Using flags is to easy to find example when it fails.

Maybe i'm having a brain lag here, but what would that example be?
One has to explicitly set this field to non-zero after getcontext (or makecontext if i manage to intercept it somehow and move the assignment of flags there),
and it seems to me like literally nobody does that - it's either left unchanged or zeroed.

ss_flags is either set to Magic by me or set to 0 by user (and maybe i should add checks for SS_DISABLE/SS_ONSTACK as well),
and as long as it's set to something meaningful we can almost reliably identify this context as cleanup-able,
because having meaningful values implies it was set up by user manually, and the context isn't used just as oucp for swapcontext without prior setup.
What am i missing here?

P.S. I do understand that this is still an heuristic and can fail in some cases, but i would argue that those cases are uncommon to say the least

compiler-rt/lib/asan/asan_interceptors.cpp
319

Can be tricky, but we do something like this for printf interceptors

I might be mistaken, but it seems to me that for printf/scanf functions Asan just proxies to their analogues in libc taking va_list (vprintf/vscanf and the likes of). However there is no such thing for makecontext, sadly.

It should not be a problem, makecontext implementation already cut off a piece of buffer for own data.

You are right, in this exact case it would work, but the idea of manipulating user input in such ways makes me uneasy: people might rely on their layout/sizes, and we can unintentionally bring them all kinds of not-that-easy-to-debug UBs.

compiler-rt/lib/asan/asan_linux.cpp
214–215

I hope it is)

I was thinking about scenarios where manipulating ss_flags breaks someones logic, and they start to debug what's going on and why are there mismatches in this value. For me this screams "i am not random garbage" and should give a hint that this is intentional.

Would be better to have this set in makecontext, that's true.

compiler-rt/test/asan/TestCases/Linux/swapcontext_test.cpp
28

Yes, it's clang-format. Will revert

95

Not sure if i understood this, could you please elaborate?

itrofimow updated this revision to Diff 475823.Nov 16 2022, 7:30 AM
  • more exhaustive check for flags

@vitalybuka gently pinging you on this. Can you see this ever landing or am i going the wrong direction with the idea of using ss_flags?

@vitalybuka gently pinging you on this. Can you see this ever landing or am i going the wrong direction with the idea of using ss_flags?

@dvyukov @eugenis

Sorry I wanted to try makecontext interceptor and it works fine, but I hit other issues I have no idea how to solve.
I think we can't have reasonable context switch support and rather recommend to remove existing "INTERCEPTOR(int, swapcontext" and make poisoning users responsibility.

Or simplify interceptors to:

INTERCEPTOR(int, swapcontext, struct ucontext_t *oucp,
            struct ucontext_t *ucp) {
  static bool reported_warning = false;
  if (!reported_warning) {
    Report("WARNING: ASan doesn't fully support makecontext/swapcontext "
           "functions and may produce false positives in some cases!\n");
    reported_warning = true;
  }
  ___asan_handle_no_return();
  int res = real_swapcontext(oucp, ucp);
  __asan_handle_no_return();
  return res;
}

and make users to use existing sanitizer_start_switch_fiber/sanitizer_finish_switch_fiber to make sure __asan_handle_no_return does the job.

vitalybuka updated this revision to Diff 481450.Dec 8 2022, 2:32 PM

don't use flags

vitalybuka updated this revision to Diff 481456.Dec 8 2022, 3:02 PM

switch back to flags

vitalybuka accepted this revision.Dec 8 2022, 3:03 PM
This revision is now accepted and ready to land.Dec 8 2022, 3:03 PM
MaskRay added a subscriber: MaskRay.Jan 5 2023, 9:18 PM

since it breaks ucp cleanup after swapcontext returns in some cases.

Can you elaborate a bit what is "ucp cleanup" and how it breaks?

compiler-rt/lib/asan/asan_linux.cpp
231

^ has lower precedence than ==. This may not work as intended.

since it breaks ucp cleanup after swapcontext returns in some cases.

Can you elaborate a bit what is "ucp cleanup" and how it breaks?

Sure.
Say we have two contexts, A and B, and we swapcontext from A to B, do some work on Bs stack and then swapcontext back from B to A.
At this point shadow memory of Bs stack is in arbitrary state, but since we can't know whether B will ever swapcontext-ed to again we clean up it's shadow memory,
because otherwise it remains poisoned and blows in completely unrelated places when heap-allocated memory of Bs context gets reused later (see https://github.com/llvm/llvm-project/issues/58633 for example).
swapcontext prototype is swapcontext(ucontext* oucp, ucontext* ucp), so in this example A is oucp and B is ucp, and i refer to the process of cleaning up Bs shadow memory as ucp cleanup.

About how it breaks:
Take the same example with A and B: when we swapcontext back from B to A the oucp parameter of swapcontext is actually B, and current trunk resets its stack in a way that it becomes "uncleanupable" later.
It works fine if we do A->B->A, but if we do A->B->A->B->A no cleanup is performed for Bs stack after after B "returns" to A second time.
That's exactly what happens in the test i provided, and it's actually a pretty common real world scenario.

Hope this clarifies things at least a bit.

since it breaks ucp cleanup after swapcontext returns in some cases.

Can you elaborate a bit what is "ucp cleanup" and how it breaks?

Sure.
Say we have two contexts, A and B, and we swapcontext from A to B, do some work on Bs stack and then swapcontext back from B to A.
At this point shadow memory of Bs stack is in arbitrary state, but since we can't know whether B will ever swapcontext-ed to again we clean up it's shadow memory,
because otherwise it remains poisoned and blows in completely unrelated places when heap-allocated memory of Bs context gets reused later (see https://github.com/llvm/llvm-project/issues/58633 for example).
swapcontext prototype is swapcontext(ucontext* oucp, ucontext* ucp), so in this example A is oucp and B is ucp, and i refer to the process of cleaning up Bs shadow memory as ucp cleanup.

About how it breaks:
Take the same example with A and B: when we swapcontext back from B to A the oucp parameter of swapcontext is actually B, and current trunk resets its stack in a way that it becomes "uncleanupable" later.
It works fine if we do A->B->A, but if we do A->B->A->B->A no cleanup is performed for Bs stack after after B "returns" to A second time.
That's exactly what happens in the test i provided, and it's actually a pretty common real world scenario.

Hope this clarifies things at least a bit.

Thanks! You may fold the description into the commit message/Differential summary and perhaps add a brief comment (including A->B->A->B->A) to the test function Poll.

itrofimow edited the summary of this revision. (Show Details)Jan 7 2023, 7:06 PM

since it breaks ucp cleanup after swapcontext returns in some cases.

Can you elaborate a bit what is "ucp cleanup" and how it breaks?

Sure.
Say we have two contexts, A and B, and we swapcontext from A to B, do some work on Bs stack and then swapcontext back from B to A.
At this point shadow memory of Bs stack is in arbitrary state, but since we can't know whether B will ever swapcontext-ed to again we clean up it's shadow memory,
because otherwise it remains poisoned and blows in completely unrelated places when heap-allocated memory of Bs context gets reused later (see https://github.com/llvm/llvm-project/issues/58633 for example).
swapcontext prototype is swapcontext(ucontext* oucp, ucontext* ucp), so in this example A is oucp and B is ucp, and i refer to the process of cleaning up Bs shadow memory as ucp cleanup.

About how it breaks:
Take the same example with A and B: when we swapcontext back from B to A the oucp parameter of swapcontext is actually B, and current trunk resets its stack in a way that it becomes "uncleanupable" later.
It works fine if we do A->B->A, but if we do A->B->A->B->A no cleanup is performed for Bs stack after after B "returns" to A second time.
That's exactly what happens in the test i provided, and it's actually a pretty common real world scenario.

Hope this clarifies things at least a bit.

Thanks! You may fold the description into the commit message/Differential summary and perhaps add a brief comment (including A->B->A->B->A) to the test function Poll.

Updated the summary with the more detailed explanation of the problem and a brief of proposed changes.

Actually, what's the state of this patch? It feels to me that there are 3 competing possibilities of what to do with the swapcontext issue:

  1. Land this, see how it goes and what/where it breaks, if any
  2. Completely abandon all cleanup machinery and make it users responsibility, as @vitalybuka suggested
  3. Revert https://reviews.llvm.org/D129219 together with https://reviews.llvm.org/D130218

I could try to ping folks from github issue to try this patch out before merging, to check if it's actually working

Hi!
Is there anything i can do to help resolving the issue (https://github.com/llvm/llvm-project/issues/58633), be that this patch or reverting some other patches?
Right now i'm a bit confused as what to do with this and how to advance any further.

@MaskRay @vitalybuka

Hi!
Is there anything i can do to help resolving the issue (https://github.com/llvm/llvm-project/issues/58633), be that this patch or reverting some other patches?
Right now i'm a bit confused as what to do with this and how to advance any further.

@MaskRay @vitalybuka

Sorry, the problem totally slipped my mind.
I will review it in the first half of the weak, can't do today.
Feel free to ping me any other day, If I stop replying.

compiler-rt/test/asan/TestCases/Linux/swapcontext_test.cpp
104

I suspect with higher O levels it will be converted into stack buffer

vitalybuka added a comment.EditedFeb 2 2023, 3:19 PM

Actually, what's the state of this patch? It feels to me that there are 3 competing possibilities of what to do with the swapcontext issue:

  1. Land this, see how it goes and what/where it breaks, if any
  2. Completely abandon all cleanup machinery and make it users responsibility, as @vitalybuka suggested
  3. Revert https://reviews.llvm.org/D129219 together with https://reviews.llvm.org/D130218

So I would prefer no.1, likely some cases would need some user side workaround like no.2
Do you see issues with the current state of the patch? If not, I can land it.
Would you prefer if I submit the patch into your name, as it was started by you? If you don't want to accept the blame for this version, I can land with my name.

Actually, what's the state of this patch? It feels to me that there are 3 competing possibilities of what to do with the swapcontext issue:

  1. Land this, see how it goes and what/where it breaks, if any
  2. Completely abandon all cleanup machinery and make it users responsibility, as @vitalybuka suggested
  3. Revert https://reviews.llvm.org/D129219 together with https://reviews.llvm.org/D130218

So I would prefer no.1, likely some cases would need some user side workaround like no.2
Do you see issues with the current state of the patch? If not, I can land it.
Would you prefer if I submit the patch into your name, as it was started by you? If you don't want to accept the blame for this version, I can land with my name.

Left some inline comments, please have a look.

I'm fine with taking the blame for this. In github it should be shown as co-authored by us anyway, so people would be able to reach out if something goes wrong.

compiler-rt/lib/asan/asan_interceptors.cpp
262

My (limited) knowledge of C tells me, that there is no portable way to forward an invocation of variadic function
(https://c-faq.com/varargs/handoff.html).
Are you sure that this indeed does what it's supposed to do?

compiler-rt/lib/asan/asan_linux.cpp
231

Nice catch

vitalybuka requested changes to this revision.Feb 7 2023, 12:59 PM

I will look into makecontext issue

compiler-rt/lib/asan/asan_interceptors.cpp
262

somehow it works on test, but replacing ... with va_list does not look right

This revision now requires changes to proceed.Feb 7 2023, 12:59 PM

Hi @vitalybuka! Did you have a chance to look into makecontext issue?

compiler-rt/lib/asan/asan_interceptors.cpp
262

I'm almost sure that it breaks as soon as there are more than 6 (on x86 that is) parameters, or more than sizeof(va_list) / sizeof(int64).
Might be worth to add a test for that

Hi @vitalybuka ! Pinging you about this

fix makecontext interceptor

up to 64 args

This works for me

Hi @vitalybuka! Thank you for getting back to this.

I really like your idea of marking flags in makecontext instead of getcontext, and i'm glad you've found a solution to ellipsis args forwarding - neat!
I can confirm that this fixes all the standalone repros i had, and i'm working on testing some larger projects with this patch applied to clang.

There seems to be one last issue with operator precedence in ReadContextStack left, and i guess we are done after that?

fix ^, == precedence

I can confirm that this patch (with parentheses added around comparison in ReadContextStack)
applied atop of clang-15.0.7 fixes https://github.com/userver-framework/userver/issues/170

Use existing hash function

vitalybuka added inline comments.Apr 13 2023, 11:28 AM
compiler-rt/lib/asan/asan_linux.cpp
231

Done.

vitalybuka accepted this revision.Apr 13 2023, 11:37 AM
vitalybuka edited the summary of this revision. (Show Details)

Thanks for verifying. I am going to merge this.

This revision is now accepted and ready to land.Apr 13 2023, 11:38 AM
vitalybuka edited the summary of this revision. (Show Details)Apr 13 2023, 11:45 AM
This revision was landed with ongoing or failed builds.Apr 13 2023, 11:47 AM
This revision was automatically updated to reflect the committed changes.