This is an archive of the discontinued LLVM Phabricator instance.

[MSAN] Add fiber switching APIs
ClosedPublic

Authored by justincady on Aug 24 2020, 9:56 AM.

Details

Summary

Add functions exposed via the MSAN interface to enable MSAN within
binaries that perform manual stack switching (e.g. through using fibers
or coroutines).

This functionality is analogous to the fiber APIs available for ASAN and TSAN.

Fixes google/sanitizers#1232

Diff Detail

Event Timeline

justincady created this revision.Aug 24 2020, 9:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2020, 9:56 AM
Herald added subscribers: Restricted Project, lxfind, jfb, modocache. · View Herald Transcript
justincady requested review of this revision.Aug 24 2020, 9:56 AM
vitalybuka added inline comments.Aug 24 2020, 7:03 PM
compiler-rt/lib/msan/msan_thread.cpp
95

Please drop {} for single line if/for for consistency

110

if we don't expect multiple threads access this field, then we don't need atomic
if we do expect that, this atomic does not solve data race.

From what I see stack_top() stack_bottom() only used from BufferedStackTrace::UnwindImpl for current thread.

So please remove remove stack_switching_ and simplify GetStackBounds() (next_stack_top_ == next_stack_bottom_ should be enought? )

compiler-rt/lib/msan/msan_thread.h
60–66
justincady marked 2 inline comments as done.Aug 25 2020, 8:22 AM
justincady added inline comments.
compiler-rt/lib/msan/msan_thread.cpp
110

Yes, you are correct. We do not expect multiple threads here currently. But I don't think this is attempting to prevent a data race (I used this pattern after seeing it in the ASAN implementation). My intent, and I believe the original intent in ASAN, is to prevent incorrect program logic and make sure error reports are always correct.

Regarding incorrect program logic, consider this sequence from a single OS thread:

  1. __msan_start_switch_fiber() is invoked
  2. __msan_start_switch_fiber() is invoked again

(Or similarly any sequence of events in which start and finish are not paired properly.)

In those cases, the atomic is the breadcrumb that lets MSAN terminate the program immediately and inform the user there was a bad sequence of calls (at the top of StartSwitchFiber and FinishSwitchFiber).

Regarding correct error reports, consider this sequence from a single OS thread:

  1. __msan_start_switch_fiber() is invoked
  2. Something bad happens before __msan_finish_switch_fiber() is invoked
  3. An error report with stack trace is generated

Note that Step 2 is basically never expected to happen. But, if it does, Steps 2 and 3 could occur either before of after the actual fiber switch. The atomic is again used as a breadcrumb to make sure that the error report shows the correct stack, be it pre- or post- fiber switch (in GetStackBounds). If stack_switching_ is 1, we need to create a local stack variable and check if it is in the "old" stack or "new" stack, returning the appropriate stack addresses based on that check.

I think the confusing part is the comment in GetStackBounds that I also brought over from asan_thread.cpp, which incorrectly indicates a data race is being protected against.

But your comment also points out that even if all of the above is true, we don't necessarily need the breadcrumb to be an atomic. It could just be a normal bool, since we don't expect access by multiple threads.

Of course, if I am misunderstanding something please let me know. How would you like to proceed?

vitalybuka added inline comments.Aug 25 2020, 6:00 PM
compiler-rt/lib/msan/msan_thread.cpp
45–47

This should go into ::Create()
However Create uses MmapOrDie and implicitly reset everything to 0.

85–87

> is not possible
== only for 0,0

90
110

Lsan in Asan can inspect stacks from other threads, so I see why there is atomic.

110–113

maybe

120–123

maybe

124–129

same for the rest

compiler-rt/lib/msan/msan_thread.h
62
compiler-rt/test/msan/Linux/swapcontext_annotation_reset.cpp
42–45

I don't understand why the second one is poisoned

justincady marked 7 inline comments as done.Aug 27 2020, 8:28 AM
justincady added inline comments.
compiler-rt/lib/msan/msan_thread.cpp
110

Ah, I see. Thanks for clarifying!

compiler-rt/test/msan/Linux/swapcontext_annotation_reset.cpp
42–45

Because main is unsanitized, calls from main do not update the TLS. But foo() is sanitized, so when foo() calls bar() it does update the TLS.

The TLS is left in that state when we make the second foo() call from main. Line 44 exists to validate that understanding before the actual test below, which should result in the TLS being cleared.

I got the half-sanitized trick from here: compiler-rt/test/msan/unpoison_param.cpp

Address outstanding comments. I have re-run all of the tests with the new changes.

vitalybuka accepted this revision.Aug 27 2020, 2:54 PM
vitalybuka added inline comments.
compiler-rt/test/msan/Linux/swapcontext_annotation.cpp
4

I guess this requirement is copy/pasted, if so just remove it
msan has a different set of supported platforms, so it's likely not needed here

same in another test

This revision is now accepted and ready to land.Aug 27 2020, 2:54 PM

Remove required test architectures from unit tests.

@vitalybuka Thanks for the review. I updated the diff with your recommended changes for the unit tests.

Could you please land this for me? If I should contact someone else for that or we should wait for another review first, please let me know. (I'm assuming arc land isn't going to work, since I don't have commit access.)

Thanks again!

This revision was landed with ongoing or failed builds.Aug 27 2020, 7:30 PM
Closed by commit rG1d3ef5f122fe: [MSAN] Add fiber switching APIs (authored by justincady, committed by vitalybuka). · Explain Why
This revision was automatically updated to reflect the committed changes.