This is an archive of the discontinued LLVM Phabricator instance.

[GWP-ASan] Add recoverable mode.
ClosedPublic

Authored by hctim on Dec 15 2022, 3:22 PM.

Details

Summary

The GWP-ASan recoverable mode allows a process to continue to function
after a GWP-ASan error is detected. The error will continue to be
dumped, but GWP-ASan now has APIs that a signal handler (like the
example optional crash handler) can call in order to allow the
continuation of a process.

When an error occurs with an allocation, the slot used for that
allocation will be permanently disabled. This means that free() of that
pointer is a no-op, and use-after-frees will succeed (writing and
reading the data present in the page).

For heap-buffer-overflow/underflow, the guard page is marked as accessible
and buffer-overflows will succeed (writing and reading the data present
in the now-accessible guard page). This does impact adjacent
allocations, buffer-underflow and buffer-overflows from adjacent
allocations will no longer touch an inaccessible guard page. This could
be improved in future by having two guard pages between each adjacent
allocation, but that's out of scope of this patch.

Each allocation only ever has a single error report generated. It's
whatever came first between invalid-free, double-free, use-after-free or
heap-buffer-overflow, but only one.

Diff Detail

Event Timeline

hctim created this revision.Dec 15 2022, 3:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 15 2022, 3:22 PM
hctim requested review of this revision.Dec 15 2022, 3:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 15 2022, 3:22 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript

many nits

compiler-rt/lib/gwp_asan/crash_handler.cpp
42

Can we calculate this in some function and share everywhere? E.g. State->getInternalFaultAddress() or something

compiler-rt/lib/gwp_asan/crash_handler.h
49–58

add some paragraphs maybe, this is quite the wall-of-text

compiler-rt/lib/gwp_asan/guarded_pool_allocator.cpp
285

nit

288–289

What is wrong with that though? Please add to this comment.

It's not like crash handling is performance-critical (I hear reentrant locks are _bad_ for that).

295

not in this CL, and random driveby: why the inconsistency between Failure and Error?

305

be more explicit that this code is only reached in recoverable mode, in non-recoverable we would have crashed already. can we make that an assertion? if not please add a very explicit comment to that effect

382

for consistency with above: assert(pointerIsMine(Ptr) && "Pointer is not mine!");

400

move this just in front if the if condition, makes it easier to read. maybe turn into tenary op, up to you

402–404

be consistent about single line statements and braces

424

Can this happen multiple times, or can we break out of this?

compiler-rt/lib/gwp_asan/optional/segv_handler_posix.cpp
186–187

random driveby, not for this CL: AFAICT his is already in namespace {}, so this seems unnecessary

186–190

Why didn't we make this assertion before? What changed?

compiler-rt/lib/gwp_asan/tests/backtrace.cpp
77

EXPECT_TRUE

compiler-rt/lib/gwp_asan/tests/harness.h
100

.clear()?

compiler-rt/lib/gwp_asan/tests/recoverable.cpp
62

ASSERT_TRUE (and everywhere else)?

107

FAIL?

126

ASSERT_SUBSTR potentially (also in other places)?

164

This is a very odd parameter. All threads that are > 4 just trap, if we have < 4 we just don't do some of the jobs.

170

This seems wrong. In that case we never do some jobs above?

fmayer added inline comments.Dec 15 2022, 4:50 PM
compiler-rt/lib/gwp_asan/tests/harness.h
121

__attribute__ ((format (printf, 1, 2)))?

It seems like this could become much simpler if we disable gwp-asan on the first report instead of killing slots one-by-one.

It seems like this could become much simpler if we disable gwp-asan on the first report instead of killing slots one-by-one.

That sounds quite expensive (and also complicated), as I think that would involve unpoisoning many pages etc.

It's a single mprotect across the gwp-asan region. Also no need to bother with recursive guards as no new allocations can reach gwp-asan.

hctim added a comment.Dec 16 2022, 1:08 PM

It seems like this could become much simpler if we disable gwp-asan on the first report instead of killing slots one-by-one.

It's a single mprotect across the gwp-asan region. Also no need to bother with recursive guards as no new allocations can reach gwp-asan.

I think the change might be less LOC, but the semantics might be just as difficult to reason about. If we mprotect the entire region, we still need to do the trick with the special-segv-address to handle an invalid free + use after free happening at the same time, still need to lock/unlock mutexes/recursive guard because we could get a use-after-free while another thread is doing malloc/free. The only thing that would change is the per-allocation "HasCrashed" becomes a whole-gwp-asan "HasCrashed", and we don't need to do the clear-slot-used-by-UaF (in handleRecoverablePostCrashReport).

hctim marked 19 inline comments as done.Dec 16 2022, 1:37 PM
hctim added inline comments.
compiler-rt/lib/gwp_asan/crash_handler.cpp
42

done

compiler-rt/lib/gwp_asan/crash_handler.h
49–58

done. also removed the part about the signal handler's responsibility, as that's opaque. the signal handler is just responsible for calling the PreCrash and PostCrash functions.

compiler-rt/lib/gwp_asan/guarded_pool_allocator.cpp
288–289

added to the text, but basically "let's not force users of gwp-asan..." (where each OS has to provide its own mutex wrapper) "... to implement things that really should be unncessary"

295

sorry, not sure I understand the inconsistency?

305

unfortunately the Recoverable option isn't captured in the GPA class, but I've added a comment and a couple of litmus-test assertions

424

thanks, can break

compiler-rt/lib/gwp_asan/optional/segv_handler_posix.cpp
186–187

roger, can do a pass later on

186–190

previous code was misleading. shouldn't be possible to call this function without GPAForSignalHandler set (as it's installed via. installSignalHandlers, which also has an assert that GPAForSignalHandler is non-null). This is kind of an unnecessary duplicate, so I've removed it.

compiler-rt/lib/gwp_asan/tests/recoverable.cpp
126

unfortunately the gtest that's in the llvm tree doesn't have ASSERT_SUBSTR.

164

sure, remove the param and made it just skip if we don't have enough.

hctim updated this revision to Diff 483655.Dec 16 2022, 1:37 PM
hctim marked 9 inline comments as done.

Update from Florian's comments.

fmayer added inline comments.Dec 16 2022, 1:43 PM
compiler-rt/lib/gwp_asan/guarded_pool_allocator.cpp
295

I meant: Why aren't we consistent in naming this either Failure or Error?

eugenis added inline comments.Dec 16 2022, 1:59 PM
compiler-rt/lib/gwp_asan/guarded_pool_allocator.cpp
386

why can't disable() call always happen in the crash handler? why special case for internal crashes?

395

Could we pretend like recoverable is the default? Ex. we can always do the nested logic for recursive guards, and the difference in non-recoverable is that we simply never restore it.

hctim updated this revision to Diff 483673.Dec 16 2022, 2:33 PM
hctim marked 2 inline comments as done.

Merged paths for pre-crash callbacks for non-recoverable and recoverable mode.

compiler-rt/lib/gwp_asan/guarded_pool_allocator.cpp
386

the mutexes locked by disable() protect State.FailureType and State.FailureAddress (from multiple double-frees/invalid-frees at the same time)

395

sure, i've made those changes.

there was a slight advantage in the non-recoverable mode in that we just tryLock()'d the mutex and then continued ahead, and now we wait to actually lock() the mutexes. I don't think it's a major disadvantage though, if there's a concurrent malloc()/free(), we can just wait. Makes it less likely for us to end up with a garbage report anyway.

fmayer added inline comments.Dec 16 2022, 3:09 PM
compiler-rt/lib/gwp_asan/guarded_pool_allocator.cpp
401

This is nonstandard, and I think, after some Googling, that this is not supported in MSVC.

fmayer accepted this revision.Dec 19 2022, 10:41 AM
fmayer added inline comments.
compiler-rt/lib/gwp_asan/common.cpp
109

out of interest, is this an arbitrary value within the page? maybe add a comment

This revision is now accepted and ready to land.Dec 19 2022, 10:41 AM

Seems like your newly added test is broken in the pre-merge checks. PTAL

Do you have a stress test to exhaust all gwp-asan slots?

hctim updated this revision to Diff 486067.Jan 3 2023, 1:46 PM

Add an all-slots test.

hctim added a comment.Jan 3 2023, 1:46 PM

Do you have a stress test to exhaust all gwp-asan slots?

Added, thanks.

Seems like your newly added test is broken in the pre-merge checks. PTAL

Hmm, not sure what's going wrong there, running with --gtest_shuffle --gtest_repeat=100 works fine on my end. Let's see if it sticks around after the re-upload/rebase.

eugenis accepted this revision.Jan 10 2023, 1:20 PM

LGTM

hctim updated this revision to Diff 487976.Jan 10 2023, 1:48 PM

Add some extra NFC logging to attempt to figure out what's wrong with the presubmit bot.

hctim updated this revision to Diff 488294.Jan 11 2023, 11:09 AM

clang-format

This revision was landed with ongoing or failed builds.Jan 11 2023, 1:11 PM
This revision was automatically updated to reflect the committed changes.
hctim reopened this revision.Jan 12 2023, 4:00 PM
This revision is now accepted and ready to land.Jan 12 2023, 4:00 PM
hctim updated this revision to Diff 489178.Jan 13 2023, 6:10 PM

Update with some fixes discovered from internal testing:

  1. Moves some duplicated helper functions to a common definition.
  2. Fixes a bug only found in debug mode.
This revision was landed with ongoing or failed builds.Jan 17 2023, 10:21 AM
This revision was automatically updated to reflect the committed changes.
yingcong-wu added inline comments.
compiler-rt/lib/gwp_asan/optional/segv_handler_posix.cpp
127

AllocMeta is dereferenced here and then do the null check in LINE 152, I think this is a problem.

hctim marked an inline comment as done.Feb 28 2023, 8:16 AM
hctim added inline comments.
compiler-rt/lib/gwp_asan/optional/segv_handler_posix.cpp
127