This is an archive of the discontinued LLVM Phabricator instance.

scudo: Add support for diagnosing memory errors when memory tagging is enabled.
ClosedPublic

Authored by pcc on Apr 1 2020, 8:23 PM.

Details

Summary

Introduce a function __scudo_get_error_info() that may be called to interpret
a crash resulting from a memory error, potentially in another process,
given information extracted from the crashing process. The crash may be
interpreted as a use-after-free, buffer overflow or buffer underflow.

Also introduce a feature to optionally record a stack trace for each
allocation and deallocation. If this feature is enabled, a stack trace for
the allocation and (if applicable) the deallocation will also be available
via __scudo_get_error_info().

Diff Detail

Event Timeline

pcc created this revision.Apr 1 2020, 8:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2020, 8:23 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript
hctim added a comment.Apr 2 2020, 9:50 AM

Initial thoughts - I'll take a look at the stack depot / crash handler stuff but wanted to get a first round back to you ASAP.

compiler-rt/lib/scudo/standalone/combined.h
39

QQ - do we want to use per-plaftorm implementations, or do we want to cargo-cult the FP unwinder here? We'd then be able to provide stack traces (even without MTE) to all platforms. My gut says per-platform implementations should be the special case, not the generic. I don't feel strongly about this, but think it's worth raising the question.

236

Why not constexpr?

241

If we discard two frames, we should probably collect two more in android_unsafe_frame_pointer_chase , right? The 64 here is more for ensuring the size of the depot doesn't grow too large, rather than how much stack space we alloca...

386

Zeroing the contents seems too important to be default off now that we store the previous tag (and thus the previous tag is accessible through an uninitialized-heap read). IMHO this should be done by default for MTE.

725

TrackAllocationStacks is also used to track deallocation stacks. Behaviour seems reasonable, but consider a more descriptive name?

1024

Why UNLIKELY here? (and storeDeallocationStackMaybe)

compiler-rt/lib/scudo/standalone/memtag.h
207

Mask TBI bits here? We don't need them, and we should allow users to have them.

hctim added inline comments.Apr 2 2020, 1:51 PM
compiler-rt/lib/scudo/standalone/include/scudo/interface.h
30

Sorry, 'remaining elements' here meaning any scudo_error_report struct that's unused?

37

Why ptr then? Why not fault_address?

45

"in the crashing process, generally ahead-of-time (at libc init, for example)."

55

Can you make a recommendation on how much memory we need? Should that be defined by the platform, or by Scudo (via. a callback)? I assume we need at least 3 allocations (one to the left and one to the right) to make a reasonable guess, but how many do you see us having? A constant 16? Is it some function on the sizeof the allocation?

Our top primary size class on Android is 64KiB + 16B. Assuming a static 16 allocations (8 on either side) for the crash handler to do its thing, that's a megabyte per crash. Not too bad.

Once we support secondary tagged allocations though (assuming we intend on doing this), we'd have to rethink this, no? Would it be worth having a callback of __scudo_get_surrounding_metadata() that, for some fault address, returns the metadata (header + any deallocation data left over) for some number of surrounding allocations alone?

59

Maybe add a comment where this is normally obtained from?

61

"the starting address"

65

Meta note - I'd also add an explicit disclaimer here that this is not intended to work across API level boundaries (version_of_scudo_allocator === version_of_scudo_crash_handler).

70

We now have two ways that scudo reports errors. For UaF/overflow, as per this, you get the report from __scudo_get_error_info. For internally detected errors (invalid free, double free), you still get an abort message. We should probably unify the behaviour here, because:

  1. The metadata we collect about tags/threads/traces is useful in debugging double frees, etc.
  2. Having different exit conditions is harder to keep track of. I can easily see a double-free that ends up doing some load/store with incorrect tags before it hits the internal trap, and subsequently we report it incorrectly (i.e. a double free now looks like an UNKNOWN or BUFFER_OVERFLOW, etc.).
compiler-rt/lib/scudo/standalone/stack_depot.h
43

I know that this is zero-initialized by the linker in its current inclusion in the allocator class, but it's a bit roundabout to detemine this. Can you explicitly initialize these varaiables?

45

I worry that these values aren't configurable. We might want to have a larger stack depot on some devices, if:

  1. Scudo ends up using a stack depot for a MTE-similar feature on another architecture.
  2. Someone ends up using stack trace collection for non-MTE scenarios (I mean, it's a cheap way to do heap profiling).
  3. We have different devices that have more/less sensitivity to memory overhead.

Should we make this templated?

51
#ifdef __LP64__
  static const uptr kRingSize = kRingBits * 8;
#else
  static const uptr kRingSize = kRingBits * 4;
#endif

... as we can ask for trace collection on 32-bit?

65

Can we please add some more documentation to this class? I get that this is saving some unique-ID to prevent us from saving the same allocation trace twice, but it would be nice if that was immediately obvious.

86

Same thing here. What makes up this cookie should be more obvious IMHO. I get the top 31 bits are roughly a UID for the stack trace, the middle 32 bits are hash (and thus these two combined provide some hash-collision resistance), and the bottom bit is to uniquely identifiy cookies as the FP-address shouldn't have the LSB set (except for in thumb...), but putting this somewhere and typedefing this u64 seems like a good idea.

hctim added inline comments.Apr 2 2020, 2:01 PM
compiler-rt/lib/scudo/standalone/combined.h
740

This function makes me a little nervous, as it makes the assumption that the pointers we're provided are valid structs.

Especially in async mode, we can't make that assumption. Given that an attacker may have arbitrary write for some amount of time between fault and trap time, they have full access to whatever structs the crash handler is about to blindly use.

We should validate all these input structs for validity before operating on them. This function should be highly tested, and highly fuzzed. Thankfully, the crash handler on Android is always spawned with less privileges than the crashed process, but RCE in the crash handler is still no fun.

Some initial comments

compiler-rt/lib/scudo/standalone/combined.h
35

I am not sure how __ANDROID_API__ works, should it be >=? (same for all the others)

237

Style wise, the whole code base avoids the kConstant naming.

389

Should the 16 be a constant?

752

Not sure if the lambda should abide by the usual style, eg getGranule, up to you.

compiler-rt/lib/scudo/standalone/stack_depot.h
13

Style on that file went the g3 way rather than the LLVM way.

srhines added a subscriber: srhines.Apr 3 2020, 9:19 AM
srhines added inline comments.
compiler-rt/lib/scudo/standalone/combined.h
35

No, since this isn't part of the NDK, this should be using the == 10000 check, which is a special value that indicates that you are building for the platform (i.e. not the NDK).

eugenis added inline comments.Apr 3 2020, 2:30 PM
compiler-rt/lib/scudo/standalone/combined.h
724

There is no synchronization. Is this meant to be used in single thread or stop-the-world context only?
If we turn Options into a relaxed atomic, this could be made thread-safe, with the understanding that the other threads will change their behavior "eventually".

Either way this should be mentioned in the declaration comment.

756

static_cast<uint8_t>
or is it reinterpret_cast?

786

give a name to this constant somewhere

800

TBH, I'm not sure about relying on delayed block reuse for use-after-free reports. Not aggressively reusing the most recently freed block will hurt both RAM and CPU (because of worse heap locality). It feels that for any given target probability (however defined) of deallocation info being available in the event of a use-after-free, persisting smaller records (like in hwasan history buffer) would be a better trade-off.

Of course, the exception from this would be use-immediately-after-free.

1039

Why not store the entire deallocation stack trace here if it fits, and reduce the stack depot load by 2x?

compiler-rt/lib/scudo/standalone/stack_depot.h
45

Not before there are actual users of this.

59

could shift by >>2 to get more meaningful bits into the hash

94

All accesses to ring and tab are, technically, data races, and must be done with atomic calls.

pcc updated this revision to Diff 254988.Apr 3 2020, 7:42 PM
pcc marked 40 inline comments as done.
  • Address review comments
pcc added inline comments.Apr 3 2020, 7:42 PM
compiler-rt/lib/scudo/standalone/combined.h
39

This function needs to know the stack bounds, which is not available in general. Unless we change that, it needs to be implemented in libc.

236

Oops, used the wrong style here. Fixed.

237

Ditto.

386

I'm fine either way, so I changed this to if (Size).

389

Yes, we can use the granule size since we probably can't realistically fit more than a granule's worth of data here. I made this clear in the MemTag*Index comment as well.

724

It's single threaded for now at least, and I've added a comment in wrappers_c.inc. It seems probable that we will only need this in a stop-the-world context, so I don't think we should convert it to use atomics for now.

725

I read "allocation stacks" as "stacks relating to an allocation" which covers deallocation stacks. Something like "allocation info" is slightly more accurate but also vague. I'll think about the name.

740

I think it makes sense to set up a fuzzer for this. As for testing, I feel that integration tests (like the ones that I'm adding to system/core in AOSP) are the appropriate level of testing here, because of the number of moving pieces that need to be set up for this function to work.

752

I think the style for these normally matches the style for local variables, because that's what they are in the end.

800

That certainly seems like a concern. I thought that the quarantine might save us but it looks like it's disabled by default (due to
https://cs.android.com/android/platform/superproject/+/master:external/scudo/standalone/flags.inc;l=13
https://cs.android.com/android/platform/superproject/+/master:external/scudo/standalone/flags.inc;l=18
both defaulting to 0). We would at least diagnose immediate UAF with this which is better than nothing.

I'm inclined to start with this since it's the simplest approach and we can experiment with other approaches in followups.

1024

Because in the usual case we aren't collecting stack traces.

1039

That's an interesting idea. Since it's not clear at this point whether we're going to go with inline deallocation info this seems like followup territory.

compiler-rt/lib/scudo/standalone/include/scudo/interface.h
30

Yes.

37

Yes, that would be clearer. Done (fault_addr to be consistent).

45

It can be done at any time, even in the signal handler (these functions are just returning pointers into globals, which is safe to do anywhere). It's not clear that any particular time is better (we do it early on Android because of circumstances that are specific to bionic, and arguably later is better in order to reduce the window of memory corruption) so I wouldn't make a specific recommendation here.

55

I've changed this to recommend a constant 16 * the largest primary allocation size. It's true that it may be better to introduce an API here to allow scudo to specify the size of the memory region, but let's save that for a followup.

59

Hmm, there isn't really one at the moment. On Android I'm using an unsubmitted kernel patch to read the tags via ptrace: https://github.com/pcc/linux/commit/969d2d71099372ef68c05f29545b88bd90b2a528

Catalin will be proposing an official ptrace API for this, and at that point I will update the comment.

65

I added it near the top.

70

This may be a good idea, but let's save it for a follow up because there would likely need to be an orthogonal mechanism for communicating the crashes to the crash handler process even if the end result looks the same to the user.

easily see a double-free that ends up doing some load/store with incorrect tags

An invalid free maybe, but not a double free I don't think (except, possibly, for secondary allocations, but that would be a different fault). A double free passes the same pointer twice, and at both times the metadata, which is untagged, will be in the same location and accessed via a pointer which will be untagged before accessing it. Accesses to user memory, e.g. to zero the memory or store deallocation metadata, are unchecked.

compiler-rt/lib/scudo/standalone/memtag.h
207

I'm not sure that that's the only change that we would need to make in order to support passing TBI bits into the allocator, so I'd prefer not to unless we have some kind of test for that capability.

compiler-rt/lib/scudo/standalone/stack_depot.h
43

I think the style in scudo is to let fields be zero-initialized and only explicitly initialize (e.g. via initLinkerInitialized) if necessary.

45

Agreed.

51

You mean static const uptr kRingSize = (1 << kRingBits) * 8; etc.? This is the number of elements, not the size in bytes. If we wanted to shrink this on 32-bit there would be more needed than this. We would probably start by using uptr for the type of ring, but that's not enough because we're using the upper 32 bits of ring header entries to store the size. This code isn't being used on 32-bit so I'd prefer to waste most of the upper 32 bits on 32-bit for now, rather than write dead code.

65

I've added a large comment near the top and the individual functions.

86

This is now covered in the documentation. Note that the top 31 bits (the size) cannot act as a "UID" as such, they are actually necessary in order to interpret the hash, as there is no other way to determine the size.

hctim added inline comments.Apr 6 2020, 10:30 AM
compiler-rt/lib/scudo/standalone/combined.h
740

Okay - but we definitely neet to test with a garbage depot/regioninfo/memory/memorytags, etc.

746

I don't think we can assume that typeof(PrimaryT) is the same between dead process, and crash handler, right? Same as below with findNearestBlock()? A 32-bit process can use a 64-bit crash_handler, right (and I'm not sure whether there's any svelte vs. non-svelte crossover)? I see from your AOSP patches right now we unconditionally create ScudoCrashData even though there might be incompatibilities.

compiler-rt/lib/scudo/standalone/stack_depot.h
51

Sure.

pcc marked an inline comment as done.Apr 6 2020, 1:13 PM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/combined.h
746

On Android the bitness is always the same. See e.g.
https://cs.android.com/android/platform/superproject/+/master:system/core/debuggerd/handler/debuggerd_handler.cpp;l=83

You can also find assumptions about the process bitness e.g. here:
https://cs.android.com/android/platform/superproject/+/master:system/core/debuggerd/libdebuggerd/tombstone.cpp;l=336

We may need to do something about svelte, but not immediately since the svelte allocator is not used on Android (svelte devices use jemalloc).

hctim added inline comments.Apr 7 2020, 4:59 PM
compiler-rt/lib/scudo/standalone/combined.h
744

Just adding a tracking tag here - please LMK if you'd like me to write the fuzz target/oss-fuzz integration for this function. For now, I'm confident the following call will crash in findNearestBlock is called:

scudo_error_info ErrorInfo;
const char EmptyBuffer[1] = {};
getErrorInfo(&ErrorInfo, 0, EmptyBuffer, EmptyBuffer, EmptyBuffer, EmptyBuffer, 0, 1);

This function should pretty much take any sort of garbage - the result can be UB (IMO all entries of ErrorInfo should be nullptr or the function should return false on bad data), but it shouldn't crash. Worth validating the structs like Chromium GWP-ASan: https://source.chromium.org/chromium/chromium/src/+/master:components/gwp_asan/common/allocator_state.cc;drc=28442cacc3be1a7d05a898aba663025a143095ac;bpv=1;bpt=1;l=64?originalUrl=https:%2F%2Fcs.chromium.org%2F

pcc marked an inline comment as done.Apr 7 2020, 5:39 PM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/combined.h
744

That might crash because DepotPtr and RegionInfoPtr aren't pointing to buffers that are large enough. I think we can trust the analyzing process to pass buffers that have appropriate sizes (e.g. MemorySize argument or __scudo_*_size() return value). The contents of the buffers themselves are another matter, of course.

pcc marked 2 inline comments as done.Apr 8 2020, 9:29 AM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/combined.h
740

That will only cover the cases that we can think of. It seems better to rely on a fuzzer since it will also (in principle) cover the cases that we can't think of.

compiler-rt/lib/scudo/standalone/memtag.h
207

Actually, since this function takes a pointer derived from a fault address (i.e. not explicitly passed into the allocator), it may legitimately have TBI bits, so I'll mask them out here.

pcc updated this revision to Diff 256158.Apr 8 2020, 6:48 PM
  • Add fuzzer, fix bugs found using it
hctim added inline comments.Apr 9 2020, 12:38 PM
compiler-rt/lib/scudo/standalone/fuzz/get_error_info_fuzzer.cpp
17

Is this the right fuzz target?

hctim added inline comments.Apr 9 2020, 12:56 PM
compiler-rt/lib/scudo/standalone/primary64.h
253
primary64.h:253:64: error: implicit conversion changes signedness: 'long' to 'unsigned long' [-Werror,-Wsign-conversion]
      B.BlockBegin = B.RegionBegin + sptr(Ptr - B.RegionBegin) /
                                     ~~~~~~~~~~~~~~~~~~~~~~~~~~^
pcc updated this revision to Diff 256362.Apr 9 2020, 1:02 PM
  • Fix build error
pcc marked an inline comment as done.Apr 9 2020, 1:03 PM
pcc marked an inline comment as done.
pcc added inline comments.
compiler-rt/lib/scudo/standalone/fuzz/get_error_info_fuzzer.cpp
17

Resolved out-of-band

pcc updated this revision to Diff 256426.Apr 9 2020, 3:28 PM
  • Replace the fuzzer with one written by mitchp and fix an overflow bug
  • Move RegionInfo alignment to field to avoid alignment requirement for getErrorInfo callers
cryptoad accepted this revision.Apr 15 2020, 10:41 AM

Thanks Peter & Mitch!
As a side issue, do you have an idea on how to support the interface functions (__scudo_*) for both allocators once we get svelte support?

This revision is now accepted and ready to land.Apr 15 2020, 10:41 AM
pcc added a comment.Apr 17 2020, 5:19 PM

Thanks Peter & Mitch!
As a side issue, do you have an idea on how to support the interface functions (__scudo_*) for both allocators once we get svelte support?

My initial inclination would be to move the functions to wrappers_c.inc, but we'd need to think about the interaction with other platforms. I will give this some more thought.

This revision was automatically updated to reflect the committed changes.