This is an archive of the discontinued LLVM Phabricator instance.

scudo: Add support for tracking stack traces of secondary allocations.
ClosedPublic

Authored by pcc on Jan 6 2021, 8:37 PM.

Details

Summary

There is no centralized store of information related to secondary
allocations. Moreover the allocations themselves become inaccessible
when the allocation is freed in order to implement UAF detection,
so we can't store information there to be used in case of UAF
anyway.

Therefore our storage location for tracking stack traces of secondary
allocations is a secondary ring buffer. The ring buffer is copied to
the process creating the crash dump when a fault occurs.

In order to support the scenario where an access to the ring buffer
is interrupted by a concurrently occurring crash, the secondary
ring buffer is accessed in a lock-free manner. This requires the
use of 128-bit atomics, which are generally only supported on 64-bit
platforms. Therefore the secondary ring buffer is disabled on 32-bit
platforms. The ring buffer would be unused on such platforms anyway
due to the lack of support for memory tagging on platforms other
than arm64.

This change enables 16-byte compare-exchange instructions on x86 in
order to avoid a warning where 128-bit atomics are used. Since the code
wouldn't actually be called in production on x86 as mentioned, this is
mostly in order to provide compile-time coverage and support fuzzing.

Depends on D93731

Diff Detail

Event Timeline

pcc created this revision.Jan 6 2021, 8:37 PM
pcc requested review of this revision.Jan 6 2021, 8:37 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 6 2021, 8:37 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript

Can we have a setting where the secondary ring buffer is used for primary allocations, too? The primary record could be pushed when the chunk is about to be reused.

compiler-rt/lib/scudo/standalone/combined.h
1351

This loop seems expensive. Why not store alloc and dealloc as two independent events? It will use a bit more memory, but given the same memory budget, dealloc events should survive longer because they will be generally closer to the end of the ring (and they are the most interesting ones).

pcc added a comment.Jan 8 2021, 5:51 PM

Can we have a setting where the secondary ring buffer is used for primary allocations, too? The primary record could be pushed when the chunk is about to be reused.

Let's do that in a separate change.

compiler-rt/lib/scudo/standalone/combined.h
1351

Hmm, I wouldn't expect it to be that expensive compared to the cost of the allocation itself.

I thought about storing these as two separate events but I wasn't sure that just a deallocation stack trace on its own would be useful. But having given it more thought I suppose that:

  1. We don't need to concern ourselves too much with memory usage here because secondary allocations are rare enough that we can choose a relatively small buffer size, so doubling it shouldn't matter that much
  2. Just the deallocation stack trace would be better than nothing. That being said, we could store the allocation trace and tid inline with the allocation and copy them into the ring buffer entry on deallocation. That would cost more memory, but as mentioned in 1) that should be inconsequential.

So let's go with the separate events.

pcc updated this revision to Diff 316322.Jan 12 2021, 9:58 PM
  • Instead of updating an existing ring buffer entry, create a new entry
pcc updated this revision to Diff 320975.Feb 2 2021, 7:39 PM
  • Use ring buffer for primary deallocations as well
pcc updated this revision to Diff 320976.Feb 2 2021, 7:41 PM
  • Update comment
eugenis added inline comments.Feb 24 2021, 3:52 PM
compiler-rt/lib/scudo/standalone/combined.h
1123

Does this actually do anything if all stores are relaxed?

1150

Why create this ring buffer entry while the allocation is still alive?
This would add some complexity to error reporting, but we could iterate over the live secondary allocations to find the right one. That should reduce ring buffer traffic by 2x (not counting primary of course).

pcc added inline comments.Feb 24 2021, 8:00 PM
compiler-rt/lib/scudo/standalone/combined.h
1123

The goal is to guard against the thread executing this access being interrupted by a crash, not against concurrent access. By the time the ring buffer entry is accessed the thread will have been stopped so the effects of any stores will have been committed to memory.

That being said, I suspect that the compiler would be allowed to eliminate this store, so we may need something stronger here.

1150

It would add quite a bit of complexity. malloc_iterate is implemented via pointer chasing, whereas __scudo_get_error_info expects a few fixed blocks of data to be copied. Implementing the pointer chasing on the __scudo_get_error_info side would probably require callbacks to read memory, or something like that. It doesn't seem worth it to reduce ring buffer traffic for secondary allocations, which are quite uncommon anyway.

pcc updated this revision to Diff 326275.Feb 24 2021, 8:43 PM
  • Use acquire/release for accesses to Ptr
eugenis accepted this revision.Feb 25 2021, 3:37 PM

LGTM with a nit

compiler-rt/lib/scudo/standalone/combined.h
1150

Would __atomic_signal_fence(__ATOMIC_SEQ_CST) be more appropriate?

This revision is now accepted and ready to land.Feb 25 2021, 3:37 PM
pcc added inline comments.Feb 25 2021, 4:26 PM
compiler-rt/lib/scudo/standalone/combined.h
1150

(assuming this was meant as a reply to the other comment)

Yes, that's a good idea, I'll do that.

This revision was landed with ongoing or failed builds.Mar 9 2021, 11:43 AM
This revision was automatically updated to reflect the committed changes.