Page MenuHomePhabricator

scudo: Add initial memory tagging support.
ClosedPublic

Authored by pcc on Nov 26 2019, 7:25 PM.

Details

Summary

When the hardware and operating system support the ARM Memory Tagging
Extension, tag primary allocation granules with a random tag. The granules
either side of the allocation are tagged with tag 0, which is normally
excluded from the set of tags that may be selected randomly. Memory is
also retagged with a random tag when it is freed, and we opportunistically
reuse the new tag when the block is reused to reduce overhead. This causes
linear buffer overflows to be caught deterministically and non-linear buffer
overflows and use-after-free to be caught probabilistically.

This feature is currently only enabled for the Android allocator
and depends on an experimental Linux kernel branch available here:
https://github.com/pcc/linux/tree/android-experimental-mte

All code that depends on the kernel branch is hidden behind a macro,
ANDROID_EXPERIMENTAL_MTE. This is the same macro that is used by the Android
platform and may only be defined in non-production configurations. When the
userspace interface is finalized the code will be updated to use the stable
interface and all #ifdef ANDROID_EXPERIMENTAL_MTE will be removed.

Depends on D70761

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
cryptoad added inline comments.Dec 10 2019, 12:11 PM
compiler-rt/lib/scudo/standalone/combined.h
285

Maybe move this part of the comment to where the memset is?

291

There already is a BlockEnd variable outside this scope used for the Secondary, maybe reuse it?
It might be cleaner to initialize it where the allocate is for the Primary, but then performance will suffer due to the extra getSizeByClassId in the fast path, which is not ideal.
If you want to keep it in this block, it seems to be const.

323

const?

pcc updated this revision to Diff 233175.Dec 10 2019, 12:33 PM
pcc marked 4 inline comments as done.
  • Address review comments
compiler-rt/lib/scudo/standalone/combined.h
291

I made this variable const and renamed the other one to reduce confusion.

Build result: FAILURE - Could not check out parent git hash "dd9de96af155762189fbc541572c02ef44b3ca72". It was not found in the repository. Did you configure the "Parent Revision" in Phabricator properly? Trying to apply the patch to the master branch instead...

ERROR: arc patch failed with error code 1. Check build log for details.
Log files: console-log.txt, CMakeCache.txt

eugenis added inline comments.Dec 10 2019, 1:04 PM
compiler-rt/lib/scudo/standalone/combined.h
303

Is it possible that header was reclaimed, but data was only partially reclaimed? I think the code is correct in this case, too, but consider updating the comment.

compiler-rt/lib/scudo/standalone/memtag.h
36

I don't think it is the job of a memory allocator to mess with PSTATE.TCO.
Let the caller deal with it?

compiler-rt/lib/scudo/standalone/wrappers_c.inc
187

This api is not thread safe. TCO is per-thread, I think, so this would not work in a multi-threaded program at all. Either document this fact, or, preferably, remove the TCO code and use relaxed atomic for UseMemoryTagging.

pcc marked 2 inline comments as done.Dec 10 2019, 1:53 PM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/memtag.h
36

I can see the argument either way.

On one hand, TCO is a thread-wide property which the caller could arguably be considered responsible for.

On the other, disabling tag checks is required in order to avoid tag check failures for future allocations which reuse previously tagged chunks, so to a certain extent it makes sense to perform an operation that is required in order to keep the allocator working, even if it involves setting what is technically a thread-wide property. The operation that we perform depends on the allocator's implementation, so arguably it belongs with the allocator. If for example we switched to mprotecting without PROT_MTE when tag checks are disabled, there would be no need to set TCO here.

If you still think that it belongs in the caller, maybe we could document that setting TCO is required, but this may change in the future.

compiler-rt/lib/scudo/standalone/wrappers_c.inc
187

Let's not use atomics for this without a use case. Due to the complexity of disabling MTE in a multi-threaded program, I don't think we should even attempt to support it for now.

I will document that this requires the program to be single threaded at the point when the function is called.

pcc marked an inline comment as done.Dec 10 2019, 1:59 PM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/combined.h
303

Yes, that's possible, and it's handled in the same way as full reclaiming, which I forgot to comment on as well. I will add a comment for both.

pcc updated this revision to Diff 233195.Dec 10 2019, 2:17 PM
  • Address review comments

Build result: FAILURE - Could not check out parent git hash "dd9de96af155762189fbc541572c02ef44b3ca72". It was not found in the repository. Did you configure the "Parent Revision" in Phabricator properly? Trying to apply the patch to the master branch instead...

ERROR: arc patch failed with error code 1. Check build log for details.
Log files: console-log.txt, CMakeCache.txt

eugenis added inline comments.Dec 10 2019, 3:07 PM
compiler-rt/lib/scudo/standalone/memtag.h
36

OK, SGTM, let's keep it here.

pcc updated this revision to Diff 234329.Dec 17 2019, 9:32 AM
  • Load tags in malloc_iterate
pcc updated this revision to Diff 234332.Dec 17 2019, 9:45 AM
  • Only load tags if useMemoryTagging()

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

rankov added a subscriber: rankov.Dec 18 2019, 1:57 PM
rankov added inline comments.
compiler-rt/lib/scudo/standalone/combined.h
576

Should you use UNLIKELY here?

compiler-rt/lib/scudo/standalone/memtag.h
36

PSTATE.TCO can be easily changed by other code. So it is not a good choice for disabling tag checks in this case.
Also, the current kernel documentation says that PSTATE.TCO is reset to 0 in signal handlers.

Another concern with disabling tagging is that when memory is deallocated, the tags will remain, so enabling tag checks again is dangerous. This should be documented. Alternative is to untag memory on deallocation after tagging is disabled, but this will add cost.

I think that it would be better to leave disabling tag checks to the caller. The tests could use PSTATE.TCO, other callers might want to use other ways like calling a prctl to disable tag checks.

51

Instead of using assembly here, you could use functions from arm_acle.h

pcc marked 3 inline comments as done.Dec 18 2019, 2:35 PM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/combined.h
576

malloc_iterate is a rarely used debugging function, so we shouldn't worry too much about this sort of thing here I think.

compiler-rt/lib/scudo/standalone/memtag.h
36

Yes, I had intended to switch this over to using a prctl once the kernel patches to support prctl became available.

Now that I think about it, it would be a good idea to use PSTATE.TCO instead of prctl in the tests in order to avoid interfering with the "real" allocator's tag checking state during the tests. The fact that PSTATE.TCO is set to 0 during a signal handler is a good reason why the real allocator should use prctl and not TCO to disable tag checks. I think that's a good argument for moving the functionality into the caller. I'll do that then. I'll also make it clear that this is a one way operation and tag checks should not be turned back on.

51

Without __attribute__((target("mte"))) on these functions I get errors such as

fatal error: error in backend: Cannot select: intrinsic %llvm.aarch64.ldg

if I try to use the intrinsics. And with that attribute the functions don't get inlined because LLVM IR doesn't support scoping the availability of the instructions to a block (unlike .arch_extension in inline asm). So I'd prefer to stick with inline asm for now.

pcc updated this revision to Diff 234823.Dec 19 2019, 6:23 PM
pcc marked an inline comment as done.
  • Move tag check disablement to the caller

Unit tests: pass. 61010 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

pcc updated this revision to Diff 234927.Dec 20 2019, 10:58 AM
  • Formatting

Unit tests: pass. 61010 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

hctim added inline comments.Dec 23 2019, 10:08 AM
compiler-rt/lib/scudo/standalone/allocator_config.h
43–45

Maybe:
typedef SizeClassAllocator64<SizeClassMap, 30U, /*MaySupportMemoryTagging=*/ true> Primary;?

compiler-rt/lib/scudo/standalone/memtag.h
98

Size % 16 == 0 always here, so this could just be UntaggedEnd = Ptr + Size?

109

13.8% of chromium fuzzing-found heap OOB are > 16 bytes stride. Given that this is primary-only, the cost of retagging the OldChunk - NewChunk might be an acceptable performance penalty.

kevin.brodsky added inline comments.Dec 30 2019, 5:35 AM
compiler-rt/lib/scudo/standalone/memtag.h
31

I wonder if this is really a good thing. If libc fails to enable tag checking before the allocator is initialised (which is quite possible, given that until recently malloc() was called very early in Bionic's libc_init), then Scudo will not tag anything. Wouldn't it be possible instead to explicitly ask Scudo to use tagging when it is initialised? This would also be more consistent with the malloc_disable_memory_tagging() interface: Scudo does not take care of enabling / disabling tag checking, so arguably it shouldn't check if it is enabled either.

77

Since this asm statement is modifying memory, is it safe to use it without a "memory" clobber? It certainly isn't safe in general. Same comment for the other asm statements that use st*g.

pcc marked 5 inline comments as done.Jan 6 2020, 10:54 AM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/allocator_config.h
43–45

Will do

compiler-rt/lib/scudo/standalone/memtag.h
31

So the motivation behind adding the check here was along the lines of "if the application doesn't enable memory tag checks then there's no point in enabling memory tagging". But I can see the value in decoupling these two things especially since the libc might not have a chance to turn on memory tagging before the first allocation. I'll change this to be purely based on the hwcap and let the application call malloc_disable_memory_tagging() early if it doesn't want to use memory tagging.

I'd prefer not to add an explicit initialization step to the allocator since the allocation functions can in principle be called at any time, so we'd need additional complexity to handle the case where the allocation functions were called before the allocator was formally initialized.

77

I was somehow under the impression that volatile implied the "memory" clobber, but that doesn't appear to be backed up by the documentation so I'll add the clobbers here and elsewhere.

98

The caller isn't rounding Size as far as I can tell, so this isn't guaranteed to be the case.

109

This seems more like something that we'd want to do in precise mode than in mitigation mode. Recall that even with a large stride we (probabilistically) can't go outside of the bounds of the chunk. The chunk is cleared during dealloc with stzg so there's no info leak potential either.

pcc updated this revision to Diff 236479.Jan 6 2020, 4:21 PM
  • Address review comments
pcc marked 3 inline comments as done.Jan 6 2020, 4:24 PM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/memtag.h
31

I forgot about the other reason why I added this check, which was to avoid breaking the tests if the libc did not issue the prctl to enable memory tag checks. This seems like something that can be properly checked for in the tests themselves, which I've now done.

Unit tests: pass. 61259 tests passed, 0 failed and 736 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

kevin.brodsky added inline comments.Jan 7 2020, 8:21 AM
compiler-rt/lib/scudo/standalone/memtag.h
31

OK this makes sense, always enabling tagging in Scudo and then disabling it explicitly should be fairly robust.

77

My understanding is that volatile and "memory" do different things: the former tells the compiler not to (re)move the asm statement, while the latter tells the compiler that the the asm statement may perform reads or writes from arbitrary memory locations (forcing the compiler to reload values from memory as needed). GCC's manual has a rather good description of this: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

162

Technically "memory" is required even for memory reads. That said, in that case, the only thing that could affect ldg is a st*g, which is only done in another asm volatile statement, so "memory" is probably not absolutely needed here.

pcc updated this revision to Diff 236940.Jan 8 2020, 5:20 PM
  • Add memory clobber
pcc marked an inline comment as done.Jan 8 2020, 5:20 PM

Unit tests: pass. 61259 tests passed, 0 failed and 736 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Unit tests: pass. 61908 tests passed, 0 failed and 782 were skipped.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

I think this looks good. I think this might not be Fuchsia compatible and could probably use some #if SCUDO_LINUX or ANDROID or top of the __aarch64__ checks.
Fuchsia will want memory tagging support at some point, I 'll check the patch on the platform once the inconsistencies I saw are addressed.

compiler-rt/lib/scudo/standalone/combined.h
199

I am assuming the compiler will inline this most of the time, but could we put that inline or FORCE_INLINE to double down?

compiler-rt/lib/scudo/standalone/memtag.h
15

If I followed properly memtag.h is included on all platforms, but I am not sure sys/*.h is available everywhere (Fuchsia doesn't have one).
So this probably requires some #if SCUDO_LINUX or something to that extent?

pcc marked 3 inline comments as done.Jan 16 2020, 12:36 PM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/combined.h
199

inline is implicit on inline definitions (and somewhat confusingly in C++ inline doesn't really control inlining decisions, it's more of a linkage specification), so adding it here would have no effect. I'm personally not a fan of micro-optimizing inlining decisions like this, but I suppose it wouldn't hurt to add an ALWAYS_INLINE here.

compiler-rt/lib/scudo/standalone/memtag.h
15

Yes, this will need SCUDO_LINUX, done.

pcc marked an inline comment as done.Jan 16 2020, 12:37 PM

And I verified that this patch works on Fuchsia (x64 and arm64).

pcc updated this revision to Diff 238578.Jan 16 2020, 12:37 PM
  • Address review comments

Unit tests: fail. 61907 tests passed, 1 failed and 782 were skipped.

failed: Clang.CXX/temp/temp_arg/temp_arg_template/p3-2a.cpp

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

cryptoad accepted this revision.Jan 16 2020, 1:01 PM

LGTM, Thanks for all the work Peter!

This revision is now accepted and ready to land.Jan 16 2020, 1:01 PM
This revision was automatically updated to reflect the committed changes.
bjope added a subscriber: bjope.Mon, Jan 20, 5:00 AM
bjope added inline comments.
compiler-rt/lib/scudo/standalone/memtag.h
15

I also had to move the auxv include a few lines down (inside the {{#if defined(ANDROID_EXPERIMENTAL_MTE)}} guard) for things to build on my RedHat 6.10 server.
Would it make sense to do that change upstream?

pcc marked an inline comment as done.Tue, Jan 21, 12:20 PM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/memtag.h
15

I don't think so, we will eventually need to include it unconditionally on Linux once ANDROID_EXPERIMENTAL_MTE goes away as mentioned in the commit message. I think it would be better to change the build system so that we don't build/test scudo on Linux machines without a sys/auxv.h.

bjope added inline comments.Wed, Jan 22, 1:56 AM
compiler-rt/lib/scudo/standalone/memtag.h
15

Oh yes, that would be better of course. Although, I'm not that familiar with adding such checks, so I'm not sure exactly how to do it properly.
I see sys/auxv.h being used in other places in compiler-rt as well, but maybe I don't hit those due to some existing build system checks for those parts (and maybe those checks arent't directed at checking for the presence of sys/auxv.h, but rather something else).

bjope added inline comments.Wed, Jan 29, 8:25 AM
compiler-rt/lib/scudo/standalone/memtag.h
15

Proposed fix, checking that sys/auxv.h can be found: https://reviews.llvm.org/D73631