This is an archive of the discontinued LLVM Phabricator instance.

scudo: Add initial memory tagging support.
ClosedPublic

Authored by pcc on Nov 26 2019, 7:25 PM.

Details

Summary

When the hardware and operating system support the ARM Memory Tagging
Extension, tag primary allocation granules with a random tag. The granules
either side of the allocation are tagged with tag 0, which is normally
excluded from the set of tags that may be selected randomly. Memory is
also retagged with a random tag when it is freed, and we opportunistically
reuse the new tag when the block is reused to reduce overhead. This causes
linear buffer overflows to be caught deterministically and non-linear buffer
overflows and use-after-free to be caught probabilistically.

This feature is currently only enabled for the Android allocator
and depends on an experimental Linux kernel branch available here:
https://github.com/pcc/linux/tree/android-experimental-mte

All code that depends on the kernel branch is hidden behind a macro,
ANDROID_EXPERIMENTAL_MTE. This is the same macro that is used by the Android
platform and may only be defined in non-production configurations. When the
userspace interface is finalized the code will be updated to use the stable
interface and all #ifdef ANDROID_EXPERIMENTAL_MTE will be removed.

Depends on D70761

Diff Detail

Event Timeline

pcc created this revision.Nov 26 2019, 7:25 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
Herald added subscribers: Restricted Project, kristof.beyls, srhines. · View Herald Transcript
pcc updated this revision to Diff 231168.Nov 26 2019, 7:31 PM
  • Silence warnings

Build result: pass - 60330 tests passed, 0 failed and 732 were skipped.

Log files: console-log.txt, CMakeCache.txt

Build result: pass - 60330 tests passed, 0 failed and 732 were skipped.

Log files: console-log.txt, CMakeCache.txt

Thank you Peter this is awesome!

compiler-rt/lib/scudo/standalone/combined.h
164

nit on naming: I usually put the verb 1st and Maybe last.
I might have been wrong with regard to LLVM style and can change the other names, but I'd like to keep the naming scheme consistent.
Let me know which of the 2 is preferable.

579

I you could use COMPILER_CHECK, as it is used in other places.
It ends up being a static_assert but it feel more consistent.

compiler-rt/lib/scudo/standalone/linux.cpp
39

Does this work when compiled on Android but outside of Bionic?

compiler-rt/lib/scudo/standalone/memtag.h
24

If you could use the INLINE macros for consistency please.

25

I assume that when you do roundUpTo(NewSize, 16) and all the 16-related arithmetic, it's related to the granule size.
Could it be constant'd out in the whole file?

compiler-rt/lib/scudo/standalone/primary64.h
43

Maybe default to false?

pcc marked 6 inline comments as done.Nov 27 2019, 9:22 AM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/combined.h
164

Yeah, the Maybe at the beginning is more consistent with the rest of LLVM. For now I'll put it at the end and we can change it later.

579

The rest of LLVM uses static_assert directly. I'll change this one to COMPILER_CHECK but maybe this could be changed over to being consistent with LLVM as well?

compiler-rt/lib/scudo/standalone/linux.cpp
39

Yes, as long as you're building scudo as part of the Android platform (the header:
https://android.googlesource.com/platform/bionic/+/900d07d6a1f3e1eca8cdbb3b1db1ceeec0acc9e2/libc/platform/bionic/mte_kernel.h
is deliberately not exposed in the NDK). Unfortunately this means that the MTE tests can't run as part of check-scudo, but the situation will hopefully improve once we no longer need ANDROID_EXPERIMENTAL_MTE.

compiler-rt/lib/scudo/standalone/memtag.h
24

Sure, again inline is what the rest of LLVM uses.

25

Probably not in the inline asm parts but that only leaves a couple of places. The MTE-specific part of the code is small enough that I reckon that it's clear enough to just write out the constant.

compiler-rt/lib/scudo/standalone/primary64.h
43

Yeah, that's a good idea, I'll do that.

hctim added inline comments.Nov 27 2019, 11:05 AM
compiler-rt/lib/scudo/standalone/combined.h
320

In Chromium, ~11% of bugs are nonlinear (as determined with Heap-buffer-*flow READ|WRITE {*} over Heap-buffer-*flow with a fixed deterministic size).

The fixed size classes only go up to 24-byte allocations, so anything 24 < x <= [a page] also land in this bucket - but we're also not counting wild SEGVs or UBSan errors that allow for attacker-controlled offsets...

I think it worth it to have a tagged secondary - although I underderstand there's some performance implications of this. Maybe guarded behind a runtime flag?

pcc marked an inline comment as done.Nov 27 2019, 12:12 PM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/combined.h
320

I'm not sure where you got the number 24 from. On Android we set MaxSizeLog to 17:
http://llvm-cs.pcc.me.uk/projects/compiler-rt/lib/scudo/standalone/size_class_map.h#143
so any allocations <= 2^17 bytes will use the primary allocator.

Tagged secondary would be nice, but I think I'd prefer to do it in a different way. Specifically, we might consider asking the kernel folks for a way to set the "background tag" of a mapping, so that any faulted pages in the mapping get the background tag. That way, we don't pay the cost of faulting up front.

hctim added inline comments.Nov 27 2019, 12:32 PM
compiler-rt/lib/scudo/standalone/combined.h
320

This is from linux fuzzing on Chromium - where ClusterFuzz actually buckets them based on the READ|WRITE of size ### from ASan.

The idea sounds good - is it possible to update the comment to reflect this?

The granules either side of the allocation are tagged with tag 0

But only if the granule on the right is within the current chunk, right?

This patch does not retag memory on free, so it would not catch use-after-free. Unless I'm missing something.
It looks like a better strategy would be tagging memory _only_ on free (and realloc, and when new memory is requested from the system, too).

compiler-rt/lib/scudo/standalone/combined.h
345

Do we want to touch memory with the tagged pointer first to catch double-free & invalid-free bugs?

425–429

UNLIKELY?

pcc added a comment.Nov 27 2019, 1:16 PM

The granules either side of the allocation are tagged with tag 0

But only if the granule on the right is within the current chunk, right?

In the case where the granule on the right is not in the current chunk, there are two possibilities:

  • We are at the next chunk, which will have a tag 0 at the beginning for the header granule.
  • We are at the end of the mapping.

Either way we will get a SEGV when we access the next granule.

This patch does not retag memory on free, so it would not catch use-after-free. Unless I'm missing something.

Correct. We won't catch UAF unless we happen to reuse the chunk in the right way. I was originally planning to do tag on free later.

It looks like a better strategy would be tagging memory _only_ on free (and realloc, and when new memory is requested from the system, too).

Interesting. So:

  • On mmap we do a separate IRG for each block and tag the entire block except for first granule.
  • On free we IRG and tag the entire block except first granule.
  • On malloc we tag the granule before and after (modulo end-of-block) with 0.

So we would do about 1.5x the amount of work (because we don't know how big the malloc is going to be), with additional upfront work. Maybe the upfront work is fine on Android because of the zygote.

That said, maybe we could cut the 1.5x down to about 1x by only tagging max(half block size, usable size) on free, and the remainder on malloc.

In D70762#1762272, @pcc wrote:

In the case where the granule on the right is not in the current chunk, there are two possibilities:

  • We are at the next chunk, which will have a tag 0 at the beginning for the header granule.
  • We are at the end of the mapping.

Either way we will get a SEGV when we access the next granule.

Yes, of course. SG.

Interesting. So:

  • On mmap we do a separate IRG for each block and tag the entire block except for first granule.
  • On free we IRG and tag the entire block except first granule.
  • On malloc we tag the granule before and after (modulo end-of-block) with 0.

So we would do about 1.5x the amount of work (because we don't know how big the malloc is going to be), with additional upfront work. Maybe the upfront work is fine on Android because of the zygote.

That said, maybe we could cut the 1.5x down to about 1x by only tagging max(half block size, usable size) on free, and the remainder on malloc.

We don't need the upfront work if we can track the "has never been tagged" state of the chunk somewhere. Ideally, not in the chunk header to avoid paging everything in too early.
Maybe we can optimize for size of malloc <= size of the previous free by storing the size of the free() in the header.

Anyway, this beats the 2x amount of work needed to catch UAF by tagging in both malloc and free.

hctim added inline comments.Nov 27 2019, 1:49 PM
compiler-rt/lib/scudo/standalone/combined.h
345

Should be handled below in the chunk header check, no?

550

nit: newline after if?

compiler-rt/lib/scudo/standalone/memtag.h
27

Can we move this ifdef inside of systemSupportsMemoryTagging?

51

These asm stubs seem mostly abstractable - which would allow us to extend to future platforms easier, and make the intermediate [read - non-mte instructions] code easier to maintain.

Looks like we could abstract away to storeZeroTag abd randomTagMemory (or similar).

hctim added a comment.Nov 27 2019, 1:59 PM

The granules
either side of the allocation are tagged with tag 0, which is normally
excluded from the set of tags that may be selected randomly

It seems valuable to have the LHS and RHS of an allocation as a nonzero tag. IIUC, the chunk header is on the LHS for primary allocations, and making the header MTE-protected (the tag can be stored in the Primary allocator struct somewhere) seems like a good additional security step to make it unwriteable from a deterministic (zeroed) pointer.

compiler-rt/lib/scudo/standalone/primary64.h
193

Nit: line length

293

nit: leave newline

pcc added a comment.Nov 27 2019, 2:09 PM

The granules
either side of the allocation are tagged with tag 0, which is normally
excluded from the set of tags that may be selected randomly

It seems valuable to have the LHS and RHS of an allocation as a nonzero tag. IIUC, the chunk header is on the LHS for primary allocations, and making the header MTE-protected (the tag can be stored in the Primary allocator struct somewhere) seems like a good additional security step to make it unwriteable from a deterministic (zeroed) pointer.

It's already protected by using tag 0, which we don't use in heap pointers, so I'm not sure what your concern is.

pcc added a comment.Nov 27 2019, 3:13 PM
In D70762#1762272, @pcc wrote:

In the case where the granule on the right is not in the current chunk, there are two possibilities:

  • We are at the next chunk, which will have a tag 0 at the beginning for the header granule.
  • We are at the end of the mapping.

Either way we will get a SEGV when we access the next granule.

Yes, of course. SG.

Interesting. So:

  • On mmap we do a separate IRG for each block and tag the entire block except for first granule.
  • On free we IRG and tag the entire block except first granule.
  • On malloc we tag the granule before and after (modulo end-of-block) with 0.

So we would do about 1.5x the amount of work (because we don't know how big the malloc is going to be), with additional upfront work. Maybe the upfront work is fine on Android because of the zygote.

That said, maybe we could cut the 1.5x down to about 1x by only tagging max(half block size, usable size) on free, and the remainder on malloc.

We don't need the upfront work if we can track the "has never been tagged" state of the chunk somewhere. Ideally, not in the chunk header to avoid paging everything in too early.
Maybe we can optimize for size of malloc <= size of the previous free by storing the size of the free() in the header.

There is already a header field 'SizeOrUnusedBytes" that stores the allocation size. When a chunk is freed, we don't disturb that field. That gives us a way to recover the size of the previous allocation. We can call getChunkFromBlock() (modifying it to accept deallocated chunks) to recover the location of the chunk header given a block.

I think we can use the header itself to store the "has never been tagged" state. If the header read as a word is equal to 0, that means that the chunk has never been used before and we need to IRG before setting tags. That won't result in early paging because by the time we read the header we've already decided to use that block for the allocation.

One complication is that we need to handle the case where the new allocation has lower alignment than the old allocation. In that case, malloc will need to set tags on both sides of the allocation (because the previous free will have retagged starting from a higher address).

alex added a subscriber: alex.Nov 29 2019, 8:05 AM
In D70762#1762390, @pcc wrote:

There is already a header field 'SizeOrUnusedBytes" that stores the allocation size. When a chunk is freed, we don't disturb that field. That gives us a way to recover the size of the previous allocation. We can call getChunkFromBlock() (modifying it to accept deallocated chunks) to recover the location of the chunk header given a block.

I think we can use the header itself to store the "has never been tagged" state. If the header read as a word is equal to 0, that means that the chunk has never been used before and we need to IRG before setting tags. That won't result in early paging because by the time we read the header we've already decided to use that block for the allocation.

One complication is that we need to handle the case where the new allocation has lower alignment than the old allocation. In that case, malloc will need to set tags on both sides of the allocation (because the previous free will have retagged starting from a higher address).

A point here which maybe hasn't been considered is that if reclaiming kicks in, the pages containing the freed chunks will be zero'd out, which probably invalidates assumptions about headers contents.

pcc added a comment.Dec 3 2019, 10:22 AM
In D70762#1762390, @pcc wrote:

There is already a header field 'SizeOrUnusedBytes" that stores the allocation size. When a chunk is freed, we don't disturb that field. That gives us a way to recover the size of the previous allocation. We can call getChunkFromBlock() (modifying it to accept deallocated chunks) to recover the location of the chunk header given a block.

I think we can use the header itself to store the "has never been tagged" state. If the header read as a word is equal to 0, that means that the chunk has never been used before and we need to IRG before setting tags. That won't result in early paging because by the time we read the header we've already decided to use that block for the allocation.

One complication is that we need to handle the case where the new allocation has lower alignment than the old allocation. In that case, malloc will need to set tags on both sides of the allocation (because the previous free will have retagged starting from a higher address).

A point here which maybe hasn't been considered is that if reclaiming kicks in, the pages containing the freed chunks will be zero'd out, which probably invalidates assumptions about headers contents.

When you say reclaiming you mean calling releasePagesToOS(), correct? In that case, wouldn't that cause the header to be set to 0, which would put us in the same state as if we hadn't used the chunk before?

In D70762#1767420, @pcc wrote:

When you say reclaiming you mean calling releasePagesToOS(), correct? In that case, wouldn't that cause the header to be set to 0, which would put us in the same state as if we hadn't used the chunk before?

This is correct, with the caveat that it was allocated and freed as opposed to never used.

pcc updated this revision to Diff 233157.Dec 10 2019, 11:21 AM
pcc marked 10 inline comments as done.
  • Address review comments
pcc added a comment.Dec 10 2019, 11:21 AM

The code now implements UAF checks by retagging on free.

In D70762#1767420, @pcc wrote:

When you say reclaiming you mean calling releasePagesToOS(), correct? In that case, wouldn't that cause the header to be set to 0, which would put us in the same state as if we hadn't used the chunk before?

This is correct, with the caveat that it was allocated and freed as opposed to never used.

I discovered that it is possible for a chunk to be partially reclaimed, which means that the header could still be there, but part or all of the data could be reclaimed. The code that I've added to allocate() handles the various possibilities.

In D70762#1762390, @pcc wrote:

One complication is that we need to handle the case where the new allocation has lower alignment than the old allocation. In that case, malloc will need to set tags on both sides of the allocation (because the previous free will have retagged starting from a higher address).

I tried implementing this but was uncomfortable with the level of code complexity, so the code only tries to reuse tags if our start address is the same as that of the previous allocation. This should be true in the majority of cases, so it seems fine to me.

compiler-rt/lib/scudo/standalone/combined.h
345

Yes, let's leave it to the header check.

579

Left as is due to D70793

compiler-rt/lib/scudo/standalone/memtag.h
24

Left as is due to D70793

51

I created setRandomTag since I needed it for tag-on-free, although I feel that it's better to keep things at the chunk level here as splitting things into too many pieces can make it harder to understand the big picture.

Build result: FAILURE - Could not check out parent git hash "dd9de96af155762189fbc541572c02ef44b3ca72". It was not found in the repository. Did you configure the "Parent Revision" in Phabricator properly? Trying to apply the patch to the master branch instead...

ERROR: arc patch failed with error code 1. Check build log for details.
Log files: console-log.txt, CMakeCache.txt

cryptoad added inline comments.Dec 10 2019, 12:11 PM
compiler-rt/lib/scudo/standalone/combined.h
242

Maybe move this part of the comment to where the memset is?

248

There already is a BlockEnd variable outside this scope used for the Secondary, maybe reuse it?
It might be cleaner to initialize it where the allocate is for the Primary, but then performance will suffer due to the extra getSizeByClassId in the fast path, which is not ideal.
If you want to keep it in this block, it seems to be const.

280

const?

pcc updated this revision to Diff 233175.Dec 10 2019, 12:33 PM
pcc marked 4 inline comments as done.
  • Address review comments
compiler-rt/lib/scudo/standalone/combined.h
248

I made this variable const and renamed the other one to reduce confusion.

Build result: FAILURE - Could not check out parent git hash "dd9de96af155762189fbc541572c02ef44b3ca72". It was not found in the repository. Did you configure the "Parent Revision" in Phabricator properly? Trying to apply the patch to the master branch instead...

ERROR: arc patch failed with error code 1. Check build log for details.
Log files: console-log.txt, CMakeCache.txt

eugenis added inline comments.Dec 10 2019, 1:04 PM
compiler-rt/lib/scudo/standalone/combined.h
260

Is it possible that header was reclaimed, but data was only partially reclaimed? I think the code is correct in this case, too, but consider updating the comment.

compiler-rt/lib/scudo/standalone/memtag.h
36

I don't think it is the job of a memory allocator to mess with PSTATE.TCO.
Let the caller deal with it?

compiler-rt/lib/scudo/standalone/wrappers_c.inc
180

This api is not thread safe. TCO is per-thread, I think, so this would not work in a multi-threaded program at all. Either document this fact, or, preferably, remove the TCO code and use relaxed atomic for UseMemoryTagging.

pcc marked 2 inline comments as done.Dec 10 2019, 1:53 PM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/memtag.h
36

I can see the argument either way.

On one hand, TCO is a thread-wide property which the caller could arguably be considered responsible for.

On the other, disabling tag checks is required in order to avoid tag check failures for future allocations which reuse previously tagged chunks, so to a certain extent it makes sense to perform an operation that is required in order to keep the allocator working, even if it involves setting what is technically a thread-wide property. The operation that we perform depends on the allocator's implementation, so arguably it belongs with the allocator. If for example we switched to mprotecting without PROT_MTE when tag checks are disabled, there would be no need to set TCO here.

If you still think that it belongs in the caller, maybe we could document that setting TCO is required, but this may change in the future.

compiler-rt/lib/scudo/standalone/wrappers_c.inc
180

Let's not use atomics for this without a use case. Due to the complexity of disabling MTE in a multi-threaded program, I don't think we should even attempt to support it for now.

I will document that this requires the program to be single threaded at the point when the function is called.

pcc marked an inline comment as done.Dec 10 2019, 1:59 PM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/combined.h
260

Yes, that's possible, and it's handled in the same way as full reclaiming, which I forgot to comment on as well. I will add a comment for both.

pcc updated this revision to Diff 233195.Dec 10 2019, 2:17 PM
  • Address review comments

Build result: FAILURE - Could not check out parent git hash "dd9de96af155762189fbc541572c02ef44b3ca72". It was not found in the repository. Did you configure the "Parent Revision" in Phabricator properly? Trying to apply the patch to the master branch instead...

ERROR: arc patch failed with error code 1. Check build log for details.
Log files: console-log.txt, CMakeCache.txt

eugenis added inline comments.Dec 10 2019, 3:07 PM
compiler-rt/lib/scudo/standalone/memtag.h
36

OK, SGTM, let's keep it here.

pcc updated this revision to Diff 234329.Dec 17 2019, 9:32 AM
  • Load tags in malloc_iterate
pcc updated this revision to Diff 234332.Dec 17 2019, 9:45 AM
  • Only load tags if useMemoryTagging()

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

rankov added a subscriber: rankov.Dec 18 2019, 1:57 PM
rankov added inline comments.
compiler-rt/lib/scudo/standalone/combined.h
510

Should you use UNLIKELY here?

compiler-rt/lib/scudo/standalone/memtag.h
36

PSTATE.TCO can be easily changed by other code. So it is not a good choice for disabling tag checks in this case.
Also, the current kernel documentation says that PSTATE.TCO is reset to 0 in signal handlers.

Another concern with disabling tagging is that when memory is deallocated, the tags will remain, so enabling tag checks again is dangerous. This should be documented. Alternative is to untag memory on deallocation after tagging is disabled, but this will add cost.

I think that it would be better to leave disabling tag checks to the caller. The tests could use PSTATE.TCO, other callers might want to use other ways like calling a prctl to disable tag checks.

51

Instead of using assembly here, you could use functions from arm_acle.h

pcc marked 3 inline comments as done.Dec 18 2019, 2:35 PM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/combined.h
510

malloc_iterate is a rarely used debugging function, so we shouldn't worry too much about this sort of thing here I think.

compiler-rt/lib/scudo/standalone/memtag.h
36

Yes, I had intended to switch this over to using a prctl once the kernel patches to support prctl became available.

Now that I think about it, it would be a good idea to use PSTATE.TCO instead of prctl in the tests in order to avoid interfering with the "real" allocator's tag checking state during the tests. The fact that PSTATE.TCO is set to 0 during a signal handler is a good reason why the real allocator should use prctl and not TCO to disable tag checks. I think that's a good argument for moving the functionality into the caller. I'll do that then. I'll also make it clear that this is a one way operation and tag checks should not be turned back on.

51

Without __attribute__((target("mte"))) on these functions I get errors such as

fatal error: error in backend: Cannot select: intrinsic %llvm.aarch64.ldg

if I try to use the intrinsics. And with that attribute the functions don't get inlined because LLVM IR doesn't support scoping the availability of the instructions to a block (unlike .arch_extension in inline asm). So I'd prefer to stick with inline asm for now.

pcc updated this revision to Diff 234823.Dec 19 2019, 6:23 PM
pcc marked an inline comment as done.
  • Move tag check disablement to the caller

Unit tests: pass. 61010 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

pcc updated this revision to Diff 234927.Dec 20 2019, 10:58 AM
  • Formatting

Unit tests: pass. 61010 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

hctim added inline comments.Dec 23 2019, 10:08 AM
compiler-rt/lib/scudo/standalone/allocator_config.h
43

Maybe:
typedef SizeClassAllocator64<SizeClassMap, 30U, /*MaySupportMemoryTagging=*/ true> Primary;?

compiler-rt/lib/scudo/standalone/memtag.h
98

Size % 16 == 0 always here, so this could just be UntaggedEnd = Ptr + Size?

109

13.8% of chromium fuzzing-found heap OOB are > 16 bytes stride. Given that this is primary-only, the cost of retagging the OldChunk - NewChunk might be an acceptable performance penalty.

kevin.brodsky added inline comments.Dec 30 2019, 5:35 AM
compiler-rt/lib/scudo/standalone/memtag.h
31

I wonder if this is really a good thing. If libc fails to enable tag checking before the allocator is initialised (which is quite possible, given that until recently malloc() was called very early in Bionic's libc_init), then Scudo will not tag anything. Wouldn't it be possible instead to explicitly ask Scudo to use tagging when it is initialised? This would also be more consistent with the malloc_disable_memory_tagging() interface: Scudo does not take care of enabling / disabling tag checking, so arguably it shouldn't check if it is enabled either.

77

Since this asm statement is modifying memory, is it safe to use it without a "memory" clobber? It certainly isn't safe in general. Same comment for the other asm statements that use st*g.

pcc marked 5 inline comments as done.Jan 6 2020, 10:54 AM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/allocator_config.h
43

Will do

compiler-rt/lib/scudo/standalone/memtag.h
31

So the motivation behind adding the check here was along the lines of "if the application doesn't enable memory tag checks then there's no point in enabling memory tagging". But I can see the value in decoupling these two things especially since the libc might not have a chance to turn on memory tagging before the first allocation. I'll change this to be purely based on the hwcap and let the application call malloc_disable_memory_tagging() early if it doesn't want to use memory tagging.

I'd prefer not to add an explicit initialization step to the allocator since the allocation functions can in principle be called at any time, so we'd need additional complexity to handle the case where the allocation functions were called before the allocator was formally initialized.

77

I was somehow under the impression that volatile implied the "memory" clobber, but that doesn't appear to be backed up by the documentation so I'll add the clobbers here and elsewhere.

98

The caller isn't rounding Size as far as I can tell, so this isn't guaranteed to be the case.

109

This seems more like something that we'd want to do in precise mode than in mitigation mode. Recall that even with a large stride we (probabilistically) can't go outside of the bounds of the chunk. The chunk is cleared during dealloc with stzg so there's no info leak potential either.

pcc updated this revision to Diff 236479.Jan 6 2020, 4:21 PM
  • Address review comments
pcc marked 3 inline comments as done.Jan 6 2020, 4:24 PM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/memtag.h
31

I forgot about the other reason why I added this check, which was to avoid breaking the tests if the libc did not issue the prctl to enable memory tag checks. This seems like something that can be properly checked for in the tests themselves, which I've now done.

Unit tests: pass. 61259 tests passed, 0 failed and 736 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

kevin.brodsky added inline comments.Jan 7 2020, 8:21 AM
compiler-rt/lib/scudo/standalone/memtag.h
31

OK this makes sense, always enabling tagging in Scudo and then disabling it explicitly should be fairly robust.

77

My understanding is that volatile and "memory" do different things: the former tells the compiler not to (re)move the asm statement, while the latter tells the compiler that the the asm statement may perform reads or writes from arbitrary memory locations (forcing the compiler to reload values from memory as needed). GCC's manual has a rather good description of this: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

162

Technically "memory" is required even for memory reads. That said, in that case, the only thing that could affect ldg is a st*g, which is only done in another asm volatile statement, so "memory" is probably not absolutely needed here.

pcc updated this revision to Diff 236940.Jan 8 2020, 5:20 PM
  • Add memory clobber
pcc marked an inline comment as done.Jan 8 2020, 5:20 PM

Unit tests: pass. 61259 tests passed, 0 failed and 736 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Unit tests: pass. 61908 tests passed, 0 failed and 782 were skipped.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

I think this looks good. I think this might not be Fuchsia compatible and could probably use some #if SCUDO_LINUX or ANDROID or top of the __aarch64__ checks.
Fuchsia will want memory tagging support at some point, I 'll check the patch on the platform once the inconsistencies I saw are addressed.

compiler-rt/lib/scudo/standalone/combined.h
164

I am assuming the compiler will inline this most of the time, but could we put that inline or FORCE_INLINE to double down?

compiler-rt/lib/scudo/standalone/memtag.h
15

If I followed properly memtag.h is included on all platforms, but I am not sure sys/*.h is available everywhere (Fuchsia doesn't have one).
So this probably requires some #if SCUDO_LINUX or something to that extent?

pcc marked 3 inline comments as done.Jan 16 2020, 12:36 PM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/combined.h
164

inline is implicit on inline definitions (and somewhat confusingly in C++ inline doesn't really control inlining decisions, it's more of a linkage specification), so adding it here would have no effect. I'm personally not a fan of micro-optimizing inlining decisions like this, but I suppose it wouldn't hurt to add an ALWAYS_INLINE here.

compiler-rt/lib/scudo/standalone/memtag.h
15

Yes, this will need SCUDO_LINUX, done.

pcc marked an inline comment as done.Jan 16 2020, 12:37 PM

And I verified that this patch works on Fuchsia (x64 and arm64).

pcc updated this revision to Diff 238578.Jan 16 2020, 12:37 PM
  • Address review comments

Unit tests: fail. 61907 tests passed, 1 failed and 782 were skipped.

failed: Clang.CXX/temp/temp_arg/temp_arg_template/p3-2a.cpp

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

cryptoad accepted this revision.Jan 16 2020, 1:01 PM

LGTM, Thanks for all the work Peter!

This revision is now accepted and ready to land.Jan 16 2020, 1:01 PM
This revision was automatically updated to reflect the committed changes.
bjope added a subscriber: bjope.Jan 20 2020, 5:00 AM
bjope added inline comments.
compiler-rt/lib/scudo/standalone/memtag.h
15

I also had to move the auxv include a few lines down (inside the {{#if defined(ANDROID_EXPERIMENTAL_MTE)}} guard) for things to build on my RedHat 6.10 server.
Would it make sense to do that change upstream?

pcc marked an inline comment as done.Jan 21 2020, 12:20 PM
pcc added inline comments.
compiler-rt/lib/scudo/standalone/memtag.h
15

I don't think so, we will eventually need to include it unconditionally on Linux once ANDROID_EXPERIMENTAL_MTE goes away as mentioned in the commit message. I think it would be better to change the build system so that we don't build/test scudo on Linux machines without a sys/auxv.h.

bjope added inline comments.Jan 22 2020, 1:56 AM
compiler-rt/lib/scudo/standalone/memtag.h
15

Oh yes, that would be better of course. Although, I'm not that familiar with adding such checks, so I'm not sure exactly how to do it properly.
I see sys/auxv.h being used in other places in compiler-rt as well, but maybe I don't hit those due to some existing build system checks for those parts (and maybe those checks arent't directed at checking for the presence of sys/auxv.h, but rather something else).

bjope added inline comments.Jan 29 2020, 8:25 AM
compiler-rt/lib/scudo/standalone/memtag.h
15

Proposed fix, checking that sys/auxv.h can be found: https://reviews.llvm.org/D73631