This is an archive of the discontinued LLVM Phabricator instance.

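Paths

compiler-rt/lib/scudo/standalone/allocator_config.h
compiler-rt/lib/scudo/standalone/combined.h
compiler-rt/lib/scudo/standalone/common.h
compiler-rt/lib/scudo/standalone/linux.cpp
compiler-rt/lib/scudo/standalone/memtag.h
compiler-rt/lib/scudo/standalone/primary32.h
compiler-rt/lib/scudo/standalone/primary64.h
compiler-rt/lib/scudo/standalone/tests/combined_test.cpp
compiler-rt/lib/scudo/standalone/tests/primary_test.cpp
compiler-rt/lib/scudo/standalone/wrappers_c.inc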

Differential D70762

scudo: Add initial memory tagging support.
ClosedPublic

Authored by pcc on Nov 26 2019, 7:25 PM.

Details

Reviewers

cryptoad
hctim
eugenis
jfb

Commits

rGc299d1981dea: scudo: Add initial memory tagging support.

Summary

When the hardware and operating system support the ARM Memory Tagging
Extension, tag primary allocation granules with a random tag. The granules
either side of the allocation are tagged with tag 0, which is normally
excluded from the set of tags that may be selected randomly. Memory is
also retagged with a random tag when it is freed, and we opportunistically
reuse the new tag when the block is reused to reduce overhead. This causes
linear buffer overflows to be caught deterministically and non-linear buffer
overflows and use-after-free to be caught probabilistically.
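
The tagging scheme described above can be illustrated with a toy software model. This is only a sketch and not scudo code: `ToyHeap`, `TaggedPtr`, and `accessOk` are hypothetical names, the tags live in an ordinary array rather than in hardware, and the check that MTE performs on every load and store is modeled by an explicit function call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Toy software model of MTE-style tagging (assumption: not the scudo code).
constexpr std::size_t kGranule = 16; // MTE tag granule size in bytes

struct TaggedPtr {
  std::size_t Addr; // byte offset into the simulated heap
  std::uint8_t Tag; // 4-bit logical tag carried by the pointer
};

class ToyHeap {
public:
  explicit ToyHeap(std::size_t NumGranules, unsigned Seed = 1)
      : Tags(NumGranules, 0), Rng(Seed) {}

  // Tags granules [First, First + Count) with a random nonzero tag and the
  // granules on either side with tag 0.
  TaggedPtr allocate(std::size_t First, std::size_t Count) {
    assert(First >= 1 && First + Count < Tags.size());
    std::uniform_int_distribution<int> Dist(1, 15); // tag 0 is excluded
    const std::uint8_t Tag = static_cast<std::uint8_t>(Dist(Rng));
    for (std::size_t I = 0; I < Count; ++I)
      Tags[First + I] = Tag;
    Tags[First - 1] = 0;
    Tags[First + Count] = 0;
    return {First * kGranule, Tag};
  }

  // Models the hardware check: an access is allowed only when the pointer
  // tag matches the tag of the granule being accessed.
  bool accessOk(TaggedPtr P, std::ptrdiff_t Offset) const {
    const std::size_t Addr = static_cast<std::size_t>(
        static_cast<std::ptrdiff_t>(P.Addr) + Offset);
    assert(Addr / kGranule < Tags.size());
    return Tags[Addr / kGranule] == P.Tag;
  }

private:
  std::vector<std::uint8_t> Tags; // one tag per 16-byte granule
  std::mt19937 Rng;
};
```

For an allocation covering granules 2 and 3, an access one byte past either end lands on a neighbor tagged 0 and fails whichever random tag was chosen, which is the sense in which linear overflows are caught deterministically.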

This feature is currently only enabled for the Android allocator
and depends on an experimental Linux kernel branch available here:
https://github.com/pcc/linux/tree/android-experimental-mte

All code that depends on the kernel branch is hidden behind a macro,
ANDROID_EXPERIMENTAL_MTE. This is the same macro that is used by the Android
platform and may only be defined in non-production configurations. When the
userspace interface is finalized the code will be updated to use the stable
interface and all #ifdef ANDROID_EXPERIMENTAL_MTE will be removed.

Depends on D70761

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pcc created this revision. · Nov 26 2019, 7:25 PM

Herald added a reviewer: jfb. · Nov 26 2019, 7:25 PM

Herald added projects: Restricted Project, Restricted Project.

Herald added subscribers: Restricted Project, kristof.beyls, srhines.

Silence warnings

Herald added a subscriber: dexonsmith. · Nov 26 2019, 7:31 PM

Build result: pass - 60330 tests passed, 0 failed and 732 were skipped.

Log files: console-log.txt, CMakeCache.txt

Harbormaster completed remote builds in B41537: Diff 231167. · Nov 26 2019, 7:56 PM

Build result: pass - 60330 tests passed, 0 failed and 732 were skipped.

Log files: console-log.txt, CMakeCache.txt

Harbormaster completed remote builds in B41538: Diff 231168. · Nov 26 2019, 7:59 PM

Thank you Peter this is awesome!

compiler-rt/lib/scudo/standalone/combined.h
164	nit on naming: I usually put the verb 1st and `Maybe` last. I might have been wrong with regard to LLVM style and can change the other names, but I'd like to keep the naming scheme consistent. Let me know which of the 2 is preferable.
572	If you could use `COMPILER_CHECK`, as it is used in other places. It ends up being a `static_assert`, but it feels more consistent.
compiler-rt/lib/scudo/standalone/linux.cpp
39	Does this work when compiled on Android but outside of Bionic?
compiler-rt/lib/scudo/standalone/memtag.h
24	If you could use the `INLINE` macros for consistency please.
25	I assume that when you do `roundUpTo(NewSize, 16)` and all the 16-related arithmetic, it's related to the granule size. Could it be constant'd out in the whole file?
compiler-rt/lib/scudo/standalone/primary64.h
43	Maybe default to `false`?

pcc marked 6 inline comments as done. · Nov 27 2019, 9:22 AM

pcc added inline comments.

compiler-rt/lib/scudo/standalone/combined.h
164	Yeah, the `Maybe` at the beginning is more consistent with the rest of LLVM. For now I'll put it at the end and we can change it later.
572	The rest of LLVM uses `static_assert` directly. I'll change this one to `COMPILER_CHECK` but maybe this could be changed over to being consistent with LLVM as well?
compiler-rt/lib/scudo/standalone/linux.cpp
39	Yes, as long as you're building scudo as part of the Android platform (the header: https://android.googlesource.com/platform/bionic/+/900d07d6a1f3e1eca8cdbb3b1db1ceeec0acc9e2/libc/platform/bionic/mte_kernel.h is deliberately not exposed in the NDK). Unfortunately this means that the MTE tests can't run as part of check-scudo, but the situation will hopefully improve once we no longer need `ANDROID_EXPERIMENTAL_MTE`.
compiler-rt/lib/scudo/standalone/memtag.h
24	Sure, again `inline` is what the rest of LLVM uses.
25	Probably not in the inline asm parts but that only leaves a couple of places. The MTE-specific part of the code is small enough that I reckon that it's clear enough to just write out the constant.
compiler-rt/lib/scudo/standalone/primary64.h
43	Yeah, that's a good idea, I'll do that.

hctim added inline comments. · Nov 27 2019, 11:05 AM

compiler-rt/lib/scudo/standalone/combined.h
313	In Chromium, ~11% of bugs are nonlinear (as determined with `Heap-buffer-flow READ\|WRITE {}` over `Heap-buffer-*flow` with a fixed deterministic size). The fixed size classes only go up to 24-byte allocations, so anything `24 < x <= [a page]` also lands in this bucket - but we're also not counting wild SEGVs or UBSan errors that allow for attacker-controlled offsets... I think it's worth having a tagged secondary, although I understand there are some performance implications of this. Maybe guarded behind a runtime flag?

pcc marked an inline comment as done. · Nov 27 2019, 12:12 PM

pcc added inline comments.

compiler-rt/lib/scudo/standalone/combined.h
313	I'm not sure where you got the number 24 from. On Android we set MaxSizeLog to 17: http://llvm-cs.pcc.me.uk/projects/compiler-rt/lib/scudo/standalone/size_class_map.h#143 so any allocations <= 2^17 bytes will use the primary allocator. Tagged secondary would be nice, but I think I'd prefer to do it in a different way. Specifically, we might consider asking the kernel folks for a way to set the "background tag" of a mapping, so that any faulted pages in the mapping get the background tag. That way, we don't pay the cost of faulting up front.
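
The size threshold mentioned in this reply can be written out as arithmetic. This is a simplified sketch: `kMaxSizeLog` mirrors the value quoted for Android, `servedByPrimary` is a hypothetical helper, and the chunk header and alignment overhead that a real allocator adds to the requested size are ignored.

```cpp
#include <cassert>
#include <cstddef>

// Value quoted in the review for the Android configuration.
constexpr unsigned kMaxSizeLog = 17;
constexpr std::size_t kMaxPrimarySize = std::size_t{1} << kMaxSizeLog;
static_assert(kMaxPrimarySize == 131072, "2^17 bytes is 128 KiB");

// Hypothetical helper: true when a request of Size bytes would go to the
// primary (and therefore tagged) allocator under this simplified model.
constexpr bool servedByPrimary(std::size_t Size) {
  return Size <= kMaxPrimarySize;
}
```

Under this model both a 24-byte and a page-sized request fall within the primary's range, so both would be covered by tagging.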

hctim added inline comments. · Nov 27 2019, 12:32 PM

compiler-rt/lib/scudo/standalone/combined.h
313	This is from linux fuzzing on Chromium - where ClusterFuzz actually buckets them based on the `READ\|WRITE of size ###` from ASan. The idea sounds good - is it possible to update the comment to reflect this?

The granules either side of the allocation are tagged with tag 0

But only if the granule on the right is within the current chunk, right?

This patch does not retag memory on free, so it would not catch use-after-free. Unless I'm missing something.
It looks like a better strategy would be tagging memory _only_ on free (and realloc, and when new memory is requested from the system, too).

compiler-rt/lib/scudo/standalone/combined.h
338	Do we want to touch memory with the tagged pointer first to catch double-free & invalid-free bugs?
418–422	UNLIKELY?

In D70762#1762254, @eugenis wrote:

The granules either side of the allocation are tagged with tag 0

But only if the granule on the right is within the current chunk, right?

In the case where the granule on the right is not in the current chunk, there are two possibilities:

We are at the next chunk, which will have a tag 0 at the beginning for the header granule.
We are at the end of the mapping.

Either way we will get a SEGV when we access the next granule.

This patch does not retag memory on free, so it would not catch use-after-free. Unless I'm missing something.

Correct. We won't catch UAF unless we happen to reuse the chunk in the right way. I was originally planning to do tag on free later.

It looks like a better strategy would be tagging memory _only_ on free (and realloc, and when new memory is requested from the system, too).

Interesting. So:

On mmap we do a separate IRG for each block and tag the entire block except for first granule.
On free we IRG and tag the entire block except first granule.
On malloc we tag the granule before and after (modulo end-of-block) with 0.

So we would do about 1.5x the amount of work (because we don't know how big the malloc is going to be), with additional upfront work. Maybe the upfront work is fine on Android because of the zygote.

That said, maybe we could cut the 1.5x down to about 1x by only tagging max(half block size, usable size) on free, and the remainder on malloc.

In D70762#1762272, @pcc wrote:

In the case where the granule on the right is not in the current chunk, there are two possibilities:

We are at the next chunk, which will have a tag 0 at the beginning for the header granule.

We are at the end of the mapping.

Either way we will get a SEGV when we access the next granule.

Yes, of course. SG.

Interesting. So:

On mmap we do a separate IRG for each block and tag the entire block except for first granule.

On free we IRG and tag the entire block except first granule.

On malloc we tag the granule before and after (modulo end-of-block) with 0.

So we would do about 1.5x the amount of work (because we don't know how big the malloc is going to be), with additional upfront work. Maybe the upfront work is fine on Android because of the zygote.

That said, maybe we could cut the 1.5x down to about 1x by only tagging max(half block size, usable size) on free, and the remainder on malloc.

We don't need the upfront work if we can track the "has never been tagged" state of the chunk somewhere. Ideally, not in the chunk header to avoid paging everything in too early.
Maybe we can optimize for size of malloc <= size of the previous free by storing the size of the free() in the header.

Anyway, this beats the 2x amount of work needed to catch UAF by tagging in both malloc and free.

hctim added inline comments. · Nov 27 2019, 1:49 PM

compiler-rt/lib/scudo/standalone/combined.h
338	Should be handled below in the chunk header check, no?
543	nit: newline after `if`?
compiler-rt/lib/scudo/standalone/memtag.h
27	Can we move this ifdef inside of `systemSupportsMemoryTagging`?
51	These asm stubs seem mostly abstractable - which would allow us to extend to future platforms more easily, and make the intermediate [read - non-MTE instructions] code easier to maintain. Looks like we could abstract them away into `storeZeroTag` and `randomTagMemory` (or similar).

The granules
either side of the allocation are tagged with tag 0, which is normally
excluded from the set of tags that may be selected randomly

It seems valuable to have the LHS and RHS of an allocation as a nonzero tag. IIUC, the chunk header is on the LHS for primary allocations, and making the header MTE-protected (the tag can be stored in the Primary allocator struct somewhere) seems like a good additional security step to make it unwriteable from a deterministic (zeroed) pointer.

compiler-rt/lib/scudo/standalone/primary64.h
193	Nit: line length
293	nit: leave newline

In D70762#1762324, @hctim wrote:

The granules
either side of the allocation are tagged with tag 0, which is normally
excluded from the set of tags that may be selected randomly

It seems valuable to have the LHS and RHS of an allocation as a nonzero tag. IIUC, the chunk header is on the LHS for primary allocations, and making the header MTE-protected (the tag can be stored in the Primary allocator struct somewhere) seems like a good additional security step to make it unwriteable from a deterministic (zeroed) pointer.

It's already protected by using tag 0, which we don't use in heap pointers, so I'm not sure what your concern is.

In D70762#1762282, @eugenis wrote:

In D70762#1762272, @pcc wrote:

In the case where the granule on the right is not in the current chunk, there are two possibilities:

We are at the next chunk, which will have a tag 0 at the beginning for the header granule.

We are at the end of the mapping.

Either way we will get a SEGV when we access the next granule.

Yes, of course. SG.

Interesting. So:

On mmap we do a separate IRG for each block and tag the entire block except for first granule.

On free we IRG and tag the entire block except first granule.

On malloc we tag the granule before and after (modulo end-of-block) with 0.

So we would do about 1.5x the amount of work (because we don't know how big the malloc is going to be), with additional upfront work. Maybe the upfront work is fine on Android because of the zygote.

That said, maybe we could cut the 1.5x down to about 1x by only tagging max(half block size, usable size) on free, and the remainder on malloc.

We don't need the upfront work if we can track the "has never been tagged" state of the chunk somewhere. Ideally, not in the chunk header to avoid paging everything in too early.
Maybe we can optimize for size of malloc <= size of the previous free by storing the size of the free() in the header.

There is already a header field 'SizeOrUnusedBytes' that stores the allocation size. When a chunk is freed, we don't disturb that field. That gives us a way to recover the size of the previous allocation. We can call getChunkFromBlock() (modifying it to accept deallocated chunks) to recover the location of the chunk header given a block.

I think we can use the header itself to store the "has never been tagged" state. If the header read as a word is equal to 0, that means that the chunk has never been used before and we need to IRG before setting tags. That won't result in early paging because by the time we read the header we've already decided to use that block for the allocation.

One complication is that we need to handle the case where the new allocation has lower alignment than the old allocation. In that case, malloc will need to set tags on both sides of the allocation (because the previous free will have retagged starting from a higher address).
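
The "header read as a word is equal to 0" test proposed here could be sketched as follows. The names `loadHeaderWord` and `needsFreshTag` are hypothetical, and the assumption that the header occupies the first 8 bytes of the block is made only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads the first 8 bytes of a block as one word.
// Assumption: the chunk header sits at the very start of the block.
inline std::uint64_t loadHeaderWord(const void *Block) {
  assert(Block != nullptr);
  std::uint64_t Word;
  std::memcpy(&Word, Block, sizeof(Word)); // memcpy avoids aliasing issues
  return Word;
}

// A header that reads as zero is treated as "never tagged", so a new random
// tag must be generated (IRG) before any tags are set.
inline bool needsFreshTag(const void *Block) {
  return loadHeaderWord(Block) == 0;
}
```

Because the header is read only after the block has been chosen for the allocation, the check itself does not page in memory that would otherwise stay untouched.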

alex added a subscriber: alex. · Nov 29 2019, 8:05 AM

In D70762#1762390, @pcc wrote:

There is already a header field 'SizeOrUnusedBytes' that stores the allocation size. When a chunk is freed, we don't disturb that field. That gives us a way to recover the size of the previous allocation. We can call getChunkFromBlock() (modifying it to accept deallocated chunks) to recover the location of the chunk header given a block.

I think we can use the header itself to store the "has never been tagged" state. If the header read as a word is equal to 0, that means that the chunk has never been used before and we need to IRG before setting tags. That won't result in early paging because by the time we read the header we've already decided to use that block for the allocation.

One complication is that we need to handle the case where the new allocation has lower alignment than the old allocation. In that case, malloc will need to set tags on both sides of the allocation (because the previous free will have retagged starting from a higher address).

A point here which maybe hasn't been considered is that if reclaiming kicks in, the pages containing the freed chunks will be zeroed out, which probably invalidates assumptions about header contents.

In D70762#1767355, @cryptoad wrote:

In D70762#1762390, @pcc wrote:

There is already a header field 'SizeOrUnusedBytes' that stores the allocation size. When a chunk is freed, we don't disturb that field. That gives us a way to recover the size of the previous allocation. We can call getChunkFromBlock() (modifying it to accept deallocated chunks) to recover the location of the chunk header given a block.

I think we can use the header itself to store the "has never been tagged" state. If the header read as a word is equal to 0, that means that the chunk has never been used before and we need to IRG before setting tags. That won't result in early paging because by the time we read the header we've already decided to use that block for the allocation.

One complication is that we need to handle the case where the new allocation has lower alignment than the old allocation. In that case, malloc will need to set tags on both sides of the allocation (because the previous free will have retagged starting from a higher address).

A point here which maybe hasn't been considered is that if reclaiming kicks in, the pages containing the freed chunks will be zeroed out, which probably invalidates assumptions about header contents.

When you say reclaiming you mean calling releasePagesToOS(), correct? In that case, wouldn't that cause the header to be set to 0, which would put us in the same state as if we hadn't used the chunk before?

In D70762#1767420, @pcc wrote:

When you say reclaiming you mean calling releasePagesToOS(), correct? In that case, wouldn't that cause the header to be set to 0, which would put us in the same state as if we hadn't used the chunk before?

This is correct, with the caveat that it was allocated and freed as opposed to never used.

Address review comments

The code now implements UAF checks by retagging on free.

In D70762#1767533, @cryptoad wrote:

In D70762#1767420, @pcc wrote:

When you say reclaiming you mean calling releasePagesToOS(), correct? In that case, wouldn't that cause the header to be set to 0, which would put us in the same state as if we hadn't used the chunk before?

This is correct, with the caveat that it was allocated and freed as opposed to never used.

I discovered that it is possible for a chunk to be partially reclaimed, which means that the header could still be there, but part or all of the data could be reclaimed. The code that I've added to allocate() handles the various possibilities.

In D70762#1762390, @pcc wrote:

One complication is that we need to handle the case where the new allocation has lower alignment than the old allocation. In that case, malloc will need to set tags on both sides of the allocation (because the previous free will have retagged starting from a higher address).

I tried implementing this but was uncomfortable with the level of code complexity, so the code only tries to reuse tags if our start address is the same as that of the previous allocation. This should be true in the majority of cases, so it seems fine to me.
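
The simplified reuse rule described here (reuse the tag only when the start address matches the previous allocation) might be sketched like this. `PreviousUse` and `chooseTag` are hypothetical names, and the way the previous state is recorded is an assumption, not the actual layout used in combined.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of what the block looked like when it was last freed.
struct PreviousUse {
  bool HeaderPresent;     // false if the header was never written or reclaimed
  std::uintptr_t UserPtr; // start address of the previous allocation
  std::uint8_t Tag;       // tag applied when the block was freed
};

// Reuse the tag left by the previous free only when the new allocation starts
// at the same address; otherwise fall back to a freshly generated tag.
inline std::uint8_t chooseTag(const PreviousUse &Prev,
                              std::uintptr_t NewUserPtr,
                              std::uint8_t FreshTag) {
  assert(FreshTag != 0 && FreshTag < 16); // tag 0 is reserved, tags are 4-bit
  const bool CanReuse =
      Prev.HeaderPresent && Prev.Tag != 0 && Prev.UserPtr == NewUserPtr;
  return CanReuse ? Prev.Tag : FreshTag;
}
```

The fallback path costs a full retag of the new allocation, which is the trade-off accepted in exchange for avoiding the two-sided tagging logic.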

compiler-rt/lib/scudo/standalone/combined.h
338	Yes, let's leave it to the header check.
572	Left as is due to D70793
compiler-rt/lib/scudo/standalone/memtag.h
24	Left as is due to D70793
51	I created `setRandomTag` since I needed it for tag-on-free, although I feel that it's better to keep things at the chunk level here as splitting things into too many pieces can make it harder to understand the big picture.

pcc edited the summary of this revision. · Dec 10 2019, 11:24 AM

pcc added parent revisions: D71291: scudo: Move getChunkFromBlock() allocated check into caller. NFCI., D71292: scudo: Tweak how we align UserPtr. NFCI..

Build result: FAILURE - Could not check out parent git hash "dd9de96af155762189fbc541572c02ef44b3ca72". It was not found in the repository. Did you configure the "Parent Revision" in Phabricator properly? Trying to apply the patch to the master branch instead...

ERROR: arc patch failed with error code 1. Check build log for details.
Log files: console-log.txt, CMakeCache.txt

Harbormaster failed remote builds in B42233: Diff 233157! · Dec 10 2019, 11:32 AM

cryptoad added inline comments. · Dec 10 2019, 12:11 PM

compiler-rt/lib/scudo/standalone/combined.h
242	Maybe move this part of the comment to where the memset is?
248	There already is a `BlockEnd` variable outside this scope used for the Secondary, maybe reuse it? It might be cleaner to initialize it where the allocate is for the Primary, but then performance will suffer due to the extra `getSizeByClassId` in the fast path, which is not ideal. If you want to keep it in this block, it seems to be `const`.
280	`const`?

Address review comments

compiler-rt/lib/scudo/standalone/combined.h
248	I made this variable `const` and renamed the other one to reduce confusion.

ERROR: arc patch failed with error code 1. Check build log for details.
Log files: console-log.txt, CMakeCache.txt

Harbormaster failed remote builds in B42238: Diff 233175! · Dec 10 2019, 12:38 PM

eugenis added inline comments. · Dec 10 2019, 1:04 PM

compiler-rt/lib/scudo/standalone/combined.h
260	Is it possible that the header was reclaimed, but the data was only partially reclaimed? I think the code is correct in this case, too, but consider updating the comment.
compiler-rt/lib/scudo/standalone/memtag.h
36	I don't think it is the job of a memory allocator to mess with PSTATE.TCO. Let the caller deal with it?
compiler-rt/lib/scudo/standalone/wrappers_c.inc
180	This API is not thread safe. TCO is per-thread, I think, so this would not work in a multi-threaded program at all. Either document this fact, or, preferably, remove the TCO code and use relaxed atomic for UseMemoryTagging.

pcc marked 2 inline comments as done. · Dec 10 2019, 1:53 PM

pcc added inline comments.

compiler-rt/lib/scudo/standalone/memtag.h
36	I can see the argument either way. On one hand, TCO is a thread-wide property which the caller could arguably be considered responsible for. On the other, disabling tag checks is required in order to avoid tag check failures for future allocations which reuse previously tagged chunks, so to a certain extent it makes sense to perform an operation that is required in order to keep the allocator working, even if it involves setting what is technically a thread-wide property. The operation that we perform depends on the allocator's implementation, so arguably it belongs with the allocator. If for example we switched to mprotecting without PROT_MTE when tag checks are disabled, there would be no need to set TCO here. If you still think that it belongs in the caller, maybe we could document that setting TCO is required, but this may change in the future.
compiler-rt/lib/scudo/standalone/wrappers_c.inc
180	Let's not use atomics for this without a use case. Due to the complexity of disabling MTE in a multi-threaded program, I don't think we should even attempt to support it for now. I will document that this requires the program to be single threaded at the point when the function is called.
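
For reference, the relaxed-atomic alternative that was suggested and then declined might look roughly like the sketch below. `TaggingFlag` and its member names are hypothetical, and nothing here reflects what the patch actually does, which is to require that the program be single-threaded when tagging is disabled.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical wrapper around the UseMemoryTagging flag. Relaxed ordering is
// enough for the flag itself because no other data is published through it.
class TaggingFlag {
public:
  bool useMemoryTagging() const {
    return Enabled.load(std::memory_order_relaxed);
  }

  // One-way switch: once tagging is disabled it is not turned back on.
  void disableMemoryTagging() {
    Enabled.store(false, std::memory_order_relaxed);
  }

private:
  std::atomic<bool> Enabled{true};
};
```

An atomic flag only makes reading and writing the flag well-defined across threads. It does not address the harder problem raised in the review, namely that tag checking state such as TCO is per-thread.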

pcc marked an inline comment as done. · Dec 10 2019, 1:59 PM

pcc added inline comments.

compiler-rt/lib/scudo/standalone/combined.h
260	Yes, that's possible, and it's handled in the same way as full reclaiming, which I forgot to comment on as well. I will add a comment for both.

Address review comments

ERROR: arc patch failed with error code 1. Check build log for details.
Log files: console-log.txt, CMakeCache.txt

Harbormaster failed remote builds in B42244: Diff 233195! · Dec 10 2019, 2:22 PM

eugenis added inline comments. · Dec 10 2019, 3:07 PM

compiler-rt/lib/scudo/standalone/memtag.h
36	OK, SGTM, let's keep it here.

Load tags in malloc_iterate

Only load tags if useMemoryTagging()

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

Unit tests: unknown.

clang-tidy: unknown.

clang-format: unknown.

Build artifacts: diff.json, console-log.txt

Harbormaster failed remote builds in B42669: Diff 234329! · Dec 17 2019, 9:55 AM

Harbormaster failed remote builds in B42670: Diff 234332!

rankov added a subscriber: rankov. · Dec 18 2019, 1:57 PM

rankov added inline comments.

compiler-rt/lib/scudo/standalone/combined.h
503	Should you use UNLIKELY here?
compiler-rt/lib/scudo/standalone/memtag.h
36	PSTATE.TCO can be easily changed by other code. So it is not a good choice for disabling tag checks in this case. Also, the current kernel documentation says that PSTATE.TCO is reset to 0 in signal handlers. Another concern with disabling tagging is that when memory is deallocated, the tags will remain, so enabling tag checks again is dangerous. This should be documented. An alternative is to untag memory on deallocation after tagging is disabled, but this will add cost. I think that it would be better to leave disabling tag checks to the caller. The tests could use PSTATE.TCO; other callers might want to use other ways, like calling a prctl, to disable tag checks.
51	Instead of using assembly here, you could use functions from arm_acle.h

pcc marked 3 inline comments as done. · Dec 18 2019, 2:35 PM

pcc added inline comments.

compiler-rt/lib/scudo/standalone/combined.h
503	malloc_iterate is a rarely used debugging function, so we shouldn't worry too much about this sort of thing here I think.
compiler-rt/lib/scudo/standalone/memtag.h
36	Yes, I had intended to switch this over to using a prctl once the kernel patches to support prctl became available. Now that I think about it, it would be a good idea to use PSTATE.TCO instead of prctl in the tests in order to avoid interfering with the "real" allocator's tag checking state during the tests. The fact that PSTATE.TCO is set to 0 during a signal handler is a good reason why the real allocator should use prctl and not TCO to disable tag checks. I think that's a good argument for moving the functionality into the caller. I'll do that then. I'll also make it clear that this is a one way operation and tag checks should not be turned back on.
51	Without `__attribute__((target("mte")))` on these functions I get errors such as `fatal error: error in backend: Cannot select: intrinsic %llvm.aarch64.ldg` if I try to use the intrinsics. And with that attribute the functions don't get inlined because LLVM IR doesn't support scoping the availability of the instructions to a block (unlike `.arch_extension` in inline asm). So I'd prefer to stick with inline asm for now.

Move tag check disablement to the caller

Unit tests: pass. 61010 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Harbormaster failed remote builds in B42809: Diff 234823! · Dec 19 2019, 6:36 PM

Formatting

Unit tests: pass. 61010 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Harbormaster failed remote builds in B42846: Diff 234927! · Dec 20 2019, 11:21 AM

hctim added inline comments. · Dec 23 2019, 10:08 AM

compiler-rt/lib/scudo/standalone/allocator_config.h
43	Maybe: `typedef SizeClassAllocator64<SizeClassMap, 30U, /*MaySupportMemoryTagging=*/ true> Primary;`?
compiler-rt/lib/scudo/standalone/memtag.h
98	`Size % 16 == 0` always here, so this could just be `UntaggedEnd = Ptr + Size`?
109	13.8% of chromium fuzzing-found heap OOB are > 16 bytes stride. Given that this is primary-only, the cost of retagging the `OldChunk - NewChunk` might be an acceptable performance penalty.

kevin.brodsky added a subscriber: kevin.brodsky. · Dec 30 2019, 2:38 AM

kevin.brodsky added inline comments. · Dec 30 2019, 5:35 AM

compiler-rt/lib/scudo/standalone/memtag.h
31	I wonder if this is really a good thing. If libc fails to enable tag checking before the allocator is initialised (which is quite possible, given that until recently `malloc()` was called very early in Bionic's libc_init), then Scudo will not tag anything. Wouldn't it be possible instead to explicitly ask Scudo to use tagging when it is initialised? This would also be more consistent with the `malloc_disable_memory_tagging()` interface: Scudo does not take care of enabling / disabling tag checking, so arguably it shouldn't check if it is enabled either.
77	Since this asm statement is modifying memory, is it safe to use it without a "memory" clobber? It certainly isn't safe in general. Same comment for the other asm statements that use `st*g`.

pcc marked 5 inline comments as done. · Jan 6 2020, 10:54 AM

pcc added inline comments.

compiler-rt/lib/scudo/standalone/allocator_config.h
43	Will do
compiler-rt/lib/scudo/standalone/memtag.h
31	So the motivation behind adding the check here was along the lines of "if the application doesn't enable memory tag checks then there's no point in enabling memory tagging". But I can see the value in decoupling these two things especially since the libc might not have a chance to turn on memory tagging before the first allocation. I'll change this to be purely based on the hwcap and let the application call `malloc_disable_memory_tagging()` early if it doesn't want to use memory tagging. I'd prefer not to add an explicit initialization step to the allocator since the allocation functions can in principle be called at any time, so we'd need additional complexity to handle the case where the allocation functions were called before the allocator was formally initialized.
77	I was somehow under the impression that `volatile` implied the `"memory"` clobber, but that doesn't appear to be backed up by the documentation so I'll add the clobbers here and elsewhere.
98	The caller isn't rounding `Size` as far as I can tell, so this isn't guaranteed to be the case.
109	This seems more like something that we'd want to do in precise mode than in mitigation mode. Recall that even with a large stride we (probabilistically) can't go outside of the bounds of the chunk. The chunk is cleared during dealloc with stzg so there's no info leak potential either.
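
The point about `Size` not necessarily being a multiple of 16 can be made concrete with a small rounding sketch. `roundUpTo` here is a local stand-in with the usual power-of-two formulation (its exact definition in scudo is an assumption), and `granuleAlignedEnd` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t kGranuleSize = 16; // MTE tag granule in bytes

// Local stand-in: rounds X up to a multiple of Boundary, where Boundary is
// assumed to be a power of two.
constexpr std::uintptr_t roundUpTo(std::uintptr_t X, std::uintptr_t Boundary) {
  return (X + Boundary - 1) & ~(Boundary - 1);
}

// End of the tagged region for an allocation starting at a granule-aligned
// Ptr. It equals Ptr + Size only when Size is itself a multiple of 16.
inline std::uintptr_t granuleAlignedEnd(std::uintptr_t Ptr,
                                        std::uintptr_t Size) {
  assert(Ptr % kGranuleSize == 0);
  return roundUpTo(Ptr + Size, kGranuleSize);
}
```

For a 20-byte request the end lands 32 bytes past `Ptr`, not 20, so dropping the rounding would only be safe if every caller rounded `Size` first.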

Address review comments

pcc marked 3 inline comments as done. · Jan 6 2020, 4:24 PM

pcc added inline comments.

compiler-rt/lib/scudo/standalone/memtag.h
31	I forgot about the other reason why I added this check, which was to avoid breaking the tests if the libc did not issue the prctl to enable memory tag checks. This seems like something that can be properly checked for in the tests themselves, which I've now done.
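
The "purely based on the hwcap" decision discussed for memtag.h line 31 could be sketched as below. This is an assumption-heavy illustration: it uses `getauxval(AT_HWCAP2)` and the `HWCAP2_MTE` bit from the interface that was later stabilized in mainline Linux, which may differ from the experimental kernel branch this patch targeted, and `hwcapReportsMemoryTagging` and `shouldTag` are hypothetical names.

```cpp
#include <cassert>

#if defined(__linux__)
#include <sys/auxv.h> // getauxval, AT_HWCAP2
#endif

// Bit 18 of AT_HWCAP2 in the later mainline interface; defined here only if
// the system headers do not already provide it.
#ifndef HWCAP2_MTE
#define HWCAP2_MTE (1UL << 18)
#endif

// Returns true only on AArch64 Linux when the kernel advertises MTE.
inline bool hwcapReportsMemoryTagging() {
#if defined(__linux__) && defined(__aarch64__) && defined(AT_HWCAP2)
  return (getauxval(AT_HWCAP2) & HWCAP2_MTE) != 0;
#else
  return false;
#endif
}

// Tagging is on whenever the hardware supports it, unless the application
// has explicitly opted out (for example via malloc_disable_memory_tagging()).
inline bool shouldTag(bool HwcapSaysMte, bool DisabledByApp) {
  return HwcapSaysMte && !DisabledByApp;
}
```

Keeping the decision independent of whether tag checks are currently enabled means an allocation made before libc turns checks on is still tagged.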

Unit tests: pass. 61259 tests passed, 0 failed and 736 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Harbormaster failed remote builds in B43385: Diff 236479! · Jan 6 2020, 4:34 PM

kevin.brodsky added inline comments. · Jan 7 2020, 8:21 AM

compiler-rt/lib/scudo/standalone/memtag.h
31	OK this makes sense, always enabling tagging in Scudo and then disabling it explicitly should be fairly robust.
77	My understanding is that `volatile` and `"memory"` do different things: the former tells the compiler not to (re)move the `asm` statement, while the latter tells the compiler that the `asm` statement may perform reads or writes from arbitrary memory locations (forcing the compiler to reload values from memory as needed). GCC's manual has a rather good description of this: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
162	Technically `"memory"` is required even for memory reads. That said, in that case, the only thing that could affect `ldg` is a `st*g`, which is only done in another `asm volatile` statement, so `"memory"` is probably not absolutely needed here.
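
The difference between `volatile` and the `"memory"` clobber can be shown with an empty asm statement, which builds on any target. This is a GCC/Clang-only sketch with hypothetical function names; the real `st*g` statements in memtag.h are not reproduced here because they need an MTE-capable AArch64 toolchain.

```cpp
#include <cassert>

// `volatile` keeps the asm statement from being removed or moved relative to
// other volatile operations. The "memory" clobber additionally tells the
// compiler that the statement may read or write arbitrary memory, so the
// store to *P must be emitted before it and cached values reloaded after it.
inline void storeThenBarrier(int *P, int V) {
  assert(P != nullptr);
  *P = V;
  __asm__ __volatile__("" : : "r"(P) : "memory");
}

// The same clobber before a load forces the value to be read from memory
// rather than taken from a register the compiler believes is still valid.
inline int loadAfterBarrier(const int *P) {
  assert(P != nullptr);
  __asm__ __volatile__("" : : "r"(P) : "memory");
  return *P;
}
```

An asm statement that actually writes tags or data through `P` is in the same position as the empty one here: passing the pointer as an input does not tell the compiler that the pointed-to memory changes, which is why the clobber is needed.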

Add memory clobber

pcc marked an inline comment as done. · Jan 8 2020, 5:20 PM

Unit tests: pass. 61259 tests passed, 0 failed and 736 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Harbormaster failed remote builds in B43546: Diff 236940! · Jan 8 2020, 5:56 PM

Rebase

Unit tests: pass. 61908 tests passed, 0 failed and 782 were skipped.

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Harbormaster completed remote builds in B44106: Diff 238388. · Jan 15 2020, 4:52 PM

I think this looks good. I think this might not be Fuchsia compatible and could probably use some #if SCUDO_LINUX or ANDROID on top of the __aarch64__ checks.
Fuchsia will want memory tagging support at some point; I'll check the patch on the platform once the inconsistencies I saw are addressed.

compiler-rt/lib/scudo/standalone/combined.h
164	I am assuming the compiler will inline this most of the time, but could we put that `inline` or `FORCE_INLINE` to double down?
compiler-rt/lib/scudo/standalone/memtag.h
15	If I followed properly `memtag.h` is included on all platforms, but I am not sure `sys/*.h` is available everywhere (Fuchsia doesn't have one). So this probably requires some `#if SCUDO_LINUX` or something to that extent?

pcc marked 3 inline comments as done.Jan 16 2020, 12:36 PM

pcc added inline comments.

compiler-rt/lib/scudo/standalone/combined.h
164	`inline` is implicit on inline definitions (and somewhat confusingly in C++ `inline` doesn't really control inlining decisions, it's more of a linkage specification), so adding it here would have no effect. I'm personally not a fan of micro-optimizing inlining decisions like this, but I suppose it wouldn't hurt to add an `ALWAYS_INLINE` here.
compiler-rt/lib/scudo/standalone/memtag.h
15	Yes, this will need `SCUDO_LINUX`, done.
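
On the `inline` point raised for combined.h line 164 above, a small sketch may help. It assumes a GCC- or Clang-compatible compiler, and `ALWAYS_INLINE_SKETCH` is a hypothetical stand-in for whatever `ALWAYS_INLINE` expands to in scudo; it is not copied from the patch. A member function defined inside the class body is already implicitly `inline` in the linkage sense, so only an attribute like `always_inline` actually asks the compiler to inline calls.

```cpp
// Hypothetical macro for illustration only.
#if defined(__GNUC__) || defined(__clang__)
#define ALWAYS_INLINE_SKETCH inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE_SKETCH inline
#endif

struct Counter {
  int Value = 0;

  // Defined in the class body: implicitly `inline`, which permits the
  // definition to appear in several translation units. Adding the keyword
  // here would change nothing.
  int next() { return ++Value; }

  // The attribute is what requests inlining at each call site.
  ALWAYS_INLINE_SKETCH int peek() const { return Value; }
};
```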

And I verified that this patch works on Fuchsia (x64 and arm64).

Address review comments

Unit tests: fail. 61907 tests passed, 1 failed and 782 were skipped.

failed: Clang.CXX/temp/temp_arg/temp_arg_template/p3-2a.cpp

clang-tidy: unknown.

clang-format: pass.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Harbormaster failed remote builds in B44188: Diff 238578!Jan 16 2020, 12:57 PM

LGTM, Thanks for all the work Peter!

This revision is now accepted and ready to land.Jan 16 2020, 1:01 PM

Closed by commit rGc299d1981dea: scudo: Add initial memory tagging support. (authored by pcc). · Explain WhyJan 16 2020, 1:36 PM

This revision was automatically updated to reflect the committed changes.

bjope added a subscriber: bjope.Jan 20 2020, 5:00 AM

bjope added inline comments.

compiler-rt/lib/scudo/standalone/memtag.h
15	I also had to move the auxv include a few lines down (inside the `#if defined(ANDROID_EXPERIMENTAL_MTE)` guard) for things to build on my RedHat 6.10 server. Would it make sense to make that change upstream?

pcc marked an inline comment as done.Jan 21 2020, 12:20 PM

pcc added inline comments.

compiler-rt/lib/scudo/standalone/memtag.h
15	I don't think so, we will eventually need to include it unconditionally on Linux once ANDROID_EXPERIMENTAL_MTE goes away as mentioned in the commit message. I think it would be better to change the build system so that we don't build/test scudo on Linux machines without a sys/auxv.h.

bjope added inline comments.Jan 22 2020, 1:56 AM

compiler-rt/lib/scudo/standalone/memtag.h
15	Oh yes, that would be better of course. Although, I'm not that familiar with adding such checks, so I'm not sure exactly how to do it properly. I see sys/auxv.h being used in other places in compiler-rt as well, but maybe I don't hit those due to some existing build system checks for those parts (and maybe those checks aren't directed at checking for the presence of sys/auxv.h, but rather something else).

bjope added inline comments.Jan 29 2020, 8:25 AM

compiler-rt/lib/scudo/standalone/memtag.h
15	Proposed fix, checking that sys/auxv.h can be found: https://reviews.llvm.org/D73631
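
For comparison with a build-system check, the presence of the header can also be probed in the source itself. The sketch below is a different technique from the one in the linked revision and is only an illustration: it assumes a compiler that supports `__has_include` (standard since C++17), and `SKETCH_HAVE_SYS_AUXV_H` and `sketchGetHwcap` are hypothetical names that do not exist in scudo.

```cpp
// Probe for <sys/auxv.h> at preprocessing time instead of in the build system.
#if defined(__linux__) && defined(__has_include)
#if __has_include(<sys/auxv.h>)
#include <sys/auxv.h>
#define SKETCH_HAVE_SYS_AUXV_H 1
#endif
#endif
#ifndef SKETCH_HAVE_SYS_AUXV_H
#define SKETCH_HAVE_SYS_AUXV_H 0
#endif

// Hypothetical helper: returns the AT_HWCAP bits when the header is
// available, and 0 otherwise (for example on an old libc without
// sys/auxv.h, or on a non-Linux platform).
unsigned long sketchGetHwcap() {
#if SKETCH_HAVE_SYS_AUXV_H && defined(AT_HWCAP)
  return getauxval(AT_HWCAP);
#else
  return 0;
#endif
}
```

A check like this keeps the file compiling on hosts without the header, but it silently disables the feature there, which is the trade-off pcc argues against for the long term.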

Revision Contents

Path	Size
compiler-rt/lib/scudo/standalone/
allocator_config.h	2 lines
combined.h	129 lines
common.h	1 line
linux.cpp	8 lines
memtag.h	205 lines
primary32.h	4 lines
primary64.h	26 lines
tests/combined_test.cpp	90 lines
tests/primary_test.cpp	4 lines
wrappers_c.inc	4 lines

Diff 233175

compiler-rt/lib/scudo/standalone/allocator_config.h

Show All 34 Lines	#endif
typedef MapAllocator<> Secondary;		typedef MapAllocator<> Secondary;
template <class A> using TSDRegistryT = TSDRegistryExT<A>; // Exclusive		template <class A> using TSDRegistryT = TSDRegistryExT<A>; // Exclusive
};		};

struct AndroidConfig {		struct AndroidConfig {
using SizeClassMap = AndroidSizeClassMap;		using SizeClassMap = AndroidSizeClassMap;
#if SCUDO_CAN_USE_PRIMARY64		#if SCUDO_CAN_USE_PRIMARY64
// 1GB regions		// 1GB regions
typedef SizeClassAllocator64<SizeClassMap, 30U> Primary;		typedef SizeClassAllocator64<SizeClassMap, 30U, true> Primary;
		hctim: Maybe: `typedef SizeClassAllocator64<SizeClassMap, 30U, /*MaySupportMemoryTagging=*/ true> Primary;`?
		pcc: Will do
#else		#else
// 512KB regions		// 512KB regions
typedef SizeClassAllocator32<SizeClassMap, 19U> Primary;		typedef SizeClassAllocator32<SizeClassMap, 19U> Primary;
#endif		#endif
typedef MapAllocator<> Secondary;		typedef MapAllocator<> Secondary;
template <class A>		template <class A>
using TSDRegistryT = TSDRegistrySharedT<A, 2U>; // Shared, max 2 TSDs.		using TSDRegistryT = TSDRegistrySharedT<A, 2U>; // Shared, max 2 TSDs.
};		};
Show All 36 Lines

compiler-rt/lib/scudo/standalone/combined.h

Show All 9 Lines
#define SCUDO_COMBINED_H_		#define SCUDO_COMBINED_H_

#include "chunk.h"		#include "chunk.h"
#include "common.h"		#include "common.h"
#include "flags.h"		#include "flags.h"
#include "flags_parser.h"		#include "flags_parser.h"
#include "interface.h"		#include "interface.h"
#include "local_cache.h"		#include "local_cache.h"
		#include "memtag.h"
#include "quarantine.h"		#include "quarantine.h"
#include "report.h"		#include "report.h"
#include "secondary.h"		#include "secondary.h"
#include "tsd.h"		#include "tsd.h"

namespace scudo {		namespace scudo {

template <class Params> class Allocator {		template <class Params> class Allocator {
▲ Show 20 Lines • Show All 129 Lines • ▼ Show 20 Lines	public:
// - unlinking the local stats from the global ones (destroying the cache does		// - unlinking the local stats from the global ones (destroying the cache does
// the last two items).		// the last two items).
void commitBack(TSD<ThisT> *TSD) {		void commitBack(TSD<ThisT> *TSD) {
Quarantine.drain(&TSD->QuarantineCache,		Quarantine.drain(&TSD->QuarantineCache,
QuarantineCallback(*this, TSD->Cache));		QuarantineCallback(*this, TSD->Cache));
TSD->Cache.destroy(&Stats);		TSD->Cache.destroy(&Stats);
}		}

		void *untagPointerMaybe(void *Ptr) {
		cryptoad: nit on naming: I usually put the verb 1st and `Maybe` last. I might have been wrong with regard to LLVM style and can change the other names, but I'd like to keep the naming scheme consistent. Let me know which of the 2 is preferable.
		pcc: Yeah, the `Maybe` at the beginning is more consistent with the rest of LLVM. For now I'll put it at the end and we can change it later.
		cryptoad: I am assuming the compiler will inline this most of the time, but could we put that `inline` or `FORCE_INLINE` to double down?
		pcc: `inline` is implicit on inline definitions (and somewhat confusingly in C++ `inline` doesn't really control inlining decisions, it's more of a linkage specification), so adding it here would have no effect. I'm personally not a fan of micro-optimizing inlining decisions like this, but I suppose it wouldn't hurt to add an `ALWAYS_INLINE` here.
		if (Primary.SupportsMemoryTagging)
		return reinterpret_cast<void *>(
		untagPointer(reinterpret_cast<uptr>(Ptr)));
		return Ptr;
		}

NOINLINE void *allocate(uptr Size, Chunk::Origin Origin,		NOINLINE void *allocate(uptr Size, Chunk::Origin Origin,
uptr Alignment = MinAlignment,		uptr Alignment = MinAlignment,
bool ZeroContents = false) {		bool ZeroContents = false) {
initThreadMaybe();		initThreadMaybe();
ZeroContents |= static_cast<bool>(Options.ZeroContents);		ZeroContents |= static_cast<bool>(Options.ZeroContents);

if (UNLIKELY(Alignment > MaxAlignment)) {		if (UNLIKELY(Alignment > MaxAlignment)) {
if (Options.MayReturnNull)		if (Options.MayReturnNull)
Show All 18 Lines	if (UNLIKELY(Size >= MaxAllowedMallocSize)) {
if (Options.MayReturnNull)		if (Options.MayReturnNull)
return nullptr;		return nullptr;
reportAllocationSizeTooBig(Size, NeededSize, MaxAllowedMallocSize);		reportAllocationSizeTooBig(Size, NeededSize, MaxAllowedMallocSize);
}		}
DCHECK_LE(Size, NeededSize);		DCHECK_LE(Size, NeededSize);

void *Block;		void *Block;
uptr ClassId;		uptr ClassId;
uptr BlockEnd;		uptr SecondaryBlockEnd;
if (LIKELY(PrimaryT::canAllocate(NeededSize))) {		if (LIKELY(PrimaryT::canAllocate(NeededSize))) {
ClassId = SizeClassMap::getClassIdBySize(NeededSize);		ClassId = SizeClassMap::getClassIdBySize(NeededSize);
DCHECK_NE(ClassId, 0U);		DCHECK_NE(ClassId, 0U);
bool UnlockRequired;		bool UnlockRequired;
auto *TSD = TSDRegistry.getTSDAndLock(&UnlockRequired);		auto *TSD = TSDRegistry.getTSDAndLock(&UnlockRequired);
Block = TSD->Cache.allocate(ClassId);		Block = TSD->Cache.allocate(ClassId);
if (UnlockRequired)		if (UnlockRequired)
TSD->unlock();		TSD->unlock();
} else {		} else {
ClassId = 0;		ClassId = 0;
Block =		Block = Secondary.allocate(NeededSize, Alignment, &SecondaryBlockEnd,
Secondary.allocate(NeededSize, Alignment, &BlockEnd, ZeroContents);		ZeroContents);
}		}

if (UNLIKELY(!Block)) {		if (UNLIKELY(!Block)) {
if (Options.MayReturnNull)		if (Options.MayReturnNull)
return nullptr;		return nullptr;
reportOutOfMemory(NeededSize);		reportOutOfMemory(NeededSize);
}		}

// We only need to zero the contents for Primary backed allocations. This		const uptr BlockUptr = reinterpret_cast<uptr>(Block);
// condition is not necessarily unlikely, but since memset is costly, we		const uptr UnalignedUserPtr = BlockUptr + Chunk::getHeaderSize();
// might as well mark it as such.
if (UNLIKELY(ZeroContents && ClassId))
memset(Block, 0, PrimaryT::getSizeByClassId(ClassId));

const uptr UnalignedUserPtr =
reinterpret_cast<uptr>(Block) + Chunk::getHeaderSize();
const uptr UserPtr = roundUpTo(UnalignedUserPtr, Alignment);		const uptr UserPtr = roundUpTo(UnalignedUserPtr, Alignment);

		void *Ptr = reinterpret_cast<void *>(UserPtr);
		void *TaggedPtr = Ptr;
		if (ClassId) {
		// We only need to zero or tag the contents for Primary backed
		// allocations. We only set tags for primary allocations in order to avoid
		// faulting potentially large numbers of pages for large secondary
		// allocations. We assume that guard pages are enough to protect these
		// allocations.
		//
		// FIXME: When the kernel provides a way to set the background tag of a
		// mapping, we should be able to tag secondary allocations as well.
		//
		// When memory tagging is enabled, zeroing the contents is done as part of
		cryptoad: Maybe move this part of the comment to where the memset is?
		// setting the tag.
		if (UNLIKELY(useMemoryTagging())) {
		uptr PrevUserPtr;
		Chunk::UnpackedHeader Header;
		const uptr BlockEnd = BlockUptr + PrimaryT::getSizeByClassId(ClassId);
		// If possible, try to reuse the UAF tag that was set by deallocate().
		cryptoad: There already is a `BlockEnd` variable outside this scope used for the Secondary, maybe reuse it? It might be cleaner to initialize it where the allocate is for the Primary, but then performance will suffer due to the extra `getSizeByClassId` in the fast path, which is not ideal. If you want to keep it in this block, it seems to be `const`.
		pcc: I made this variable `const` and renamed the other one to reduce confusion.
		// For simplicity, only reuse tags if we have the same start address as
		// the previous allocation. This handles the majority of cases since
		// most allocations will not be more aligned than the minimum alignment.
		//
		// We need to handle situations involving partially reclaimed chunks,
		// and retag the reclaimed portions if necessary. There are two
		// possibilities for partial reclaiming:
		//
		// (1) Header was not reclaimed, all data was reclaimed (e.g. because
		// data started on a page boundary).
		// (2) Header was not reclaimed, data was partially reclaimed.
		//
		eugenis: Is it possible that header was reclaimed, but data was only partially reclaimed? I think the code is correct in this case, too, but consider updating the comment.
		pcc: Yes, that's possible, and it's handled in the same way as full reclaiming, which I forgot to comment on as well. I will add a comment for both.
		// We can detect case (1) by loading the tag from the start
		// of the chunk. If it is zero, it means that either all data was
		// reclaimed (since we never use zero as the chunk tag), or that the
		// previous allocation was of size zero. Either way, we need to prepare
		// a new chunk from scratch.
		//
		// We can detect case (2) by moving to the next page (if covered by the
		// chunk) and loading the tag of its first granule. If it is zero, it
		// means that all following pages may need to be retagged. On the other
		// hand, if it is nonzero, we can assume that all following pages are
		// still tagged, according to the logic that if any of the pages
		// following the next page were reclaimed, the next page would have been
		// reclaimed as well.
		uptr TaggedUserPtr;
		if (getChunkFromBlock(BlockUptr, &PrevUserPtr, &Header) &&
		PrevUserPtr == UserPtr &&
		(TaggedUserPtr = loadTag(UserPtr)) != UserPtr) {
		uptr PrevEnd = TaggedUserPtr + Header.SizeOrUnusedBytes;
		const uptr NextPage = roundUpTo(TaggedUserPtr, getPageSizeCached());
		if (NextPage < PrevEnd && loadTag(NextPage) != NextPage)
		cryptoad: `const`?
		PrevEnd = NextPage;
		TaggedPtr = reinterpret_cast<void *>(TaggedUserPtr);
		resizeTaggedChunk(PrevEnd, TaggedUserPtr + Size, BlockEnd);
		} else {
		TaggedPtr = prepareTaggedChunk(Ptr, Size, BlockEnd);
		}
		} else if (UNLIKELY(ZeroContents)) {
		// This condition is not necessarily unlikely, but since memset is
		// costly, we might as well mark it as such.
		memset(Block, 0, PrimaryT::getSizeByClassId(ClassId));
		}
		}

Chunk::UnpackedHeader Header = {};		Chunk::UnpackedHeader Header = {};
if (UNLIKELY(UnalignedUserPtr != UserPtr)) {		if (UNLIKELY(UnalignedUserPtr != UserPtr)) {
const uptr Offset = UserPtr - UnalignedUserPtr;		const uptr Offset = UserPtr - UnalignedUserPtr;
DCHECK_GE(Offset, 2 * sizeof(u32));		DCHECK_GE(Offset, 2 * sizeof(u32));
// The BlockMarker has no security purpose, but is specifically meant for		// The BlockMarker has no security purpose, but is specifically meant for
// the chunk iteration function that can be used in debugging situations.		// the chunk iteration function that can be used in debugging situations.
// It is the only situation where we have to locate the start of a chunk		// It is the only situation where we have to locate the start of a chunk
// based on its block address.		// based on its block address.
reinterpret_cast<u32 *>(Block)[0] = BlockMarker;		reinterpret_cast<u32 *>(Block)[0] = BlockMarker;
reinterpret_cast<u32 *>(Block)[1] = static_cast<u32>(Offset);		reinterpret_cast<u32 *>(Block)[1] = static_cast<u32>(Offset);
Header.Offset = (Offset >> MinAlignmentLog) & Chunk::OffsetMask;		Header.Offset = (Offset >> MinAlignmentLog) & Chunk::OffsetMask;
}		}
Header.ClassId = ClassId & Chunk::ClassIdMask;		Header.ClassId = ClassId & Chunk::ClassIdMask;
Header.State = Chunk::State::Allocated;		Header.State = Chunk::State::Allocated;
Header.Origin = Origin & Chunk::OriginMask;		Header.Origin = Origin & Chunk::OriginMask;
Header.SizeOrUnusedBytes = (ClassId ? Size : BlockEnd - (UserPtr + Size)) &		Header.SizeOrUnusedBytes =
		(ClassId ? Size : SecondaryBlockEnd - (UserPtr + Size)) &
Chunk::SizeOrUnusedBytesMask;		Chunk::SizeOrUnusedBytesMask;
void *Ptr = reinterpret_cast<void *>(UserPtr);
Chunk::storeHeader(Cookie, Ptr, &Header);		Chunk::storeHeader(Cookie, Ptr, &Header);

		hctim: In Chromium, ~11% of bugs are nonlinear (as determined with `Heap-buffer-flow READ|WRITE {}` over `Heap-buffer-flow` with a fixed deterministic size). The fixed size classes only go up to 24-byte allocations, so anything `24 < x <= [a page]` also lands in this bucket - but we're also not counting wild SEGVs or UBSan errors that allow for attacker-controlled offsets... I think it is worth it to have a tagged secondary - although I understand there are some performance implications of this. Maybe guarded behind a runtime flag?
		pcc: I'm not sure where you got the number 24 from. On Android we set MaxSizeLog to 17: http://llvm-cs.pcc.me.uk/projects/compiler-rt/lib/scudo/standalone/size_class_map.h#143 so any allocations <= 2^17 bytes will use the primary allocator. Tagged secondary would be nice, but I think I'd prefer to do it in a different way. Specifically, we might consider asking the kernel folks for a way to set the "background tag" of a mapping, so that any faulted pages in the mapping get the background tag. That way, we don't pay the cost of faulting up front.
		hctim: This is from linux fuzzing on Chromium - where ClusterFuzz actually buckets them based on the `READ|WRITE of size ###` from ASan. The idea sounds good - is it possible to update the comment to reflect this?
if (&__scudo_allocate_hook)		if (&__scudo_allocate_hook)
__scudo_allocate_hook(Ptr, Size);		__scudo_allocate_hook(TaggedPtr, Size);

return Ptr;		return TaggedPtr;
}		}

NOINLINE void deallocate(void *Ptr, Chunk::Origin Origin, uptr DeleteSize = 0,		NOINLINE void deallocate(void *Ptr, Chunk::Origin Origin, uptr DeleteSize = 0,
UNUSED uptr Alignment = MinAlignment) {		UNUSED uptr Alignment = MinAlignment) {
// For a deallocation, we only ensure minimal initialization, meaning thread		// For a deallocation, we only ensure minimal initialization, meaning thread
// local data will be left uninitialized for now (when using ELF TLS). The		// local data will be left uninitialized for now (when using ELF TLS). The
// fallback cache will be used instead. This is a workaround for a situation		// fallback cache will be used instead. This is a workaround for a situation
// where the only heap operation performed in a thread would be a free past		// where the only heap operation performed in a thread would be a free past
// the TLS destructors, ending up in initialized thread specific data never		// the TLS destructors, ending up in initialized thread specific data never
// being destroyed properly. Any other heap operation will do a full init.		// being destroyed properly. Any other heap operation will do a full init.
initThreadMaybe(/*MinimalInit=*/true);		initThreadMaybe(/*MinimalInit=*/true);

if (&__scudo_deallocate_hook)		if (&__scudo_deallocate_hook)
__scudo_deallocate_hook(Ptr);		__scudo_deallocate_hook(Ptr);

if (UNLIKELY(!Ptr))		if (UNLIKELY(!Ptr))
return;		return;
if (UNLIKELY(!isAligned(reinterpret_cast<uptr>(Ptr), MinAlignment)))		if (UNLIKELY(!isAligned(reinterpret_cast<uptr>(Ptr), MinAlignment)))
reportMisalignedPointer(AllocatorAction::Deallocating, Ptr);		reportMisalignedPointer(AllocatorAction::Deallocating, Ptr);

		Ptr = untagPointerMaybe(Ptr);
		eugenis: Do we want to touch memory with the tagged pointer first to catch double-free & invalid-free bugs?
		hctim: Should be handled below in the chunk header check, no?
		pcc: Yes, let's leave it to the header check.

Chunk::UnpackedHeader Header;		Chunk::UnpackedHeader Header;
Chunk::loadHeader(Cookie, Ptr, &Header);		Chunk::loadHeader(Cookie, Ptr, &Header);

if (UNLIKELY(Header.State != Chunk::State::Allocated))		if (UNLIKELY(Header.State != Chunk::State::Allocated))
reportInvalidChunkState(AllocatorAction::Deallocating, Ptr);		reportInvalidChunkState(AllocatorAction::Deallocating, Ptr);
if (Options.DeallocTypeMismatch) {		if (Options.DeallocTypeMismatch) {
if (Header.Origin != Origin) {		if (Header.Origin != Origin) {
// With the exception of memalign'd chunks, that can be still be free'd.		// With the exception of memalign'd chunks, that can be still be free'd.
Show All 11 Lines	NOINLINE void deallocate(void *Ptr, Chunk::Origin Origin, uptr DeleteSize = 0,
}		}

quarantineOrDeallocateChunk(Ptr, &Header, Size);		quarantineOrDeallocateChunk(Ptr, &Header, Size);
}		}

void *reallocate(void *OldPtr, uptr NewSize, uptr Alignment = MinAlignment) {		void *reallocate(void *OldPtr, uptr NewSize, uptr Alignment = MinAlignment) {
initThreadMaybe();		initThreadMaybe();

		void *OldTaggedPtr = OldPtr;
		OldPtr = untagPointerMaybe(OldPtr);

// The following cases are handled by the C wrappers.		// The following cases are handled by the C wrappers.
DCHECK_NE(OldPtr, nullptr);		DCHECK_NE(OldPtr, nullptr);
DCHECK_NE(NewSize, 0);		DCHECK_NE(NewSize, 0);

if (UNLIKELY(!isAligned(reinterpret_cast<uptr>(OldPtr), MinAlignment)))		if (UNLIKELY(!isAligned(reinterpret_cast<uptr>(OldPtr), MinAlignment)))
reportMisalignedPointer(AllocatorAction::Reallocating, OldPtr);		reportMisalignedPointer(AllocatorAction::Reallocating, OldPtr);

Chunk::UnpackedHeader OldHeader;		Chunk::UnpackedHeader OldHeader;
Show All 32 Lines	if (reinterpret_cast<uptr>(OldPtr) + NewSize <= BlockEnd) {
OldSize < NewSize ? NewSize - OldSize : OldSize - NewSize;		OldSize < NewSize ? NewSize - OldSize : OldSize - NewSize;
if (Delta <= SizeClassMap::MaxSize / 2) {		if (Delta <= SizeClassMap::MaxSize / 2) {
Chunk::UnpackedHeader NewHeader = OldHeader;		Chunk::UnpackedHeader NewHeader = OldHeader;
NewHeader.SizeOrUnusedBytes =		NewHeader.SizeOrUnusedBytes =
(ClassId ? NewSize		(ClassId ? NewSize
: BlockEnd - (reinterpret_cast<uptr>(OldPtr) + NewSize)) &		: BlockEnd - (reinterpret_cast<uptr>(OldPtr) + NewSize)) &
Chunk::SizeOrUnusedBytesMask;		Chunk::SizeOrUnusedBytesMask;
Chunk::compareExchangeHeader(Cookie, OldPtr, &NewHeader, &OldHeader);		Chunk::compareExchangeHeader(Cookie, OldPtr, &NewHeader, &OldHeader);
return OldPtr;		if (UNLIKELY(ClassId && useMemoryTagging()))
		resizeTaggedChunk(reinterpret_cast<uptr>(OldTaggedPtr) + OldSize,
		reinterpret_cast<uptr>(OldTaggedPtr) + NewSize,
		BlockEnd);
		return OldTaggedPtr;
		eugenis: UNLIKELY?
}		}
}		}

// Otherwise we allocate a new one, and deallocate the old one. Some		// Otherwise we allocate a new one, and deallocate the old one. Some
// allocators will allocate an even larger chunk (by a fixed factor) to		// allocators will allocate an even larger chunk (by a fixed factor) to
// allow for potential further in-place realloc. The gains of such a trick		// allow for potential further in-place realloc. The gains of such a trick
// are currently unclear.		// are currently unclear.
void *NewPtr = allocate(NewSize, Chunk::Origin::Malloc, Alignment);		void *NewPtr = allocate(NewSize, Chunk::Origin::Malloc, Alignment);
if (NewPtr) {		if (NewPtr) {
const uptr OldSize = getSize(OldPtr, &OldHeader);		const uptr OldSize = getSize(OldPtr, &OldHeader);
memcpy(NewPtr, OldPtr, Min(NewSize, OldSize));		memcpy(NewPtr, OldTaggedPtr, Min(NewSize, OldSize));
quarantineOrDeallocateChunk(OldPtr, &OldHeader, OldSize);		quarantineOrDeallocateChunk(OldPtr, &OldHeader, OldSize);
}		}
return NewPtr;		return NewPtr;
}		}

// TODO(kostyak): while this locks the Primary & Secondary, it still allows		// TODO(kostyak): while this locks the Primary & Secondary, it still allows
// pointers to be fetched from the TSD. We ultimately want to		// pointers to be fetched from the TSD. We ultimately want to
// lock the registry as well. For now, it's good enough.		// lock the registry as well. For now, it's good enough.
▲ Show 20 Lines • Show All 53 Lines • ▼ Show 20 Lines	auto Lambda = [this, From, To, Callback, Arg](uptr Block) {
if (Block < From || Block >= To)		if (Block < From || Block >= To)
return;		return;
uptr Chunk;		uptr Chunk;
Chunk::UnpackedHeader Header;		Chunk::UnpackedHeader Header;
if (getChunkFromBlock(Block, &Chunk, &Header) &&		if (getChunkFromBlock(Block, &Chunk, &Header) &&
Header.State == Chunk::State::Allocated)		Header.State == Chunk::State::Allocated)
Callback(Chunk, getSize(reinterpret_cast<void *>(Chunk), &Header), Arg);		Callback(Chunk, getSize(reinterpret_cast<void *>(Chunk), &Header), Arg);
};		};
Primary.iterateOverBlocks(Lambda);		Primary.iterateOverBlocks(Lambda);
		rankov: Should you use UNLIKELY here?
		pcc: malloc_iterate is a rarely used debugging function, so we shouldn't worry too much about this sort of thing here I think.
Secondary.iterateOverBlocks(Lambda);		Secondary.iterateOverBlocks(Lambda);
}		}

bool canReturnNull() {		bool canReturnNull() {
initThreadMaybe();		initThreadMaybe();
return Options.MayReturnNull;		return Options.MayReturnNull;
}		}

// TODO(kostyak): implement this as a "backend" to mallopt.		// TODO(kostyak): implement this as a "backend" to mallopt.
bool setOption(UNUSED uptr Option, UNUSED uptr Value) { return false; }		bool setOption(UNUSED uptr Option, UNUSED uptr Value) { return false; }

// Return the usable size for a given chunk. Technically we lie, as we just		// Return the usable size for a given chunk. Technically we lie, as we just
// report the actual size of a chunk. This is done to counteract code actively		// report the actual size of a chunk. This is done to counteract code actively
// writing past the end of a chunk (like sqlite3) when the usable size allows		// writing past the end of a chunk (like sqlite3) when the usable size allows
// for it, which then forces realloc to copy the usable size of a chunk as		// for it, which then forces realloc to copy the usable size of a chunk as
// opposed to its actual size.		// opposed to its actual size.
uptr getUsableSize(const void *Ptr) {		uptr getUsableSize(const void *Ptr) {
initThreadMaybe();		initThreadMaybe();
if (UNLIKELY(!Ptr))		if (UNLIKELY(!Ptr))
return 0;		return 0;
		Ptr = untagPointerMaybe(const_cast<void *>(Ptr));
Chunk::UnpackedHeader Header;		Chunk::UnpackedHeader Header;
Chunk::loadHeader(Cookie, Ptr, &Header);		Chunk::loadHeader(Cookie, Ptr, &Header);
// Getting the usable size of a chunk only makes sense if it's allocated.		// Getting the usable size of a chunk only makes sense if it's allocated.
if (UNLIKELY(Header.State != Chunk::State::Allocated))		if (UNLIKELY(Header.State != Chunk::State::Allocated))
reportInvalidChunkState(AllocatorAction::Sizing, const_cast<void *>(Ptr));		reportInvalidChunkState(AllocatorAction::Sizing, const_cast<void *>(Ptr));
return getSize(Ptr, &Header);		return getSize(Ptr, &Header);
}		}

void getStats(StatCounters S) {		void getStats(StatCounters S) {
initThreadMaybe();		initThreadMaybe();
Stats.get(S);		Stats.get(S);
}		}

// Returns true if the pointer provided was allocated by the current		// Returns true if the pointer provided was allocated by the current
// allocator instance, which is compliant with tcmalloc's ownership concept.		// allocator instance, which is compliant with tcmalloc's ownership concept.
// A corrupted chunk will not be reported as owned, which is WAI.		// A corrupted chunk will not be reported as owned, which is WAI.
bool isOwned(const void *Ptr) {		bool isOwned(const void *Ptr) {
initThreadMaybe();		initThreadMaybe();
if (!Ptr || !isAligned(reinterpret_cast<uptr>(Ptr), MinAlignment))		if (!Ptr || !isAligned(reinterpret_cast<uptr>(Ptr), MinAlignment))
		hctim: nit: newline after `if`?
return false;		return false;
		Ptr = untagPointerMaybe(const_cast<void *>(Ptr));
Chunk::UnpackedHeader Header;		Chunk::UnpackedHeader Header;
return Chunk::isValid(Cookie, Ptr, &Header) &&		return Chunk::isValid(Cookie, Ptr, &Header) &&
Header.State == Chunk::State::Allocated;		Header.State == Chunk::State::Allocated;
}		}

		bool useMemoryTagging() {
		return Primary.useMemoryTagging();
		}

		void disableMemoryTagging() {
		if (useMemoryTagging())
		disableMemoryTagChecks();
		Primary.disableMemoryTagging();
		}

private:		private:
using SecondaryT = typename Params::Secondary;		using SecondaryT = typename Params::Secondary;
typedef typename PrimaryT::SizeClassMap SizeClassMap;		typedef typename PrimaryT::SizeClassMap SizeClassMap;

static const uptr MinAlignmentLog = SCUDO_MIN_ALIGNMENT_LOG;		static const uptr MinAlignmentLog = SCUDO_MIN_ALIGNMENT_LOG;
static const uptr MaxAlignmentLog = 24U; // 16 MB seems reasonable.		static const uptr MaxAlignmentLog = 24U; // 16 MB seems reasonable.
static const uptr MinAlignment = 1UL << MinAlignmentLog;		static const uptr MinAlignment = 1UL << MinAlignmentLog;
static const uptr MaxAlignment = 1UL << MaxAlignmentLog;		static const uptr MaxAlignment = 1UL << MaxAlignmentLog;
static const uptr MaxAllowedMallocSize =		static const uptr MaxAllowedMallocSize =
FIRST_32_SECOND_64(1UL << 31, 1ULL << 40);		FIRST_32_SECOND_64(1UL << 31, 1ULL << 40);

static_assert(MinAlignment >= sizeof(Chunk::PackedHeader),		static_assert(MinAlignment >= sizeof(Chunk::PackedHeader),
		cryptoad: I think you could use `COMPILER_CHECK`, as it is used in other places. It ends up being a `static_assert` but it feels more consistent.
		pcc: The rest of LLVM uses `static_assert` directly. I'll change this one to `COMPILER_CHECK` but maybe this could be changed over to being consistent with LLVM as well?
		pcc: Left as is due to D70793
"Minimal alignment must at least cover a chunk header.");		"Minimal alignment must at least cover a chunk header.");
		static_assert(!PrimaryT::SupportsMemoryTagging ||
		MinAlignment >= archMemoryTagGranuleSize(), "");

static const u32 BlockMarker = 0x44554353U;		static const u32 BlockMarker = 0x44554353U;

GlobalStats Stats;		GlobalStats Stats;
TSDRegistryT TSDRegistry;		TSDRegistryT TSDRegistry;
PrimaryT Primary;		PrimaryT Primary;
SecondaryT Secondary;		SecondaryT Secondary;
QuarantineT Quarantine;		QuarantineT Quarantine;
▲ Show 20 Lines • Show All 61 Lines • ▼ Show 20 Lines	private:

ALWAYS_INLINE void initThreadMaybe(bool MinimalInit = false) {		ALWAYS_INLINE void initThreadMaybe(bool MinimalInit = false) {
TSDRegistry.initThreadMaybe(this, MinimalInit);		TSDRegistry.initThreadMaybe(this, MinimalInit);
}		}

void quarantineOrDeallocateChunk(void *Ptr, Chunk::UnpackedHeader *Header,		void quarantineOrDeallocateChunk(void *Ptr, Chunk::UnpackedHeader *Header,
uptr Size) {		uptr Size) {
Chunk::UnpackedHeader NewHeader = *Header;		Chunk::UnpackedHeader NewHeader = *Header;
		if (UNLIKELY(NewHeader.ClassId && useMemoryTagging())) {
		uptr TaggedBegin, TaggedEnd;
		setRandomTag(Ptr, Size, &TaggedBegin, &TaggedEnd);
		}
// If the quarantine is disabled, the actual size of a chunk is 0 or larger		// If the quarantine is disabled, the actual size of a chunk is 0 or larger
// than the maximum allowed, we return a chunk directly to the backend.		// than the maximum allowed, we return a chunk directly to the backend.
// Logical Or can be short-circuited, which introduces unnecessary		// Logical Or can be short-circuited, which introduces unnecessary
// conditional jumps, so use bitwise Or and let the compiler be clever.		// conditional jumps, so use bitwise Or and let the compiler be clever.
const bool BypassQuarantine = !Quarantine.getCacheSize() | !Size |		const bool BypassQuarantine = !Quarantine.getCacheSize() | !Size |
(Size > Options.QuarantineMaxChunkSize);		(Size > Options.QuarantineMaxChunkSize);
if (BypassQuarantine) {		if (BypassQuarantine) {
NewHeader.State = Chunk::State::Available;		NewHeader.State = Chunk::State::Available;
▲ Show 20 Lines • Show All 44 Lines • Show Last 20 Lines
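
The comment in `quarantineOrDeallocateChunk` above explains why the three bypass conditions are combined with bitwise Or instead of logical Or. A minimal standalone sketch of that pattern follows. The function `bypassQuarantine` and its parameter names are hypothetical stand-ins for `Quarantine.getCacheSize()`, `Size` and `Options.QuarantineMaxChunkSize`; this is not the Scudo code itself.

```cpp
#include <cstddef>

// Hypothetical standalone version of the BypassQuarantine computation.
// Every operand is already a bool (0 or 1), so bitwise Or gives the same
// answer as logical Or, but all three operands are always evaluated and the
// compiler is free to emit branch-free code.
inline bool bypassQuarantine(std::size_t CacheSize, std::size_t Size,
                             std::size_t MaxChunkSize) {
  return !CacheSize | !Size | (Size > MaxChunkSize);
}
```

The substitution is only safe because none of the operands has side effects; with side effects, dropping the short-circuit would change behaviour.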

compiler-rt/lib/scudo/standalone/common.h

	Show First 20 Lines • Show All 136 Lines • ▼ Show 20 Lines
	constexpr uptr MaxRandomLength = 256U;			constexpr uptr MaxRandomLength = 256U;
	bool getRandom(void *Buffer, uptr Length, bool Blocking = false);			bool getRandom(void *Buffer, uptr Length, bool Blocking = false);

	// Platform memory mapping functions.			// Platform memory mapping functions.

	#define MAP_ALLOWNOMEM (1U << 0)			#define MAP_ALLOWNOMEM (1U << 0)
	#define MAP_NOACCESS (1U << 1)			#define MAP_NOACCESS (1U << 1)
	#define MAP_RESIZABLE (1U << 2)			#define MAP_RESIZABLE (1U << 2)
				#define MAP_MEMTAG (1U << 3)

	// Our platform memory mapping use is restricted to 3 scenarios:			// Our platform memory mapping use is restricted to 3 scenarios:
	// - reserve memory at a random address (MAP_NOACCESS);			// - reserve memory at a random address (MAP_NOACCESS);
	// - commit memory in a previously reserved space;			// - commit memory in a previously reserved space;
	// - commit memory at a random address.			// - commit memory at a random address.
	// As such, only a subset of parameters combinations is valid, which is checked			// As such, only a subset of parameters combinations is valid, which is checked
	// by the function implementation. The Data parameter allows to pass opaque			// by the function implementation. The Data parameter allows to pass opaque
	// platform specific data to the function.			// platform specific data to the function.
	Show All 24 Lines

compiler-rt/lib/scudo/standalone/linux.cpp

	Show All 29 Lines

	#if SCUDO_ANDROID			#if SCUDO_ANDROID
	#include <sys/prctl.h>			#include <sys/prctl.h>
	// Definitions of prctl arguments to set a vma name in Android kernels.			// Definitions of prctl arguments to set a vma name in Android kernels.
	#define ANDROID_PR_SET_VMA 0x53564d41			#define ANDROID_PR_SET_VMA 0x53564d41
	#define ANDROID_PR_SET_VMA_ANON_NAME 0			#define ANDROID_PR_SET_VMA_ANON_NAME 0
	#endif			#endif

				#ifdef ANDROID_EXPERIMENTAL_MTE
				#include <bionic/mte_kernel.h>
				cryptoad: Does this work when compiled on Android but outside of Bionic?
				pcc: Yes, as long as you're building scudo as part of the Android platform (the header https://android.googlesource.com/platform/bionic/+/900d07d6a1f3e1eca8cdbb3b1db1ceeec0acc9e2/libc/platform/bionic/mte_kernel.h is deliberately not exposed in the NDK). Unfortunately this means that the MTE tests can't run as part of check-scudo, but the situation will hopefully improve once we no longer need `ANDROID_EXPERIMENTAL_MTE`.
				#endif

	namespace scudo {			namespace scudo {

	uptr getPageSize() { return static_cast<uptr>(sysconf(_SC_PAGESIZE)); }			uptr getPageSize() { return static_cast<uptr>(sysconf(_SC_PAGESIZE)); }

	void NORETURN die() { abort(); }			void NORETURN die() { abort(); }

	void *map(void *Addr, uptr Size, UNUSED const char *Name, uptr Flags,			void *map(void *Addr, uptr Size, UNUSED const char *Name, uptr Flags,
	UNUSED MapPlatformData *Data) {			UNUSED MapPlatformData *Data) {
	int MmapFlags = MAP_PRIVATE | MAP_ANONYMOUS;			int MmapFlags = MAP_PRIVATE | MAP_ANONYMOUS;
	int MmapProt;			int MmapProt;
	if (Flags & MAP_NOACCESS) {			if (Flags & MAP_NOACCESS) {
	MmapFlags |= MAP_NORESERVE;			MmapFlags |= MAP_NORESERVE;
	MmapProt = PROT_NONE;			MmapProt = PROT_NONE;
	} else {			} else {
	MmapProt = PROT_READ | PROT_WRITE;			MmapProt = PROT_READ | PROT_WRITE;
				#if defined(__aarch64__) && defined(ANDROID_EXPERIMENTAL_MTE)
				if (Flags & MAP_MEMTAG)
				MmapProt |= PROT_MTE;
				#endif
	}			}
	if (Addr) {			if (Addr) {
	// Currently no scenario for a noaccess mapping with a fixed address.			// Currently no scenario for a noaccess mapping with a fixed address.
	DCHECK_EQ(Flags & MAP_NOACCESS, 0);			DCHECK_EQ(Flags & MAP_NOACCESS, 0);
	MmapFlags \|= MAP_FIXED;			MmapFlags \|= MAP_FIXED;
	}			}
	void *P = mmap(Addr, Size, MmapProt, MmapFlags, -1, 0);			void *P = mmap(Addr, Size, MmapProt, MmapFlags, -1, 0);
	if (P == MAP_FAILED) {			if (P == MAP_FAILED) {
	▲ Show 20 Lines • Show All 111 Lines • Show Last 20 Lines
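
The change to `map()` above only adds `PROT_MTE` for accessible mappings that request `MAP_MEMTAG`. The sketch below isolates that flag translation as a pure function so it can be read and tested on its own. The names `kMapNoAccess`, `kMapMemtag`, `kProtMte` and `protForFlags` are hypothetical. The value `0x20` for `kProtMte` is an assumption taken from what mainline Linux later adopted for arm64; the experimental kernel header used in this review may define it differently. The `MteAvailable` parameter stands in for the `#if defined(__aarch64__) && defined(ANDROID_EXPERIMENTAL_MTE)` guard.

```cpp
#include <sys/mman.h>

// Hypothetical mirrors of the scudo flags defined in common.h.
constexpr unsigned kMapNoAccess = 1U << 1; // mirrors MAP_NOACCESS
constexpr unsigned kMapMemtag = 1U << 3;   // mirrors MAP_MEMTAG
// Assumed value of PROT_MTE (see the note above); not taken from this diff.
constexpr int kProtMte = 0x20;

// Returns the mmap protection bits for a scudo-style Flags word.
inline int protForFlags(unsigned Flags, bool MteAvailable) {
  // Reserved-only mappings are never accessible, so tagging is irrelevant.
  if (Flags & kMapNoAccess)
    return PROT_NONE;
  int Prot = PROT_READ | PROT_WRITE;
  // Only request tagged memory when the caller asked for it and the build
  // supports it.
  if (MteAvailable && (Flags & kMapMemtag))
    Prot |= kProtMte;
  return Prot;
}
```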

compiler-rt/lib/scudo/standalone/memtag.h

This file was added.

				//===-- memtag.h ------------------------------------------------*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef SCUDO_MEMTAG_H_
				#define SCUDO_MEMTAG_H_

				#include "internal_defs.h"

				#include <sys/auxv.h>
				#if defined(ANDROID_EXPERIMENTAL_MTE)
				cryptoad: If I followed properly, `memtag.h` is included on all platforms, but I am not sure `sys/*.h` is available everywhere (Fuchsia doesn't have one). So this probably requires some `#if SCUDO_LINUX` or something to that effect?
				pcc: Yes, this will need `SCUDO_LINUX`, done.
				bjope: I also had to move the auxv include a few lines down (inside the `#if defined(ANDROID_EXPERIMENTAL_MTE)` guard) for things to build on my RedHat 6.10 server. Would it make sense to make that change upstream?
				pcc: I don't think so; we will eventually need to include it unconditionally on Linux once ANDROID_EXPERIMENTAL_MTE goes away, as mentioned in the commit message. I think it would be better to change the build system so that we don't build/test scudo on Linux machines without a sys/auxv.h.
				bjope: Oh yes, that would be better of course. Although I'm not that familiar with adding such checks, so I'm not sure exactly how to do it properly. I see sys/auxv.h being used in other places in compiler-rt as well, but maybe I don't hit those due to some existing build system checks for those parts (and maybe those checks aren't directed at checking for the presence of sys/auxv.h, but rather something else).
				bjope: Proposed fix, checking that sys/auxv.h can be found: https://reviews.llvm.org/D73631
				#include <bionic/mte_kernel.h>
				#endif

				namespace scudo {

				#if defined(__aarch64__)

				inline constexpr bool archSupportsMemoryTagging() { return true; }
				inline constexpr size_t archMemoryTagGranuleSize() { return 16; }
				cryptoad: If you could use the `INLINE` macros for consistency please.
				pcc: Sure, again `inline` is what the rest of LLVM uses.
				pcc: Left as is due to D70793.

				cryptoad: I assume that when you do `roundUpTo(NewSize, 16)` and all the 16-related arithmetic, it's related to the granule size. Could it be constant'd out in the whole file?
				pcc: Probably not in the inline asm parts, but that only leaves a couple of places. The MTE-specific part of the code is small enough that I reckon it's clear enough to just write out the constant.
				inline bool systemSupportsMemoryTagging() {
				#if defined(ANDROID_EXPERIMENTAL_MTE)
				hctim: Can we move this ifdef inside of `systemSupportsMemoryTagging`?
				return getauxval(AT_HWCAP2) & HWCAP2_MTE;
				#else
				return false;
				#endif
				kevin.brodsky: I wonder if this is really a good thing. If libc fails to enable tag checking before the allocator is initialised (which is quite possible, given that until recently `malloc()` was called very early in Bionic's libc_init), then Scudo will not tag anything. Wouldn't it be possible instead to explicitly ask Scudo to use tagging when it is initialised? This would also be more consistent with the `malloc_disable_memory_tagging()` interface: Scudo does not take care of enabling / disabling tag checking, so arguably it shouldn't check if it is enabled either.
				pcc: So the motivation behind adding the check here was along the lines of "if the application doesn't enable memory tag checks then there's no point in enabling memory tagging". But I can see the value in decoupling these two things, especially since the libc might not have a chance to turn on memory tagging before the first allocation. I'll change this to be purely based on the hwcap and let the application call `malloc_disable_memory_tagging()` early if it doesn't want to use memory tagging. I'd prefer not to add an explicit initialization step to the allocator since the allocation functions can in principle be called at any time, so we'd need additional complexity to handle the case where the allocation functions were called before the allocator was formally initialized.
				pcc: I forgot about the other reason why I added this check, which was to avoid breaking the tests if the libc did not issue the prctl to enable memory tag checks. This seems like something that can be properly checked for in the tests themselves, which I've now done.
				kevin.brodsky: OK, this makes sense; always enabling tagging in Scudo and then disabling it explicitly should be fairly robust.
				}

				inline void disableMemoryTagChecks() {
				__asm__ __volatile__(".arch_extension mte; msr tco, #1");
				}
				eugenis: I don't think it is the job of a memory allocator to mess with PSTATE.TCO. Let the caller deal with it?
				pcc: I can see the argument either way. On one hand, TCO is a thread-wide property which the caller could arguably be considered responsible for. On the other, disabling tag checks is required in order to avoid tag check failures for future allocations which reuse previously tagged chunks, so to a certain extent it makes sense to perform an operation that is required in order to keep the allocator working, even if it involves setting what is technically a thread-wide property. The operation that we perform depends on the allocator's implementation, so arguably it belongs with the allocator. If for example we switched to mprotecting without PROT_MTE when tag checks are disabled, there would be no need to set TCO here. If you still think that it belongs in the caller, maybe we could document that setting TCO is required, but this may change in the future.
				eugenis: OK, SGTM, let's keep it here.
				rankov: PSTATE.TCO can be easily changed by other code, so it is not a good choice for disabling tag checks in this case. Also, the current kernel documentation says that PSTATE.TCO is reset to 0 in signal handlers. Another concern with disabling tagging is that when memory is deallocated, the tags will remain, so enabling tag checks again is dangerous. This should be documented. An alternative is to untag memory on deallocation after tagging is disabled, but this will add cost. I think that it would be better to leave disabling tag checks to the caller. The tests could use PSTATE.TCO; other callers might want to use other ways, like calling a prctl to disable tag checks.
				pcc: Yes, I had intended to switch this over to using a prctl once the kernel patches to support prctl became available. Now that I think about it, it would be a good idea to use PSTATE.TCO instead of prctl in the tests in order to avoid interfering with the "real" allocator's tag checking state during the tests. The fact that PSTATE.TCO is set to 0 during a signal handler is a good reason why the real allocator should use prctl and not TCO to disable tag checks. I think that's a good argument for moving the functionality into the caller. I'll do that then. I'll also make it clear that this is a one-way operation and tag checks should not be turned back on.

				inline void enableMemoryTagChecksTestOnly() {
				__asm__ __volatile__(".arch_extension mte; msr tco, #0");
				}

				inline uptr untagPointer(uptr Ptr) { return Ptr & ((1ULL << 56) - 1); }

				inline void setRandomTag(void *Ptr, uptr Size, uptr *TaggedBegin,
				uptr *TaggedEnd) {
				void *End;
				__asm__ __volatile__(
				R"(
				.arch_extension mte

				// Set a random tag for Ptr in TaggedPtr. This needs to happen even if
				hctim: These asm stubs seem mostly abstractable, which would allow us to extend to future platforms more easily and make the intermediate (read: non-MTE instructions) code easier to maintain. Looks like we could abstract away to `storeZeroTag` and `randomTagMemory` (or similar).
				pcc: I created `setRandomTag` since I needed it for tag-on-free, although I feel that it's better to keep things at the chunk level here, as splitting things into too many pieces can make it harder to understand the big picture.
				rankov: Instead of using assembly here, you could use functions from arm_acle.h.
				pcc: Without `__attribute__((target("mte")))` on these functions I get errors such as `fatal error: error in backend: Cannot select: intrinsic %llvm.aarch64.ldg` if I try to use the intrinsics. And with that attribute the functions don't get inlined, because LLVM IR doesn't support scoping the availability of the instructions to a block (unlike `.arch_extension` in inline asm). So I'd prefer to stick with inline asm for now.
				// Size = 0 so that TaggedPtr ends up pointing at a valid address.
				irg %[TaggedPtr], %[Ptr]
				mov %[Cur], %[TaggedPtr]

				// Skip the loop if Size = 0. We don't want to do any tagging in this case.
				cbz %[Size], 2f

				// Set the memory tag of the region
				// [TaggedPtr, TaggedPtr + roundUpTo(Size, 16))
				// to the pointer tag stored in TaggedPtr.
				add %[End], %[TaggedPtr], %[Size]

				1:
				stzg %[Cur], [%[Cur]], #16
				cmp %[Cur], %[End]
				b.lt 1b

				2:
				)"
				: [ TaggedPtr ] "=&r"(*TaggedBegin), [ Cur ] "=&r"(*TaggedEnd), [ End ] "=&r"(End)
				: [ Ptr ] "r"(Ptr), [ Size ] "r"(Size));
				}

				inline void *prepareTaggedChunk(void *Ptr, size_t Size, uptr BlockEnd) {
				// Prepare the granule before the chunk to store the chunk header by setting
				// its tag to 0. Normally its tag will already be 0, but in the case where a
				kevin.brodsky: Since this asm statement is modifying memory, is it safe to use it without a "memory" clobber? It certainly isn't safe in general. Same comment for the other asm statements that use `stg`.
				pcc: I was somehow under the impression that `volatile` implied the `"memory"` clobber, but that doesn't appear to be backed up by the documentation, so I'll add the clobbers here and elsewhere.
				kevin.brodsky: My understanding is that `volatile` and `"memory"` do different things: the former tells the compiler not to (re)move the `asm` statement, while the latter tells the compiler that the `asm` statement may perform reads or writes from arbitrary memory locations (forcing the compiler to reload values from memory as needed). GCC's manual has a rather good description of this: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
				// chunk holding a low alignment allocation is reused for a higher alignment
				// allocation, the chunk may already have a non-zero tag from the previous
				// allocation.
				__asm__ __volatile__(".arch_extension mte; stg %0, [%0, #-16]" : : "r"(Ptr));

				uptr TaggedBegin, TaggedEnd;
				setRandomTag(Ptr, Size, &TaggedBegin, &TaggedEnd);

				// Finally, set the tag of the granule past the end of the allocation to 0,
				// to catch linear overflows even if a previous larger allocation used the
				// same block and tag. Only do this if the granule past the end is in our
				// block, because this would otherwise lead to a SEGV if the allocation
				// covers the entire block and our block is at the end of a mapping. The tag
				// of the next block's header granule will be set to 0, so it will serve the
				// purpose of catching linear overflows in this case.
				uptr UntaggedEnd = untagPointer(TaggedEnd);
				if (UntaggedEnd != BlockEnd)
				__asm__ __volatile__(".arch_extension mte; stg %0, [%0]"
				:
				: "r"(UntaggedEnd));
				return reinterpret_cast<void *>(TaggedBegin);
				hctim: `Size % 16 == 0` always here, so this could just be `UntaggedEnd = Ptr + Size`?
				pcc: The caller isn't rounding `Size` as far as I can tell, so this isn't guaranteed to be the case.
				}
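
The comments in `prepareTaggedChunk` above describe three steps: zero the tag of the header granule, tag the allocation granules, and zero the tag of the granule just past the end unless the allocation reaches the end of the block. The following is a software model of those steps, intended only to make the granule arithmetic concrete. It does not use MTE instructions; `Tags`, `kGranule` and `prepareTaggedChunkModel` are hypothetical names, the tag is passed in rather than chosen randomly, and `Ptr` and `BlockEnd` are byte offsets into a single block rather than real addresses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kGranule = 16; // MTE tag granule size in bytes

inline std::size_t roundUpTo(std::size_t X, std::size_t Boundary) {
  return (X + Boundary - 1) & ~(Boundary - 1);
}

// Tags[i] models the memory tag of bytes [i * 16, (i + 1) * 16) of one block.
// Ptr must be 16-byte aligned and at least 16, so a header granule exists.
inline void prepareTaggedChunkModel(std::vector<std::uint8_t> &Tags,
                                    std::size_t Ptr, std::size_t Size,
                                    std::size_t BlockEnd, std::uint8_t Tag) {
  // Step 1: the granule before the chunk holds the header and gets tag 0.
  Tags[Ptr / kGranule - 1] = 0;

  // Step 2: tag [Ptr, Ptr + roundUpTo(Size, 16)); nothing happens if Size == 0.
  const std::size_t End = Ptr + roundUpTo(Size, kGranule);
  for (std::size_t Cur = Ptr; Cur < End; Cur += kGranule)
    Tags[Cur / kGranule] = Tag;

  // Step 3: zero the tag of the granule past the end, but only if that
  // granule still belongs to this block.
  if (End != BlockEnd)
    Tags[End / kGranule] = 0;
}
```

For a 128-byte block whose granules all carry a stale tag, a 40-byte allocation at offset 16 tags three granules (48 bytes) and leaves a zero-tagged granule on each side, which is what makes a linear overflow fault deterministically.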

				inline void resizeTaggedChunk(uptr OldPtr, uptr NewPtr, uptr BlockEnd) {
				uptr RoundOldPtr = roundUpTo(OldPtr, 16);
				if (RoundOldPtr >= NewPtr) {
				// If the allocation is shrinking we just need to set the tag past the end
				// of the allocation to 0. See explanation in prepareTaggedChunk above.
				uptr RoundNewPtr = untagPointer(roundUpTo(NewPtr, 16));
				if (RoundNewPtr != BlockEnd)
				__asm__ __volatile__(".arch_extension mte; stg %0, [%0]"
				:
				hctim: 13.8% of Chromium fuzzing-found heap OOB are > 16 bytes stride. Given that this is primary-only, the cost of retagging the `OldChunk - NewChunk` might be an acceptable performance penalty.
				pcc: This seems more like something that we'd want to do in precise mode than in mitigation mode. Recall that even with a large stride we (probabilistically) can't go outside of the bounds of the chunk. The chunk is cleared during dealloc with stzg, so there's no info leak potential either.
				: "r"(RoundNewPtr));
				return;
				}

				__asm__ __volatile__(R"(
				.arch_extension mte

				// Set the memory tag of the region
				// [roundUpTo(OldPtr, 16), roundUpTo(NewPtr, 16))
				// to the pointer tag stored in OldPtr.
				1:
				stzg %[Cur], [%[Cur]], #16
				cmp %[Cur], %[End]
				b.lt 1b

				// Finally, set the tag of the granule past the end of the allocation to 0.
				and %[Cur], %[Cur], #(1 << 56) - 1
				cmp %[Cur], %[BlockEnd]
				b.eq 2f
				stg %[Cur], [%[Cur]]

				2:
				)"
				: [ Cur ] "+&r"(RoundOldPtr), [ End ] "+&r"(NewPtr)
				: [ BlockEnd ] "r"(BlockEnd));
				}

				inline uptr tagPointer(uptr UntaggedPtr, uptr Tag) {
				return UntaggedPtr | (Tag & (0xfUL << 56));
				}

				inline uptr loadTag(uptr Ptr) {
				uptr TaggedPtr = Ptr;
				__asm__ __volatile__(".arch_extension mte; ldg %0, [%0]"
				: "+r"(TaggedPtr));
				return TaggedPtr;
				}

				#else

				inline constexpr bool archSupportsMemoryTagging() { return false; }

				inline bool systemSupportsMemoryTagging() {
				UNREACHABLE("memory tagging not supported");
				}

				inline size_t archMemoryTagGranuleSize() {
				UNREACHABLE("memory tagging not supported");
				}

				inline void disableMemoryTagChecks() {
				UNREACHABLE("memory tagging not supported");
				}
				kevin.brodsky: Technically `"memory"` is required even for memory reads. That said, in this case the only thing that could affect `ldg` is a `stg`, which is only done in another `asm volatile` statement, so `"memory"` is probably not absolutely needed here.

				inline void enableMemoryTagChecksTestOnly() {
				UNREACHABLE("memory tagging not supported");
				}

				inline uptr untagPointer(uptr Ptr) {
				(void)Ptr;
				UNREACHABLE("memory tagging not supported");
				}

				inline void setRandomTag(void *Ptr, uptr Size, uptr *TaggedBegin,
				uptr *TaggedEnd) {
				(void)Ptr;
				(void)Size;
				(void)TaggedBegin;
				(void)TaggedEnd;
				UNREACHABLE("memory tagging not supported");
				}

				inline void *prepareTaggedChunk(void *Ptr, size_t Size, uptr BlockEnd) {
				(void)Ptr;
				(void)Size;
				(void)BlockEnd;
				UNREACHABLE("memory tagging not supported");
				}

				inline void resizeTaggedChunk(uptr OldPtr, uptr NewPtr, uptr BlockEnd) {
				(void)OldPtr;
				(void)NewPtr;
				(void)BlockEnd;
				UNREACHABLE("memory tagging not supported");
				}

				inline uptr loadTag(uptr Ptr) {
				(void)Ptr;
				UNREACHABLE("memory tagging not supported");
				}

				#endif

				}

				#endif
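
The `untagPointer` and `tagPointer` helpers in memtag.h above are plain bit arithmetic on the top byte of a 64-bit pointer, so they can be exercised without MTE hardware. The sketch below restates them as `constexpr` functions over `std::uint64_t`; the type alias `uptr` is redefined locally as an assumption of a 64-bit target, and the sample addresses are made up. Note that the `Tag` argument of `tagPointer` is itself a pointer-shaped value (for example the result of `loadTag`), from which only bits 56 to 59 are copied.

```cpp
#include <cstdint>

using uptr = std::uint64_t; // assumption: 64-bit address space

// Clears the top byte (bits 56..63), which holds the tag on AArch64.
constexpr uptr untagPointer(uptr Ptr) { return Ptr & ((1ULL << 56) - 1); }

// Copies the 4-bit tag held in bits 56..59 of Tag into UntaggedPtr.
constexpr uptr tagPointer(uptr UntaggedPtr, uptr Tag) {
  return UntaggedPtr | (Tag & (0xfULL << 56));
}

// A made-up address with tag 0xA in bits 56..59.
constexpr uptr SampleTagged = 0x0A007fff12345670ULL;
static_assert(untagPointer(SampleTagged) == 0x00007fff12345670ULL,
              "untagging clears the top byte");
static_assert(tagPointer(untagPointer(SampleTagged), SampleTagged) ==
                  SampleTagged,
              "re-tagging with the same tag round-trips");
```

Because the mask in `tagPointer` covers only four bits, any unrelated bits in the upper nibble of the top byte of `Tag` are dropped rather than copied.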

compiler-rt/lib/scudo/standalone/primary32.h

Show All 40 Lines
template <class SizeClassMapT, uptr RegionSizeLog> class SizeClassAllocator32 {		template <class SizeClassMapT, uptr RegionSizeLog> class SizeClassAllocator32 {
public:		public:
typedef SizeClassMapT SizeClassMap;		typedef SizeClassMapT SizeClassMap;
// Regions should be large enough to hold the largest Block.		// Regions should be large enough to hold the largest Block.
static_assert((1UL << RegionSizeLog) >= SizeClassMap::MaxSize, "");		static_assert((1UL << RegionSizeLog) >= SizeClassMap::MaxSize, "");
typedef SizeClassAllocator32<SizeClassMapT, RegionSizeLog> ThisT;		typedef SizeClassAllocator32<SizeClassMapT, RegionSizeLog> ThisT;
typedef SizeClassAllocatorLocalCache<ThisT> CacheT;		typedef SizeClassAllocatorLocalCache<ThisT> CacheT;
typedef typename CacheT::TransferBatch TransferBatch;		typedef typename CacheT::TransferBatch TransferBatch;
		static const bool SupportsMemoryTagging = false;

static uptr getSizeByClassId(uptr ClassId) {		static uptr getSizeByClassId(uptr ClassId) {
return (ClassId == SizeClassMap::BatchClassId)		return (ClassId == SizeClassMap::BatchClassId)
? sizeof(TransferBatch)		? sizeof(TransferBatch)
: SizeClassMap::getSizeByClassId(ClassId);		: SizeClassMap::getSizeByClassId(ClassId);
}		}

static bool canAllocate(uptr Size) { return Size <= SizeClassMap::MaxSize; }		static bool canAllocate(uptr Size) { return Size <= SizeClassMap::MaxSize; }
▲ Show 20 Lines • Show All 111 Lines • ▼ Show 20 Lines	for (uptr I = 0; I < NumClasses; I++) {
continue;		continue;
SizeClassInfo *Sci = getSizeClassInfo(I);		SizeClassInfo *Sci = getSizeClassInfo(I);
ScopedLock L(Sci->Mutex);		ScopedLock L(Sci->Mutex);
TotalReleasedBytes += releaseToOSMaybe(Sci, I, /*Force=*/true);		TotalReleasedBytes += releaseToOSMaybe(Sci, I, /*Force=*/true);
}		}
return TotalReleasedBytes;		return TotalReleasedBytes;
}		}

		bool useMemoryTagging() { return false; }
		void disableMemoryTagging() {}

private:		private:
static const uptr NumClasses = SizeClassMap::NumClasses;		static const uptr NumClasses = SizeClassMap::NumClasses;
static const uptr RegionSize = 1UL << RegionSizeLog;		static const uptr RegionSize = 1UL << RegionSizeLog;
static const uptr NumRegions = SCUDO_MMAP_RANGE_SIZE >> RegionSizeLog;		static const uptr NumRegions = SCUDO_MMAP_RANGE_SIZE >> RegionSizeLog;
#if SCUDO_WORDSIZE == 32U		#if SCUDO_WORDSIZE == 32U
typedef FlatByteMap<NumRegions> ByteMap;		typedef FlatByteMap<NumRegions> ByteMap;
#else		#else
typedef TwoLevelByteMap<(NumRegions >> 12), 1UL << 12> ByteMap;		typedef TwoLevelByteMap<(NumRegions >> 12), 1UL << 12> ByteMap;
▲ Show 20 Lines • Show All 233 Lines • Show Last 20 Lines
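
primary32.h hard-codes its tagging hooks to `false`, whereas primary64.h in this same diff gates tagging on three things: a template parameter, an architecture check, and a runtime flag that can only be switched off. A reduced sketch of that gating is shown here. `TaggingGate` is a hypothetical class, and the architecture check and the hwcap query are passed in as the `ArchSupport` template parameter and the `SystemSupport` argument (assumptions made so the sketch is deterministic on any host) instead of calling `archSupportsMemoryTagging()` and `systemSupportsMemoryTagging()`.

```cpp
// Hypothetical reduction of the tagging switches in SizeClassAllocator64.
template <bool MaySupportMemoryTagging, bool ArchSupport> class TaggingGate {
public:
  // Compile-time gate: both the allocator config and the target must opt in.
  static constexpr bool SupportsMemoryTagging =
      MaySupportMemoryTagging && ArchSupport;

  // Runtime gate: SystemSupport models the hwcap query made during init.
  void init(bool SystemSupport) {
    if (SupportsMemoryTagging)
      UseMemoryTagging = SystemSupport;
  }

  bool useMemoryTagging() const {
    return SupportsMemoryTagging && UseMemoryTagging;
  }

  // One-way switch: there is deliberately no way to turn tagging back on.
  void disableMemoryTagging() { UseMemoryTagging = false; }

private:
  bool UseMemoryTagging = false;
};
```

Keeping `SupportsMemoryTagging` a compile-time constant lets the compiler drop the tagging paths entirely for configurations that can never use them.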

compiler-rt/lib/scudo/standalone/primary64.h

//===-- primary64.h ---------------------------------------------*- C++ -*-===//		//===-- primary64.h ---------------------------------------------*- C++ -*-===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef SCUDO_PRIMARY64_H_		#ifndef SCUDO_PRIMARY64_H_
#define SCUDO_PRIMARY64_H_		#define SCUDO_PRIMARY64_H_

#include "bytemap.h"		#include "bytemap.h"
#include "common.h"		#include "common.h"
#include "list.h"		#include "list.h"
#include "local_cache.h"		#include "local_cache.h"
		#include "memtag.h"
#include "release.h"		#include "release.h"
#include "stats.h"		#include "stats.h"
#include "string_utils.h"		#include "string_utils.h"

namespace scudo {		namespace scudo {

// SizeClassAllocator64 is an allocator tuned for 64-bit address space.		// SizeClassAllocator64 is an allocator tuned for 64-bit address space.
//		//
Show All 9 Lines
//		//
// The 1st Region (for size class 0) holds the TransferBatches. This is a		// The 1st Region (for size class 0) holds the TransferBatches. This is a
// structure used to transfer arrays of available pointers from the class size		// structure used to transfer arrays of available pointers from the class size
// freelist to the thread specific freelist, and back.		// freelist to the thread specific freelist, and back.
//		//
// The memory used by this allocator is never unmapped, but can be partially		// The memory used by this allocator is never unmapped, but can be partially
// released if the platform allows for it.		// released if the platform allows for it.

template <class SizeClassMapT, uptr RegionSizeLog> class SizeClassAllocator64 {		template <class SizeClassMapT, uptr RegionSizeLog,
		bool MaySupportMemoryTagging = false>
		cryptoad: Maybe default to `false`?
		pcc: Yeah, that's a good idea, I'll do that.
		class SizeClassAllocator64 {
public:		public:
typedef SizeClassMapT SizeClassMap;		typedef SizeClassMapT SizeClassMap;
typedef SizeClassAllocator64<SizeClassMap, RegionSizeLog> ThisT;		typedef SizeClassAllocator64<SizeClassMap, RegionSizeLog,
		MaySupportMemoryTagging>
		ThisT;
typedef SizeClassAllocatorLocalCache<ThisT> CacheT;		typedef SizeClassAllocatorLocalCache<ThisT> CacheT;
typedef typename CacheT::TransferBatch TransferBatch;		typedef typename CacheT::TransferBatch TransferBatch;
		static const bool SupportsMemoryTagging =
		MaySupportMemoryTagging && archSupportsMemoryTagging();

static uptr getSizeByClassId(uptr ClassId) {		static uptr getSizeByClassId(uptr ClassId) {
return (ClassId == SizeClassMap::BatchClassId)		return (ClassId == SizeClassMap::BatchClassId)
? sizeof(TransferBatch)		? sizeof(TransferBatch)
: SizeClassMap::getSizeByClassId(ClassId);		: SizeClassMap::getSizeByClassId(ClassId);
}		}

static bool canAllocate(uptr Size) { return Size <= SizeClassMap::MaxSize; }		static bool canAllocate(uptr Size) { return Size <= SizeClassMap::MaxSize; }
Show All 25 Lines	for (uptr I = 0; I < NumClasses; I++) {
// limit is mostly arbitrary and based on empirical observations.		// limit is mostly arbitrary and based on empirical observations.
// TODO(kostyak): make the lower limit a runtime option		// TODO(kostyak): make the lower limit a runtime option
Region->CanRelease = (ReleaseToOsInterval >= 0) &&		Region->CanRelease = (ReleaseToOsInterval >= 0) &&
(I != SizeClassMap::BatchClassId) &&		(I != SizeClassMap::BatchClassId) &&
(getSizeByClassId(I) >= (PageSize / 32));		(getSizeByClassId(I) >= (PageSize / 32));
Region->RandState = getRandomU32(&Seed);		Region->RandState = getRandomU32(&Seed);
}		}
ReleaseToOsIntervalMs = ReleaseToOsInterval;		ReleaseToOsIntervalMs = ReleaseToOsInterval;

		if (SupportsMemoryTagging)
		UseMemoryTagging = systemSupportsMemoryTagging();
}		}
void init(s32 ReleaseToOsInterval) {		void init(s32 ReleaseToOsInterval) {
memset(this, 0, sizeof(*this));		memset(this, 0, sizeof(*this));
initLinkerInitialized(ReleaseToOsInterval);		initLinkerInitialized(ReleaseToOsInterval);
}		}

void unmapTestOnly() {		void unmapTestOnly() {
unmap(reinterpret_cast<void *>(PrimaryBase), PrimarySize, UNMAP_ALL, &Data);		unmap(reinterpret_cast<void *>(PrimaryBase), PrimarySize, UNMAP_ALL, &Data);
▲ Show 20 Lines • Show All 79 Lines • ▼ Show 20 Lines	for (uptr I = 0; I < NumClasses; I++) {
continue;		continue;
RegionInfo *Region = getRegionInfo(I);		RegionInfo *Region = getRegionInfo(I);
ScopedLock L(Region->Mutex);		ScopedLock L(Region->Mutex);
TotalReleasedBytes += releaseToOSMaybe(Region, I, /*Force=*/true);		TotalReleasedBytes += releaseToOSMaybe(Region, I, /*Force=*/true);
}		}
return TotalReleasedBytes;		return TotalReleasedBytes;
}		}

		bool useMemoryTagging() const {
		hctim: Nit: line length
		return SupportsMemoryTagging && UseMemoryTagging;
		}
		void disableMemoryTagging() { UseMemoryTagging = false; }

private:		private:
static const uptr RegionSize = 1UL << RegionSizeLog;		static const uptr RegionSize = 1UL << RegionSizeLog;
static const uptr NumClasses = SizeClassMap::NumClasses;		static const uptr NumClasses = SizeClassMap::NumClasses;
static const uptr PrimarySize = RegionSize * NumClasses;		static const uptr PrimarySize = RegionSize * NumClasses;

// Call map for user memory with at least this size.		// Call map for user memory with at least this size.
static const uptr MapSizeIncrement = 1UL << 17;		static const uptr MapSizeIncrement = 1UL << 17;
// Fill at most this number of batches from the newly map'd memory.		// Fill at most this number of batches from the newly map'd memory.
Show All 25 Lines	struct ALIGNED(SCUDO_CACHE_LINE_SIZE) RegionInfo {
ReleaseToOsInfo ReleaseInfo;		ReleaseToOsInfo ReleaseInfo;
};		};
static_assert(sizeof(RegionInfo) % SCUDO_CACHE_LINE_SIZE == 0, "");		static_assert(sizeof(RegionInfo) % SCUDO_CACHE_LINE_SIZE == 0, "");

uptr PrimaryBase;		uptr PrimaryBase;
RegionInfo *RegionInfoArray;		RegionInfo *RegionInfoArray;
MapPlatformData Data;		MapPlatformData Data;
s32 ReleaseToOsIntervalMs;		s32 ReleaseToOsIntervalMs;
		bool UseMemoryTagging;

RegionInfo *getRegionInfo(uptr ClassId) const {		RegionInfo *getRegionInfo(uptr ClassId) const {
DCHECK_LT(ClassId, NumClasses);		DCHECK_LT(ClassId, NumClasses);
return &RegionInfoArray[ClassId];		return &RegionInfoArray[ClassId];
}		}

uptr getRegionBaseByClassId(uptr ClassId) const {		uptr getRegionBaseByClassId(uptr ClassId) const {
return PrimaryBase + (ClassId << RegionSizeLog);		return PrimaryBase + (ClassId << RegionSizeLog);
▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	if (TotalUserBytes > MappedUser) {
Str.output();		Str.output();
}		}
return nullptr;		return nullptr;
}		}
if (UNLIKELY(MappedUser == 0))		if (UNLIKELY(MappedUser == 0))
Region->Data = Data;		Region->Data = Data;
if (UNLIKELY(!map(reinterpret_cast<void *>(RegionBeg + MappedUser),		if (UNLIKELY(!map(reinterpret_cast<void *>(RegionBeg + MappedUser),
UserMapSize, "scudo:primary",		UserMapSize, "scudo:primary",
MAP_ALLOWNOMEM | MAP_RESIZABLE, &Region->Data)))		MAP_ALLOWNOMEM | MAP_RESIZABLE |
		(useMemoryTagging() ? MAP_MEMTAG : 0),
		&Region->Data)))
return nullptr;		return nullptr;
Region->MappedUser += UserMapSize;		Region->MappedUser += UserMapSize;
C->getStats().add(StatMapped, UserMapSize);		C->getStats().add(StatMapped, UserMapSize);
}		}

hctim: nit: leave newline
const u32 NumberOfBlocks = Min(		const u32 NumberOfBlocks = Min(
MaxNumBatches * MaxCount,		MaxNumBatches * MaxCount,
static_cast<u32>((Region->MappedUser - Region->AllocatedUser) / Size));		static_cast<u32>((Region->MappedUser - Region->AllocatedUser) / Size));
DCHECK_GT(NumberOfBlocks, 0);		DCHECK_GT(NumberOfBlocks, 0);

TransferBatch *B = nullptr;		TransferBatch *B = nullptr;
constexpr u32 ShuffleArraySize =		constexpr u32 ShuffleArraySize =
MaxNumBatches * TransferBatch::MaxNumCached;		MaxNumBatches * TransferBatch::MaxNumCached;
▲ Show 20 Lines • Show All 100 Lines • Show Last 20 Lines
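
The primary64.h changes above gate tagging at three levels: the `MaySupportMemoryTagging` template parameter (default `false`), the architecture check, and a runtime flag set during init. A minimal standalone sketch of that layering follows. `PrimarySketch` is a hypothetical class, and the two helper functions are stubs standing in for the ones the patch adds in memtag.h, not their real implementations.

```cpp
#include <cassert>

// Hypothetical stand-in for the architecture check (assumption: AArch64 only).
constexpr bool archSupportsMemoryTagging() {
#if defined(__aarch64__)
  return true;
#else
  return false;
#endif
}

// Hypothetical stub for the runtime check; here it pretends the running
// system does not expose memory tagging.
inline bool systemSupportsMemoryTagging() { return false; }

template <bool MaySupportMemoryTagging = false> class PrimarySketch {
public:
  // Compile-time gate: false unless both the config and the arch allow it.
  static constexpr bool SupportsMemoryTagging =
      MaySupportMemoryTagging && archSupportsMemoryTagging();

  void init() {
    UseMemoryTagging = false;
    // Runtime gate: only queried when the compile-time gate is open.
    if (SupportsMemoryTagging)
      UseMemoryTagging = systemSupportsMemoryTagging();
  }

  bool useMemoryTagging() const {
    return SupportsMemoryTagging && UseMemoryTagging;
  }
  void disableMemoryTagging() { UseMemoryTagging = false; }

private:
  bool UseMemoryTagging = false;
};
```

With the default argument, existing instantiations that omit the third parameter keep tagging compiled out, which seems to be the point of cryptoad's suggestion.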

compiler-rt/lib/scudo/standalone/tests/combined_test.cpp

Show All 16 Lines
#include <vector>		#include <vector>

static std::mutex Mutex;		static std::mutex Mutex;
static std::condition_variable Cv;		static std::condition_variable Cv;
static bool Ready = false;		static bool Ready = false;

static constexpr scudo::Chunk::Origin Origin = scudo::Chunk::Origin::Malloc;		static constexpr scudo::Chunk::Origin Origin = scudo::Chunk::Origin::Malloc;

		static void disableDebuggerdMaybe() {
		#if SCUDO_ANDROID
		// Disable the debuggerd signal handler on Android, without this we can end
		// up spending a significant amount of time creating tombstones.
		signal(SIGSEGV, SIG_DFL);
		#endif
		}

		template <class AllocatorT>
		bool isTaggedAllocation(AllocatorT *Allocator, scudo::uptr Size,
		scudo::uptr Alignment) {
		if (!Allocator->useMemoryTagging())
		return false;

		const scudo::uptr MinAlignment = 1UL << SCUDO_MIN_ALIGNMENT_LOG;
		if (Alignment < MinAlignment) Alignment = MinAlignment;
		const scudo::uptr NeededSize =
		scudo::roundUpTo(Size, MinAlignment) +
		((Alignment > MinAlignment) ? Alignment : scudo::Chunk::getHeaderSize());
		return AllocatorT::PrimaryT::canAllocate(NeededSize);
		}

		template <class AllocatorT>
		void checkMemoryTaggingMaybe(AllocatorT *Allocator, void *P, scudo::uptr Size,
		scudo::uptr Alignment) {
		if (!isTaggedAllocation(Allocator, Size, Alignment))
		return;

		Size = scudo::roundUpTo(Size, scudo::archMemoryTagGranuleSize());
		EXPECT_DEATH({
		disableDebuggerdMaybe();
		reinterpret_cast<char *>(P)[-1] = 0xaa;
		}, "");
		EXPECT_DEATH({
		disableDebuggerdMaybe();
		reinterpret_cast<char *>(P)[Size] = 0xaa;
		}, "");
		}

template <class Config> static void testAllocator() {		template <class Config> static void testAllocator() {
using AllocatorT = scudo::Allocator<Config>;		using AllocatorT = scudo::Allocator<Config>;
auto Deleter = [](AllocatorT *A) {		auto Deleter = [](AllocatorT *A) {
A->unmapTestOnly();		A->unmapTestOnly();
delete A;		delete A;
};		};
std::unique_ptr<AllocatorT, decltype(Deleter)> Allocator(new AllocatorT,		std::unique_ptr<AllocatorT, decltype(Deleter)> Allocator(new AllocatorT,
Deleter);		Deleter);
Show All 18 Lines	for (scudo::uptr AlignLog = MinAlignLog; AlignLog <= 16U; AlignLog++) {
continue;		continue;
const scudo::uptr Size = (1U << SizeLog) + Delta;		const scudo::uptr Size = (1U << SizeLog) + Delta;
void *P = Allocator->allocate(Size, Origin, Align);		void *P = Allocator->allocate(Size, Origin, Align);
EXPECT_NE(P, nullptr);		EXPECT_NE(P, nullptr);
EXPECT_TRUE(Allocator->isOwned(P));		EXPECT_TRUE(Allocator->isOwned(P));
EXPECT_TRUE(scudo::isAligned(reinterpret_cast<scudo::uptr>(P), Align));		EXPECT_TRUE(scudo::isAligned(reinterpret_cast<scudo::uptr>(P), Align));
EXPECT_LE(Size, Allocator->getUsableSize(P));		EXPECT_LE(Size, Allocator->getUsableSize(P));
memset(P, 0xaa, Size);		memset(P, 0xaa, Size);
		checkMemoryTaggingMaybe(Allocator.get(), P, Size, Align);
Allocator->deallocate(P, Origin, Size);		Allocator->deallocate(P, Origin, Size);
}		}
}		}
}		}
Allocator->releaseToOS();		Allocator->releaseToOS();

// Ensure that specifying ZeroContents returns a zero'd out block.		// Ensure that specifying ZeroContents returns a zero'd out block.
for (scudo::uptr SizeLog = 0U; SizeLog <= 20U; SizeLog++) {		for (scudo::uptr SizeLog = 0U; SizeLog <= 20U; SizeLog++) {
Show All 11 Lines	template <class Config> static void testAllocator() {

// Verify that a chunk will end up being reused, at some point.		// Verify that a chunk will end up being reused, at some point.
const scudo::uptr NeedleSize = 1024U;		const scudo::uptr NeedleSize = 1024U;
void *NeedleP = Allocator->allocate(NeedleSize, Origin);		void *NeedleP = Allocator->allocate(NeedleSize, Origin);
Allocator->deallocate(NeedleP, Origin);		Allocator->deallocate(NeedleP, Origin);
bool Found = false;		bool Found = false;
for (scudo::uptr I = 0; I < 1024U && !Found; I++) {		for (scudo::uptr I = 0; I < 1024U && !Found; I++) {
void *P = Allocator->allocate(NeedleSize, Origin);		void *P = Allocator->allocate(NeedleSize, Origin);
if (P == NeedleP)		if (Allocator->untagPointerMaybe(P) ==
		Allocator->untagPointerMaybe(NeedleP))
Found = true;		Found = true;
Allocator->deallocate(P, Origin);		Allocator->deallocate(P, Origin);
}		}
EXPECT_TRUE(Found);		EXPECT_TRUE(Found);

constexpr scudo::uptr MaxSize = Config::Primary::SizeClassMap::MaxSize;		constexpr scudo::uptr MaxSize = Config::Primary::SizeClassMap::MaxSize;

// Reallocate a large chunk all the way down to a byte, verifying that we		// Reallocate a large chunk all the way down to a byte, verifying that we
Show All 20 Lines	template <class Config> static void testAllocator() {
P = Allocator->allocate(DataSize, Origin);		P = Allocator->allocate(DataSize, Origin);
memset(P, Marker, DataSize);		memset(P, Marker, DataSize);
for (scudo::sptr Delta = -32; Delta < 32; Delta += 8) {		for (scudo::sptr Delta = -32; Delta < 32; Delta += 8) {
const scudo::uptr NewSize = DataSize + Delta;		const scudo::uptr NewSize = DataSize + Delta;
void *NewP = Allocator->reallocate(P, NewSize);		void *NewP = Allocator->reallocate(P, NewSize);
EXPECT_EQ(NewP, P);		EXPECT_EQ(NewP, P);
for (scudo::uptr I = 0; I < DataSize - 32; I++)		for (scudo::uptr I = 0; I < DataSize - 32; I++)
EXPECT_EQ((reinterpret_cast<char *>(NewP))[I], Marker);		EXPECT_EQ((reinterpret_cast<char *>(NewP))[I], Marker);
		checkMemoryTaggingMaybe(Allocator.get(), NewP, NewSize, 0);
}		}
Allocator->deallocate(P, Origin);		Allocator->deallocate(P, Origin);

// Allocates a bunch of chunks, then iterate over all the chunks, ensuring		// Allocates a bunch of chunks, then iterate over all the chunks, ensuring
// they are the ones we allocated. This requires the allocator to not have any		// they are the ones we allocated. This requires the allocator to not have any
// other allocated chunk at this point (eg: won't work with the Quarantine).		// other allocated chunk at this point (eg: won't work with the Quarantine).
if (!UseQuarantine) {		if (!UseQuarantine) {
std::vector<void *> V;		std::vector<void *> V;
Show All 12 Lines	if (!UseQuarantine) {
while (!V.empty()) {		while (!V.empty()) {
Allocator->deallocate(V.back(), Origin);		Allocator->deallocate(V.back(), Origin);
V.pop_back();		V.pop_back();
}		}
}		}

Allocator->releaseToOS();		Allocator->releaseToOS();

		if (Allocator->useMemoryTagging()) {
		// Check that use-after-free is detected.
		for (scudo::uptr SizeLog = 0U; SizeLog <= 20U; SizeLog++) {
		const scudo::uptr Size = 1U << SizeLog;
		if (!isTaggedAllocation(Allocator.get(), Size, 1))
		continue;
		// UAF detection is probabilistic, so we repeat the test up to 256 times
		// if necessary. With 15 possible tags this means a 1 in 15^256 chance of
		// a false positive.
		EXPECT_DEATH({
		disableDebuggerdMaybe();
		for (unsigned I = 0; I != 256; ++I) {
		void *P = Allocator->allocate(Size, Origin);
		Allocator->deallocate(P, Origin);
		reinterpret_cast<char *>(P)[0] = 0xaa;
		}
		}, "");
		EXPECT_DEATH({
		disableDebuggerdMaybe();
		for (unsigned I = 0; I != 256; ++I) {
		void *P = Allocator->allocate(Size, Origin);
		Allocator->deallocate(P, Origin);
		reinterpret_cast<char *>(P)[Size - 1] = 0xaa;
		}
		}, "");
		}

		// Check that disabling memory tagging works correctly.
		void *P = Allocator->allocate(2048, Origin);
		EXPECT_DEATH(reinterpret_cast<char *>(P)[2048] = 0xaa, "");
		Allocator->disableMemoryTagging();
		reinterpret_cast<char *>(P)[2048] = 0xaa;
		Allocator->deallocate(P, Origin);

		P = Allocator->allocate(2048, Origin);
		EXPECT_EQ(Allocator->untagPointerMaybe(P), P);
		reinterpret_cast<char *>(P)[2048] = 0xaa;
		Allocator->deallocate(P, Origin);

		Allocator->releaseToOS();

		// The allocator may have disabled memory tag checks globally, which may
		// interfere with subsequent tests. Re-enable them now.
		scudo::enableMemoryTagChecksTestOnly();
		}

scudo::uptr BufferSize = 8192;		scudo::uptr BufferSize = 8192;
std::vector<char> Buffer(BufferSize);		std::vector<char> Buffer(BufferSize);
scudo::uptr ActualSize = Allocator->getStats(Buffer.data(), BufferSize);		scudo::uptr ActualSize = Allocator->getStats(Buffer.data(), BufferSize);
while (ActualSize > BufferSize) {		while (ActualSize > BufferSize) {
BufferSize = ActualSize + 1024;		BufferSize = ActualSize + 1024;
Buffer.resize(BufferSize);		Buffer.resize(BufferSize);
ActualSize = Allocator->getStats(Buffer.data(), BufferSize);		ActualSize = Allocator->getStats(Buffer.data(), BufferSize);
}		}
▲ Show 20 Lines • Show All 136 Lines • Show Last 20 Lines
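
The chunk-reuse check in the test above compares `untagPointerMaybe(P)` with `untagPointerMaybe(NeedleP)` rather than the raw pointers, because a reused block may come back with a different tag. A rough sketch of why that matters, assuming the tag sits in bits 56 to 59 of a 64-bit pointer and that untagging clears the top byte. `untagPointerSketch` and `withTagSketch` are hypothetical helpers, not the functions from memtag.h.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: clearing the top byte yields the untagged address.
inline std::uint64_t untagPointerSketch(std::uint64_t Ptr) {
  return Ptr & ((std::uint64_t{1} << 56) - 1);
}

// Hypothetical helper: place a 4-bit tag in bits 56..59 of the address.
inline std::uint64_t withTagSketch(std::uint64_t Ptr, std::uint8_t Tag) {
  return untagPointerSketch(Ptr) |
         (static_cast<std::uint64_t>(Tag & 0xf) << 56);
}
```

Two pointers to the same block with different tags compare unequal as raw values but equal once untagged, so a raw `P == NeedleP` comparison could miss a genuine reuse.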

compiler-rt/lib/scudo/standalone/tests/primary_test.cpp

	Show First 20 Lines • Show All 52 Lines • ▼ Show 20 Lines
	}			}

	TEST(ScudoPrimaryTest, BasicPrimary) {			TEST(ScudoPrimaryTest, BasicPrimary) {
	using SizeClassMap = scudo::DefaultSizeClassMap;			using SizeClassMap = scudo::DefaultSizeClassMap;
	#if !SCUDO_FUCHSIA			#if !SCUDO_FUCHSIA
	testPrimary<scudo::SizeClassAllocator32<SizeClassMap, 18U>>();			testPrimary<scudo::SizeClassAllocator32<SizeClassMap, 18U>>();
	#endif			#endif
	testPrimary<scudo::SizeClassAllocator64<SizeClassMap, 24U>>();			testPrimary<scudo::SizeClassAllocator64<SizeClassMap, 24U>>();
				testPrimary<scudo::SizeClassAllocator64<SizeClassMap, 24U, true>>();
	}			}

	// The 64-bit SizeClassAllocator can be easily OOM'd with small region sizes.			// The 64-bit SizeClassAllocator can be easily OOM'd with small region sizes.
	// For the 32-bit one, it requires actually exhausting memory, so we skip it.			// For the 32-bit one, it requires actually exhausting memory, so we skip it.
	TEST(ScudoPrimaryTest, Primary64OOM) {			TEST(ScudoPrimaryTest, Primary64OOM) {
	using Primary = scudo::SizeClassAllocator64<scudo::DefaultSizeClassMap, 20U>;			using Primary = scudo::SizeClassAllocator64<scudo::DefaultSizeClassMap, 20U>;
	using TransferBatch = Primary::CacheT::TransferBatch;			using TransferBatch = Primary::CacheT::TransferBatch;
	Primary Allocator;			Primary Allocator;
	▲ Show 20 Lines • Show All 69 Lines • ▼ Show 20 Lines
	}			}

	TEST(ScudoPrimaryTest, PrimaryIterate) {			TEST(ScudoPrimaryTest, PrimaryIterate) {
	using SizeClassMap = scudo::DefaultSizeClassMap;			using SizeClassMap = scudo::DefaultSizeClassMap;
	#if !SCUDO_FUCHSIA			#if !SCUDO_FUCHSIA
	testIteratePrimary<scudo::SizeClassAllocator32<SizeClassMap, 18U>>();			testIteratePrimary<scudo::SizeClassAllocator32<SizeClassMap, 18U>>();
	#endif			#endif
	testIteratePrimary<scudo::SizeClassAllocator64<SizeClassMap, 24U>>();			testIteratePrimary<scudo::SizeClassAllocator64<SizeClassMap, 24U>>();
				testIteratePrimary<scudo::SizeClassAllocator64<SizeClassMap, 24U, true>>();
	}			}

	static std::mutex Mutex;			static std::mutex Mutex;
	static std::condition_variable Cv;			static std::condition_variable Cv;
	static bool Ready = false;			static bool Ready = false;

	template <typename Primary> static void performAllocations(Primary *Allocator) {			template <typename Primary> static void performAllocations(Primary *Allocator) {
	static THREADLOCAL typename Primary::CacheT Cache;			static THREADLOCAL typename Primary::CacheT Cache;
	▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines
	}			}

	TEST(ScudoPrimaryTest, PrimaryThreaded) {			TEST(ScudoPrimaryTest, PrimaryThreaded) {
	using SizeClassMap = scudo::SvelteSizeClassMap;			using SizeClassMap = scudo::SvelteSizeClassMap;
	#if !SCUDO_FUCHSIA			#if !SCUDO_FUCHSIA
	testPrimaryThreaded<scudo::SizeClassAllocator32<SizeClassMap, 18U>>();			testPrimaryThreaded<scudo::SizeClassAllocator32<SizeClassMap, 18U>>();
	#endif			#endif
	testPrimaryThreaded<scudo::SizeClassAllocator64<SizeClassMap, 24U>>();			testPrimaryThreaded<scudo::SizeClassAllocator64<SizeClassMap, 24U>>();
				testPrimaryThreaded<scudo::SizeClassAllocator64<SizeClassMap, 24U, true>>();
	}			}

	// Through a simple allocation that spans two pages, verify that releaseToOS			// Through a simple allocation that spans two pages, verify that releaseToOS
	// actually releases some bytes (at least one page worth). This is a regression			// actually releases some bytes (at least one page worth). This is a regression
	// test for an error in how the release criteria were computed.			// test for an error in how the release criteria were computed.
	template <typename Primary> static void testReleaseToOS() {			template <typename Primary> static void testReleaseToOS() {
	auto Deleter = [](Primary *P) {			auto Deleter = [](Primary *P) {
	P->unmapTestOnly();			P->unmapTestOnly();
	Show All 14 Lines
	}			}

	TEST(ScudoPrimaryTest, ReleaseToOS) {			TEST(ScudoPrimaryTest, ReleaseToOS) {
	using SizeClassMap = scudo::DefaultSizeClassMap;			using SizeClassMap = scudo::DefaultSizeClassMap;
	#if !SCUDO_FUCHSIA			#if !SCUDO_FUCHSIA
	testReleaseToOS<scudo::SizeClassAllocator32<SizeClassMap, 18U>>();			testReleaseToOS<scudo::SizeClassAllocator32<SizeClassMap, 18U>>();
	#endif			#endif
	testReleaseToOS<scudo::SizeClassAllocator64<SizeClassMap, 24U>>();			testReleaseToOS<scudo::SizeClassAllocator64<SizeClassMap, 24U>>();
				testReleaseToOS<scudo::SizeClassAllocator64<SizeClassMap, 24U, true>>();
	}			}

compiler-rt/lib/scudo/standalone/wrappers_c.inc

Show First 20 Lines • Show All 170 Lines • ▼ Show 20 Lines	return scudo::setErrnoOnNull(
SCUDO_ALLOCATOR.allocate(size, scudo::Chunk::Origin::Malloc, alignment));		SCUDO_ALLOCATOR.allocate(size, scudo::Chunk::Origin::Malloc, alignment));
}		}

INTERFACE WEAK int SCUDO_PREFIX(malloc_info)(UNUSED int options, FILE *stream) {		INTERFACE WEAK int SCUDO_PREFIX(malloc_info)(UNUSED int options, FILE *stream) {
fputs("<malloc version=\"scudo-1\">", stream);		fputs("<malloc version=\"scudo-1\">", stream);
fputs("</malloc>", stream);		fputs("</malloc>", stream);
return 0;		return 0;
}		}

		INTERFACE WEAK void SCUDO_PREFIX(malloc_disable_memory_tagging)() {
		eugenis: This api is not thread safe. TCO is per-thread, I think, so this would not work in a multi-threaded program at all. Either document this fact, or, preferably, remove the TCO code and use relaxed atomic for UseMemoryTagging.
		pcc: Let's not use atomics for this without a use case. Due to the complexity of disabling MTE in a multi-threaded program, I don't think we should even attempt to support it for now. I will document that this requires the program to be single threaded at the point when the function is called.
		SCUDO_ALLOCATOR.disableMemoryTagging();
		}
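
For reference, the alternative eugenis raised (a relaxed atomic for `UseMemoryTagging`) might look roughly like the sketch below. This is not what the patch does; pcc chose to document a single-threaded requirement instead. `TaggingFlagSketch` is a hypothetical class, and the sketch only makes reads and writes of the flag race-free. It does nothing about the per-thread TCO state mentioned in the comment.

```cpp
#include <atomic>
#include <cassert>

class TaggingFlagSketch {
public:
  explicit TaggingFlagSketch(bool Enabled) : UseMemoryTagging(Enabled) {}

  // Relaxed ordering: only the flag itself is made race-free, no other
  // memory is ordered relative to it.
  bool useMemoryTagging() const {
    return UseMemoryTagging.load(std::memory_order_relaxed);
  }
  void disableMemoryTagging() {
    UseMemoryTagging.store(false, std::memory_order_relaxed);
  }

private:
  std::atomic<bool> UseMemoryTagging;
};
```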

Revision Contents

Diff 233175