This is an archive of the discontinued LLVM Phabricator instance.

[sanitizer] Initial implementation of a Hardened Allocator
ClosedPublic

Authored by cryptoad on May 9 2016, 3:13 PM.

Details

Summary

This is an initial implementation of a Hardened Allocator based on Sanitizer Common's CombinedAllocator.
It aims at mitigating heap based vulnerabilities by adding several features to the base allocator, while staying relatively fast.
The following were implemented:

  • additional consistency checks on the allocation function parameters and on the heap chunks;
  • use of checksum protected chunk header, to detect corruption;
  • randomness to the allocator base;
  • delayed freelist (quarantine), to mitigate use after free and overall determinism.

Additional mitigations are in the works.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
vitalybuka added inline comments.May 11 2016, 12:05 PM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
53 ↗(On Diff #56926)

enum ChunkState : u8 {

ChunkAvailible  = 0,

};

80 ↗(On Diff #56926)

I see why you need 128bit for structure.
Why do you need u128 type for 2 bit members?

309 ↗(On Diff #56926)

could be removed?

406 ↗(On Diff #56926)

no Die?

543 ↗(On Diff #56926)

maybe remove temp variable?

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.h
57 ↗(On Diff #56926)

QuarantineSizeMb;

70 ↗(On Diff #56926)

const uptr AllocatorSpace =

72 ↗(On Diff #56926)

static is not needed

83 ↗(On Diff #56926)

scudoMalloc(uptr Size, AllocType AllocType)

kcc added inline comments.May 11 2016, 1:51 PM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
80 ↗(On Diff #56926)

I actually like it this way. The intent is more clear.

cryptoad updated this revision to Diff 56965.May 11 2016, 2:37 PM
cryptoad marked 19 inline comments as done.

Resolving the issues raised in the new batch of comments.
Started renaming the variables, functions, etc, to be compliant with the LLVM coding standards.

docs/HardenedAllocator.rst
88 ↗(On Diff #56926)

I would have to check that.

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
406 ↗(On Diff #56926)

Good catch!

glider added inline comments.May 12 2016, 5:30 AM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
13 ↗(On Diff #56965)

Please either elaborate what other "various security improvements" are there, or remove that phrase.

115 ↗(On Diff #56965)

I would've started with a handwritten crc32 implementation and bother with hardware support only iff it's performance-critical (don't think it is)

161 ↗(On Diff #56965)

Shouldn't the success memory order be __ATOMIC_RELEASE?

175 ↗(On Diff #56965)

Please remind if Scudo is going to be used together with any of the sanitizers.
If yes, the destructor magic won't probably work as intended, because other tools also play with it.

262 ↗(On Diff #56965)

Are we going to target 32-bit systems? This is gonna overflow uptr on x86.

278 ↗(On Diff #56965)

Despite SSE 4.2 may be quite common at Google, I don't think it's a good idea to bail out if it's unsupported.
Note TestCPUFeature() doesn't work on AMD processors yet.

435 ↗(On Diff #56965)

So remove it, maybe?

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.h
1 ↗(On Diff #56965)

I don't insist much, but I think either the library name should be "scudo" instead of "hardened_allocator", or the names of the files under hardened_allocator/ should start with "hardened_allocator_"

projects/compiler-rt/lib/hardened_allocator/scudo_flags.inc
29 ↗(On Diff #56965)

Feature request: filling the chunk context with a nonzero byte.

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
1 ↗(On Diff #56965)

Do the build configs prevent this file from being built on ARM?

32 ↗(On Diff #56965)

Do you really need the subleaf parameter now?

39 ↗(On Diff #56965)

I believe requiring certain Intel CPU models in order for the allocator to work isn't a good idea.

57 ↗(On Diff #56965)

Note this is thread-unsafe. Not sure if that matters here, but still.

84 ↗(On Diff #56965)

Why not use a bool here?

86 ↗(On Diff #56965)

Usually a "k" prefix denotes a constant. If you're going to change it, that's just a regular variable named "has_rd_rand" (or something like that)

93 ↗(On Diff #56965)

Please move this call inside the loop below.

104 ↗(On Diff #56965)

Is this problem that severe that we want to abort?
Note that we don't abort if the CPU doesn't support rdrand.

109 ↗(On Diff #56965)

My gut feeling is that XORing rdtsc and the time since epoch is actually reducing the entropy, not increasing it. Any idea if that's true?
Also, do we really need a dependency on std::chrono?

projects/compiler-rt/lib/hardened_allocator/scudo_utils.h
50 ↗(On Diff #56965)

GetSeed is unused.

58 ↗(On Diff #56965)

These "a", "b", "c" comments don't help :(
Please remove them.

projects/compiler-rt/test/hardened_allocator/alignment.cc
13 ↗(On Diff #56965)

alignment is unused.

projects/compiler-rt/test/hardened_allocator/init.cc
1 ↗(On Diff #56965)

Do you really need this test?

projects/compiler-rt/test/hardened_allocator/malloc.cc
1 ↗(On Diff #56965)

Each test must have comments that describe its purpose.

projects/compiler-rt/test/hardened_allocator/mismatch.cc
2 ↗(On Diff #56965)

FYI you can use --check-prefix to write more test-specific CHECK directives.

26 ↗(On Diff #56965)

Nit: spare newline

projects/compiler-rt/test/hardened_allocator/quarantine.cc
15 ↗(On Diff #56965)

You should probably add nullptr checks to other allocations in other tests.

projects/compiler-rt/test/hardened_allocator/realloc.cc
29 ↗(On Diff #56965)

Nit: a comment on a separat line must end with a period.

projects/compiler-rt/test/hardened_allocator/sizes.cc
19 ↗(On Diff #56965)

s/fulfill/allocate?

53 ↗(On Diff #56965)

Where does this line come from? I don't see the allocator printing it anywhere.

glider edited edge metadata.May 12 2016, 5:34 AM

$0.05 regarding the cryptographic security: can someone please clarify the goal of hardening the allocator?
If the intention is to use it for e.g. ASan in production, the randomness should be at least not worse than that of regular allocators.

glider added inline comments.May 12 2016, 8:23 AM
projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
109 ↗(On Diff #56965)

As a data point, I've ran RdTSC() and std::chrono::high_resolution_clock::now().time_since_epoch().count() 200278017 times.
The number of unique values of both variables was exactly 200278017, while the number of unique XOR values was only 200205416, i.e. there were 0.036% collisions.

filcab added a subscriber: filcab.May 12 2016, 9:50 AM
filcab added inline comments.
docs/HardenedAllocator.rst
47 ↗(On Diff #56965)

I think this parenthesis should be somewhere else. The crc32 instruction is an actual requirement of the allocator right now.

P.S: Alternatively, remove the "if ..." and say it's a requirement, like you do in the next paragraph.

110 ↗(On Diff #56965)

If we're making a "hardened allocator", shouldn't the default be to zero out all chunks?
(of course, performance would suffer a lot. I'm just curious if there are other reasons)

projects/compiler-rt/cmake/config-ix.cmake
196 ↗(On Diff #56965)

I know safestack, cfi, and ESan don't follow this, but we can probably put the HARDENED_ALLOCATOR stuff in alphabetical order with the other stuff above.

projects/compiler-rt/lib/CMakeLists.txt
58 ↗(On Diff #56965)

Same here.

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
32 ↗(On Diff #56965)

I would remove the TODO.
If we're in a debugger, Abort() might trigger the debugger and stop the program execution. exit() will simply close the program. We've added the abort in ASan, back in the day (D12332), due to this.
It's the default for ASan on OS X and on the PS4.

48 ↗(On Diff #56965)

It's very likely a smaller price to pay than to have to use atomic updates + spin on update collisions.

60 ↗(On Diff #56965)

I could do without the u128 typedef. You're only using it for the PackedHeader typedef.

93 ↗(On Diff #56965)

I'd suggest:

COMPILER_CHECK(sizeof(UnpackedHeader) == sizeof(PackedHeader));

Since it makes it explicit we want those two to be the same and that it's not a coincidence that we're expecting both to "happen to be" the same size as u128.
The sizeof(PackedHeader) isn't needed, since it's a typedef for u128 (even after my proposed change, it won't be needed).

95 ↗(On Diff #56965)

Remove the static here, since you're not adding it to other const-qualified variables when it isn't needed.

117 ↗(On Diff #56965)

No need to do this. Much better to just handle it on the CMake side (which you're already doing).

123 ↗(On Diff #56965)

"... 16 least significant bits of the header of the first 8 bytes..."

134 ↗(On Diff #56965)

Why are you not using the C++11 atomics?

242 ↗(On Diff #56965)

Why?

262 ↗(On Diff #56965)

If this ends up being ported, it's a simple matter of using the FIRST_32_SECOND_64 macro.

278 ↗(On Diff #56965)

Source files are compiled assuming that feature is available.
We'll have to add a fallback checksum (plus change build and this check) to address this comment.

I would be ok with keeping the SSE4.2 requirement until we get a non-zero amount of requests/bug reports.

P.S: http://store.steampowered.com/hwsurvey (first result for "hardware survey". It's clearly biased, but I'd guess developer CPUs are also biased to be more recent/powerful than an average computer) puts SSE4.2 adoption at ~80%.

334 ↗(On Diff #56965)

Pobably best to zero the whole thing (needed_size - ChunkHeaderSize)?

373 ↗(On Diff #56965)

Please push the negation through.

449 ↗(On Diff #56965)

I'm guessing you only need to zero the additional contents to account for possible overflows that might have happened.
Otherwise:
ZeroContents = true -> they're already zeroed
ZeroContents = false -> No need to zero.

473 ↗(On Diff #56965)

If we have the ZeroContents, no need to zero again.

474 ↗(On Diff #56965)

Should we zero the whole block, instead of just the size we were asked?
(Hardening it a tiny bit against overflows)

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.h
41 ↗(On Diff #56965)

Why the empty comment trailing the while (false)?

projects/compiler-rt/lib/hardened_allocator/scudo_malloc_linux.cc
1 ↗(On Diff #56965)

scudo_interceptors.cc (.cpp in the future, but do what Vitaly suggested and change the names only after approval to ease code review)

15 ↗(On Diff #56965)

Don't add these to the whole file.
I'm ok with protecting Linux/glibc-specific functions like pvalloc, etc. Those will need it anyway if this gets ported somewhere.
No need to protect the malloc/free interceptors with SANITIZER_LINUX, though.

projects/compiler-rt/lib/hardened_allocator/scudo_rtl.cc
50 ↗(On Diff #56965)

Should this be an #error?

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
59 ↗(On Diff #56965)
else
  UNIMPLEMENTED();

(or something similar)

89 ↗(On Diff #56965)

Does gcc actually do anything with this?
If not, then just delete it. AFAICT, clang doesn't care unless you have an asm attribute to tie it to a specific register.

98 ↗(On Diff #56965)

Nit: Why not a simple int? Closer to the "usual idiom" in C++.

projects/compiler-rt/lib/hardened_allocator/scudo_utils.h
26 ↗(On Diff #56965)

static_assert(sizeof(Dest) == sizeof(Source), "Sized are not equal!");

projects/compiler-rt/test/hardened_allocator/double-free.cc
28 ↗(On Diff #56965)

Add a posix_memalign version. We have a special case in free for it.

projects/compiler-rt/test/hardened_allocator/mismatch.cc
21 ↗(On Diff #56965)

You should add, at least, the memalign -> something_other_than_free case, since it's a special case.

projects/compiler-rt/test/hardened_allocator/quarantine.cc
15 ↗(On Diff #56965)

if (p)

BTW we've been discussing the issue with the random seed (and the header cookies) being reused upon fork() today.
If you've a service that forks in response to every client request, it can be exploited by brute-forcing the CRC of a single object (which remains the same upon fork())

Thus two questions arise:

  • shouldn't we increase the size of the header's crc32 to, um, 32 bits?
  • is it possible to re-initialize the seed and the cookie upon fork() (a dummy solution is to iterate over the heap and fix all headers, but maybe there's something more elegant?)
glider added inline comments.May 12 2016, 10:18 AM
projects/compiler-rt/lib/hardened_allocator/scudo_malloc_linux.cc
15 ↗(On Diff #56965)

Note that for other systems (e.g. OSX) it may be incorrect to intercept malloc/free.
Therefore it should be ok to keep those in a Linux/FreeBSD-specific file and keep SANITIZER_LINUX | SANITIZER_FREEBSD for the whole file.

75 ↗(On Diff #56965)

It's better to add comments (e.g. " // SANITIZER_LINUX" here) to #endif directives, especially when the code doesn't fit on a single screen.

dvyukov added inline comments.May 12 2016, 10:25 AM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
87 ↗(On Diff #56965)

Don't we need only 20 bits here?

162 ↗(On Diff #56965)

Looks pointless.

187 ↗(On Diff #56965)

This will leak memory.
Destructors run FIFO order, so later-created user dtors can run after you. Plus pthread frees thread stack and pthread_specific regions after running pthread_specific dtors.

195 ↗(On Diff #56965)

Why initGlobal is not called from ScudoInitInternal?
If you expect that malloc can come before ScudoInitInternal, then you also need to call ScudoInitInternal from initThread. Otherwise it won't work anyway.

241 ↗(On Diff #56965)

Why is this commented out? Looks cleaner.

283 ↗(On Diff #56965)

Do we want to sanity check options.QuarantineSizeMb) << 20? What if it overflows?

284 ↗(On Diff #56965)

Make this tunable as well. If my program has 10000 threads, 1MB per thread is a lot.

293 ↗(On Diff #56965)

s/alignment/malloc alignment/
So that user can get at least some glue when she sees this on console.

325 ↗(On Diff #56965)

It seems to me that we don't actually need with_offset and all the associated if's.
You can just always store (chunk_beg - alloc_beg) >> MinAlignmentLog into header.offset and always subtract it from user_beg.

326 ↗(On Diff #56965)

There is a very tricky, implicit relation between MinAlignment, MaxAlignment and number of bits in offset. If there of these change in future we can get a nice attack vector due to offset overflow.

Check that MaxAlignment/MinAlignment fits into offset during init.

381 ↗(On Diff #56965)

I wonder if delete_size can be 0 and it does not mean that delete_size is not passed, it is just legally zero. What is passed in for delete of an array with 0 elements?

projects/compiler-rt/lib/hardened_allocator/scudo_flags.cc
56 ↗(On Diff #56965)

You use convoluted way to express 64 that requires remembering powers of two, and then spell 64 in comments. Why not just say "64"?

projects/compiler-rt/lib/hardened_allocator/scudo_flags.inc
18 ↗(On Diff #56965)

s/-1/64/ then the default will be visible in help as well.
User can't tune this value if she does not have a single reference point. If I know that the default is 64, then I can set it to 32 of 128. If I don't know the default, what am I supposed to do?

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
109 ↗(On Diff #56965)

This is used to initialize global cookie. I would use /dev/random. Or there must be 16 bytes of good randomness in auxv.

projects/compiler-rt/lib/hardened_allocator/scudo_utils.h
44 ↗(On Diff #56965)

This is not used. Remove.

46 ↗(On Diff #56965)

This is not used. Remove.

50 ↗(On Diff #56965)

This is not used. Remove.

dvyukov added inline comments.May 12 2016, 10:25 AM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
64 ↗(On Diff #56965)

It's better to comment right on the fields rather than duplicate them here. The comment has good chances of getting outdated. It's also harder to find the relevant part of the comment for a particular field.
E.g.:

u8  state          : 2;  // available, allocated, or quarantined

comments like 'salt' on 'salt' field are excessive, drop them.

104 ↗(On Diff #56965)

I am missing the relation between the requirement to not load header second time and making the function static.
Why not:

void *AllocBeg(UnpackedHeader *header)

?

126 ↗(On Diff #56965)

Add a bold comment to checksum filed that Checksum expects it to be low 16 bits. And maybe add some debug check here.

128 ↗(On Diff #56965)

Won't it do to initialize crc to cookie as:

u64 crc = _mm_crc32_u64(cookie, reinterpret_cast<uptr>(this));

?

135 ↗(On Diff #56965)

Why does this need to be acquire? Please comment.

153 ↗(On Diff #56965)

Why release?

161 ↗(On Diff #56965)

Why acquire?

glider added inline comments.May 12 2016, 10:50 AM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
278 ↗(On Diff #56965)

Well, IIUC right now the implementation just aborts for AMD processors, which are among those ~80%.

kcc added inline comments.May 12 2016, 12:15 PM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
115 ↗(On Diff #56965)

I am pretty sure it *is* performance critical.
If this gets used on older or non-x86 systems we can add other implementation later.

175 ↗(On Diff #56965)

Afiact, scudo will not be combinable with any of the sanitizers other than with ubsan

262 ↗(On Diff #56965)

no 32-bit for now (or ever?)

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.h
1 ↗(On Diff #56965)

I initially proposed to name the dir hardened_allocator to make it more self-descriptive.
But if others don't mind to have dir named "scuda" let the author decide.

filcab added inline comments.May 12 2016, 12:52 PM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
278 ↗(On Diff #56965)

I was just talking about the SSE4.2 part. Sorry, I was going to comment on the CPUID thing but forgot. Doing it now.

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
64 ↗(On Diff #56965)

Same on AMD (source: https://support.amd.com/TechDocs/25481.pdf):
"
CPUID Fn0000_0001_ECX Feature Identifiers
...
20 SSE42: SSE4.2 instruction support.
"

66 ↗(On Diff #56965)

Doesn't exist on AMD.
Since SSE4.2 is required for this, you'll need to implement SSE4.2 detection for other CPU brands.
RDRAND seems to exist only on Intel CPUs and there's a fallback path, so having it only on Intel doesn't seem like a problem.

cryptoad updated this revision to Diff 57100.May 12 2016, 2:21 PM
cryptoad edited edge metadata.
cryptoad marked 44 inline comments as done.

Addressed some of the issues raised during the review.
Additional renaming done to comply with the LLVM coding standard.

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
87 ↗(On Diff #56965)

This is indeed the case.
I figured I would align that one to a multiple of 8 bits as we have some space in the second half of the 128-bit integer.
I am not opposed to shortening to the actual needed bit size if you feel strongly about it.

134 ↗(On Diff #56965)

Using std::atomic<unsigned __int128>?

242 ↗(On Diff #56965)

Sorry this was a remainder of a debugging session. I switched it back to the original plan which was to use the thread_local QuarantineCache.

262 ↗(On Diff #56965)

No plan to support 32-bit as of yet, but yes we will use FIRST_32_SECOND_64 if we do.

334 ↗(On Diff #56965)

I guess we have several choices here:

  • move it up and zero the whole thing prior to the header being store
  • leave it here and zero the whole thing post header, which will have to account for the offset (needed_size - (chunk_beg - alloc_beg))

I think the first option would be better.

projects/compiler-rt/lib/hardened_allocator/scudo_rtl.cc
50 ↗(On Diff #56965)

I figured the additional initialization techniques used by ASan et al. could be added later on. Hence the #error for now.

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
32 ↗(On Diff #56965)

I figured it could be useful if a feature such as RDSEED was needed.

39 ↗(On Diff #56965)

My point of view when writing this was that I had to be as competitive as can be with other allocators, so that the benefit of additional checks would not be offseted by a dramatic decrease in performances. In the initial stages, it was determined that a BSD checksum vs the CPU backed CRC32 induced a performance gain of about 10% in pure allocation benchmarks, so I went that way.

I am not opposed to doing something purely software, but I'd rather start this way and then expend it to be more portable.

84 ↗(On Diff #56965)

I wanted to use a 3 state variable: unintialized, true, false. Hence the -1, 0, 1.

98 ↗(On Diff #56965)

I tried to be consistent using the Sanitizer types. But I see your point.

109 ↗(On Diff #56965)

Is your suggestion to get rid of the epoch component?

projects/compiler-rt/test/hardened_allocator/sizes.cc
53 ↗(On Diff #56965)

This is from sanitizer_allocator.cc, that handles this condition.

dvyukov added inline comments.May 13 2016, 12:55 AM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
88 ↗(On Diff #57100)

Does it improve generated code?
Leaving it as 24 is OK in that case, but it needs to be explained in comments. Width of that field has crucial implicit relation with Min/MaxAlignment. When double-checked width, I found that it's not what I would expect it to be. What means that either I missing something else important here, or there is a bug, or things get out of sync. This uncertainty is very unpleasant and takes time for anybody reading the code.

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
110 ↗(On Diff #57100)

Yes, all that is predictable.
/dev/random is meant specifically for such cases, it uses various sources of entropy to create strongly random numbers.

On second though, just remove all rdtsc/cpuid/rdrand trickery and read from /dev/random. If rdrand is present, kernel will use it.

filcab added inline comments.May 13 2016, 7:55 AM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
135 ↗(On Diff #57100)

Yeah, should be nicer. Unless it's a problem due to something I'm not thinking of, then I'd rather have more standard constructs (even though there's no guarantee of an __int128 type, let alone an std::atomic<__int128>, AFAICT libstdc++ and libc++ will have those implemented).

cryptoad updated this revision to Diff 57362.May 16 2016, 9:54 AM
cryptoad marked 26 inline comments as done.

This update addresses another batch of comments raised during the review.
Among the notable changes:

  • after discussion with dvyukov, the memory order for the atomic operation has to changed to relaxed;
  • the 'with_offset' field in the header is going away, and the offset field now always stores the distance between the backend allocation and the chunk.
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
188 ↗(On Diff #57100)

For this, I used the same technique that is used in ASan's PlatformTSDDtor, as it seems to be the most viable one.
I am not sure what alternative would work here.

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.h
42 ↗(On Diff #57100)

This actually a straight copy from sanitizer_internal_defs.h, just replacing CheckFailed. So I left it as is.
This will go away when I redo the CHECK_IMPL logic to follow kcc@'s suggestion to implement templated failure functions.

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
110 ↗(On Diff #57100)

I gave it a try and I have had a lot of issues with /dev/random on my system.
Between the fact that it's blocking, and that sometimes it won't return the amount of bytes requested, the tests have been failing inconsistently.
Tests with /dev/urandom worked better though.
I am going to dig further into that.

filcab added inline comments.May 16 2016, 10:13 AM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
135 ↗(On Diff #57362)

Did std::atomic<unsigned __int128> not work/was too slow?

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
111 ↗(On Diff #57362)

Using /dev/urandom should be what you need, yes.
Did you still have problems with urandom, btw?

cryptoad added inline comments.May 16 2016, 10:37 AM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
135 ↗(On Diff #57362)

It's in the works :)

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
111 ↗(On Diff #57362)

/dev/udrandom appeared to work fine.

dvyukov added inline comments.May 17 2016, 1:19 AM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
189 ↗(On Diff #57362)

If a free comes after we drained local cache, asan uses a global cache. Grep for "fallback" in asan_allocator.cc. Tsan now uses the same. It sucks. But I don't see how to do better. We need to detect when a thread is actually finished, but it's tricky to do with pthread_join API.

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
111 ↗(On Diff #57362)

/dev/urandom is not what you need. It trades security for performance. I.e. instead of blocking it will just give you predictable randomness. Which kind of defeats the whole purpose of a security allocator.
/dev/random blocks when it does not have enough entropy. But there is not much you can do if you do need the entropy.
If it returns less bytes, read again. That's how it works with all read calls.

filcab added inline comments.May 17 2016, 6:02 AM
projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
111 ↗(On Diff #57362)

People like Daniel Bernstein and Thomas Ptacek (and others) tend to disagree and say that /dev/urandom is what we need:
http://blog.cr.yp.to/20140205-entropy.html
http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/

I'm no expert in this, so I tend to rely on people who work on this kind of thing.

cryptoad updated this revision to Diff 57702.May 18 2016, 4:11 PM
cryptoad marked 24 inline comments as done.

This diff addresses another batch of comments from the review, as well as some renaming to converge towards LLVM coding standard compliance.
Among the notable changes:

  • a fallback mechanism has been added to service allocations and deallocation post thread tear-down as per dvyukov guidance;
  • the initialization has been moved around to not depend on .preinit_array; a test has been added to make sure that preinit allocations are serviced successfully;
  • std::atomic is used in place of the GCC builtins; the compiled code is identical.

There are still some comments left to address, notably regarding the source of randomness.

cryptoad updated this revision to Diff 58659.May 26 2016, 11:37 AM

With this diff ends the renaming process, so unless I missed or misunderstood something, this should be compliant with the LLVM coding standards.
Additional, I migrated the thread local PRNG initialization to use /dev/urandom for seeding purposes. This appears to not have significantly impacted the performances.

cryptoad marked 12 inline comments as done.May 26 2016, 12:53 PM

One of the outstanding items on my list was to have a look at the CHECK logic to be able to have everything fast fail without callbacks if something went wrong.
kcc's suggestion was to change the CHECKs in the Allocator to make them call a templated failure function (or virtual) (http://reviews.llvm.org/D20084#425327).
I've realize since then that we also need that to be true for the Quarantine, and potentially any abstracted Sanitizer function (the per platform file access functions come to mind). Basically we really want CheckFailed to be ours anywhere in the project, so that when compiling the hardened allocator, none of the with-callbacks version will be compiled in.

I'd welcome some feedback as to how to do that as cleanly as possible, thanks!

kcc added a comment.May 26 2016, 1:04 PM

Yea...
So one possible way is to re-define __sanitizer::CheckFailed.
It should be relatively easy if we never going to allow mixing scudo and the sanitizers.
There is no way to mix scudo and asan/tsan/msan anyway, because they have conflicting allocators.
You may mix scudo and ubsan, since ubsan does not have an allocator, but the only sane way to have ubasan
in prod is to use it in trapping mode which does not have run-time. So we are good here too.

So, try this:

  • move __sanitizer::CheckFailed into a separate file.
  • make sure it is used by *san
  • make sure it is not used by scudo
  • defined your own __sanitizer::CheckFailed

I have not tested it, so something may go wrong, you'll need to experiment...

kcc added inline comments.May 27 2016, 3:19 PM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.h
29 ↗(On Diff #58659)

So, you should not need this any more, let's remove it now.
I'll make on more pass afterwards (Monday-ish)

dvyukov edited edge metadata.May 28 2016, 11:44 PM

One of the outstanding items on my list was to have a look at the CHECK logic to be able to have everything fast fail without callbacks if something went wrong.

Why can't we just set Die as CheckFailedCallback?

dvyukov added inline comments.May 28 2016, 11:46 PM
projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
99 ↗(On Diff #58659)

urandom is not secure and can allow to guess the cookie in a local setuid binary.

cryptoad marked an inline comment as done.May 30 2016, 7:57 PM

One of the outstanding items on my list was to have a look at the CHECK logic to be able to have everything fast fail without callbacks if something went wrong.

Why can't we just set Die as CheckFailedCallback?

What I am trying to prevent here is the use of callbacks at all. They would be an interesting target for an attacker as they would be writable function pointers that could be triggered on demand on heap corruption.

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
99 ↗(On Diff #58659)

So on this matter it seems that the general agreement is that urandom on modern Linux system is secure and can be used for cryptographic purposes. Even the more recent getrandom system call uses urandom by default, with the following entry in the man page: "Unless you are doing long-term key generation (and perhaps not even then), you probably shouldn't be using GRND_RANDOM. The cryptographic algorithms used for /dev/urandom are quite conservative, and so should be sufficient for all purposes."
/dev/random performs poorly in my tests, often blocking the allocator.

One of the outstanding items on my list was to have a look at the CHECK logic to be able to have everything fast fail without callbacks if something went wrong.

Why can't we just set Die as CheckFailedCallback?

What I am trying to prevent here is the use of callbacks at all. They would be an interesting target for an attacker as they would be writable function pointers that could be triggered on demand on heap corruption.

But that would mean that an attacker broke ASLR and can write arbitrary values at necessary memory locations. Does it still make sense to defend in such case?

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
99 ↗(On Diff #58659)

Okay. You know better.

But that would mean that an attacker broke ASLR and can write arbitrary values at necessary memory locations. Does it still make sense to defend in such case?

That is correct. I think it is still worth it to not take the chance.
Previous work on other heaps have leveraged such features, given the same assumptions (for example the commit function pointer in the Windows Heap https://www.blackhat.com/presentations/bh-usa-09/MCDONALD/BHUSA09-McDonald-WindowsHeap-PAPER.pdf).
I think it's particularly important to make sure that the failure path fails fast and ideally without the possibility of interruption (like __fastfail http://www.alex-ionescu.com/?p=69).

cryptoad updated this revision to Diff 59082.May 31 2016, 9:26 AM
cryptoad edited edge metadata.

We now replace the Sanitizer's termination functions so that no callbacks can be called in CheckFailed and Die.

kcc added a comment.Jun 1 2016, 10:19 AM

Mostly LG.
Please address a few remaining nits and wait until tomorrow for more comments.
If no significant comments, rename the dir to scudo and I will land it.

Note: I did not review this code from security perspective in details because

  • not an expert
  • there are known security weaknesses in the backend allocator (we will need to handle them separately)
  • it'll be easier to do further security assessment once the code is committed.
docs/HardenedAllocator.rst
89 ↗(On Diff #59082)

Did you?

94 ↗(On Diff #59082)

Give an example instead of referring to "usual ASan syntax".
Scudo users don't have to be asan experts.

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
63 ↗(On Diff #59082)

align the comment block

109 ↗(On Diff #59082)

I suggest to replace all cases of

if (!cond) {
  Printf()
  Die()
}

With

if (!cond)
  DieWithMessage();

This is using the Printf from sanitizer_common, right?
It might be worth replacing it with your own, simpler one.
If you agree, just leave a TODO near DieWithMessage and address it later.

278 ↗(On Diff #59082)

s/late/later

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.h
18 ↗(On Diff #59082)

is currently only supported?
os "supports x86_64"?

49 ↗(On Diff #59082)

Does the following block of code have to be in this header?
Why not in .cc?

cryptoad updated this revision to Diff 59581.Jun 3 2016, 10:49 AM
cryptoad marked 7 inline comments as done.

Addressing some comments raised in the review, notably:

  • new dieWithMessage function wrapping a Printf+Die functionality - still currently using Sanitizer's VSNPrintf which will be changed later;
  • updated the documentation;
docs/HardenedAllocator.rst
89 ↗(On Diff #59082)

I removed the part about the preinit_array as I do not use that anymore.
Whatever LIT is using requires the whole-archive flag, if using gcc to link the static library against a project, it doesn't.

94 ↗(On Diff #59082)

I didn't realize that I hadn't updated the options names below as well. Also added ThreadLocalQuarantineSizeKb.

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
109 ↗(On Diff #59082)

There is also a PrintfAndReportCallback callback that I just noticed.
I will have to address that later as well.

kcc accepted this revision.Jun 3 2016, 2:48 PM
kcc edited edge metadata.

LGTM.
I think it's as good as we can make it via code review.
Let's make it better by incremental changes.

Now, please rename the directories to lib/scudo and test/scudo, upload the updated patch and let me land it.
Probably also rename HardenedAllocator.rst to ScudoHardenedAllocator.rst or some such (up to you)

This revision is now accepted and ready to land.Jun 3 2016, 2:48 PM
cryptoad updated this revision to Diff 59638.Jun 3 2016, 4:07 PM
cryptoad edited edge metadata.

This patch renames the directories and files to fit the scheme suggested during the review, and the LLVM practices:

  • hardened_allocator is now scudo everywhere;
  • documentation is now in ScudoHardenedAllocator.rst;
  • all .cc files are now .cpp;
  • additionally scudo_malloc_linux.cc is now scudo_interceptors.cpp;
  • build files have been updated accordingly, as well as all references to the previous naming scheme (the library and checks rules are now 'scudo' and 'check-scudo').
kcc added a comment.Jun 3 2016, 5:31 PM

I am getting these when trying 'ninja check-scudo'
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/atomic:266: undefined reference to `__sync_val_compare_and_swap_16'
Any suggestions?

BTW, do we make sure that check-scudo is not run if the current machine does not support proper SSE?

Most likely were missing -latomic (IIRC)
I wouldn't expect to have to link with that when using the std::atomic, but
that might be it.

Filipe

In D20084#449057, @kcc wrote:

I am getting these when trying 'ninja check-scudo'
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/atomic:266: undefined reference to `__sync_val_compare_and_swap_16'
Any suggestions?

Regarding @filcab's comment, -latomic is in the cflags in the lit.cfg, not sure why this is not working for you.

BTW, do we make sure that check-scudo is not run if the current machine does not support proper SSE?

We do check for SSE 4.2 in the init of the Allocator via CHECK(testCPUFeature(SSE4_2))

In D20084#449057, @kcc wrote:

I am getting these when trying 'ninja check-scudo'
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/atomic:266: undefined reference to `__sync_val_compare_and_swap_16'
Any suggestions?

I also use -mcx16 in g3 BUILD, that might be it.

kcc added a comment.Jun 6 2016, 9:29 AM

We do check for SSE 4.2 in the init of the Allocator via CHECK(testCPUFeature(SSE4_2))

That's not enough.
This is a run-time check and so if someone runs "check-all" on a machine that does not support SSE4_2
thye *will* run check-scudo and get a test failure.
Instead, we need to ensure that check-scudo is not executed as part of check-all when there is no proper HW support

And of course, fix check-scudo on my machine [ :) ] so that I can test it before committing.

cryptoad updated this revision to Diff 59744.Jun 6 2016, 10:07 AM

Updated Scudo LIT CMakeLists.txt to only add check-scudo on Linux x64 machines with SSE4.2.

This revision was automatically updated to reflect the committed changes.
kcc added a comment.Jun 6 2016, 6:28 PM

Thanks again for contributing this code, let's now make the allocator even more hardened!