Page MenuHomePhabricator

[sanitizer] Initial implementation of a Hardened Allocator
ClosedPublic

Authored by cryptoad on May 9 2016, 3:13 PM.

Details

Summary

This is an initial implementation of a Hardened Allocator based on Sanitizer Common's CombinedAllocator.
It aims at mitigating heap based vulnerabilities by adding several features to the base allocator, while staying relatively fast.
The following were implemented:

  • additional consistency checks on the allocation function parameters and on the heap chunks;
  • use of checksum protected chunk header, to detect corruption;
  • randomness to the allocator base;
  • delayed freelist (quarantine), to mitigate use after free and overall determinism.

Additional mitigations are in the works.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
dvyukov added inline comments.May 12 2016, 10:25 AM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
65

It's better to comment right on the fields rather than duplicate them here. The comment has good chances of getting outdated. It's also harder to find the relevant part of the comment for a particular field.
E.g.:

u8  state          : 2;  // available, allocated, or quarantined

comments like 'salt' on 'salt' field are excessive, drop them.

105

I am missing the relation between the requirement to not load header second time and making the function static.
Why not:

void *AllocBeg(UnpackedHeader *header)

?

127

Add a bold comment to checksum filed that Checksum expects it to be low 16 bits. And maybe add some debug check here.

129

Won't it do to initialize crc to cookie as:

u64 crc = _mm_crc32_u64(cookie, reinterpret_cast<uptr>(this));

?

136

Why does this need to be acquire? Please comment.

154

Why release?

162

Why acquire?

glider added inline comments.May 12 2016, 10:50 AM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
279

Well, IIUC right now the implementation just aborts for AMD processors, which are among those ~80%.

kcc added inline comments.May 12 2016, 12:15 PM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
116

I am pretty sure it *is* performance critical.
If this gets used on older or non-x86 systems we can add other implementation later.

176

Afiact, scudo will not be combinable with any of the sanitizers other than with ubsan

263

no 32-bit for now (or ever?)

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.h
2

I initially proposed to name the dir hardened_allocator to make it more self-descriptive.
But if others don't mind to have dir named "scuda" let the author decide.

filcab added inline comments.May 12 2016, 12:52 PM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
279

I was just talking about the SSE4.2 part. Sorry, I was going to comment on the CPUID thing but forgot. Doing it now.

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
65

Same on AMD (source: https://support.amd.com/TechDocs/25481.pdf):
"
CPUID Fn0000_0001_ECX Feature Identifiers
...
20 SSE42: SSE4.2 instruction support.
"

67

Doesn't exist on AMD.
Since SSE4.2 is required for this, you'll need to implement SSE4.2 detection for other CPU brands.
RDRAND seems to exist only on Intel CPUs and there's a fallback path, so having it only on Intel doesn't seem like a problem.

cryptoad updated this revision to Diff 57100.May 12 2016, 2:21 PM
cryptoad edited edge metadata.
cryptoad marked 44 inline comments as done.

Addressed some of the issues raised during the review.
Additional renaming done to comply with the LLVM coding standard.

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
88

This is indeed the case.
I figured I would align that one to a multiple of 8 bits as we have some space in the second half of the 128-bit integer.
I am not opposed to shortening to the actual needed bit size if you feel strongly about it.

135

Using std::atomic<unsigned __int128>?

243

Sorry this was a remainder of a debugging session. I switched it back to the original plan which was to use the thread_local QuarantineCache.

263

No plan to support 32-bit as of yet, but yes we will use FIRST_32_SECOND_64 if we do.

335

I guess we have several choices here:

  • move it up and zero the whole thing prior to the header being store
  • leave it here and zero the whole thing post header, which will have to account for the offset (needed_size - (chunk_beg - alloc_beg))

I think the first option would be better.

projects/compiler-rt/lib/hardened_allocator/scudo_rtl.cc
51

I figured the additional initialization techniques used by ASan et al. could be added later on. Hence the #error for now.

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
33

I figured it could be useful if a feature such as RDSEED was needed.

40

My point of view when writing this was that I had to be as competitive as can be with other allocators, so that the benefit of additional checks would not be offseted by a dramatic decrease in performances. In the initial stages, it was determined that a BSD checksum vs the CPU backed CRC32 induced a performance gain of about 10% in pure allocation benchmarks, so I went that way.

I am not opposed to doing something purely software, but I'd rather start this way and then expend it to be more portable.

85

I wanted to use a 3 state variable: unintialized, true, false. Hence the -1, 0, 1.

99

I tried to be consistent using the Sanitizer types. But I see your point.

110

Is your suggestion to get rid of the epoch component?

projects/compiler-rt/test/hardened_allocator/sizes.cc
54

This is from sanitizer_allocator.cc, that handles this condition.

dvyukov added inline comments.May 13 2016, 12:55 AM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
89

Does it improve generated code?
Leaving it as 24 is OK in that case, but it needs to be explained in comments. Width of that field has crucial implicit relation with Min/MaxAlignment. When double-checked width, I found that it's not what I would expect it to be. What means that either I missing something else important here, or there is a bug, or things get out of sync. This uncertainty is very unpleasant and takes time for anybody reading the code.

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
111

Yes, all that is predictable.
/dev/random is meant specifically for such cases, it uses various sources of entropy to create strongly random numbers.

On second though, just remove all rdtsc/cpuid/rdrand trickery and read from /dev/random. If rdrand is present, kernel will use it.

filcab added inline comments.May 13 2016, 7:55 AM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
136

Yeah, should be nicer. Unless it's a problem due to something I'm not thinking of, then I'd rather have more standard constructs (even though there's no guarantee of an __int128 type, let alone an std::atomic<__int128>, AFAICT libstdc++ and libc++ will have those implemented).

cryptoad updated this revision to Diff 57362.May 16 2016, 9:54 AM
cryptoad marked 26 inline comments as done.

This update addresses another batch of comments raised during the review.
Among the notable changes:

  • after discussion with dvyukov, the memory order for the atomic operation has to changed to relaxed;
  • the 'with_offset' field in the header is going away, and the offset field now always stores the distance between the backend allocation and the chunk.
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
189

For this, I used the same technique that is used in ASan's PlatformTSDDtor, as it seems to be the most viable one.
I am not sure what alternative would work here.

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.h
43

This actually a straight copy from sanitizer_internal_defs.h, just replacing CheckFailed. So I left it as is.
This will go away when I redo the CHECK_IMPL logic to follow kcc@'s suggestion to implement templated failure functions.

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
111

I gave it a try and I have had a lot of issues with /dev/random on my system.
Between the fact that it's blocking, and that sometimes it won't return the amount of bytes requested, the tests have been failing inconsistently.
Tests with /dev/urandom worked better though.
I am going to dig further into that.

filcab added inline comments.May 16 2016, 10:13 AM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
135

Did std::atomic<unsigned __int128> not work/was too slow?

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
111

Using /dev/urandom should be what you need, yes.
Did you still have problems with urandom, btw?

cryptoad added inline comments.May 16 2016, 10:37 AM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
135

It's in the works :)

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
111

/dev/udrandom appeared to work fine.

dvyukov added inline comments.May 17 2016, 1:19 AM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
189

If a free comes after we drained local cache, asan uses a global cache. Grep for "fallback" in asan_allocator.cc. Tsan now uses the same. It sucks. But I don't see how to do better. We need to detect when a thread is actually finished, but it's tricky to do with pthread_join API.

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
111

/dev/urandom is not what you need. It trades security for performance. I.e. instead of blocking it will just give you predictable randomness. Which kind of defeats the whole purpose of a security allocator.
/dev/random blocks when it does not have enough entropy. But there is not much you can do if you do need the entropy.
If it returns less bytes, read again. That's how it works with all read calls.

filcab added inline comments.May 17 2016, 6:02 AM
projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
111

People like Daniel Bernstein and Thomas Ptacek (and others) tend to disagree and say that /dev/urandom is what we need:
http://blog.cr.yp.to/20140205-entropy.html
http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/

I'm no expert in this, so I tend to rely on people who work on this kind of thing.

cryptoad updated this revision to Diff 57702.May 18 2016, 4:11 PM
cryptoad marked 24 inline comments as done.

This diff addresses another batch of comments from the review, as well as some renaming to converge towards LLVM coding standard compliance.
Among the notable changes:

  • a fallback mechanism has been added to service allocations and deallocation post thread tear-down as per dvyukov guidance;
  • the initialization has been moved around to not depend on .preinit_array; a test has been added to make sure that preinit allocations are serviced successfully;
  • std::atomic is used in place of the GCC builtins; the compiled code is identical.

There are still some comments left to address, notably regarding the source of randomness.

cryptoad updated this revision to Diff 58659.May 26 2016, 11:37 AM

With this diff ends the renaming process, so unless I missed or misunderstood something, this should be compliant with the LLVM coding standards.
Additional, I migrated the thread local PRNG initialization to use /dev/urandom for seeding purposes. This appears to not have significantly impacted the performances.

cryptoad marked 12 inline comments as done.May 26 2016, 12:53 PM

One of the outstanding items on my list was to have a look at the CHECK logic to be able to have everything fast fail without callbacks if something went wrong.
kcc's suggestion was to change the CHECKs in the Allocator to make them call a templated failure function (or virtual) (http://reviews.llvm.org/D20084#425327).
I've realize since then that we also need that to be true for the Quarantine, and potentially any abstracted Sanitizer function (the per platform file access functions come to mind). Basically we really want CheckFailed to be ours anywhere in the project, so that when compiling the hardened allocator, none of the with-callbacks version will be compiled in.

I'd welcome some feedback as to how to do that as cleanly as possible, thanks!

kcc added a comment.May 26 2016, 1:04 PM

Yea...
So one possible way is to re-define __sanitizer::CheckFailed.
It should be relatively easy if we never going to allow mixing scudo and the sanitizers.
There is no way to mix scudo and asan/tsan/msan anyway, because they have conflicting allocators.
You may mix scudo and ubsan, since ubsan does not have an allocator, but the only sane way to have ubasan
in prod is to use it in trapping mode which does not have run-time. So we are good here too.

So, try this:

  • move __sanitizer::CheckFailed into a separate file.
  • make sure it is used by *san
  • make sure it is not used by scudo
  • defined your own __sanitizer::CheckFailed

I have not tested it, so something may go wrong, you'll need to experiment...

kcc added inline comments.May 27 2016, 3:19 PM
projects/compiler-rt/lib/hardened_allocator/scudo_allocator.h
30

So, you should not need this any more, let's remove it now.
I'll make on more pass afterwards (Monday-ish)

dvyukov edited edge metadata.May 28 2016, 11:44 PM

One of the outstanding items on my list was to have a look at the CHECK logic to be able to have everything fast fail without callbacks if something went wrong.

Why can't we just set Die as CheckFailedCallback?

dvyukov added inline comments.May 28 2016, 11:46 PM
projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
100

urandom is not secure and can allow to guess the cookie in a local setuid binary.

cryptoad marked an inline comment as done.May 30 2016, 7:57 PM

One of the outstanding items on my list was to have a look at the CHECK logic to be able to have everything fast fail without callbacks if something went wrong.

Why can't we just set Die as CheckFailedCallback?

What I am trying to prevent here is the use of callbacks at all. They would be an interesting target for an attacker as they would be writable function pointers that could be triggered on demand on heap corruption.

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
100

So on this matter it seems that the general agreement is that urandom on modern Linux system is secure and can be used for cryptographic purposes. Even the more recent getrandom system call uses urandom by default, with the following entry in the man page: "Unless you are doing long-term key generation (and perhaps not even then), you probably shouldn't be using GRND_RANDOM. The cryptographic algorithms used for /dev/urandom are quite conservative, and so should be sufficient for all purposes."
/dev/random performs poorly in my tests, often blocking the allocator.

One of the outstanding items on my list was to have a look at the CHECK logic to be able to have everything fast fail without callbacks if something went wrong.

Why can't we just set Die as CheckFailedCallback?

What I am trying to prevent here is the use of callbacks at all. They would be an interesting target for an attacker as they would be writable function pointers that could be triggered on demand on heap corruption.

But that would mean that an attacker broke ASLR and can write arbitrary values at necessary memory locations. Does it still make sense to defend in such case?

projects/compiler-rt/lib/hardened_allocator/scudo_utils.cc
100

Okay. You know better.

But that would mean that an attacker broke ASLR and can write arbitrary values at necessary memory locations. Does it still make sense to defend in such case?

That is correct. I think it is still worth it to not take the chance.
Previous work on other heaps have leveraged such features, given the same assumptions (for example the commit function pointer in the Windows Heap https://www.blackhat.com/presentations/bh-usa-09/MCDONALD/BHUSA09-McDonald-WindowsHeap-PAPER.pdf).
I think it's particularly important to make sure that the failure path fails fast and ideally without the possibility of interruption (like __fastfail http://www.alex-ionescu.com/?p=69).

cryptoad updated this revision to Diff 59082.May 31 2016, 9:26 AM
cryptoad edited edge metadata.

We now replace the Sanitizer's termination functions so that no callbacks can be called in CheckFailed and Die.

kcc added a comment.Jun 1 2016, 10:19 AM

Mostly LG.
Please address a few remaining nits and wait until tomorrow for more comments.
If no significant comments, rename the dir to scudo and I will land it.

Note: I did not review this code from security perspective in details because

  • not an expert
  • there are known security weaknesses in the backend allocator (we will need to handle them separately)
  • it'll be easier to do further security assessment once the code is committed.
docs/HardenedAllocator.rst
90

Did you?

95

Give an example instead of referring to "usual ASan syntax".
Scudo users don't have to be asan experts.

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
64

align the comment block

110

I suggest to replace all cases of

if (!cond) {
  Printf()
  Die()
}

With

if (!cond)
  DieWithMessage();

This is using the Printf from sanitizer_common, right?
It might be worth replacing it with your own, simpler one.
If you agree, just leave a TODO near DieWithMessage and address it later.

279

s/late/later

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.h
19

is currently only supported?
os "supports x86_64"?

50

Does the following block of code have to be in this header?
Why not in .cc?

cryptoad updated this revision to Diff 59581.Jun 3 2016, 10:49 AM
cryptoad marked 7 inline comments as done.

Addressing some comments raised in the review, notably:

  • new dieWithMessage function wrapping a Printf+Die functionality - still currently using Sanitizer's VSNPrintf which will be changed later;
  • updated the documentation;
docs/HardenedAllocator.rst
90

I removed the part about the preinit_array as I do not use that anymore.
Whatever LIT is using requires the whole-archive flag, if using gcc to link the static library against a project, it doesn't.

95

I didn't realize that I hadn't updated the options names below as well. Also added ThreadLocalQuarantineSizeKb.

projects/compiler-rt/lib/hardened_allocator/scudo_allocator.cc
110

There is also a PrintfAndReportCallback callback that I just noticed.
I will have to address that later as well.

kcc accepted this revision.Jun 3 2016, 2:48 PM
kcc edited edge metadata.

LGTM.
I think it's as good as we can make it via code review.
Let's make it better by incremental changes.

Now, please rename the directories to lib/scudo and test/scudo, upload the updated patch and let me land it.
Probably also rename HardenedAllocator.rst to ScudoHardenedAllocator.rst or some such (up to you)

This revision is now accepted and ready to land.Jun 3 2016, 2:48 PM
cryptoad updated this revision to Diff 59638.Jun 3 2016, 4:07 PM
cryptoad edited edge metadata.

This patch renames the directories and files to fit the scheme suggested during the review, and the LLVM practices:

  • hardened_allocator is now scudo everywhere;
  • documentation is now in ScudoHardenedAllocator.rst;
  • all .cc files are now .cpp;
  • additionally scudo_malloc_linux.cc is now scudo_interceptors.cpp;
  • build files have been updated accordingly, as well as all references to the previous naming scheme (the library and checks rules are now 'scudo' and 'check-scudo').
kcc added a comment.Jun 3 2016, 5:31 PM

I am getting these when trying 'ninja check-scudo'
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/atomic:266: undefined reference to `__sync_val_compare_and_swap_16'
Any suggestions?

BTW, do we make sure that check-scudo is not run if the current machine does not support proper SSE?

Most likely were missing -latomic (IIRC)
I wouldn't expect to have to link with that when using the std::atomic, but
that might be it.

Filipe

In D20084#449057, @kcc wrote:

I am getting these when trying 'ninja check-scudo'
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/atomic:266: undefined reference to `__sync_val_compare_and_swap_16'
Any suggestions?

Regarding @filcab's comment, -latomic is in the cflags in the lit.cfg, not sure why this is not working for you.

BTW, do we make sure that check-scudo is not run if the current machine does not support proper SSE?

We do check for SSE 4.2 in the init of the Allocator via CHECK(testCPUFeature(SSE4_2))

In D20084#449057, @kcc wrote:

I am getting these when trying 'ninja check-scudo'
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../include/c++/4.8/atomic:266: undefined reference to `__sync_val_compare_and_swap_16'
Any suggestions?

I also use -mcx16 in g3 BUILD, that might be it.

kcc added a comment.Jun 6 2016, 9:29 AM

We do check for SSE 4.2 in the init of the Allocator via CHECK(testCPUFeature(SSE4_2))

That's not enough.
This is a run-time check and so if someone runs "check-all" on a machine that does not support SSE4_2
thye *will* run check-scudo and get a test failure.
Instead, we need to ensure that check-scudo is not executed as part of check-all when there is no proper HW support

And of course, fix check-scudo on my machine [ :) ] so that I can test it before committing.

cryptoad updated this revision to Diff 59744.Jun 6 2016, 10:07 AM

Updated Scudo LIT CMakeLists.txt to only add check-scudo on Linux x64 machines with SSE4.2.

This revision was automatically updated to reflect the committed changes.
kcc added a comment.Jun 6 2016, 6:28 PM

Thanks again for contributing this code, let's now make the allocator even more hardened!