This is an archive of the discontinued LLVM Phabricator instance.

[scudo] 32-bit and hardware agnostic support
ClosedPublic

Authored by cryptoad on Nov 7 2016, 11:18 AM.

Details

Summary

This update introduces i386 support for the Scudo Hardened Allocator, and
offers software alternatives for functions that used to require hardware
specific instruction sets. This should make porting to new architectures
easier.

Among the changes:

  • The chunk header has been changed to accomodate the size limitations encountered on 32-bit architectures. We now fit everything in 64-bit. This was achieved by storing the amount of unused bytes in an allocation rather than the size itself, as one can be deduced from the other with the help of the GetActuallyAllocatedSize function. As it turns out, this header can be used for both 64 and 32 bit, and as such we dropped the requirement for the 128-bit compare and exchange instruction support (cmpxchg16b).
  • Add 32-bit support for the checksum and the PRNG functions: if the SSE 4.2 instruction set is supported, use the 32-bit CRC32 instruction, and in the XorShift128, use a 32-bit based state instead of 64-bit.
  • Add software support for CRC32: if SSE 4.2 is not supported, fallback on a software implementation.
  • Modify tests that were not 32-bit compliant, and expand them to cover more allocation and alignment sizes. The random shuffle test has been deactivated for linux-i386 & linux-i686 as the 32-bit sanitizer allocator doesn't currently randomize chunks.

Event Timeline

cryptoad updated this revision to Diff 77066.Nov 7 2016, 11:18 AM
cryptoad retitled this revision from to [scudo] 32-bit and hardware agnostic support.
cryptoad updated this object.
cryptoad added a subscriber: llvm-commits.
filcab added a subscriber: filcab.Nov 8 2016, 10:47 AM

Here's a few things I found on a quick read.
Thanks for doing this,
Filipe

lib/scudo/scudo_allocator.cpp
38

Maybe just use __scudo::DefaultSizeClassMap and get rid of the above typedef?

350

It seems all these values are constant. Can you turn this into a static_assert near their definitions?

361

Same thing as above.

512–513

I'd rather have separate utility functions for this.
One "unpacks" the header onto some var on the stack, then you can getUsableSize on it. Maybe even have one which checks if the chunk is good (allocated), but that might be abstracting too much unless we need those verifications often.

512–513

Do we need this if? What goes wrong if we emit it?

lib/scudo/scudo_allocator.h
97

Do we need this comment?

lib/scudo/scudo_utils.cpp
89–90

You'll get data races on CPUInfo.
Since those numbers never change, how about writing to the static one register by register, atomically?
Relaxed stores would be enough (not contributing to data races, at most you'd write more than once the exact same value).
You'll also need to load them with atomic loads (which on x86 is a simple mov).

134–195

const

cryptoad marked an inline comment as done.Nov 8 2016, 12:27 PM

Thanks Filipe, I appreciate you having a look at this. I fixed some of raised issues in my working copy, I have some comments/questions in return for you for clarification before I change more things.

lib/scudo/scudo_allocator.cpp
38

SizeClassMap::kMaxSize is used below to compute the maximum offset for a chunk.
In order to be able to use the same code construct for 32 & 64-bit, I used the first typedef, which led to the second one in the AP structure.

512–513

The reasoning behind this was that malloc_usable_size could potentially be the first allocator related function called in a program which would lead to bad things if the initialization wasn't done. I agree that this should probably not happen, so it becomes a matter of CHECK vs doing it.

512–513

To clarify this, would it boil down to moving the loadHeader logic outside of the sizing function?

lib/scudo/scudo_utils.cpp
89–90

This is only called from the global initialization routine call via pthread_once, hence me not making it thread safe.
I am happy to change it if you feel it could become a problem, let me know.

cryptoad updated this revision to Diff 77413.Nov 9 2016, 3:52 PM

Addressing some comments by filcab raised during the review.

  • getUsableSize is now a ScudoChunk function requiring a copy of the header to work on, the associated code has been modified accordingly;
  • added an isValid function to ScudoChunk, allowing for checksum verification without death on failure, which is now used by __sanitizer_get_ownership; added tests verifying that this works;
  • corrected a few irregularities discovered while moving things around.

It looks like the patch is good, but I'll bail on giving the patch a go-ahead since I'm not too familiar with the allocator itself and some low-level problems you might be running into. I'll let Kostya or others decide on green lighting.

lib/scudo/scudo_allocator.cpp
512–513

Fair enough. I guess this can come from code which we don't control, so keeping this should be ok. Maybe a comment?

512–513

Yes. The way you changed it looks good.

lib/scudo/scudo_utils.cpp
89–90

Indeed, all is good.

Aleksey, just FYI

kcc added inline comments.Nov 15 2016, 5:08 PM
lib/scudo/scudo_allocator.cpp
44

Do you want to leave some comment saying that the 32-bit backend allocator itself is much less secure than the 64-bit one?

136

I would recommend if (SANITIZER_WORD_SIZE == 64)
instead of #if
at least if that compiles.

having #ifdefs inside a function makes maintenance harder.

lib/scudo/scudo_utils.cpp
180

here too

lib/scudo/scudo_utils.h
44

don't introduce #ifdefs, please.

I'd make it two separate classes and then typedef to one or the other using FIRST_32_SECOND_64

cryptoad marked 16 inline comments as done.Nov 16 2016, 11:35 AM

New diff coming up addressing the comments.
I am purposely trying to avoid having the CRC32 be a function pointer pointing to either the software or hardware version as I feel having a call to writeable function pointer in a critical checking function would not be advisable. This makes the code somewhat clunky, and I am open to suggestions.

cryptoad updated this revision to Diff 78232.Nov 16 2016, 11:39 AM
cryptoad edited edge metadata.

Addressing comments raised by kcc@:

  • reworking code to remove SANITIZER_WORDSIZE ifdefs in the CRC32;
  • split the Xorshift PRNG in a 32 and 64 class.
cryptoad updated this revision to Diff 78272.Nov 16 2016, 3:15 PM

Slight modification:

  • getting rid of the 32-bit version of the Xorshift128+ as the compiler does a fine job at compiling the 64-bit one for i386;
  • stumbled onto a bunch of defines in cpuid.h, it makes more sense to use them rather than define our own.
kcc added a comment.Nov 21 2016, 5:06 PM

LGTM.
Please give one more day to Alexey to comment.

alekseyshl edited edge metadata.Nov 21 2016, 6:33 PM

LGTM

lib/scudo/scudo_utils.cpp
88

Why not let compiler handle init of the static, we won't need a flag then?

static CPUIDInfo CPUInfo = getCPUIDInfo();

89–90

Why not just:

if (!InfoInitialized) {
  if (isSupportedCPU())
cryptoad marked 2 inline comments as done.Nov 22 2016, 10:05 AM

Thank you Kostya & Aleksey.
Uploading a new diff addressing Aleksey's comments. PTAL.

cryptoad updated this revision to Diff 78891.Nov 22 2016, 10:09 AM
cryptoad edited edge metadata.

Addressing alekseyshl@ comments:

  • refactor the CPU feature code to take advantage of the static initialization; this allows to get rid of a flag, a simplifies a bit the code.
cryptoad updated this revision to Diff 79416.Nov 28 2016, 9:41 AM

Using RoundDownTo instead of the binary mask operation.

cryptoad updated this revision to Diff 79468.Nov 28 2016, 3:40 PM

After internal discussions, extend a bit this patch with regard to SSE 4.2

We changed the SSE 4.2 logic to no longer be required, but be optional.
We now check if -msse4.2 is supported by the compiler, and emit the hardware
optimized CRC32 code if it is (-msse4.2 was compulsory before). At runtime, we
check for CPU support.

This introduces a couple of ifdef in the checkum function, gets rid of the
/proc/cpuinfo cmake test.

This would allow the allocator to run on a wider range of hardware.

kcc accepted this revision.Nov 29 2016, 5:18 PM
kcc edited edge metadata.

LGTM

This revision is now accepted and ready to land.Nov 29 2016, 5:18 PM
cryptoad closed this revision.Nov 30 2016, 9:42 AM

This breaks building with gcc:

FAILED: projects/compiler-rt/lib/scudo/CMakeFiles/clang_rt.scudo-x86_64.dir/scudo_allocator.cpp.o
/usr/bin/g++ -D_DEBUG -D_GNU_SOURCE -DSTDC_CONSTANT_MACROS -DSTDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Iprojects/compiler-rt/lib/scudo -I/home/markus/llvm/projects/compiler-
rt/lib/scudo -Iinclude -I/home/markus/llvm/include -I/home/markus/llvm/projects/compiler-rt/lib/scudo/.. -fPIC -fvisibility-inlines-hidden -Wall -W -Wno-unused-parameter -Wwrite-s
trings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment -Werror=date-time -std=c++11 -ffunction
-sections -fdata-sections -Wall -std=c++11 -Wno-unused-parameter -O3 -march=native -Wno-implicit-fallthrough -pipe -fnull-this-pointer -flifetime-dse=1 -UNDEBUG -m64 -fPIC -fno
-builtin -fno-exceptions -fomit-frame-pointer -funwind-tables -fno-stack-protector -fvisibility=hidden -fvisibility-inlines-hidden -fno-function-sections -fno-lto -O3 -g -Wno-vari
adic-macros -Wno-non-virtual-dtor -fno-rtti -msse4.2 -MD -MT projects/compiler-rt/lib/scudo/CMakeFiles/clang_rt.scudo-x86_64.dir/scudo_allocator.cpp.o -MF projects/compiler-rt/lib
/scudo/CMakeFiles/clang_rt.scudo-x86_64.dir/scudo_allocator.cpp.o.d -o projects/compiler-rt/lib/scudo/CMakeFiles/clang_rt.scudo-x86_64.dir/scudo_allocator.cpp.o -c /home/markus/ll
vm/projects/compiler-rt/lib/scudo/scudo_allocator.cpp
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/g++-v7/cstdlib:77:0,

from /usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/g++-v7/stdlib.h:36,
from /usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/mm_malloc.h:27,
from /usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/xmmintrin.h:34,
from /usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/emmintrin.h:31,
from /usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/pmmintrin.h:31,
from /usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/tmmintrin.h:31,
from /usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/smmintrin.h:32,
from /home/markus/llvm/projects/compiler-rt/lib/scudo/scudo_allocator.cpp:86:

/usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/g++-v7/bits/std_abs.h:52:11: error: ‘::abs’ has not been declared

using ::abs;
        ^~~

In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/g++-v7/stdlib.h:36:0,

from /usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/mm_malloc.h:27,
from /usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/xmmintrin.h:34,
from /usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/emmintrin.h:31,
from /usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/pmmintrin.h:31,
from /usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/tmmintrin.h:31,
from /usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/smmintrin.h:32,
from /home/markus/llvm/projects/compiler-rt/lib/scudo/scudo_allocator.cpp:86:

/usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/g++-v7/cstdlib:124:11: error: ‘::div_t’ has not been declared

using ::div_t;
        ^~~~~

/usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/g++-v7/cstdlib:125:11: error: ‘::ldiv_t’ has not been declared

using ::ldiv_t;
        ^~~~~~

/usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/g++-v7/cstdlib:127:11: error: ‘::abort’ has not been declared

using ::abort;
        ^~~~~

/usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/g++-v7/cstdlib:128:11: error: ‘::atexit’ has not been declared

using ::atexit;
        ^~~~~~

/usr/lib/gcc/x86_64-pc-linux-gnu/7.0.0/include/g++-v7/cstdlib:131:11: error: ‘::at_quick_exit’ has not been declared

using ::at_quick_exit;
        ^~~~~~~~~~~~~

etc...

octoploid, I can't seem to reproduce this when building with gcc and the build bot passes:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/320/steps/check-scudo%20in%20gcc%20build/logs/stdio

I'd have to go on a limb and suggest the following: would you mind trying to add:

#  define _MM_MALLOC_H_INCLUDED
#  define __MM_MALLOC_H

before:

#  include <smmintrin.h>

and tell me if that changes things?

The issue got fixed by r288486.