This is an archive of the discontinued LLVM Phabricator instance.

[scudo] Separate hardware CRC32 routines
ClosedPublic

Authored by cryptoad on Jan 6 2017, 3:29 PM.

Details

Summary

As raised in D28304, enabling SSE 4.2 for the whole Scudo tree leads to the
emission of SSE 4.2 instructions everywhere, while the runtime checks only
applied to the CRC32 computing function.

This patch separates the CRC32 function taking advantage of the hardware into
its own file, and only enabled -msse4.2 for that file, if detected to be
supported by the compiler.

Another consequence of removing SSE4.2 globally is realizing that memcpy were
not being optimized, which turned out to be due to the -fno-builtin in
SANITIZER_COMMON_CFLAGS. So we now explicitely enable builtins for Scudo.

The resulting assembly looks good, with some CALLs are introduced instead of
the CRC32 code being inlined.

Event Timeline

cryptoad updated this revision to Diff 83444.Jan 6 2017, 3:29 PM
cryptoad retitled this revision from to [scudo] Separate hardware CRC32 routines.
cryptoad updated this object.
cryptoad added reviewers: kcc, alekseyshl, mgorny.
cryptoad added a subscriber: llvm-commits.
mgorny added inline comments.Jan 7 2017, 12:44 AM
lib/scudo/scudo_allocator.cpp
85

Stray empty line, it seems.

lib/scudo/scudo_crc32.cpp
42

This is still not entirely correct. Strictly speaking, the compiler can emit SSE4.2 instructions here, i.e. making them apply to pre-hardware or non-hardware call branch. However, it's highly unlikely it will use SSE4.2 to do comparison or function call, so I'm not going to insist on moving this elsewhere.

However, indirection into two function calls may actually affect performance, I think. Since hardware and software variants have the same signature, maybe it'd be better to replace the enum with function pointer, store hardware or software function in it and just call that function pointer in scudo_allocator?

Also, this reminds me I wanted to point out that ARM hardware variant is never used since hardware function is only called when SSE4.2 is available.

cryptoad updated this revision to Diff 83540.Jan 7 2017, 3:21 PM
cryptoad marked an inline comment as done.

Removing empty line.

lib/scudo/scudo_crc32.cpp
42

Thank you Michal for your comments, I appreciate you having a look.

You are right about the function pointer, it would make things easier. The reasoning behind ot using one was that a call to a writeable function pointer at the core of the allocator could be leveraged by an attacker for arbitrary code execution purposes - which we are trying to avoid. Hence me trying to jump through hoops to avoid that.

Regarding the code generation, I tried to re-arrange things different ways without be able to get rid of the calls. If a solution jumps at you, I'll be happy to change things.

As for ARM, the reason there is no hardware path available yet is only due to me not having the correct devices to test it properly yet. I have an ARM32 board (without CRC capabilities), and just managed to get an AArch64 compliant OS on a Raspberry Pi 3, so it will get there shortly. I managed to get things working with -march=armv8.1-a+crc, and a runtime check based on getauxval(AT_HWCAP).

mgorny accepted this revision.Jan 9 2017, 1:57 PM
mgorny edited edge metadata.

Well, I've tested it and it seems to work for me. I think it's good enough to go as-is.

For a potential future improvement, this is how I'd do it:

  1. Make the external file purely computeHardwareCRC32(),
  2. Create an inline function in allocator that has the conditionals and calls computeHardwareCRC32 or computeSoftwareCRC32 appropriately.

This way, the hardware file is used only when hardware is selected, and the inline function can be inlined to avoid extra recurrence.

Furthermore, I would consider just skipping the runtime detection and using hardware code if __SSE4_2__ is defined in allocator, i.e. -march= used to build compiler-rt already enables SSE4.2. In this case random bits of code may require SSE4.2 already, so there's no point in excluding the CRC32.

This revision is now accepted and ready to land.Jan 9 2017, 1:57 PM

Thank you Michal.
Good idea about skipping the detection is SSE4.2 is enabled in the allocator.

alekseyshl accepted this revision.Jan 9 2017, 3:14 PM
alekseyshl edited edge metadata.
cryptoad closed this revision.Jan 10 2017, 8:50 AM