Page MenuHomePhabricator

Use sys::Memory::AllocateRWX for JIT code
AbandonedPublic

Authored by krytarowski on Jul 18 2017, 5:58 AM.

Details

Reviewers
joerg
rnk
lhames
Summary

NetBSD ships with PaX MPROTECT (known as W^X).

This means that if a memory page was writable, it cannot be switched to executable.
Allocating such pages requires extended interface that is handled inside AllocateRWX.

This fixes circa 200 unexpected failures in LLVM tests ("check-llvm") on NetBSD 8.0(beta).
All JIT, ExecutionEngine and similar failures are resolved.

Sponsored by <The NetBSD Foundation>

Diff Detail

Repository
rL LLVM

Event Timeline

krytarowski created this revision.Jul 18 2017, 5:58 AM
rnk accepted this revision.Jul 19 2017, 2:57 PM
This revision is now accepted and ready to land.Jul 19 2017, 2:57 PM
lhames edited edge metadata.Jul 19 2017, 3:05 PM

Could you hold off committing while I review?

Could you hold off committing while I review?

OK

rnk requested changes to this revision.Jul 19 2017, 3:18 PM

Yeah, I spoke too soon. I didn't realize that AllocateRWX now does different things on different OSs. We should revisit that.

This revision now requires changes to proceed.Jul 19 2017, 3:18 PM

Is AllocateRWX returning RWX blocks? If so, is anyone clearing the W bit later?

For architectures that support it, I think I would like to keep the existing scheme: JIT'd memory is initialized while in RW- mode, then switched to RX- mode prior to execution. That means JIT'd code has no easy way to modify itself during execution.

I'm also open to changing the model (I'm not a security expert) but that would require more consideration.

Is AllocateRWX returning RWX blocks? If so, is anyone clearing the W bit later?

For architectures that support it, I think I would like to keep the existing scheme: JIT'd memory is initialized while in RW- mode, then switched to RX- mode prior to execution. That means JIT'd code has no easy way to modify itself during execution.

I'm also open to changing the model (I'm not a security expert) but that would require more consideration.

Right, the pages are allocated RWX.

As far as I can tell, this interface is not prepared to disable R, W, X properties at least in a portable way.

I've missed that we need to use ReleaseRWX() for the RWX regions.

Switching a page from the W mode to X is prohibited with PaX MPROTECT (on NetBSD).

Right, the pages are allocated RWX.

It seems odd to prevent the flipping of the W/X bits while allowing RWX pages. I assume there are additional privileges required (granted only to JITs and debuggers?) to get RWX pages? Is there a good source for me to read up about the PaX MPROTECT design?

I think we need to consider this design further before we move to it by default, but I am keen to look into this: to the extent that we can support RWX pages there are some excellent optimizations we can make for JIT'd code that have been on my wish-list for a while.

We could develop the new scheme alongside the old one by introducing parallel RWX-based stub and callback managers.

joerg edited edge metadata.Jul 19 2017, 5:08 PM

Right, the pages are allocated RWX.

It seems odd to prevent the flipping of the W/X bits while allowing RWX pages. I assume there are additional privileges
required (granted only to JITs and debuggers?) to get RWX pages? Is there a good source for me to read up about the PaX MPROTECT design?

They are not allocated RWX, they are allocated RW with the option for later X. I.e. the kernel enforces W^X, but you can request
additional protections for later use. Without that, mprotect with X would be rejected later. There are patching for Linux for similar effect, but I don't think
they ever got merged. libffi is a victim of that on Linux.

Right, the pages are allocated RWX.

It seems odd to prevent the flipping of the W/X bits while allowing RWX pages. I assume there are additional privileges required (granted only to JITs and debuggers?) to get RWX pages? Is there a good source for me to read up about the PaX MPROTECT design?

I think that the whole design is about remapping all (R)WX pages to (R)W and not allowing to remap it as X in future.
AllocateRWX uses OS-specific interfaces to workaround it (if possible).

By default attaching to a process with ptrace(2) eliminates this restriction for debuggers. This is needed to insert software breakpoints.

I think we need to consider this design further before we move to it by default, but I am keen to look into this: to the extent that we can support RWX pages there are some excellent optimizations we can make for JIT'd code that have been on my wish-list for a while.

We could develop the new scheme alongside the old one by introducing parallel RWX-based stub and callback managers.

I recommend to discuss it with @joerg as he is the author of our JIT interface in mmap(2).
I have no personal preferences here, I want to get LLVM JIT functional.

rnk added a comment.Jul 19 2017, 5:24 PM

They are not allocated RWX, they are allocated RW with the option for later X. I.e. the kernel enforces W^X, but you can request
additional protections for later use. Without that, mprotect with X would be rejected later. There are patching for Linux for similar effect, but I don't think
they ever got merged. libffi is a victim of that on Linux.

That makes sense, but it seems like the right thing then is for LLVM to change its APIs to allocate RW pages everywhere and then flip them between RW and RX. We should remove this error-prone AllocateRWX API.

Right, the pages are allocated RWX.

It seems odd to prevent the flipping of the W/X bits while allowing RWX pages. I assume there are additional privileges
required (granted only to JITs and debuggers?) to get RWX pages? Is there a good source for me to read up about the PaX MPROTECT design?

They are not allocated RWX, they are allocated RW with the option for later X. I.e. the kernel enforces W^X, but you can request
additional protections for later use. Without that, mprotect with X would be rejected later. There are patching for Linux for similar effect, but I don't think
they ever got merged. libffi is a victim of that on Linux.

Right, this is NetBSD-specific implementation detail of AllocateRWX.

libffi examples (ffi_closure_alloc):
https://github.com/libffi/libffi/blob/6e2e041b6df6a3c3a5ca8a750dedbbd465e5ca97/src/closures.c

joerg added a comment.Jul 20 2017, 1:53 AM

Keep in mind that the SELinux case in libffi is not fork-safe. One important part LLVM needs to consider is
whether it wants to enshrine the performance penalty of mprotect-after-commit in its APIs or not. The second part
is whether platforms should aim to support hot-patchable JIT for multi-threaded environments or not. If the latter is
not considered relevant, the API only needs to provide a function to allocate JIT-safe memory and a function to make
it executable. If the latter is relevant, the current AllocateRWX is the interface you will end up with, one way or the other.

They are not allocated RWX, they are allocated RW with the option for later X. I.e. the kernel enforces W^X, but you can request additional protections for later use.

When you say "additional protections" do you mean you can add the 'X' permission later (to get RWX), or just that you can later toggle from RW- to R-X?

That makes sense, but it seems like the right thing then is for LLVM to change its APIs to allocate RW pages everywhere and then flip them between RW and RX. We should remove this error-prone AllocateRWX API.

From a quick grep it seems like AllocateRWX is only called from the llvm-rtdyld tool. I'll see how difficult it is to remove it. If it's easy enough we can kill it, then discuss more future-proof APIs.

One important part LLVM needs to consider is whether it wants to enshrine the performance penalty of mprotect-after-commit in its APIs or not. The second part is whether platforms should aim to support hot-patchable JIT for multi-threaded environments or not.

For the JIT I'm hoping we can avoid having to choose by providing different implementations of a common higher-level API. Probably the StubsManager/CallbackManager level: If your platform supports hot-patching, use a HotPatchableStubsManager and you get the extra performance. If not, use a SlowButSafeStubsManager and everything still works. I hope to have time to prototype and sanity check that idea in a week or so.

joerg added a comment.Jul 21 2017, 2:10 AM

They are not allocated RWX, they are allocated RW with the option for later X. I.e. the kernel enforces W^X, but you can request additional protections for later use.

When you say "additional protections" do you mean you can add the 'X' permission later (to get RWX), or just that you can later toggle from RW- to R-X?

The kernel strictly enforces W^X, so you can't request RWX. You can say "I want RW now and later maybe X" and with that to you can toggle from RW to RX (well, and back).

That makes sense, but it seems like the right thing then is for LLVM to change its APIs to allocate RW pages everywhere and then flip them between RW and RX. We should remove this error-prone AllocateRWX API.

From a quick grep it seems like AllocateRWX is only called from the llvm-rtdyld tool. I'll see how difficult it is to remove it. If it's easy enough we can kill it, then discuss more future-proof APIs.

One important part LLVM needs to consider is whether it wants to enshrine the performance penalty of mprotect-after-commit in its APIs or not. The second part is whether platforms should aim to support hot-patchable JIT for multi-threaded environments or not.

For the JIT I'm hoping we can avoid having to choose by providing different implementations of a common higher-level API.
Probably the StubsManager/CallbackManager level: If your platform supports hot-patching, use a HotPatchableStubsManager
and you get the extra performance. If not, use a SlowButSafeStubsManager and everything still works. I hope to have time
to prototype and sanity check that idea in a week or so.

There are two things to consider here: the short term use for 5.0 (and I support Kamil's patch in that regard) and the long term goal.
The current code is a strict no-go from the NetBSD perspective: you want to turn a pure RW mapping executable, that's not allowed.
We have an API for expressing the desired interface, so the short term fix can be the current patch.

Long term, the situation is a bit different. The problem with your approach is that it likely requires additional non-nop instructions as
patch point to be thread-safe. Consider incremental optimization in the JVM-JIT sense: you want to patch jumps at the beginning of
the function to point to any existing users to the new version. You can't just turn the page from RX to RW for that though, otherwise other
threads will fault.

There are two things to consider here: the short term use for 5.0 (and I support Kamil's patch in that regard) and the long term goal.
The current code is a strict no-go from the NetBSD perspective: you want to turn a pure RW mapping executable, that's not allowed.
We have an API for expressing the desired interface, so the short term fix can be the current patch.

The current patch switches from requesting RW- pages to RWX, which I don't think is valid on iOS although I'll have to double check that.

That said, it sounds the iOS requirement (assuming I'm remembering it right) and the NetBSD one are almost identical. It sounds like we just need to switch from raw allocateMappedMemory calls to a new pair:

allocateFutureExecutablePage(...)
makePageExecutable(...)

On iOS this just forwards to allocateMappedMemory. On NetBSD the first call could do whatever magic is required to make the page flippable. Does that sound reasonable?

This seems achievable before rc2, and potentially safer since it doesn't change the behavior on existing platforms.

Long term, the situation is a bit different. The problem with your approach is that it likely requires additional non-nop instructions as
patch point to be thread-safe. Consider incremental optimization in the JVM-JIT sense: you want to patch jumps at the beginning of
the function to point to any existing users to the new version. You can't just turn the page from RX to RW for that though, otherwise other
threads will fault.

It seems as if there are several approaches available here, depending on the target platform: If you have RWX pages you can just patch without changing permissions. Likewise if you can map the same page twice (once with RW- and once with R-X). As for nops, you can either not use them at all (do everything through indirection, though we still need to do the initial stub setup), we can make function starts patchable, or we can make call-sites patchable. Picking the right high level API should allow us experiment and/or support multiple implementations.

The current patch switches from requesting RW- pages to RWX, which I don't think is valid on iOS although I'll have to double check that.

Sorry - I just realized this doesn't address your earlier comment:

They are not allocated RWX, they are allocated RW with the option for later X. I.e. the kernel enforces W^X, but you can request
additional protections for later use. Without that, mprotect with X would be rejected later. There are patching for Linux for similar effect, but I don't think
they ever got merged. libffi is a victim of that on Linux.

So assuming the AllocateRWX call doesn't actually allocate RWX pages (and the latter calls to mprotect to flip the permissions work) we can go ahead with this patch as is, though I'd love to rename AllocateRWX on mainline to remove the confusion.

I have no opinion on further design, if this patch is fine as of now - an intermediate solution for 5.0.0(svn) - great.

krytarowski added a comment.EditedJul 27 2017, 4:23 AM

Ping? Can I push this? I have another fix for MPROTECT on NetBSD with the original sys::Memory::allocateMappedMemory(), not directly related to RWX here. However both patches interfere and can break JIT for current users - without merging this one.

Ok - I've finally had a chance to read AllocateRWX (sorry - it has been a busy week) and it does indeed try to allocate RWX pages, at least on Darwin and Linux. I assume the 'PROT_MPROTECT(PROT_EXEC)' bit for NetBSD is the W^X extension you referenced earlier.

The proper API for existing JIT use-cases should consist of two parts: allocateFutureExecutablePage(...) which should allocate RW- pages on all platforms (as well as marking the page as 'future-executable' for NetBSD), and makePageExecutable(...) which should toggle from RW- to R-X mode on all platforms.

I'll be in #llvm on IRC as 'lhames' for the next couple of hours if you would like to chat about the design.

krytarowski abandoned this revision.Aug 4 2017, 5:56 AM

This should be done differently.