Page MenuHomePhabricator

Add Forward-Edge Control-Flow Integrity support
ClosedPublic

Authored by tmroeder on Jun 16 2014, 4:14 PM.

Details

Reviewers
nlewycky
Summary

This patch adds a new pass that can inject checks before indirect calls to make sure that these calls target known locations. It supports three types of checks and, at compile time, it can take the name of a custom function to call when an indirect call check fails.

This foward CFI implementation depends on the recent jumptable attribute, as well as http://reviews.llvm.org/D4128, which adds a flag to cause all address-taken functions to get the attributes unnamed_addr and jumptable.

This CFI implementation can also easily be augmented with ExternalFunctionAnalysis (http://reviews.llvm.org/D2873); this allows the rewriter to skip rewriting for functions that are known at compile time to take functions defined outside the current module. Another possibility for integration with EFA is to call a different failure function for sites that use known external function pointers than for the other indirect call check sites.

See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-February/070210 for the initial discussion of this feature on llvmdev.

This pass incidentally moves the function JumpInstrTables::transformType from private to public and makes it static (with a new argument that specifies the table type to use); this is so that the CFI code can transform function types at call sites to determine which jump-instruction table to use for the check at that site.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
kcc added a comment.Jul 7 2014, 4:50 AM

I've made a very quick glance. Before the second round I'd like to see tests that use opt (IR=>IR) instead of (or in addition to) llc tests (IR=>ASM).

Maybe in a separate patch I'd like to see end-to-end tests (like we have in compiler-rt/test).

off-topic: are you going to handle RET instructions?

lib/CodeGen/ForwardControlFlowIntegrity.cpp
69

Please use 'static' instead of anon namespace.
From http://llvm.org/docs/CodingStandards.html#anonymous-namespaces
make anonymous namespaces as small as possible, and only use them for class declarations.

test/CodeGen/X86/cfi_non_default_function.ll
11

add
CHECK-LABEL: @m

test/CodeGen/X86/cfi_simple_indirect_call.ll
13

use SUB-LABEL, etc

kcc added a comment.Jul 7 2014, 4:52 AM

Ah, I see you can't use opt tests because you are actually running as part of llc, sorry.

kcc added a comment.Jul 7 2014, 5:27 AM

some more comments.

include/llvm/CodeGen/ForwardControlFlowIntegrity.h
57

Can this be SmallVector?

lib/CodeGen/ForwardControlFlowIntegrity.cpp
106

maybe inline this in the header?

lib/CodeGen/JumpInstrTables.cpp
50

why did you remove .?

test/CodeGen/X86/cfi_simple_indirect_call.ll
13

also add a test with invoke

Comments on the current set of comments. I'm also about to update this with a patch that handles the other comments.

include/llvm/CodeGen/ForwardControlFlowIntegrity.h
57

I don't know SmallVector well, so I'm not sure. It's true that I don't need efficient search over this data structure, since it's just filled and iterated, so at least it could be a vector. Does SmallVector suffer if it gets large?

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
903

I've found that sometimes the jumptable code was getting emitted in a section that was not .text. It might be possible to fix this by emitting the code somewhere else, but it looked to me like each chunk of output was emitting a section switch if it needed to make sure it was in a given section. This shouldn't add an unnecessary section switch if there's no jumptable, since it's guarded by the condition

if (JITI && !JITI->getTables().empty())

There's no way to turn jumptable off, though, since we agreed in the original jumptable code review that the mere presence of the jumptable attribute should be enough to trigger the inclusion of a jumptable.

As for removing the MCSA_Global emission, this fixes a problem I noticed recently with separate compilation. As it stands, if you compile two different modules with jumptable annotations then try to link them, the link will fail, since both will have, e.g., __llvm_jump_instr_table_0_1 as global symbols. But that symbol doesn't need to be global, since it can only be referenced by the current module. Removing the MCSA_Global annotation allows multiple modules to be built with jumptable and to link correctly.

Note that pointers to __llvm_jump_instr_table_0_1 can be handed to other modules, but other modules can't directly refer to this pointer in their code; this was part of the whole problem with &f != <received pointer to f> that led to the requirement for unnamed_addr on all jumptable-annotated functions.

lib/CodeGen/ForwardControlFlowIntegrity.cpp
106

I was asked in a review for a different class to take the destructor out of the header and put it in the cc

tmroeder updated this revision to Diff 11295.Jul 10 2014, 2:42 PM
tmroeder set the repository for this revision to rL LLVM.

This patch addresses the latest set of comments on the FCFI code.

kcc added inline comments.Jul 11 2014, 2:20 AM
include/llvm/CodeGen/ForwardControlFlowIntegrity.h
57

SmallVector becomes just std::vector when it becomes large.

jfb added inline comments.Jul 12 2014, 2:20 PM
include/llvm/CodeGen/CommandFlags.h
245

Could you clarify that the above two options are linked? Or just have cfi-func-name, implying a call to abort otherwise? It kind of seems like the default cfi-func could just call abort?

Also, what's the function's expected signature?

What happens if the cfi-func itself gets CFI instrumented, and it fails the check? *That* should probably call abort directly, not recurse :)

include/llvm/Target/TargetOptions.h
54

Use enum class.

lib/CodeGen/ForwardControlFlowIntegrity.cpp
147

When does this happen?

152

Same.

159

nullptr.

199

Could you fix this?

245

Shouldn't this be a fatal error?

279

I'd use a switch here, so LLVM warns if a new CFIntegrity is added and unhandled.

298

Shouldn't you get the pointer size from the Target here too?

355

nullptr.

lib/CodeGen/JumpInstrTables.cpp
264

nullptr is already the default for this function, no need to specify it.

tmroeder added inline comments.Jul 14 2014, 2:34 PM
include/llvm/CodeGen/CommandFlags.h
245

Is there an abort function that's guaranteed to be present for any IR? Right now, my "abort" is to execute a trap instruction.

lib/CodeGen/ForwardControlFlowIntegrity.cpp
152

I think you're right here and in the previous comment: this shouldn't ever happen.

199

I'd like to fix this, but I don't know where I can get this information in a principled manner. This requires the size of a jumptable entry, and that depends on the size of the jump instruction in bytes. In principle, this is available from MCInstrDesc::getSize(), but that returns 0 when I've tried it for JMP_4 on X86.

I can certainly encode it myself in the Targets, but that seems like dangerous duplication of functionality with the descriptions in the instruction tables.

298

It's not a question of pointer size but rather the size of an unconditional jump instruction. I haven't yet figured out where to get that information.

jfb added inline comments.Jul 14 2014, 7:04 PM
include/llvm/CodeGen/CommandFlags.h
245

Good point, there may not be a C library... Could you default to abort, and fall back to @llvm.trap if it doesn't exits?

lib/CodeGen/ForwardControlFlowIntegrity.cpp
199

Oh I see.

It's not just a single instruction on each target though? If so then the content of jumptable entries does seem pretty target specific, so it kind of sounds like something that would belong in the target. Size is kind of icky though, it would be nice to factor out the target's instruction pattern from their actual encoding.

tmroeder added inline comments.Jul 15 2014, 11:44 AM
lib/CodeGen/ForwardControlFlowIntegrity.cpp
199

Well, the instruction itself is already generated in the Target; in the jumptable patch, I added a pair of virtual methods to include/llvm/Target/TargetInstrInfo.h and instantiated it for X86 and ARM in their Target directories.

It does happen to be a single instruction on each architecture I've tried so far (X86, ARM, PPC), since it's just an unconditional branch to the target function. Maybe I should just add another method to TargetInstrInfo? But like I said, I don't know any clean way to get this from the encoding...I hope that's just my ignorance of the details of how Targets manage their instruction encodings: surely there must be a way to get the upper bound of the length of an unconditional jump instruction?

tmroeder updated this revision to Diff 11959.Jul 28 2014, 2:31 PM
tmroeder set the repository for this revision to rL LLVM.

This version of the patch adds another virtual call to TargetInstrInfo to get a bound on the size of a jumptable entry, as discussed on llvmdev. This is used stored in JumpInstrTableInfo and used by ForwardControlFlowIntegrity to generate masks for jumps to jumptable entries. PTAL.

jfb added inline comments.Jul 28 2014, 6:29 PM
include/llvm/Analysis/JumpInstrTableInfo.h
41

I'm not a fan of default constructor parameters, but it seems like the right thing here.

The power-of-two invariant should also be specified here. Also, you should name the variable ByteAlign (it's pretty obvious it's not BitAlign, but I like explicit), as well as all the other Align.

61

This invariant isn't actually enforced in the code, instead the code tries to round up on use. I think it would be better to enforce the invariant on construction.

66

It's not clear what Bound is, it would be good to summarize TargetInstrInfo's explanation, or refer to it.

include/llvm/CodeGen/ForwardControlFlowIntegrity.h
83

What's the default function that ignores violations?

87

Power-of-2 restriction?

include/llvm/Target/TargetInstrInfo.h
342

The maximum of uncond branch and trap?

include/llvm/Target/TargetOptions.h
243

It looks like someone thought size was important above, since bitfields are used for bools. I'm not sure whether you should follow that lead (it would require reshuffling members around).

lib/Target/ARM/ARMBaseInstrInfo.cpp
4364

This comment (and the others below and in other files) isn't clear: what has to be kept in sync exactly?

4391

Thumb instructions are 2 or 4 bytes, and ARM instructions are always 4. The largest immediate on a direct branch has 26 bits, so that's insufficient for a lot of cases, you'll therefore need a PC-relative load followed by an indirect branch, and a location to store the 32-bit constant pool entry for the address. Assuming you do:

ldr rX, [PC, +#8]
bx rX
0xdeadbeef ; The address

Then you need 12 bytes, which I think is the worst case. Note that the constant pool entry could be elsewhere, preferably *not* in executable memory, but that would require another register and more infrastructure.

Note that you may want to use blx instead, if you intend on balancing the CPU's call/return stack: blx would be for calls, see the following page for returns

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438i/BABGEAEF.html
tmroeder added inline comments.Jul 29 2014, 10:02 AM
include/llvm/Analysis/JumpInstrTableInfo.h
61

Hmm...I thought I did the rounding in the createJumpInstrTableInfo; but I think you're right that it should be moved to the constructor.

include/llvm/Target/TargetInstrInfo.h
342

Right.

lib/Target/ARM/ARMBaseInstrInfo.cpp
4364

See my comment after getJumpInstrTableEntryBound: the bound in that function must be a bound on the possible instruction lengths returned by these two functions.

4391

I don't understand, but that's probably because I don't know ARM well. The goal here is to do an direct, unconditional branch using the code that's already checked in in the function getUnconditionalBranch. Are you saying that there are circumstances in which that code doesn't work?

If not, then are you saying that there are circumstances in which that code will generate this 12 byte sequence? The bound code needs to provide an upper bound for the sequences produced by getUnconditionalBranch and getTrap.

With respect to b vs blx, the goal is to perform an unconditional branch without touching any stacks at all; does b fail this condition? I'm trying to do the equivalent of the X86 jmp.

jfb added inline comments.Jul 29 2014, 10:28 AM
include/llvm/Target/TargetInstrInfo.h
342

Could you clarify the comment for this?

lib/Target/ARM/ARMBaseInstrInfo.cpp
4364

What I meant is that simply reading this comment doesn't provide the relevant information to understand it. It makes sense in this patch, but once checked in I'd have a WTF moment reading just this one comment on its own. You should probably refer the reader to the base class' getJumpInstrTableEntryBound.

4391

ARM has a fixed-width instruction encoding, and immediates are either encoded in the instruction itself or they have to be loaded if they overflow the storage available inside a fixed-width instruction. ARM doesn't have variable-bit-width instructions with very big immediates like x86 does.

26 bits is the biggest PC-relative immediate you can encode in a direct branch (for ARM, it's smaller for Thumb and Thumb2), so branches that go out of bounds from this must be indirect branches (or you can sprinkle the code with trampolines, ew).

So yes, for ARM the upper bounds would be 12 bytes if the sequence I suggest is used. For Thumb1 and Thumb2 it would be 8 (since LDR and BX can fit in 2 bytes each, and the immediate in 4), though for Thumb2 that'll depend on the register allocator using one of the first 8 registers (out of 16) for the LDR instruction and I'm not sure if LLVM will guarantee this.

On ARM branches:

  • B is a direct branch, with a limited PC-relative immediate.
  • BL and BLX are the same direct branches, but they change the link register (aka LR aka r14) meaning that the branch is a call (and LR is set to the address that needs to be returned to, which is the one right after the BL[X] instruction). The "X" means that the instruction set should be eXchanged (if it's currently ARM then go to Thumb, if it's Thumb then go to ARM) it's generally not something you'd want to do!
  • BX is an indirect branch, and so is BLX (same name as the above BLX, but with a register operand).

All of these can be conditionalized, either straight in the instruction in ARM mode or with an IT block in Thumb mode. They can also be unconditional.

I recommend looking at the "ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition". It's available from ARM's site behind a registration (or you can get it from work at arm-eng). See section A8.8.18 and later.

The point I'm trying to make about call/return stack is unrelated to the stack where values are spilled: it's a non-architectural stack the CPU keeps internally of all previous calls, and it uses it to predict return locations for indirect branches to speculatively start executing instructions before knowing where it'll actually jump. If you imbalance that stack then you'll shed performance, sometimes 5%-15%, so keeping it balanced matters!

I'm not sure if that helps?

I just realized that there are two independent concerns here, and I'd like to separate them.

The first is the core implementation of CFI; there are some remaining comments to address, and then I'd like to be able to commit that part with x86 backend support.

The second is the comments and questions about ARM performance of jumptables and the exact details of the instructions needed to implement ARM backend support. I'm currently looking into this, based on the comments and questions from JF, and I will continue doing that. But I'd like to not block the core CFI work and the x86 backend on those questions.

Are the reviewers amenable to me splitting this into two pieces: one that deals with CFI and the x86 backend code, and one that deals with ARM?

jfb added a comment.Oct 21 2014, 4:38 PM

Are the reviewers amenable to me splitting this into two pieces: one that deals with CFI and the x86 backend code, and one that deals with ARM?

Yes, as long as we figure out a nice way to point out that ARM support isn't there at the moment.

In D4167#35, @jfb wrote:

Are the reviewers amenable to me splitting this into two pieces: one that deals with CFI and the x86 backend code, and one that deals with ARM?

Yes, as long as we figure out a nice way to point out that ARM support isn't there at the moment.

How about: "ARM support for the jumptable attribute is currently limited to code with small numbers of indirect calls".

Or CFI/x86 commit could temporarily remove ARM support pending performance investigations. The default in TargetInstrInfo is to fail the compile with a message that says the backend doesn't support the jumptable attribute.

jfb added a comment.Oct 22 2014, 5:00 PM

How about: "ARM support for the jumptable attribute is currently limited to code with small numbers of indirect calls".

Or CFI/x86 commit could temporarily remove ARM support pending performance investigations. The default in TargetInstrInfo is to fail the compile with a message that says the backend doesn't support the jumptable attribute.

I think the later would be better.

tmroeder updated this revision to Diff 15424.Oct 24 2014, 11:19 AM

Here's a diff that fixes the outstanding comments and removes ARM support for now, as discussed in previous comments.

jfb added a comment.Oct 26 2014, 2:46 PM

Looks good overall.

include/llvm/CodeGen/ForwardControlFlowIntegrity.h
46

override

68

You can drop struct here.

test/CodeGen/X86/cfi_enforcing.ll
2

Could you test x86-32 as well here? No triple below, pass it to the command line instead.

15

Add the registers here, to make sure the masks are on the right thing. You can use FileCheck's capture capabilities to avoid hard-coding the register.

28

Could you explain why?

tmroeder updated this revision to Diff 15509.Oct 27 2014, 5:00 PM

This patch addresses the most recent set of comments.

tmroeder updated this revision to Diff 15510.Oct 27 2014, 5:04 PM

Forgot to remove "XFAIL win32" in the previous patch.

This looks right about ready to land to me.

include/llvm/CodeGen/CommandFlags.h
294–295

So there's nothing wrong with this as is, but I'd really rather see something in the ubsan style. It turns out that including all the strings for the names of all the functions may be expensive (string data plus relocations) in terms of file size, and may be redundant with getting the same information in other ways (like debug info). In any event, this can wait for a future patch, when it's needed.

include/llvm/Target/TargetInstrInfo.h
431–433

What if we're on an architecture where the instruction requires an alignment?

lib/CodeGen/ForwardControlFlowIntegrity.cpp
42–43

You don't appear to use these?

120

Just checking "CS" in bool context is equivalent to "CS.isCall() || CS.isInvoke". That makes this

if (!(CS && isIndirectCall(CS)))
236

Does

FunctionType *WarningFunTy =

​ FunctionType::get(Type::getVoidTy(M.getContext()), {CharPtrTy, CharPtrTy}, false);
work?

241–242

I think this is a case for report_fatal_error. Unreachable will be deleted by the compiler entirely in an optimized build, it should only be used when it is logically impossible. Instead, I think this can be reached through choice of command line options.

244–246

In the sanitizers, we have an option -fsanitize-recover which controls whether the "invariant violated" function will return or not. Apparently having 'noreturn' on the function declaration is a big deal for overall performance for them, probably due to the fact that it gives the register allocator less to worry about. Does this matter to you? And if so, would you use the same flag, or make it always noreturn?

374

"ParentFun->getName()" --> "ParentName" as declared one line above?

381–385

IRBuilder's CreateGlobalString/CreateGlobalStringPtr knows more tricks, like making it "unnamed_addr" so the string can be merged with duplicates. You don't need to give it an insertion point so long as you promise not to create instructions with it. Creating globals is fine.

test/CodeGen/X86/cfi_invoke.ll
36

Newline at end of file please!

amanone added a subscriber: amanone.Nov 6 2014, 6:35 AM
tmroeder updated this revision to Diff 16000.Nov 10 2014, 11:30 AM

This patch addresses the most recent comments in the review.

tmroeder added inline comments.Nov 10 2014, 11:32 AM
include/llvm/Target/TargetInstrInfo.h
431–433

This value is used to compute the necessary alignment of the jumptable entry. So, if the instruction requires alignment, then that should be taken into account in specifying this bound. I'll add that information to this comment.

lib/CodeGen/ForwardControlFlowIntegrity.cpp
42–43

Ah, right. this is left over from before I switched to SmallVector etc.

236

I tried it, and that doesn't compile; it complains about the initializer list not being convertible to ArrayRef.

241–242

Yes, you're right that this can be reached through command-line flags, so I'll changed it.

244–246

I haven't experimented with the noreturn flag for the "invariant violated" function, but I think that's an interesting and important point and one that should be explored wrt performance here. Can I take that up in a future commit?

381–385

Thanks! This lets me simplify this significantly.

test/CodeGen/X86/cfi_enforcing.ll
28

I think this is an artifact of copy-paste from an earlier LTO test. I don't think this should be here.

nlewycky accepted this revision.Nov 10 2014, 6:25 PM
nlewycky added a reviewer: nlewycky.

LGTM

This revision is now accepted and ready to land.Nov 10 2014, 6:25 PM
tmroeder closed this revision.Nov 13 2014, 8:32 AM

Committed in r221708