Page MenuHomePhabricator

Add Windows Control Flow Guard checks (/guard:cf).
Needs ReviewPublic

Authored by ajpaverd on Aug 5 2019, 11:21 AM.



A new module pass (Transforms/CFGuard/CFGuard.cpp) inserts CFGuard checks on
indirect function calls, using either the check or dispatch mechanism. These
checks require new calling conventions for the supported targets (currently
x86, ARM, and AArch64). An additional pass (Target/X86/X86FixupCFGuard.cpp) is
used on x86 to prevent stack spills between loading and calling the check, which
could be exploited. Another pass (CodeGen/CFGuardLongjmp.cpp) is used to
identify and emit valid longjmp targets, as required by /guard:cf.

Event Timeline

ajpaverd created this revision.Aug 5 2019, 11:21 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 5 2019, 11:21 AM
alex added a subscriber: alex.Aug 17 2019, 7:49 AM
dmajor added a subscriber: dmajor.Aug 19 2019, 10:02 AM
rnk added a comment.Mon, Aug 26, 4:16 PM

I plan to take a look at this tomorrow, sorry for putting this off for a while. :(

In D65761#1646059, @rnk wrote:

I plan to take a look at this tomorrow, sorry for putting this off for a while. :(

Thanks @rnk! If there's anything I can clarify/fix, please let me know.

rnk added a reviewer: pcc.Wed, Sep 4, 10:55 AM
rnk added a subscriber: hans.Thu, Sep 5, 5:50 PM

Here is some feedback, I apologize for dragging my feet.

Also, I would really like to get feedback from @pcc. He's OOO currently, though.


@hans, WDYT about the -cc1 interface? I think this is fine.


Comment formatting needs to be fixed, you can use clang-format for this.


I hesitate to make MBB labels public, I think it might be better to build your own MCSymbol label with a public name.


Rather than changing the machine CFG, I would suggest using MachineInstr::setPostInstrSymbol, which I've been planning to use for exactly this purpose. :) You can make labels with MCContext::createSymbol, and you'll want to come up with symbols that won't clash with C or C++ symbols. I see MSVC makes labels like $LN4, so maybe $cfgsjN or something like that. I think MCContext will add numbers for you.


I guess it's not X86CFGuard.cpp now that you've generalized it as a cross-platform IR pass.


Can you elaborate on what the correctness requirement is? The correctness requirement is just that the loaded value is not saved in a CSR which is then spilled and reloaded from memory across a call site. If possible, I think it would be worth teaching both FastISel and SDISel to build the correct MachineInstrs during instruction selection, and turn this into a validation pass that fails if it sees indirect calls through registers in functions where CFG is enabled.


I think you probably also want to look for tail calls. I think the best way to do this would be to use MI.isCall() in the loop, and then switch on the opcode to handle all call variants.


Is it possible to use MachineRegisterInfo to iterate over all the uses of the load, instead of scanning the BB? It seems like this could be O(n^2) if a single BB contains a long sequence of indirect calls. I think it would look like this:

MachineRegisterInfo *MRI = MF.getRegInfo();
for (MachineInstr &MI : MRI->use_instructions(Reg))
  // do something with MI

There's also use_operands.


Erasing at the end is always the safest way to do it. :)


This doesn't do anything with the return value, should it?


Does this need to be a ModulePass? It's preferable to avoid module passes if possible. When the pass manager has a sequence of function passes to run, it runs all passes on function f, then g, then h, and so on, rather than running the function pass over the entire module and then the next pass over the entire module, and so on. This is done to get better cache locality: each function pass touches f's data structures repeatedly, and then moves on to g without ever going back to f.

You should be able to do the global checks in an override of doInitialization, which receives a Module.


I guess more LLVM-style names for these would be CF_Check, CF_Dispatch. Of course, Rui is changing the style soon anyway.


You'd want to set the tail call kind here to forward tail or musttail. This matters for things like member pointer function thunks. This C++ makes a musttail thunk, for example:

struct A { virtual void f(); };
auto mp = &A::f;

Which gives this IR:

define linkonce_odr void @"??_9A@@$BA@AA"(%struct.A* %this, ...) #0 comdat align 2 {
  %0 = bitcast %struct.A* %this to void (%struct.A*, ...)***
  %vtable = load void (%struct.A*, ...)**, void (%struct.A*, ...)*** %0, align 8, !tbaa !3
  %1 = load void (%struct.A*, ...)*, void (%struct.A*, ...)** %vtable, align 8
  musttail call void (%struct.A*, ...) %1(%struct.A* %this, ...)
  ret void

This does an indirect virtual call through the 0th slot, and if we lose musttail, it won't perfectly forward arguments.


So, this isn't going to work if the original calling convention was something other than the default. For x64, the only commonly used non-standard convention would be vectorcall. Changing the convention here in the IR is going to make it hard to pass that along.

I think as you can see from how much code you have here already, replacing call instructions in general is really hard. These days there are also bundles, which I guess is something else missing from the above.

Here's a sketch of an alternative approach:

  • load __guard_dispatch_icall_fptr as normal
  • get the pre-existing function type of the callee
  • cast the loaded pointer to the original function pointer type
  • use CB->setCalledOperand to replace the call target
  • add the original call target as an "argument bundle" to the existing instruction
  • change SelectionDAGBuilder.cpp to recognize the new bundle kind and create a new ArgListEntry in the argument list
  • make and set a new bit in ArgListEntry, something like isCFGTarget
  • change CC_X86_Win64_C to put CFGTarget arguments in RAX, similar to how CCIfNest puts things in R10

This way, the original call instruction remains with exactly the same arrangement of args, attributes, callbr successors, tail call kind, etc. All you're doing is changing the callee, and passing the original callee as an extra argument on the side. Basically, this approach leverages bundles to avoid messing with the function type, which more or less requires replacing the call. :)

There are probably issues with this design that I haven't foreseen, but I think it's worth a try.


These annotations shouldn't be necessary, the tests should pass as written on Linux. The mtriple flag above tells llc to generate code for Windows.

hans added a comment.Fri, Sep 6, 5:07 AM

This is very exciting! I didn't look closely at the actual instrumentation code, as rnk knows that better and had some good comments.


The -cc1 interface looks good to me, but since these are cc1 options only, I think they should be in, not (I realize this is a pre-existing issue with -cfguard).


Maybe check for this first, and then the ControlFlowGuardNoChecks one in an else-if, to make this one take precedence?


It seems unfortunate that the Instrument and NoChecks variables allow four different states. whereas I think the logic should really only allow for three: 1) do nothing, 2) tables only, 3) tables and checks. Maybe we could have an enum instead?


Also I'd suggest braces since even though there's just one statement, there are multiple lines here and in the else block.


Thanks for remembering /fallback. Please add a /guard:cf option to the first test in clang/test/Driver/cl-fallback.c


Indentation looks a little off here.

"function, which can be inserted before indirect calls" -- maybe instead "function, calls to which can be inserted ..."


I don't think X86_32 is common nomenclature.. maybe just X86 is enough.


Indentation off here too.


Maybe explicitly call out in the text that this calling convention is only used on x86_64, and is used instead of cfguard_checkcc (iiuc)?


nit: s/take/takes/


ultra nit: period at the end, here and for other comments


Naive question: Why do we generate code as in the examples in the first place, and can't some general optimization pass do this folding? From the examples it looks like straight-forward constant propagation.

rnk added inline comments.Fri, Sep 6, 11:22 AM

Actually, I used this test IR, LLVM seems to always fold the memory operand into the call:

@fptr = external dso_local global void()*
define i32 @foo() {
	%fp1 = load void()*, void()** @fptr
	call void %fp1()
	%fp2 = load void()*, void()** @fptr
	call void %fp2()
	ret i32 0

Maybe it won't do it if there are more parameters, I'm not sure.

I ran llc with both isels for x64 and ia32, and it always folded the load into the call. Maybe it's best to make this a verification pass that emits an error via MCContext if there is an unfolded load of the CFG check function pointer?