This is an archive of the discontinued LLVM Phabricator instance.

Implement Control Flow Integrity for virtual calls.
ClosedPublic

Authored by pcc on Feb 4 2015, 6:47 PM.

Details

Summary

This patch introduces the -fsanitize=cfi-vptr flag, which enables a control
flow integrity scheme that checks that virtual calls take place using a vptr of
the correct dynamic type. More details in the new docs/ControlFlowIntegrity.rst
file.

It also introduces the -fsanitize=cfi flag, which is currently a synonym for
-fsanitize=cfi-vptr, but will eventually cover all CFI checks implemented
in Clang.

This uses the bitset mechanism currently under review at
http://reviews.llvm.org/D7288

Diff Detail

Repository
rL LLVM

Event Timeline

pcc updated this revision to Diff 19374.Feb 4 2015, 6:47 PM
pcc retitled this revision from to Implement Control Flow Integrity for virtual calls..
pcc updated this object.
pcc edited the test plan for this revision. (Show Details)
pcc added reviewers: jfb, kcc, silvas.
pcc added a subscriber: Unknown Object (MLST).
jfb added a reviewer: tmroeder.Feb 5 2015, 9:10 AM
jfb added inline comments.Feb 5 2015, 9:48 AM
docs/ControlFlowIntegrity.rst
3 ↗(On Diff #19374)

Could you add a section which contains links to publications on CFI, for the approaches that are implemented as well as ones that aren't but may be relevant.

lib/AST/ItaniumMangle.cpp
3945 ↗(On Diff #19374)

Is that just the basename? Does it take capitalization into account? I want to make sure it'll mangle the same way on different hosts, so it's possible to generate the same binary when cross-compiling.

IIUC the caveat is that two files with the same name and classses in anonymous namespaces would mangle to the same bitset? I would point out this cornercase in a comment.

Also, what happens if a user (foolishly) defines a class in a header file within an anonymous namespace, and then uses it or subclasses it in .cc files? I guess that should just work (with or without CFI) but would be inefficient because of the duplication?

lib/AST/MicrosoftMangle.cpp
2574 ↗(On Diff #19374)

report_fatal_error

lib/CodeGen/CGVTables.cpp
867 ↗(On Diff #19374)

if (auto Arr = dyn_cast<llvm::ConstantArray>(GV->getInitializer())) { is LLVM-idiomatic.

903 ↗(On Diff #19374)

assert(Offset1 != Offset2);

test/CodeGenCXX/cfi-vptr.cpp
70 ↗(On Diff #19374)

What's in the {{.*}}?

pcc added inline comments.Feb 5 2015, 1:34 PM
lib/AST/ItaniumMangle.cpp
3945 ↗(On Diff #19374)

Is that just the basename? Does it take capitalization into account? I want to make sure it'll mangle the same way on different hosts, so it's possible to generate the same binary when cross-compiling.

It is the full source file name as supplied on the command line. These names won't make it into the final object file, so that seems like it would mostly do as far as hermeticity goes.

IIUC the caveat is that two files with the same name and classses in anonymous namespaces would mangle to the same bitset? I would point out this cornercase in a comment.

I think you would either need the two object files to be compiled from the same relative path from different directories, or somehow build two translation units from the same source file.

Also, what happens if a user (foolishly) defines a class in a header file within an anonymous namespace, and then uses it or subclasses it in .cc files? I guess that should just work (with or without CFI) but would be inefficient because of the duplication?

I don't think it would work under CFI. The two classes would have different identities, and the source file names mangled into the bitset names would be different. So the program would type check, but would fail at runtime under CFI if one translation unit tries to vcall a class from another translation unit.

lib/AST/MicrosoftMangle.cpp
2574 ↗(On Diff #19374)

Makes sense. I suppose you can contrive to get here with the Microsoft ABI.

lib/CodeGen/CGVTables.cpp
867 ↗(On Diff #19374)

More to the point, this entire block (lines 862-872) is obsolete (it belonged to an earlier version of the code). I'll remove it.

903 ↗(On Diff #19374)

Will do.

test/CodeGenCXX/cfi-vptr.cpp
70 ↗(On Diff #19374)

That would be the absolute path to the file.

pcc updated this revision to Diff 19436.Feb 5 2015, 2:55 PM
  • Address review comments
docs/ControlFlowIntegrity.rst
3 ↗(On Diff #19374)

Done.

lib/AST/MicrosoftMangle.cpp
2574 ↗(On Diff #19374)

Actually I don't think you can. The mangleCXXVTableBitSet function can only be reached from the Itanium-specific code that sets up a vtable call or constructs a vtable, or from GenerateConstructionVTable, which is only used by the code that creates VTTs, which is an Itanium-specific concept. I changed it anyway to be defensive.

kcc edited edge metadata.Feb 5 2015, 5:29 PM

So, you've decided to implement this in clang, as opposed to llvm.
My concern with this is that if devirtualization (either partial or full) happens later in llvm
the checks will remain. Or even worse, the checks may potentially inhibit devirtualization and we will never know.

Your valid concern that it might be hard to carry the dynamic type information through the LLVM passes.
However I think that we can achieve that with reasonable effort:

  • simple approach is to use metadata attached to the call site and some way to ensure that the metadata is preserved (e.g. a special calling convention that requires the metadata to be present).
  • or extra call parameter that will be removed in a clean-up path

We had a lengthy off-line conversation with Peter and did not reach any conclusion,
so maybe someone else has thoughts....
Meanwhile, Peter, please write several tests were partial and complete devirtualization may happen
and see how this interoperates with the CFI checks.
Note that LLVM today does not have devirtualization (other than very simple full one),
so we should be careful not to create problems for ourselves in future.

docs/ControlFlowIntegrity.rst
5 ↗(On Diff #19436)

Mention that this is under development.

pcc added a comment.Feb 6 2015, 1:19 PM

I've verified that this scheme does not inhibit the current scheme of devirtualization via constant propagation. For example, in the following program:

struct A {
  virtual void f() = 0;
};

struct B : A {
  virtual void f();
};

void B::f() {}

A* ptr() {
  return new B;
}

int main() {
  ptr()->f();
}

the main function looks like this:

define i32 @main() #1 {
  br i1 true, label %2, label %1

; <label>:1                                       ; preds = %0
  call void @llvm.trap()
  unreachable

; <label>:2                                       ; preds = %0
  ret i32 0
}

You will also notice that the optimization I just implemented on the LLVM side of things eliminated the check. We might want to play with the pass ordering so that the CFG is simplified in this case (it doesn't have any effect on the generated code in this case though:)

00000000004006a0 <main>:
  4006a0:       55                      push   %rbp
  4006a1:       48 89 e5                mov    %rsp,%rbp
  4006a4:       31 c0                   xor    %eax,%eax
  4006a6:       5d                      pop    %rbp
  4006a7:       c3                      retq   
  4006a8:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  4006af:       00

Here's a more complex example where we might want to do partial devirtualization:

bool p;

struct A {
  virtual void f() = 0;
};

struct B : A {
  virtual void f();
};

struct C : A {
  virtual void f();
};

void B::f() {}
void C::f() {}

A* ptr() {
  return ((long)(&p)> 42) ? (A *)new B : (A *)new C;
}

int main() {
  ptr()->f();
}

We can eliminate the check here as well:

define i32 @main() #2 {
  %1 = tail call noalias i8* @_Znwm(i64 8) #5
  %2 = bitcast i8* %1 to i32 (...)***
  store i32 (...)** select (i1 icmp sgt (i64 ptrtoint (i8* @p to i64), i64 42), i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* @_ZTV1B, i64 0, i64 2) to i32 (...)**), i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* @_ZTV1C, i64 0, i64 2) to i32 (...)**)), i32 (...)*** %2, align 8, !tbaa !1
  br i1 true, label %4, label %3

; <label>:3                                       ; preds = %0
  tail call void @llvm.trap()
  unreachable

; <label>:4                                       ; preds = %0
  %5 = bitcast i8* %1 to %struct.A*
  %6 = load void (%struct.A*)** bitcast (i32 (...)** select (i1 icmp sgt (i64 ptrtoint (i8* @p to i64), i64 42), i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* @_ZTV1B, i64 0, i64 2) to i32 (...)**), i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* @_ZTV1C, i64 0, i64 2) to i32 (...)**)) to void (%struct.A*)**), align 8
  tail call void %6(%struct.A* %5)
  ret i32 0
}

After further offline discussion we came up with the idea of introducing an intrinsic that protects calls via a supplied function pointer. To give one example:

%vptr = load ...
%fptrptr = gep %vptr, ...
%fptr = load %fptrptr
%checked_fptr = call @llvm.bitset.cficheck(%vptr, "bitset1", %fptr)
call %checked_fptr(...)

The semantics of the intrinsic are such that the intrinsic performs the check and returns its third argument, otherwise it aborts. We can probably teach the relevant parts of the optimizer that @llvm.bitset.cficheck is essentially an identity operation over its third argument.

I am reasonably sure that such a scheme or something like it would permit devirtualization while allowing us to move the check away from any devirtualized calls (a pass can move the intrinsic call to the same basic block as its users, if possible). So I would propose leaving the code as is and we can probably introduce a new intrinsic like this when we get around to doing devirtualization.

pcc updated this revision to Diff 19514.Feb 6 2015, 1:58 PM
pcc edited edge metadata.
  • Add note that schemes are under development
kcc added a comment.Feb 6 2015, 3:19 PM

So, the effort to remove the redundant checks is reasonable, but not zero.
Will this effort be greater than the effort to add the checks downstream in llvm after all the devirtualization has happened?

Also: if we want to do CFI for non-virtual indirect calls, will we be adding it in clang or in llvm?

pcc added a comment.Feb 6 2015, 3:48 PM

In my opinion, adding checks in LLVM would add significant complexity and/or require changing axioms relating to how IR is transformed. The transform to move the check after devirtualization would be relatively simple.

Also: if we want to do CFI for non-virtual indirect calls, will we be adding it in clang or in llvm?

We would presumably be emitting the checks in clang.

kcc added a comment.Feb 17 2015, 11:35 AM
In D7424#120046, @pcc wrote:

In my opinion, adding checks in LLVM would add significant complexity and/or require changing axioms relating to how IR is transformed. The transform to move the check after devirtualization would be relatively simple.

Also: if we want to do CFI for non-virtual indirect calls, will we be adding it in clang or in llvm?

We would presumably be emitting the checks in clang.

This may deserve a separate thread on llvmdev/cfe-dev. I'd like to hear other opinions. Could you please start the discussion?
(where to insert the vcall and indir-call checks: in clang or in llvm)
Emitting the checks in clang means that the optimizations will have to undo the checks somehow when indir calls turn into direct calls and potentially inlined.

pcc updated this revision to Diff 20131.Feb 17 2015, 6:49 PM
  • Add design doc
jfb edited edge metadata.Feb 19 2015, 9:47 AM

This patch lgtm: IIUC it sounds like devirtualization will still be possible with this approach.

docs/ControlFlowIntegrity.rst
20 ↗(On Diff #20131)

Is there a reference number we can quote, or a benchmark that users can run to check what the overheads are?

48 ↗(On Diff #20131)

s/regular/pre-built/ or something along those lines.

That makes me wonder: for PNaCl we could have a version of libc++.a that also has CFI. Could the exclusion list be done through module metadata merging? i.e. doing LTO on a module without CFI lists its classes and adds exclusions for them, and modules with CFI "just work"?

I wouldn't do this in the current patch.

63 ↗(On Diff #20131)

Úlfar, here and above?

jfb added inline comments.Feb 19 2015, 9:51 AM
docs/ControlFlowIntegrity.rst
20 ↗(On Diff #20131)

Also, a warning on binary size bloat should be here.

pcc added inline comments.Feb 19 2015, 11:58 AM
docs/ControlFlowIntegrity.rst
20 ↗(On Diff #20131)

Done.

48 ↗(On Diff #20131)

s/regular/pre-built/ or something along those lines.

Done.

That makes me wonder: for PNaCl we could have a version of libc++.a that also has CFI. Could the exclusion list be done through module metadata merging? i.e. doing LTO on a module without CFI lists its classes and adds exclusions for them, and modules with CFI "just work"?

Possibly, but if the overhead is low enough it may be simplest to turn it on for everything.

63 ↗(On Diff #20131)

Done.

pcc updated this revision to Diff 20325.Feb 19 2015, 11:58 AM
pcc edited edge metadata.
  • Improve docs
jfb accepted this revision.Feb 19 2015, 12:47 PM
jfb edited edge metadata.

lgtm

This revision is now accepted and ready to land.Feb 19 2015, 12:47 PM
This revision was automatically updated to reflect the committed changes.