This is an archive of the discontinued LLVM Phabricator instance.

[ubsan] Port the function sanitizer to C
Changes Planned, Public

Authored by vsk on Sep 22 2017, 5:35 PM.

Details

Reviewers
pcc
pete
arphaman
Summary

The function sanitizer relies on RTTI to check callee types, but this
scheme doesn't work well in languages without the ODR.

This patch introduces a simple, best-effort function type encoding
which can be used when RTTI isn't available. In this scheme, function
types are encoded within 32 bits. The return type and all parameter
types are recorded using a 3-bit encoding. Zero is a special value in
the 3-bit encoding which means "there is either no type here OR any type
would be permissible here".

This scheme allows false negatives, but not false positives. It's simple
and does not require any changes to the instrumentation.
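
As a rough illustration of the scheme described above (not the code from the patch), the sketch below packs a return type and up to nine parameter types into 3-bit slots of a 32-bit word and compares two encodings slot by slot. The type classes, slot order, and the names `encode_fn_type` and `fn_types_compatible` are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 3-bit type classes; 0 is the wildcard from the summary. */
enum type_class {
    TC_ANY = 0,     /* no type here, or any type is permissible */
    TC_VOID = 1,
    TC_INT = 2,
    TC_FLOAT = 3,
    TC_POINTER = 4,
    TC_RECORD = 5
};

/* Assumed layout: 10 slots of 3 bits each (30 of the 32 bits used). */
enum { SLOT_BITS = 3, MAX_SLOTS = 10 };

/* Slot 0 holds the return type, slots 1..9 the first nine parameters.
 * Parameters that do not fit are left as the wildcard value 0. */
static uint32_t encode_fn_type(enum type_class ret,
                               const enum type_class *params,
                               size_t nparams)
{
    uint32_t enc = (uint32_t)ret;
    for (size_t i = 0; i < nparams && i + 1 < MAX_SLOTS; ++i)
        enc |= (uint32_t)params[i] << (SLOT_BITS * (i + 1));
    return enc;
}

/* Two encodings are compatible when every slot either matches or is the
 * wildcard on at least one side. */
static bool fn_types_compatible(uint32_t a, uint32_t b)
{
    for (int slot = 0; slot < MAX_SLOTS; ++slot) {
        uint32_t x = (a >> (SLOT_BITS * slot)) & 7u;
        uint32_t y = (b >> (SLOT_BITS * slot)) & 7u;
        if (x != TC_ANY && y != TC_ANY && x != y)
            return false;
    }
    return true;
}
```

Under these assumptions, a missing parameter is indistinguishable from a wildcard, so a call through `int (*)(int)` to a function of type `int (int, char *)` would not be reported. That is the kind of false negative the summary accepts.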

Testing: Using the new check, I've found some minor issues (linked below) and no false positives.

https://trac.ffmpeg.org/ticket/6685
https://github.com/openssl/openssl/issues/4413

Event Timeline

vsk updated this revision to Diff 116453. Sep 22 2017, 6:13 PM
  • Remove some noisy changes.
vsk edited the summary of this revision. Sep 24 2017, 7:30 PM
pcc edited edge metadata. Oct 3 2017, 3:44 PM

Wouldn't we get false positives if there is an indirect call in C++ code that calls into C code (or vice versa)?

I think I'd prefer it if we came up with a precise encoding of function types that was independent of RTTI, and use it in all languages. One possibility would be to represent each function type with an object of size 1 whose name contains the mangled function type, and use its address as the identity of the function type.

vsk planned changes to this revision. Oct 3 2017, 3:53 PM

In D38210#887635, @pcc wrote:

> Wouldn't we get false positives if there is an indirect call in C++ code that calls into C code (or vice versa)?

Ah, right, I'm surprised I didn't hit that while testing.

> I think I'd prefer it if we came up with a precise encoding of function types that was independent of RTTI, and use it in all languages. One possibility would be to represent each function type with an object of size 1 whose name contains the mangled function type, and use its address as the identity of the function type.

That makes sense. Like the RTTI object, it could be made linkonce_odr.
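
For illustration only, the marker-object idea discussed above might look roughly like this at the C source level. The symbol names are hypothetical stand-ins for mangled function types, and the GCC/Clang `weak` attribute is used here only as an approximation of the merge-across-translation-units behavior that `linkonce_odr` linkage would provide in LLVM IR; none of this is taken from an actual patch.

```c
#include <stdbool.h>

/* One 1-byte marker object per function type. Each symbol name stands in
 * for the mangled function type (hypothetical naming scheme). Marking the
 * definitions weak lets every translation unit emit them while the linker
 * keeps a single copy. */
__attribute__((weak)) const char fn_type_id_int_int = 0;      /* int (int) */
__attribute__((weak)) const char fn_type_id_void_charptr = 0; /* void (char *) */

/* The identity of a function type is the address of its marker object,
 * so a type check reduces to a pointer comparison. */
static bool same_fn_type(const char *caller_id, const char *callee_id)
{
    return caller_id == callee_id;
}
```

Because the comparison is on addresses rather than on a lossy bit pattern, two different function types never compare equal, and the same naming scheme could in principle be emitted by both C and C++ front ends.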