This is an archive of the discontinued LLVM Phabricator instance.

Implement function type checker for the undefined behavior sanitizer.
ClosedPublic

Authored by pcc on Aug 9 2013, 1:30 AM.

Details

Summary

This uses function prefix data to store
function type information at the function pointer.

Diff Detail

Event Timeline

pcc updated this revision to Unknown Object (????).Aug 9 2013, 1:31 PM
  • Also test for the RTTI pointer check code
pcc updated this revision to Unknown Object (????).Sep 15 2013, 6:32 PM

Rebase.

pcc added a comment.Sep 15 2013, 6:35 PM

The LLVM-side dependency for this has landed now.

pcc updated this revision to Unknown Object (????).Oct 10 2013, 7:08 PM
Rebase.

Clever! The approach only makes me feel a little uneasy =)

It would be polite to issue a diagnostic if someone explicitly asks for this (not via -fsanitize=undefined) and we don't support it for their target.

lib/CodeGen/CGExpr.cpp
3107

This is a bit hard to parse. (!TargetDecl || !isa<FunctionDecl>(TargetDecl)) maybe?

3108

* on the right.

3112

Some indication of what "PD" stands for here would be useful.

3112–3113

We usually format this as

llvm::Type *PDStructTyElems[] = {
  PDSig->getType(),
  FTRTTIConst->getType()
};

or similar.

3117–3121

What happens if the callee address is at the end of a page and doesn't have the extra data? Could this load trigger a segfault?

3134–3135

Likewise.

lib/CodeGen/CodeGenFunction.cpp
525–526

It'd be nice if we could only do this for address-taken functions. When we take the address of a function, we could emit linkonce_odr thunk with the extra data, and use that instead of the original address... except that would break address-of-function comparisons between instrumented and uninstrumented code. Can we provide an option to enable that behavior for the case where we're OK with an ABI change?

531–532

Does this include the calling convention attributes?

533

Spaces around braces.

lib/CodeGen/TargetInfo.cpp
602–603

What do 'F' and 'T' demangle as? An invalid instruction encoding would give me more confidence here.

pcc updated this revision to Unknown Object (????).Oct 18 2013, 4:06 PM
  • Address code review comments
pcc added a comment.Oct 18 2013, 4:09 PM

It would be polite to issue a diagnostic if someone explicitly asks for this (not via -fsanitize=undefined) and we don't support it for their target.

Hmm, we should probably be doing this for other sanitizers too (e.g. MSan and DFSan are only supported on 64-bit Linux).

lib/CodeGen/CGExpr.cpp
3107

Done.

3108

Done.

3112

Globally replaced 'PD' with 'Prefix'.

3112–3113

Done. clang-format prefers something closer to the formatting I originally used; I'll raise this with the clang-format folks.

3117–3121

It could in principle. In practice, Clang always aligns function bodies to 16 bytes, GCC does it at -O>=2 and GCC (4.4 and 4.6) codegens an empty function body at -O0 with >4 bytes. A quick survey of a number of system libraries (libc, libm, libstdc++) on an Ubuntu machine reveals that the functions are suitably aligned. I think the circumstances in which a segfault might occur are sufficiently rare as to justify doing a single load. If it does turn out to be a problem in practice we could consider splitting the load (perhaps conditional on a command line flag).

3134–3135

Done.

lib/CodeGen/CodeGenFunction.cpp
525–526

I like the idea of doing this as an option, but I would prefer to land this version first.

531–532

Not with the Itanium ABI, which does not have a representation for calling conventions, but the Microsoft ABI (for which calling conventions matter more) does, so once RTTI has been implemented for that ABI, this should respect calling conventions there.

533

Done.

lib/CodeGen/TargetInfo.cpp
602–603

"rex.RX push %rsp", according to objdump. But I don't think it really matters what this decodes to. If the instruction pointer finds itself here the situation isn't dissimilar to jumping into the middle of a multibyte instruction. (Besides, the RTTI pointer could decode to anything.)

rsmith accepted this revision.Oct 18 2013, 4:34 PM

LGTM

lib/CodeGen/CGExpr.cpp
3107

It'd be nice if we could erase these checks if the optimizer resolves the call.

lib/CodeGen/TargetInfo.cpp
602–603

I think it matters: it's not unreasonable for a function to start as:

jmp +8
6 byte loop body
check loop condition
conditional jmp back to loop body

IIUC, your check would flag a false positive on indirect calls to such a function.

However, since the 'F' is an instruction prefix that is redundant on a 'T' (pushq %rsp) instruction, I think it's reasonable to assume that the FT signature will not occur in uninstrumented code.

pcc added a comment.Oct 18 2013, 5:59 PM

What about the compiler-rt changes?

lib/CodeGen/CGExpr.cpp
3107

Sure. A design point of function prefix data is that it permits this sort of optimization. Maybe if I get a chance I'll see if I can teach the optimizer to do this.

lib/CodeGen/TargetInfo.cpp
602–603

I see. I think I agree with your reasoning.

pcc closed this revision.Oct 20 2013, 2:33 PM

Closed by commit rL193058 (authored by @pcc).