This is an archive of the discontinued LLVM Phabricator instance.

[ubsan] Use pass_object_size info in bounds checks
ClosedPublic

Authored by vsk on Dec 6 2017, 7:45 PM.

Details

Summary

Teach UBSan's bounds check to opportunistically use pass_object_size
information to check array accesses.

rdar://33272922

Diff Detail

Event Timeline

vsk created this revision.Dec 6 2017, 7:45 PM

Thanks for this!

It's interesting to me that these array-bound checks don't seem to use @llvm.objectsize in some form already. I can't find any notes about it grepping through git/source, so I'm happy with it.

lib/CodeGen/CGExpr.cpp
819

nit: would IgnoreParenImpCasts be better?

828

We should probably only do this if the argument to pass_object_size is 0 or 1. 2 and 3 give lower-bounds on size (and a default value of 0), which could result in false-positives.

(And please add a test that we don't do this for 2 or 3.)

vsk updated this revision to Diff 126037.Dec 7 2017, 1:55 PM

Thanks for your feedback.

  • Give up on 0-sized types.
  • Give up on pass_object_size(2 | 3).
vsk marked 2 inline comments as done.Dec 7 2017, 1:56 PM
vsk updated this revision to Diff 126064.Dec 7 2017, 3:45 PM
  • Handle constant size modifiers while we're at it (e.g "int foo(int p[static 10])").
george.burgess.iv accepted this revision.Dec 7 2017, 4:25 PM

LGTM. Thanks again!

This revision is now accepted and ready to land.Dec 7 2017, 4:25 PM
vsk closed this revision.Dec 8 2017, 11:26 AM

Landed in r320128.

efriedma edited edge metadata.Dec 8 2017, 11:32 AM

It's interesting to me that these array-bound checks don't seem to use @llvm.objectsize in some form already.

That would be a cool experiment. That said, one of the upsides of the current ubsan is that whether it will produce a diagnostic is predictable (as long as you don't use uninitialized data); you lose that to some extent with llvm.objectsize because it depends on the optimizer.

lib/CodeGen/CGExpr.cpp
833

"int f(int a[10])" might look like an array, but it isn't: it's just a different syntax to declare a pointer. So it's legal to "lie" in the signature. (If you want to actually pass a pointer to an array, you have to write "int (*a)[10]".) And the definition of "static" says "an array with at least as many elements as specified by the size expression", which isn't a maximum, so that doesn't really help either.

Most people would consider it bad style to put a number into the array bound which doesn't reflect reality, but I think we shouldn't try to check it unless the user explicitly requests it.

vsk added a comment.Dec 8 2017, 11:52 AM

I backed out the part of this patch which deals with array parameters declared like p[10] or p[static 10]: r320185.

lib/CodeGen/CGExpr.cpp
833

My copy of the C99 draft (n1256) is a little fuzzy on this point [*]. There's enough of a gray area here that it seems appropriate to back out this part of the patch.

  • It states: "In addition to optional type qualifiers and the keyword static, the [ and ] may delimit an expression or *. If they delimit an expression (which specifies the size of an array) ...". I took the parenthetical literally, and didn't know about the 'at least as many' language.

That said, one of the upsides of the current ubsan is that whether it will produce a diagnostic is predictable (as long as you don't use uninitialized data); you lose that to some extent with llvm.objectsize because it depends on the optimizer.

True. If that's not desirable to have in array-bounds, we could potentially move these checks under -fsanitize=object-size instead. We'd just have to be careful about not emitting object-size and array-bounds checks for the same array access.

efriedma added inline comments.Dec 8 2017, 12:19 PM
lib/CodeGen/CGExpr.cpp
833

"which specifies the size of an array" in this context just means that it's the part of the array declarator which specifies the size of the array, not that it's the size of any particular array. The decay rules for function parameters are the relevant bit of the standard.

And yes, this is all very confusing; this is why C++ has std::array.