Page MenuHomePhabricator

Add alloc_size attribute to clang
ClosedPublic

Authored by george.burgess.iv on Nov 2 2015, 6:25 PM.

Details

Summary

This patch adds the alloc_size attribute, docs, and all of the logic needed to handle it to clang. The only thing that's _really_ missing is our ability to handle non-const locals, because that's proving to be a moderately interesting challenge. Our current strategy is to lower alloc_size to LLVM, and have it handle those cases.

There's also a restructuring of TryEvaluateBuiltinObjectSize thrown in, because I felt like it was getting overly subtle. Hopefully the new version is considerably easier to reason about. :)

— Implementation notes —

  • With the restructuring of TryEvaluateBuiltinObjectSize, I was able to make EM_DesignatorFold (now EM_OffsetFold) the only EvalMode we use in evaluating __builtin_object_size.
  • The InvalidBase functionality of LValues was further (ab)used to get information to TryEvaluateBuiltinObjectSize. We know an LValue has been initialized by an alloc_size function if the base is invalid and the base itself is a CallExpr (or a CastExpr that contains the alloc_size CallExpr). This won’t conflict with current behavior, because (prior to this patch) all invalid bases were MemberExprs.

Diff Detail

Repository
rL LLVM

Event Timeline

george.burgess.iv retitled this revision from to Add alloc_size attribute to clang.
george.burgess.iv updated this object.
george.burgess.iv added a reviewer: rsmith.
george.burgess.iv added a subscriber: cfe-commits.
george.burgess.iv updated this object.

As discussed offline:

  • Added support for treating const variables as constexpr
  • Added requirement for LValues with an invalid alloc_size base to have an array index as their first designator entry (along with support for having an array of unknown size as the first entry in an LValue's designator)

Additionally:

  • Fixed a bug with the prior impl where we'd report too many bytes for type=3 with alloc_size
  • Refactored a fair amount of duplicated logic
  • Added checks to be sure that LValues with invalid bases don't get moved into APValues

...I didn't end up refactoring the LValue designator walking, because we only needed one change in one place (because invalid bases can't find their way into APValues). I'm happy to try to make a visitor-like thing and use it to refactor the ~4-5 places we walk {LValue,APValue} designators (3 of which reside in ExprConstant.cpp) if we still think that would be relatively valuable.

Also, due to allowing the evaluation of const expressions as constexpr expressions, this patch needs http://reviews.llvm.org/D14877 to go in so we don't break tests. :)

aaron.ballman added inline comments.Dec 10 2015, 12:42 PM
include/clang/Basic/Attr.td
714 ↗(On Diff #42348)

Should this be using HasFunctionProto instead of Function?

include/clang/Basic/DiagnosticSemaKinds.td
2144 ↗(On Diff #42348)

This warning diagnostic is not used.

2145 ↗(On Diff #42348)

This text is a bit misleading based on the diagnostic. The attribute doesn't apply to the integer argument. The attribute argument should refer to an integer parameter of the function declaration. I think the diagnostic should reflect that. Perhaps:

%0 attribute argument may only refer to a function parameter of integer type

and the diagnostic loc can point to the attribute argument in question. (If that's not possible, it would be good to add a %1 to identify which of the two possible attribute arguments is at fault.)

lib/AST/ExprConstant.cpp
117 ↗(On Diff #42348)

Add a newline to separate the functions (here and elsewhere).

128 ↗(On Diff #42348)

const auto *, here and elsewhere. Whenever you have auto, we want all the cv-qualifiers as well as */& information along with the auto.

144 ↗(On Diff #42348)

Is this function going to expand to be more than just a forward to another? If not, we should remove it.

156 ↗(On Diff #42348)

Would be good to have '&& "explanation of assert"' as well (here and elsewhere).

264 ↗(On Diff #42348)

Missing a period.

269 ↗(On Diff #42348)

Missing function documentation.

357 ↗(On Diff #42348)

Missing period.

1109 ↗(On Diff #42348)

Is there a reason this assert only fires in !NDEBUG mode? Is it too expensive to enable for any assert build?

lib/Sema/SemaDeclAttr.cpp
721 ↗(On Diff #42348)

Please do not use auto unless the type is explicitly spelled out in the initializer (here and elsewhere).

728 ↗(On Diff #42348)

Are character types and Boolean types also okay?

764 ↗(On Diff #42348)

If you use HasFunctionProto in Attr.td, none of this is required. If you can't use HasFunctionProto, then there's still no need to check FD because that's handled by the common attribute code.

766 ↗(On Diff #42348)

Please, no magic numbers. I don't think 4 is even the correct value, that's ExpectedParameter.

777 ↗(On Diff #42348)

Please do not use auto unless the type is explicitly spelled out in the initializer.

george.burgess.iv marked 15 inline comments as done.

Addressed all feedback

majnemer added inline comments.
include/clang/Basic/AttrDocs.td
214 ↗(On Diff #42474)

nonconst -> non-const?

george.burgess.iv marked an inline comment as done.
  • Fixed wording in AttrDocs
  • Realized I forgot to press 'submit' on comment responses after the last update. Will correct this in ~5 seconds :)
include/clang/Basic/Attr.td
714 ↗(On Diff #42348)

Nice catch!

include/clang/Basic/AttrDocs.td
214 ↗(On Diff #42474)

Danke

lib/AST/ExprConstant.cpp
1109 ↗(On Diff #42348)

I was under the impression that asserts are only enabled if !NDEBUG. The code only exists to make a few assertions, so I think guarding it with "if assertions are enabled" makes that purpose more clear. If my assumption is wrong, I'm happy to remove the ifndef.

lib/Sema/SemaDeclAttr.cpp
728 ↗(On Diff #42348)

Boolean makes no sense, but char is fine here. Thanks

766 ↗(On Diff #42348)

Doesn't matter; dead code.

aaron.ballman added inline comments.Dec 11 2015, 11:59 AM
lib/AST/ExprConstant.cpp
1110 ↗(On Diff #42547)

I was under the impression that asserts are only enabled if !NDEBUG. The code only exists to make a few assertions, so I think guarding it with "if assertions are enabled" makes that purpose more clear. If my assumption is wrong, I'm happy to remove the ifndef.

That's a good point; I forgot that we use NDEBUG as our release+asserts flag (derp). That being said, this is basically the same as:

assert((!BInvalid || isa<MemberExpr>(B.get<Expr*>()) || tryUnwrapAllocSizeCall(B.get<Expr*>())) && "Unexpected type of invalid base");

Uncertain if that is an improvement or not, but it is shorter-ish. :-)

lib/Sema/SemaDeclAttr.cpp
728 ↗(On Diff #42547)

Boolean makes no sense, but char is fine here. Thanks

isIntegerType() already covers isCharType(). I think what you want is:

if (!Param->getType()->isIntegerType() || Param->getType()->isBooleanType())
  • Rebased
  • Removed bits that were already upstreamed as part of D14877
  • Added support for LLVM's allocsize attribute (under review at D14933 )
aaron.ballman edited edge metadata.Feb 2 2016, 6:14 AM

The attribute part looks mostly good (a few small nits), but the rest should be reviewed by @rsmith.

lib/Sema/SemaDeclAttr.cpp
737 ↗(On Diff #45622)

Can be replaced with assert(Attr.isArgExpr(AttrArgNo) && "expected expression argument");

739 ↗(On Diff #45622)

This whole if should be replaced by a call to checkFunctionOrMethodParameterIndex(), which will return the correct function parameter number as an output parameter.

george.burgess.iv edited edge metadata.
george.burgess.iv marked 2 inline comments as done.

Addressed all feedback.

lib/Sema/SemaDeclAttr.cpp
737 ↗(On Diff #45622)

Neat! Thanks.

Now that allocsize is in LLVM (r266032), this isn't blocked. Yay!

Rebased, added a few tests, did a few tiny refactors, and fixed an overflow bug when the user tried to allocate 2**63 < N < 2**64 bytes (because CharUnits uses an int64_t as storage).

aaron.ballman added inline comments.Apr 28 2016, 5:47 AM
include/clang/Basic/Attr.td
753 ↗(On Diff #53611)

I don't see any C++ tests involving templates; can you add some?

test/SemaCXX/constant-expression-cxx11.cpp
1171 ↗(On Diff #53611)

This change seems out of place for a test that doesn't use alloc_size anywhere. Can you explain why this test has changed?

george.burgess.iv marked 2 inline comments as done.

Addressed all feedback

include/clang/Basic/Attr.td
753 ↗(On Diff #53611)

Good catch

test/SemaCXX/constant-expression-cxx11.cpp
1171 ↗(On Diff #53611)

Yup!

Clang's constexpr evaluator is conservative about const variables. If it kept this attitude, alloc_size evaluation can't happen in non-constexpr contexts in clang, even in cases like:

int foo() {
  void *const p = malloc(10);
  return __builtin_object_size(p, 0);
}

The change I made at ExprConstant.cpp:2789 (which Richard okayed when I talked with him offline) makes the constexpr evaluator more willing to look at const variables. With that, we can fold calls to __builtin_object_size like above in clang.

It also means that there are a few cases outside of alloc_size where the constexpr evaluator can be more aggressive. As it turns out, this is one of them (as is the change in test/CodeGenCXX/global-init.cpp, IIRC)

aaron.ballman accepted this revision.May 2 2016, 6:13 AM
aaron.ballman edited edge metadata.

Thank you for working on this! LGTM, but please wait for @rsmith to okay as well.

test/SemaCXX/constant-expression-cxx11.cpp
1171 ↗(On Diff #55478)

Thank you for the explanation!

This revision is now accepted and ready to land.May 2 2016, 6:13 AM

I'm wondering what the status of this patch is since someone has asked us to add support for this attribute in clang. Are you still waiting for review?

test/CodeGenCXX/alloc-size.cpp
66 ↗(On Diff #55478)

Is it necessary to compute __builtin_object_size in the front-end (rather than in some IR passes like instcombine) when it takes the pointer returned by a function marked alloc_size?

Also, is the IR optimization smart enough to get the exact object size in the following case?

void foo(int a, int b) {
  void *p0 = my_malloc(a);
  g0 = __builtin_object_size(p0, 0);
  void *p1 = my_calloc(a, b);
  g1 = __builtin_object_size(p1, 1);
}

void foo1() {
  foo(10, 50);
}

I'm wondering what the status of this patch is since someone has asked us to add support for this attribute in clang. Are you still waiting for review?

Waiting for an LGTM from Richard, though this patch currently doesn't pass all tests. One of the changes in it causes some craziness with how we codegen blocks (in e.g. objc/OpenCL). Basically, we'll end up emitting the same block definition N times, since this makes the constexpr evaluator more accurate in some cases. I hope to fix that + get back to this in the nearish future.

(Happy to accept an LGTM now, though, since the bug isn't with this patch. ;) Naturally, I'll submit after the block-related bug is fixed.)

test/CodeGenCXX/alloc-size.cpp
66 ↗(On Diff #55478)

Is it necessary to compute __builtin_object_size in the front-end (rather than in some IR passes like instcombine)

It's necessary to try in the frontend. If that fails, we'll lower calls to __bos to @llvm.objectsize, and let LLVM try it.

The main reason that we care for the frontend is, among other reasons, so we can use it with the enable_if attribute, like so:

void strncat(char *buf, int n, const char *from)
  __attribute__((overloadable,
                 enable_if(__builtin_object_size(buf, 0) < n, ""),
                 unavailable("'n' is larger than the target buffer!")));

void strncat(char *buf, int n, const char *from)
  __attribute__((overloadable));

int main() {
  char buf[4];
  strncat(buf, sizeof(buf)+1, "hi"); // expected-error{{'n' is larger than the target buffer!}}
}

Also, is the IR optimization smart enough to get the exact object size in the following case?

__builtin_object_size *always* gets lowered to a constant by either clang or LLVM. That is, if foo gets inlined into foo1, then LLVM should be able to determine accurate values for g0 and g1. If foo isn't inlined, then no: you'll get -1 for both.

If you'd like to see what we can and can't get, LLVM already has the allocsize attribute (and @llvm.objectsize intrinsic) in it. :)

ahatanak added inline comments.Aug 3 2016, 1:16 PM
test/CodeGenCXX/alloc-size.cpp
66 ↗(On Diff #55478)

Thanks for the explanation. I see why you want to evaluate it in the front-end.

It looks like llvm optimizes function foo before it gets inlined into function foo1 and, as a result, g0 and g1 both get -1 instead of 10 and 500. I guess llvm should defer optimizing the function until after inliner is run.

test/CodeGenCXX/alloc-size.cpp
66 ↗(On Diff #55478)

Hm. Looks like how we handle allocsize in LLVM may be slightly broken, then, because it works with regular allocas.

Filed PR28834. Thanks for the heads-up. :)

hfinkel added a subscriber: hfinkel.Nov 2 2016, 9:25 AM

What's the status of this?

With any luck, this'll be on the top of my to-do list next week. :)

george.burgess.iv edited edge metadata.

Rebased and made the __builtin_object_size code a tiny bit cleaner.

The blocks bugfix I mentioned will be up as a separate review in a few minutes. :)

I still think that this looks good, so if @rsmith doesn't comment over the weekend, you can go ahead and commit on Monday -- we can handle any feedback in post-commit review.

include/clang/Basic/AttrDocs.td
240 ↗(On Diff #77222)

Double space after the full stop should be a single space (here and below).

This revision was automatically updated to reflect the committed changes.
george.burgess.iv marked an inline comment as done.Dec 19 2016, 5:16 PM

Thanks, everyone! :)