This is an archive of the discontinued LLVM Phabricator instance.

Add the allocsize attribute to LLVM
ClosedPublic

Authored by george.burgess.iv on Nov 23 2015, 12:56 PM.

Details

Summary

Add the allocsize function-level attribute to LLVM.

This attribute allows users to tag arbitrary functions as "malloc-like", so intrinsics like @llvm.objectsize can give higher quality results without requiring modifications to LLVM itself.

Diff Detail

Repository
rL LLVM

Event Timeline

george.burgess.iv retitled this revision from to Add the allocsize attribute to LLVM.
george.burgess.iv updated this object.
george.burgess.iv added a reviewer: echristo.
george.burgess.iv added a subscriber: llvm-commits.
nlopes added a subscriber: nlopes.Dec 15 2015, 2:12 PM

I was wondering whether we could go a bit more generic and support arbitrary expressions (like llvm.assume). This design only supports allocation functions that return a pointer to the beginning of a buffer with size p1 * p2 * ... * pn, where pi are the parameters.

It would be nice to support arbitrary expressions for size and offset. There are many cases in practice where these would be helpful. Of course then we would need to expose a generalized version of gcc's attribute to applications.

support arbitrary expressions (like llvm.assume).

I'm fine allowing this attribute to store a Function* or Value* that LLVM tries to constant fold at each callsite, but I can't see how llvm.assume fits into the picture at all. Can you give a concrete example of what you're trying to accomplish with the addition you proposed?

support arbitrary expressions (like llvm.assume).

I'm fine allowing this attribute to store a Function* or Value* that LLVM tries to constant fold at each callsite, but I can't see how llvm.assume fits into the picture at all. Can you give a concrete example of what you're trying to accomplish with the addition you proposed?

I mentioned assume as a precedent for something that takes an arbitrary expression which is not codegen'ed.
See e.g.: http://lists.llvm.org/pipermail/cfe-dev/2012-June/022272.html

My stab at allowing LLVM to support arbitrary expressions in allocsize is available at D16390. It needs some refactoring/cleanup, but it gives you an idea of what I think we need to do to support allocsize with arbitrary expressions well in LLVM. If we end up _really_ wanting to support that, I'll fix the other patch up a bit and submit it for actual-review. Personally, I'm sketched out by the things that we need to do in that patch, so I think I'd rather continue with a review on this one.

If the consensus is that the other patch is worth the modifications that it brings, I'll clean it up + submit it for actual-review.

  • Rebased
  • Added tests for overflow
  • Added support in the bitcode reader/writer + tests for it.
ab added a subscriber: ab.Feb 16 2016, 5:18 PM

D16390 is indeed quite repulsive; what do you think of annotating the callsites instead?

It does increase bitcode size, but perhaps something like this would feel more natural:

define i8* @checked_malloc(i32 %N) {
	%1 = call i8* @object_malloc(i32 %N)
	%2 = call i32 @llvm.objectsize.i32.p0i8(i8* %1)
	%max = add i32 %N, 8
	%c = icmp ule %max, %2
	call void @llvm.assume(i1 %c)
	%3 = icmp lt %2, %N
	br i1 %3, label %Trap, label %Return
	...
}

You could also introduce dedicated intrinsics (and maybe even wrap the calls with them, to make the lookup trivial).

Let me modify your code sample a bit, so it represents what the transformation would really look like if we wanted to support arbitrary expressions.

 define i8* @checked_malloc(i32 %N) {
 	%1 = call i8* @object_malloc(i32 %N)
 	%2 = call {i32, i32} @llvm.objectsize.and.object.offset(i8* %1)
	%3 = call {i32, i32} @__autogenerated_get_object_size_of_object_malloc(i32 %N)
	%c = icmp eq %2, %3
	call void @llvm.assume(i1 %c)
	%4 = extractelement {i32, i32} %2, 0
	%5 = icmp lt %4, %N
	br i1 %3, label %Trap, label %Return
	...
}

So let's walk through what needs to happen here:

  • If we don't run a special piece of code (ObjectSizeOffsetEvaluator), then the call to @__autogenerated_get_object_size_of_object_malloc must go away without a trace.
  • If we do run ObjectSizeOffsetEvaluator on %1, @__autogenerated_get_object_size_of_object_malloc must stay, as it will be used

Because of this, we either have to get rid of @__autogenerated_get_object_size_of_object_malloc before inlining, or we need to mark it noinline so it's not impossible to remove from functions. The former case means we lose our allocsize *really* early, and the latter means we probably get slower code (especially if the function amounts to return {N - 8, 8};)

Also, we'd need a highly restricted user function constant folder to make this work, just like in D16390.

Plus, @llvm.assume says on the tin that it makes optimizing things harder, so this makes optimizing any function that calls an allocation function harder. Ouch. Okay, let's try something else.

define i8* @checked_malloc(i32 %N) {
  %1 = call i8* @object_malloc(i32 %N)
  call void @llvm.assert_object_size_is(i8* %1, {i32, i32}(i32)* @__autogenerated_get_object_size_of_object_malloc, i32 %N)
  ...
}

This seems more reasonable. Here are the problems it poses, as I see it:

  • It still makes computing @llvm.objectsize more complex (instead of chasing a pointer through bitcasts/GEPs/... to its definition, you need to check at every GEP/bitcast/... for uses of the pointer that are calls to this intrinsic.
  • We still need a special user function constant folder.
  • We still need to rely on optimizations happening to @__autogenerated_get_object_size_of_object_malloc in order to have a chance of constant folding it.
  • We waste time optimizing @__autogenerated_get_object_size_of_object_malloc, a function that may never even hit an object file.
  • We would need to magically turn the intrinsic into an actual function call in some cases. If this happens late, the function won't get inlined. Like said above, if the function amounts to return {N - 8, 8};, that's not good.
  • @__autogenerated_get_object_size_of_object_malloc isn't allowed to have any side-effects (because the intrinsic would presumably be treated as though it has no side-effects, much unlike a call instruction).
  • We need to keep around anything passed to @__autogenerated_get_object_size_of_object_malloc until the intrinsic is deleted, so it may suffer from similar optimization problems as @llvm.assume.
  • We still need to emit this at every callsite to any allocation function.

...And this is the best alternate design I can think of. Does this match what you were thinking? If so, I'm not at all convinced that it's strictly better than what I'm presenting here, which prevents no optimizations, is very lightweight, doesn't require special handling everywhere, doesn't need a custom constant folder, etc.


Honestly, and this is me speaking from a position of complete ignorance as to what the actual real-life use case of this would be, in what case are tiny wrapper functions unacceptable? The only things I have to pull from are here: http://lists.llvm.org/pipermail/cfe-dev/2012-June/022272.html -- can you tell me why the below alternatives don't work just as well? Or give me an example where it wouldn't be possible to write a similar wrapper? I'm unable to think of one.

// Original
char *my_strdup(char *str) __attribute__((alloc_size_ex(strlen(str)+1)));

// Modified
extern char *_my_strdup_impl(const char *str, int N) __attribute__((alloc_size(1)));
inline char *my_strdup(const char *str) {
  size_t total_len = strlen(str)+1;
  return _my_strdup_impl(str, total_len);
}

// ------------------

// Original
void *my_complex_alloc(int n, int size, int add) __attribute__((alloc_size_ex(n * size + add)));

// Modified -- _impl returns the start of the buffer
extern void *_my_complex_alloc_impl(int total_bytes, int n, int size, int add) __attribute__((alloc_size(0)));
inline void *my_complex_alloc(int n, int size, int add) {
  return _my_complex_alloc_impl(n * size + add, n, size, add);
}

// ------------------
// Original
char *middle(int size) __attribute__((alloc_size_ex(size, size/2)));

// Modified - Again, _middle_impl returns the start of the buffer
extern char *_middle_impl(int size) __attribute__((alloc_size(0)));
inline char *middle(int size) { return _middle_impl(size) + size/2; }
sanjoy added a subscriber: sanjoy.Feb 16 2016, 8:03 PM

Hi George,

Have you considered using operand bundles to express arbitrary
allocation size information? With operand bundles, the representation
would be something like

define i8* @checked_malloc(i32 %N) {
      %alloc_size = add i32 %N, 8
      %1 = call i8* @object_malloc(i32 %N) [ "alloc-size"(i32 %alloc_size) ]
      %2 = call i32 @llvm.objectsize.i32.p0i8(i8* %1)
      %3 = icmp lt %2, %N
      br i1 %3, label %Trap, label %Return
      ...
}

and you'd fold (objectsize (call fn(...) [ "alloc-size"(N) ])) to (N).
Once you've decided that there are likely no more opportunities to
simplify llvm.objectsize calls, you can drop the "alloc-size" operand
bundles from all calls (this can possibly be folded into CGP or some
other pass, to avoid re-traversing the whole module).

The caveat, of course, is that in this scheme you won't be able to
fold (objectsize (call allocation_fn)) where the call to allocation_fn
was initially an indirect call that got devirtualized, since you need
to know the call sites that will call allocation functions when
emitting the calls.

  • Sanjoy
ab added a comment.Feb 16 2016, 8:49 PM

Before going further:

Honestly, and this is me speaking from a position of complete ignorance as to what the actual real-life use case of this would be, in what case are tiny wrapper functions unacceptable? The only things I have to pull from are here: http://lists.llvm.org/pipermail/cfe-dev/2012-June/022272.html -- can you tell me why the below alternatives don't work just as well? Or give me an example where it wouldn't be possible to write a similar wrapper? I'm unable to think of one.

FWIW, I agree. One could justify the expression support as simplifying the transition for users (the alternative you show does require manual work), but I think this attribute is enough.

That said, the expression support (D16390) is an interesting thought experiment (Disclaimer: just a curious bystander). So, I was thinking you'd IRGen the whole expression at every callsite; and this is what I meant with "wrapping" the calls:

define i8* @checked_malloc(i32 %N) {
	%p = call i8* @object_malloc(i32 %N)
	%size = add i32 %N, 8
	%1 = call i8* @llvm.assume_alloc_size_and_offset.p0i8(i8* %p, i32 %size, i32 -8)
	%2 = call i32 @llvm.objectsize.i32.p0i8(i8* %1)
	%3 = icmp lt %2, %N
	br i1 %3, label %Trap, label %Return
	...
}

I think this avoids all problems except:

  • @__autogenerated_get_object_size_of_object_malloc isn't allowed to have any side-effects (because the intrinsic would presumably be treated as though it has no side-effects, much unlike a call instruction).
  • We need to keep around anything passed to @__autogenerated_get_object_size_of_object_malloc until the intrinsic is deleted, so it may suffer from similar optimization problems as @llvm.assume.
  • We still need to emit this at every callsite to any allocation function.

Also, +1 to Sanjoy's operand bundles approach: I think it's equivalent but cleaner. (Same caveat applies.)

Thank you both for the feedback/thoughts.

@sanjoy -

Have you considered using operand bundles to express arbitrary allocation size information? With operand bundles, the representation would be something like

TIL those exist -- neat! I think that would be a better approach than requiring custom intrinsics in the example I posted above.

@ab -

One could justify the expression support as simplifying the transition for users (the alternative you show does require manual work),

I agree, but I'm not sure how strong of an argument that would be, given that in most cases, the difference seems to be writing a few expressions in an attribute vs writing those same expressions in a small wrapper.

but I think this attribute is enough

Yay :)

So, I was thinking you'd IRGen the whole expression at every callsite; and this is what I meant with "wrapping" the calls

Ah, okay. I ruled that out initially because of the "side-effects must be allowed" restriction, but didn't consider it when we lifted that. Apologies.

I think this avoids all problems except [...]

Agreed


Thought experiments aside, ISTM that no one's unhappy with the design of this attribute as-is. Is this correct?

Just some minor comment I wrote a few weeks ago, but didn't get to the end of the review (and won't have time in the short term).

include/llvm/Analysis/MemoryBuiltins.h
170 ↗(On Diff #45604)

It seems unrelated to the rest of this patch, can you commit now?

include/llvm/IR/Attributes.h
154 ↗(On Diff #45604)

s/form/for/

Also no \brief anywhere unless you want multiple sentences as part of the brief (the code that has \brief is legacy).

lib/Analysis/MemoryBuiltins.cpp
487 ↗(On Diff #45604)

This is not clear to me, is this if we overflow the 32 bits that the pointer can represent?
Is it consider UB or...?

george.burgess.iv marked 3 inline comments as done.

Addressed all feedback -- thank you!

include/llvm/Analysis/MemoryBuiltins.h
170 ↗(On Diff #45604)

r261144

include/llvm/IR/Attributes.h
154 ↗(On Diff #45604)

Is there something that I should be using instead of \brief, or should I just remove it entirely?

lib/Analysis/MemoryBuiltins.cpp
487 ↗(On Diff #45604)

This is a remnant of two things:

  • Trying to handle overflow sanely for llvm.objectsize (the user can call 32 and 64 bit variants of objectsize)
  • Trying to handle sizes larger than IntTyBits properly

...The first part of that has been moved to InstCombineCalls, so this can mostly go away. The interesting bit is when the user gives us e.g. void *special_malloc(uint64_t N) on 32-bit platforms. Other places in the code expect the APInts you hand back to be IntTyBits bits wide, so we need to shrink the uint64_t to be 32 bits.

An argument can be made for doing a checked truncation of N or for doing an unchecked truncation. Given that checking if N requires more than 32 bits is cheap, I think having well-defined behavior (read: failing) in this case is harmless.

I rewrote this bit of code so it's hopefully more clear (and added a test to be sure we get this right). :)

Rebased and split.

I split this patch in hopes of making it a bit easier to review. This review now just adds the allocsize attribute. With this patch applied alone, allocsize can sit on functions and look pretty, but is useless; the review for the split out bits (coming in ~5 minutes ;) ) makes llvm.objectsize actually use the allocsize attribute.

I intend to commit both patches together.

test/Verifier/allocsize.ll
16 ↗(On Diff #53285)

I know the tests look tiny; please note that the other patch has quite a few tests that come with it. :)

mehdi_amini accepted this revision.Apr 11 2016, 3:50 PM
mehdi_amini added a reviewer: mehdi_amini.

LGTM, but please see inline comment.

docs/LangRef.rst
1253 ↗(On Diff #53285)

The two parameters thing is not "super clean", but I can see how it models the reality (i.e. malloc vs calloc).

Maybe specify it as: allocsize(<EltSize>[, <NumElts = 1>])

1255 ↗(On Diff #53285)

Is "base-0" the right term? (I read it like "base-2" for binary, or "base-16" for hex).
If there is better it would be nice.

include/llvm/IR/Attributes.h
154 ↗(On Diff #53285)

You should use brief only if you want multiple sentences to be part of the brief, otherwise we have "autobrief" turned on: http://lists.llvm.org/pipermail/llvm-dev/2015-May/085973.html

279 ↗(On Diff #53285)

Is brief intended?

337 ↗(On Diff #53285)

No brief

504 ↗(On Diff #53285)

brief, and below.

This revision is now accepted and ready to land.Apr 11 2016, 3:50 PM
george.burgess.iv edited edge metadata.
george.burgess.iv marked 6 inline comments as done.

Addressed all feedback.

docs/LangRef.rst
1253 ↗(On Diff #53285)

The two parameters thing is not "super clean"

Agreed.

Maybe specify it as: allocsize(<EltSize>[, <NumElts = 1>])

I like the idea of making the names more descriptive, but I'm not sure that saying = 1 is the right thing here. Both of these are parameter numbers, so I feel that putting NumElts = 1 may be a bit misleading. :)

1255 ↗(On Diff #53285)

Good catch; I'll try "zero-indexed" instead.

include/llvm/IR/Attributes.h
154 ↗(On Diff #53285)

Good to know -- thanks!

mehdi_amini added inline comments.Apr 11 2016, 5:38 PM
docs/LangRef.rst
1253 ↗(On Diff #53338)

I was trying to express the default value with the "= 1", but I'm fine without.

This revision was automatically updated to reflect the committed changes.