Page MenuHomePhabricator

Introduce per-callsite inline intrinsics
AbandonedPublic

Authored by kuhar on Aug 23 2018, 5:29 PM.

Details

Summary

Traditionally, to force some inlining decisions one has to annotate function declarations with __attribute__((always_inline)) or __attribute__((noinline)). One problem with these attributes is that they affect every call site. Always inlining or forbidding inlining may not be desirable in every context, and a workaround for that is creating a few copies of functions, each with a different inline attribute. Furthermore, it's not always feasible (in every project) to modify library code and introduce new function attributes there.

This patch introduces a new way of forcing inlining decisions on a per-callsite basis. This allows for more fine-grained control over inlining, without creating any duplicate functions. The two new intrinsics for controlling inlining are:

  • __builtin_always_inline(Foo()) -- inlines the function Foo at the callsite, if possible. Internally, this applies the alwaysinline attribute to the generated call instruction.
  • __builtin_no_inline(Foo()) -- forbids the function Foo to be inlined at the callsite. Internally, this applies the noinline attribute to the generated call instruction.

The inline intrinsics support function, function pointer, member function, member function pointer, virutal function, and operator calls. Support for constructor calls (CXXTemporaryExpr) should also be possible, but is not the part of this patch.

Diff Detail

Event Timeline

kuhar created this revision.Aug 23 2018, 5:29 PM
kuhar added a comment.Aug 23 2018, 5:34 PM

Disclaimer: I'm a Clang newbie and I'm not sure if that's a good way to implement these intrinsics. I'm not sure about the following things:

  • The new enum CallInlineKind may not be in the right place
  • Not sure if adding the extra parameter to EmitSomething methods is the cleanest thing possible -- some functions have it now, some don't, and it makes argument lists longer
  • Not sure if just adding an extra attribute at each callsite is enough -- I'm not sure what is supposed to happen when there are conflicting inline attributes at call sites and function declarations (e.g. __builtin_always_inline and __attribute__ ((always_inline)))
  • I don't know how to handle constexpr functions
  • I don't know how to implement it for constructor calls

I would really appreciate any tips here :).

kuhar added a comment.Aug 24 2018, 8:58 AM

One extra question: do you think this deserves a separate RFC?

Quuxplusone added inline comments.
include/clang/Basic/DiagnosticSemaKinds.td
8236

I'd expect to be able to write

__builtin_always_inline(sqrt(x))
__builtin_no_inline(sqrt(x))

without caring whether sqrt was a real function or just a macro around __builtin_sqrt. How important is it that calls to builtin functions be errors, instead of just being ignored for this purpose?

8238

Spelling: s/dotr/dtor/
Semantics: Does this trigger in generic contexts? I feel like I'd want to be able to write

__builtin_always_inline(p->~T());

without caring whether T was a primitive type or not. How important is it that pseudo-destructor calls be errors, instead of just being ignored for this purpose?

include/clang/CodeGen/CGFunctionInfo.h
716

Here and throughout, wouldn't it be more traditional to name the parameter variable CIK? (You have sometimes CIK, sometimes inlineKind, sometimes InlineKind.)

It also feels needlessly convoluted to have a value of type CallInlineKind stored in a member named InlineCall (note the reversal of the words), but I'm not sure how that's generally dealt with.

test/SemaCXX/inline-builtins.cpp
13

This raises a question for me about the semantics of "always inlining" a "call":

struct A { A(); A(A&&); };
struct B { B(A); }
void foo(B);

__builtin_always_inline(foo(A{}));

Does __builtin_always_inline apply to foo, B(A) (used to create the parameter to foo), A() (used to create the argument to A(A&&)), or all of the above?
You should add a test case something like this, maybe with multiple function arguments.

72

Really good you thought of this case! But shouldn't this *not* be an error? 1_s is an operator call.

kuhar updated this revision to Diff 162495.Aug 24 2018, 4:02 PM
kuhar marked an inline comment as done.

Fix naming and add more tests.

include/clang/Basic/DiagnosticSemaKinds.td
8236

Very good point. Initially I thought this should be an error, but I think that since compiler can reason about intrinsics anyway, it makes sense to treat inline intrinsics as NoOps in this case.

8238

I thought about it an initially disliked the idea of of silently accepting code that cannot be inlined (because there's no function call).

This case is more general, because you can have the same problem with operator calls.
Consider a function template that does __builtin_always_inline(t + t). It makes sense to inline it for some custom type that implements the + operator, but it could be also used when t is e.g. an int.

Given that I the intrinsics already works on the best-effort basis (and doesn't not strictly guarantee that inlining/no inlining will happen; e.g., indirect calls), and that in the future I also want to introduce a third intrinsic for transitively inlining everything, I think it makes sense to treat it as as NoOp in generic context.

When the context is not generic, maybe it still would make sense to emit an error/warning? I don't like the idea of being able to write __builtin_always_inline(2 + 3) and the compiler not complaining at all... What do you think?

include/clang/CodeGen/CGFunctionInfo.h
716

Good suggestion, thanks!

test/SemaCXX/inline-builtins.cpp
13

It should apply to the outermost call, e.g.:
__builtin_always_inline(A(B(C()))) -- applies to A(...)
A(__builtin_always_inline(B(C()))) -- applies to B(...)
__builtin_always_inline(a.foo().bar()) -- applies to .bar()
__builtin_always_inline(a.foo()).bar() -- applies to .foo()

I added extra testcases for this. Ultimately, I believe that these intrinsic deserve a solid documentation on the clang www for builtins.

13

I also plan to have a third inline intrinsic: __builtin_flatten_inline that will (try to) transitively inline everything in parentheses.

72

This is simply not implemented yet. I added a TODO.
(From what I see, UDL don't look like operators on the AST level.)

kuhar edited reviewers, added: Quuxplusone, amharc; removed: grosser.Aug 24 2018, 4:08 PM
kuhar updated this revision to Diff 162519.Aug 24 2018, 5:50 PM

Rename missed CallInlineKind parameters to CIK

kuhar updated this revision to Diff 162520.Aug 24 2018, 5:54 PM
rsmith added a comment.Sep 4 2018, 1:21 PM

Support for constructor calls (CXXTemporaryExpr) should also be possible, but is not the part of this patch.

(You should handle the base class (CXXConstructExpr) that describes the semantics, rather than the derived class (CXXTemporaryObjectExpr) that describes a particular syntactic form for said semantics.)

Have you considered whether the builtin should apply to new expressions? (There are potentially three different top-level calls implied by a new expression -- an operator new, a constructor, and an operator delete -- so it's not completely obvious what effect this would have there.) And likewise for delete.

include/clang/Basic/DiagnosticSemaKinds.td
8236

I think it would be surprising if __builtin_no_inline(sqrt(x)) would use a target-specific sqrt instruction rather than making a library call. But given that these builtins are best-effort anyway, maybe it's acceptable.

8238

Nit: we generally use "cannot" rather than "must not" in diagnostics.

lib/CodeGen/CGBuiltin.cpp
3008–3013

Nit: please add braces around the outer if.

lib/Sema/SemaChecking.cpp
338–345

Use isa<CallExpr> rather than checking the exact class here; you should accept statements derived from these kinds too (eg, UserDefinedLiteralExpr). (You should explicitly disallow CUDAKernelCallExpr, though, since the builtins don't make any sense for a host -> device call.)

349

Nit: braces here.

356–360

Is this a necessary restriction? I would expect calls to blocks to be inlineable when the callee can be statically determined.

+rjmccall for CodeGen changes

kuhar added a comment.Sep 4 2018, 1:27 PM

+rjmccall for CodeGen changes

@rsmith @rjmccall
I have one high-level question about the CodeGen part that I wasn't able to figure out: is it possible to bail out of custom CodeGen in CGBuiltin and somehow say: emit whatever is inside (IE treat the builtin call node as a no-op)?

kuhar added a comment.EditedSep 4 2018, 1:33 PM

Thank you for the comments, Richard!

Have you considered whether the builtin should apply to new expressions? (There are potentially three different top-level calls implied by a new expression -- an operator new, a constructor, and an operator delete -- so it's not completely obvious what effect this would have there.) And likewise for delete.

I haven't thought about it, but to be honest I'm not sure what should happen. Intuitively, I think these 3 options would make most sense, listed from the most to the least reasonable in my opionion:

  1. apply the corresponding attribute to all of the implied calls
  2. apply to constructor/destructor only
  3. don't handle it at all -- emit an error

How would you suggest to handle it?

pchan added a subscriber: pchan.Sep 4 2018, 6:18 PM

+rjmccall for CodeGen changes

@rsmith @rjmccall
I have one high-level question about the CodeGen part that I wasn't able to figure out: is it possible to bail out of custom CodeGen in CGBuiltin and somehow say: emit whatever is inside (IE treat the builtin call node as a no-op)?

You can just emit the sub-expression. Make sure you pass down the ReturnValueSlot, and make sure you test C++ calls that return references where the result is really treated as an l-value.

I've commented about the language design on your RFC thread.

include/clang/Basic/DiagnosticSemaKinds.td
8240

This seems pretty trivial to add.

include/clang/CodeGen/CGFunctionInfo.h
517

Please don't propagate this information through the CGFunctionInfo. I really want this type to be *less* site-specific, not *more*.

lib/CodeGen/CGBuiltin.cpp
2998

I wonder if it would make more sense to give this its own Expr subclass. We do that with several other "builtin" calls, like __builtin_choose_expr and __builtin_offsetof. That would avoid the problem of ending up in CGBuiltin, and it would make it easier to recognize and look through these annotation calls in Sema or IRGen.

lib/CodeGen/CodeGenFunction.h
3570

I feel like the type of this argument should definitely be generalized to not just be about inlining.

Please also teach constant expression evaluation (lib/AST/ExprConstant.cpp) to look through these builtins when evaluating.

kuhar abandoned this revision.Sep 24 2019, 11:55 AM
xbolva00 added inline comments.Mon, Nov 4, 1:00 PM
include/clang/CodeGen/CGFunctionInfo.h
517

I like this patch but sadly, it was abandoned.

What way of passing this info would you recommend?

rjmccall added inline comments.Mon, Nov 4, 11:00 PM
include/clang/CodeGen/CGFunctionInfo.h
517

I don't remember much about this anymore, but CGCall is perfectly capable of adding attributes based on call-site-specific information.