Page MenuHomePhabricator

Implemented [[clang::musttail]] attribute for guaranteed tail calls.

Authored by haberman on Mar 29 2021, 9:46 AM.



This is a Clang-only change and depends on the existing "musttail"
support already implemented in LLVM.

The [[clang::musttail]] attribute goes on a return statement, not
a function definition. There are several constraints that the user
must follow when using [[clang::musttail]], and these constraints
are verified by Sema.

Tail calls are supported on regular function calls, calls through a
function pointer, member function calls, and even pointer to member.

Future work would be to throw a warning if a users tries to pass
a pointer or reference to a local variable through a musttail call.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
rsmith added inline comments.Thu, Apr 8, 4:11 PM

Can we somehow avoid talking about ARC where it's not relevant? While it'd be nice to be more precise here, my main concern is that we shouldn't be mentioning ARC to people for whom it's not a meaningful term (eg, when not compiling Objective-C or Objective-C++). Perhaps the simplest approach would be to only mention ARC if getLangOpts().ObjCAutoRefCount is set?

221–226 ↗(On Diff #336203)

IgnoreImplicitAsWritten should already skip over implicit elidable constructors, so I would imagine this is skipping over elidable explicit constructor calls (eg, [[musttail]] return T(make()); would perform a tail-call to make()). Is that what we want?


In the case where we're forcibly eliding a constructor, we'll need to emit a return statement that returns musttail call expression here rather than emitting the original substatement. Otherwise the tail call we emit will be initializing a local temporary rather than initializing our return slot. Eg, given:

struct A {
  A(const A&);
  char data[32];
A f();
A g() {
  [[clang::musttail]] return f();

under -fno-elide-constructors when targeting C++11, say, we'll normally lower that into something like:

void f(A *return_slot);
void g(A *return_slot) {
  A temporary; //uninitialized
  f(&temporary); // call f
  A::A(return_slot, temporary); // call copy constructor to copy into return slot

... and with the current patch, it looks like we'll add a 'ret void' after the call to f, leaving g's return slot uninitialized and passing an address into f that refers to a variable that will no longer exist once f is called. We need to instead lower to:

void f(A *return_slot);
void g(A *return_slot) {
  f(return_slot); // call f

Probably the easiest way to do this would be to change the return value on the ReturnStmt to be the tail-called CallExpr when attaching the attribute.

haberman updated this revision to Diff 336310.Thu, Apr 8, 9:26 PM
haberman marked 3 inline comments as done.
  • Refined the implicit constructor skipping code.

I implemented this but I couldn't figure out how to actually trigger the ARC case, so I just removed that part of the diagnostic text for now.

221–226 ↗(On Diff #336203)

As discussed offline, it appears that IgnoreImplicitAsWritten() was not skipping the implicit constructor in this case. Per our discussion, I created a new version of IgnoreImplicitAsWritten() that does, with a FIXME to land it in Expr, and I made it skip implicit constructors only (and added tests for this case).



I had to change your test case to remove the destructor, otherwise it fails the trivial destruction check.

Take a look at the CodeGen tests and see if the output looks correct to you.

haberman updated this revision to Diff 336316.Thu, Apr 8, 10:32 PM
  • Rename and refine IgnoreElidableImplicitConstructorSingleStep().

Mostly just nits from me, but the attribute portions look good to me.


This worries me slightly -- not all CallExpr objects have a callee declaration ( That said, I'm struggling to come up with an example that isn't covered so this may be fine.


It'd be better not to go through the cast machinery twice -- you cast to the MemberPointerType and then cast to the same thing again (but in a different way).


This can be removed entirely.

haberman updated this revision to Diff 336511.Fri, Apr 9, 10:27 AM
haberman marked 9 inline comments as done.
  • Simplified some casts and type declarations.

That was my experience too, I wasn't able to find a case that isn't covered. I tried to avoid adding any diagnostics that I didn't know how to trigger or test.


I changed to auto, but I can't tell if you have another suggestion here also. I can't see how any of these casts can be removed.

aaron.ballman added inline comments.Fri, Apr 9, 12:12 PM

I'm not certain if I should take a shower after writing that code or not, but it's one potential way not to perform the cast twice.

If that code is too odious for others, we should at least change the dyn_cast<> in the else if to be an isa<>.

haberman updated this revision to Diff 336894.Mon, Apr 12, 10:41 AM
  • Switch to isa<> for type check.
  • Merge branch 'main' into musttail
haberman marked an inline comment as done.Mon, Apr 12, 10:42 AM
haberman added inline comments.

I changed dyn_cast<> to isa<>. If @rsmith concurs about the dyn_cast_or_null<> variant I'll switch to that.

rsmith added inline comments.Mon, Apr 12, 4:17 PM

I think this would be clearer, assuming it's equivalent (and if it's not equivalent, I think it'd be useful to include a comment explaining why).


This loop is problematic: it's generally not safe to modify an expression that is used as a subexpression of another expression. (Modifying the ReturnStmt is, by contrast, much less problematic because the properties of a statement have less complex dependencies on the properties of its subexpressions.) In particular, if there were any implicit conversions here that changed the type or value category or similar, the enclosing parentheses would have the wrong type / value category / similar. Also there are possibilities here other than CallExpr and ParenExpr, such as anything else that we consider to be "parentheses" (such as a GenericSelectionExpr).

But I think this loop should never be necessary, because all implicit conversions should always be on the outside of the parentheses. Do you have a testcase that needs it?


... would be more in line with our normal idioms.


This assert is incorrect. It would fail for a case like:

using T = int();
T *f();
int g() { [[clang::musttail]] return f()(); }

... where there is no declaration associated with the function pointer returned by f().

I think instead of looking for a callee declaration, you should instead inspect the callee expression. You can distinguish between a member function call and a non-member call by looking at the type of the callee. Perhaps the simplest way would be to distinguish between three cases:

(1) There is a callee declaration, which is a member function: this is a direct call to a member function; you can use the type of the callee declaration for your check.
(2) The callee expression is (after skipping parens) a pointer-to-member access operator (BinaryOperator::isPtrMemOp); you can use the type of the RHS operand (which will be a pointer to member function) for your check.
(3) Anything else: this is a non-member-function call, and you can directly inspect the type of the callee without caring about the callee declaration. (You might still find the type is not a function type at this stage, which indicates this is some kind of special form. In particular, it could be a BuiltinType::BoundMember for a pseudo-destructor call. I'm not sure if there are currently any other special cases that make it this far; there might not be, because most such cases are dependent.)


Use getAs rather than dyn_cast to look through type sugar. For example, in

void (f)() { [[clang::musttail]] return f(); }

... the type of f is a ParenType, not a FunctionProtoType.


You need to use getAs<MemberPointerType> here not isa in order to look through type sugar (eg, typedefs).

However, as noted above, a call via a member pointer doesn't necessarily have a CalleeDecl, so you'll need to do this check by looking for a callee expression that's the right kind of BinaryOperator instead.

1 ↗(On Diff #336894)

This is a C++ test so it should be in CodeGenCXX.

178–181 ↗(On Diff #336894)

It turns out that we consider p to be the callee decl in this case, so we'll need a better example :)

194 ↗(On Diff #336894)

This doesn't include enough of the output to be able to tell if we've generated correct code. Can you also include the define ... line, showing that %agg.result is the name of the first parameter?

1 ↗(On Diff #336894)

This should be in SemaCXX.

66 ↗(On Diff #336894)

Please add a FIXME to this; it seems like a bug that we can't tell the difference between needing to run a destructor for the return value and needing to run a destructor for some other temporary created in the return statement.

78 ↗(On Diff #336894)

The "is a member of different class (expected void" seems surprising here. Can we customize the diagnostic to instead say that we can't musttail from a non-member to a member (and vice versa for the other case)?

167–171 ↗(On Diff #336894)

Please also test the pseudo-destructor case:

void f() {
  int n;
  using T = int;
  [[clang::musttail]] return n.~T();
haberman updated this revision to Diff 337252.Tue, Apr 13, 1:46 PM
haberman marked 14 inline comments as done.
  • Addressed more review comments.

I removed it and my test cases still pass. I'm glad to know this isn't necessary: I was coding defensively because I didn't know that I could count on this invariant:

all implicit conversions should always be on the outside of the parentheses.

Functionally this looks good to me. I've suggested some minor cleanups and I understand you're doing some wordsmithing on the diagnostics; I think once those are complete this will be ready to land. Thank you!


I agree, that sounds like a nice cleanup. Delaying this to a future change makes sense to me.


You shouldn't need the const in the argument to cast, and we generally omit it; cast copies the pointer/referenceness and qualifiers from its argument anyway, and the explicit const in the type of R seems sufficient for readers. (I'm not even sure if cast intends to permit explicit qualfiiers here.)


I think this isa<CapturedDecl> check is redundant, because a CapturedDecl is not a FunctionDecl, so CallerDecl will always be null when CurContext is a CapturedDecl.


Even in invalid code we should never see a CallExpr whose callee has a null type; if Sema can't form an Expr that meets the normal expression invariants during error recovery, it doesn't build one at all. I think you can remove this if.


Given that we don't care about differences in qualifiers, it might be clearer to not include them in the diagnostics.


Nice new feature! Please also update Release Notes for clang.

haberman added inline comments.Tue, Apr 13, 4:20 PM

Without this if(), I crash on this test case. What do you think?

struct TestBadPMF {
  int (TestBadPMF::*pmf)();
  void BadPMF() {
    [[clang::musttail]] return ((*this)->*pmf)(); // expected-error {{left hand operand to ->* must be a pointer to class compatible with the right hand operand, but is 'TestBadPMF'}}

Dump of CalleeExpr is:

RecoveryExpr 0x106671e8 '<dependent type>' contains-errors lvalue
|-ParenExpr 0x10667020 'struct TestBadPMF' lvalue
| `-UnaryOperator 0x10667008 'struct TestBadPMF' lvalue prefix '*' cannot overflow
|   `-CXXThisExpr 0x10666ff8 'struct TestBadPMF *' this
`-MemberExpr 0x10667050 'int (struct TestBadPMF::*)(void)' lvalue ->pmf 0x10666ed0
  `-CXXThisExpr 0x10667040 'struct TestBadPMF *' implicit this
rsmith added inline comments.Tue, Apr 13, 4:56 PM

Ah, right, while the callee will always have a non-null type, that type might not be a pointer type.

I think what we're missing here is a check for a dependent callee; checking for a dependent context isn't enough to check for error-dependent constructs. Probably the simplest thing would be to change the isDependentContext() checks to also check if the return expression isInstantiationDependent(). (That would only help with the error-dependent cases for now, but we'd also need that extra check in the future if anything like goes forward, allowing dependent constructs in non-dependent contexts, especially in combination with

haberman updated this revision to Diff 337576.Wed, Apr 14, 3:59 PM
haberman marked 6 inline comments as done.
  • Word-smithed diagnostics and addressed other review comments.
haberman updated this revision to Diff 337581.Wed, Apr 14, 4:59 PM
  • More diagnostic wordsmithing.
haberman updated this revision to Diff 337589.Wed, Apr 14, 5:33 PM
  • Added release note for [[clang::musttail]].
haberman updated this revision to Diff 337590.Wed, Apr 14, 5:35 PM
  • Fixed release note escaping.
haberman updated this revision to Diff 337592.Wed, Apr 14, 5:57 PM
  • Fixed several cases in CodeGen test.

Thanks, cool :)

haberman updated this revision to Diff 337597.Wed, Apr 14, 6:15 PM
  • Fixed typo in comment.

Ok I think this is ready to land.

There are a few FIXME comments, I will follow up with some small changes to address them.

Harbormaster completed remote builds in B98788: Diff 337590.
rsmith accepted this revision.Thu, Apr 15, 4:47 PM
This revision is now accepted and ready to land.Thu, Apr 15, 4:47 PM
This revision was landed with ongoing or failed builds.Thu, Apr 15, 5:13 PM
This revision was automatically updated to reflect the committed changes.

Looks like this breaks tests on mac/arm:

Please take a look and revert for now if it takes a while to fix.

That is a great feature, thank you. Compiling state machines and scheme programs to C is now much prettier.