Page MenuHomePhabricator

Implemented [[clang::musttail]] attribute for guaranteed tail calls.

Authored by haberman on Mar 29 2021, 9:46 AM.



This is a Clang-only change and depends on the existing "musttail"
support already implemented in LLVM.

The [[clang::musttail]] attribute goes on a return statement, not
a function definition. There are several constraints that the user
must follow when using [[clang::musttail]], and these constraints
are verified by Sema.

Tail calls are supported on regular function calls, calls through a
function pointer, member function calls, and even pointer to member.

Future work would be to throw a warning if a users tries to pass
a pointer or reference to a local variable through a musttail call.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
haberman updated this revision to Diff 336316.Apr 8 2021, 10:32 PM
  • Rename and refine IgnoreElidableImplicitConstructorSingleStep().

Mostly just nits from me, but the attribute portions look good to me.


This worries me slightly -- not all CallExpr objects have a callee declaration ( That said, I'm struggling to come up with an example that isn't covered so this may be fine.


It'd be better not to go through the cast machinery twice -- you cast to the MemberPointerType and then cast to the same thing again (but in a different way).


This can be removed entirely.

haberman updated this revision to Diff 336511.Apr 9 2021, 10:27 AM
haberman marked 9 inline comments as done.
  • Simplified some casts and type declarations.

That was my experience too, I wasn't able to find a case that isn't covered. I tried to avoid adding any diagnostics that I didn't know how to trigger or test.


I changed to auto, but I can't tell if you have another suggestion here also. I can't see how any of these casts can be removed.

aaron.ballman added inline comments.Apr 9 2021, 12:12 PM

I'm not certain if I should take a shower after writing that code or not, but it's one potential way not to perform the cast twice.

If that code is too odious for others, we should at least change the dyn_cast<> in the else if to be an isa<>.

haberman updated this revision to Diff 336894.Apr 12 2021, 10:41 AM
  • Switch to isa<> for type check.
  • Merge branch 'main' into musttail
haberman marked an inline comment as done.Apr 12 2021, 10:42 AM
haberman added inline comments.

I changed dyn_cast<> to isa<>. If @rsmith concurs about the dyn_cast_or_null<> variant I'll switch to that.

rsmith added inline comments.Apr 12 2021, 4:17 PM

I think this would be clearer, assuming it's equivalent (and if it's not equivalent, I think it'd be useful to include a comment explaining why).


This loop is problematic: it's generally not safe to modify an expression that is used as a subexpression of another expression. (Modifying the ReturnStmt is, by contrast, much less problematic because the properties of a statement have less complex dependencies on the properties of its subexpressions.) In particular, if there were any implicit conversions here that changed the type or value category or similar, the enclosing parentheses would have the wrong type / value category / similar. Also there are possibilities here other than CallExpr and ParenExpr, such as anything else that we consider to be "parentheses" (such as a GenericSelectionExpr).

But I think this loop should never be necessary, because all implicit conversions should always be on the outside of the parentheses. Do you have a testcase that needs it?


... would be more in line with our normal idioms.


This assert is incorrect. It would fail for a case like:

using T = int();
T *f();
int g() { [[clang::musttail]] return f()(); }

... where there is no declaration associated with the function pointer returned by f().

I think instead of looking for a callee declaration, you should instead inspect the callee expression. You can distinguish between a member function call and a non-member call by looking at the type of the callee. Perhaps the simplest way would be to distinguish between three cases:

(1) There is a callee declaration, which is a member function: this is a direct call to a member function; you can use the type of the callee declaration for your check.
(2) The callee expression is (after skipping parens) a pointer-to-member access operator (BinaryOperator::isPtrMemOp); you can use the type of the RHS operand (which will be a pointer to member function) for your check.
(3) Anything else: this is a non-member-function call, and you can directly inspect the type of the callee without caring about the callee declaration. (You might still find the type is not a function type at this stage, which indicates this is some kind of special form. In particular, it could be a BuiltinType::BoundMember for a pseudo-destructor call. I'm not sure if there are currently any other special cases that make it this far; there might not be, because most such cases are dependent.)


Use getAs rather than dyn_cast to look through type sugar. For example, in

void (f)() { [[clang::musttail]] return f(); }

... the type of f is a ParenType, not a FunctionProtoType.


You need to use getAs<MemberPointerType> here not isa in order to look through type sugar (eg, typedefs).

However, as noted above, a call via a member pointer doesn't necessarily have a CalleeDecl, so you'll need to do this check by looking for a callee expression that's the right kind of BinaryOperator instead.

1 ↗(On Diff #336894)

This is a C++ test so it should be in CodeGenCXX.

178–181 ↗(On Diff #336894)

It turns out that we consider p to be the callee decl in this case, so we'll need a better example :)

194 ↗(On Diff #336894)

This doesn't include enough of the output to be able to tell if we've generated correct code. Can you also include the define ... line, showing that %agg.result is the name of the first parameter?

1 ↗(On Diff #336894)

This should be in SemaCXX.

66 ↗(On Diff #336894)

Please add a FIXME to this; it seems like a bug that we can't tell the difference between needing to run a destructor for the return value and needing to run a destructor for some other temporary created in the return statement.

78 ↗(On Diff #336894)

The "is a member of different class (expected void" seems surprising here. Can we customize the diagnostic to instead say that we can't musttail from a non-member to a member (and vice versa for the other case)?

167–171 ↗(On Diff #336894)

Please also test the pseudo-destructor case:

void f() {
  int n;
  using T = int;
  [[clang::musttail]] return n.~T();
haberman updated this revision to Diff 337252.Apr 13 2021, 1:46 PM
haberman marked 14 inline comments as done.
  • Addressed more review comments.

I removed it and my test cases still pass. I'm glad to know this isn't necessary: I was coding defensively because I didn't know that I could count on this invariant:

all implicit conversions should always be on the outside of the parentheses.

Functionally this looks good to me. I've suggested some minor cleanups and I understand you're doing some wordsmithing on the diagnostics; I think once those are complete this will be ready to land. Thank you!


I agree, that sounds like a nice cleanup. Delaying this to a future change makes sense to me.


You shouldn't need the const in the argument to cast, and we generally omit it; cast copies the pointer/referenceness and qualifiers from its argument anyway, and the explicit const in the type of R seems sufficient for readers. (I'm not even sure if cast intends to permit explicit qualfiiers here.)


I think this isa<CapturedDecl> check is redundant, because a CapturedDecl is not a FunctionDecl, so CallerDecl will always be null when CurContext is a CapturedDecl.


Even in invalid code we should never see a CallExpr whose callee has a null type; if Sema can't form an Expr that meets the normal expression invariants during error recovery, it doesn't build one at all. I think you can remove this if.


Given that we don't care about differences in qualifiers, it might be clearer to not include them in the diagnostics.


Nice new feature! Please also update Release Notes for clang.

haberman added inline comments.Apr 13 2021, 4:20 PM

Without this if(), I crash on this test case. What do you think?

struct TestBadPMF {
  int (TestBadPMF::*pmf)();
  void BadPMF() {
    [[clang::musttail]] return ((*this)->*pmf)(); // expected-error {{left hand operand to ->* must be a pointer to class compatible with the right hand operand, but is 'TestBadPMF'}}

Dump of CalleeExpr is:

RecoveryExpr 0x106671e8 '<dependent type>' contains-errors lvalue
|-ParenExpr 0x10667020 'struct TestBadPMF' lvalue
| `-UnaryOperator 0x10667008 'struct TestBadPMF' lvalue prefix '*' cannot overflow
|   `-CXXThisExpr 0x10666ff8 'struct TestBadPMF *' this
`-MemberExpr 0x10667050 'int (struct TestBadPMF::*)(void)' lvalue ->pmf 0x10666ed0
  `-CXXThisExpr 0x10667040 'struct TestBadPMF *' implicit this
rsmith added inline comments.Apr 13 2021, 4:56 PM

Ah, right, while the callee will always have a non-null type, that type might not be a pointer type.

I think what we're missing here is a check for a dependent callee; checking for a dependent context isn't enough to check for error-dependent constructs. Probably the simplest thing would be to change the isDependentContext() checks to also check if the return expression isInstantiationDependent(). (That would only help with the error-dependent cases for now, but we'd also need that extra check in the future if anything like goes forward, allowing dependent constructs in non-dependent contexts, especially in combination with

haberman updated this revision to Diff 337576.Apr 14 2021, 3:59 PM
haberman marked 6 inline comments as done.
  • Word-smithed diagnostics and addressed other review comments.
haberman updated this revision to Diff 337581.Apr 14 2021, 4:59 PM
  • More diagnostic wordsmithing.
haberman updated this revision to Diff 337589.Apr 14 2021, 5:33 PM
  • Added release note for [[clang::musttail]].
haberman updated this revision to Diff 337590.Apr 14 2021, 5:35 PM
  • Fixed release note escaping.
haberman updated this revision to Diff 337592.Apr 14 2021, 5:57 PM
  • Fixed several cases in CodeGen test.

Thanks, cool :)

haberman updated this revision to Diff 337597.Apr 14 2021, 6:15 PM
  • Fixed typo in comment.

Ok I think this is ready to land.

There are a few FIXME comments, I will follow up with some small changes to address them.

Harbormaster completed remote builds in B98788: Diff 337590.
rsmith accepted this revision.Apr 15 2021, 4:47 PM
This revision is now accepted and ready to land.Apr 15 2021, 4:47 PM
This revision was landed with ongoing or failed builds.Apr 15 2021, 5:13 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Apr 15 2021, 6:05 PM

Looks like this breaks tests on mac/arm:

Please take a look and revert for now if it takes a while to fix.

That is a great feature, thank you. Compiling state machines and scheme programs to C is now much prettier.

The error message here is very confusing:

/home/theraven/snmalloc2/src/mem/../ds/../aal/../ds/defines.h:122:27: error: cannot perform a tail call to function 'error' because its signature is incompatible with the calling function
      [[clang::musttail]] return snmalloc::error(str);
/home/theraven/snmalloc2/src/mem/../ds/../aal/../ds/defines.h:63:16: note: target function has different number of parameters (expected 2 but has 1)
  [[noreturn]] SNMALLOC_COLD void error(const char* const str);
/home/theraven/snmalloc2/src/mem/../ds/../aal/../ds/defines.h:21:25: note: expanded from macro 'SNMALLOC_COLD'
#  define SNMALLOC_COLD __attribute__((cold))
/home/theraven/snmalloc2/src/mem/../ds/../aal/../ds/defines.h:122:9: note: tail call required by 'musttail' attribute here
      [[clang::musttail]] return snmalloc::error(str);

The caller and callee both have one argument, the error is because the enclosing function has two parameters. The error appears wrong anyway for two reasons in this particular context:

  • The callee is [[noreturn]], so the stack layout doesn't make any difference, anything can be tail called if it's no-return.
  • The enclosing function is always_inline, so checking its argument-frame layout does not give useful information because it's the caller's argument-frame layout that matters.

@theraven: Can you post a minimal repro of your case? I don't follow your distinction between "caller" and "enclosing function."

Regarding noreturn and always_inline: maybe the rules for musttail could be relaxed in cases like the one you mention, but it would require changing the backend (LLVM). Here I changed the front-end only and used LLVM's existing musttail support, which meant accepting its existing limitations.

I would love to see an exception for always_inline: my use case would benefit greatly from this. In my own project I had to change a bunch of always_inline functions to macros to work around this rule. Unfortunately this is complicated by the fact that always_inline does not actually guarantee that inlining occurs.

Here's a minimal test:

void tail(int, float);

void caller(float x)
  return tail(42, x);

void outer(int x, float y)
        return caller(y);

This raises this error: error: cannot perform a tail call to function 'tail' because its signature is incompatible with the calling function
  return tail(42, x);
  ^ note: target function has different number of parameters (expected 1 but has 2)
void tail(int, float);
^ note: tail call required by 'musttail' attribute here

There's also an interesting counterexample:

void tail(int, float);

void caller(int a, float x)
  return tail(a, x);

void outer(float y)
        return caller(42, y);

This *is* accepted by clang, but then generates this IR at -O0:

define dso_local void @_Z5outerf(float %0) #2 {
  %2 = alloca i32, align 4
  %3 = alloca float, align 4
  %4 = alloca float, align 4
  store float %0, float* %4, align 4
  %5 = load float, float* %4, align 4
  store i32 42, i32* %2, align 4
  store float %5, float* %3, align 4
  %6 = load i32, i32* %2, align 4
  %7 = load float, float* %3, align 4
  call void @_Z4tailif(i32 %6, float %7)
  ret void

And this IR at -O1:

; Function Attrs: uwtable mustprogress
define dso_local void @_Z5outerf(float %0) local_unnamed_addr #2 {
  call void @_Z4tailif(i32 42, float %0)
  ret void

Note that in both cases, the alway-inline attribute is respected (even at -O0, the always-inline inliner runs) but the musttail annotation is lost. The inlining has inserted the call into a function with a different set of parameters and so it cannot have a musttail IR annotation.

It's not generically true that "anything can be tail-called if it's noreturn". For one, noreturn doesn't imply that the function doesn't exit by e.g. throwing or calling longjmp. For another, the most important user expectation of tail calls is that a long series of tail calls will exhibit zero overall stack growth; in a caller-pop calling convention, calling a function with more parameters may require growing the argument area in a way that cannot be reversed, so e.g. a long sequence of tail calls alternating between 1-argument and 2-argument functions will eventually exhaust the stack, which violates that user expectation.

chfast added a subscriber: chfast.Jan 9 2022, 4:59 AM
chfast added inline comments.

I reported a related issue. I wander if this is easy to fix.