This is an archive of the discontinued LLVM Phabricator instance.

Implemented [[clang::musttail]] attribute for guaranteed tail calls.
ClosedPublic

Authored by haberman on Mar 29 2021, 9:46 AM.

Details

Summary

This is a Clang-only change and depends on the existing "musttail"
support already implemented in LLVM.

The [[clang::musttail]] attribute goes on a return statement, not
a function definition. There are several constraints that the user
must follow when using [[clang::musttail]], and these constraints
are verified by Sema.

Tail calls are supported on regular function calls, calls through a
function pointer, member function calls, and even pointer to member.

Future work would be to throw a warning if a users tries to pass
a pointer or reference to a local variable through a musttail call.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
rsmith added inline comments.Apr 1 2021, 12:41 PM
clang/lib/CodeGen/CGCall.cpp
5315–5317

Yes, I think we should validate this by an assertion if we can. We can check this by walking the cleanup scope stack (walk from CurrentCleanupScopeDepth to EHScopeStack::stable_end()) and making sure that there is no "problematic" enclosing cleanup scope. Here, "problematic" would mean any scope other than an EHCleanupScope containing only CallLifetimeEnd cleanups.

Looking at the kinds of cleanups that we might encounter here, I think there may be a few more things that Sema needs to check in order to not get in the way of exception handling. In particular, I think we should reject if the callee is potentially-throwing and the musttail call is inside a try block or a function that's either noexcept or has a dynamic exception specification.

Oh, also, we should disallow musttail calls inside statement expressions, in order to defend against cleanups that exist transiently within an expression.

clang/lib/CodeGen/CGExpr.cpp
4829

The more I think about this, the more it makes me nervous: if any of the Emit*CallExpr functions below incidentally emit a call on the way to producing their results via the CGCall machinery, and do so without recursing through this function, that incidental call will be emitted as a tail call instead of the intended one. Specifically:

  • I could imagine a block call involving multiple function calls, depending on the blocks ABI.
  • I could imagine a member call performing a function call to convert from derived to virtual base in some ABIs.
  • A CUDA kernel call in general involves calling a setup function before the actual function call happens (and it doesn't make sense for a CUDA kernel call to be a tail call anyway...)
  • A call to a builtin can result in any number of function calls.
  • If any expression in the function arguments emits a call without calling back into this function, we'll emit that call as a tail call instead of this one. Eg, [[clang::musttail]] return f(dynamic_cast<T*>(p)); might emit the call to __cxa_dynamic_cast as the tail call instead of emitting the call to f as the tail call, depending on whether the CGCall machinery is used when emitting the __cxa_dynamic_cast call.

Is it feasible to sink this check into the CodeGenFunction::EmitCall overload that takes a CallExpr, CodeGenFunction::EmitCXXMemberOrOperatorCall, and CodeGenFunction::EmitCXXMemberPointerCallExpr, after we've emitted the callee and call args? It looks like we might be able to check this immediately before calling the CGCall overload of EmitCall, so we could pass in the 'musttail' information as a flag or similar instead of using global state in the CodeGenFunction object; if so, it'd be much easier to be confident that we're applying the attribute to the right call.

clang/lib/Sema/SemaStmt.cpp
594

It's a bit awkward, but I think we should delay this check until after the others -- complaining about non-trivial destruction seems beside the point if the returned value isn't a function call.

Also, the diagnostic text for this error seems narrower than the cases it covers. For example:

void f(const char*);
void g(const char *s) {
  [[clang::musttail]] return f((s + "foo"s).c_str());
}

would be diagnosed as "attribute requires that the return type and all arguments are trivially destructible", and they are; the problem is that the return value creates a temporary object with non-trivial destruction.

clang/test/CodeGen/attr-musttail.cpp
72 ↗(On Diff #334556)

For completeness, can we also get a CHECK-NEXT: ret void here too?

haberman updated this revision to Diff 334890.Apr 1 2021, 9:14 PM
haberman marked 6 inline comments as done.
  • Addressed more comments for musttail.
  • Reject constructors and destructors from musttail.
  • Fixed a few bugs and fixed the tests.
  • Added Obj-C test.

I added tests for all the cases you mentioned. PTAL.

clang/lib/CodeGen/CGCall.cpp
5315–5317

I'm having trouble implementing the check because there doesn't appear to be any discriminator in EHScopeStack::Cleanup that will let you test if it is a CallLifetimeEnd. (The actual code just does virtual dispatch through EHScopeStack::Cleanup::Emit().

I temporarily implemented this by adding an extra virtual function to act as discriminator. The check fires if a VLA is in scope:

int Func14(int x) {
  int vla[x];
  [[clang::musttail]] return Bar(x);
}

Do we need to forbid VLAs or do I need to refine the check?

It appears that JumpDiagnostics.cpp is already diagnosing statement expressions and try. However I could not get testing to work. I tried adding a test with try but even with -fexceptions I am getting:

cannot use 'try' with exceptions disabled
clang/lib/CodeGen/CGExpr.cpp
4829

Done. It's feeling like IsMustTail, callOrInvoke, and Loc might want to get collapsed into an options struct, especially given the default parameters on the first two. Maybe could do as a follow up?

clang/test/CodeGen/attr-musttail.cpp
72 ↗(On Diff #334556)

I turned on LLVM verification via opt so I think this should get verified by the IR verifier. Is that sufficient?

haberman updated this revision to Diff 334985.Apr 2 2021, 10:28 AM
  • Fixed unit test by running opt in a separate invocation.
  • Formatting fixes.
rsmith added a comment.Apr 2 2021, 2:11 PM

Thanks, I think this is looking really good.

@rjmccall, no explicit need to review; I just wanted to make sure you'd seen this and had a chance to express any concerns before we go ahead.

clang/include/clang/Basic/AttrDocs.td
452

One thing I'd add:

If the callee is a virtual function that is implemented by a thunk, there is no guarantee in general that the thunk tail-calls the implementation of the virtual function, so such a call in a recursive cycle can still result in unbounded stack growth.

clang/lib/CodeGen/CGCall.cpp
5315–5317

Do we need to forbid VLAs or do I need to refine the check?

Assuming that LLVM supports musttail calls from functions where a dynamic alloca is in scope, I think we should allow VLAs. The musttail documentation doesn't mention this, so I think its OK, and I can't think of a good reason why you wouldn't be able to musttail call due to a variably-sized frame.

Perhaps a good model would be to add a virtual function to permit asking a cleanup whether it's optional / skippable.

I could not get testing to work.

You need -fcxx-exceptions to use try. At the -cc1 level, we have essentially-orthogonal settings for "it's valid for exceptions to unwind through this code" (-fexceptions) and "C++ exception handling syntax is permitted" (-fcxx-exceptions), and you usually need to enable both for CodeGen tests involving exceptions.

5318–5319

Given the potential for mismatch between the JumpDiagnostics checks and this one, especially as new more exotic kinds of cleanup are added, I wonder if we should use an ErrorUnsupported here instead of an assert.

I strongly suspect we can still reach the problematic case here for a tail call in a statement expression. I don't think it's feasible to check for all the ways that an arbitrary expression context can have pending cleanups, which we'd need in order to produce precise Sema diagnostics for that, so either we handle that here or we blanket reject all musttail returns in statement expressions. I think either approach is probably acceptable.

clang/lib/Sema/SemaStmt.cpp
596–599

I would have thought this assert would fire for void f() { [[clang::musttail]] return; }. If so, we should reject this case with a diagnostic.

601

IgnoreUnlessSpelledInSource is a syntactic check that's only really intended for tooling use cases; I think we want something a bit more semantic here, so IgnoreImplicitAsWritten would be more appropriate.

I think it would be reasonable to also skip "parentheses" here (which we treat as also including things like C's _Generic). Would Ex->IgnoreImplicitAsWritten()->IgnoreParens() work?

If we're going to skip elidable copy construction of the result here (which I think we should), should we also reflect that in the AST? Perhaps we should strip the return value down to being just the call expression? I'm thinking in particular of things like building in C++14 or before with -fno-elide-constructors, where code generation for a by-value return of a class object will synthesize a local temporary to hold the result, with a final destination copy emitted after the call. (Testcase: struct A { A(const A&); }; A f(); A g() { [[clang::musttail]] return f(); } with -fno-elide-constructors.)

609

A call expression doesn't necessarily have a known callee declaration. I would expect this assert to fire on a case like:

void f() {
  void (*p)() = f;
  [[clang::musttail]] return p();
}

We should reject this with a diagnostic.

632–633

Please pass in a flag here so the diagnostic can %select and produce a more specific description of the problem.

644

There are a couple of other contexts that can include a return statement: the caller could also be an ObjCMethodDecl (an Objective-C method) or a CapturedDecl (the body of a #pragma omp parallel region). I'd probably use a specific diagnostic ("cannot be used from a block" / "cannot be used from an Objective-C function") for the block and ObjCMethod case, and a nonsepcific-but-correct "cannot be used from this context" for anything else.

clang/test/CodeGen/attr-musttail.cpp
72 ↗(On Diff #334556)

We don't like Clang's tests depending on opt in general, but I think in this case it's an acceptable crutch until we fix Clang to run the verifier on its IR output again (as discussed offline, it looks like we lost that as part of the transition to the new pass manager). Please add a FIXME to remove the call to opt once that bug is fixed. Other than that, I'm fine with this approach.

rsmith added inline comments.Apr 2 2021, 2:15 PM
clang/lib/CodeGen/CGCall.cpp
5315–5317

Or maybe instead of "is optional / skippable", the right question is, "is this redundant if we're about to return?" That way we could potentially one day reuse the same mechanism to also skip emitting such cleanups when emitting a cleanup path into the return block.

CC'ing Varun Gandhi.

Is musttail actually supported generically on all LLVM backends, or does this need a target restriction?

You should structure this code so it's easy to add exceptions for certain calling conventions that can support tail calls with weaker restrictions (principally, callee-pop conventions). Mostly that probably means checking the calling convention first, or extracting the type restriction checks into a different function that you can skip. For example, I believe x86's fastcall convention can logically support any combination of prototypes as musttail as long as the return types are vaguely compatible.

clang/lib/CodeGen/CGCall.cpp
5318–5319

Yes, I think ErrorUnsupported is a much better idea.

clang/lib/Sema/SemaStmt.cpp
644

Blocks ought to be extremely straightforward to support. Just validate that the tail call is to a block pointer and then compare the underlying function types line up in the same way. You will need to be able to verify that there isn't a non-trivial conversion on the return types, even if the return type isn't known at this point in the function, but that's a problem in C++ as well due to lambdas and auto deduced return types.

Also, you can use isa<...> for checks like this instead of dyn_cast<...>.

rsmith added a comment.Apr 2 2021, 3:27 PM

You should structure this code so it's easy to add exceptions for certain calling conventions that can support tail calls with weaker restrictions (principally, callee-pop conventions). Mostly that probably means checking the calling convention first, or extracting the type restriction checks into a different function that you can skip. For example, I believe x86's fastcall convention can logically support any combination of prototypes as musttail as long as the return types are vaguely compatible.

The LLVM musttail flag doesn't seem to allow for any target-specific loosening of the rules at the moment, so I don't think we can get any benefit from such restructuring right now; do you think it's OK to defer this restructuring and use the stricter rules across all targets for now?

I think there is also value in having a target-independent set of restrictions, even if we could actually guarantee tail calls in more circumstances on some (or maybe most!) targets, in order to allow people to make portable use of the attribute and as data towards something that we might be able to standardize. (For example, the people working on coroutines in C++ wanted something like this, but wanted feedback from implementers on what set of restrictions would be necessary in order to portably guarantee a tail call.) In order to strike a balance between portability and usefulness here, maybe we could plan to eventually accept any musttail call we know the target can support, but warn on musttail calls that don't satisfy the stricter rules and therefore may be non-portable?

You should structure this code so it's easy to add exceptions for certain calling conventions that can support tail calls with weaker restrictions (principally, callee-pop conventions). Mostly that probably means checking the calling convention first, or extracting the type restriction checks into a different function that you can skip. For example, I believe x86's fastcall convention can logically support any combination of prototypes as musttail as long as the return types are vaguely compatible.

The LLVM musttail flag doesn't seem to allow for any target-specific loosening of the rules at the moment, so I don't think we can get any benefit from such restructuring right now; do you think it's OK to defer this restructuring and use the stricter rules across all targets for now?

Right, I wasn't suggesting that we needed to implement weaker rules right now, just that it'd be nice if the code didn't have to be totally restructured just to do it. Right now it's one big function that does all the checks.

I think there is also value in having a target-independent set of restrictions, even if we could actually guarantee tail calls in more circumstances on some (or maybe most!) targets, in order to allow people to make portable use of the attribute and as data towards something that we might be able to standardize. (For example, the people working on coroutines in C++ wanted something like this, but wanted feedback from implementers on what set of restrictions would be necessary in order to portably guarantee a tail call.) In order to strike a balance between portability and usefulness here, maybe we could plan to eventually accept any musttail call we know the target can support, but warn on musttail calls that don't satisfy the stricter rules and therefore may be non-portable?

I agree that we should not start loosening restrictions based on the vagaries of the platform CC, e.g. recognizing that a particular set of arguments happens to be passed solely in registers. I was thinking about callee-pop CCs, like fastcall and swiftasynccall, which are generally designed from the start to support almost unrestricted tail calls; e.g. the only restriction on tail calls between fastcall functions is that the return types are compatible. (IIRC — it's possible that highly-aligned arguments would change that.) Since tail calls are part of the designed feature set of these conventions, it seems appropriate to think about them when adding a tail-call feature.

Standard C conventions generally don't support unrestricted tail calls (because of variadics, unprototyped calls, easier assembly-writing, and history), so this would only apply as a target-specific extension when used in conjunction with a non-standard CC, which meshes well with your goals for standardization. I just want you to write the code so that maintainers can more easily skip some of the restrictions to cover a non-standard CC.

I'm not surprised that the C++ coroutine people want unrestricted tail calls; this is all pretty predictable, and it's essentially the point I made about generic coroutine lowering several years ago at LLVM dev. Really, they need to be asking for a standard calling convention that guarantees unrestricted tail calls. Of course, that would require the standard to admit the existence of calling conventions (other than language linkage :)).

aaron.ballman added inline comments.Apr 3 2021, 9:36 AM
clang/include/clang/Basic/AttrDocs.td
458

It'd be nice if we could nail down "similar" somewhat. I don't know if int and short are similar (due to promotions) or const int and int are similar, etc.

461–462

Not only is this not usable with K&R C declarations, but it's also not usable with ... variadic functions either, right?

clang/lib/Sema/SemaStmt.cpp
562–569

I disagree that ActOnAttributedStmt() is the correct place for this checking -- template checking should occur when the template is instantiated, same as happens for declaration attributes. I'd like to see this functionality moved to SemaStmtAttr.cpp. Keeping the attribute logic together and following the same patterns is what allows us to tablegenerate more of the attribute logic. Statement attributes are just starting to get more such automation.

haberman updated this revision to Diff 335103.Apr 3 2021, 1:14 PM
haberman marked 3 inline comments as done.
  • Addressed comments and tried moving check to SemaStmtAttr.cpp.
clang/lib/Sema/SemaStmt.cpp
562–569

I tried commenting out this code and adding the following code into handleMustTailAttr() in SemaStmtAttr.cpp:

if (!S.checkMustTailAttr(St, MTA))
  return nullptr;

This caused my test cases related to templates to fail. It also seemed to break test cases related to JumpDiagnostics. My interpretation of this is that handleMustTailAttr() is called during parsing only, and cannot catch errors at template instantiation time or that require a more complete AST.

What am I missing? Where in SemaStmtAttr.cpp are you suggesting that I put this check?

601

IgnoreImplicitAsWritten() doesn't skip ExprWithCleanups, and per your previous comment I was trying to find a CallExpr before doing the check prohibiting ExprWithCleanups with side effects.

I could write some custom ignore logic using clang::IgnoreExprNodes() directly.

If we're going to skip elidable copy construction of the result here (which I think we should)

To clarify, are you suggesting that we allow musttail through elidable copy constructors on the return value, even if -fno-elide-constructors is set? ie. we consider that musttail overrides the -fno-elide-constructors option on the command line?

haberman updated this revision to Diff 335106.Apr 3 2021, 1:39 PM
  • Added missing S.setFunctionHasMustTail().
haberman added inline comments.Apr 3 2021, 1:41 PM
clang/lib/Sema/SemaStmt.cpp
562–569

Scratch the part about JumpDiagnostics, that was me failing to call S.setFunctionHasMustTail(). I added that and now the JumpDiagnostics tests pass.

But the template test cases still fail, and I can't find any hook point in SemaStmtAttr.cpp that will let me evaluate these checks at template instantiation time.

aaron.ballman added inline comments.Apr 4 2021, 6:30 AM
clang/lib/Sema/SemaStmt.cpp
562–569

I think there's a bit of an architectural mixup, but I'm curious if @rsmith agrees before anyone starts doing work to make changes.

When transforming declarations, RebuildWhatever() calls the ActOnWhatever() function which calls ProcessDeclAttributeList() so that attributes are processed. RebuildAttributedStmt() similarly calls ActOnAttributedStmt(). However, ActOnAttributedStmt() doesn't call ProcessStmtAttributes() -- the logic is reversed so that ProcessStmtAttributes() is what calls ActOnAttributedStmt().

I think the correct answer is to switch the logic so that ActOnAttributedStmt() calls ProcessStmtAttributes(), then the template logic should automatically work.

haberman added inline comments.Apr 4 2021, 10:31 AM
clang/lib/Sema/SemaStmt.cpp
562–569

I think the correct answer is to switch the logic so that ActOnAttributedStmt() calls ProcessStmtAttributes()

I think this would require ProcessStmtAttributes() to be split into two separate functions. Currently that function is doing two separate things:

  1. Translation of ParsedAttr into various subclasses of Attr.
  2. Validation that the attribute is semantically valid.

The function signature for ActOnAttributedStmt() uses Attr (not ParsedAttr), so (1) must happen during the parse, before ActOnAttributedStmt() is called. But (2) must be deferred until template instantiation time for some cases, like musttail.

aaron.ballman added inline comments.Apr 5 2021, 7:30 AM
clang/lib/Sema/SemaStmt.cpp
562–569

I don't think the signature for ActOnAttributedStmt() is correct to use Attr instead of ParsedAttr. I think it should be StmtResult ActOnAttributedStmt(const ParsedAttributesViewWithRange &AttrList, Stmt *SubStmt); -- this likely requires a fair bit of surgery to make work though, which is why I'd like to hear from @rsmith if he agrees with the approach. In the meantime, I'll play around with this idea locally in more depth.

aaron.ballman added inline comments.Apr 5 2021, 12:15 PM
clang/lib/Sema/SemaStmt.cpp
562–569

I think my suggestion wasn't quite right, but close. I've got a patch in progress that changes this the way I was thinking it should be changed, but it won't call ActOnAttributedStmt() when doing template instantiation. Instead, it will continue to instantiate attributes explicitly by calling TransformAttr() and any additional instantiation time checks will require you to add a TreeTransfor::TransformWhateverAttr() to do the actual instantiation work (which is similar to how the declaration attributes work in Sema::InstantiateAttrs()).

I hope to put up a patch for review for these changes today or tomorrow. It'd be interesting to know whether they make your life easier or harder though, if you don't mind taking a look and seeing how well (or poorly) they integrate with your changes here.

aaron.ballman added inline comments.Apr 5 2021, 1:08 PM
clang/lib/Sema/SemaStmt.cpp
562–569

You can find that review at https://reviews.llvm.org/D99896.

rsmith added inline comments.Apr 5 2021, 1:10 PM
clang/lib/Sema/SemaStmt.cpp
562–569

I think the ideal model would be that we form a FooAttr from the user-supplied attribute description in an ActOn* function from the parser, and have a separate template instantiation mechanism to instantiate FooAttr objects, and those methods are unaware of the subject of the attribute. Then we have a separate mechanism to attach an attribute to its subjects that is used by both parsing and template instantiation. But I suspect there are reasons that doesn't work in practice -- where we need to know something about the subject in order to know how to form the FooAttr. That being the case, it probably makes most sense to model the formation and application of a FooAttr as a single process.

it won't call ActOnAttributedStmt() when doing template instantiation

Good -- not calling ActOn* during template instantiation is the right choice in general -- the ActOn* functions are only supposed to be called from parsing, with a Build* added if the parsing and template instantiation paths would share code (we sometimes shortcut that when the ActOn* and Build* would be identical, but I think that's turned out to be a mistake).

any additional instantiation time checks will require you to add a TreeTransform::TransformWhateverAttr() to do the actual instantiation work

That sounds appropriate to me in general. Are you expecting that this function would also be given the (transformed and perhaps original) subject of the attribute?

601

IgnoreImplicitAsWritten() doesn't skip ExprWithCleanups

That sounds like a bug. Are you sure? It looks like IgnoreImplicitAsWrittenSingleStep calls IgnoreImplicitSingleStep which calls IgnoreImplicitCastsSingleStep which skips FullExpr, and ExprWithCleanups is a kind of FullExpr.

To clarify, are you suggesting that we allow musttail through elidable copy constructors on the return value, even if -fno-elide-constructors is set? ie. we consider that musttail overrides the -fno-elide-constructors option on the command line?

Yes, I think the musttail attribute should override -fno-elide-constructors, because that's necessary in order to provide the tail call the user requested (and the local setting should override the global one). This is probably worth adding to the documentation.

(Also, -fno-elide-constructors is only supposed to affect code generation, not language semantics or program validity, so I think either we should always reject if a constructor call is required for the return value, regardless of whether it's elidable, or we should never reject in that case, and either way this determination should be made independent of the setting of -fno-elide-constructors. Given that choice, it seems more useful to bias towards the common case (-felide-constructors).)

haberman added inline comments.Apr 5 2021, 3:29 PM
clang/lib/Sema/SemaStmt.cpp
562–569

Would it be possible to defer that refactoring until after this change is in? There are a lot of other issues to resolve on this review as it is, and throwing a potential refactoring into the mix is making it a lot harder to get this into a state where it can be landed.

Once it's in I'm happy to collaborate on the other review.

aaron.ballman added inline comments.Apr 6 2021, 4:26 AM
clang/lib/Sema/SemaStmt.cpp
562–569

I'm fine with that -- my suggestion would be to ignore the template instantiation validation for the moment (add tests with FIXME comments where the behavior isn't what you want) and then when I get you the functionality you need to have more unified checking, you can refactor it at that time.

haberman updated this revision to Diff 336004.Apr 7 2021, 10:42 PM
haberman marked 19 inline comments as done.
  • Returned validation to ActOnAttributedStmt() so it works with templates.
  • Merge branch 'main' into musttail
  • Address more review comments.
haberman added inline comments.Apr 7 2021, 10:45 PM
clang/include/clang/Basic/AttrDocs.td
458

Done. I tried to summarize the C++ concept of "similar" types as defined in https://eel.is/c++draft/conv.qual#2 and implemented in https://clang.llvm.org/doxygen/classclang_1_1ASTContext.html#a1b1b3b7a67a30fd817ba85454780d8ad

clang/lib/Sema/SemaStmt.cpp
562–569

I would strongly prefer to submit correct code (that validates templates) and leave a FIXME to make it pretty, rather than submit pretty code and leave a FIXME to make it correct.

609

I think this case will work actually, the callee decl in this case is just the function pointer, which seems appropriate and type checks correctly.

I added a test for this.

644

Tail calls to a block are indeed straightforward and are handled below. This check is for tail calls from a block, which I tried to add support for but didn't have much luck (in particular, during parsing of a block I wasn't able to get good type information for the block).

I'd probably use a specific diagnostic ("cannot be used from a block" / "cannot be used from an Objective-C function") for the block and ObjCMethod case, and a nonsepcific-but-correct "cannot be used from this context" for anything else.

I implemented this as requested. I wasn't able to test OpenMP as you apparently can't return from an OpenMP block.

aaron.ballman added inline comments.Apr 8 2021, 7:29 AM
clang/lib/Sema/SemaStmt.cpp
562–569

I'm okay with that so long as the follow-up work actually happens (not to suggest that you plan to ignore the request!). "This is functional but not pretty" has a risk of becoming enshrined behavior as priorities shift, whereas "this is incomplete" generally does not.

Please add a FIXME comment here just to make sure it's clear we want the code to move in the future.

haberman updated this revision to Diff 336130.Apr 8 2021, 8:39 AM
  • Added FIXME for attribute refactoring.
haberman updated this revision to Diff 336141.Apr 8 2021, 9:08 AM
  • Factored duplicated code into a method on MustTailAttr.
haberman added inline comments.Apr 8 2021, 9:15 AM
clang/lib/Sema/SemaStmt.cpp
562–569

I added a FIXME. Just to set expectations, I'm happy to work with you on updating this code to fit your planned refactoring (either by offering comments/suggestions on a review by you or creating my own follow-up review per your suggestions). But I'll need a fair amount of input from you, since I don't fully grok what you find objectionable about the current code or what your desired end state is.

aaron.ballman added inline comments.Apr 8 2021, 9:34 AM
clang/lib/Sema/SemaStmt.cpp
562–569

Thanks for the FIXME. I'm totally happy to iterate with you on the refactoring. Mostly, it involves testing whether https://reviews.llvm.org/D99983 provides you with enough contextual information when performing template instantiation for you to be able to put the attribute checking logic into the right places.

The objectionable bit about the current approach is that ActOnAttributedStmt()/BuildAttributedStmt() are general functions for attributed statements that should not be doing per-attribute diagnostic work (this won't scale well as more statement attributes get added). My preferred approach based on what you have already is to call checkMustTailAttr() from handleMustTailAttr(), and call it from TreeTransform.h in a new TransformMustTailAttr() function when doing template instantiation (this part is what requires the other patch to land first).

haberman updated this revision to Diff 336153.Apr 8 2021, 9:44 AM
  • Moved calling convention check to happen as early as possible.

You should structure this code so it's easy to add exceptions for certain calling conventions that can support tail calls with weaker restrictions (principally, callee-pop conventions). Mostly that probably means checking the calling convention first, or extracting the type restriction checks into a different function that you can skip. For example, I believe x86's fastcall convention can logically support any combination of prototypes as musttail as long as the return types are vaguely compatible.

I moved the calling convention check to be as early as possible.

haberman updated this revision to Diff 336203.Apr 8 2021, 1:12 PM
  • Formatted files with clang-format.
haberman marked 2 inline comments as done.Apr 8 2021, 1:23 PM
haberman added inline comments.
clang/lib/Sema/SemaStmt.cpp
562–569

Sounds good. I will follow up with you on https://reviews.llvm.org/D99983.

haberman marked an inline comment as done.Apr 8 2021, 1:24 PM
rsmith added inline comments.Apr 8 2021, 4:11 PM
clang/include/clang/Basic/DiagnosticSemaKinds.td
2828–2829

Can we somehow avoid talking about ARC where it's not relevant? While it'd be nice to be more precise here, my main concern is that we shouldn't be mentioning ARC to people for whom it's not a meaningful term (eg, when not compiling Objective-C or Objective-C++). Perhaps the simplest approach would be to only mention ARC if getLangOpts().ObjCAutoRefCount is set?

clang/lib/AST/AttrImpl.cpp
221–226 ↗(On Diff #336203)

IgnoreImplicitAsWritten should already skip over implicit elidable constructors, so I would imagine this is skipping over elidable explicit constructor calls (eg, [[musttail]] return T(make()); would perform a tail-call to make()). Is that what we want?

clang/lib/CodeGen/CGStmt.cpp
668

In the case where we're forcibly eliding a constructor, we'll need to emit a return statement that returns musttail call expression here rather than emitting the original substatement. Otherwise the tail call we emit will be initializing a local temporary rather than initializing our return slot. Eg, given:

struct A {
  A(const A&);
  ~A();
  char data[32];
};
A f();
A g() {
  [[clang::musttail]] return f();
}

under -fno-elide-constructors when targeting C++11, say, we'll normally lower that into something like:

void f(A *return_slot);
void g(A *return_slot) {
  A temporary; //uninitialized
  f(&temporary); // call f
  A::A(return_slot, temporary); // call copy constructor to copy into return slot
}

... and with the current patch, it looks like we'll add a 'ret void' after the call to f, leaving g's return slot uninitialized and passing an address into f that refers to a variable that will no longer exist once f is called. We need to instead lower to:

void f(A *return_slot);
void g(A *return_slot) {
  f(return_slot); // call f
}

Probably the easiest way to do this would be to change the return value on the ReturnStmt to be the tail-called CallExpr when attaching the attribute.

haberman updated this revision to Diff 336310.Apr 8 2021, 9:26 PM
haberman marked 3 inline comments as done.
  • Refined the implicit constructor skipping code.
clang/include/clang/Basic/DiagnosticSemaKinds.td
2828–2829

I implemented this but I couldn't figure out how to actually trigger the ARC case, so I just removed that part of the diagnostic text for now.

clang/lib/AST/AttrImpl.cpp
221–226 ↗(On Diff #336203)

As discussed offline, it appears that IgnoreImplicitAsWritten() was not skipping the implicit constructor in this case. Per our discussion, I created a new version of IgnoreImplicitAsWritten() that does, with a FIXME to land it in Expr, and I made it skip implicit constructors only (and added tests for this case).

clang/lib/CodeGen/CGStmt.cpp
668

Done.

I had to change your test case to remove the destructor, otherwise it fails the trivial destruction check.

Take a look at the CodeGen tests and see if the output looks correct to you.

haberman updated this revision to Diff 336316.Apr 8 2021, 10:32 PM
  • Rename and refine IgnoreElidableImplicitConstructorSingleStep().

Mostly just nits from me, but the attribute portions look good to me.

clang/include/clang/AST/IgnoreExpr.h
127
clang/lib/Sema/SemaStmt.cpp
628
636–637

This worries me slightly -- not all CallExpr objects have a callee declaration (https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/Expr.cpp#L1367). That said, I'm struggling to come up with an example that isn't covered so this may be fine.

641
655
659
682
700

It'd be better not to go through the cast machinery twice -- you cast to the MemberPointerType and then cast to the same thing again (but in a different way).

clang/lib/Sema/SemaStmtAttr.cpp
214

This can be removed entirely.

haberman updated this revision to Diff 336511.Apr 9 2021, 10:27 AM
haberman marked 9 inline comments as done.
  • Simplified some casts and type declarations.
clang/lib/Sema/SemaStmt.cpp
636–637

That was my experience too, I wasn't able to find a case that isn't covered. I tried to avoid adding any diagnostics that I didn't know how to trigger or test.

700

I changed to auto, but I can't tell if you have another suggestion here also. I can't see how any of these casts can be removed.

aaron.ballman added inline comments.Apr 9 2021, 12:12 PM
clang/lib/Sema/SemaStmt.cpp
697–699

I'm not certain if I should take a shower after writing that code or not, but it's one potential way not to perform the cast twice.

If that code is too odious for others, we should at least change the dyn_cast<> in the else if to be an isa<>.

haberman updated this revision to Diff 336894.Apr 12 2021, 10:41 AM
  • Switch to isa<> for type check.
  • Merge branch 'main' into musttail
haberman marked an inline comment as done.Apr 12 2021, 10:42 AM
haberman added inline comments.
clang/lib/Sema/SemaStmt.cpp
697–699

I changed dyn_cast<> to isa<>. If @rsmith concurs about the dyn_cast_or_null<> variant I'll switch to that.

rsmith added inline comments.Apr 12 2021, 4:17 PM
clang/lib/Sema/SemaStmt.cpp
603–604

I think this would be clearer, assuming it's equivalent (and if it's not equivalent, I think it'd be useful to include a comment explaining why).

605–609

This loop is problematic: it's generally not safe to modify an expression that is used as a subexpression of another expression. (Modifying the ReturnStmt is, by contrast, much less problematic because the properties of a statement have less complex dependencies on the properties of its subexpressions.) In particular, if there were any implicit conversions here that changed the type or value category or similar, the enclosing parentheses would have the wrong type / value category / similar. Also there are possibilities here other than CallExpr and ParenExpr, such as anything else that we consider to be "parentheses" (such as a GenericSelectionExpr).

But I think this loop should never be necessary, because all implicit conversions should always be on the outside of the parentheses. Do you have a testcase that needs it?

618

... would be more in line with our normal idioms.

636–637

This assert is incorrect. It would fail for a case like:

using T = int();
T *f();
int g() { [[clang::musttail]] return f()(); }

... where there is no declaration associated with the function pointer returned by f().

I think instead of looking for a callee declaration, you should instead inspect the callee expression. You can distinguish between a member function call and a non-member call by looking at the type of the callee. Perhaps the simplest way would be to distinguish between three cases:

(1) There is a callee declaration, which is a member function: this is a direct call to a member function; you can use the type of the callee declaration for your check.
(2) The callee expression is (after skipping parens) a pointer-to-member access operator (BinaryOperator::isPtrMemOp); you can use the type of the RHS operand (which will be a pointer to member function) for your check.
(3) Anything else: this is a non-member-function call, and you can directly inspect the type of the callee without caring about the callee declaration. (You might still find the type is not a function type at this stage, which indicates this is some kind of special form. In particular, it could be a BuiltinType::BoundMember for a pseudo-destructor call. I'm not sure if there are currently any other special cases that make it this far; there might not be, because most such cases are dependent.)

687

Use getAs rather than dyn_cast to look through type sugar. For example, in

void (f)() { [[clang::musttail]] return f(); }

... the type of f is a ParenType, not a FunctionProtoType.

697

You need to use getAs<MemberPointerType> here not isa in order to look through type sugar (eg, typedefs).

However, as noted above, a call via a member pointer doesn't necessarily have a CalleeDecl, so you'll need to do this check by looking for a callee expression that's the right kind of BinaryOperator instead.

clang/test/CodeGen/attr-musttail.cpp
1 ↗(On Diff #336894)

This is a C++ test so it should be in CodeGenCXX.

178–181 ↗(On Diff #336894)

It turns out that we consider p to be the callee decl in this case, so we'll need a better example :)

194 ↗(On Diff #336894)

This doesn't include enough of the output to be able to tell if we've generated correct code. Can you also include the define ... line, showing that %agg.result is the name of the first parameter?

clang/test/Sema/attr-musttail.cpp
1 ↗(On Diff #336894)

This should be in SemaCXX.

66 ↗(On Diff #336894)

Please add a FIXME to this; it seems like a bug that we can't tell the difference between needing to run a destructor for the return value and needing to run a destructor for some other temporary created in the return statement.

78 ↗(On Diff #336894)

The "is a member of different class (expected void" seems surprising here. Can we customize the diagnostic to instead say that we can't musttail from a non-member to a member (and vice versa for the other case)?

167–171 ↗(On Diff #336894)

Please also test the pseudo-destructor case:

void f() {
  int n;
  using T = int;
  [[clang::musttail]] return n.~T();
}
haberman updated this revision to Diff 337252.Apr 13 2021, 1:46 PM
haberman marked 14 inline comments as done.
  • Addressed more review comments.
clang/lib/Sema/SemaStmt.cpp
605–609

I removed it and my test cases still pass. I'm glad to know this isn't necessary: I was coding defensively because I didn't know that I could count on this invariant:

all implicit conversions should always be on the outside of the parentheses.

Functionally this looks good to me. I've suggested some minor cleanups and I understand you're doing some wordsmithing on the diagnostics; I think once those are complete this will be ready to land. Thank you!

clang/lib/CodeGen/CGExpr.cpp
4829

I agree, that sounds like a nice cleanup. Delaying this to a future change makes sense to me.

clang/lib/Sema/SemaStmt.cpp
616

You shouldn't need the const in the argument to cast, and we generally omit it; cast copies the pointer/referenceness and qualifiers from its argument anyway, and the explicit const in the type of R seems sufficient for readers. (I'm not even sure if cast intends to permit explicit qualfiiers here.)

664

I think this isa<CapturedDecl> check is redundant, because a CapturedDecl is not a FunctionDecl, so CallerDecl will always be null when CurContext is a CapturedDecl.

711

Even in invalid code we should never see a CallExpr whose callee has a null type; if Sema can't form an Expr that meets the normal expression invariants during error recovery, it doesn't build one at all. I think you can remove this if.

771

Given that we don't care about differences in qualifiers, it might be clearer to not include them in the diagnostics.

clang/test/CodeGenCXX/attr-musttail.cpp
213

Nice new feature! Please also update Release Notes for clang.

haberman added inline comments.Apr 13 2021, 4:20 PM
clang/lib/Sema/SemaStmt.cpp
711

Without this if(), I crash on this test case. What do you think?

struct TestBadPMF {
  int (TestBadPMF::*pmf)();
  void BadPMF() {
    [[clang::musttail]] return ((*this)->*pmf)(); // expected-error {{left hand operand to ->* must be a pointer to class compatible with the right hand operand, but is 'TestBadPMF'}}
  }
};

Dump of CalleeExpr is:

RecoveryExpr 0x106671e8 '<dependent type>' contains-errors lvalue
|-ParenExpr 0x10667020 'struct TestBadPMF' lvalue
| `-UnaryOperator 0x10667008 'struct TestBadPMF' lvalue prefix '*' cannot overflow
|   `-CXXThisExpr 0x10666ff8 'struct TestBadPMF *' this
`-MemberExpr 0x10667050 'int (struct TestBadPMF::*)(void)' lvalue ->pmf 0x10666ed0
  `-CXXThisExpr 0x10667040 'struct TestBadPMF *' implicit this
rsmith added inline comments.Apr 13 2021, 4:56 PM
clang/lib/Sema/SemaStmt.cpp
711

Ah, right, while the callee will always have a non-null type, that type might not be a pointer type.

I think what we're missing here is a check for a dependent callee; checking for a dependent context isn't enough to check for error-dependent constructs. Probably the simplest thing would be to change the isDependentContext() checks to also check if the return expression isInstantiationDependent(). (That would only help with the error-dependent cases for now, but we'd also need that extra check in the future if anything like http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2277r0.html goes forward, allowing dependent constructs in non-dependent contexts, especially in combination with http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1306r1.pdf.)

haberman updated this revision to Diff 337576.Apr 14 2021, 3:59 PM
haberman marked 6 inline comments as done.
  • Word-smithed diagnostics and addressed other review comments.
haberman updated this revision to Diff 337581.Apr 14 2021, 4:59 PM
  • More diagnostic wordsmithing.
haberman updated this revision to Diff 337589.Apr 14 2021, 5:33 PM
  • Added release note for [[clang::musttail]].
haberman updated this revision to Diff 337590.Apr 14 2021, 5:35 PM
  • Fixed release note escaping.
haberman updated this revision to Diff 337592.Apr 14 2021, 5:57 PM
  • Fixed several cases in CodeGen test.

Thanks, cool :)

haberman updated this revision to Diff 337597.Apr 14 2021, 6:15 PM
  • Fixed typo in comment.

Ok I think this is ready to land.

There are a few FIXME comments, I will follow up with some small changes to address them.

Harbormaster completed remote builds in B98788: Diff 337590.
rsmith accepted this revision.Apr 15 2021, 4:47 PM
This revision is now accepted and ready to land.Apr 15 2021, 4:47 PM
This revision was landed with ongoing or failed builds.Apr 15 2021, 5:13 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Apr 15 2021, 6:05 PM

Looks like this breaks tests on mac/arm: http://45.33.8.238/macm1/7552/step_7.txt

Please take a look and revert for now if it takes a while to fix.

That is a great feature, thank you. Compiling state machines and scheme programs to C is now much prettier.

The error message here is very confusing:

/home/theraven/snmalloc2/src/mem/../ds/../aal/../ds/defines.h:122:27: error: cannot perform a tail call to function 'error' because its signature is incompatible with the calling function
      [[clang::musttail]] return snmalloc::error(str);
                          ^
/home/theraven/snmalloc2/src/mem/../ds/../aal/../ds/defines.h:63:16: note: target function has different number of parameters (expected 2 but has 1)
  [[noreturn]] SNMALLOC_COLD void error(const char* const str);
               ^
/home/theraven/snmalloc2/src/mem/../ds/../aal/../ds/defines.h:21:25: note: expanded from macro 'SNMALLOC_COLD'
#  define SNMALLOC_COLD __attribute__((cold))
                        ^
/home/theraven/snmalloc2/src/mem/../ds/../aal/../ds/defines.h:122:9: note: tail call required by 'musttail' attribute here
      [[clang::musttail]] return snmalloc::error(str);
        ^

The caller and callee both have one argument, the error is because the enclosing function has two parameters. The error appears wrong anyway for two reasons in this particular context:

  • The callee is [[noreturn]], so the stack layout doesn't make any difference, anything can be tail called if it's no-return.
  • The enclosing function is always_inline, so checking its argument-frame layout does not give useful information because it's the caller's argument-frame layout that matters.

@theraven: Can you post a minimal repro of your case? I don't follow your distinction between "caller" and "enclosing function."

Regarding noreturn and always_inline: maybe the rules for musttail could be relaxed in cases like the one you mention, but it would require changing the backend (LLVM). Here I changed the front-end only and used LLVM's existing musttail support, which meant accepting its existing limitations.

I would love to see an exception for always_inline: my use case would benefit greatly from this. In my own project I had to change a bunch of always_inline functions to macros to work around this rule. Unfortunately this is complicated by the fact that always_inline does not actually guarantee that inlining occurs.

Here's a minimal test:

void tail(int, float);

__attribute__((always_inline))
void caller(float x)
{
  [[clang::musttail]]
  return tail(42, x);
}

void outer(int x, float y)
{
        return caller(y);
}

This raises this error:

tail.cc:7:3: error: cannot perform a tail call to function 'tail' because its signature is incompatible with the calling function
  return tail(42, x);
  ^
tail.cc:1:1: note: target function has different number of parameters (expected 1 but has 2)
void tail(int, float);
^
tail.cc:6:5: note: tail call required by 'musttail' attribute here
  [[clang::musttail]]
    ^

There's also an interesting counterexample:

void tail(int, float);

__attribute__((always_inline))
void caller(int a, float x)
{
  [[clang::musttail]]
  return tail(a, x);
}

void outer(float y)
{
        return caller(42, y);
}

This *is* accepted by clang, but then generates this IR at -O0:

define dso_local void @_Z5outerf(float %0) #2 {
  %2 = alloca i32, align 4
  %3 = alloca float, align 4
  %4 = alloca float, align 4
  store float %0, float* %4, align 4
  %5 = load float, float* %4, align 4
  store i32 42, i32* %2, align 4
  store float %5, float* %3, align 4
  %6 = load i32, i32* %2, align 4
  %7 = load float, float* %3, align 4
  call void @_Z4tailif(i32 %6, float %7)
  ret void
}

And this IR at -O1:

; Function Attrs: uwtable mustprogress
define dso_local void @_Z5outerf(float %0) local_unnamed_addr #2 {
  call void @_Z4tailif(i32 42, float %0)
  ret void
}

Note that in both cases, the alway-inline attribute is respected (even at -O0, the always-inline inliner runs) but the musttail annotation is lost. The inlining has inserted the call into a function with a different set of parameters and so it cannot have a musttail IR annotation.

It's not generically true that "anything can be tail-called if it's noreturn". For one, noreturn doesn't imply that the function doesn't exit by e.g. throwing or calling longjmp. For another, the most important user expectation of tail calls is that a long series of tail calls will exhibit zero overall stack growth; in a caller-pop calling convention, calling a function with more parameters may require growing the argument area in a way that cannot be reversed, so e.g. a long sequence of tail calls alternating between 1-argument and 2-argument functions will eventually exhaust the stack, which violates that user expectation.

chfast added a subscriber: chfast.Jan 9 2022, 4:59 AM
chfast added inline comments.
clang/lib/CodeGen/CGCall.cpp
5319

I reported a related issue. I wander if this is easy to fix. https://github.com/llvm/llvm-project/issues/53087.