This is an archive of the discontinued LLVM Phabricator instance.

Don't emit unwanted constructor calls in co_return statements
Needs ReviewPublic

Authored by aaronpuchert on Oct 10 2019, 5:53 PM.

Details

Summary

The implementation of return value optimization in D51741 turns

co_return value;

which is essentially

__promise.return_value(value);

into

__promise.return_value(decltype<value>(std::move(value)));

instead of

__promise.return_value(std::move(value));

Instead of doing a copy/move initialization, which is only required for
regular return statements, we just cast to an xvalue reference when we
can consume an object.

Also simplifies the test: some flags aren't needed, neither is main.
Instead we now check that no move constructor is emitted in certain
cases. (That is, we actually delete the move constructor.)

Diff Detail

Event Timeline

aaronpuchert created this revision.Oct 10 2019, 5:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 10 2019, 5:53 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Also remove FIXME comment.

Harbormaster completed remote builds in B39381: Diff 224513.
aaronpuchert added inline comments.Oct 10 2019, 6:02 PM
clang/lib/Sema/SemaCoroutine.cpp
1045–1046

Should be renamed to RVOCandidate, I don't think NRVO can happen here.

clang/lib/Sema/SemaCoroutine.cpp
1045–1046

(Btw, I have no comment on the actual code change in this patch. I'm here in my capacity as RVO-explainer-and-P1155-author, not code-understander. ;))

What's happening here is never technically "RVO" at all, because there is no "RV". However, the "N" is accurate. (See my acronym glossary for details.)
The important thing here is that when you say co_return x; the x is named, and it would be a candidate for NRVO if we were in a situation where NRVO was possible at all.

The actual optimization that is happening here is "implicit move."

I would strongly prefer to keep the name NRVOCandidate here, because that is the name that is used for the exact same purpose — computing "implicit move" candidates — in BuildReturnStmt. Ideally this code would be factored out so that it appeared in only one place; but until then, gratuitous differences between the two sites should be minimized IMO.

clang/test/SemaCXX/coroutine-rvo.cpp
91

This should work equally well with NoCopyNoMove, right? It should just call task<NCNM>::return_value(NCNM&&). I don't think you need MoveOnly in this test file anymore.

146–148

Ditto here, could you use NoCopyNoMove instead of Default?

158

And ditto here: NoCopyNoMove instead of Default?

Oh, and can you please make sure there are test cases for all the various cases covered in P1155? Specifically, I would expect all three of the following test cases to compile successfully. It looks like they compile successfully in trunk right now (Godbolt), so we're just testing that they don't get broken in the future.

struct Widget { Widget(); Widget(const Widget&) = delete; Widget(Widget&&); };
struct To { operator Widget() &&; };
task<Widget> nine() { To t; co_return t; }

struct Fowl { Fowl(Widget); };
task<Fowl> eleven() { Widget w; co_return w; }

struct Base { Base(); Base(const Base&) = delete; Base(Base&&); };
struct Derived : Base {};
task<Base> thirteen() { Derived result; co_return result; }
aaronpuchert planned changes to this revision.Oct 11 2019, 3:57 AM

Both of these should first do overload resolution for one parameter of type MoveOnly&&, and then, only if that overload resolution fails, should they fall back to overload resolution for one parameter of type MoveOnly&.

Still need to address that. With this change we're only doing the first step, we still need the fallback.

clang/lib/Sema/SemaCoroutine.cpp
1045–1046

Hmm, you're completely right. What do you think about ImplicitMoveCandidate? Otherwise I'll stick with the current name, as you suggested.

Ideally this code would be factored out so that it appeared in only one place; but until then, gratuitous differences between the two sites should be minimized IMO.

Isn't it already factored out? I let getCopyElisionCandidate do all the heavy lifting. (Except filtering out lvalue references...)

clang/test/SemaCXX/coroutine-rvo.cpp
91

I thought so, too, but there is some code that probably constructs the coroutine and that needs a move constructor. If you look at the AST:

FunctionDecl 0xee2e08 <line:70:1, line:72:1> line:70:16 param2val 'task<MoveOnly> (MoveOnly)'
|-ParmVarDecl 0xee2cf0 <col:26, col:35> col:35 used value 'MoveOnly'
`-CoroutineBodyStmt 0xee7df8 <col:42, line:72:1>
  |-CompoundStmt 0xee71b8 <line:70:42, line:72:1>
  | `-CoreturnStmt 0xee7190 <line:71:3, col:13>
  |   |-ImplicitCastExpr 0xee7100 <col:13> 'MoveOnly' xvalue <NoOp>
  |   | `-DeclRefExpr 0xee3088 <col:13> 'MoveOnly' lvalue ParmVar 0xee2cf0 'value' 'MoveOnly'
  |   `-CXXMemberCallExpr 0xee7168 <col:3> 'void'
  |     |-MemberExpr 0xee7138 <col:3> '<bound member function type>' .return_value 0xee5408
  |     | `-DeclRefExpr 0xee7118 <col:3> 'std::experimental::traits_sfinae_base<task<MoveOnly>, void>::promise_type':'task<MoveOnly>::promise_type' lvalue Var 0xee54e8 '__promise' 'std::experimental::traits_sfinae_base<task<MoveOnly>, void>::promise_type':'task<MoveOnly>::promise_type'
  |     `-ImplicitCastExpr 0xee7100 <col:13> 'MoveOnly' xvalue <NoOp>
  |       `-DeclRefExpr 0xee3088 <col:13> 'MoveOnly' lvalue ParmVar 0xee2cf0 'value' 'MoveOnly'

So no move constructor here. But then comes a bunch of other stuff, and finally,

`-CoroutineBodyStmt 0xee7df8 <col:42, line:72:1>
  [...]
  `-DeclStmt 0xee31f0 <line:71:3>
    `-VarDecl 0xee3118 <col:3> col:3 implicit used value 'MoveOnly' listinit
      `-CXXConstructExpr 0xee31c0 <col:3> 'MoveOnly' 'void (MoveOnly &&) noexcept'
        `-CXXStaticCastExpr 0xee30d8 <col:3> 'MoveOnly' xvalue static_cast<struct MoveOnly &&> <NoOp>
          `-DeclRefExpr 0xee30a8 <col:3> 'MoveOnly' lvalue ParmVar 0xee2cf0 'value' 'MoveOnly'
146–148

I used Default to show that there is an error even if both copy and move constructor are available, because return_value takes a Default&& then, but we pass a Default& (which is not casted to xvalue).

158

Also here: because there is an error, testing with Default is "stronger".

One more test to add:

struct Widget {
    task<Widget> foo() && {
        co_return *this;  // IIUC this should call return_value(Widget&), not return_value(Widget&&)
    }
};
clang/lib/Sema/SemaCoroutine.cpp
1045–1046

What do you think about ImplicitMoveCandidate?

I think that would be more correct in this case, but it wouldn't be parallel with the one in BuildReturnStmt, and I'm not sure whether it would be correct to change it over there too.

Isn't it already factored out?

I see some parallelism in the two other places that call getCopyElisionCandidate followed by PerformMoveOrCopyInitialization. Maybe this is as factored-out as it can get, but it still looks remarkably parallel. (And I wish NRVOVariable was named NRVOCandidate in the latter case.)

In BuildReturnStmt:

if (RetValExp)
  NRVOCandidate = getCopyElisionCandidate(FnRetType, RetValExp, CES_Strict);
if (!HasDependentReturnType && !RetValExp->isTypeDependent()) {
  // we have a non-void function with an expression, continue checking
  InitializedEntity Entity = InitializedEntity::InitializeResult(ReturnLoc,
                                                                 RetType,
                                                  NRVOCandidate != nullptr);
  ExprResult Res = PerformMoveOrCopyInitialization(Entity, NRVOCandidate,
                                                   RetType, RetValExp);

In BuildCXXThrow:

const VarDecl *NRVOVariable = nullptr;
if (IsThrownVarInScope)
  NRVOVariable = getCopyElisionCandidate(QualType(), Ex, CES_Strict);

InitializedEntity Entity = InitializedEntity::InitializeException(
    OpLoc, ExceptionObjectTy,
    /*NRVO=*/NRVOVariable != nullptr);
ExprResult Res = PerformMoveOrCopyInitialization(
    Entity, NRVOVariable, QualType(), Ex, IsThrownVarInScope);

Naming-wise, I also offer that David Stone's P1825 introduces the name "implicitly movable entity" for these things, and maybe we should call this variable ImplicitlyMovableEntity; however, I am not yet sure.

clang/test/SemaCXX/coroutine-rvo.cpp
91

Oh, I see! That's initially surprising, but I see why it has to be that way. The parameter comes in to this function on the stack, but the parameter variable has to be part of the coroutine frame, which is heap-allocated. So the first thing this function does is _move_ the parameter from the stack into the heap-allocated frame. Local variables can be constructed right there in the frame, but parameters need to be moved.

I wonder if this quirk is reflected in the CD wording for Coroutines. By my reading, the current CD says "no" — it expects the parameters to get _copied_ (not moved) from the stack into the coroutine frame. Attn @GorNishanov, @lewissbaker. http://eel.is/c++draft/dcl.fct.def.coroutine#5.6

146–148

Okay, that is a reasonable explanation. Me personally, I think overload resolution is so complicated that you have not necessarily made the test "stronger," just "different" — the fact that it passes for Default does not at all reassure me that it would necessarily pass for NCNM as well. But as YMMV, I won't press further.

I'll add your test case, but I'll probably reuse the existing data structures.

Oh, and can you please make sure there are test cases for all the various cases covered in P1155? Specifically, I would expect all three of the following test cases to compile successfully. It looks like they compile successfully in trunk right now (Godbolt), so we're just testing that they don't get broken in the future.

struct Widget { Widget(); Widget(const Widget&) = delete; Widget(Widget&&); };
struct To { operator Widget() &&; };
task<Widget> nine() { To t; co_return t; }

struct Fowl { Fowl(Widget); };
task<Fowl> eleven() { Widget w; co_return w; }

struct Base { Base(); Base(const Base&) = delete; Base(Base&&); };
struct Derived : Base {};
task<Base> thirteen() { Derived result; co_return result; }

These seem to work automatically, because in the end we're just building a function call, which does the right implicit conversions if needed.

One more test to add:

struct Widget {
    task<Widget> foo() && {
        co_return *this;  // IIUC this should call return_value(Widget&), not return_value(Widget&&)
    }
};

We're currently not considering *this as implicitly movable, because it's not a DeclRefExpr.

clang/lib/Sema/SemaCoroutine.cpp
1045–1046

I see some parallelism in the two other places that call getCopyElisionCandidate followed by PerformMoveOrCopyInitialization.

Note that I'm removing the latter call with this change, and also the call to InitializedEntity::InitializeResult: we don't want to initialize anything. Return statements have to actually initialize a return value, but co_return statements only call <promise>.return_value. So I think finding the candidate is about as much as we can factor out. (Although it would be nice if it could also catch the case of lvalue references.)

clang/test/SemaCXX/coroutine-rvo.cpp
91

You are right, a comment says that these are "statements that move coroutine function parameters to the coroutine frame, and store them on the function scope info."

I agree with your reading of the draft, it clearly talks about "lvalues". I would guess this is an oversight in the draft though, the moving seems pretty intentional: the statement index is CoroutineBodyStmt::SubStmt::FirstParamMove, and FunctionScopeInfo has a member CoroutineParameterMoves.

146–148

Since this is just reference binding it's actually completely irrelevant which constructors are available... so I might as well use NoCopyNoMove. Reference binding cannot ever call any constructor, unless I'm mistaken.

@GorNishanov Do I read the standard correctly that there are no constraints on what return_value is? That is, could it also be a function pointer or function object?

struct promise_function_pointer {
  // ...
  void (*return_value)(T&& value);
};
struct promise_function_object {
  // ...
  struct { void operator()(T&& value) {}; } return_value;
};

In both cases, the expression <p>.return_value(<expr-or-braced-init-list>) would be well-formed.

Add tests suggested by @Quuxplusone and add fallback to call by lvalue reference.

The latter turned out more complicated than I thought, because there seems to be no easy way to just try building a call expression without emitting errors if anything goes wrong. So essentially I look up the name myself and check if someone takes an rvalue reference.

aaronpuchert marked 15 inline comments as done.Oct 15 2019, 7:15 PM

Given the complexities of this implementation, I'm beginning to doubt whether implicit moves make sense for co_return at all. Since there can never be any kind of RVO, why not always require an explicit std::move? Implicit moves on return were introduced because an explicit move would inhibit NRVO, and without move there wouldn't be a graceful fallback for implementations that don't have NRVO.

Apply clang-format.

aaronpuchert added inline comments.Oct 29 2019, 3:30 PM
clang/lib/Sema/SemaCoroutine.cpp
999

Overlad resolution can actually still fail if there is a viable candidate, namely when there are multiple candidates and none is better than all others. It's a bit weird though to fall back to lvalue parameter then as if nothing happened.

1006–1012

I should add a test case where this is necessary.

aaronpuchert marked an inline comment as done.

Add test case where check for non-deduced parameter conversions is necessary.

clang/lib/Sema/SemaCoroutine.cpp
999

That is an interesting point! I had not considered it during P1155. I imagine that it might make implementation of P1155's new logic more difficult.

GCC 8 (but not trunk) implements the behavior I would expect to see for derived-to-base conversions: https://godbolt.org/z/fj_lNw

C++ always treats "an embarrassment of riches" equivalently to a "famine"; overload resolution can fail due to ambiguity just as easily as it can fail due to no candidates at all. I agree it's "a bit weird," but it would be vastly weirder if C++ did anything different from its usual behavior in this case.

I'm still amenable to the idea that co_return should simply not do the copy-elision or implicit-move optimizations at all. I wish I knew some use-cases for co_return, so that we could see if the optimization is useful to anyone.

clang/test/SemaCXX/coroutine-rvo.cpp
65

@aaronpuchert wrote:

Given the complexities of this implementation, I'm beginning to doubt whether implicit moves make sense for co_return at all. Since there can never be any kind of RVO, why not always require an explicit std::move? Implicit moves on return were introduced because an explicit move would inhibit NRVO, and without move there wouldn't be a graceful fallback for implementations that don't have NRVO.

I still think that it makes sense to do implicit-move on any control-flow construct that "exits" the current function. Right now that's return and throw, and they both do implicit-move (albeit inconsistently with plenty of implementation divergence documented in P1155). C++2a adds co_return as another way to "exit." I think it would be un-graceful and inconsistent to permit return p but require co_return std::move(p).

Now, I do think that C++2a Coroutines are needlessly complicated, which makes the implementation needlessly complicated. You've noticed that to codegen t.return_value(x) requires not just overload resolution, but actually figuring out whether task::return_value is a function at all — it might be a static data member of function-pointer type, or something even wilder. But eliminating implicit-move won't make that complexity go away, will it? Doing implicit-move just means doing the required overload resolution twice instead of once. That should be easy, compared to the rest of the mess.

That said, I would be happy to start a thread among you, me, David Stone, and the EWG mailing list to consider removing the words "or co_return" from class.copy.elision/3. Say the word.

[EDIT: aw shoot, I never submitted this comment last time]

Not answering inline because comments are getting longer.

I still think that it makes sense to do implicit-move on any control-flow construct that "exits" the current function. Right now that's return and throw, and they both do implicit-move (albeit inconsistently with plenty of implementation divergence documented in P1155). C++2a adds co_return as another way to "exit." I think it would be un-graceful and inconsistent to permit return p but require co_return std::move(p).

Even with return it's not obvious where to draw the line: basically we still require that nothing non-trivial happens between the ReturnStmt and the DeclRefExpr, so for example return f(x) won't be turned into return f(std::move(x)), and return a + b won't be turned into return std::move(a) + std::move(b). It's clear why this can't always happen: if an lvalue is referenced more than once, we might move it more than once. But one could argue that we should still move all local variables that only occur once. I work in a code base with a lot of code like

SomeBuilder b;
b.doStuff();
return std::move(b).getResult();

calling a getResult() && method at the end. It would be nice if I could omit the move. But I see why that's hard.

So basically we have the cases where implicit move is more or less necessary (because explicit moving would prevent NRVO), and we have the cases where it would be possible. And the difficulty is to find the right cutoff. If I understand this correctly, the current rules basically say that if we directly return a DeclRefExpr, possibly with implicit conversions, that will be implicitly moved, if possible. (Although it's a bit weird that we silently fall back in the case of ambiguity.)

With co_return the implicit move is never “necessary”, because NRVO can't happen, so we could just never do it. You're right that could be considered an inconsistency, but it wouldn't be hard to remember: "for co_return, always move."

You've noticed that to codegen t.return_value(x) requires not just overload resolution, but actually figuring out whether task::return_value is a function at all —

Function templates definitely make sense to me, I also added test cases for that.

it might be a static data member of function-pointer type, or something even wilder.

It could also be an object of class type with an operator(). I'm actually not sure if the standard intentionally allows all these variants.

(Unrelated, but in that blog post you write “for backward compatibility, C++17 had to make noexcept function pointers implicitly convertible to non-noexcept function pointers.” I don't see this as a matter of backward compatibility: just like you can implicitly cast a reference/pointer to const, you should be able to cast noexcept away from pointers/references.)

But eliminating implicit-move won't make that complexity go away, will it? Doing implicit-move just means doing the required overload resolution twice instead of once. That should be easy, compared to the rest of the mess.

Initially I thought so, and I wrote some very elegant (yet unfortunately wrong) code. I called buildPromiseCall, which returns an ExprResult, and if that was invalid I tried again without the xvalue conversion. Unfortunately, if that overload resolution fails, I get an error message even if I discard the resulting expression, and compilation fails. (I would also get warnings, if there are any.) So I looked into the call stack if I could disentangle the overload resolution from the diagnostics, but that wasn't so easy. That's why this is so much code.

At first I thought this was weird and there should be functionality for this in Sema, but I think there isn't. I believe that's because there is basically no precedent to “do overload resolution, and if that fails, silently do something else instead”. Normally, if overload resolution fails, that's just an immediate error, no need to wait for anything else. So basically I need to simulate Sema::BuildCallExpr, but without throwing errors. Fortunately many code paths aren't needed, so it's not terribly complicated, but I'm still missing at least two cases (function pointers and function objects).

For return this is much simpler, because we just do an initialization, and there are ways to “try” initialization without immediately throwing an error. For co_return I need to try (parameter) initialization and try to resolve the callee.

That said, I would be happy to start a thread among you, me, David Stone, and the EWG mailing list to consider removing the words "or co_return" from class.copy.elision/3. Say the word.

I'm somewhat undecided. Things would perhaps also be simpler if we require somewhere that return_value is a member function or member function template, thus excluding the case of a function pointer or function object. I don't really see a use case for that, because the promise_type is already an object itself, so why create another object within that? Right now the standard reads as if we're doing textual replacement and I think that's what opens the door to all kind of weird stuff.

clang/lib/Sema/SemaCoroutine.cpp
999

I agree it's "a bit weird," but it would be vastly weirder if C++ did anything different from its usual behavior in this case.

I would find it more natural to throw an error, i.e. not do the fallback, in the case of ambiguity. So the fallback should in my opinion only happen when there are no viable overload candidates, not when there are too many.

I see your construction as follows: we add both operations that take an lvalue and those that take an rvalue to a bigger “overload set”, and order those that take rvalues as higher/better than those that don't. One could say that we do overload resolution on a T&& argument where we allow binding a T&& to a T&, but this is less preferable in the overload ordering.

aaronpuchert added inline comments.Oct 30 2019, 1:44 PM
clang/lib/Sema/SemaCoroutine.cpp
1064

Another interesting question is whether this E should include the xvalue cast, if we did it. I have no idea what would be expected here. I'm tending towards replacing E by Arg.

Rebase and add a comment about the limitations of this implementation.

aaronpuchert added inline comments.Dec 14 2020, 10:51 AM
clang/lib/Sema/SemaCoroutine.cpp
999

In light of your comment

This example will be mentioned in my upcoming (not-yet-finished) WG21 paper P2266, as an example of why the two-pass mechanism sucks and should be removed from C++2b.

could we re-evaluate this point? I'm not sure what your paper is going to say, but I think I agree that the two-pass mechanism is weird.