Page MenuHomePhabricator

[clang] NRVO: Improvements and handling of more cases.
Needs ReviewPublic

Authored by mizvekov on Mar 31 2021, 5:42 PM.

Details

Summary

This expands NRVO propagation for more cases:

Parse analysis improvement:

  • Lambdas and Blocks with dependent return type can have their variables marked as NRVO Candidates.

Variable instantiation improvements:

  • Fixes crash when instantiating NRVO variables in Blocks.
  • Functions, Lambdas, and Blocks which have auto return type have their variables' NRVO status propagated. For Blocks with non-auto return type, as a limitation, this propagation does not consider the actual return type.

This also implements exclusion of VarDecls which are references to
dependent types.

Signed-off-by: Matheus Izvekov <mizvekov@gmail.com>

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
nick added a subscriber: nick.Apr 7 2021, 11:28 AM

Didn't really check for correctness yet, just a superficial review.

I like the idea, splitting the functionality up definitely helps understanding this.

clang/include/clang/Sema/Sema.h
4738

Nitpick: I'd go with != None here.

clang/lib/Sema/SemaCoroutine.cpp
1585–1587

Perhaps this should just be direct initialization? Can't really find anything in the standard though.

clang/lib/Sema/SemaExprCXX.cpp
854

Any reason why you've moved that away from the comment that wants to explain it?

clang/lib/Sema/SemaStmt.cpp
3040–3042

The way you've designed the enumeration, couldn't you use std::min here?

3046–3047

The second backslash seems superfluous.

3096–3097

These comments seem to be separated from their context now...

3097

Braces are usually omitted if both branches are single statements.

3110–3111

Just remove "Return false if VD is a __block variable." No need to repeat the code, what follows is important.

3144

Can we maybe move this function up one place to make the diff a bit smaller? It appears to contain the first part of the original getCopyElisionCandidate.

3177

NRVOResult seems to be small, why not make this a proper function and let it return the result?

3193

parenthesized

3194–3196

Can't we just do this when we know what it deduces to? It sounds weird to handle dependent contexts here because we shouldn't attempt any return value initialization then.

clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
1062

llvm_unreachable

mizvekov added inline comments.Apr 9 2021, 7:41 PM
clang/lib/Sema/SemaCoroutine.cpp
1585–1587

So I just checked this again. Regarding our conversation on IRC, what I said initially was correct: ReturnValue is always a member function expression, built by makeReturnObject -> buildPromiseCall -> buildMemberCall. So implicit move would never trigger here, and as far as I see there is no reason for this code to have used PerformMoveOrCopyInitialization in the first place.

clang/lib/Sema/SemaExprCXX.cpp
854

Yes, on C++2b this might modify Ex, and I moved it so this would happen before we do the check with the type of the expression just below here.

clang/lib/Sema/SemaStmt.cpp
3040–3042

Yes exactly!
The previous version of this patch I submitted used exactly that, but then for some reason I thought people could find std::min a bit odd here, and so I changed it.
But now I think I was right the first time around :)

3177

It does use the result parameter as input and output. This function can only "downgrade" a result it received previously.
I could make it receive a result and return the result, but then would probably just use it like this:

NRVORes = updNRVOResultWithRetType(NRVORes, ...);

Do you think that looks clearer?

mizvekov added inline comments.Apr 9 2021, 7:43 PM
clang/lib/Sema/SemaCoroutine.cpp
1585–1587

Err I meant: member function *call* expression

mizvekov added inline comments.Apr 9 2021, 7:47 PM
clang/lib/Sema/SemaExprCXX.cpp
854

But yeah I forgot to move the comment, good catch.

mizvekov added inline comments.Apr 10 2021, 9:31 AM
clang/lib/Sema/SemaCoroutine.cpp
1585–1587

Also, I don't think we could force direct initialization here, if the object returned by Gro is volatile for example.
With that said, copy elision from the sema action on the return statement of get_return_object should take care of removing this copy here, I think, I don't see any reason it would not work here just as well as for expressions coming from the parser.

clang/lib/Sema/SemaExprCXX.cpp
854

Actually, had my wires crossed there with the P2266 DR, I only need to move this line there.

clang/lib/Sema/SemaStmt.cpp
3194–3196

I think there are problems regarding the limitations of the current NRVO algorithm, but I had not had much time to study it or think of improvements, so I follow the current approach of trying to use as much available information as possible in order to eliminate candidates early.

With that said, this is not well covered by the test suite and I could probably get away with a lot less without breaking any existing tests.

mizvekov updated this revision to Diff 336626.Apr 10 2021, 11:59 AM
  • Addresses aaronpuchert's review points.
mizvekov marked 8 inline comments as done.Apr 10 2021, 12:03 PM
mizvekov added inline comments.
clang/lib/Sema/SemaStmt.cpp
3096–3097

This one comment here is talking about the line just below it.
Any other examples where it seems separated?

mizvekov marked an inline comment as done.Apr 10 2021, 12:04 PM
aaronpuchert added inline comments.Apr 10 2021, 3:07 PM
clang/lib/Sema/SemaStmt.cpp
3040–3042

Perhaps you can just add a comment to the enumeration saying that the values are totally ordered with regards to implication so that ⇒ is >=.

3096–3097

I meant the context of the comment itself. Originally this was quoting a clause

// - in a return statement in a function [where] ...
// ... the expression is the name of a non-volatile automatic object ...

and then additional parts later. Now you're having that initial part in a different function and it's not immediately clear what you're quoting here.

To be clear, this was about "(other than a function ... parameter)" below.

3177

Yes, that would seem more natural to me. Technically a caller could decide to not use the same variable in both places, for example it could pass a temporary or pass the result on directly.

mizvekov added inline comments.Apr 10 2021, 8:02 PM
clang/lib/Sema/SemaStmt.cpp
3177

But for this function, I don't see how it could make sense to actually use it like that.

You are either using it in a context where you don't have a ReturnType (throw, co_return), and you only care about the initial NRVOResult and don't call this function at all, or you are in a return statement, where you will call this function passing the initial NRVOResult, which will then have no further utility.

Quuxplusone added inline comments.Apr 10 2021, 8:17 PM
clang/lib/Sema/SemaStmt.cpp
3177

I haven't looked at the code, but FWIW, I'm always in favor of more "function-style" functions. Which would you rather deal with — void square(int& x) or int square(int x)? The same rationale applies uniformly everywhere. Even if we are "consuming" the argument in this particular instance, that might change after another few years of maintenance; and regardless, it might be easier for the maintainer to understand "function-style" code even if they aren't going to change it.

aaronpuchert added inline comments.Apr 11 2021, 1:46 PM
clang/lib/Sema/SemaCoroutine.cpp
1585–1587

In InitializationKind there is a distinction between IK_Copy (basically T x = init;) and IK_Direct (basically T x(init);). I don't know how this would be related to volatile, but it is related to explicit.

mizvekov updated this revision to Diff 336736.Apr 11 2021, 8:27 PM
mizvekov marked an inline comment as done.
  • Removes code that tried to allow copy elision for functions with dependent 'auto' return types. The reason is explained in new comments. Will try to address these in future work.
  • Addresses aaronpuchert's review points.
mizvekov marked an inline comment as done.Apr 11 2021, 8:30 PM
mizvekov added inline comments.
clang/lib/Sema/SemaStmt.cpp
3177

I changed the function name and made it return instead the candidate itself in case it was copy elisible, which allowed simplifying some other expressions afterwards.

But in answer to the points made in this thread: I will generally prefer the syntax which mostly matches the intended usage, making it easier to do the right thing and harder to do the wrong thing.

This function implements a fragment of the C++ standard text in a very specific context within clang. It is called from two places.
It's not a major piece of infrastructure or anything like that, to be concerned with unforeseen possible future uses :)

rsmith added inline comments.Mon, Apr 12, 6:39 PM
clang/include/clang/Sema/Sema.h
4731–4738

We generally use FooResult to mean "either a Foo or an indicator that an error has been diagnosed". Would the name NRVOInfo work instead?

Actually, I think the "O" here is misleading, because we're not talking about performing the optimization here, just about what variable was named in a return statement. And I think calling this "NRVO" is a bit misleading because it also covers implicit move, which isn't traditionally part of NRVO. So how about NamedReturnValueInfo or NamedReturnInfo or NamedReturnValue or ReturnValueName or something like that?

clang/lib/Sema/SemaStmt.cpp
3066–3075

I find this name a little unclear (what do we mean by "downgrade"?). Can we call this something a bit more specific, such as disallowCopyElision or disallowNRVO?

3146

This comment needs re-flowing.

3152
3158–3161

How does this happen? Are there any cases where we could do NRVO or should do an implicit move that are blocked by this?

It seems to me that we should (nearly always) be able to work out whether copy elision is possible here without knowing the deduced type:

  • if the return type is cv auto then it will always be deduced to the type of the returned variable, so we can always perform copy elision
  • if the return type is decltype(auto), then we can perform copy elision if the expression is unparenthesized and otherwise cannot; we could perhaps track whether the expression was parenthesized in NRVOResult, and can conservatively disallow copy elision if we don't know (eg, from template instantiation, where we're only looking at the variable and not the return statements)
  • if the return type is anything else involving auto, it can't possibly instantiate to a class type, so we'll never perform copy elision
3160–3161
3164

Please add braces rather than using a comma expression here and below.

3305

Should we skip this if NRVORes.isMoveEligible() && getLangOpts().CPlusPlus20? In that case, we already tried an unrestricted move a couple of lines ago, so we know that adding std::move won't help. (I think we can still hit the "adding std::move will help" case in C++20, but only in obscure corners such as a volatile-qualified local variable.)

3482

I find the pre-existing NRVOCandidate != nullptr more readable than this !!NRVOCandidate.

clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
1059

Can you use getCurBlock()->FunctionType for this? We won't have a Scope, but we should still have a ScopeInfo.

mizvekov added inline comments.Mon, Apr 12, 7:20 PM
clang/lib/Sema/SemaStmt.cpp
3158–3161

Yeah, what you suggested is what I tried on a previous patch in this DR, but then studying the NRVO tracker carefully I thought about this counter example:

template<bool B> static auto bar() {
  {                                 
    Foo foo;                        
    if constexpr(B)                 
      return foo;                  
  }                                 
  {                                 
    Bar bar;                        
    if constexpr(!B)                
      return bar;                   
  }                                 
}                                   
`

Since we run the tracker before instantiation, we would see both return statements and mark both foo and bar as NRVO variables.
Ofcourse in the B = false case, we would end up constructing a Foo in a Bar return slot....

As a side note, It is actually funny that we currently perform this optimization (most likely accidentally):

template<bool B> static Foo bar() {
  {                                 
    Foo foo1;                        
    if constexpr(B)                 
      return foo1;                  
  }                                 
  {
    Foo foo2;
    return foo2                                 
  }                                 
}

In the B = false case, we end up constructing foo1 in the return slot even though we actually never return it.

mizvekov added inline comments.Mon, Apr 12, 8:08 PM
clang/lib/Sema/SemaStmt.cpp
3158–3161

Compiler explorer link for the accidental optimization I am talking about: https://godbolt.org/z/czdWodW87

rsmith added inline comments.Mon, Apr 12, 9:38 PM
clang/lib/Sema/SemaStmt.cpp
3158–3161

That's amazing, and makes total sense. Yeah, if constexpr pretty thoroughly destroys the assumptions we make in the Scope-based NRVO computation about what a templated function will look like after instantiation.

mizvekov updated this revision to Diff 337248.Tue, Apr 13, 1:31 PM
mizvekov marked 7 inline comments as done.
  • Address rsmith review comments.
mizvekov added inline comments.Tue, Apr 13, 1:33 PM
clang/include/clang/Sema/Sema.h
4731–4738

I like NamedReturnInfo, will change.

clang/lib/Sema/SemaStmt.cpp
3066–3075

How about demoteNRVOStatus?

3164

Using short utility lambda as an alternative to keep the main logic clear from clutter.

3305

This goes to the documentation on getNamedReturnInfo, return value, where it says that the Canidate member will be non-null in case implicit move would be permitted under the most permissive language standard.

So if Try

So if we are guarded under that condition, we would never

3305

Skipping it is just a small compilation performance optimization:

We are going to try that TryMoveInitialization again just below, this time with equivalent of ForceCXX20 = true. If we were already in c++20 mode, then that second call is going to be made with basically the same parameters and perform basically the same work and reach the same conclusion: Faliure, and the diagnostic won't be produced.

Now my take on why that looks confusing: To me TryMoveInitialization has a similar problem to the one I addressed in this patch for getCopyElisionCandidate, it relies on you calling it multiple times, once to try the the actual work, and if that fails, then you have to call it again, this time in 'permissive' more, to get the information you need for the diagnostics.

Now I did not try giving it the same treatment of returning an aggregate with all the information you need, so you can call it just once with one less parameter.
If you agree on that take, I'd be happy to try a similar improvement there as well.
If you don't think it's worth the noise, then it's all good, no worries :)

As a side note, I just noticed that the
if (QT->isLValueReferenceType()) { below is dead code, we are never going to consider this move eligible in any mode, and so Res.Candidate is going to be null.

We should not need to repeat here the checks we already do in getNRVOResult, so I am going to go ahead and remove it.

I think it may have been added because until just recently getCopyElisionCandidate was bugged and accepted LValue references, and I guess the problem was "fixed" in this diagnostic first.

clang/lib/Sema/SemaTemplateInstantiateDecl.cpp
1059

Thanks a lot! That works and now we get copy elision on dependent blocks :)

mizvekov added inline comments.Tue, Apr 13, 1:35 PM
clang/lib/Sema/SemaStmt.cpp
3305

Disregard this comment, I realized I was answering the wrong question and for some reason it was not discarded.

Quuxplusone added inline comments.Tue, Apr 13, 1:53 PM
clang/lib/Sema/SemaStmt.cpp
3066–3075

I'm with Richard here: if you mean "disallow copy elision," say so!
IMO this should be member functions of NamedReturnInfo, and it should just be

void disallowCopyElision() {
    S &= ~CopyElidable;  // obviously this "should" work;
}                    // ensure that your enum values actually make it work

void disallowImplicitMove() {
    S = None;
}

The way you use it below is equivalent to e.g.

if (VD->isExceptionVariable()) {
  Info.disallowCopyElision();
  if (!hasCXX20)
    Info.disallowImplicitMove();
}

which I think is a lot easier to puzzle out what's going on and how it corresponds to the standardese.

3091–3092

Nit: please declare variables one-per-line. Change that , to a ; and write bool again on line 3085 instead of four space characters.

mizvekov added inline comments.Tue, Apr 13, 2:20 PM
clang/lib/Sema/SemaStmt.cpp
3066–3075

@Quuxplusone But I did not mean "disallow copy elision", the function will also disallow implicit move based on a parameter, so would be a bit misleading.

That solution you proposed though is more repetitive and prone to error: it does not enforce the invariant that you should always disallow copy elision when disallowing implicit move.
Even if you amend that to having one manipulator function which enforces the invariant, you still have two bools totaling four states, where only three states make sense.

I would consider a better name though, not necessarily happy with this one either.

mizvekov added inline comments.Tue, Apr 13, 2:27 PM
clang/lib/Sema/SemaStmt.cpp
3066–3075

Also I did consider making this a method, but this function is more of an implementation detail that is better to hide anyway.

@rsmith:

Actually, there is still something not quite right with this patch that I have to finish investigating.

The B5 test should not have gotten copy elision. I thought at first that we had gotten the function return type deduced during VarDecl instantiation, but we actually have a dependant type instead:

dump of FT:

FunctionProtoType 0x18597a3ed10 '<dependent type> (void)' dependent cdecl
`-BuiltinType 0x185960144e0 '<dependent type>' dependent

I think I assumed wrong that these blocks would behave similarly to lambdas with auto return.

mizvekov updated this revision to Diff 337266.Tue, Apr 13, 3:02 PM
  • Changed the downgrade* function name to disallowNRVO, given second thought, I think it's appropriate name.
  • Change to one variable declaration per statement as per Arthur's review.
mizvekov marked 3 inline comments as done.Tue, Apr 13, 3:52 PM
aaronpuchert added inline comments.Tue, Apr 13, 3:56 PM
clang/lib/Sema/SemaStmt.cpp
3066–3075

You can have more aptly named functions without dropping the invariant. I'd say you only need disallowCopyElision though, instead of disallowImplicitMove you could just directly overwrite with None.

3091–3092

I don't think the coding standards have a rule about this. I tend to advocate the opposite in most cases to avoid repetition, but for bool it's probably fine to repeat.

3151–3152

So the return value is trivially deducible from the post-state of Info (as Info.isCopyElidable() ? Info.Candidate : nullptr)? I guess it makes some code shorter to write, but generally it's preferable to have fewer implicit invariants.

Also it's a getter function that modifies its arguments... not saying there isn't precedent in the code, but I'd say less surprising function signatures are still better where we can have them.

mizvekov added inline comments.Tue, Apr 13, 4:29 PM
clang/lib/Sema/SemaStmt.cpp
3066–3075

But the dominant use case here is wanting to choose either one based on the language mode. So the signature taking a bool is more convenient.

Quuxplusone added inline comments.Tue, Apr 13, 4:36 PM
clang/lib/Sema/SemaStmt.cpp
3066–3075

Just to be clear, I don't see the use-case anywhere as wanting to "choose" anything based on a parameter. As my code indicated, I see the logic here as being simply "For exception variables [or whatever], disallow copy elision. _Additionally_, pre-'20, disallow even implicit move for them." And so on. No functions of boolean parameters, just straightforward English if-thens, based on the English in the standard.
I get that you're probably not going to write the straightforward code in the end; but I just wanted to push a tiny bit harder to see if anyone else sees what I see. :)

aaronpuchert added inline comments.Wed, Apr 14, 3:43 PM
clang/lib/Sema/SemaStmt.cpp
3066–3075

Your argument makes sense to me, but the function is called 5 times with a runtime parameter, so it's going to be quite a bit more code.

just straightforward English if-thens, based on the English in the standard.

Technically aren't we handling different standards at once here? So there is actually no language in any standard that contains these particular if-thens.

mizvekov updated this revision to Diff 337903.Thu, Apr 15, 2:31 PM
  • Added doc to disallowNRVO
  • Also detect implicit return type for blocks.