This is an archive of the discontinued LLVM Phabricator instance.

[Clang] [P2025] Analyze only potential scopes for NRVO
ClosedPublic

Authored by rusyaev-roman on Feb 14 2022, 3:47 PM.

Details

Summary

Before the patch we calculated the NRVO candidate looking at the
variable's whole enclosing scope. The research in [P2025] shows that looking
at the variable's potential scope is better and covers more cases where NRVO
would be safe and desirable.

Diff Detail

Event Timeline

Izaron requested review of this revision.Feb 14 2022, 3:47 PM
Izaron created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2022, 3:47 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

There is a nice proposal (P2025) Guaranteed copy elision for return variables by Anton Zhilin. The poll on the proposal showed that its ideas are very welcome: link to cplusplus/papers github. Although the proposal is not yet accepted into the Standard in its current wording, some ideas are still good and can be implemented "in advance".

This proposal aims to provide guaranteed copy elision for common cases of local variables being returned from a function, a.k.a. "guaranteed NRVO".

С++ сompiler implementations determine by their own rules whether there will be a NRVO (since it is not a required optimization). The proposal contains a collection of common cases of local variables being returned where copy elision is safe and makes senses and therefore is desirable to do.

The main requirement (omitting details for language-lawyers) is that:

Copy elision is guaranteed for return x; if every return "seen" by x is return x;

"seen by x" means here "all non-discarded return statements in x's potential scope". There are more details in the proposal.

The collection of common cases contains 20 examples: §4.1. Examples.
Here is the current status of these examples:

  • [OK] 13 out of 20 examples are working in Clang as expected.
  • [FAIL] 13th example: should be considered separately (as part of fixing consteval code)
  • [FAIL] 14th example: should be considered separately (as I haven't looked yet how CXXCatchStmt works).
  • [FAIL] 18th example: is unfixable now because of Clang's architecture: my comment on the issue.
  • [OK with the patch] 7th, 8th, 11th, 15th example: are working with this patch.

In order to make the last group of 4 examples working, there is a need to rewrite the logic for calculating the NRVO candidate.
The clang::Scope class has methods void AddDecl(Decl *D), void addNRVOCandidate(VarDecl *VD), void setNoNRVO(), that are called in the order of the scope's parsing.

  • void AddDecl(Decl *D) is called when there is a new Decl *D in the scope. The D's potential scope starts now.
  • void addNRVOCandidate(VarDecl *VD) is called when there is a return <named-variable> in the scope or when a children scope has successfully calculated its single NRVO candidate.
  • void setNoNRVO() is called when there is a return <not-a-named-var> in the scope or when a children scope is telling us to drop our NRVO candidates (nevertheless, the children scope now can still succesfully calculated the NRVO candidate).

We need to have a set of "unspoiled variables" to find the NRVO candidate effectively (yeah, introducing some made up terminology...) A variable is "unspoiled" if it is not forbidden to be the NRVO candidate yet. An AddDecl call adds the variable to this set. An addNRVOCandidate call "spoils" every variable except VD. A setNoNRVO "spoils" every variable. Only an "unspoiled variable" may be the NRVO candidate.

Cases that show the difference (they're covered in tests, though do we need an AST test as well?):

X test(bool B) {
  if (B) {
    X y; // before: nrvo, after: nrvo (same)
    return y;
  }
  X x; // before: no nrvo, after: nrvo (better)
  return x;
}
X test(bool B) {
  X x; // before: no nrvo, after: no nrvo (same)
  if (B)
    return x;
  X y; // before: no nrvo, after: nrvo (better)
  return y;
}
X test(bool A, bool B) {
  {
    {
      X x; // before: nrvo, after: nrvo (same)
      if (A)
        return x;
    }
    X y; // before: no nrvo, after: nrvo (better)
    if (B)
      return y;
  }
  X z;
  retur n z; // before: no nrvo, after: nrvo (better)
}
Izaron updated this revision to Diff 408999.Feb 15 2022, 12:10 PM

Fix Scope::dumpImpl with more precise log

Quuxplusone added a subscriber: Quuxplusone.

The code is above my pay grade, but FWIW, I super support the intent of this patch! Let's get p2025 support into Clang! :)

The collection of common cases contains 20 examples: §4.1. Examples. Here is the current status of these examples:

[OK] 13 out of 20 examples are working in Clang as expected.
[FAIL] 13th example: should be considered separately (as part of fixing consteval code)
[FAIL] 14th example: should be considered separately (as I haven't looked yet how CXXCatchStmt works).
[FAIL] 18th example: is unfixable now because of Clang's architecture: my comment on the issue.
[OK with the patch] 7th, 8th, 11th, 15th example: are working with this patch.

I think it would be an extremely good idea to commit all 20 of these test cases to clang/test/CodeGenCXX/, with their current behavior, as a preliminary patch. Then, D119792 can more clearly show (1) what behavior it's changing, (2) what behavior it's keeping the same, and (3) the fact that it's not regressing any behavior. Also, you'll help future-maintainers by giving them some extra test cases that have already been identified as interesting, even if you personally aren't changing behavior related to those particular test cases.

(I recently took the same tactic with D119772 as a preliminary for D119184 + D119778, and it was very helpful, at least to me. :))

I think it would be an extremely good idea to commit all 20 of these test cases to clang/test/CodeGenCXX/, with their current behavior, as a preliminary patch. Then, D119792 can more clearly show (1) what behavior it's changing, (2) what behavior it's keeping the same, and (3) the fact that it's not regressing any behavior. Also, you'll help future-maintainers by giving them some extra test cases that have already been identified as interesting, even if you personally aren't changing behavior related to those particular test cases.

(I recently took the same tactic with D119772 as a preliminary for D119184 + D119778, and it was very helpful, at least to me. :))

That's a superb idea, thanks! I will soon add all the cases to clang/test/CodeGenCXX/nrvo.cpp in an isolated patch. It indeed will explicitly show improvements in future patches and prevent silent NRVO regression.

This review will wait for D119927 to be merged, as it adds more tests.

Izaron updated this revision to Diff 416116.Mar 17 2022, 2:59 AM

Rebased the patch on top of D119927

Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2022, 2:59 AM
aaron.ballman added a reviewer: Restricted Project.Mar 17 2022, 4:13 AM

So P2025 has not successfully made it through EWG, so this would have to be under a 'flag'. Also, I believe this will end up being applied to C++23, so it would have to be under a C++23 flag, even when we make it a default behavior.

I don't have a good feeling on the implementation (I've not looked well into this stuff), but those two are necessary.

So P2025 has not successfully made it through EWG, so this would have to be under a 'flag'. Also, I believe this will end up being applied to C++23, so it would have to be under a C++23 flag, even when we make it a default behavior.

I felt like this particular patch don't really need to wait until the paper make it to C++XX.

RVO (unnamed return value optimization), a simpler optimization, has been used for a very very long time, before they made it mandatory in C++17 (effectively just describing the status quo in the Standard).

The paper contains an exhaustive set of NRVO use-cases. 13 out of 20 cases are already implemented in Clang, and the current patch makes it 17 out of 20. I could send a patch without mentioning the paper at all, but it would be harder to track progress.

Let me explain a bit more =)

The optimizations from this patch are allowed by the Standard (always have been allowed). So there is no need to restrict it under a flag or C++ version.

@erichkeane your comment indeed would be applicable if I allowed NRVO for non-movable types, that is currently prohibited by the Standard, but allowed in the proposal. Luckily my patch doesn't do this.

I've looked into the details as much as I can and I don't find explicit errors.


It is better to add tests in clang/unittest by using VarDecl::isNRVOVariable().

clang/include/clang/Sema/Scope.h
533–535

The suggested changes smell better.

ChuanqiXu added inline comments.Mar 17 2022, 10:41 PM
clang/include/clang/Sema/Scope.h
518

Firstly I am wondering why here doesn't check NRVO.getInt() to shortcut directly. But later I found it would change the logic in ::mergeNRVOIntoParent:

void Scope::mergeNRVOIntoParent() {
  if (VarDecl *Candidate = NRVO.getPointer()) {
    if (isDeclScope(Candidate))
      Candidate->setNRVOVariable(true);
  }
  ...

It would set NRVO for the candidate in NRVO if it is in current scope. With the context of addNRVOCandidate here, I could judge that the change would be:

X test(bool B) {
  X x; // before: no nrvo, after: no nrvo (same)
  if (B)
    return x;
  X y; // before: no nrvo, after: nrvo (better)
  return y; // Now NRVO.getInt()==true and NRVO.getPointer() == y;
}

Yeah, the behavior is still 'right'. y should be NRVO in this case. But the implementation smell bad, if NRVO.getInt() is true, we shouldn't do any thing. I am not sure if I state my points well. I mean the implementation might be correct, but it is hard to understand, read and maintain. It'd better to make the reader avoid doing mathmatics when reading the codes.

Izaron added inline comments.Mar 18 2022, 7:30 AM
clang/include/clang/Sema/Scope.h
518

I am not sure if I state my points well. I mean the implementation might be correct, but it is hard to understand, read and maintain. It'd better to make the reader avoid doing mathmatics when reading the codes.

I agree that it is really hard to understand and needs to be polished. It took long time for me to construct the code that won't break.

I think that one of the sources of the complexity is the NRVO variable itself.
If we could change

llvm::PointerIntPair<VarDecl *, 1, bool> NRVO;

to something like

VarDecl *NRVOCandidate;
bool InvalidatesParentNRVOCandidates;

And maybe rename setNoNRVO() to smth like invalidateNRVOCandidates and so on.

What do you think?

ChuanqiXu added inline comments.Mar 20 2022, 7:55 PM
clang/include/clang/Sema/Scope.h
518

From my understanding,

llvm::PointerIntPair<VarDecl *, 1, bool> NRVO;

is structurally equal to

VarDecl *NRVOCandidate;
bool InvalidatesParentNRVOCandidates;

so it doesn't change anything. My idea is that if it is possible to remove NRVOCandidatesInScope? Since there would be at most one NRVO candidate in one scope. So it should be possible to represent the NRVO candidate by one variable if we don't support nested scopes. If it makes sense? Then let me consider the case about nested scopes. For the nested scopes, the child scope is responsible to inform the parent scope that:

If any variable in the parent scope is an available NRVO candidate.

So the child should offer a set for unavailable NRVO candidates for the parent scope. So we could get the data structure:

llvm::SmallMap<VarDecl *, bool> NRVOCandidatesLattices;

NRVOCandidatesLattices[V] == true indicates that V is known available in current scope and all the child scopes. And NRVOCandidatesLattices[V] == false indicates that V is known unavailable in current scope and all the child scopes.

So we could get the implementation for addNRVOCandidate(VarDecl *VD):

if (VD not in NRVOCandidatesLattices) {
    NRVOCandidatesLattices[VD] = true; // We met the VD firstly, it should be available NRVO candidate.
    Update NRVOCandidatesLattices for other variables; // There should be only one available variables in NRVOCandidatesLattices.
}
else if (NRVOCandidatesLattices[VD])
    nothing;  // We met the available NRVO again, we shouldn't do anything;
else // NRVOCandidatesLattices[VD] == false;
    nothing;  // We've already know what happened.

And in mergeNRVOIntoParent , we could do:

VarDecl *VD = try to get the only one available NRVO candidate from UnavilableNRVOCandidates;
if (VD && isDeclScope(Candidate))
     set VD as NRVO candidate;

ParentScope->mergeUnavilableNRVOCandidates(UnavilableNRVOCandidates); // We could do lattice merges here. The implementation should be trivial.

The implementation looks good to me and easy to understand. Maybe we could add extra data structure to erase the iterations in mergeNRVOIntoParent and the first branch of addNRVOCandidate . But they are helpers and wouldn't make the framework hard to understand.

Hi!

Unfortunately I don't have time to finish this pull request, so please feel free to take it and get it done =)

(You may reuse the code from this PR or write a completely new implementation)

Hi!

Unfortunately I don't have time to finish this pull request, so please feel free to take it and get it done =)

(You may reuse the code from this PR or write a completely new implementation)

Ok. I'll continue this work.

rusyaev-roman commandeered this revision.Jul 24 2022, 1:47 PM
rusyaev-roman added a reviewer: Izaron.

Update the origianl implementation.

@ChuanqiXu , could you take a look again? I've updated the original implementation.

ChuanqiXu added inline comments.Jul 25 2022, 12:38 AM
clang/include/clang/Sema/Scope.h
223–224

Now NRVO has three states, None, nullptr and non-null value.But it doesn't show in comments and implementations.

clang/lib/Sema/Scope.cpp
133
rusyaev-roman added inline comments.Jul 25 2022, 1:40 PM
clang/include/clang/Sema/Scope.h
223–224

Actually it's used. I'll add additional comments for clarification.

clang/lib/Sema/Scope.cpp
133

nullptr should be considered below. I'll add a comment to demonstrate why it is.

Rebase and add additional comments.

@ChuanqiXu , I've added additional comments. Could you check again please?

ChuanqiXu added inline comments.Jul 25 2022, 7:21 PM
clang/lib/Sema/Scope.cpp
127–128

What if NRVO contains a value already? It is possible due to the value of NRVO could be set by its children.

155–156

There is a similar problem. It looks not right if the NRVO of the parent owns a value already.

rusyaev-roman added inline comments.Jul 25 2022, 10:29 PM
clang/lib/Sema/Scope.cpp
127–128

Actually this is intention. If the parent has already NRVO candidate, then it should be invalidated (or not). Let's consider the following examples:

X foo(bool b) {
   X x;
   X y;
   if (b)
      return x;
   else
      return y; // when we process this return statement, the parent has already NRVO and it will be invalidated (this is correct behavior)
}
X foo(bool b) {
   X x;
   if (b)
      return x;
   
   X y;
   // when we process this return statement, the parent has already NRVO and it WON't be invalidated
   //  (this is correct behavior), because a return slot will be available for it
   return y;
}
X foo(bool b) {
   X x;
   if (b)
      return x;

   // when we process this return statement, the parent has already NRVO and it WON't be invalidated (this is correct behavior)
   return x;
}
X foo(bool b, X x) {
   X y;
   
   if (b)
      return x;

   // when we process this return statement, the parent contains nullptr (invalid candidate) and it will be invalidated (this is correct behavior)
   return y;
}
X foo(bool b, X x) {
   if (b)
      return x;

   X y;
   // when we process this return statement, the parent contains nullptr (invalid candidate) and it WON't be invalidated (this is correct behavior)
   return y;
}
155–156

Yes, this is intention. You can take a look at the above comment.

ChuanqiXu added inline comments.Jul 25 2022, 11:01 PM
clang/lib/Sema/Scope.cpp
127–128

Oh, I see. Tricky. I don't find invalid cases now. But I recommend to comment that the children would maintain the ReturnSlots of their parents. (This is anti-intuition)

Have you tested any larger projects? Like libc++, libstdc++ or something like folly. I feel we need to do such tests to avoid we get anything wrong.

rusyaev-roman added inline comments.Jul 25 2022, 11:18 PM
clang/lib/Sema/Scope.cpp
127–128

I've already added a comment at the beginning of updateNRVOCandidate function where this point is mentioned:

//      ... Therefore, we need to clear return slots for other
//      variables defined before the current return statement in the current
//      scope and in outer scopes.

If it's not enough, please let me know.

Have you tested any larger projects?

Yes, I've built the clang itself and compiler-rt project. Then I've checked them to run 'check-all' (on built clang and compiler-rt). Everything works.

ChuanqiXu accepted this revision.Jul 25 2022, 11:22 PM
ChuanqiXu added inline comments.
clang/lib/Sema/Scope.cpp
127–128

Great! Clang should be large enough.

This revision is now accepted and ready to land.Jul 25 2022, 11:22 PM
rusyaev-roman added inline comments.Jul 25 2022, 11:28 PM
clang/lib/Sema/Scope.cpp
127–128

Thanks a lot for the careful review!

@ChuanqiXu , could you land this patch please?

Many thanks to @Izaron for the original implementation.

ChuanqiXu added inline comments.Jul 25 2022, 11:30 PM
clang/lib/Sema/Scope.cpp
127–128

Sure. What's your prefer Name and Mail address?

rusyaev-roman added inline comments.Jul 25 2022, 11:33 PM
clang/lib/Sema/Scope.cpp
127–128

Thanks!

Roman Rusyaev <rusyaev.rm@gmail.com>

ChuanqiXu added inline comments.Jul 25 2022, 11:39 PM
clang/lib/Sema/Scope.cpp
127–128

Oh, I forgot you need edit the ReleaseNotes at clang/docs/ReleaseNotes.rst

rusyaev-roman added inline comments.Jul 26 2022, 12:07 AM
clang/lib/Sema/Scope.cpp
127–128

I'm going to add a description in C++ Language Changes in Clang paragraph.

It will look like:

- Improved ``copy elision` optimization. It's possible to apply ``NRVO`` for an object if at the moment when
  any return statement of this object is executed, the ``return slot`` won't be occupied by another object.

Is it OK for you?

Add a release note.

ChuanqiXu added inline comments.Jul 26 2022, 12:16 AM
clang/lib/Sema/Scope.cpp
127–128

According to https://github.com/cplusplus/papers/issues/756, I would like to put this in C++2b Feature Support section. Although we don't add constraints (C++ std >= 23) to do this optimization, this is a C++23 feature to C++ standard.

@ChuanqiXu , the release notes were updated. Could you check and merge please?

rusyaev-roman added inline comments.Jul 26 2022, 12:18 AM
clang/lib/Sema/Scope.cpp
127–128

Actually this optimization is just an improvement of existing NRVO optimization in term of existing standard. This optimization doesn't implement the proposal itself and can be done without additional flags

rusyaev-roman added inline comments.Jul 26 2022, 12:21 AM
clang/lib/Sema/Scope.cpp
127–128

This is just the first step to support this proposal. All changes in the current patch are allowed by Standard before.

rusyaev-roman added inline comments.Jul 26 2022, 12:26 AM
clang/lib/Sema/Scope.cpp
127–128

So, I think the best place for the description of these changes in release notes is C++ Language Changes in Clang paragraph, because this change is improvement and can be done without context of mentioned proposal. What do you think?

ChuanqiXu added inline comments.Jul 26 2022, 12:28 AM
clang/lib/Sema/Scope.cpp
127–128

Got it, your words make sense.

rusyaev-roman added inline comments.Jul 26 2022, 1:34 AM
clang/lib/Sema/Scope.cpp
127–128

So, I think we are on the same page.
Could you land this patch please?

ChuanqiXu added inline comments.Jul 26 2022, 2:08 AM
clang/lib/Sema/Scope.cpp
127–128

Yeah, I am going to land the patch today and my working pipeline is full now : )

This revision was landed with ongoing or failed builds.Jul 26 2022, 3:57 AM
This revision was automatically updated to reflect the committed changes.