Before the patch we calculated the NRVO candidate looking at the
variable's whole enclosing scope. The research in [P2025] shows that looking
at the variable's potential scope is better and covers more cases where NRVO
would be safe and desirable.
Details
- Reviewers
rsmith mizvekov Izaron ChuanqiXu - Group Reviewers
Restricted Project - Commits
- rGfec5ff2a3230: [Clang] [P2025] Analyze only potential scopes for NRVO
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
There is a nice proposal (P2025) Guaranteed copy elision for return variables by Anton Zhilin. The poll on the proposal showed that its ideas are very welcome: link to cplusplus/papers github. Although the proposal is not yet accepted into the Standard in its current wording, some ideas are still good and can be implemented "in advance".
This proposal aims to provide guaranteed copy elision for common cases of local variables being returned from a function, a.k.a. "guaranteed NRVO".
С++ сompiler implementations determine by their own rules whether there will be a NRVO (since it is not a required optimization). The proposal contains a collection of common cases of local variables being returned where copy elision is safe and makes senses and therefore is desirable to do.
The main requirement (omitting details for language-lawyers) is that:
Copy elision is guaranteed for return x; if every return "seen" by x is return x;
"seen by x" means here "all non-discarded return statements in x's potential scope". There are more details in the proposal.
The collection of common cases contains 20 examples: §4.1. Examples.
Here is the current status of these examples:
- [OK] 13 out of 20 examples are working in Clang as expected.
- [FAIL] 13th example: should be considered separately (as part of fixing consteval code)
- [FAIL] 14th example: should be considered separately (as I haven't looked yet how CXXCatchStmt works).
- [FAIL] 18th example: is unfixable now because of Clang's architecture: my comment on the issue.
- [OK with the patch] 7th, 8th, 11th, 15th example: are working with this patch.
In order to make the last group of 4 examples working, there is a need to rewrite the logic for calculating the NRVO candidate.
The clang::Scope class has methods void AddDecl(Decl *D), void addNRVOCandidate(VarDecl *VD), void setNoNRVO(), that are called in the order of the scope's parsing.
- void AddDecl(Decl *D) is called when there is a new Decl *D in the scope. The D's potential scope starts now.
- void addNRVOCandidate(VarDecl *VD) is called when there is a return <named-variable> in the scope or when a children scope has successfully calculated its single NRVO candidate.
- void setNoNRVO() is called when there is a return <not-a-named-var> in the scope or when a children scope is telling us to drop our NRVO candidates (nevertheless, the children scope now can still succesfully calculated the NRVO candidate).
We need to have a set of "unspoiled variables" to find the NRVO candidate effectively (yeah, introducing some made up terminology...) A variable is "unspoiled" if it is not forbidden to be the NRVO candidate yet. An AddDecl call adds the variable to this set. An addNRVOCandidate call "spoils" every variable except VD. A setNoNRVO "spoils" every variable. Only an "unspoiled variable" may be the NRVO candidate.
Cases that show the difference (they're covered in tests, though do we need an AST test as well?):
X test(bool B) { if (B) { X y; // before: nrvo, after: nrvo (same) return y; } X x; // before: no nrvo, after: nrvo (better) return x; }
X test(bool B) { X x; // before: no nrvo, after: no nrvo (same) if (B) return x; X y; // before: no nrvo, after: nrvo (better) return y; }
X test(bool A, bool B) { { { X x; // before: nrvo, after: nrvo (same) if (A) return x; } X y; // before: no nrvo, after: nrvo (better) if (B) return y; } X z; retur n z; // before: no nrvo, after: nrvo (better) }
The code is above my pay grade, but FWIW, I super support the intent of this patch! Let's get p2025 support into Clang! :)
The collection of common cases contains 20 examples: §4.1. Examples. Here is the current status of these examples:
[OK] 13 out of 20 examples are working in Clang as expected.
[FAIL] 13th example: should be considered separately (as part of fixing consteval code)
[FAIL] 14th example: should be considered separately (as I haven't looked yet how CXXCatchStmt works).
[FAIL] 18th example: is unfixable now because of Clang's architecture: my comment on the issue.
[OK with the patch] 7th, 8th, 11th, 15th example: are working with this patch.
I think it would be an extremely good idea to commit all 20 of these test cases to clang/test/CodeGenCXX/, with their current behavior, as a preliminary patch. Then, D119792 can more clearly show (1) what behavior it's changing, (2) what behavior it's keeping the same, and (3) the fact that it's not regressing any behavior. Also, you'll help future-maintainers by giving them some extra test cases that have already been identified as interesting, even if you personally aren't changing behavior related to those particular test cases.
(I recently took the same tactic with D119772 as a preliminary for D119184 + D119778, and it was very helpful, at least to me. :))
That's a superb idea, thanks! I will soon add all the cases to clang/test/CodeGenCXX/nrvo.cpp in an isolated patch. It indeed will explicitly show improvements in future patches and prevent silent NRVO regression.
So P2025 has not successfully made it through EWG, so this would have to be under a 'flag'. Also, I believe this will end up being applied to C++23, so it would have to be under a C++23 flag, even when we make it a default behavior.
I don't have a good feeling on the implementation (I've not looked well into this stuff), but those two are necessary.
I felt like this particular patch don't really need to wait until the paper make it to C++XX.
RVO (unnamed return value optimization), a simpler optimization, has been used for a very very long time, before they made it mandatory in C++17 (effectively just describing the status quo in the Standard).
The paper contains an exhaustive set of NRVO use-cases. 13 out of 20 cases are already implemented in Clang, and the current patch makes it 17 out of 20. I could send a patch without mentioning the paper at all, but it would be harder to track progress.
Let me explain a bit more =)
The optimizations from this patch are allowed by the Standard (always have been allowed). So there is no need to restrict it under a flag or C++ version.
@erichkeane your comment indeed would be applicable if I allowed NRVO for non-movable types, that is currently prohibited by the Standard, but allowed in the proposal. Luckily my patch doesn't do this.
I've looked into the details as much as I can and I don't find explicit errors.
It is better to add tests in clang/unittest by using VarDecl::isNRVOVariable().
clang/include/clang/Sema/Scope.h | ||
---|---|---|
555–557 | The suggested changes smell better. |
clang/include/clang/Sema/Scope.h | ||
---|---|---|
540–541 | Firstly I am wondering why here doesn't check NRVO.getInt() to shortcut directly. But later I found it would change the logic in ::mergeNRVOIntoParent: void Scope::mergeNRVOIntoParent() { if (VarDecl *Candidate = NRVO.getPointer()) { if (isDeclScope(Candidate)) Candidate->setNRVOVariable(true); } ... It would set NRVO for the candidate in NRVO if it is in current scope. With the context of addNRVOCandidate here, I could judge that the change would be: X test(bool B) { X x; // before: no nrvo, after: no nrvo (same) if (B) return x; X y; // before: no nrvo, after: nrvo (better) return y; // Now NRVO.getInt()==true and NRVO.getPointer() == y; } Yeah, the behavior is still 'right'. y should be NRVO in this case. But the implementation smell bad, if NRVO.getInt() is true, we shouldn't do any thing. I am not sure if I state my points well. I mean the implementation might be correct, but it is hard to understand, read and maintain. It'd better to make the reader avoid doing mathmatics when reading the codes. |
clang/include/clang/Sema/Scope.h | ||
---|---|---|
540–541 |
I agree that it is really hard to understand and needs to be polished. It took long time for me to construct the code that won't break. I think that one of the sources of the complexity is the NRVO variable itself. llvm::PointerIntPair<VarDecl *, 1, bool> NRVO; to something like VarDecl *NRVOCandidate; bool InvalidatesParentNRVOCandidates; And maybe rename setNoNRVO() to smth like invalidateNRVOCandidates and so on. What do you think? |
clang/include/clang/Sema/Scope.h | ||
---|---|---|
540–541 | From my understanding, llvm::PointerIntPair<VarDecl *, 1, bool> NRVO; is structurally equal to VarDecl *NRVOCandidate; bool InvalidatesParentNRVOCandidates; so it doesn't change anything. My idea is that if it is possible to remove NRVOCandidatesInScope? Since there would be at most one NRVO candidate in one scope. So it should be possible to represent the NRVO candidate by one variable if we don't support nested scopes. If it makes sense? Then let me consider the case about nested scopes. For the nested scopes, the child scope is responsible to inform the parent scope that: If any variable in the parent scope is an available NRVO candidate. So the child should offer a set for unavailable NRVO candidates for the parent scope. So we could get the data structure: llvm::SmallMap<VarDecl *, bool> NRVOCandidatesLattices; NRVOCandidatesLattices[V] == true indicates that V is known available in current scope and all the child scopes. And NRVOCandidatesLattices[V] == false indicates that V is known unavailable in current scope and all the child scopes. So we could get the implementation for addNRVOCandidate(VarDecl *VD): if (VD not in NRVOCandidatesLattices) { NRVOCandidatesLattices[VD] = true; // We met the VD firstly, it should be available NRVO candidate. Update NRVOCandidatesLattices for other variables; // There should be only one available variables in NRVOCandidatesLattices. } else if (NRVOCandidatesLattices[VD]) nothing; // We met the available NRVO again, we shouldn't do anything; else // NRVOCandidatesLattices[VD] == false; nothing; // We've already know what happened. And in mergeNRVOIntoParent , we could do: VarDecl *VD = try to get the only one available NRVO candidate from UnavilableNRVOCandidates; if (VD && isDeclScope(Candidate)) set VD as NRVO candidate; ParentScope->mergeUnavilableNRVOCandidates(UnavilableNRVOCandidates); // We could do lattice merges here. The implementation should be trivial. The implementation looks good to me and easy to understand. Maybe we could add extra data structure to erase the iterations in mergeNRVOIntoParent and the first branch of addNRVOCandidate . But they are helpers and wouldn't make the framework hard to understand. |
Hi!
Unfortunately I don't have time to finish this pull request, so please feel free to take it and get it done =)
(You may reuse the code from this PR or write a completely new implementation)
clang/lib/Sema/Scope.cpp | ||
---|---|---|
127–128 | Actually this is intention. If the parent has already NRVO candidate, then it should be invalidated (or not). Let's consider the following examples: X foo(bool b) { X x; X y; if (b) return x; else return y; // when we process this return statement, the parent has already NRVO and it will be invalidated (this is correct behavior) } X foo(bool b) { X x; if (b) return x; X y; // when we process this return statement, the parent has already NRVO and it WON't be invalidated // (this is correct behavior), because a return slot will be available for it return y; } X foo(bool b) { X x; if (b) return x; // when we process this return statement, the parent has already NRVO and it WON't be invalidated (this is correct behavior) return x; } X foo(bool b, X x) { X y; if (b) return x; // when we process this return statement, the parent contains nullptr (invalid candidate) and it will be invalidated (this is correct behavior) return y; } X foo(bool b, X x) { if (b) return x; X y; // when we process this return statement, the parent contains nullptr (invalid candidate) and it WON't be invalidated (this is correct behavior) return y; } | |
155–156 | Yes, this is intention. You can take a look at the above comment. |
clang/lib/Sema/Scope.cpp | ||
---|---|---|
127–128 | Oh, I see. Tricky. I don't find invalid cases now. But I recommend to comment that the children would maintain the ReturnSlots of their parents. (This is anti-intuition) Have you tested any larger projects? Like libc++, libstdc++ or something like folly. I feel we need to do such tests to avoid we get anything wrong. |
clang/lib/Sema/Scope.cpp | ||
---|---|---|
127–128 | I've already added a comment at the beginning of updateNRVOCandidate function where this point is mentioned: // ... Therefore, we need to clear return slots for other // variables defined before the current return statement in the current // scope and in outer scopes. If it's not enough, please let me know.
Yes, I've built the clang itself and compiler-rt project. Then I've checked them to run 'check-all' (on built clang and compiler-rt). Everything works. |
clang/lib/Sema/Scope.cpp | ||
---|---|---|
127–128 | Great! Clang should be large enough. |
clang/lib/Sema/Scope.cpp | ||
---|---|---|
127–128 | Thanks a lot for the careful review! @ChuanqiXu , could you land this patch please? Many thanks to @Izaron for the original implementation. |
clang/lib/Sema/Scope.cpp | ||
---|---|---|
127–128 | Sure. What's your prefer Name and Mail address? |
clang/lib/Sema/Scope.cpp | ||
---|---|---|
127–128 | Thanks! Roman Rusyaev <rusyaev.rm@gmail.com> |
clang/lib/Sema/Scope.cpp | ||
---|---|---|
127–128 | Oh, I forgot you need edit the ReleaseNotes at clang/docs/ReleaseNotes.rst |
clang/lib/Sema/Scope.cpp | ||
---|---|---|
127–128 | I'm going to add a description in C++ Language Changes in Clang paragraph. It will look like: - Improved ``copy elision` optimization. It's possible to apply ``NRVO`` for an object if at the moment when any return statement of this object is executed, the ``return slot`` won't be occupied by another object. Is it OK for you? |
clang/lib/Sema/Scope.cpp | ||
---|---|---|
127–128 | According to https://github.com/cplusplus/papers/issues/756, I would like to put this in C++2b Feature Support section. Although we don't add constraints (C++ std >= 23) to do this optimization, this is a C++23 feature to C++ standard. |
clang/lib/Sema/Scope.cpp | ||
---|---|---|
127–128 | Actually this optimization is just an improvement of existing NRVO optimization in term of existing standard. This optimization doesn't implement the proposal itself and can be done without additional flags |
clang/lib/Sema/Scope.cpp | ||
---|---|---|
127–128 | This is just the first step to support this proposal. All changes in the current patch are allowed by Standard before. |
clang/lib/Sema/Scope.cpp | ||
---|---|---|
127–128 | So, I think the best place for the description of these changes in release notes is C++ Language Changes in Clang paragraph, because this change is improvement and can be done without context of mentioned proposal. What do you think? |
clang/lib/Sema/Scope.cpp | ||
---|---|---|
127–128 | Got it, your words make sense. |
clang/lib/Sema/Scope.cpp | ||
---|---|---|
127–128 | So, I think we are on the same page. |
clang/lib/Sema/Scope.cpp | ||
---|---|---|
127–128 | Yeah, I am going to land the patch today and my working pipeline is full now : ) |
Now NRVO has three states, None, nullptr and non-null value.But it doesn't show in comments and implementations.