Page MenuHomePhabricator

Update NRVO logic to support early return
ClosedPublic

Authored by tzik on May 18 2018, 7:12 AM.

Details

Summary

The previous implementation misses an opportunity to apply NRVO (Named Return Value
Optimization) below. That discourages user to write early return code.

struct Foo {};

Foo f(bool b) {
  if (b)
    return Foo();
  Foo oo;
  return oo;
}

That is, we can/should apply RVO for a local variable if:

  • It's directly returned by at least one return statement.
  • And, all reachable return statements in its scope returns the variable directly.

While, the previous implementation disables the RVO in a scope if there are multiple return
statements that refers different variables.

On the new algorithm, local variables are in NRVO_Candidate state at first, and a return
statement changes it to NRVO_Disabled for all visible variables but the return statement refers.
Then, at the end of the function AST traversal, NRVO is enabled for variables in NRVO_Candidate
state and refers from at least one return statement.

Diff Detail

Repository
rC Clang

Event Timeline

tzik created this revision.May 18 2018, 7:12 AM
tzik edited the summary of this revision. (Show Details)May 18 2018, 7:13 AM
tzik changed the visibility from "Public (No Login Required)" to "No One".
tzik removed a subscriber: cfe-commits.
tzik changed the visibility from "No One" to "Public (No Login Required)".May 18 2018, 10:45 AM
tzik updated this revision to Diff 147967.May 22 2018, 2:33 AM

Fix NRVO on template code, and fix a nested scope case

tzik updated this revision to Diff 148002.May 22 2018, 7:36 AM

comment fix

tzik edited the summary of this revision. (Show Details)May 22 2018, 7:38 AM
tzik added a reviewer: rsmith.
tzik added subscribers: cfe-commits, arthur.j.odwyer.
Quuxplusone added inline comments.
lib/Sema/SemaDecl.cpp
12757–12758

What is the purpose of this change?
If it's "no functional change" it should be done separately IMHO; if it is supposed to change codegen, then it needs some codegen tests. (From looking at the code: maybe this affects codegen on functions that return member function pointers by value?)

test/CodeGenCXX/nrvo.cpp
139

You've changed this function from testing one thing with a FIXME, to testing a completely different thing. Could you add your new code as a new function, and leave the old FIXME alone until it's fixed?
Alternatively, if your patch actually fixes the FIXME, could you just replace the FIXME comment with a CHECK and leave the rest of this function alone?

254

Just for my own curiosity: this new test case is surely unaffected by this patch, right?

tzik added inline comments.May 22 2018, 10:12 PM
lib/Sema/SemaDecl.cpp
12757–12758

I think the previous implementation was incorrect.
Though computeNRVO clears ReturnStmt::NRVOCandidate when the corresponding variable is not NRVO variable, CGStmt checks both of ReturnStmt::NRVOCandidate and VarDecl::NRVOVariable anyway.
So, computeNRVO took no effect in the previous code, and the absence of computeNRVO here on function templates did not matter.
Note that there was no chance to call computeNRVO on function template elsewhere too.

OTOH in the new implementation, computeNRVO is necessary, and its absence on function templates matters.

We can remove computeNRVO here separately as a NFC patch and readd the new impl to the same place, but it's probably less readable, IMO.

test/CodeGenCXX/nrvo.cpp
139

My patch fixes the FIXME.
However, on the resulting code of NRVOing,

X y;
return y;

and

X x;
return x;

get to the same code and unified. And, the function is simplified as

X test3(bool B) {
  X x;
  return x;
}

without the if statement.
So, there will be nothing to CHECK left here if I leave the code as-is. I think that does not fit to the test scenario.

254

Hm, this is not so important anymore. This was to check if NRVO is working with function templates on the existing code, as computeNRVO was not called on them. And also this covered a regression in a draft patch, that didn't work on function templates.

Thanks! This looks like exactly the right way to compute when to apply NRVO. It'd be good to track (in your commit message at least) that this addresses PR13067.

lib/Sema/SemaDecl.cpp
12757–12758

What happens if we can't tell whether we have an NRVO candidate or not until instantiation:

template<typename T> X f() {
  T v;
  return v; // is an NRVO candidate if T is X, otherwise is not
}

(I think this is not hard to handle: the dependent construct here can only affect whether you get NRVO at all, not which variable you perform NRVO on, but I want to check that it is handled properly.)

test/CodeGenCXX/nrvo.cpp
139

... and this is one reason why we don't generally like clang's IR generation tests to enable optimizations. Please consider starting a new test file to test the output from clang's frontend rather than the output from -O1.

Quuxplusone added inline comments.May 24 2018, 12:34 AM
lib/Sema/SemaDecl.cpp
12757–12758

Ah, so if I understand correctly — which I probably don't —...
the parts of this diff related to isRecordType() make NRVO more reliable on template functions? Because the existing code has lots of checks for isRecordType() which means that they don't run for dependent (as-yet-unknown) types, even if those types are going to turn out to be record types occasionally?

I see that line 12838 runs computeNRVO unconditionally for *all* ObjCMethodDecls, even ones with scalar return types. So maybe running computeNRVO unconditionally right here would also be correct?
Datapoint: I made a patch to do nothing but replace this condition with if (Body) and to remove the similar condition guarding computeNRVO in Sema::ActOnBlockStmtExpr. It passed the Clang test suite with flying colors. I don't know if this means that the test suite is missing some tests, or if it means that the conditions are unnecessary.

IMHO if you're changing this anyway, it would be nice to put the remaining condition/optimization if (getReturnType().isScalarType()) return; inside computeNRVO itself, instead of repeating that condition at every call site.

tzik updated this revision to Diff 148417.May 24 2018, 8:08 AM

Add AST test. computeNRVO unconditionally.

tzik added inline comments.May 24 2018, 8:09 AM
lib/Sema/SemaDecl.cpp
12757–12758

v here is marked as an NRVO variable in its function template AST, and that propagates to the instantiation if the return type is compatible.
https://github.com/llvm-mirror/clang/blob/9c82d4ff6015e4477e924c8a495a834c2fced29e/lib/Sema/SemaTemplateInstantiateDecl.cpp#L743

12757–12758

the parts of this diff related to isRecordType() [...]

Right, they look not RecordTypes in the template context.

Because the existing code has lots of checks for isRecordType() [...]

Hm, maybe. Some of isRecordType don't have dependent type handling around them...

running computeNRVO unconditionally right here would also be correct?

Updated. Agree, scalar types should work, IIUC.

test/CodeGenCXX/nrvo.cpp
139

OK, I added one.

Some minor nits. I would also like to see a test for the unoptimized (-O0) IR generated by Clang for the case where there are two NRVO variables in the same function. Otherwise, this looks good to go!

lib/Sema/Scope.cpp
122–123

* on the right, please. Also auto -> Decl would be clearer and no longer. Is dyn_cast_or_null really necessary? (Can DeclsInScope contain nullptr?) I would expect that just dyn_cast would suffice.

lib/Sema/SemaDecl.cpp
12620–12624

* on the right, please.

tzik updated this revision to Diff 148570.May 25 2018, 2:57 AM
tzik marked 2 inline comments as done.

style fix. -O0 IR test.

tzik added inline comments.May 25 2018, 4:10 AM
lib/Sema/Scope.cpp
122–123

Done! Right, contents in DeclsInScope should be non-null, and dyn_cast should work well.

lib/Sema/SemaDecl.cpp
12620–12624

Done

xbolva00 added inline comments.
lib/Sema/Scope.cpp
128–129

auto * Parent = getParent();
if (Parent)

Parent>setNRVOCandidate(Candidate);

?

tzik updated this revision to Diff 148603.May 25 2018, 7:13 AM

reuse getParent() result

tzik added inline comments.May 25 2018, 7:16 AM
lib/Sema/Scope.cpp
128–129

Updated!

Just commenting to say that this LGTM and I have no further nitpicks. I have verified that I cannot detect any change in the behavior of -Wpessimizing-move or -Wreturn-std-move due to this change (and I can successfully detect the additional copy-elision due to this change, hooray).

rsmith accepted this revision.May 29 2018, 5:19 PM
This revision is now accepted and ready to land.May 29 2018, 5:19 PM

Thank you, do you need someone to commit this for you?

This revision was automatically updated to reflect the committed changes.
tzik added a comment.May 29 2018, 8:58 PM

Thank you, do you need someone to commit this for you?

No, I recently got the commit access to the repository.

Unfortunately this seems to miscompile, the stage1 built clang is crashing on the multistage buildbots:

http://lab.llvm.org:8011/builders/clang-s390x-linux-multistage/builds/2926 shows this crash present already at 333500
I've locally verified the crash and that that reverting this patch fixes it.

I'm going to revert it.

tzik added a comment.May 30 2018, 8:14 AM

Unfortunately this seems to miscompile, the stage1 built clang is crashing on the multistage buildbots:

http://lab.llvm.org:8011/builders/clang-s390x-linux-multistage/builds/2926 shows this crash present already at 333500
I've locally verified the crash and that that reverting this patch fixes it.

I'm going to revert it.

Ah... Thanks for taking care.

One of the sanitizer bots has more useful information on the problem: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/5743

Based on the ASan output, it looks like the miscompile is probably happening in SparseSolver<LatticeKey, LatticeVal, KeyInfo>::getValueState(LatticeKey) at around include/llvm/Analysis/SparsePropagation.h:240:

template <class LatticeKey, class LatticeVal, class KeyInfo>
LatticeVal
SparseSolver<LatticeKey, LatticeVal, KeyInfo>::getValueState(LatticeKey Key) {
  auto I = ValueState.find(Key);
  if (I != ValueState.end())
    return I->second; // Common case, in the map

  if (LatticeFunc->IsUntrackedValue(Key))
    return LatticeFunc->getUntrackedVal();
  LatticeVal LV = LatticeFunc->ComputeLatticeVal(Key);

  // If this value is untracked, don't add it to the map.
  if (LV == LatticeFunc->getUntrackedVal())
    return LV;
  return ValueState[Key] = LV;
}

I'm guessing we somehow get confused about whether LV is an NRVO variable during template instantiation, and apply NRVO to it despite there being a return of something else in its scope.

Slightly reduced testcase:

template<typename T> T f(T u, bool b) {
  T t;
  if (b) return t;
  return u;
}
struct X { X(); X(const X&); ~X(); void *data; };

template X f(X, bool);

... which compiles to:

X f(X u, bool b) {
  X::X(&return-slot);
  if (b) return;
  X::X(&return-slot, u);
  X::~X(&return-slot);
}

... due to t incorrectly being used as an NRVO variable.

tzik added a comment.May 31 2018, 5:41 AM

Slightly reduced testcase:

template<typename T> T f(T u, bool b) {
  T t;
  if (b) return t;
  return u;
}
struct X { X(); X(const X&); ~X(); void *data; };

template X f(X, bool);

... which compiles to:

X f(X u, bool b) {
  X::X(&return-slot);
  if (b) return;
  X::X(&return-slot, u);
  X::~X(&return-slot);
}

... due to t incorrectly being used as an NRVO variable.

Thanks for your investigation!
I thought that's covered by Scope::setNRVOCandidate with a null Candidate. As t is not marked as an NRVO variable before its instantiation. There's likely something I missed that turn the flag on while its instantiation.