This is an archive of the discontinued LLVM Phabricator instance.

[LifetimeAnalysis] Detect more cases when the address of a local variable escapes
AbandonedPublic

Authored by xazax.hun on Aug 20 2019, 10:22 AM.

Details

Reviewers
gribozavr
mgehre
Summary

This patch relaxes some of the checks so we can detect more cases where local variable escapes.

Diff Detail

Event Timeline

xazax.hun created this revision.Aug 20 2019, 10:22 AM

This change might be the cause for the false-positive in https://github.com/mgehre/llvm-project/issues/45. Could you check?

This change might be the cause for the false-positive in https://github.com/mgehre/llvm-project/issues/45. Could you check?

Sure, looking into it! Thanks for pointing this out.

Ok, I think I know what is going on.

Basically, we have a MutableArrayRef which is NOT a pointer as it is NOT annotated as gsl::Pointer. So in the AST we see a local non-pointer object, and we derive a pointer from this local object. This is why the warnings thinks that we return the address of a local variable. In case we do annotate MutableArrayRef to be a Pointer we will no longer see the warning.

Do you think we should stop the tracking after seeing a DerivedToBase cast? Or should we continue to report those errors so the users can add the missing annotation? I feel like this is something that is potentially up for a debate.,

gribozavr added inline comments.Aug 21 2019, 2:14 AM
clang/lib/Sema/SemaInit.cpp
6775

I'm afraid this change could disable some other analysis, which would hide other lifetime issues. For example, while 'ptr' can't itself dangle, if 'Temp().ptr' is bound to a local reference, it might be subject to complex lifetime extension rules, which this warning originally tried to check for.

6897

Same here.

xazax.hun marked an inline comment as done.Aug 21 2019, 8:59 AM
xazax.hun added inline comments.
clang/lib/Sema/SemaInit.cpp
6775

I understand your concern, and I thin this code path is very tricky as it is doing multiple things at once: lifetime extension and emitting warnings. There are two cases when pathOnlyInitializesGslPointer(Path) returns true:

  • We are initializing a gsl::Pointer, so no references involved, we should not do lifetime extension.
  • We are visiting the initialization of a reference (like walking a def-use chain) that was used to

In the second case, the reference initialization was already visited earlier (when we first saw the initialization and did not follow any use of the reference). In this earlier instance, since the reference is not a gsl::Pointer, the call to pathOnlyInitializesGslPointer is evaluated to false, and we did the lifetime extension.

To observe this behavior you can dump the AST for the following code:

namespace std {
template<typename T>
struct reference_wrapper {
  reference_wrapper(T &&);
};

template<typename T>
reference_wrapper<T> ref(T& t) noexcept;
}

std::reference_wrapper<const int> treatForwardingRefAsLifetimeConst() {
  const int &b = int(6);
  return std::ref(b); 
}

The result for me is:

`-FunctionDecl 0xfa5828 <line:34:1, line:37:1> line:34:35 treatForwardingRefAsLifetimeConst 'std::reference_wrapper<const int> ()'
  `-CompoundStmt 0xfa7848 <col:71, line:37:1>
    |-DeclStmt 0xfa5e58 <line:35:3, col:24>
    | `-VarDecl 0xfa5cf0 <col:3, col:23> col:14 used b 'const int &' cinit
    |   `-ExprWithCleanups 0xfa5de8 <col:18, col:23> 'const int' lvalue
    |     `-MaterializeTemporaryExpr 0xfa5db8 <col:18, col:23> 'const int' lvalue extended by Var 0xfa5cf0 'b' 'const int &'
    |       `-CXXFunctionalCastExpr 0xfa5d90 <col:18, col:23> 'int' functional cast to int <NoOp>
    |         `-IntegerLiteral 0xfa5d70 <col:22> 'int' 6
    `-ReturnStmt 0xfa7838 <line:36:3, col:20>
      `-ExprWithCleanups 0xfa7820 <col:10, col:20> 'std::reference_wrapper<const int>':'std::reference_wrapper<const int>'
        `-CXXConstructExpr 0xfa77f0 <col:10, col:20> 'std::reference_wrapper<const int>':'std::reference_wrapper<const int>' 'void (std::reference_wrapper<const int> &&) noexcept' elidable
          `-MaterializeTemporaryExpr 0xfa7798 <col:10, col:20> 'reference_wrapper<const int>':'std::reference_wrapper<const int>' xvalue
            `-CallExpr 0xfa7300 <col:10, col:20> 'reference_wrapper<const int>':'std::reference_wrapper<const int>'
              |-ImplicitCastExpr 0xfa72e8 <col:10, col:15> 'reference_wrapper<const int> (*)(const int &) noexcept' <FunctionToPointerDecay>
              | `-DeclRefExpr 0xfa7250 <col:10, col:15> 'reference_wrapper<const int> (const int &) noexcept' lvalue Function 0xfa7150 'ref' 'reference_wrapper<const int> (const int &) noexcept' (FunctionTemplate 0xfa5500 'ref')
              `-DeclRefExpr 0xfa5ed8 <col:19> 'const int' lvalue Var 0xfa5cf0 'b' 'const int &'

Please let me know if you had a different example in mind.

(Also, this unfortunately is a false negative example for now, as we tend to start with the less risky warnings :))

In the false-positive example, after the DerivedToBase, we see a constructor call which I think is the copy constructor.

  1. We should consider MutableArrayRef to be a gsl::Pointer according to the paper, because it publicly derives from one.
  2. Also in the paper, the copy constructor does should copies the pset of the argument instead of making the pointer point at the argument.

What do you think?

In the false-positive example, after the DerivedToBase, we see a constructor call which I think is the copy constructor.

Yeah, but we do not have problems with the copy constructors.

  1. We should consider MutableArrayRef to be a gsl::Pointer according to the paper, because it publicly derives from one.

So as soon as the user annotate a type as gsl::Pointer, this annotation should be applied to all derived classes? It would solve this particular issue. Also it feels a bit weird to change the ownership semantics in a derived class, I bet that would violate the Liskov substitution principle.

  1. Also in the paper, the copy constructor does should copies the pset of the argument instead of making the pointer point at the argument.

This is exactly what the warnings are doing now. If the first gsl::Pointer is constructed from a second gsl::Pointer, we will just check if the second gsl::Pointer will dangle at the and of the expression and so on.

Also it feels a bit weird to change the ownership semantics in a derived class, I bet that would violate the Liskov substitution principle.

And then we see that llvm::OwningArrayRef inherits from llvm::ArrayRef ... But maybe this direction (strengthening a Pointer into an Owner)
is okay. We'll need to go though the call rules, and see if something breaks when the caller passes an Owner to a function that takes a base class (i.e. Pointer).
My initial reaction is that this should work in general. An Owner is like a Pointer that points to {static}.

Also it feels a bit weird to change the ownership semantics in a derived class, I bet that would violate the Liskov substitution principle.

And then we see that llvm::OwningArrayRef inherits from llvm::ArrayRef ... But maybe this direction (strengthening a Pointer into an Owner)
is okay.

I am not sure. Consider the following code:

ArrayRef f() {
  ArrayRef r = getRef();
  return r;
}

According to the substitution principle I should be able to replace r with a derived class:

ArrayRef f() {
  OwningArrayRef r = getOwningRef();
  return r;
}

But this will introduce a lifetime issue.

When I understand you correctly, you are thinking about
the cases

OwningArrayRef getRef();

ArrayRef f() {
  ArrayRef r = getRef(); // warning: r points into temporary owner
  return r;
}

and

ArrayRef getRef() {
  OwningArrayRef r;
  return r;  // warning: returning reference to local variable
}

which we will both diagnose correctly with the current analysis. In which case would our analysis have problems with the fact that ArrayRef is a Pointer and the derived OwningArrayRef is a Owner?

Yeah, the analysis would work fine in this case. But that would mean that we should not propagate the Pointer annotations to derived classes as some of them in the wild do not follow the same ownership model.

Yes, it means that the automatic inference of Pointer/Owner from base classes has the same issues as all of those automatic inferences: Once can construct a case where they are not true.
So for having no false-positives at all, we should avoid this inference by default and not consider MutableArrayRef to be a gsl::Pointer.
Instead, we disable the analysis when it sees an unannotated class (like here, MutableArrayRef).

Eventually, we can explicitly add the correct annotations to the MutableArrayRef and OwningArrayRef source code,
and provide a command line option to opt into this kind of inference with possible false-positives.

Sounds good! Let's do that :)

gribozavr added inline comments.Aug 26 2019, 6:13 AM
clang/lib/Sema/SemaInit.cpp
6775

Thank you for the explanation. I don't feel I understand this code enough to accept such a subtle explanation for a change with potentially far-reaching consequences.

The "path" abstraction that this analysis is designed to catch problems with lifetime extension, and so far it seemed like we could tack our analysis onto it, but now it looks like the wrong abstraction for this purpose.

If you can find another reviewer who is more comfortable with this code, and they accept the patch, I have no objections though.

xazax.hun abandoned this revision.Aug 26 2019, 2:57 PM

It looks like this solution is not going to work in general. The problem is that it can be really hard to tell when it is valid to create a gsl::Pointer from an unannotated local type.

For example the code below is definitely buggy:

reference_wrapper<int> f() {
  int i;
  return i; // Invalid!
}

While the code below can be safe:

some_iterator f() {
  MyUnannotatedSpan local = ...;
  return std::begin(local); // this is fine
}

Note that, this problem will be solved once we have function annotations, as each constructor can be annotated what it actually does.