This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Differentiate lifetime extended temporaries
ClosedPublic

Authored by tomasz-kaminski-sonarsource on May 24 2023, 6:10 AM.

Details

Summary

This patch introduces a new CXXLifetimeExtendedObjectRegion as a representation
of the memory for the temporary object that is lifetime extended by the reference
to which they are bound.

This separation provides an ability to detect the use of dangling pointers
(either binding or dereference) in a robust manner.
For example, the ref is conditionally dangling in the following example:

template<typename T>
T const& select(bool cond, T const& t, T const& u) { return cond ? t : u; }

int const& le = Composite{}.x;
auto&& ref = select(cond, le, 10);

Before the change, regardless of the value of cond, the select() call would
have returned a temp_object region.
With the proposed change we would produce a (non-dangling) lifetime_extended_object
region with lifetime bound to le or a temp_object region for the dangling case.

We believe that such separation is desired, as such lifetime extended temporaries
are closer to the variables. For example, they may have a static storage duration
(this patch removes a static temporary region, which was an abomination).
We also think that alternative approaches are not viable.

While for some cases it may be possible to determine if the region is lifetime
extended by searching the parents of the initializer expr, this quickly becomes
complex in the presence of the conditions operators like this one:

Composite cc;
// Ternary produces prvalue 'int' which is extended, as branches differ in value category
auto&& x = cond ? Composite{}.x : cc.x;

// Ternary produces xvalue, and extends the Composite object
auto&& y = cond ? Composite{}.x : std::move(cc).x;

Finally, the lifetime of the CXXLifetimeExtendedObjectRegion is tied to the lifetime of
the corresponding variables, however, the "liveness" (or reachability) of the extending
variable does not imply the reachability of all symbols in the region.
In conclusion CXXLifetimeExtendedObjectRegion, in contrast to VarRegions, does not
need any special handling in SymReaper.

RFC: https://discourse.llvm.org/t/rfc-detecting-uses-of-dangling-references/70731

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald Transcript
tomasz-kaminski-sonarsource requested review of this revision.May 24 2023, 6:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 24 2023, 6:10 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Please leave a link in the summary to the RFC to discuss.

I generally support the idea of a CXXLifetimeExtendedObj region, definitely cleaner than a CXXTempObjectRegion with static lifetime.

NoQ added a comment.May 25 2023, 3:27 PM

I totally support this! There's no reason why the region wouldn't carry such information.

This patch is surprisingly tiny, but this matches my grep observations: almost nothing in the static analyzer treats CXXTempObjectRegion specially. Even in case of live symbols/regions analysis, it simply relies on how temporary region is bound in the Store to the lifetime-extending reference, so it doesn't need to be handled explicitly.

I see that there's no change in MoveChecker which treats CXXTempObjectRegion specially, and I suspect it may start emitting more warnings after this patch, but I'm not sure if it's a good thing or a bad thing. Could be worth experimenting (eg. by analyzing LLVM itself, it has enough move semantics to demonstrate potential problems).

Do you have any preferences on creating a common base class for CXXTempObjectRegion and CXXLifetimeExtendedObjectRegion? Or, as an opposite extreme, simply add a nullable ExD field to CXXTempObjectRegion instead of making a new class? Both could save us some code when these regions do need to be treated uniformly, but it doesn't look like a popular situation to worry about.

I totally support this! There's no reason why the region wouldn't carry such information.

This patch is surprisingly tiny, but this matches my grep observations: almost nothing in the static analyzer treats CXXTempObjectRegion specially. Even in case of live symbols/regions analysis, it simply relies on how temporary region is bound in the Store to the lifetime-extending reference, so it doesn't need to be handled explicitly.

Indeed. However, I think that this could be a side product of CXXTempObjectRegion not representing actual short lived temporaries.

I see that there's no change in MoveChecker which treats CXXTempObjectRegion specially, and I suspect it may start emitting more warnings after this patch, but I'm not sure if it's a good thing or a bad thing. Could be worth experimenting (eg. by analyzing LLVM itself, it has enough move semantics to demonstrate potential problems).

I have performed our internal test, which includes LLVM, and it has no changes in the results due to this patch.
For the MoveChecker, after inspecting the code I believe that we should exclude only CXXTempObjectsRegions from that checker, as it is certainly possible to perform operations on moved from lifetime extended object.

Do you have any preferences on creating a common base class for CXXTempObjectRegion and CXXLifetimeExtendedObjectRegion? Or, as an opposite extreme, simply add a nullable ExD field to CXXTempObjectRegion instead of making a new class? Both could save us some code when these regions do need to be treated uniformly, but it doesn't look like a popular situation to worry about.

Despite them, both being referred to as temporary, they are very different in nature. Thus I believe they should be considered separately by each checker, thus I prefer them to not be related. This is why I skipped Temp word in the name of the region.
If any, I would group them with VarRegion.

Updated MoveChecker to handle CXXLifetimeExtendedObjects, in this case they need to be handled the same as other VarRegions.

Remove excessive reformating

Fixed formatting to match clang-format

Added function to query stack frame for temporary object.

xazax.hun accepted this revision.Jul 4 2023, 3:22 AM

Overall looks good to me. I think the changes to the notes already make the analyzer more useful so there are some observable benefits to this patch.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
1280

Do we actually need this Expr in the memory region?

This revision is now accepted and ready to land.Jul 4 2023, 3:22 AM
tomasz-kaminski-sonarsource marked an inline comment as done.Jul 4 2023, 3:45 AM
tomasz-kaminski-sonarsource added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
1280

Yes they are needed for region identification, as one variable my be lifetime extending multiple temporary.
For illustration consider following example from test:

auto const& viaReference = RefAggregate{10, Composite{}}; // extends `int`, `Composite`, and `RefAggregate`
clang_analyzer_dump(viaReference);    // expected-warning-re {{&lifetime_extended_object{RefAggregate, viaReference, S{{[0-9]+}}} }}
clang_analyzer_dump(viaReference.rx); // expected-warning-re {{&lifetime_extended_object{int, viaReference, S{{[0-9]+}}} }}
clang_analyzer_dump(viaReference.ry); // expected-warning-re {{&lifetime_extended_object{Composite, viaReference, S{{[0-9]+}}} }}

viaReference declaration is extending 3 temporaries, and they get identified by expression.

tomasz-kaminski-sonarsource marked an inline comment as done.Jul 4 2023, 7:20 AM

Overall looks good to me. I think the changes to the notes already make the analyzer more useful so there are some observable benefits to this patch.

Also, user-after-move now correctly detect the use of of move-from lifetime extended temporaries.

Reformatting and link to revelant core issue.

xazax.hun accepted this revision.Jul 4 2023, 8:07 AM
xazax.hun added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
1280

Ah, OK. Makes sense, thanks!

xazax.hun added inline comments.Jul 4 2023, 8:15 AM
clang/lib/StaticAnalyzer/Checkers/MoveChecker.cpp
556

I think strictly speaking this might be "unsound" resulting in false positives in scenarios like:

void f(bool reset) {
  static T&& extended = getTemp();
  if (reset) {
    extended = getTemp();
    return;
  }
  consume(std::move(extended));
  f(true);
  extended.use();
}

In case the call to f(true) is not inlined (or in other cases there is some aliasing the analyzer does not know about), we might not realize that the object is reset and might falsely report a use after move.

But I think this is probably sufficiently rare that we do not care about those.

clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp
101

Is this a good wording for static lifetime extended temporaries?

tomasz-kaminski-sonarsource marked an inline comment as done.Jul 4 2023, 8:21 AM
tomasz-kaminski-sonarsource added inline comments.
clang/lib/StaticAnalyzer/Checkers/MoveChecker.cpp
556

Aren't such case already excluded by the isa<StackSpaceRegion>(MR->getMemorySpace()), in a same way as we do for variables?
In such case, we should be creating getCXXStaticLifetimeExtendedObjectRegion so it will not be located on stack.

clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp
101

Lifetime extended static temporaries should never enter here, as they are not in StackSpaceRegion.
Previously we have had CXXStaticTempObjects and now they are turned into CXXStaticLifetimeExtendedObjectRegion.

xazax.hun added inline comments.Jul 4 2023, 8:27 AM
clang/lib/StaticAnalyzer/Checkers/MoveChecker.cpp
556

Whoops, indeed, sorry.

clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp
101

Ah, my bad.