This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer][NFC] Avoid handling of LazyCompundVals in IteratorModeling
Needs ReviewPublic

Authored by baloghadamsoftware on Apr 1 2020, 10:17 AM.

Details

Summary

Since accessing the region of LazyCompoundVals is an undocumented and unreliable feature, try to find the region of the return value directly, skipping both the LazyCompoundVal and its later materialization.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Testing this patch on test/Analysis/iterator-modeling.cpp crashes with the following output:

Handling operator++()
  Return Value: lazyCompoundVal{0x55655390c9d8,i1}
  Bingo!
State->get<ObjectsUnderConstruction>(Key): &i1
Key.getItem().getKind(): 0
clang: /home/edmbalo/llvm-project/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:472: static clang::ento::ProgramStateRef clang::ento::ExprEngine::addObjectUnderConstruction(clang::ento::ProgramStateRef, const clang::ConstructionContextItem&, const clang::LocationContext*, clang::ento::SVal): Assertion `!State->get<ObjectsUnderConstruction>(Key) || Key.getItem().getKind() == ConstructionContextItem::TemporaryDestructorKind' failed.

...

What could be the problem here?

NoQ added a comment.Apr 1 2020, 11:28 AM

The assert points out that the object under construction is already tracked and you're trying to add it twice. You don't want to add it tho, you just want to find the region.

gamesh411 added inline comments.Apr 3 2020, 1:37 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h
436

Maybe indicating wheter the call returns a C++ record type should be done via using and Optional here (to modernize the api)?

clang/lib/StaticAnalyzer/Checkers/IteratorModeling.cpp
342

Debug comment left here :)

Lots of things seem to work now. However, I have a question: what to do with the return values of non-inlined calls? It may also be a LazyCompoundVal, but non-inlined calls do not have StackFrameContext. How to retrieve the ConstructionContext in such cases?

baloghadamsoftware marked an inline comment as done.Apr 3 2020, 6:06 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
112 ↗(On Diff #254765)

Maybe we should merge we should merge this function into handleConstructionContext()?

baloghadamsoftware added a comment.EditedApr 3 2020, 6:08 AM

Output of

/home/edmbalo/llvm-project/build/bin/clang -cc1 -internal-isystem /mnt/ssd/edmbalo/llvm-project/build/lib/clang/11.0.0/include -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer -std=c++11 -analyzer-checker=core,cplusplus,debug.DebugIteratorModeling,debug.ExprInspection -analyzer-config aggressive-binary-operation-simplification=true -analyzer-config c++-container-inlining=false /home/edmbalo/llvm-project/clang/test/Analysis/iterator-modeling.cpp

is the following:

Container PostCall
Original RetVal: lazyCompoundVal{0x55563691e490,i0}
  Retrieving Original
Updated RetVal: &i0
Generic PostCall
Original RetVal: lazyCompoundVal{0x55563691e490,i0}
  Retrieving Original
Updated RetVal: &i0
Container PostCall
Original RetVal: lazyCompoundVal{0x555636921ba0,temp_object{std::list<int>::const_iterator, S48444}}
  Retrieving Original
Updated RetVal: &temp_object{std::list<int>::const_iterator, S48444}
Generic PostCall
Original RetVal: lazyCompoundVal{0x555636921ba0,temp_object{std::list<int>::const_iterator, S48444}}
  Retrieving Original
Updated RetVal: &temp_object{std::list<int>::const_iterator, S48444}
Container PostCall
Original RetVal: lazyCompoundVal{0x555636922d28,i1}
    No Stack Frame!
  No construction context!!!
Generic PostCall
Original RetVal: lazyCompoundVal{0x555636922d28,i1}
    No Stack Frame!
  No construction context!!!
Container PostCall
Original RetVal: lazyCompoundVal{0x555636925348,i2}
  Retrieving Original
Updated RetVal: &i2
Generic PostCall
Original RetVal: lazyCompoundVal{0x555636925348,i2}
  Retrieving Original
Updated RetVal: &i2
Container PostCall
Original RetVal: conj_$3{long, LC1, S45775, #1}
Updated RetVal: conj_$3{long, LC1, S45775, #1}
Container PostCall
Original RetVal: Unknown
Updated RetVal: Unknown
Container PostCall
Original RetVal: conj_$11{long, LC1, S48640, #1}
Updated RetVal: conj_$11{long, LC1, S48640, #1}
Container PostCall
Original RetVal: Unknown
Updated RetVal: Unknown
Container PostCall
Original RetVal: 0 S64b
Updated RetVal: 0 S64b
Container PostCall
Original RetVal: Unknown
Updated RetVal: Unknown
Bind Old: lazyCompoundVal{0x555636925348,i1}
Bind New: &temp_object{std::list<int>::const_iterator, S49796}
Container PostCall
Original RetVal: lazyCompoundVal{0x555636921918,temp_object{std::list<int>::const_iterator, S49796}}
  Retrieving Original
  Handling New
Updated RetVal: &temp_object{std::list<int>::const_iterator, S49796}
Generic PostCall
Original RetVal: lazyCompoundVal{0x555636921918,temp_object{std::list<int>::const_iterator, S49796}}
  Retrieving Original
  Handling New
Updated RetVal: &temp_object{std::list<int>::const_iterator, S49796}
Container PostCall
Original RetVal: lazyCompoundVal{0x55563692ea10,i3}
    No Stack Frame!
  No construction context!!!
Generic PostCall
Original RetVal: lazyCompoundVal{0x55563692ea10,i3}
    No Stack Frame!
  No construction context!!!
Container PostCall
Original RetVal: 1 S64b
Updated RetVal: 1 S64b
Container PostCall
Original RetVal: Unknown
Updated RetVal: Unknown
Container PostCall
Original RetVal: 0 S64b
Updated RetVal: 0 S64b
Container PostCall
Original RetVal: Unknown
Updated RetVal: Unknown
Container PostCall
Original RetVal: 1 S64b
Updated RetVal: 1 S64b
Container PostCall
Original RetVal: Unknown
Updated RetVal: Unknown
Container PostCall
Original RetVal: conj_$3{long, LC1, S45775, #1}
Updated RetVal: conj_$3{long, LC1, S45775, #1}
Container PostCall
Original RetVal: Unknown
Updated RetVal: Unknown
Container PostCall
Original RetVal: 0 S64b
Updated RetVal: 0 S64b
Container PostCall
Original RetVal: Unknown
Updated RetVal: Unknown
Container PostCall
Original RetVal: conj_$11{long, LC1, S48640, #1}
Updated RetVal: conj_$11{long, LC1, S48640, #1}
Container PostCall
Original RetVal: Unknown
Updated RetVal: Unknown
/home/edmbalo/llvm-project/clang/test/Analysis/iterator-modeling.cpp:900:3: warning: Not a symbol
  clang_analyzer_denote(clang_analyzer_iterator_position(i1), "$i1");
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/edmbalo/llvm-project/clang/test/Analysis/iterator-modeling.cpp:904:3: warning: TRUE
  clang_analyzer_eval(clang_analyzer_iterator_validity(i0)); //expected-warning{{TRUE}}
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/edmbalo/llvm-project/clang/test/Analysis/iterator-modeling.cpp:905:3: warning: FALSE
  clang_analyzer_eval(clang_analyzer_iterator_validity(i1)); //expected-warning{{TRUE}}
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/edmbalo/llvm-project/clang/test/Analysis/iterator-modeling.cpp:906:3: warning: TRUE
  clang_analyzer_eval(clang_analyzer_iterator_validity(i2)); //expected-warning{{TRUE}}
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/edmbalo/llvm-project/clang/test/Analysis/iterator-modeling.cpp:908:3: warning: $L.begin()
  clang_analyzer_express(clang_analyzer_iterator_position(i0)); // expected-warning{{$L.begin()}}
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/edmbalo/llvm-project/clang/test/Analysis/iterator-modeling.cpp:909:3: warning: Not a symbol
  clang_analyzer_express(clang_analyzer_iterator_position(i1)); // expected-warning{{$i1}}
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/edmbalo/llvm-project/clang/test/Analysis/iterator-modeling.cpp:911:3: warning: $L.end()
  clang_analyzer_express(clang_analyzer_iterator_position(i2)); // expected-warning{{$L.end()}}
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
7 warnings generated.

More progress...

baloghadamsoftware marked 3 inline comments as done.Apr 6 2020, 4:04 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
127 ↗(On Diff #255279)

This does not find anything in the map for base initializers.

230 ↗(On Diff #255279)

This assertion seems to be unfounded. What to do with base initializers? However, in our case it seems that this function should never be executed because to the objects under construction are already in the map. However, I do not know how to handle base initializers.

clang/test/Analysis/iterator-modeling.cpp
1791 ↗(On Diff #255279)

We are crashing here (assertion) on calling the base initializer.

NoQ added a comment.Apr 6 2020, 4:15 AM

Uh-oh, it's annoying indeed that you can't easily obtain the current CFG element from inside a CallEvent.

Given that every CallEvent is in fact associated with a CFGElement (but not every CallEvent is associated with an Expr or have a Decl), can we modify the CallEvent structure to store a CFGElementRef instead of llvm::PointerUnion<const Expr *, const Decl *> Origin;? That might be some work but it should increase our sanity by a lot.

In D77229#1963485, @NoQ wrote:

Uh-oh, it's annoying indeed that you can't easily obtain the current CFG element from inside a CallEvent.

Given that every CallEvent is in fact associated with a CFGElement (but not every CallEvent is associated with an Expr or have a Decl), can we modify the CallEvent structure to store a CFGElementRef instead of llvm::PointerUnion<const Expr *, const Decl *> Origin;? That might be some work but it should increase our sanity by a lot.

Yes, that seems a good idea. For now I solved this problem, but storing something instead of searching for it linearily sound better in regards of performance. However my current problem is worse, I have no idea at all, how to solve it.

Another problem: parameters can also be LazyCompoundVals, but CallEvent::getParameterLocation() does not handle non-inlined calls.

NoQ added inline comments.Apr 6 2020, 10:05 AM
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
127 ↗(On Diff #255279)

Yup, that's because they're not handled through the objects-under-construction map. See ExprEngine::handleConstructor() - the switch on constructor kinds.

That said, i don't think you need to look it up from the map; just re-compute the region instead exactly how ExprEngine::handleConstructor() does it.

NoQ added inline comments.Apr 6 2020, 10:35 AM
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
127 ↗(On Diff #255279)

(this should be easier because it'll be exactly one line of code here, assuming the common part is factored out from handleConstructor())

Experimenting...

baloghadamsoftware marked an inline comment as done.Apr 7 2020, 4:51 AM

Any idea for LazyCompoundVal parameters of functions not inlined?

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
231–235 ↗(On Diff #255641)

Did you mean this piece of code? It returns &temp_object{struct simple_iterator_base, S44016}. Is this correct? If so, I will factor out this code and put it into a common function to be used by both this function and the original one.

Any idea for LazyCompoundVal parameters of functions not inlined?

OK, I can reach them from the ConstructionContext of the arguments. However, they are temporaries and for some strange reason not alive which means that checkDeadSymbols() removes the iterator positions associated to them before the postCall() of the call itself. How to correctly prolong their lives until the end of the postCall()?

OK, I can reach them from the ConstructionContext of the arguments. However, they are temporaries and for some strange reason not alive which means that checkDeadSymbols() removes the iterator positions associated to them before the postCall() of the call itself. How to correctly prolong their lives until the end of the postCall()?

Not so strange, of course, they are destructed before the postCall() as they should be, but the question still remains: how to keep them alive for post-checking the call, but not forever.

NoQ added a comment.Apr 7 2020, 10:18 AM

Not so strange, of course, they are destructed before the postCall() as they should be, but the question still remains: how to keep them alive for post-checking the call, but not forever.

Nope, temporaries are destroyed at the end of the full-expression, which is much later than PostCall.

What code are you debugging?

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
231–235 ↗(On Diff #255641)

No, this one's for members, we've been talking about base classes.

baloghadamsoftware marked an inline comment as done.Apr 7 2020, 12:14 PM
In D77229#1967269, @NoQ wrote:

Not so strange, of course, they are destructed before the postCall() as they should be, but the question still remains: how to keep them alive for post-checking the call, but not forever.

Nope, temporaries are destroyed at the end of the full-expression, which is much later than PostCall.

What code are you debugging?

OK, maybe I phrased it incorrectly. It is not destroyed, but not "live" anymore. Thus isLive() returns false. I am trying to debug vector_insert_begin() (line 954).

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
231–235 ↗(On Diff #255641)

Oh yes, I see it now. But which one then? Maybe line 585? Or the whole switch expression? Sorry, I am not sure I fully understand this piece of code.

baloghadamsoftware marked an inline comment as done.Apr 8 2020, 1:39 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
231–235 ↗(On Diff #255641)

Now it returns &Base{SymRegion{reg_$0<struct simple_derived_iterator * this>},simple_iterator_base}. Is it correct?

Another showstopper: the test container_modeling.cpp passed with analyzer option container-inlining=false. However it faild with container-inlining=true -DINLINE=1. The problem is that the CFGElement for function iterator begin() { return iterator(_start); } is just a CFGStmt and not CFGCXXRecordTypedCall. I started debugging CFGBuilder and its construction context is never created. The strange thing is that to have a construction context created the function consumeConstructionContext() should have been called for the function. This function is called by findConstructionContexts(), but this never happens for begin(), just for the construction inside it. Actually, VisitCallExpr() never calls this function, only for the arguments. I am totally lost here. What is wrong?

NoQ added a comment.Apr 8 2020, 7:18 AM

I am totally lost here. What is wrong?

Looks like a bug. Every constructor and every function that returns an object by value should have a construction context.

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
231–235 ↗(On Diff #255641)

I don't know. What code are you analyzing in this thread of discussion?

NoQ added a comment.Apr 8 2020, 7:19 AM

The problem is that the CFGElement for function iterator begin() { return iterator(_start); } is just a CFGStmt and not CFGCXXRecordTypedCall.

In which code? It's not about the function, it's about the caller context.

baloghadamsoftware marked an inline comment as done.Apr 8 2020, 10:33 AM
In D77229#1969524, @NoQ wrote:

The problem is that the CFGElement for function iterator begin() { return iterator(_start); } is just a CFGStmt and not CFGCXXRecordTypedCall.

In which code? It's not about the function, it's about the caller context.

In the very first test of container_modeling.cpp:

void begin(const std::vector<int> &V) {
  V.begin();

  clang_analyzer_denote(clang_analyzer_container_begin(V), "$V.begin()");
  clang_analyzer_express(clang_analyzer_container_begin(V)); // expected-warning{{$V.begin()}}
                                                             // expected-note@-1{{$V.begin()}}
}
clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
231–235 ↗(On Diff #255641)

Lines 1778-1813 of iterator-modeling.cpp. The only one test not commented out.

In D77229#1969524, @NoQ wrote:

The problem is that the CFGElement for function iterator begin() { return iterator(_start); } is just a CFGStmt and not CFGCXXRecordTypedCall.

In which code? It's not about the function, it's about the caller context.

Hmm, it seems that if I do not use the return value of the function explicitly, then no construction context is generated for it. Maybe this is how it should work so I have to modify the tests.

OK. I solved almost every problem except that checkDeadSymbols() removes the iterator position keyed by the temporary region representing the argument from the map before it is needed. Either I must somehow recognize that although the region is not "live" but the call whose argument it stores is not postchecked yet. I have no idea how to do this. Or, which I would prefer is to modify SymbolReaper so that isLive() returns true for temporary regions storing arguments of calls not postchecked yet. I have no idea either how to do this. Maybe should we take the Expr of CXXTempObjectRegion, use ParentMap to recognize whether it is a parameter for a FunctionDecl, but how to check whether the call for the function of the FunctionDecl is not postchecked yet? @NoQ, could you please, help me in this? Thx!

OK, I made some experiments. In function vector_insert_begin() of test file iterator-modeling.cpp, line 960 we pass i0 to std::vector::insert(). This function has no body at all so regardless of inlining of STL functions is enabled, this function is never inlined. Probably for this reason, because it does not have its StackFrame function handleConstructionContext() returns a C++ temporary at the end of the function. This temporary is not added to the construction context of the function thus it is not marked as live before removing dead symbols.

I see two ways now to resolve this. The first is to add these temporaries into the construction context. This requires the assertion in ExprEngine::finishArgumentConstruction() (around line 544, but I have some debug printouts there, at the moment so the line number may not be accurate) to be changed because it assumes that the region is a VarRegion. The second is somehow to recognize these temporary regions and mark them as alive based on the CallEvent by somehow checking the regions bound to its arguments but I wonder how exactly to do this.

@NoQ, I need you comments about the right direction about proceeding further.

NoQ added a comment.Apr 14 2020, 7:12 AM

Aha, ok, sounds like i thought that it's not worth it to inline the constructor for an argument when the call itself is not inlined, therefore i didn't model the construction context for this call. See also D49443#1193290. Your example shows that i was wrong to give up and this is something we should totally implement.

The proper solution is to model the target region in ExprEngine as a parameter region based on the ParmVarDecl that would have been used if the function was inlined, instead of the dummy temporary. Then it'll be kept alive as an object under construction.

The most annoying part is to make sure that you're using the same FunctionDecl consistently everywhere. If this turns out to be too annoying, we could also replace VarRegion{ParmVarDecl} with a new sort of region that doesn't include a specific ParmVarDecl but merely a parameter index (and maybe a type or a call site expr), so that not to bother with redeclarations; this is the right thing to do anyway.

In D77229#1980665, @NoQ wrote:

Aha, ok, sounds like i thought that it's not worth it to inline the constructor for an argument when the call itself is not inlined, therefore i didn't model the construction context for this call. See also D49443#1193290. Your example shows that i was wrong to give up and this is something we should totally implement.

The proper solution is to model the target region in ExprEngine as a parameter region based on the ParmVarDecl that would have been used if the function was inlined, instead of the dummy temporary. Then it'll be kept alive as an object under construction.

The most annoying part is to make sure that you're using the same FunctionDecl consistently everywhere. If this turns out to be too annoying, we could also replace VarRegion{ParmVarDecl} with a new sort of region that doesn't include a specific ParmVarDecl but merely a parameter index (and maybe a type or a call site expr), so that not to bother with redeclarations; this is the right thing to do anyway.

I am trying to understand where I have to implement this but I have some problems. The first is that when a function is not inlined it does not have a StackFrameContext. What to use instead? I usually try to use the caller's stack frame but it seems to be incorrect and causes assertions. Or should we create a stack frame for non-inlined functions as well?

Where are the parameter regions created? In case ConstructionContext::ArgumentKind of function ExprEngine::handleConstructionContext()? Or in CallEvent::getParameterLocation()? Or somewhere else? I tried to look at D49443, especially the diff where you gave up regions for parameters of non-inlined functions but there you only had changes in CallEvent.

For using the same FunctionDecl: would it help if we always use the "first" Decl in the chain?

NoQ added a comment.Apr 15 2020, 7:58 AM

I am trying to understand where I have to implement this but I have some problems. The first is that when a function is not inlined it does not have a StackFrameContext. What to use instead? I usually try to use the caller's stack frame but it seems to be incorrect and causes assertions. Or should we create a stack frame for non-inlined functions as well?

Even if the function is inlined, we need the stack frame context before it's actually created (in fact, it's needed even before we construct the arguments). The whole point of D49443 was to learn how to pro-actively create stack frame contexts and lifetime-extend it back in time so that they were live until the call starts (where they actually become live). This part doesn't really change in your case.

For using the same FunctionDecl: would it help if we always use the "first" Decl in the chain?

The word you're looking for is "canonical decl". The difficulty is not finding such decl but it's making sure it's used consistently, otherwise you won't be able to look up the parameter.

I commented out lines which prevent creation of StackFrameContext for non-inlined functions. Now everything works regarding the iterator checkers. I also simplified things and created new methods for CallEvent: getArgObject() and getReturnObject().

Two tests fail now: temporaries.cpp emits TRUE in lines 894 and 918 (I wonder whether they are correct) and explain-svals.cpp emits lazily frozen compound value of parameter '' in line 96 which is surely incorrect.

The main problem remains to solve is what we already discussed: either to find always the same Decl or create a new kind of region for arguments.

My first attempt for a new special kind of region for parameters of functions without definitions. Currently it causes more failed tests and assertions than ignoring the problem of different Decls. I will investigate these problems tomorrow, but this path seems difficult.

NoQ added a comment.Apr 21 2020, 5:41 PM

My first attempt for a new special kind of region for parameters of functions without definitions. Currently it causes more failed tests and assertions than ignoring the problem of different Decls. I will investigate these problems tomorrow, but this path seems difficult.

Thank you!! I recommend a separate patch for this :) Yes it's a bit intrusive because you'll need to add support for the new region in the RegionStore but there's nothing extraordinary here as most of the time it'll be treated the same way as the old parameter region.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h
438–442

I believe these functions don't need to be optional. There's always a parameter location and a location to return to (even if the latter is incorrect due to unimplemented construction context, it'll still be some dummy temporary region that you can use).

clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
1044 ↗(On Diff #259060)

There should be only one way to express the parameter region. Let's call this one simply ParamRegion or something like that, and assert(!isa<ParmVarDecl>(D)) in the constructor of VarRegion.

1066 ↗(On Diff #259060)

This should be the type of the parameter, not the return type of the call.

clang/lib/StaticAnalyzer/Checkers/IteratorModeling.cpp
225

Do you still need checkBind at all?

clang/lib/StaticAnalyzer/Core/CallEvent.cpp
273–274

This function obviously does not depend on BlockCount. You might as well always use 0 as BlockCount.

The ideal solution, of course, is for CallEvent to keep a CFGElementRef from the start.

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
559 ↗(On Diff #259060)

We could still dump CallE->getID() and Index.

1208 ↗(On Diff #259060)

StackArguments.

First version that seems to be working :-)

baloghadamsoftware marked 7 inline comments as done.Apr 22 2020, 7:16 AM

Thank you for your comments. Of course I plan to upload all these changes in at least three different patches. But first I needed a version that is working. However if I split this up I will have to write lots of unit tests.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h
438–442

Do you mean we should return the original SVal if parameter or return value location fails? (Thus LazyCompoundVal in worst case?

clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
1044 ↗(On Diff #259060)

Do you mean that we should use ParamRegion in every case, thus also when we have the definitioan for the function? I wonder whether it breaks too many things.

1066 ↗(On Diff #259060)

Yes, sorry, my mistake.

clang/lib/StaticAnalyzer/Core/CallEvent.cpp
273–274

I plan to make that change in a separate patch.

baloghadamsoftware marked 2 inline comments as done.Apr 23 2020, 12:31 AM
baloghadamsoftware added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
1044 ↗(On Diff #259060)

This will surely not work. The common handling of ParamVarDecl and VarDecl is soo deeply rooted in the whole analyzer that separating them means creation of a totally new analyzer engine from scratch.

baloghadamsoftware marked an inline comment as done.Apr 23 2020, 2:34 AM
baloghadamsoftware added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
1044 ↗(On Diff #259060)

More specifically: whenever a function is inlined, its parameters are used as variables via DeclRefExprs. A DeclRefExpr refers to a Decl which is a ParamVarDecl but that has reference neither for the CallExpr (since it is not related to the call, but to the FunctionDecl or ObjCMethodDecl) nore for its Index in the call. The former is the real problem that cannot be solved even on theoretical level: a function which is inlined cannot depend on the different CallExprs where it is called. Even worse, if the function is analyzed top-level it has not CallExpr at all so using ParamRegion for its parameters is completely impossible.

NoQ added inline comments.Apr 23 2020, 5:27 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h
438–442

It simply should not fail.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
1044 ↗(On Diff #259060)

A DeclRefExpr refers to a Decl which is a ParamVarDecl but that has reference neither for the CallExpr (since it is not related to the call, but to the FunctionDecl or ObjCMethodDecl)

The call site is available as part of the current location context.

nore for its Index in the call

It's available as part of ParmVarDecl.

The former is the real problem that cannot be solved even on theoretical level: a function which is inlined cannot depend on the different CallExprs where it is called

It already depends on the CallExpr. ParmVarDecl-based VarRegions are different for different call sites even if ParmVarDecl is the same; moreover, they reside in non-overlapping memory spaces.

Even worse, if the function is analyzed top-level it has not CallExpr at all so using ParamRegion for its parameters is completely impossible.

Ok, let's make an exception for this case and either use the old VarRegion (given that there's no redecl confusion in this case) or allow the CallExpr to be null (it's still trivially easy to extract all the necessary information).

The common handling of ParamVarDecl and VarDecl is soo deeply rooted in the whole analyzer that separating them means creation of a totally new analyzer engine from scratch.

I don't see any indication of that yet.

baloghadamsoftware marked an inline comment as done.Apr 24 2020, 5:39 AM
baloghadamsoftware added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
1044 ↗(On Diff #259060)

or allow the CallExpr to be null

How do we calculate the type then? Or should we store it explicitly?

baloghadamsoftware marked an inline comment as done.Apr 24 2020, 7:49 AM
baloghadamsoftware added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
1044 ↗(On Diff #259060)

I don't see any indication of that yet.

But I do. Even if I fix most of the crashes almost all the tests fail for different reasons.

But I also have crashes: what to do with functions called through pointers? These calls do not have direct callee, thus determining the type of the parameter based on the call is difficult. If the function has no prototype, then it is impossible. Maybe we should store the type explicitely indeed?

All regions of parameters are ParamRegions, type stored explicitly, no crashes but 130 of 550 analyzer tests failing.

I spent some time on thinking about this during the weekend. It is no wonder that the tests fail. It seems that many parts of the Analyzer core exploit the fact that the MemRegions of parameters are VarRegions. This is no wonder because all ParamVarDecl are VarDecl and in an inlined function parameters behave like variables. This is the correct behavior which I think should not be changed. Even if I put weeks or months of work into it the code will be worse at the end because there will be lots of branches upon the nature of the variable and alternative ways for reaching the Decl from their regions. This is not the proper way, MemRegions for parameters in inlined functions should remain VarRegions because parameters are variables inside. On the other hand, from outside it does not matter what kind of regions they are. We mainly need the region there to use it as key for GDM maps. That is why ParamRegions should only be created for functions that are not inlined. If they have no definition then they cannot be inlined so a ParamRegion is OK for them. However, if they have a definition then there is no problem finding the same Decl because the definition is that same Decl that we use everywhere so it can and should remain a VarRegion.

NoQ added a comment.Apr 27 2020, 5:01 AM

How do we calculate the type then?

Argument expression type + account for value kind (lvalue argument expression means reference parameter, xvalue argument expression mean rvalue reference parameter).

Note that for C-style variadic functions, parameter declarations for variadic arguments are also not available. It doesn't mean that we will never be able to analyze them. We can analyze everything: after all, we can CodeGen everything from the AST. On the contrary, the solution you're implementing now would work for C-style variadic functions as well, finally allowing us to inline them.

NoQ added a comment.Apr 27 2020, 5:11 AM

Even if I put weeks or months of work into it the code will be worse at the end because there will be lots of branches upon the nature of the variable and alternative ways for reaching the Decl from their regions.

What specifically do you want from the Decl? Is it the type? We have it. Is it the very fact that it's a parameter? We have it. Is it the index of the parameter? We have it. Is it the name of the parameter? It's so irrelevant that it's not even necessarily consistent across redeclarations, but if you really need it you may try to obtain the Decl whenever it's available, as the amount of cases when the Decl is available hasn't changed. What else does a Decl have to offer?

NoQ added a comment.Apr 27 2020, 5:17 AM

no crashes but 130 of 550 analyzer tests failing.

You most likely need to fill in the reachibility and liveness code with the behavior of the new region. I don't seem to see it in your current code but their default behavior is clearly incorrect; instead, they should behave exactly like the old region used to.

In D77229#2005042, @NoQ wrote:

no crashes but 130 of 550 analyzer tests failing.

You most likely need to fill in the reachibility and liveness code with the behavior of the new region. I don't seem to see it in your current code but their default behavior is clearly incorrect; instead, they should behave exactly like the old region used to.

That sound a lot of code duplication for me.

In D77229#2005042, @NoQ wrote:

no crashes but 130 of 550 analyzer tests failing.

You most likely need to fill in the reachibility and liveness code with the behavior of the new region. I don't seem to see it in your current code but their default behavior is clearly incorrect; instead, they should behave exactly like the old region used to.

Do you mean the getBinding...() function of RegionStore and the isLive...() functions of SymbolManager?

NoQ added a comment.Apr 27 2020, 7:29 AM

This is not the proper way, MemRegions for parameters in inlined functions should remain VarRegions because parameters are variables inside. On the other hand, from outside it does not matter what kind of regions they are. We mainly need the region there to use it as key for GDM maps. That is why ParamRegions should only be created for functions that are not inlined. If they have no definition then they cannot be inlined so a ParamRegion is OK for them. However, if they have a definition then there is no problem finding the same Decl because the definition is that same Decl that we use everywhere so it can and should remain a VarRegion.

I'm not opposed to this solution. I'm not sure it's actually easier; i agree that it may even be less code but because i already tried to do this and failed, i'm worried that the tail of regressions may actually be longer.

That said, as of now you cannot know in advance whether the function will be inlined or not. "Having a definition" is probably the best approximation that you'll ever get.

Also it complicates the construction of the region, given that in order to figure out whether we have a definition you suddenly need access to the Environment (in case of inlined calls by function pointers you cannot figure out from the AST whether we have a definition or not) which is a fairly unfortunate layering violation. Is such information always available in all the places in which we want to reconstruct the region? Will it always be available? How will you ensure that?

Also please check if the following example behaves correctly:

class C { ... };

void foo(C, int) {}

int set_foo_ptr(void (**func)(C, int)) {
  *func = foo;
  return 0;
}

void bar() {
  void (*func)(C, int);
  func(C(), set_foo_ptr(&func));
}

What's the evaluation order here? Do we actually invoke foo() from bar() here or do we read from the func variable before assigning into it thus causing a call to uninitialized pointer? In the former case your plan is doomed because you need a region into which C() lands earlier than you get to know any Decl at all.

NoQ added a comment.Apr 27 2020, 7:31 AM
In D77229#2005042, @NoQ wrote:

no crashes but 130 of 550 analyzer tests failing.

You most likely need to fill in the reachibility and liveness code with the behavior of the new region. I don't seem to see it in your current code but their default behavior is clearly incorrect; instead, they should behave exactly like the old region used to.

Do you mean the getBinding...() function of RegionStore and the isLive...() functions of SymbolManager?

Yup, you're right, i should have mentioned basically the whole RegionStore's load/store process instead of just the escape/invalidation process that relies on reachibility.

In D77229#2005033, @NoQ wrote:

How do we calculate the type then?

Argument expression type + account for value kind (lvalue argument expression means reference parameter, xvalue argument expression mean rvalue reference parameter).

OK, I reid this piece of code instead of storing the type explicitly:

QualType ParamRegion::getValueType() const {
  const Expr *Arg = nullptr;
  if (const auto *CE = dyn_cast<CallExpr>(OriginExpr)) {
    Arg = CE->getArg(Index);
  } else if (const auto *CCE = dyn_cast<CXXConstructExpr>(OriginExpr)) {
    Arg = CCE->getArg(Index);
  } else if (const auto *OCME = dyn_cast<ObjCMessageExpr>(OriginExpr)) {
    Arg = OCME->getArg(Index);
  } else if (const auto *CNE = dyn_cast<CXXNewExpr>(OriginExpr)) {
    Arg = CNE->getPlacementArg(Index);
  } else {
    // FIXME: Any other kind of `Expr`?
    llvm_unreachable("Maybe we forgot something...");
  }

  assert(Arg);

  switch (Arg->getValueKind()) {
  case VK_RValue:
    return Arg->getType();
  case VK_LValue:
    return getContext().getLValueReferenceType(Arg->getType());
  case VK_XValue:
    return getContext().getRValueReferenceType(Arg->getType());
  default:
    llvm_unreachable("Invalid value kind.");
  }
}

This results in the following assertion:

llvm-project/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:333: clang::ento::ProgramStateRef clang::ento::ExprEngine::createTemporaryRegionIfNeeded(clang::ento::ProgramStateRef, const clang::LocationContext*, const clang::Expr*, const clang::Expr*, const clang::ento::SubRegion**): Assertion `!InitValWithAdjustments.getAs<Loc>() || Loc::isLocType(Result->getType()) || Result->getType()->isMemberPointerType()' failed.

And this one:

'Assume' not implemented for this NonLoc
UNREACHABLE executed at /home/edmbalo/llvm-project/clang/lib/StaticAnalyzer/Core/SimpleConstraintManager.cpp:67!

Furthermore, there are also problems with CXXNewExpr:

llvm-project/clang/include/clang/AST/ExprCXX.h:2239: clang::Expr* clang::CXXNewExpr::getPlacementArg(unsigned int): Assertion `(I < getNumPlacementArgs()) && "Index out of range!"' failed.

Furthermore, there are also problems with CXXNewExpr:

llvm-project/clang/include/clang/AST/ExprCXX.h:2239: clang::Expr* clang::CXXNewExpr::getPlacementArg(unsigned int): Assertion `(I < getNumPlacementArgs()) && "Index out of range!"' failed.

This one seems to be resolved by subtracting 1 from Index since the first argument is not a placement argument but the size.

llvm-project/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:333: clang::ento::ProgramStateRef clang::ento::ExprEngine::createTemporaryRegionIfNeeded(clang::ento::ProgramStateRef, const clang::LocationContext*, const clang::Expr*, const clang::Expr*, const clang::ento::SubRegion**): Assertion `!InitValWithAdjustments.getAs<Loc>() || Loc::isLocType(Result->getType()) || Result->getType()->isMemberPointerType()' failed.

For this one I have to increment the Index by one if it is a member call. Either at the creation of the region or at getting the type. However, after an hour of research I found no way to determine this for operator calls. An operator call is CXXOperatorCallExpr without a method that tells whether it is a member or a non-member operator. This is a problem because ParmVarDecl::getFunctionScopeIndex() returns the real index, but CallExpr::getArg() takes the "raw" index, where 0 means the implicit this argument for member calls.

I implemented the basic isLive() and getBinding() functions which reduced the number of failing tests by 0. I also implemented dynamic calculation of the type which increased the number of failing tests by 4. The reason is probably that when ParamRegion is created in Expr::VisitCommonDeclRefExpr() we cannot retrieve information whether the call is a C++ member operator call. If it is, we should shift the index by 1. Using the wrong index and thus returning the wrong type causes failing tests and crashes (assertions).

NoQ added a comment.Apr 27 2020, 12:57 PM

If your change causes a crash, please debug it. That's a completely normal thing that programmers do every day. Unfortunately, I can't afford to spoon-feed you the solution step-by-step.

NoQ added inline comments.Apr 27 2020, 12:57 PM
clang/lib/StaticAnalyzer/Core/SymbolManager.cpp
601–603 ↗(On Diff #260387)

I implemented the basic isLive() and getBinding() functions which reduced the number of failing tests by 0.

This line in particular is very incorrect. In fact, most of the time the result will be false.

In D77229#2006106, @NoQ wrote:

If your change causes a crash, please debug it. That's a completely normal thing that programmers do every day. Unfortunately, I can't afford to spoon-feed you the solution step-by-step.

As I wrote, I debugged and analyzed it and I am facing a problem for which I need help. Operator calls are very poorly implemented in the AST.

As I wrote, I debugged and analyzed it and I am facing a problem for which I need help. Operator calls are very poorly implemented in the AST.

Yes, indeed very poorly. A C++ operator call is either implemented as CXXOperatorCall or CXXMethodCall randomly. In the former case the this parameter is explicit at the beginning, in the latter case it is implicit. I managed to solve it using this piece of code:

const auto *Call = cast<Expr>(CallSite);
unsigned Index = PVD->getFunctionScopeIndex();
// For `CallExpr` of C++ member operators we must shift the index by 1
if (isa<CXXOperatorCallExpr>(Call)) {
  if (const auto *CMD = dyn_cast<CXXMethodDecl>(SFC->getDecl())) {
    if (CMD->isOverloadedOperator())
      ++Index;
  }
}

But this is not the only problem with indices. I have no solution for the following call (is it really a call?) resulting in an assertion because overindexing in nullability.mm:

Dummy *_Nonnull (^myblock)(void) = ^Dummy *_Nonnull(void) {

Even if I try to get the FunctionDecl it still says that it has 0 parameters, exactly what CallExpr says, but we have a parameter here with index of 0. How can it be? How should I handle this strange case? I only have very basic Objective-C knowledge.

In D77229#2006106, @NoQ wrote:

If your change causes a crash, please debug it. That's a completely normal thing that programmers do every day. Unfortunately, I can't afford to spoon-feed you the solution step-by-step.

I do not require it, but I noticed to you at the very beginning that I need assistance. I work on the analyzer for less than 5 years, and not full time, because I also create Tidy checkers sometimes, I also have our internal stuff (most of which I would like to upstream). As far as I know, you have more than 10 years experience in the Analyzer. I am already have deep knowledge in some parts (e.g. constraint handling, AST), I am somewhat familiar with others (SValBuilder, ExprEngine, BugReporter etc.) but I never touched regions and store.

NoQ added a comment.Apr 28 2020, 12:50 AM

Yes, indeed very poorly. A C++ operator call is either implemented as CXXOperatorCall or CXXMethodCall randomly. In the former case the this parameter is explicit at the beginning, in the latter case it is implicit.

We have methods on CallEvent to handle this, namely getAdjustedParameterIndex() and getASTArgumentIndex(). They were added in D49443.

But this is not the only problem with indices. I have no solution for the following call (is it really a call?) resulting in an assertion because overindexing in nullability.mm:

Dummy *_Nonnull (^myblock)(void) = ^Dummy *_Nonnull(void) {

Even if I try to get the FunctionDecl it still says that it has 0 parameters, exactly what CallExpr says, but we have a parameter here with index of 0. How can it be? How should I handle this strange case? I only have very basic Objective-C knowledge.

These are blocks. They are like lambdas, just different syntax. They aren't part of Objective-C; they're a non-standard extension to C supported by both gcc and clang.

The line you quoted is not a call, it only defines a block and assigns it into a variable of block pointer type. Here's the full code of the test:

358 void testNilReturnWithBlock(Dummy *p) {
359   p = 0;
360   Dummy *_Nonnull (^myblock)(void) = ^Dummy *_Nonnull(void) {
361     return p; // TODO: We should warn in blocks.
362   };
363   myblock();
364 }

The block is invoked on line 363. Parameter p of the surrounding function is accessible from the block because it was captured; it is not a parameter of the block itself.

As far as I know, you have more than 10 years experience in the Analyzer. I work on the analyzer for less than 5 years, and not full time, because I also create Tidy checkers sometimes, I also have our internal stuff (most of which I would like to upstream).

I started in mid-2014. I too am regularly distracted to stuff that isn't particularly educational about the analyzer.

Oh, it is not only Objective-C. I have been programming in C for 25+ years and teaching it for 15+ years, but I never met such syntactical construction:

dispatch_sync(queue, ^(void){

Here it is worse than at the Objective-C code above, because the index is 1, but the number of args is 0. Thus this is an overindexing by 2. What are these strange constructions and how to get parameter type if they have no arguments? (Even if I get the function declaration from the stack frame, they have no parameters.) How can we have a parameter for them? How to handle them, how to return the type for a parameter that should not exist but it does?

In D77229#2007125, @NoQ wrote:

Yes, indeed very poorly. A C++ operator call is either implemented as CXXOperatorCall or CXXMethodCall randomly. In the former case the this parameter is explicit at the beginning, in the latter case it is implicit.

We have methods on CallEvent to handle this, namely getAdjustedParameterIndex() and getASTArgumentIndex(). They were added in D49443.

Unfortunately, in ExprEngine::VisitCommonDeclRefExpr we do not have CallEvent, only CallExpr.

These are blocks. They are like lambdas, just different syntax. They aren't part of Objective-C; they're a non-standard extension to C supported by both gcc and clang.

The line you quoted is not a call, it only defines a block and assigns it into a variable of block pointer type. Here's the full code of the test:

358 void testNilReturnWithBlock(Dummy *p) {
359   p = 0;
360   Dummy *_Nonnull (^myblock)(void) = ^Dummy *_Nonnull(void) {
361     return p; // TODO: We should warn in blocks.
362   };
363   myblock();
364 }

The block is invoked on line 363. Parameter p of the surrounding function is accessible from the block because it was captured; it is not a parameter of the block itself.

Sorry, all my studies were about standard things only. Everything else was banned. This is the philosophy of my university, it seems.

What do you suggest, how to retrieve the type for them? We stored the CallExpr as OriginExpr for the region. Should we store something else in case of blocks, or should we handle this in getType()? How to distinguish parameters of the block from the captured parameters of the function? This case sounds very complicated to handle.

baloghadamsoftware marked an inline comment as done.Apr 28 2020, 1:12 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/SymbolManager.cpp
601–603 ↗(On Diff #260387)

OK, but what should I check for liveness here? We do not have a Decl. We could retrieve the Expr for the Arg, except for CXXNewOperator because there the size argument cannot be retrieved.

Getting the type of parameter regions has no crashes on the test suite.

Wrong diff uploaded previously.

baloghadamsoftware marked 5 inline comments as done.Apr 28 2020, 9:02 AM

Very slow progress, but lots of open questions. @NoQ, please try to answer them.

clang/lib/StaticAnalyzer/Core/CallEvent.cpp
558

Is this the correct handling of this kind of calls?

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
2470

What to do with function parameters captured by a block? I tried to search for the owner function backwards in the chain of stack frames, but I did not find it.

2478

Here we have no CallEvent so no better came to my mind.

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
204 ↗(On Diff #260659)

This whole branch should be moved to a separate function called e.g. getArgExpr(). But what to do with param 0 of CXXNewExpr()? It is the size parameter for which we do not have an Expr. For the greater indices we subtract 1 from the Index.

clang/lib/StaticAnalyzer/Core/SymbolManager.cpp
601–603 ↗(On Diff #260387)

Here could help the function getArgExpr() of ParamRegion, I suppose. We should put it into the place of getOriginExpr(). Is this correct?

Now I made some good progress, thank you @NoQ for your suggestions. Not all of them are implemented yet (currently the +-1 with the indices seem to be working, thus I will fix that part later), but now I managed to reduce failing tests from 340 to 19. Most of these tests are about missing notes, so the next task is handling the new ParamRegion in the bug reporter visitors. I already made some progress there too.

Bug reporter visitors updated, failing tests reduced from 19 to 6.

NoQ added inline comments.May 3 2020, 2:06 PM
clang/lib/StaticAnalyzer/Core/CallEvent.cpp
558

It's not ideal because we're producing a different region for the inherited constructor whereas in reality it's the same region as in the inheriting constructor.

Let's not dig too deeply into this though; i'll be perfectly happy with doing whatever doesn't crash and adding a FIXME for later. I'm pretty sure my previous patch is also incorrect for the same reason.

A few tests won't hurt though, and they might help you find the right approach as well. After all, the analyzer is a C++ interpreter; all questions about "is this the correct handling...?" can be answered by comparing the behavior of the analyzer with the behavior of the actual program at run-time.

Only 4 tests failing, but still lot to do.

Fixed and error in call_once.cpp but stuck with debugging of objc-radar17039661.m.

Wrong diff uploaded.

baloghadamsoftware marked an inline comment as done.May 6 2020, 11:53 AM

See the difference between the correct (master branch) and the incorrect (my attempt) outputs.

clang/lib/StaticAnalyzer/Core/Store.cpp
472 ↗(On Diff #262431)

Probably this loop could be written better, without break at the end, but for now it des what it should do. For captured parameters of functions and blocks we must look for the original CallExpr and LocationContext. If it does not exist (we analyze the block of the lambda top-level) we revert to VarRegion since the captured parameters are simple variales for the block or lambda. However, we cannot do this if the block or lambda is not analyzed top-level, thus the approach I use above seems to be the correct one. However, this completely breaks the test objc-radar17039661.m. Even order of the postCall() hooks is changed and the test fails because it cannot find the bug. I try to attach the two different outputs annotated by debug printouts. @NoQ, do you have an idea what could be wrong here? First I thought on BlockDataRegions where it seems I have to duplicate lots of code and also change the capture interface to also enable ParamRegions for the captures. However, in this case it does not seem to play a role.

I also tried to compare the exploded graphs, but unfortunately the rewriter Python script crashed on them.

baloghadamsoftware marked an inline comment as done.May 6 2020, 10:47 PM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/Store.cpp
472 ↗(On Diff #262431)

The really strange thing is that I originally used a recoursive approach here, instead of the loop, which I still believe is the right one. However, in that case the test failed even if I removed all creations of ParamRegions. The only difference then was that the LocationContext of the captured region was the top-level LocationContext. This alone changed the calling order of the checker hooks and this happens here as well. It is not the VarRegion vs ParamRegion problem but the LocationContext of the region. I still do not see why this influences the calling order of these hooks. I am already debugging it for almost 15 hours without any clue.

Added support for ParamRegions as original region for captured variables by blocks and lambdas. Only 2 failing tests left.

baloghadamsoftware marked an inline comment as done.May 8 2020, 2:52 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/Store.cpp
472 ↗(On Diff #262431)

Problem solved. Captured variables are always variables, even if they were paramseters originally. I added support for binding the old values to the new regions upon capture for ParamRegions as well. This solved the failing test.

baloghadamsoftware retitled this revision from [Analyzer][WIP] Avoid handling of LazyCompundVals in IteratorModeling to [Analyzer] Avoid handling of LazyCompundVals in IteratorModeling.
baloghadamsoftware edited the summary of this revision. (Show Details)

Rebased to previous patches and updated according to them.

No new functions in CallEvent needed, instead new functions in the iterator library.

Rebased on master which now contains the handling of pointers as iterators. All tests passed.

Crash Fix: do not retrieve first iterator argument in advance. (It may happen that although argument count is more than zero, there is no argument at all.)

else branch merged with inner if.

Aside from infrastructural questions which I am not qualified ( nor particularly knowledgeable :3 ) to address, this looks good to me.

clang/lib/StaticAnalyzer/Checkers/STLAlgorithmModeling.cpp
77 ↗(On Diff #282876)

999 seems a bit arbitrary here, consider using std::numeric_limits<unsigned>::max(), or llvm::Optional<unsigned>.

Updated according to the latest comment.

baloghadamsoftware marked an inline comment as done.Aug 13 2020, 4:06 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Checkers/STLAlgorithmModeling.cpp
77 ↗(On Diff #282876)

I refactored this part. The earlier approach was very ugly and non-standard. I just intended to make it work as quickly as possible, but I forgot to refactor after that. Thank you for noticing it!

gamesh411 accepted this revision.Aug 14 2020, 12:36 AM

Thanks! LGTM now.

This revision is now accepted and ready to land.Aug 14 2020, 12:36 AM
NoQ added inline comments.Sep 21 2020, 11:38 PM
clang/lib/StaticAnalyzer/Checkers/Iterator.cpp
330–336 ↗(On Diff #285326)

I still believe you have not addressed the problem while moving the functions from D81718 to this patch. The caller of this function has no way of knowing whether the return value is the prvalue of the iterator or the glvalue of the iterator.

Looks like most callers are safe because they expect the object of interest to also be already tracked. But it's quite possible that both are tracked, say:

Container1<T> container1 = ...;
Container2<Container1::iterator> container2 = { container1.begin() };
container2.begin(); // ???

Suppose Container1::iterator is implemented as an object and Container2::iterator is implemented as a pointer. In this case getIteratorPosition(getReturnIterator()) would yield the position of container1.begin() whereas the correct answer is the position of container2.begin().

This problem may seem artificial but it is trivial to avoid if you simply stop defending your convoluted solution of looking at value classes instead of AST types.

clang/lib/StaticAnalyzer/Checkers/IteratorModeling.cpp
553–554

When does this happen?

NoQ requested changes to this revision.Sep 22 2020, 12:59 AM
NoQ added inline comments.
clang/lib/StaticAnalyzer/Checkers/Iterator.cpp
330–336 ↗(On Diff #285326)

Ugh, the problem is much worse. D82185 is entirely broken for the exact reason i described above and you only didn't notice it because you wrote almost no tests.

Consider the test you've added in D82185:

void begin_ptr_iterator(const cont_with_ptr_iterator<int> &c) {
  auto i = c.begin();

  clang_analyzer_eval(clang_analyzer_iterator_container(i) == &c); // expected-warning{{TRUE}}
}

It breaks very easily if you modify it slightly:

void begin_ptr_iterator(const cont_with_ptr_iterator<int> &c) {
  auto i = c.begin();
  ++i; // <==

  clang_analyzer_eval(clang_analyzer_iterator_container(i) == &c); // Says FALSE!
}

The iterator obviously still points to the same container, so why does the test fail? Because you're tracking the wrong iterator: you treated your &SymRegion{conj_$3} as a glvalue whereas you should have treated it as a prvalue. In other words, your checker thinks that &SymRegion{conj_$3} is the location of an iterator object rather than the iterator itself, and after you increment the pointer it thinks that it's a completely unrelated iterator.

There's a separate concern about why does it say FALSE (should be UNKNOWN) but you get the point.

This revision now requires changes to proceed.Sep 22 2020, 12:59 AM
NoQ added inline comments.Sep 22 2020, 1:08 AM
clang/lib/StaticAnalyzer/Checkers/Iterator.cpp
330–336 ↗(On Diff #285326)

The better way to test D82185 would be to make all existing tests with iterator objects pass with iterator pointers as well. Like, make existing container mocks use either iterator objects or iterator pointers depending on a macro and make two run-lines in each test file, one with -D and one without it. Most of the old tests should have worked out of the box if you did it right; the few that don't pass would be hidden under #ifdef for future investigation.

baloghadamsoftware marked an inline comment as done.Sep 22 2020, 9:51 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Checkers/Iterator.cpp
330–336 ↗(On Diff #285326)

Thank you for your review and especially for this tip! It is really a good idea. I changed it now and it indeed shows the problem you reported. It seems that my checker mixes up the region of the pointer-typed variable (&i and &j) with the region they point to (&SymRegion{reg_$1<int * SymRegion{reg_$0<const std::vector<int> & v>}._start>} for i before the increment and &Element{SymRegion{reg_$1<int * SymRegion{reg_$0<const std::vector<int> & v>}._start>},1 S64b,int} for both i and j after the increment).

What I fail to see and I am asking you help in it is that the relation between this problem and the getReturnIterator() function. This function retrieves the object from the construction context if there is one, but for plain pointers there is never one. Thus this function is always Call.getReturnValue() like before this patch.

NoQ added inline comments.Sep 22 2020, 1:24 PM
clang/lib/StaticAnalyzer/Checkers/Iterator.cpp
330–336 ↗(On Diff #285326)

I am asking you help

I spent way more time on that already than i find reasonable. Please figure this out on your own by fixing the bug.

clang/lib/StaticAnalyzer/Checkers/Iterator.cpp
330–336 ↗(On Diff #285326)

I do not see why I got so a rude answer. I was just asking help in seeing the relation between the bug and this function. Because I do not see any. I think the bug is somewhere in handling unary and binary operators for pointers. I struggled with that part for this same reason and I thought I solved it but now I see that I did not. However, this function just looks for object under construction in the construction context of the function. If the function returns an object by value, then there will be one. In other cases there will be none. I do not see how this relates to pointers and glvalues and prvalues. For parameters you were fully right and I fixed that.

clang/lib/StaticAnalyzer/Checkers/Iterator.cpp
330–336 ↗(On Diff #285326)

OK. I tested it now on master. It is exactly the same bug. It is no wonder, because for non-objects such as pointers this function is exactly the same as Call.getReturnValue() since there are no objects under construction at all in this case. I will debug and fix the bug now and rebase this patch on that fix.

NoQ added inline comments.Sep 23 2020, 9:45 PM
clang/lib/StaticAnalyzer/Checkers/Iterator.cpp
330–336 ↗(On Diff #285326)

seeing the relation between the bug and this function

The bug is in the feature that provides additional test coverage for the function. Due to the bug we're still at zero test coverage for the issue under discussion. Because my previous attempts at productive discussion have so far failed due to you arguing that "all tests are passing", i'm asking you for two months already to present an actual concrete test to cover the issue. That's it. That's the relation.

clang/lib/StaticAnalyzer/Checkers/Iterator.cpp
330–336 ↗(On Diff #285326)

Sorry, if you feel that your attempts at productive discussion have so far failed. Actually, I try to do everything you suggest just to push this my patches forward. I try to implement the test cases you suggest with my best effort and if they fail, then I fix them. However, this far I have seen no test cases which pass before and fail after this particular patch. I understood your concern with the arguments and I changed the code. However, I still do not understand it for return values. Is it OK if I check the AST return type for CXXRecordType? It does not seem to solve anything (it does not solve the bug you described), but if you require that, then I will do it. I worked months to establish an infrastructure to be able to replace my previous solution which was based on LazyCompundVals. This patch is just what you wanted before continuing my other patches: use this new infrastructure instead of reaching the region behind LazyCompoundVals. All my patches are stopped because of this one. The checkers give false positives, there are no note tags for iterator changes. All I want is to fix these issues, but if you cannot explain me what is actually wrong with this one, then these checkers remain unusable. Our company is committed to develop in upstream, but I feel that my efforts are in vain to upstream my work.

clang/lib/StaticAnalyzer/Checkers/Iterator.cpp
330–336 ↗(On Diff #285326)

Now I created D88216 and here I got a few failures. I will look up at them, of course.

Sorry, if you think that it is an ignorance and arrogance on my side that I say that "all tests are passing". *It is not!* I am just struggling to understand you what is wrong in my approach. Our technical background about the C++ language and the analyzer are very different. Also, as I mentioned I am not a good tester. You claim that I included *zero* test cases when I really thought that I implemented a *full coverage*. That is the purpose of the review: if I forgot something, then you notice politely that I forgot to test it for some cases and it will not work. There is no need to be harsh and aggressive with someone who did not offend you. I feel that I am still learning the Analyzer. There are more and more parts that became clearer during the years, but I am still very far from a fully understanding. I do not think that I deserve this tone just because of that.

Even after rereading you comments I am not sure what do you mean I am missing here. Should I check for the AST type of the return value? Or should I check whether it is really an iterator? To me the return value seems "binary" in the sense that either the return value is a LazyCompoundVal *and* we have az object under construction *or* the return value is something else (A symbol, a region or an Unknown) *and* we do not have any object under construction. Is this incorrect then?

clang/lib/StaticAnalyzer/Checkers/Iterator.cpp
330–336 ↗(On Diff #285326)

To me the return value seems "binary" in the sense that either the return value is a LazyCompoundVal *and* we have az object under construction *or* the return value is something else (A symbol, a region or an Unknown) *and* we do not have any object under construction. Is this incorrect then?

Yes, it is. In some cases the return value is Unknown. Not much more use for us than LazyCompoundVal. (I still not fully understand in which cases we get Unknown.)

Test coverage was hugely increased by rebasing this patch to D88216. Two kind of tests failed. This last update addreses them:

  1. Handle the assignment operator of iterator objects. This is necessary for assignments more complex than an assignment to a variable (which is handled by checkBind()), such as an assignment to a structure/class member.
  1. Put back post-checking of MaterializeTemporaryExpr. This special kind of expression is not only used to materialize temporary expressions from LazyCompoundVals but also for temporaries as return values.

Furthermore I added some comments explaining the functions which retrieve an iterator argument and iterator return value.

Now all tests pass again with the increased coverage. @NoQ if you are still in doubt, please suggest me further test cases to explore. I understand your concerns but cannot address them without some suggestions where my solution could fail.

baloghadamsoftware retitled this revision from [Analyzer] Avoid handling of LazyCompundVals in IteratorModeling to [Analyzer][NFC] Avoid handling of LazyCompundVals in IteratorModeling.Sep 24 2020, 11:29 PM

Oh, I think now what do you mean: iterators stored in containers, thus iterators iterating over iterators. Yes, the current working of the checker does not support it because it stores iterator positions for both the prvalues and the glvalues. This is so from the beginning and this patch does not change anything about this behavior. Of course, this design concept is questionable, we can change it in the future. But not in this patch, this one is purely an NFC: it has exactly the same functionality as the previous versions, the only difference is that it does not hamper anymore with LazyCompoundVals but reaches the real region of the objects (and only in case of objects by value, I check the AST type for paramters), exactly as you requested.

Anyway, to me storing iterators in containers sounds dangerous. Iterators are like lighted matches which you use and then extinguish. You do not put them into the drawer on into a paper box. If you store iterators in a container, how can you track which one of them is invalidated? This sounds a very bad practice to me.

Anyway, we can change the whole iterator modeling and checking in the future to support even this bad practice, but in their current state it is much more beneficial for the users to reduce the huge number of false positives and add note tags for the reported bugs, I think. Maybe we could better explain the current working in the comments at the beginning, and put FIXME-s there. Eventually we can also add such test cases, but with the current functionality even that cannot be done, because we currently do not keep track the content of the containers.

@gamesh411, @whisperity, @martong and others, please suggest me new test cases if you think the current ones are not enough.

Now I completely know what the source of our misunderstanding is. You thought that this patch will fix an issue, namely that we store iterator positions for both the iterator values and the locations of the iterator variables. While this is definitely a wrong approach, it could not be fixed until we got rid from the hack with LazyCompoundVals. These functions keep this existing issue unfixed. However, I could not understand you because I was completely focusing on how these functions could introduce a new issue and I could not find it. No wonder. Thus all my efforts were focused on finding a test cases which passed in the earlier versions but fail in this one because of these functions. Of course I could not find any and I was continuously proving that my patch is correct in the sense that it does not make things worse than they were. That is why I implemented the pointer-based iterators (it would have been better after this patch and a subsequent one that fixes this issue) where I was facing problems because of this wrong approach. In the same time you continuously came with new examples which tried to prove the issue, but I could not understand it because all these new test cases failed in the master version as well. We talked about two very different things because we had very different perceptions about the goal of this patch. That was all.