This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer] [NFC] Parameter Regions
AbandonedPublic

Authored by baloghadamsoftware on May 11 2020, 3:58 AM.

Details

Summary

Currently, parameters of functions without their definition present cannot be represented as regions because it would be difficult to ensure that the same declaration is used in every case. To overcome this, we introduce a new kind of region called ParamRegion which does not store the Decl of the parameter variable. Instead it stores the index of the parameter which enables retrieving the actual Decl every time using the function declaration of the stack frame.

Diff Detail

Event Timeline

Currently, parameters of functions without their definition present cannot be represented as regions because it would be difficult to ensure that the same declaration is used in every case. To overcome this, we introduce a new kind of region called ParamRegion which does not store the Decl of the parameter variable. Instead it stores the index of the parameter which enables retrieving the actual Decl every time using the function declaration of the stack frame.

I know vaguely of what you're talking about because I've been trying to follow your works so far, but I'd prefer to include a bit more context for those not so familiar with MemRegions. Such as, "Usually, we identify a MemRegion with ..., but for parameters, this is difficult because ..., as seen on this example ... . So, this patch introduces a new region, etcetc."

Some immediate questions:

  • What identifies a MemRegion, TypedValueRegions in particular? Why are parameters so special that they deserve their own region? Do you have a concrete, problematic example?
  • Why is it an issue that we don't always use the same declaration? Does this result in a crash, incorrect modeling?
  • Why are we making ParamRegion not a subclass of VarRegion?
  • The patch includes some code duplication, which may be okay, but do we need it?

I haven't read every line thoroughly just yet, so I'll catch up with this patch later! In any case, thank you so much for your investment in the health of the codebase!

clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp
435–447

Hmm. So, the surrounding code definitely suggests that we're definitely finishing for a parameter here, but it isn't always a parameter region? I know you struggled with this, have you gained any insight as to why?

clang/lib/StaticAnalyzer/Checkers/ValistChecker.cpp
176–181

This is interesting. I looked up DeclRegion, and it seems to be the region that is tied to a ValueDecl. VarDecl is a subtype of ValueDecl, and ParmVarDecl is a subtype of VarDecl, so wouldn't it make sense for ParamRegion to be a subtype of VarRegion?

baloghadamsoftware marked 2 inline comments as done.May 11 2020, 6:12 AM

Thank you for quickly looking into this.

  • What identifies a MemRegion, TypedValueRegions in particular? Why are parameters so special that they deserve their own region? Do you have a concrete, problematic example?

ParamRegion contains different data: mainly the index of the parameter. This makes it independent of the actual Decl of the parameter.

  • Why is it an issue that we don't always use the same declaration? Does this result in a crash, incorrect modeling?

See D49443#1193290.

  • Why are we making ParamRegion not a subclass of VarRegion?

Because VarRegion is a DeclRegion which stores the Decl of the variable. Here the main purpose is to not to store it.

  • The patch includes some code duplication, which may be okay, but do we need it?

Yes, and this comes from my previous answers.

clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp
435–447

Not all regions for ParmVarDecl are ParamRegions. Exceptions are parameters captured by a lambda or a block as well as parameters of functions analyzed top-level.

clang/lib/StaticAnalyzer/Checkers/ValistChecker.cpp
176–181

DeclRegion stores the Decl, ParamRegion retrieves it based on the Index it stores. There is no is-a relation between them.

Minor fix to get rid of a warning, some comments added.

baloghadamsoftware marked an inline comment as done.May 11 2020, 6:17 AM
baloghadamsoftware added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
1051

We do not use this at all. However, if I remove it then tests begin to fail with strange reasons, some of them even crashes at strange points. It seems that we need this for profiling the subclass.

balazske added inline comments.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h
406–407

I would expect that this returns a ParamRegion (from the name of the function). (The implementation shows that the type can just be changed.)

clang/lib/StaticAnalyzer/Checkers/ValistChecker.cpp
176–181

During the lifetime of a ParamRegion is it possible that it will return different Decl objects?

clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
744–745

This comment is not fully true any more.

clang/lib/StaticAnalyzer/Core/CallEvent.cpp
231–232

Is it correct to create VR only to have this assert?

clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp
358

This break is at wrong place.

Thank you for your comments, @balazske. You are completely right, I updated the patch now.

baloghadamsoftware marked 5 inline comments as done.May 11 2020, 9:51 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Checkers/ValistChecker.cpp
176–181
balazske added inline comments.May 11 2020, 11:58 PM
clang/lib/StaticAnalyzer/Checkers/ValistChecker.cpp
176–181

Purpose of the question is that if the answer is "no", the decl could be stored at construction of a ParamRegion and it could be a subclass of DeclRegion. From the code I do not see that the belonging Decl can change, probably yes if the stack frame changes somehow. So the reason to have the ParamRegion seems to be to have the expression and index available, maybe use these for profiling (instead of a Decl). Still it could be made a subclass of DeclRegion that stores additionally the expr and index and profiling works with these values.

Thank you for quickly looking into this.

  • What identifies a MemRegion, TypedValueRegions in particular? Why are parameters so special that they deserve their own region? Do you have a concrete, problematic example?

ParamRegion contains different data: mainly the index of the parameter. This makes it independent of the actual Decl of the parameter.

  • Why is it an issue that we don't always use the same declaration? Does this result in a crash, incorrect modeling?

See D49443#1193290.

  • Why are we making ParamRegion not a subclass of VarRegion?

Because VarRegion is a DeclRegion which stores the Decl of the variable. Here the main purpose is to not to store it.

To me that sounds like a technicality. Sure, inheriting from VarRegion would make you carry around a field or two you may not need/use, but is that a big deal? To me it feels like all the problems you're solving should be an implementation detail. The interface of ParamRegion looks like a superset of VarRegion's. The code changes suggests that every time we're interested in a VarRegion, we're also interested in parameters.

I may not see the obvious just yet, but I invite you to correct me! :)

NoQ added a comment.May 12 2020, 1:24 PM

Blanket reply! ParamRegion is not a DeclRegion because it does not necessarily have a corresponding Decl. For instance, when calling a function through an unknown function pointer, you don't have the declaration of a function, let alone its parameters. But for parameters of C++ object type you still need your ParamRegion because arguments are constructed into it.

clang/lib/StaticAnalyzer/Checkers/NSErrorChecker.cpp
208–209

Yes it is a StackArgumentsSpaceRegion! Because we're a parameter.

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
164

SSR shouldn't ever be null. Any parameter must be on the stack.

1604–1609

I suggest making a function in MemRegionManager that takes a Decl and returns a region. There should only be one place in the code that decides when exactly do we produce a ParamRegion instead of a VarRegion.

1750

Does this method also suffer from the parameter vs. argument off-by-one problem with operators?

1753–1754

Does this actually ever happen?

1758

Why would that ever happen? If it's just a sanity check, it should be an assertion.

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
2031

Let's simply make a blanket assertion in ParamRegion's constructor.

NoQ added inline comments.May 12 2020, 1:24 PM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
1051

Profile is completely essential. Without it different regions will be treated as if it's the same region.

1054–1055

It looks like you decided to keep VarRegion for the top frame.

I want that asserted here, in the constructor, so that never to make a mistake on this front.

In D79704#2032271, @NoQ wrote:

Blanket reply! ParamRegion is not a DeclRegion because it does not necessarily have a corresponding Decl. For instance, when calling a function through an unknown function pointer, you don't have the declaration of a function, let alone its parameters.

Well hold on for a second then -- how is this, if it is, different for member pointers? Aren't they represented with a VarRegion?

But for parameters of C++ object type you still need your ParamRegion because arguments are constructed into it.

Could you give a specific code example?

Since this patch isn't WIP, lets add unit tests that shows the added interface, highlighting pain points such as this.

Szelethus requested changes to this revision.May 13 2020, 5:11 AM

Yeah, this patch should definitely have unit tests. All similar patches should.

This revision now requires changes to proceed.May 13 2020, 5:11 AM
baloghadamsoftware marked 9 inline comments as done.May 13 2020, 5:58 AM

Than you for your reviews, @NoQ and @Szelethus.

In D79704#2032271, @NoQ wrote:

Blanket reply! ParamRegion is not a DeclRegion because it does not necessarily have a corresponding Decl. For instance, when calling a function through an unknown function pointer, you don't have the declaration of a function, let alone its parameters. But for parameters of C++ object type you still need your ParamRegion because arguments are constructed into it.

Hmm, I removed the part that tries to retrieve the parameter from the arguments of the CallExpr-like Expr because it was never invoked (I used an assert(false) there and executed all the tests and they passed). This means that we always found a Decl. However, this Decl is not stored but retrieved always based on the actual stack frame: we get the Decl of the function/method/block/etc. from the stack frame and find its parameter using the Index. This is always successful, at least in all the analyzer tests.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
1051

I know, that was not the question. It is the OriginExpr which I tried to remove because I do not use it at all, except for profiling. It seems that without this field the Profile does not work correctly.

1054–1055

Yes, because in the top frame we neither have CallExpr nor Decl. So we should assert here that we are not in the top frame, right? How do we check this based on these parameters?

clang/lib/StaticAnalyzer/Checkers/NSErrorChecker.cpp
208–209

So I can remove the inner branch?

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
164

OK, I will remove this check.

1604–1609

Good idea!

1750

Actually I removed all usage of the expression thus I got rid of the problem.

1753–1754

I will check it.

1758

This happens all the time a lambda or a block captures a parameter of the enclosing function. I decided to keep the regions of captured variables VarRegions regardless whether they are parameters or plain variables. This was the only way I could found. This must not cause any problem.

clang/lib/StaticAnalyzer/Core/RegionStore.cpp
2031

OK.

Szelethus added inline comments.May 13 2020, 6:01 AM
clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp
435–447

That would be a lovely unit test case! :)

Yeah, this patch should definitely have unit tests. All similar patches should.

I wonder how I could make unit tests for this. I already looked up the unit tests and found no unit test for regions.

baloghadamsoftware marked an inline comment as done.May 13 2020, 7:50 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/MemRegion.cpp
1753–1754

Oh yes. If we have no CallSite, then we are in the top frame. Should I check LC->inTopFrame() instead?

Updated according to the comments from @NoQ.

baloghadamsoftware marked 4 inline comments as done.May 13 2020, 7:54 AM

Unit test added.

baloghadamsoftware marked an inline comment as done.May 13 2020, 9:20 AM
NoQ added a comment.May 13 2020, 10:41 AM
In D79704#2032271, @NoQ wrote:

Blanket reply! ParamRegion is not a DeclRegion because it does not necessarily have a corresponding Decl. For instance, when calling a function through an unknown function pointer, you don't have the declaration of a function, let alone its parameters.

Well hold on for a second then -- how is this, if it is, different for member pointers? Aren't they represented with a VarRegion?

They aren't even Loc!

We currently don't have a representation for an unknown pointer-to-member. A known pointer-to-member is represented by a FieldDecl (not sure about static members) but it's still a NonLoc and as such isn't a region at all. Because, well, you can't dereference a pointer-to-member; you can only apply it to a base pointer like an offset. The mental model is "an offset". Therefore it's NonLoc.

Pointers to member functions are a different thing; they're function pointers so they're kinda regions.

But for parameters of C++ object type you still need your ParamRegion because arguments are constructed into it.

Could you give a specific code example?

struct S {
  S() {
    this; // What region does 'this' point to...
  }
};

void foo(void (*bar)(S)) {
  bar(S()); // ...in this invocation?
}
NoQ added inline comments.May 14 2020, 2:11 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
1051

Wait, how is it not used? You can't get the region type without OriginExpr.

1054–1055

Yes, you did what i meant. The first assertion is redundant because isa is implied by cast. Let's also assert that OE is non-null.

clang/lib/StaticAnalyzer/Checkers/NSErrorChecker.cpp
208–209

In fact you can go even further and change the type of the overridden getMemorySpace() function to return const StackArgumentsSpaceRegion * ("covariant return types") so that to drop the cast as well.

clang/lib/StaticAnalyzer/Core/CallEvent.cpp
231–232

It's a valid pattern but it should be followed by (void)VR; in order to avoid unused variable warning in no-assert builds.

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
187

This doesn't work when the callee is unknown. You need to consult the origin expr here.

191

This doesn't work when the callee is unknown.

579

PVD may be null.

1240

Let's make a stronger assertion:

if (SFC->inTopFrame())
  return getVarRegion(PVD, LC);
assert(CallSite && "Destructors with parameters?");
baloghadamsoftware marked an inline comment as done.May 14 2020, 4:02 AM

Good news: I executed an analysis with all open-source checks enabled for several open-source projects: BitCoint, CURL, OpenSSL, PostGreS, TMux, Xerces using the patched and the non-patched version of Clang. There is no difference in the set of bug reports. Only one error message became more detailed for BitCoin which uses Boost: instead of Potential memory leak I get Potential leak of memory pointed to by 'Finder.m_Pred.m_Storage.m_dynSet' in line 135 of Boost header algorithm/string/detail/classification.hpp and almost the same without Finder. in line 139.

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
191

Please give me an example where the callee is unknown. As I wrote, originally I put a getExpr() method here as well and getType() fell back to it if it could not find the Decl(). However, it was never invoked on the whole test suite. (I put an assert(false) into it, and did not get a crash.

NoQ added inline comments.May 14 2020, 5:10 AM
clang/lib/StaticAnalyzer/Core/MemRegion.cpp
191

Please give me an example where the callee is unknown.

In D79704#2034571, @NoQ wrote:

Could you give a specific code example?

struct S {
  S() {
    this; // What region does 'this' point to...
  }
};

void foo(void (*bar)(S)) {
  bar(S()); // ...in this invocation?
}
baloghadamsoftware marked an inline comment as done.May 14 2020, 6:54 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Core/MemRegion.cpp
191

OK, but it still does not crash the analyzer, even if I enable creation of stack frames for all the callees, even for those without definition. What else should I do to enforce the crash (null pointer dereference)? Try to use getParameterLocation() in a unit test?

In D79704#2034571, @NoQ wrote:
In D79704#2032271, @NoQ wrote:

Blanket reply! ParamRegion is not a DeclRegion because it does not necessarily have a corresponding Decl. For instance, when calling a function through an unknown function pointer, you don't have the declaration of a function, let alone its parameters.

Well hold on for a second then -- how is this, if it is, different for member pointers? Aren't they represented with a VarRegion?

They aren't even Loc!

We currently don't have a representation for an unknown pointer-to-member. A known pointer-to-member is represented by a FieldDecl (not sure about static members) but it's still a NonLoc and as such isn't a region at all. Because, well, you can't dereference a pointer-to-member; you can only apply it to a base pointer like an offset. The mental model is "an offset". Therefore it's NonLoc.

Pointers to member functions are a different thing; they're function pointers so they're kinda regions.

But for parameters of C++ object type you still need your ParamRegion because arguments are constructed into it.

Could you give a specific code example?

struct S {
  S() {
    this; // What region does 'this' point to...
  }
};

void foo(void (*bar)(S)) {
  bar(S()); // ...in this invocation?
}

I remember this coming up so many times, and I'm an inch away from getting it but I'm not there just yet ^-^ Sorry if you have to repeat yourself too many times.

The example you have shown sounds like a case we could (should?) abstract away. If we model in accordance to the AST as closely as possible, ParamRegion should totally be a VarRegion. If something tricky like this came up, the getDecl function should just return with a nullptr, shouldn't it? The code changes make me feel like we're doing a lot of chore (and make it super easy to forget checking for parameters explicitly). The gain, which is I guess correctness, seems negligible, and I'm not sure this is actually correct, as I mentioned in D79704#inline-730731. I don't see why the most defining feature of a DeclRegion is supposed to be an always existing Decl, rather than a mirror of the AST.

NoQ added a comment.May 14 2020, 12:38 PM

If something tricky like this came up, the getDecl function should just return with a nullptr, shouldn't it?

How far are you willing to push it? For instance, are you ok with, say, TypedValueRegion::getValueType() return a null QualType in this single rare cornercase? Because that's what we'll also have to do if a Decl isn't available in VarRegion. Unless we add more stuff into VarRegion, which is basically what we're doing (and it just suddenly turns out that we don't need the Decl anymore).

The code changes make me feel like we're doing a lot of chore (and make it super easy to forget checking for parameters explicitly).

I wouldn't mind having a common base class for these regions. DeclRegion should probably be dropped in this case, in order to avoid dealing with multiple inheritance. And it's definitely the one that needs to be dropped because it currently does nothing except "inheritance for code reuse": there are basically no other common properties between its sub-classes than the accidental code reuse in its methods.

The gain, which is I guess correctness, seems negligible, and I'm not sure this is actually correct, as I mentioned in D79704#inline-730731.

The gain is that we finally have a way to express some regions that we previously could not express at all. This gain isn't huge; i agree that @baloghadamsoftware could most likely get away without it. Apart from the example with no decl at all, i'm pretty excited about eliminating a lot of annoying ambiguities with respect to virtual method calls and function redeclarations that i've been struggling in D49443.

More importantly, this change is correct because that's how programs actually behave. You don't need to know everything (and a Decl is literally everything) about the function you're calling in order to call it. The calling convention only requires you to put arguments into certain places in memory and these places are entirely determined by the indices and types of the arguments. And that's exactly what we're trying to mimic. We technically don't even need the OriginExpr itself but it's a convenient (and so far seemingly harmless) way of capturing all the information that we actually need (which is, well, indices and types of arguments).

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
191

Yes. I demonstrated how you can get the region without a decl but you should turn this into a test that actually calls one of the problematic functions. Like, clang_analyzer_dump() it or something.

Alright, I'm up to speed I think, cheers!

In D79704#2037100, @NoQ wrote:

The code changes make me feel like we're doing a lot of chore (and make it super easy to forget checking for parameters explicitly).

I wouldn't mind having a common base class for these regions. DeclRegion should probably be dropped in this case, in order to avoid dealing with multiple inheritance. And it's definitely the one that needs to be dropped because it currently does nothing except "inheritance for code reuse": there are basically no other common properties between its sub-classes than the accidental code reuse in its methods.

Totally. Something like a VarRegionBase but with a more clever name.

NoQ added a comment.May 15 2020, 4:34 AM
In D79704#2037100, @NoQ wrote:

The code changes make me feel like we're doing a lot of chore (and make it super easy to forget checking for parameters explicitly).

I wouldn't mind having a common base class for these regions. DeclRegion should probably be dropped in this case, in order to avoid dealing with multiple inheritance. And it's definitely the one that needs to be dropped because it currently does nothing except "inheritance for code reuse": there are basically no other common properties between its sub-classes than the accidental code reuse in its methods.

Totally. Something like a VarRegionBase but with a more clever name.

We can even call it VarRegion and have sub-classes be ParamVarRegion and NonParamVarRegion or something like that.

baloghadamsoftware marked an inline comment as done.May 15 2020, 7:11 AM
In D79704#2038280, @NoQ wrote:
In D79704#2037100, @NoQ wrote:

The code changes make me feel like we're doing a lot of chore (and make it super easy to forget checking for parameters explicitly).

I wouldn't mind having a common base class for these regions. DeclRegion should probably be dropped in this case, in order to avoid dealing with multiple inheritance. And it's definitely the one that needs to be dropped because it currently does nothing except "inheritance for code reuse": there are basically no other common properties between its sub-classes than the accidental code reuse in its methods.

Totally. Something like a VarRegionBase but with a more clever name.

We can even call it VarRegion and have sub-classes be ParamVarRegion and NonParamVarRegion or something like that.

Do you mean getDecl() should be pure virtual with different implementation for ParamVarRegion (retrieves dynamically based on its Index) and NonParamVarRegion (stores). However, there are some places in the code that check for DeclRegion and use its getDecl() so to avoid code duplication we can keep DeclRegion, but remove storing of Decl from it and make getDecl() pure virtual already at that level.

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
191

Of course I tried clang_analyzer_explain(), but neither clang_analyzer_dump() helps. I also tried a unit test now where I call getParameterLocation() explicitly. It turns out that parameter regions are never created for functions without Decl because of the first lines in CallEvent::getCalleeAnalysisDeclContext(). This function needs the Decl to retrieve the AnalysisDeclContext of the callee, which is needed to retrieve its stack frame by getCalleeStackFrame(). Without stack frame we do not create ParamRegion. The other two functions creating ParamRegion (CallEvent::addParameterValuesToBindings and the newly created MemRegionManager::getRegionForParam) start from ParmVarDecl which always belongs to a Decl. So we can safely assume that currently all parameter regions have a Decl. Of course, this can be changed in the future, but I must not include dead code in a patch that cannot even be tested in the current phase. Even creation of callee stack frame for functions without definition is not part of this patch, but of the subsequent one.

NoQ added a comment.May 17 2020, 1:44 PM

It turns out that parameter regions are never created for functions without Decl because of the first lines in CallEvent::getCalleeAnalysisDeclContext().

Removing these lines is more or less the whole point of your work.

NoQ added inline comments.May 17 2020, 1:51 PM
clang/lib/StaticAnalyzer/Core/MemRegion.cpp
191

neither clang_analyzer_dump() helps

So, does it dump a parameter region? Because it obviously should.

baloghadamsoftware marked an inline comment as done.May 18 2020, 1:05 AM
In D79704#2040798, @NoQ wrote:

It turns out that parameter regions are never created for functions without Decl because of the first lines in CallEvent::getCalleeAnalysisDeclContext().

Removing these lines is more or less the whole point of your work.

Is it? The point of my work is to avoid the crashes you mentioned in D49443#1193290 by making the regions of the parameters independent of their declarations so we can remove the limitation that callee's stack frame is only created for functions with definition. (Can you provide me with examples which the crashes you mentioned with VarRegions? I need them for testing my solution.) It could be a future step to also remove the limitation that also allows getting the region for parameters without declaration. Even removing the original limitation (creation of stack frame without definition) is part of my planned subsequent patch, not this one. I am trying to follow incremental development to avoid too huge patches.

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
191

It dumps a temporary region as until now. So this patch does not change the behavior of the analyzer at this point.

Unit test updated to cover all cases.

In D79704#2040798, @NoQ wrote:

It turns out that parameter regions are never created for functions without Decl because of the first lines in CallEvent::getCalleeAnalysisDeclContext().

Removing these lines is more or less the whole point of your work.

Anyway, when we remove it in a later patch, what should we use for stack frame of the callee? Currently we retrieve the AnalysisDeclContext for the Decl and then get a stack frame for it. If Decl is missing, then we cannot get its AnalysisDeclContext either, so how can we get a stack frame? However, currently we need a solution where we can get stack frame for callees with Decl but without definition. Since you wrote that this causes crashes because at different points different Decls of the same ParmVarDecl are used we created ParamRegion that does not store the Decl but dynamically retrieves it based on the contents of the actual stack frame.

Wow, this is some nice work and huge effort from all the participants, it was pretty informative to read through. Thanks @baloghadamsoftware and @NoQ for working on this!

This means that we always found a Decl. However, this Decl is not stored but retrieved always based on the actual stack frame: we get the Decl of the function/method/block/etc. from the stack frame and find its parameter using the Index. This is always successful, at least in all the analyzer tests.

Can we find the FunctionDecl if the call happens through a function pointer? Or in that case do we actually find the VarDecl of the function pointer?

Without stack frame we do not create ParamRegion. The other two functions creating ParamRegion (CallEvent::addParameterValuesToBindings and the newly created MemRegionManager::getRegionForParam) start from ParmVarDecl which always belongs to a Decl. So we can safely assume that currently all parameter regions have a Decl. Of course, this can be changed in the future, but I must not include dead code in a patch that cannot even be tested in the current phase. Even creation of callee stack frame for functions without definition is not part of this patch, but of the subsequent one.

So, in this patch, we are trying to solve only that problem when the callee FunctionDecl does not have a definition?

If Decl is missing, then we cannot get its AnalysisDeclContext either, so how can we get a stack frame? However, currently we need a solution where we can get stack frame for callees with Decl but without definition.

This happens in case of top-level params and in case of function pointers, plus the special argument construction, am I right?

Based on the answers to the above questions, possibly we are trying to solve two problems here. Could we split then this patch into two?

  1. callee FunctionDecl does not have a definition. This could assert on having an accessable Decl in ParamVarRegion. Here we could have the foundations of the next patch, i.e. we could have ParamVarRegion and NonParamVarRegion.
  2. function pointers, plus the special argument construction: ParamVarRegion may not have a Decl and we'd remove the related lines from getCalleeAnalysisDeclContext. This obviously requires more tests. And more brainstorming to get the AnalysisDeclContext when there is no Decl available for the function in getCalleeAnalysisDeclContext.

What do you think?

clang/lib/StaticAnalyzer/Checkers/ValistChecker.cpp
176–181

During the lifetime of a ParamRegion is it possible that it will return different Decl objects?

AFAIU, Yes. The CallExpr may refer to a FunctionDecl (the callee), but that Decl may not be the Canonical Decl of the redeclaration chain.

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
191

I think we should run this patch against a test suite of open source projects. That could be tmux, curl, redis, xerces, bitcoin, protobuf (those are that we use in CTU at least). We should not have any crashes on theses projects because they exercise widely used language constructs. Besides, they provide a relatively broad set of use of the C and C++ languages, so that's quite a good coverage. In case of any assertion we can use creduce to nail the actual problem.

clang/unittests/StaticAnalyzer/ParamRegionTest.cpp
73

I think the test code would be more valuable if we could use the ASTMatcher to match for a given Decl and then for that Decl we could find the corresponding Region. Then we should have different test cases for the different regions (with expectations formulated on those regions). Perhaps a Decl* -> Region mapping is needed in the test Fixture.

SVal explanation extended and tests added for it.

Seems like many changes could be simplified or absolutely not needed in this patch if ParamRegion (or with a better name ParamVarRegion) was derived from VarRegion. Is there any difficulties in that derivation?

clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
312

Is this used anywhere in this patch?

And why isn't it a CallExpr (instead of Expr)?

clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountChecker.cpp
519

Again, ParamRegion should derive from VarRegion.

clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp
439

This would be simpler (noop?) again if ParamRegion was derived from VarRegion.

618

What if ParamRegion was derived from VarRegion ?

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
195

In all other cases we assert, here we return with a nullptr. Why?

919

Both the VarRegion and ParamRegion cases here do the same.
This suggest that maybe it would be better to have VarRegion as a base class for ParamVarRegion.
This is aligned to what Artem suggested:

We can even call it VarRegion and have sub-classes be ParamVarRegion and NonParamVarRegion or something like that.

But maybe NonParamVarRegion is not needed since that is actually the VarRegion.

1234

Why the return type is not ParamVarRegion*?

clang/lib/StaticAnalyzer/Core/SymbolManager.cpp
572

This is almost the copy-paste code for isLive(VarRegion). This again suggests that maybe ParamRegion (or ParamVarRegion?) should be derived from VarRegion. And in isLive(VarRegion) we should dyn_cast to ParamVarRegion and do the extra stuff if needed.

baloghadamsoftware marked 7 inline comments as done.May 20 2020, 6:45 AM

Can we find the FunctionDecl if the call happens through a function pointer? Or in that case do we actually find the VarDecl of the function pointer?

Yes, if it has a declaration, then we can retrieve it from the stack frame.

So, in this patch, we are trying to solve only that problem when the callee FunctionDecl does not have a definition?

Not even that. This patch is just a non-functional change that prepares the ground for such improvements. The next patch intends to solve them problem where the function has declaration but no definition.

This happens in case of top-level params and in case of function pointers, plus the special argument construction, am I right?

Top-level is a special case: there we do not have Expr either so we cannot handle it. For top-level functions the regions remain VarRegions. There is nothing special in parameters when analyzed top-level.

Based on the answers to the above questions, possibly we are trying to solve two problems here. Could we split then this patch into two?

  1. callee FunctionDecl does not have a definition. This could assert on having an accessable Decl in ParamVarRegion. Here we could have the foundations of the next patch, i.e. we could have ParamVarRegion and NonParamVarRegion.
  2. function pointers, plus the special argument construction: ParamVarRegion may not have a Decl and we'd remove the related lines from getCalleeAnalysisDeclContext. This obviously requires more tests. And more brainstorming to get the AnalysisDeclContext when there is no Decl available for the function in getCalleeAnalysisDeclContext.

That is exactly my suggestion, but not this, current patch, but the next one. The next patch is for functions without definition but declarations, and there could be another one in the future for functions without declaration. Of course this latter requires some extra code into ParamRegions to handle cases where we do not have a Decl. I cannot write it now because I cannot make a test for it, not even a unit test.

Seems like many changes could be simplified or absolutely not needed in this patch if ParamRegion (or with a better name ParamVarRegion) was derived from VarRegion. Is there any difficulties in that derivation?

It is one of my suggestions, I am still waiting for @NoQ's opinion: let us remove the Decl from DeclRegion and make getDecl() pure virtual there. Also not implement it in VarRegion, but as @NoQ suggested, create something like PlainVarRegion or NonParamVarRegion where we store it, and ParamVarRegion where we retrieve it instead of storing it.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
312

The origin expression of a call may be a set of different kinds of expressions: CallExpr, CXXConstructExpr, BlockExpr or ObjCMessageExpr.

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
191

As I mentioned, I already did it: D79704#2036067

195

It was accidentally left here from an earlier experiment.

919

Usually we do not inherit from concrete classes by changing a method implementation that already exist. The other reason is even stronger: NonParamVarRegion must store the Decl of the variable, ParamVarRegion must not.

1234

Because for top-level functions and captured parameters it still returns VarRegion.

clang/unittests/StaticAnalyzer/ParamRegionTest.cpp
73

That was the original test, but the current one is more generic: checks for all parameters and checks whether we create the correct type of region for them.

Thanks for addressing my comments!

clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
312

Is this function used anywhere in this patch? Could not find any reference to it.

The origin expression of a call may be a set of different kinds of expressions: CallExpr, CXXConstructExpr, BlockExpr or ObjCMessageExpr.

Okay, makes sense, could you please document it then in the comment for the function? And perhaps we should assert on these kinds as a sanity check.

clang/lib/StaticAnalyzer/Core/MemRegion.cpp
919

Usually we do not inherit from concrete classes by changing a method implementation that already exist.

That's what we exactly do all over the place in the AST library.
Which method(s) should we change by the way?

Nevermind, I am perfectly fine having VarRegion as base and NonParamVarRegion and ParamVarRegion as derived.

1234

Okay, makes sense, could you please document it then in the comment for the function?

baloghadamsoftware marked an inline comment as done.

FIXME and assertions added.

The big question to decide: Either we keep ParamRegion as a separate region in the class hierarchy and at the few places where DeclRegion or VarRegion is used and parameters are possible we duplicate the few lines. (Current status.)

The other way is to remove Decl from DeclRegion and make getDecl() pure virtual. Also we split VarRegion into two subclasses, one of the is the non-parameter variable regions and the other is ParamVarRegion. The first one stores the Decl, the second one calculates it. This saves code duplication, however after we implement creation of stack frames for calls without Decl we must update all the code using DeclRegion and VarRegion where parameters are possible to handle nullptr as return value for getDecl().

@NoQ, please decide, which way to go.

Superseded by D80522.