This is an archive of the discontinued LLVM Phabricator instance.

getDescriptiveName() for MemRegion
ClosedPublic

Authored by Alexander_Droste on Jan 10 2016, 11:59 PM.

Details

Summary

This patch adds a utility function to extract the variable name (including indices) from a memory region.

Diff Detail

Event Timeline

Alexander_Droste retitled this revision from to getVariableName() for MemRegion.
Alexander_Droste updated this object.
Alexander_Droste added a reviewer: xazax.hun.
Alexander_Droste added subscribers: dcoughlin, zaks.anna.
xazax.hun added inline comments.Jan 11 2016, 3:23 AM
tools/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
643 ↗(On Diff #44432)

Do you think 200 is a reasonable size here? In case the namespaces are not included in the printed name, I think it is very rare that a name would be longer than 100 characters.

658 ↗(On Diff #44432)

Can you use ER here instead of additional getAs?

  • reduced buffer size for variable name
  • simplified R->getAs<clang::ento::ElementRegion>()->getSuperRegion() to ER->getSuperRegion().
Alexander_Droste marked 2 inline comments as done.Jan 11 2016, 6:13 AM
Alexander_Droste added inline comments.
tools/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
643 ↗(On Diff #44476)

You're right, 200 was a bit overambitious. I reduced it to 50.

xazax.hun added inline comments.Jan 13 2016, 5:49 AM
tools/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
653 ↗(On Diff #44476)

What will be the content of IndexInArray in case the index is not a concrete int? Will the above code work at all in that case? In case the index is a variable do we want to include the name of the variable in the index?

656 ↗(On Diff #44476)

I do not really like this part. You could either use Twine to avoid some temporal values (and you could also eliminate the string copy this way) or you could construct the indices using the reverse of idx and than reverse the whole string after the loop.

Alexander_Droste marked an inline comment as done.Jan 17 2016, 12:17 PM
Alexander_Droste added inline comments.
tools/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
653 ↗(On Diff #44476)

Right, getAs<..::ConcreteInt>() might return a nullptr.
What do you think of including the number if it is a nonloc::ConcreteInt,
the name if it is a nonloc::SymbolValKind (getAs<nonloc::SymbolVal>().getSymbol())
and else keeping the brackets empty?

656 ↗(On Diff #44476)

So would be invoking push_back within and reverse after the loop sufficient?

  • reduce copying
  • differentiate ER->getIndex().getAs<> cases

How about this?

xazax.hun added inline comments.Jan 20 2016, 2:19 PM
tools/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
660 ↗(On Diff #45239)

The else should go into the same line as the closing }.

662 ↗(On Diff #45239)

These are the only cases that can occur? So it is not possible to have a loc index? Maybe we could get the value for loc and nonloc ConcreteInts and use getVariableName for everzthing else? Would that wok?

669 ↗(On Diff #45239)

In this case idx also needs to be reserved. Since it is very rare that there are a huge level of []s, it is possible that the Twine solution is more desirable.

682 ↗(On Diff #45239)

What is happening here exactly? In some cases there are ' symbols around the name and in other cases there aren't? Is there a way to solve this inconsistency (so we do not need to check for the last character)? Also it would be great that this functions always had consistent result, so either never put ' around the name or always.

tools/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
662 ↗(On Diff #45239)

getIndex() returns a nonloc and loc is in no subclass relation to nonloc.
Therefore, getIndex() cannot be a loc type.

There are actually 5 nonloc subclasses but I assumed
that it only makes sense to check for SymbolVal and
ConcreteInt.
http://clang.llvm.org/doxygen/classclang_1_1ento_1_1NonLoc.html

So would you suggest to only check for ConcreteInt
and call getVariableName() recursively if it isn't one?

669 ↗(On Diff #45239)

Ok, I'll look into the Twine API.

682 ↗(On Diff #45239)

I think I wasn't completely sure if single quotes
are always included when printPretty() is called.
I looked into MemRegion.cpp and they always seem
to be included, why the second case can be removed.

tools/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
662 ↗(On Diff #45239)

If we want to call getVariableName recursively, should we strip the single quotes then?
Else we would get something like 'x['a']['b']'. What do you think?

xazax.hun added inline comments.Feb 3 2016, 1:13 PM
tools/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
662 ↗(On Diff #45239)

I agree. I think stripping is the way to go.

xazax.hun edited edge metadata.EditedFeb 23 2016, 6:52 AM

What is the status of this? Do you need some help with any of the modifications? I'd love to see this patch in the repository soon.

Alexander_Droste edited edge metadata.

Changes:

  • remove if (VariableName.size() && VariableName.back() == '\'') check
  • call getVariableName for index if Region is no ConcreteInt

Hi Gábor,
sorry about the delay! Here's the updated patch.
Is it really possible to reduce string copying in this case, with the help of llvm::Twine?
The problem with llvm::Twine seems to me that it cannot be modified after initialisation.
So there's no way to do sth. like: Twine t("a" + "b"); t += "c". A new Twine object needs to
be created for each subsequent concatenation. Do you have an idea how twine could be
applied here? Else I would claim that this function gets only used for diagnostics why
this level of efficiency might not be required.

Hi Gábor,
Is it really possible to reduce string copying in this case, with the help of llvm::Twine?
The problem with llvm::Twine seems to me that it cannot be modified after initialisation.
So there's no way to do sth. like: Twine t("a" + "b"); t += "c". A new Twine object needs to
be created for each subsequent concatenation. Do you have an idea how twine could be
applied here? Else I would claim that this function gets only used for diagnostics why
this level of efficiency might not be required.

Hi!

It works the following way:

If you write something like A = B + C + D; it will create two temporary strings, all of them might allocate memory.
If you write something like A = (Twine(B) + C + D).str(); it will only do one temporary string and one allocation.
So basically twine creates a data structure of pointers that points to the strings that you would like to concatenate, and once you do the conversion to string, it will do all the concatenations with one allocation.

  • Changed idx = R->getVariableName(); to idx = ER->getVariableName(); as it better describes what is happening on a semantic level.
xazax.hun added inline comments.Feb 24 2016, 8:39 AM
lib/StaticAnalyzer/Core/MemRegion.cpp
636

Small nit, I prefer to call the default constructor here.

638

Small nit: the mentioning the namespaces explicitly here and for smallstring and at some other places is redundant.

  • remove string idx variable -> remove string copy assignments
  • use Twine to reduce temporary string objects, built during concatenation

Is ArrayIndices = llvm::Twine(ArrayIndices + "]" + intValAsString.str() + "[").str();
preferable to: ArrayIndices += llvm::Twine("]" + intValAsString.str() + "[").str();
with respect to efficiency? If not, I would prefer the second version, as it is a bit more concise.

Since we went with the twine solution, I think we can get rid of the reverse stuff.

Note that:
Twine should be used like: (Twine(A) + B + C).str()
So the operator+ has twine as one of the operands.

Once these are fixed, it looks good to me.
Do you have commit access or do you need someone to commit this for you?

xazax.hun added inline comments.Feb 24 2016, 9:25 AM
lib/StaticAnalyzer/Core/MemRegion.cpp
672

nit: this is in fact an append. It is cleaner instead of using insert.

  • removed reverse
  • fixed twine usage

I think VariableName.insert(VariableName.size() - 1, ArrayIndices); is needed here.
varName, indices -> 'varname[...]' If append would be used, the second single quote would be before the indices, like this: 'varname'[...]

I don't have commit access, so I need someone to commit this for me.

xazax.hun accepted this revision.Feb 25 2016, 1:35 AM
xazax.hun added a reviewer: dcoughlin.
xazax.hun edited edge metadata.

It looks good to me. Devin could you take a look at this as well?

This revision is now accepted and ready to land.Feb 25 2016, 1:36 AM
dcoughlin edited edge metadata.Mar 6 2016, 4:12 PM

Hi Alexander,

Some comments in line. Also, I don't see any tests. Is this code tested by your MPI patch?

include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
160

I'm not sure that 'getVariableName()' conveys what this function does. The returned name is not always the name of a variable. How do you intend to use this method? How is it different than than printPretty()? Is it for debugging purposes? If so, then perhaps 'getDebugDescription()'? Is it for presentation to the user? Then perhaps 'getDescriptiveName()'?

lib/StaticAnalyzer/Core/MemRegion.cpp
635

It seems like this could potentially be simplified by adding a recursive helper that threads the output stream and appends to it after the base case has been hit. You could also modify prettyPrint() to take a flag indicating whether it should be quoted or not (and defaulting to true). What do you think?

Hi Devin,
thanks for taking the time! Yes, this is kind of implicitly tested by the MPI patch but I agree that it is necessary to add tests to MemRegion.

include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
160

The returned name is not always the name of a variable.

In which cases doesn't this function return the name? Isn't the worst case when printPretty can't be called and the string is empty?

The function is intended to return the variable name and collects index variable names / ConcreteInts along the way if the passed region is an ElementRegion. So printPretty is called on the 'top' region of an array at the end of this function. If the passed region is no ElementRegion the functionality is identical to printPretty.

The main motivation to write this function was to include specific indices in a diagnostic if obtainable, like presented in the MPI-Checker testfile
http://reviews.llvm.org/D12761#3c51dfcf.
E.g. Request 'sendReq1[1][7][9]' has no matching nonblocking call. My intent was not to use the function for debugging purposes.

So should we use getDescriptiveName or maybe sth. like getVariableNameWithIndices?.

lib/StaticAnalyzer/Core/MemRegion.cpp
635

Modifying printPretty seems like a nice solution to me. If you don't prefer the other idea, I would go this route.

dcoughlin added inline comments.Mar 7 2016, 1:41 PM
include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
160

Can the string ever be something like 'a.f[0][1]' (i.e., with a field name)? Or is it always 'a[0][1]'? If the first is possible I would suggest getDescriptiveName -- if not, then getVariableNameWithIndices seems fine.

lib/StaticAnalyzer/Core/MemRegion.cpp
635

Sounds good to me!

include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
160

Yes, the string includes the name of the struct. But this is not due to the while loop in this function but how printPretty is defined for FieldRegion. In case of a FieldRegion printPretty is called for the super-region first before the field declation name gets appended.

void FieldRegion::printPrettyAsExpr(raw_ostream &os) const {
  assert(canPrintPrettyAsExpr());
  superRegion->printPrettyAsExpr(os);
  os << "." << getDecl()->getName();
}

I'll go with getDescriptiveName.

Alexander_Droste retitled this revision from getVariableName() for MemRegion to getDescriptiveName() for MemRegion.
Alexander_Droste updated this object.
Alexander_Droste edited edge metadata.

Looking at the printPretty implementation once more, I realized that printPrettyAsExpression
is actually printPretty without quotes. Therefore, I didn't have to change printPretty and
getDescriptiveName handles the quoting differentiation.

I searched for a MemRegion testfile in llvm/tools/clang/test/** but couldn't find one.
Where should the MemRegion tests go?

I thunk the closest we have is region-store* tests, which is not quite the same. Feel free to add a new test.

Thanks for pointing this out. What's actually the best way to test the function?
If I test this function with an integration test, I need to rely on a checker which uses the function to output diagnostics.
Is it possible to test this with a unit test? My assumption is that parts of the MemRegion would then need to be mocked.

  • create MemRegion.cpp, in order to set up test cases for getDescriptiveName
test/Analysis/MemRegion.cpp
3

The problem about these tests is that they introduce a cyclic commit dependency. MPI-Checker depends on getDescriptiveName. getDescriptiveName depends on MPI-Checker because of the tests.

Further, MPI-Checker depends on this function:

SourceRange MemRegion::sourceRange() const {
  const VarRegion *const VR = dyn_cast<VarRegion>(this->getBaseRegion());
  const FieldRegion *const FR = dyn_cast<FieldRegion>(this);
  const CXXBaseObjectRegion*const COR = dyn_cast<CXXBaseObjectRegion>(this);

  // Check for more specific regions first.
  // FieldRegion
  if (FR) {
    return FR->getDecl()->getSourceRange();
  }
  // CXXBaseObjectRegion
  else if (COR) {
    return COR->getDecl()->getSourceRange();
  }
  // VarRegion
  else if (VR) {
    return VR->getDecl()->getSourceRange();
  }
  // Return invalid source range (can be checked by client).
  else {
    return SourceRange{};
  }
}

Initially, my idea was to submit the sourceRange patch after getDescriptiveName. Maybe it would be most convenient, to include the sourceRange function into this patch and finally commit all connected patches in one go.

I'd be fine if we test this function with the usual regression tests by observing the output of the MPI checker. We could update that test with more checks once the function is updated.

With that approach, you'd be committing both patches at the same time.

I'd be fine if we test this function with the usual regression tests by observing the output of the MPI checker. We could update that test with more checks once the function is updated.
With that approach, you'd be committing both patches at the same time.

You mean to:
Temporarily remove the MemRegion.cpp testfile -> committing getDescriptiveName -> committing sourceRange -> committing MPI-Checker -> then readding the testfile; right?

I submitted the sourceRange patch here: http://reviews.llvm.org/D18309
If also this patch would get committed as part of the package, there would be no need for an incremental commit procedure.

lib/StaticAnalyzer/Core/MemRegion.cpp
653

I wasn't able to build a test case yet for which the analyzer could not determine the constant value. Is there a way to trick the analyzer so that the else case is used ? Then I could test for something like 'sendReq1[a][7][b]'.

xazax.hun added inline comments.Mar 30 2016, 11:00 AM
lib/StaticAnalyzer/Core/MemRegion.cpp
653

You can try use a value returned from a function that has an unknown body. E.g.:

int getUnknown();

void f() {

int a = getUnKnown();

}

zaks.anna added inline comments.Mar 30 2016, 11:03 AM
lib/StaticAnalyzer/Core/MemRegion.cpp
653

What happens when you try 'sendReq1[a][7][b]'? Does it know the values for "a" and "b" for some reason? If 'a' would be an input parameter and the analyzer did not see a call site, it won't know the value of 'a'.

test/Analysis/MemRegion.cpp
3

getDescriptiveName depends on MPI-Checker because of the tests.

Would committing the tests separately fix the cyclic commit dependency problem? (Though, I'd still prefer to review them along with this patch.)

lib/StaticAnalyzer/Core/MemRegion.cpp
653

If the return value of a function is used for which the body is not known Clang crashes.

int getUnknown(void);
int idxA = getUnknown();
MPI_Request sendReq1[10][10][10];
MPI_Wait(&sendReq1[1][idxA][9], MPI_STATUS_IGNORE); // expected-warning{{Request 'sendReq1[1][7][9]' has no matching nonblocking call.}}

Clang also crashes if the index-variable is not initialized.

int idxA;
MPI_Request sendReq1[10][10][10];
MPI_Wait(&sendReq1[1][idxA][9], MPI_STATUS_IGNORE); // expected-warning{{Request 'sendReq1[1][7][9]' has no matching nonblocking call.}}

In case the variable is initialized with a constant, the ConcreteInt is determined.

test/Analysis/MemRegion.cpp
3

Yes, that would fix the cyclic commit dependency.

Here's the crash report that appears in case of the unknown function body:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   clang                         	0x0000000105f544b7 abort + 39 (Signals.inc:440)
1   clang                         	0x0000000108b4a2a9 llvm::isa_impl_cl<clang::ento::ElementRegion, clang::ento::MemRegion const*>::doit(clang::ento::MemRegion const*) + 89 (Casting.h:95)
2   clang                         	0x0000000108b4a238 llvm::isa_impl_wrap<clang::ento::ElementRegion, clang::ento::MemRegion const*, clang::ento::MemRegion const*>::doit(clang::ento::MemRegion const* const&) + 24 (Casting.h:122)
3   clang                         	0x0000000108b4a212 llvm::isa_impl_wrap<clang::ento::ElementRegion, clang::ento::MemRegion const* const, clang::ento::MemRegion const*>::doit(clang::ento::MemRegion const* const&) + 34 (Casting.h:112)
4   clang                         	0x0000000108b4a005 bool llvm::isa<clang::ento::ElementRegion, clang::ento::MemRegion const*>(clang::ento::MemRegion const* const&) + 21 (Casting.h:133)
5   clang                         	0x0000000108bd8388 llvm::cast_retty<clang::ento::ElementRegion, clang::ento::MemRegion const*>::ret_type llvm::dyn_cast<clang::ento::ElementRegion, clang::ento::MemRegion const>(clang::ento::MemRegion const*) + 24 (Casting.h:298)
6   clang                         	0x000000010891a527 clang::ento::mpi::MPIChecker::checkUnmatchedWaits(clang::ento::CallEvent const&, clang::ento::CheckerContext&) const + 167 (MPIChecker.cpp:59)
7   clang                         	0x00000001089219a9 clang::ento::mpi::MPIChecker::checkPreCall(clang::ento::CallEvent const&, clang::ento::CheckerContext&) const + 57 (MPIChecker.h:37)
8   clang                         	0x0000000108921960 void clang::ento::check::PreCall::_checkCall<clang::ento::mpi::MPIChecker>(void*, clang::ento::CallEvent const&, clang::ento::CheckerContext&) + 48 (Checker.h:169)
9   clang                         	0x0000000108ae0ca2 clang::ento::CheckerFn<void (clang::ento::CallEvent const&, clang::ento::CheckerContext&)>::operator()(clang::ento::CallEvent const&, clang::ento::CheckerContext&) const + 66 (CheckerManager.h:58)
10  clang                         	0x0000000108ae0c3a (anonymous namespace)::CheckCallContext::runChecker(clang::ento::CheckerFn<void (clang::ento::CallEvent const&, clang::ento::CheckerContext&)>, clang::ento::NodeBuilder&, clang::ento::ExplodedNode*) + 250 (CheckerManager.cpp:269)
11  clang                         	0x0000000108ad6da5 void expandGraphWithCheckers<(anonymous namespace)::CheckCallContext>((anonymous namespace)::CheckCallContext, clang::ento::ExplodedNodeSet&, clang::ento::ExplodedNodeSet const&) + 821 (CheckerManager.cpp:123)
12  clang                         	0x0000000108ad6a00 clang::ento::CheckerManager::runCheckersForCallEvent(bool, clang::ento::ExplodedNodeSet&, clang::ento::ExplodedNodeSet const&, clang::ento::CallEvent const&, clang::ento::ExprEngine&, bool) + 288 (CheckerManager.cpp:286)

To me it seems like, dyn_cast<ElementRegion>(MR) being called in checkUnmatchedWaits (line 59) causes the problem, if nullptr is returned.

template <class X, class Y>
LLVM_ATTRIBUTE_UNUSED_RESULT inline typename cast_retty<X, Y *>::ret_type
dyn_cast(Y *Val) {
  return isa<X>(Val) ? cast<X>(Val) : nullptr;
}

It's a bug in the checker. dyn_cast should not be called on a null pointer. You could either check for nun or call dyn_cast_or_null.

It's a bug in the checker. dyn_cast should not be called on a null pointer. You could either check for nun or call dyn_cast_or_null.

Thanks for pointing this out! Adding guards that check for nullptr fixed the problem.

lib/StaticAnalyzer/Core/MemRegion.cpp
653

..MPI-Checker doesn't emit diagnostics which include variable names as indices like sendReq1[a][7][b]. The problem is that the checker has to reason about single regions. If the specific index is not known, getDescriptiveName() tries to obtain the variable name. But those cases are skipped by the checker, as it does not know which specific region is used.

What is the status of this?

As far as I understand it is blocked on that there is no checker that we could use to test this function with unknown variables as indexes?

What is the status of this?
As far as I understand it is blocked on that there is no checker that we could use to test this function with unknown variables as indexes?

Yes, this is blocked due to the MPI-Checker dependency. The best thing I can offer at the moment is to remove my tests and leave the coverage up to you.
I think the question is if an initial commit of this function without tests would be tolerated, what would also remove the cyclic commit dependency.

Gabor,

Can other checkers use this function? I am Ok with this being committed with limited coverage.

Can other checkers use this function? I am Ok with this being committed with limited coverage.

Once I rebase http://reviews.llvm.org/D15227, it will provide us with the limited coverage. In case that patch is accepted before the MPI checkers I can commit this patch along with that one.

I tested this patch using http://reviews.llvm.org/D15227

Unfortunately for non-array variables the getDescriptiveName returned var_name[0]. Note the spurious [0] part.

Could you look into that?

I tested this patch using http://reviews.llvm.org/D15227
Unfortunately for non-array variables the getDescriptiveName returned var_name[0]. Note the spurious [0] part.
Could you look into that?

Could you provide a code example where this effect turns up?

Looking at the code it seems that this data structure:
typedef SmallVector<const MemRegion *, 2> RegionVector;
tricks the function into rating the MemRegion as an ElementRegion.

Sure, the test case of http://reviews.llvm.org/D15227 is faiing because of this.

In test/Analysis/valist-unterminated.c, line 10
Initialized va_list 'va[0]' is leaked
is emitted instead of
Initialized va_list 'va' is leaked

Looking at the code it seems that this data structure:
typedef SmallVector<const MemRegion *, 2> RegionVector;
tricks the function into rating the MemRegion as an ElementRegion.

That's not the problem. I'm using the same data structure in the MPI-Checker patch.

Might the problem be in the va_list checker?
Obviously the va_list variable is identified as an ElementRegion what seems not to be correct.
Only if the passed region is an ElementRegion indices get appended.

I will look into it tomorrow.

Might the problem be in the va_list checker?
Obviously the va_list variable is identified as an ElementRegion what seems not to be correct.
Only if the passed region is an ElementRegion indices get appended.

The memory region for the va_list that was obtained from the analyzer in same case was indeed an element region in the va_list checker. I fixed this issue, and now it works properly.

The memory region for the va_list that was obtained from the analyzer in same case was indeed an element region in the va_list checker. I fixed this issue, and now it works properly.

Then this patch might be ready to commit. :)

Alexander_Droste marked 18 inline comments as done.Apr 16 2016, 4:10 AM
xazax.hun closed this revision.Jun 13 2016, 12:06 AM

This was committad as the part of http://reviews.llvm.org/rL272529