This patch adds a utility function to extract the variable name (including indices) from a memory region.
Diff Detail
Event Timeline
- reduced buffer size for variable name
- simplified R->getAs<clang::ento::ElementRegion>()->getSuperRegion() to ER->getSuperRegion().
tools/clang/lib/StaticAnalyzer/Core/MemRegion.cpp | ||
---|---|---|
643 | You're right, 200 was a bit overambitious. I reduced it to 50. |
tools/clang/lib/StaticAnalyzer/Core/MemRegion.cpp | ||
---|---|---|
653 | What will be the content of IndexInArray in case the index is not a concrete int? Will the above code work at all in that case? In case the index is a variable do we want to include the name of the variable in the index? | |
656 | I do not really like this part. You could either use Twine to avoid some temporal values (and you could also eliminate the string copy this way) or you could construct the indices using the reverse of idx and than reverse the whole string after the loop. |
tools/clang/lib/StaticAnalyzer/Core/MemRegion.cpp | ||
---|---|---|
653 | Right, getAs<..::ConcreteInt>() might return a nullptr. | |
656 | So would be invoking push_back within and reverse after the loop sufficient? |
tools/clang/lib/StaticAnalyzer/Core/MemRegion.cpp | ||
---|---|---|
699 | The else should go into the same line as the closing }. | |
701 | These are the only cases that can occur? So it is not possible to have a loc index? Maybe we could get the value for loc and nonloc ConcreteInts and use getVariableName for everzthing else? Would that wok? | |
708 | In this case idx also needs to be reserved. Since it is very rare that there are a huge level of []s, it is possible that the Twine solution is more desirable. | |
721 | What is happening here exactly? In some cases there are ' symbols around the name and in other cases there aren't? Is there a way to solve this inconsistency (so we do not need to check for the last character)? Also it would be great that this functions always had consistent result, so either never put ' around the name or always. |
tools/clang/lib/StaticAnalyzer/Core/MemRegion.cpp | ||
---|---|---|
701 | getIndex() returns a nonloc and loc is in no subclass relation to nonloc. There are actually 5 nonloc subclasses but I assumed So would you suggest to only check for ConcreteInt | |
708 | Ok, I'll look into the Twine API. | |
721 | I think I wasn't completely sure if single quotes |
tools/clang/lib/StaticAnalyzer/Core/MemRegion.cpp | ||
---|---|---|
701 | If we want to call getVariableName recursively, should we strip the single quotes then? |
tools/clang/lib/StaticAnalyzer/Core/MemRegion.cpp | ||
---|---|---|
701 | I agree. I think stripping is the way to go. |
What is the status of this? Do you need some help with any of the modifications? I'd love to see this patch in the repository soon.
Changes:
- remove if (VariableName.size() && VariableName.back() == '\'') check
- call getVariableName for index if Region is no ConcreteInt
Hi Gábor,
sorry about the delay! Here's the updated patch.
Is it really possible to reduce string copying in this case, with the help of llvm::Twine?
The problem with llvm::Twine seems to me that it cannot be modified after initialisation.
So there's no way to do sth. like: Twine t("a" + "b"); t += "c". A new Twine object needs to
be created for each subsequent concatenation. Do you have an idea how twine could be
applied here? Else I would claim that this function gets only used for diagnostics why
this level of efficiency might not be required.
Hi!
It works the following way:
If you write something like A = B + C + D; it will create two temporary strings, all of them might allocate memory.
If you write something like A = (Twine(B) + C + D).str(); it will only do one temporary string and one allocation.
So basically twine creates a data structure of pointers that points to the strings that you would like to concatenate, and once you do the conversion to string, it will do all the concatenations with one allocation.
- Changed idx = R->getVariableName(); to idx = ER->getVariableName(); as it better describes what is happening on a semantic level.
- remove string idx variable -> remove string copy assignments
- use Twine to reduce temporary string objects, built during concatenation
Is ArrayIndices = llvm::Twine(ArrayIndices + "]" + intValAsString.str() + "[").str();
preferable to: ArrayIndices += llvm::Twine("]" + intValAsString.str() + "[").str();
with respect to efficiency? If not, I would prefer the second version, as it is a bit more concise.
Since we went with the twine solution, I think we can get rid of the reverse stuff.
Note that:
Twine should be used like: (Twine(A) + B + C).str()
So the operator+ has twine as one of the operands.
Once these are fixed, it looks good to me.
Do you have commit access or do you need someone to commit this for you?
lib/StaticAnalyzer/Core/MemRegion.cpp | ||
---|---|---|
672 ↗ | (On Diff #48953) | nit: this is in fact an append. It is cleaner instead of using insert. |
- removed reverse
- fixed twine usage
I think VariableName.insert(VariableName.size() - 1, ArrayIndices); is needed here.
varName, indices -> 'varname[...]' If append would be used, the second single quote would be before the indices, like this: 'varname'[...]
I don't have commit access, so I need someone to commit this for me.
Hi Alexander,
Some comments in line. Also, I don't see any tests. Is this code tested by your MPI patch?
include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h | ||
---|---|---|
160 ↗ | (On Diff #48959) | I'm not sure that 'getVariableName()' conveys what this function does. The returned name is not always the name of a variable. How do you intend to use this method? How is it different than than printPretty()? Is it for debugging purposes? If so, then perhaps 'getDebugDescription()'? Is it for presentation to the user? Then perhaps 'getDescriptiveName()'? |
lib/StaticAnalyzer/Core/MemRegion.cpp | ||
635 ↗ | (On Diff #48959) | It seems like this could potentially be simplified by adding a recursive helper that threads the output stream and appends to it after the base case has been hit. You could also modify prettyPrint() to take a flag indicating whether it should be quoted or not (and defaulting to true). What do you think? |
Hi Devin,
thanks for taking the time! Yes, this is kind of implicitly tested by the MPI patch but I agree that it is necessary to add tests to MemRegion.
include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h | ||
---|---|---|
160 ↗ | (On Diff #48959) |
In which cases doesn't this function return the name? Isn't the worst case when printPretty can't be called and the string is empty? The function is intended to return the variable name and collects index variable names / ConcreteInts along the way if the passed region is an ElementRegion. So printPretty is called on the 'top' region of an array at the end of this function. If the passed region is no ElementRegion the functionality is identical to printPretty. The main motivation to write this function was to include specific indices in a diagnostic if obtainable, like presented in the MPI-Checker testfile So should we use getDescriptiveName or maybe sth. like getVariableNameWithIndices?. |
lib/StaticAnalyzer/Core/MemRegion.cpp | ||
635 ↗ | (On Diff #48959) | Modifying printPretty seems like a nice solution to me. If you don't prefer the other idea, I would go this route. |
include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h | ||
---|---|---|
160 ↗ | (On Diff #48959) | Can the string ever be something like 'a.f[0][1]' (i.e., with a field name)? Or is it always 'a[0][1]'? If the first is possible I would suggest getDescriptiveName -- if not, then getVariableNameWithIndices seems fine. |
lib/StaticAnalyzer/Core/MemRegion.cpp | ||
635 ↗ | (On Diff #48959) | Sounds good to me! |
include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h | ||
---|---|---|
160 ↗ | (On Diff #48959) | Yes, the string includes the name of the struct. But this is not due to the while loop in this function but how printPretty is defined for FieldRegion. In case of a FieldRegion printPretty is called for the super-region first before the field declation name gets appended. void FieldRegion::printPrettyAsExpr(raw_ostream &os) const { assert(canPrintPrettyAsExpr()); superRegion->printPrettyAsExpr(os); os << "." << getDecl()->getName(); } I'll go with getDescriptiveName. |
Looking at the printPretty implementation once more, I realized that printPrettyAsExpression
is actually printPretty without quotes. Therefore, I didn't have to change printPretty and
getDescriptiveName handles the quoting differentiation.
I searched for a MemRegion testfile in llvm/tools/clang/test/** but couldn't find one.
Where should the MemRegion tests go?
I thunk the closest we have is region-store* tests, which is not quite the same. Feel free to add a new test.
Thanks for pointing this out. What's actually the best way to test the function?
If I test this function with an integration test, I need to rely on a checker which uses the function to output diagnostics.
Is it possible to test this with a unit test? My assumption is that parts of the MemRegion would then need to be mocked.
test/Analysis/MemRegion.cpp | ||
---|---|---|
3 ↗ | (On Diff #50551) | The problem about these tests is that they introduce a cyclic commit dependency. MPI-Checker depends on getDescriptiveName. getDescriptiveName depends on MPI-Checker because of the tests. Further, MPI-Checker depends on this function: SourceRange MemRegion::sourceRange() const { const VarRegion *const VR = dyn_cast<VarRegion>(this->getBaseRegion()); const FieldRegion *const FR = dyn_cast<FieldRegion>(this); const CXXBaseObjectRegion*const COR = dyn_cast<CXXBaseObjectRegion>(this); // Check for more specific regions first. // FieldRegion if (FR) { return FR->getDecl()->getSourceRange(); } // CXXBaseObjectRegion else if (COR) { return COR->getDecl()->getSourceRange(); } // VarRegion else if (VR) { return VR->getDecl()->getSourceRange(); } // Return invalid source range (can be checked by client). else { return SourceRange{}; } } Initially, my idea was to submit the sourceRange patch after getDescriptiveName. Maybe it would be most convenient, to include the sourceRange function into this patch and finally commit all connected patches in one go. |
I'd be fine if we test this function with the usual regression tests by observing the output of the MPI checker. We could update that test with more checks once the function is updated.
With that approach, you'd be committing both patches at the same time.
I'd be fine if we test this function with the usual regression tests by observing the output of the MPI checker. We could update that test with more checks once the function is updated.
With that approach, you'd be committing both patches at the same time.
You mean to:
Temporarily remove the MemRegion.cpp testfile -> committing getDescriptiveName -> committing sourceRange -> committing MPI-Checker -> then readding the testfile; right?
I submitted the sourceRange patch here: http://reviews.llvm.org/D18309
If also this patch would get committed as part of the package, there would be no need for an incremental commit procedure.
lib/StaticAnalyzer/Core/MemRegion.cpp | ||
---|---|---|
653 ↗ | (On Diff #50551) | I wasn't able to build a test case yet for which the analyzer could not determine the constant value. Is there a way to trick the analyzer so that the else case is used ? Then I could test for something like 'sendReq1[a][7][b]'. |
lib/StaticAnalyzer/Core/MemRegion.cpp | ||
---|---|---|
653 ↗ | (On Diff #50551) | You can try use a value returned from a function that has an unknown body. E.g.: int getUnknown(); void f() { int a = getUnKnown(); } |
lib/StaticAnalyzer/Core/MemRegion.cpp | ||
---|---|---|
653 ↗ | (On Diff #50551) | What happens when you try 'sendReq1[a][7][b]'? Does it know the values for "a" and "b" for some reason? If 'a' would be an input parameter and the analyzer did not see a call site, it won't know the value of 'a'. |
test/Analysis/MemRegion.cpp | ||
3 ↗ | (On Diff #50551) |
Would committing the tests separately fix the cyclic commit dependency problem? (Though, I'd still prefer to review them along with this patch.) |
lib/StaticAnalyzer/Core/MemRegion.cpp | ||
---|---|---|
653 ↗ | (On Diff #50551) | If the return value of a function is used for which the body is not known Clang crashes. int getUnknown(void); int idxA = getUnknown(); MPI_Request sendReq1[10][10][10]; MPI_Wait(&sendReq1[1][idxA][9], MPI_STATUS_IGNORE); // expected-warning{{Request 'sendReq1[1][7][9]' has no matching nonblocking call.}} Clang also crashes if the index-variable is not initialized. int idxA; MPI_Request sendReq1[10][10][10]; MPI_Wait(&sendReq1[1][idxA][9], MPI_STATUS_IGNORE); // expected-warning{{Request 'sendReq1[1][7][9]' has no matching nonblocking call.}} In case the variable is initialized with a constant, the ConcreteInt is determined. |
test/Analysis/MemRegion.cpp | ||
3 ↗ | (On Diff #50551) | Yes, that would fix the cyclic commit dependency. |
Here's the crash report that appears in case of the unknown function body:
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 clang 0x0000000105f544b7 abort + 39 (Signals.inc:440) 1 clang 0x0000000108b4a2a9 llvm::isa_impl_cl<clang::ento::ElementRegion, clang::ento::MemRegion const*>::doit(clang::ento::MemRegion const*) + 89 (Casting.h:95) 2 clang 0x0000000108b4a238 llvm::isa_impl_wrap<clang::ento::ElementRegion, clang::ento::MemRegion const*, clang::ento::MemRegion const*>::doit(clang::ento::MemRegion const* const&) + 24 (Casting.h:122) 3 clang 0x0000000108b4a212 llvm::isa_impl_wrap<clang::ento::ElementRegion, clang::ento::MemRegion const* const, clang::ento::MemRegion const*>::doit(clang::ento::MemRegion const* const&) + 34 (Casting.h:112) 4 clang 0x0000000108b4a005 bool llvm::isa<clang::ento::ElementRegion, clang::ento::MemRegion const*>(clang::ento::MemRegion const* const&) + 21 (Casting.h:133) 5 clang 0x0000000108bd8388 llvm::cast_retty<clang::ento::ElementRegion, clang::ento::MemRegion const*>::ret_type llvm::dyn_cast<clang::ento::ElementRegion, clang::ento::MemRegion const>(clang::ento::MemRegion const*) + 24 (Casting.h:298) 6 clang 0x000000010891a527 clang::ento::mpi::MPIChecker::checkUnmatchedWaits(clang::ento::CallEvent const&, clang::ento::CheckerContext&) const + 167 (MPIChecker.cpp:59) 7 clang 0x00000001089219a9 clang::ento::mpi::MPIChecker::checkPreCall(clang::ento::CallEvent const&, clang::ento::CheckerContext&) const + 57 (MPIChecker.h:37) 8 clang 0x0000000108921960 void clang::ento::check::PreCall::_checkCall<clang::ento::mpi::MPIChecker>(void*, clang::ento::CallEvent const&, clang::ento::CheckerContext&) + 48 (Checker.h:169) 9 clang 0x0000000108ae0ca2 clang::ento::CheckerFn<void (clang::ento::CallEvent const&, clang::ento::CheckerContext&)>::operator()(clang::ento::CallEvent const&, clang::ento::CheckerContext&) const + 66 (CheckerManager.h:58) 10 clang 0x0000000108ae0c3a (anonymous namespace)::CheckCallContext::runChecker(clang::ento::CheckerFn<void (clang::ento::CallEvent const&, clang::ento::CheckerContext&)>, clang::ento::NodeBuilder&, clang::ento::ExplodedNode*) + 250 (CheckerManager.cpp:269) 11 clang 0x0000000108ad6da5 void expandGraphWithCheckers<(anonymous namespace)::CheckCallContext>((anonymous namespace)::CheckCallContext, clang::ento::ExplodedNodeSet&, clang::ento::ExplodedNodeSet const&) + 821 (CheckerManager.cpp:123) 12 clang 0x0000000108ad6a00 clang::ento::CheckerManager::runCheckersForCallEvent(bool, clang::ento::ExplodedNodeSet&, clang::ento::ExplodedNodeSet const&, clang::ento::CallEvent const&, clang::ento::ExprEngine&, bool) + 288 (CheckerManager.cpp:286)
To me it seems like, dyn_cast<ElementRegion>(MR) being called in checkUnmatchedWaits (line 59) causes the problem, if nullptr is returned.
template <class X, class Y> LLVM_ATTRIBUTE_UNUSED_RESULT inline typename cast_retty<X, Y *>::ret_type dyn_cast(Y *Val) { return isa<X>(Val) ? cast<X>(Val) : nullptr; }
It's a bug in the checker. dyn_cast should not be called on a null pointer. You could either check for nun or call dyn_cast_or_null.
It's a bug in the checker. dyn_cast should not be called on a null pointer. You could either check for nun or call dyn_cast_or_null.
Thanks for pointing this out! Adding guards that check for nullptr fixed the problem.
lib/StaticAnalyzer/Core/MemRegion.cpp | ||
---|---|---|
653 ↗ | (On Diff #50551) | ..MPI-Checker doesn't emit diagnostics which include variable names as indices like sendReq1[a][7][b]. The problem is that the checker has to reason about single regions. If the specific index is not known, getDescriptiveName() tries to obtain the variable name. But those cases are skipped by the checker, as it does not know which specific region is used. |
What is the status of this?
As far as I understand it is blocked on that there is no checker that we could use to test this function with unknown variables as indexes?
What is the status of this?
As far as I understand it is blocked on that there is no checker that we could use to test this function with unknown variables as indexes?
Yes, this is blocked due to the MPI-Checker dependency. The best thing I can offer at the moment is to remove my tests and leave the coverage up to you.
I think the question is if an initial commit of this function without tests would be tolerated, what would also remove the cyclic commit dependency.
Gabor,
Can other checkers use this function? I am Ok with this being committed with limited coverage.
Once I rebase http://reviews.llvm.org/D15227, it will provide us with the limited coverage. In case that patch is accepted before the MPI checkers I can commit this patch along with that one.
I tested this patch using http://reviews.llvm.org/D15227
Unfortunately for non-array variables the getDescriptiveName returned var_name[0]. Note the spurious [0] part.
Could you look into that?
I tested this patch using http://reviews.llvm.org/D15227
Unfortunately for non-array variables the getDescriptiveName returned var_name[0]. Note the spurious [0] part.
Could you look into that?
Could you provide a code example where this effect turns up?
Looking at the code it seems that this data structure:
typedef SmallVector<const MemRegion *, 2> RegionVector;
tricks the function into rating the MemRegion as an ElementRegion.
Sure, the test case of http://reviews.llvm.org/D15227 is faiing because of this.
In test/Analysis/valist-unterminated.c, line 10
Initialized va_list 'va[0]' is leaked
is emitted instead of
Initialized va_list 'va' is leaked
Looking at the code it seems that this data structure:
typedef SmallVector<const MemRegion *, 2> RegionVector;
tricks the function into rating the MemRegion as an ElementRegion.
That's not the problem. I'm using the same data structure in the MPI-Checker patch.
Might the problem be in the va_list checker?
Obviously the va_list variable is identified as an ElementRegion what seems not to be correct.
Only if the passed region is an ElementRegion indices get appended.
The memory region for the va_list that was obtained from the analyzer in same case was indeed an element region in the va_list checker. I fixed this issue, and now it works properly.
The memory region for the va_list that was obtained from the analyzer in same case was indeed an element region in the va_list checker. I fixed this issue, and now it works properly.
Then this patch might be ready to commit. :)
Do you think 200 is a reasonable size here? In case the namespaces are not included in the printed name, I think it is very rare that a name would be longer than 100 characters.