This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] SVal Visitor.
ClosedPublic

Authored by NoQ on Dec 11 2015, 5:52 AM.

Details

Summary

It seems that in several places in the code Clang Static Analyzer tries to recursively traverse the SVal hierarchy, so i made a visitor for SVal, SymExpr, and MemRegion hierarchies. Actually, three separate visitors, but they're rarely useful on their own, so there's FullSValVisitor to merge the three for visiting the whole thing. The approach was literally copied from StmtVisitor etc in an obvious manner.

One thing that could make the visitor a lot more useful, which i'd probably love to implement, is a simple re-usable VisitChildren() method (in case the visitor's return type is void). Because we cannot write such method in every visitor as easily as we do it for, say, StmtVisitor (we don't have a full-featured iterator for child values/symbols/regions). This would allow a trivial implementation of methods like "find all ElementRegion's inside this SVal, and mark their indices"). To-think: how should such method handle lazy compound values?

This review is a bit green, in a sense that there's not much actually delivered yet, apart from the visitor header itself. Some further todo's would be:

  • Refactor some pieces of the code to use the visitor. In fact, we already have the SymbolVisitor class somewhere. Probably SymbolReaper could be simplified.
  • Some checkers, that rely on exploring the hierarchy, may be making use of the visitor. Even if existing checkers don't use it, developers of new checkers may like it.
  • The object responsible for this alpha-renaming thing would most likely look like a FullSValVisitor that returns an SVal.
  • Not sure, maybe split the three visitors into three different header files?

In order to make sure that the visitor header compiles, i started a simple example visitor - the SValExplainer. It explains symbolic values in a human-readable manner, returning an std::string. SValExplainer can be used:

  • for pretty-printing values to the analyzer's end-user, eg. in checker warning messages, or even in "[assuming ...]" diagnostic pieces instead of pretty-printed expressions.
  • for deep-testing analyzer internals, when the test needs to ensure that a particular kind of SVal is produced durign analysis. In fact, one of the tests a FIXME test that exposes a certain problem in the core.
  • as a documentation for SVal kinds (because novice checker developers are often confused about the meaning of different SVal kinds). Users may also rely on it to understand how the analyzer works during debugging, eg. quickly explain what does this particular SVal they obtained in a certain callback actually means.

Todos for SValExplainer include:

  • Explaining more values. In particular, i could use a bit of advice for Objective-C-specific values, because i know very little about this language. I might have also forgotten something. Memory spaces are worth it, most likely.
  • Improving natural language. Probably some bugs would be exposed later. Not sure if the long "of"-chains the explainer produces sound naturally.
  • Probably add various constructor-time flags if there are multiple users of the explainer having different expectations.

In order to test SValExplainer, a new callback was added to the debug.ExprInspectionChecker, namely clang_analyzer_explain(), that causes an explanation of its argument value to be printed as a warning message. I also added another callback - clang_analyzer_getExtent() in order to obtain SymbolExtent for testing. Testing how extents are modeled would probably be useful later as well. Regexps are used in the tests in order to match the start and the end of the warning message.

So, essentially, i'm humbly requesting a quick glance on this code, if this facility is useful, if some stuff is clearly useless, and whether any of the todos are actually wanted.
I'd probably make more updates in the process.

Diff Detail

Repository
rL LLVM

Event Timeline

NoQ updated this revision to Diff 42516.Dec 11 2015, 5:52 AM
NoQ retitled this revision from to [analyzer] SVal Visitor..
NoQ updated this object.
NoQ added reviewers: zaks.anna, dcoughlin, xazax.hun.
NoQ added a subscriber: cfe-commits.
zaks.anna edited edge metadata.Dec 15 2015, 5:09 PM

Can/Should something like this be used when dumping SVals (during debugging)? (Possibly in addition to the debug checker.)
What are the advantages of implementing this using visitors? Can this be implemented similarly to SVal::dumpToStream? Do you envision other use cases for the visitors?

A couple of suggestions regarding the implementation of the visitors if we decide to keep them.
You should either use http://llvm.org/docs/TableGen/ like ./include/clang/AST/DeclVisitor.h or even better use something similar to https://github.com/apple/swift/blob/master/include/swift/AST/ExprNodes.def and it's users.

include/clang/StaticAnalyzer/Checkers/SValExplainer.h
88 ↗(On Diff #42516)

Using a different name here could lead to confusion.

Sorry, I forgot to read the description before commenting; I see it is intended to be used not only for debugging purposes:)

NoQ added a comment.Dec 16 2015, 5:24 AM

Good point, will try to make a .def file.

There's a tiny inconsistency with SVal naming that would most likely need to be fixed in this approach:

nonloc::SymbolVal => SymbolValKind
loc::MemRegionVal => MemRegionKind // no "Val"!

Hmm, maybe make a .def file for symbols and regions only? SVals are very small anyway.

Are you saying that we need to rename "SymbolValKind" to "SymbolKind"? That would probably be a tiny change.

NoQ updated this revision to Diff 43683.Dec 28 2015, 7:09 AM
NoQ edited edge metadata.
NoQ marked an inline comment as done.

An attempt on the .def-files.

The next step would probably be the VisitChildren() thing, and I'll see if it allows to refactor and simplify some code.


Forgot to answer: I guess there are a few minor-but-good things about the visitors in our case compared in-class methods (such as dumpToStream()):

  1. Easy to develop incrementally - no need to put stubs into all subclasses for methods we didn't yet implement.
  2. Easy to create incomplete visitors (eg. we want to visit only SVal's that appear as Store values, and we won't ever see a MemSpaceRegion appear as a Store value)
  3. Cover the whole sub-class with a single method (eg. VisitTypedValueRegion() covers all kinds of TypedValueRegion's).
  4. Easy to create checker-specific traversal methods - if a particular checker needs to visit the hierarchy, it's not forced to adjust all classes.

The question in what way the visitor is better than a recursive function with a very large switch is a bit more complicated (because a visitor essentially is a recursive function with a very large switch); only point 3 of the above still applies. With VisitChildren(), however, it becomes much more convenient.

Hmm. One more thing about VisitChildren(): Normally such method is re-implemented in every visitor easily using child iterators. We don't, however, have fully functional iterators for SVal/SymExpr/MemRegion children (only partial solutions like symbol_begin()..symbol_end()), and also such iterators would need to be polymorphic around these three classes (eg. a symbol-child of a region). In fact, i can make such polymorphic iterators, and that'd probably be a more generic solution.

zaks.anna added inline comments.Jan 5 2016, 4:13 PM
include/clang/StaticAnalyzer/Core/PathSensitive/Regions.def
31 ↗(On Diff #43683)

I'd rather rename the "Kind" suffix. Is that possible?

Having REGION and NORMAL_REGION is strange.

include/clang/StaticAnalyzer/Core/PathSensitive/SVals.def
13 ↗(On Diff #43683)

Again, we should go ahead and change the kind values if it would make things more uniform. (Can be done in a separate patch committed before this one.)

27 ↗(On Diff #43683)

Loc and NonLoc have not been defined yet.

NoQ added inline comments.Jan 6 2016, 6:52 AM
include/clang/StaticAnalyzer/Core/PathSensitive/Regions.def
31 ↗(On Diff #43683)

In fact, MemSpaceRegion is quite special, because it is the only thing in the hierarchy that can be both derived from (normally such values are marked as "abstract") and instantiated (for which it has a kind defined). Instances of MemSpaceRegion are used, at least, for holding some code regions. Probably create a new memspace for such regions and remove the kind value?

Ok, i'd go ahead and prepare a separate review for unifying the naming convention :)

zaks.anna added inline comments.Jan 6 2016, 11:36 AM
include/clang/StaticAnalyzer/Core/PathSensitive/Regions.def
31 ↗(On Diff #43683)

Please, do if you agree that it makes sense.

Otherwise, this patch LGTM.

NoQ updated this revision to Diff 44475.Jan 11 2016, 6:11 AM
NoQ marked 5 inline comments as done.

Renamed the kinds for consistency (review D16062), this diff is updated to use the new naming convention. The 'kind' column gets removed from the def-files.

NoQ updated this revision to Diff 44629.Jan 12 2016, 6:08 AM

Another rebase on top of D16062.

zaks.anna accepted this revision.Jan 12 2016, 3:40 PM
zaks.anna edited edge metadata.

LGTM. Thank you!

This revision is now accepted and ready to land.Jan 12 2016, 3:40 PM
NoQ updated this revision to Diff 44734.Jan 13 2016, 4:14 AM
NoQ edited edge metadata.

Rebase on top of D12901 - support SymbolCast in the explainer, as it finally appears in the wild.

NoQ added a comment.Jan 13 2016, 5:29 AM

Nope, will commit without SymbolCast support for now, encountered some issues with D12901 that would probably be worth a separate commit.

This revision was automatically updated to reflect the committed changes.
NoQ updated this revision to Diff 44758.Jan 13 2016, 9:17 AM
NoQ removed rL LLVM as the repository for this revision.

Reverted the patch due to a few issues. This revision should fix these issues.

The explain-svals test is fixed to target a specific target, in order to make sure that the definition of size_t always agrees with the target triple, otherwise the test would fail (shame on me, should have guessed!).

Not quite sure what to do with the failure of the Modules buildbot (http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/10562/steps/compile.llvm.stage2/logs/stdio). I added the new .def-files to the list of "textual" headers in module.modulemap, but i'm not brave enough to go ahead and commit again and make that sort of thing work through trial and error, as maybe there are more things i need to do.

Fix a small whitespace error introduced by the patch.

Right after committing D16062, i noticed that MemRegion itself also doesn't need to be a friend of MemRegionManager. Added this to this patch, as they're related, i guess.

NoQ added a reviewer: rsmith.Jan 13 2016, 10:33 AM

Richard: excuse me, adding you because you are an expert on the modulemap, could you have a quick look at the proposed changes here and probably point me in the right direction, because i'm not quite sure how to test the modules-enabled build on a local machine before committing?

Sorry for a bit of a newbie panic/noise in this review.

NoQ added a comment.Jan 15 2016, 8:08 AM

Managed to reproduce the build error with -fmodules on my machine.
Committed the updated patch as r257893, the buildbot seems happy.
I hope this review is actually closed now :)