This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][NFC] Introduce SuperChecker<>, a convenient alternative to Checker<> for storing subcheckers
Needs ReviewPublic

Authored by Szelethus on Sep 8 2019, 3:08 PM.

Details

Summary

Please don't take a shot for each time I write checker, it'd end bad.

The term "super checker" or "modeling checker", and the term "subchecker" always existed conceptually, but they weren't ever formalized. In the last half a year or so, we referred to them as

  • Super/modeling checker: A checker that models some C++ code, but doesn't (or, as of now, just shouldn't) emit any diagnostics, at least not under its own name.
  • Subcheckers: Checkers that are a part of super/modeling checkers, enabling/disabling them (ideally) only toggles whether a diagnostic is emitted from the checker it is a part of. They don't possess a checker object on their own, and are basically glorified checker options.

While checker dependencies were in similar shoes (existed conceptually but not formalized), this change isn't as critical, it just removes boilerplate code. When you look at IteratorChecker, SecuritySyntaxChecker or RetainCountBase they all use a similar, some cases faulty implementation to keep track of which subcheckers are enabled and what their name is, so its about time we combined them.

I envision this interface to be used to enforce our currently non-existent specification on the checker system.

In detail:

  • Introduce SuperChecker:
    • It is essentially the same as Checker, but requires an enum template argument to keep track of which subcheckers are enabled.
    • While previously we defined subcheckers as checkers that don't have a checker object on their own, SuperChecker does create a CheckerBase object for them, but they still can't have checker callbacks.
  • Add CheckerManager::registerSubChecker adds a new checker to a SuperChecker. It is ensured runtime that the SuperChecker was previously registered, and that a subchecker isn't registered multiple times.
  • Add thorough test cases for the new interface.

Diff Detail

Event Timeline

Szelethus created this revision.Sep 8 2019, 3:08 PM
NoQ added a comment.Sep 9 2019, 6:55 PM

I have mixed feelings. Removing boilerplate is good, but the very fact that we're legalizing this pattern indicates that our checkers will keep bloating up, while i always wanted to actually split them instead (like, make sub-checkers into their own separate classes, possibly spread out into different files, kinda micro checkers as opposed to monolithic checkers (?)). But i guess it's about whoever gets things done first :)

I'd love to see how this affects our actual checkers, did you already try porting them? Do you plan to help with tracking per-sub-checker bug types and check names?

SuperChecker

WDYT about MultiChecker? ("A checker that implements multiple checks and presents them as different checkers.")

clang/unittests/StaticAnalyzer/RegisterCustomCheckersTest.cpp
113

The CXX23ModelingDiagKind:: qualifier is unnecessary here, right? Or did you mean to make an enum class? Does it even work with enum classes?

In D67336#1664168, @NoQ wrote:

I have mixed feelings. Removing boilerplate is good, but the very fact that we're legalizing this pattern indicates that our checkers will keep bloating up, while i always wanted to actually split them instead (like, make sub-checkers into their own separate classes, possibly spread out into different files, kinda micro checkers as opposed to monolithic checkers (?)). But i guess it's about whoever gets things done first :)

I completely agree with you about splitting the checkers. I also plan for the iterator checkers to separate them from the modelling and make a few mini checker classes beside the still huge modelling class. I am confident that that is the right direction.

I have mixed feelings. Removing boilerplate is good, but the very fact that we're legalizing this pattern indicates that our checkers will keep bloating up, while i always wanted to actually split them instead (like, make sub-checkers into their own separate classes, possibly spread out into different files, kinda micro checkers as opposed to monolithic checkers (?)).

The subchecker system as it works now is more diverse than that. Some systems use them purely for diagnostics (this patch is targeting those), while some affect the modeling as well. I think we should allow purely diagnostic checkers, because we can't really justify making an entire class for them, let alone an entire file, and they really are an integral part of the checker. However, we absolutely shouldn't promote adding further modeling into an existing checker whenever its avoidable.

The unfortunate truth is (at least the way I see it) is that we can't really force anyone to write better code. I like to think this patch neither legalizes nor prevents someone from bloating a file further, but rather introduces a new tool to split the giant checkers up, or make future additions more manageable.

The high level idea would be that when CallDescriptionMap is too simple, allow checkers to create their own events, and register subcheckers into them. This would for instance solve the problem of the unknown callback order of regular callbacks among checkers, which might be a better alternative then leaving this to CheckerRegistry.

Btw I tried to write this comment for literally months now, but I admit that I don't yet the where such a subsystem could be deployed in the already existing checkers. I always think of MallocChecker, but I don't see how we could do it there just yet.

@NoQ, in San José, you mentioned an example with std::set that would really demand a strong checker infrastructure, but I've since forgotten it. Could you explain it again please?

The way I see perhaps we need a 3rd class of checkers (beside super and sub). And that would be those checkers which are not dependent closely on any super checker but they do emit diagnostics. E.g. the PlacementNewChecker is implemented in it's own, it emits some diagnostics, and does not model. However, it depends on MallocChecker's modeling when we are interested in dynamically allocated buffers sizes. Or would you add PlacementNewChecker as MallocChecker's subchecker? That seems a bit overkill to me.

On the other hand I see that e.g. MallocChecker should have several subcheckers (doubleDelete, etc). And these subcheckers do closely operate on the data stored in their super checker.
(Another approach could be if we have a global data storage where every modeling checker puts its own data, and other checkers can access this, actually Regions is one example to this.)

What is a CallDescriptionMap? Could you please explain further?