This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Reimplement dependencies between checkers
ClosedPublic

Authored by Szelethus on Nov 12 2018, 12:02 PM.

Details

Summary

The solution is implemented as follows:

  • Add a new field to the Checker class in CheckerBase.td called Dependencies, which is a list of Checkers. I know some of us have mixed feelings on the use of tblgen, but I personally found it very helpful, because few things are more miserable then working with the preprocessor, and this extra complication is actually another point where things can be verified, properly organized, and so on. The best way for me to prove that it shouldn't be dropped is actually use it and document it properly, so I did just that in this patch.
  • Move unix checkers before cplusplus, as there is no forward declaration in tblgen :/
  • Add the following new checkers:
    • StackAddrEscapeBase
    • StackAddrEscapeBase
    • CStringModeling
    • DynamicMemoryModeling (base of the MallocChecker family)
    • IteratorModeling (base of the IteratorChecker family)
    • ValistBase
    • SecuritySyntaxChecker (base of bcmp, bcopy, etc...)
    • NSOrCFErrorDerefChecker (base of NSErrorChecker and CFErrorChecker)
    • IvarInvalidationModeling (base of IvarInvalidation checker family)
    • RetainCountBase (base of RetainCount and OSObjectRetainCount)
  • Clear up and registry functions in MallocChecker, happily remove old FIXMEs.
  • Add a new addDependency function to CheckerRegistry.
  • Neatly format RUN lines in files I looked at while debugging.

[1](Copied from D54397's summary!) While on the quest of fixing checker options the same way I cleaned up non-checker options, although I'm making good progress, I ran into a wall: In order to emit warnings on incorrect checker options, we need to ensure that checkers actually acquire their options properly -- but, I unearthed a huge bug in checker registration: dependencies are currently implemented in a way that breaks the already very fragile registration infrastructure.

Here is where the problem really originates from: this is a snippet from CheckerRegistry::initializeManager.

// Initialize the CheckerManager with all enabled checkers.
for (const auto *i :enabledCheckers) {
  checkerMgr.setCurrentCheckName(CheckName(i->FullName));
  i->Initialize(checkerMgr);
}

Note that Initialize is a function pointer to register##CHECKERNAME, like registerInnerPointerChecker. Since for each register function call the current checker name is only set once, when InnerPointerChecker attempts to also register it's dependent MallocChecker, both it and MallocChecker will receive the cplusplus.InnerPointer name. This results in MallocChecker trying to acquire it's Optimistic option as cplusplus.InnerPointerChecker:Optimistic.

Clearly, this problem has to be solved at a higher level -- it makes no sense to be able to affect other checkers in a registry function. Since I'll already make changes to how checker registration works, I'll probably tune some things for the current C++ standard, add much needed documentation, and try to make the code a lot less confusing.

Diff Detail

Repository
rL LLVM

Event Timeline

Szelethus created this revision.Nov 12 2018, 12:02 PM
NoQ added a comment.Nov 12 2018, 2:18 PM

Hmm.

All intentions in this patch are great and i love them!

I think that the refactoring work in MallocChecker is a bit pre-mature; i don't have a clear picture of how the ideal MallocChecker should look like, and i'm not sure this patch moves us into the right direction. Like, it tries to deal with multiple API packs (malloc-free, new-delete, new[]-delete[], g_malloc()-g_free(), alloca()-???, obtaininnerpointer-invalidateinnerpointer, CStringChecker interop) and multiple warning kinds (use-after-free, double free, leak, free at offset, mismatched deallocators) and i've no idea what's the proper way to split this into smaller checkers, how many Checker sub-classes do we need, how should they be split up into multiple files and inter-operate via headers (maybe it's better for sub-checkers include MallocChecker.h than for MallocChecker to include sub-checkers' .hs?), what exactly should the "base" checker do, how many Checkers.td entries do we need, how should the Checkers.td dependency graph look like, AAAAAAAAAAAAAA. I want to see a picture of all this stuff drawn first, before jumping into solving this.

Can we reduce this patch to simply introducing the dependency item in Checkers.td and using it in, like, one place (eg., MallocChecker/CStringChecker inter-op) and discuss the rest of the stuff later before we jump into it?

In D54438#1296155, @NoQ wrote:

Can we reduce this patch to simply introducing the dependency item in Checkers.td and using it in, like, one place (eg., MallocChecker/CStringChecker inter-op) and discuss the rest of the stuff later before we jump into it?

Short answer, no. The original intent was to somehow make sense out of checker options, and look where that got me :D There are no more shortcuts to make, sadly. I strongly believe it'd be better to figure out how those checkers should operate before moving forward with this patch, because I know I wouldn't like to touch any of those checkers once this patch goes through, so let's tough it out first.

Also, doing some checker related work might be refreshing after the never ending refactoring.

I'll brush the dust off some notebooks and see what I can come up with.

NoQ added a comment.Nov 12 2018, 3:11 PM

Mm, i don't understand. I mean, what prevents you from cutting it off even earlier and completely omitting that part of the patch? Somebody will get to this later in order to see how exactly does the separation needs to be performed.

Szelethus added a comment.EditedNov 12 2018, 3:28 PM

Mm, i don't understand. I mean, what prevents you from cutting it off even earlier and completely omitting that part of the patch? Somebody will get to this later in order to see how exactly does the separation needs to be performed.

I somewhat misunderstood what you meant in your earlier comment.

Originally, I created CStringBase, but not MallocBase, hence the insane amount of workaround, witch eventually lead to the createion of MallocBase, but I never removed the now unnecessary stuff, but now that you've shed some light to it, it makes little sense for it to stay around. I guess you could say that I couldn't see the forest for the trees.

Thanks! There's still a lot of work to be done, as this patch supplies a new way of expressing dependencies, but doesn't actually stop anyone from doing this mistake again, but let that be an issue for another time.

Szelethus added a comment.EditedNov 12 2018, 3:55 PM

Actually, no. The main problem here is that InnerPointerChecker doesn't only want to register MallocBase, it needs to modify the checker object. In any case, either MallocChecker.cpp needs the definition of InnerPointerChecker, or vice versa. Sure, I could separate out the part that adds a new way to register dependencies, but it doesn't stop anyone from making the same mistake again, so what would be the point of that? I think it would be just a lot cleaner to split those checkers up properly, and completely replace the current system of registering checkers to stop anyone accidentally doing the same mistake again.

Another important point to make is that my original goal of adding additional security to AnalyzerOptions heavily depends on checkers being set up properly, so unless this issue is out of the way, I can't progress further :/

NoQ accepted this revision.Nov 12 2018, 7:54 PM

Ok, so the point is to fix the current checker name problem, right? I guess let's land it then :)

Code looks great. I'm thinking aloud in inline comments a little bit, but don't mind me.

@rnkovacs, do you have any immediate opinions on how the header structure changes here?

lib/StaticAnalyzer/Checkers/InnerPointer/InnerPointerChecker.h
23–25 ↗(On Diff #173725)

We should eventually initialize these in place.

lib/StaticAnalyzer/Frontend/CheckerRegistry.cpp
160 ↗(On Diff #173725)

Because we already forget by now what the second part of this pair is, i believe that decomposing a std::pair into two variables as early as possible is a good idea.

This revision is now accepted and ready to land.Nov 12 2018, 7:54 PM
Szelethus added a comment.EditedNov 13 2018, 2:14 AM

Allow me to disagree again -- the reason why CheckerRegistry and the entire checker registration process is a mess is that suboptimal solutions were added instead of making the system do well on the long term. This is completely fine, I am guilty of this myself, but now that this problem has been highlighted, it's time to get things right. Some clever refactoring is long overdue.

I don't mind delaying my checker option related works, if that means that checker registration doesn't need hacking each time a flaw like this is discovered.

There are other things I need to figure out -- some checkers refuse to register themselves in their registry funcions,

  • which makes moving CheckerManager::registerChecker<T> out of registry funcions impossible,
  • which makes passing the checkers full name directly to CheckerManager impossible,
  • which makes replacing the current registration infrastructure impossible as CheckerManager has to be mutable when supplied to the registry function,
  • which will always leaves toom for errors like this to arise.

This patch is WIP because it's a proof of concept, and sill requires a lot of preparation. I very much appreciate your feedback, because now I'm more confident that I'm on the right track.

Since I spent the last couple weeks lookig at these files, there's no better opportunity for me to fix these.

Szelethus edited the summary of this revision. (Show Details)Nov 13 2018, 2:35 AM
Szelethus planned changes to this revision.EditedNov 13 2018, 6:05 AM

Here's how I'm planning to approach this issue:

  • Split MallocChecker and CStringChecker up properly, introduce MallocCheckerBase and CStringCheckerBase. I'm a little unsure about how I'll do this, but I'll share my thoughts when I come up with something clever. One thing I'm sure about is that it makes little sense to split InnerPointerChecker up, creating a header for MallocCheckerBase would probably be more ideal.
  • Some checkers do not register themselves in all cases[1], which should probably be handled at a higher level, preferably in Checkers.td. Warnings could be emitted when a checker is explicitly enabled but will not be registered.
  • Once all checkers in all cases properly register themselves, refactor this patch to introduce dependencies in between the subcheckers of MallocChecker and CStringChecker. This will imply that every registry function registers one, and only one checker.
  • Move CheckerManager::registerChecker<T> out of the registry functions, rename them from register##CHECKERNAME to setup##CHECKERNAME, and supply the mutable checker object and the immutable CheckerManager. The downside is that all checkers will have to be default constructible, and their fields will have to be set up in setup##CHECKERNAME, but I see this as an acceptable tradeoff for the extra security.
    • This will make it impossible to affect other checkers. I'm a little unsure whether this is achievable considering how intertwined everything is in MallocChecker, but even if the supplied CheckerManager will have to non-const, getCurrentCheckerName and setCurrentCheckerName could be eliminated for sure.
  • Properly document why CheckerRegistry and CheckerManager are separate entities, how are the tblgen files processed, how are dependencies handled, and so on.
  • Rebase my local checker option-related branches, and celebrate. Unless another skeleton falls out of the closet. You never know I guess.

[1]

void ento::registerObjCDeallocChecker(CheckerManager &Mgr) {
  const LangOptions &LangOpts = Mgr.getLangOpts();
  // These checker only makes sense under MRR.
  if (LangOpts.getGC() == LangOptions::GCOnly || LangOpts.ObjCAutoRefCount)
    return;

  Mgr.registerChecker<ObjCDeallocChecker>();
}
NoQ added a comment.Nov 13 2018, 11:38 AM
  • Some checkers do not register themselves in all cases[1], which should probably be handled at a higher level, preferably in Checkers.td. Warnings could be emitted when a checker is explicitly enabled but will not be registered.

[1]

void ento::registerObjCDeallocChecker(CheckerManager &Mgr) {
  const LangOptions &LangOpts = Mgr.getLangOpts();
  // These checker only makes sense under MRR.
  if (LangOpts.getGC() == LangOptions::GCOnly || LangOpts.ObjCAutoRefCount)
    return;

  Mgr.registerChecker<ObjCDeallocChecker>();
}

The can of worms here is as follows.

It is clear that not all checkers are applicable to all languages or all combinations of compiler flags. In this case, significantly different semantics are observed under different Objective-C memory management policies and the checker isn't designed to work correctly for anything but the "manual" memory management policy. The way the code is now written, it is impossible to register the checker for translation units that are compiled with incompatible flags.

This should not cause a warnings because it should be fine to compile different translation units with different flags within the same project, while the decision to enable the checker explicitly is done once per project.

Now, normally these decisions are handled by the Clang Driver - the gcc compatibility wrapper for clang that converts usual gcc-compatible run-lines into -cc1 run-lines. Eg., the unix package is only enabled on UNIX targets and the osx package is only enabled on macOS/iOS targets and the core package is enabled on all targets and security.insecureAPI is disabled on PS4 because that's what the Driver automatically appends to the -cc1 run-line. Even though scan-build does not call the analyzer through the Driver, it invokes a -### run-line to obtain flags first, so it still indirectly consults the Driver.

But the problem with the Driver is that nobody knows about this layer of decision-making (unlike the obvious Checkers.td) and therefore it's kinda messy to have all checkers hardcoded separately within the Driver. It kinda makes more sense to split the checkers into packages and let the Driver choose which packages are enabled.

But the problem with packages is that they are very hard to change because they are the UI that external GUIs rely upon. And right now we don't yet have separate packages for Objective-C MRR/ARC/GC modes (btw, GC needs to be removed).

And even if we were to go in that direction, the amount of packages would explode exponentially with the number of different independent flags we need to track. If only we had hashtags, that would have been a great direction to go.

So, generally, i suggest not to jump onto this as long as the analyzer as a whole works more or less correctly. Restricting functionality for the purpose of avoiding mistakes should be done with extreme care because, well, it restricts functionality. Before restricting funcitonality, we need to be more or less certain that nobody will ever need such functionality or that we will be able to provide a safe replacement when necessary without also introducing too much complexity (aka bugs). Which is why i recommend not to bother too much about it. There are much more terrible bugs all over the place, no need to struggle that hard to prevent these small bugs.

Another approach to the original problem would be to replace the language options check in registerBlahBlahChecker() with a set of language options checks in every checker callback. It's annoying to write and clutters the checker but it preserves all the functionality without messing with the registration process. So if you simply want to move these checks out of the way of your work, this is probably the way to go.

Thanks you so much! One of the reasons why I love working on this is because the great feedback I get. I don't wanna seem annoyingly grateful, but it's been a very enjoyable experience so far.

This should not cause a warnings because it should be fine to compile different translation units with different flags within the same project, while the decision to enable the checker explicitly is done once per project.

Okay, I don't insist on it :)

Now, normally these decisions are handled by the Clang Driver - the gcc compatibility wrapper for clang that converts usual gcc-compatible run-lines into -cc1 run-lines. Eg., the unix package is only enabled on UNIX targets and the osx package is only enabled on macOS/iOS targets and the core package is enabled on all targets and security.insecureAPI is disabled on PS4 because that's what the Driver automatically appends to the -cc1 run-line. Even though scan-build does not call the analyzer through the Driver, it invokes a -### run-line to obtain flags first, so it still indirectly consults the Driver.

But the problem with the Driver is that nobody knows about this layer of decision-making (unlike the obvious Checkers.td) and therefore it's kinda messy to have all checkers hardcoded separately within the Driver. It kinda makes more sense to split the checkers into packages and let the Driver choose which packages are enabled.
So, generally, i suggest not to jump onto this as long as the analyzer as a whole works more or less correctly. Restricting functionality for the purpose of avoiding mistakes should be done with extreme care because, well, it restricts functionality. Before restricting funcitonality, we need to be more or less certain that nobody will ever need such functionality or that we will be able to provide a safe replacement when necessary without also introducing too much complexity (aka bugs). Which is why i recommend not to bother too much about it. There are much more terrible bugs all over the place, no need to struggle that hard to prevent these small bugs.

Another approach to the original problem would be to replace the language options check in registerBlahBlahChecker() with a set of language options checks in every checker callback. It's annoying to write and clutters the checker but it preserves all the functionality without messing with the registration process. So if you simply want to move these checks out of the way of your work, this is probably the way to go.

Do you mean something along the lines of supplying a boolean shouldRegister##CHECKERNAME function to checkers? That would essentially achieve the same thing, but still allows me to move CheckerManager::registerChecker ultimately out of checker registry function.

NoQ added a comment.Nov 13 2018, 1:24 PM

Hmm, makes sense. Maybe static bool BlahBlahChecker::isApplicable(const LangOpts &LO) or something like that.

Btw, maybe instead of default constructor and ->setup(Args) method we could stuff all of the setup(Args)'s arguments into the one-and-only constructor for the checker?

In D54438#1297602, @NoQ wrote:

Hmm, makes sense. Maybe static bool BlahBlahChecker::isApplicable(const LangOpts &LO) or something like that.

Btw, maybe instead of default constructor and ->setup(Args) method we could stuff all of the setup(Args)'s arguments into the one-and-only constructor for the checker?

Neat. Make all checkers have only a single constructor, expecting a const CheckerManager &! Since getCurrentName and the like will be thing of the past, for once, maybe we could also initialize checker options in constructors too. Hopefully.

This comment was removed by Szelethus.
Szelethus added a comment.EditedDec 2 2018, 11:48 AM

Okay. I was wrong, and you were right.

MallocChecker seriously suffers from overcrowding. Especially with the addition of InnerPointerChecker, it should be split up, but doing that is very much avoidable for the purposes of this effort while still achieving every goal of it (which I'll detail in the next couple points). But, I spent so much time with it, it kinda grew on me, so I'll probably tackle the problem on my downtime.

The change of course as follows, checkmarks indicate that I already have a working solution locally:

  • ✔️ Introduce the boolean ento::shouldRegister##CHECKERNAME(const LangOptions &LO) function very similarly to ento::register##CHECKERNAME. This will force every checker to implement this function, but maybe it isn't that bad: I saw a lot of ObjC or C++ specific checkers that should probably not register themselves based on some LangOptions (mine too), but they do anyways.
    • ❌ Rename register##CHECKERNAME to enable##CHECKERNAME, and let's reserve the term "registering" to CheckerRegistry.
    • This patch will not feature any functional change, despite the observation I just made.
  • ✔️ There are in fact a variety of checkers that contain subcheckers like MallocChecker, which I think is all good, but the concept that if a subchecker is enabled it only sort of registeres the main checker (in MallocChecker's case it enables modeling, but not reporting) is very hard to follow. I propose that all such checkers should clearly have a base, called something DynamicMemoryModeling, or IteratorModeling etc.
  • ✔️ UnixAPI contains a dual checker, but they actually don't interact at all, split them up.
  • ✔️ Introduce dependencies, all checkers register themselves and only themselves. On my local branch I also like that it's super obvious which checker is interacting with which one, for example, IteratorChecker also has multiple parts, but with this patch, it's super obvious from a glance at Checkers.td.
    • ✔️ We can actually ensure that dependencies are registered before the checkers that depend on them, thanks to asserts added in the first part.
  • ✔️ Add the getChecker function to CheckerManager, assert when it is called on a checker that isn't already registered. Add an assert when CheckerManager::registerChecker is called on an already registered checker.
  • ❌ Move CheckerManager::registerChecker<T> out of the registry functions.
    • ❌ Since shouldRegister##CHECKERNAME is a thing, we can move everything to the checker's constructor, supply a CheckerManager, eliminating the function entirely.
    • ❌ At long last, get rid of CheckerManager::setCurrentCheckerName and CheckerManager::getCurrentCheckerName.
      • ❌ It was discussed multiple times (D48285#inline-423172, D49438#inline-433993), that acquiring the options for the checker isn't too easy, as it's name will be assigned only later on, so currently all checkers initialize their options past construction. This can be solved either by supplying the checker's name to every constructor, or simply storing all enabled checkers in AnalyzerOptions, and acquire it from there. I'll see.
  • ✖️ Properly document why CheckerRegistry and CheckerManager are separate entities, how are the tblgen files processed, how are dependencies handled, and so on.
  • ✖️ Rebase my local checker option-related branches, and celebrate. I wouldn't like to go in any more detail, who knows what lies ahead.
NoQ added a comment.Dec 6 2018, 4:23 PM
  • ✔️ There are in fact a variety of checkers that contain subcheckers like MallocChecker, which I think is all good, but the concept that if a subchecker is enabled it only sort of registeres the main checker (in MallocChecker's case it enables modeling, but not reporting) is very hard to follow. I propose that all such checkers should clearly have a base, called something DynamicMemoryModeling, or IteratorModeling etc.

This will be the most time-consuming part, as i imagine. You'll have to split each callback into two callbacks in two different checkers (PostCall vs. PostStmt<CallExpr> doesn't work!) and make sure that they are called in the correct order. I expect most modeling to go into PostStmt ("post-conditions") and most checking go to PreStmt ("pre-conditions"). I.e., "if the pre-condition of the statement is not fulfilled, the checker reports a bug. Otherwise, the model enforces the post-condition after the statement"). Some code (eg., on which particular statements does the checker react?) might be annoying to de-duplicate. Not sure how callbacks that don't have pre/post variants will behave.

Szelethus added a comment.EditedDec 6 2018, 4:33 PM
In D54438#1322266, @NoQ wrote:
  • ✔️ There are in fact a variety of checkers that contain subcheckers like MallocChecker, which I think is all good, but the concept that if a subchecker is enabled it only sort of registeres the main checker (in MallocChecker's case it enables modeling, but not reporting) is very hard to follow. I propose that all such checkers should clearly have a base, called something DynamicMemoryModeling, or IteratorModeling etc.

This will be the most time-consuming part, as i imagine. You'll have to split each callback into two callbacks in two different checkers (PostCall vs. PostStmt<CallExpr> doesn't work!) and make sure that they are called in the correct order. I expect most modeling to go into PostStmt ("post-conditions") and most checking go to PreStmt ("pre-conditions"). I.e., "if the pre-condition of the statement is not fulfilled, the checker reports a bug. Otherwise, the model enforces the post-condition after the statement"). Some code (eg., on which particular statements does the checker react?) might be annoying to de-duplicate. Not sure how callbacks that don't have pre/post variants will behave.

Actually, it's already implemented, and the beauty of it is that it requires pretty much no change to the checker files. MallocChecker is a mess, but splitting it up is avoidable in order to pull this off. I'll probably upload a patch in the coming days with my proposed solution. Super excited about it!

Side note: Before realizing this, I worked out a neat plan to split up MallocChecker that addresses all of the issues you mentioned above. I'll share once it I get the time to put together a nice looking proposal -- since everything is so intertwined, I really have to go into tiny details with it, a high level "gonna look like along the lines of this and that" won't cut it. I already have a bunch of txt files, whiteboards and papers filled with it.

MTC added a subscriber: MTC.Dec 6 2018, 6:24 PM
Szelethus updated this revision to Diff 177182.EditedDec 7 2018, 5:24 AM
Szelethus retitled this revision from [analyzer][WIP] Reimplement dependencies between checkers to [analyzer] Reimplement dependencies between checkers.
Szelethus edited the summary of this revision. (Show Details)
Szelethus added reviewers: MTC, baloghadamsoftware.
  • No longer splitting up InnerPointerChecker
  • After spending a lot of more time with the checker files, and adding various asserts in followup patches, realize that dependencies in between checkers are far, far more common than anticipated. I find it really cool that this is clearly visible now from the tblgen file. Add the following new checkers:
    • StackAddrEscapeBase
    • CStringModeling
    • DynamicMemoryModeling (base of the MallocChecker family)
    • IteratorModeling (base of the IteratorChecker family)
    • ValistBase
    • SecuritySyntaxChecker (base of bcmp, bcopy, etc...)
    • NSOrCFErrorDerefChecker (base of NSErrorChecker and CFErrorChecker)
    • IvarInvalidationModeling (base of IvarInvalidation checker family)
  • Make sure checkers are only enabled if all of their dependencies can be enabled
  • Put strong emphasis on preserving the order of insertions when collecting enabled checkers
  • It is no longer possible to enable CallAndMessageUnInitRefArg without CallAndMessageChecker. Disabling core checkers isn't supported anyways.
This revision is now accepted and ready to land.Dec 7 2018, 5:24 AM
Szelethus updated this revision to Diff 177189.EditedDec 7 2018, 5:34 AM
  • Register the checker after it's dependencies (accidentally left this change in the next branch). This implies that unix.MallocChecker:Optimistic is no longer a thing, and is now called unix.DynamicMemoryModeling:Optimistic. This could break backward compatibility, but is also addressable, since it can always be converted to whatever full name DynamicMemoryModeling will have in the future. @george.karpenkov @NoQ feelings on this?
  • In test/Analysis/Inputs/expected-plists/nullability-notes.m.plist, the name of the checker associated with a report changed, but this is NOT an issue raised by this patch -- simply changing the order of analyzer-checker=ABC,CBA to CBA,ABC will cause the same issue. Please don't mind me, but I'm not immediately interested in fixing this issue.
Szelethus requested review of this revision.Dec 7 2018, 5:35 AM

Please take a second look, this is a pretty important to get right.

  • ❌ Move CheckerManager::registerChecker<T> out of the registry functions.
    • ❌ Since shouldRegister##CHECKERNAME is a thing, we can move everything to the checker's constructor, supply a CheckerManager, eliminating the function entirely.
    • ❌ At long last, get rid of CheckerManager::setCurrentCheckerName and CheckerManager::getCurrentCheckerName.
      • ❌ It was discussed multiple times (D48285#inline-423172, D49438#inline-433993), that acquiring the options for the checker isn't too easy, as it's name will be assigned only later on, so currently all checkers initialize their options past construction. This can be solved either by supplying the checker's name to every constructor, or simply storing all enabled checkers in AnalyzerOptions, and acquire it from there. I'll see.

Pulling this off is not only difficult, certainly super-invasive, but also unnecessary -- in the final patch (D55429) I used a far simpler (~7 lines of code) solution, that still ensures that the checker naming problem never comes up again.

Thank you so much @NoQ for all the feedback! This project has been a super fun. I still expect some skeletons to fall out of the closet, but I'm fairly confident that the overall direction is set and is good.

NoQ accepted this revision.Dec 7 2018, 2:46 PM

I'm seeing no problems with this patch and i'm happy that we're replacing hacks with well-defined patterns :)

  • Register the checker after it's dependencies (accidentally left this change in the next branch). This implies that unix.MallocChecker:Optimistic is no longer a thing, and is now called unix.DynamicMemoryModeling:Optimistic. This could break backward compatibility, but is also addressable, since it can always be converted to whatever full name DynamicMemoryModeling will have in the future. @george.karpenkov @NoQ feelings on this?

I believe nobody uses this flag, feel free to remove it entirely.

  • In test/Analysis/Inputs/expected-plists/nullability-notes.m.plist, the name of the checker associated with a report changed, but this is NOT an issue raised by this patch -- simply changing the order of analyzer-checker=ABC,CBA to CBA,ABC will cause the same issue. Please don't mind me, but I'm not immediately interested in fixing this issue.

It wasn't correct in the first place (should have been NullablePassedToNonnull instead). Anyway, no problem!

This revision is now accepted and ready to land.Dec 7 2018, 2:46 PM
NoQ added a comment.Dec 7 2018, 2:50 PM

Generally, because the primary user of checker names is clang-tidy, the correct name to display is the name of the specific checker that the user needs to disable in order to get rid of the warning.

NoQ added a comment.Dec 7 2018, 2:59 PM

Also you might need to rebase on top of D55400.

I can't fix these right away, but I don't want myself to forget it before commiting. Pay no attention.

include/clang/StaticAnalyzer/Checkers/CheckerBase.td
51–56 ↗(On Diff #177189)

MallocBase was renamed to DynamicMemoryModeling

include/clang/StaticAnalyzer/Frontend/CheckerRegistry.h
104 ↗(On Diff #177189)

Should be ConstCheckerInfoList.

139 ↗(On Diff #177189)

typo: depend

lib/StaticAnalyzer/Frontend/CheckerRegistry.cpp
174 ↗(On Diff #177189)

The same assert in there just a couple lines above.

utils/TableGen/ClangSACheckersEmitter.cpp
102–103 ↗(On Diff #177189)

This is incorrect.

Hmmm, I left plenty of room for improvement in the tblgen generation, we could generate compile-time errors on cycles within the dependency graph, try to include the generated file only once, but these clearly are very low-prio issues, so I'll do it later. I'll have to revisit the entire thing soon enough.

I still expect some skeletons to fall out of the closet

This patch doesn't handle -analyzer-disable-checker, which leads to assertation failures in later pathes. Since the way which checker should/shouldnt be enabled is implemented rather poorly, I'll probably try to find a long-term solution.

Szelethus updated this revision to Diff 182726.Jan 20 2019, 4:51 PM
  • Rebase
  • Resolve the issue mentioned above by not enabling checkers that has any of their dependencies explicitly disabled
  • Introduce osx.RetainCountBase to "solve" the issue mentioned in D55400#1364683, but the test files are still untouched. This leads to 9 lit test failures, but since this patch is ancient, I'd rather update it now and follow up with another, final version later.
Szelethus updated this revision to Diff 183312.EditedJan 24 2019, 7:19 AM

Fixed the tests I mentioned in the earlier comment. This makes this patch ready to go, if the two new patches it depends on are okay.

I don't like how cluttered this patch has gotten, but since ensuring the registration order has a lot of side effects, I just couldn't make this patch any leaner. Gotta have a big patch for fixing a big mess.

Szelethus edited the summary of this revision. (Show Details)Jan 24 2019, 7:20 AM
Szelethus set the repository for this revision to rC Clang.
NoQ accepted this revision.Jan 25 2019, 7:02 PM

*gets hyped for the upcoming patchlanding party*

Hmmm, I left plenty of room for improvement in the tblgen generation, we could generate compile-time errors on cycles within the dependency graph, try to include the generated file only once, but these clearly are very low-prio issues, so I'll do it later. I'll have to revisit the entire thing soon enough.

Hmm. The dependency cycle check sounds cool as long as we do actually have problems with dependency cycles. I guess we could just enable/disable the whole cycle of checkers all together? This might even be a useful feature.

What do you mean by "include the generated file only once"?

lib/StaticAnalyzer/Frontend/CheckerRegistry.cpp
225–229 ↗(On Diff #183312)

Aha, ok, so the current approach to conflict resolution is to only enable the checker if none of its dependencies were disabled on the command line. So if, say, DependentChecker depends on BaseChecker, once an -analyzer-disable-checker BaseChecker flag is passed, it needs to be reverted via -analyzer-checker BaseChecker before the -analyzer-checker DependentChecker directive would take effect, regardless of in which order it is with respect to the BaseChecker-enabling/disabling directives.

So we kinda choose to err on the side of disabling in ambiguous situations. Which kinda makes sense because the disable-argument is more rare and indicates an irritation of the user. But it is also kinda inconsistent with how the options interact in absence of dependencies: "the flag that was passed last overrides all previous flags". And we can kinda fix this inconsistency by introducing a different behavior:

  • whenever an -analyzer-checker flag is passed, remove the "disabled" marks from all checkers it depends on;
  • whenever an -analyzer-disable-checker flag is passed, remove the "enabled" marks from all checkers that depend on it.

This approach still ensures that the set of enabled checkers is consistent (i.e., users are still not allowed to shoot themselves in the foot by running a checker without its dependencies), but it also looks respects every flag in an intuitive manner. WDYT?

This revision was automatically updated to reflect the committed changes.
In D54438#1372324, @NoQ wrote:

*gets hyped for the upcoming patchlanding party*

Oh yeah, super excited about this! It was a blast!

Hmmm, I left plenty of room for improvement in the tblgen generation, we could generate compile-time errors on cycles within the dependency graph, try to include the generated file only once, but these clearly are very low-prio issues, so I'll do it later. I'll have to revisit the entire thing soon enough.

Hmm. The dependency cycle check sounds cool as long as we do actually have problems with dependency cycles. I guess we could just enable/disable the whole cycle of checkers all together? This might even be a useful feature.

That sounds pretty cool actually! For all the crap I give people about documentation, I'm struggling quite a bit with this one (will probably end up making one for the entirety of Frontend/), but will definitely include this in there.

What do you mean by "include the generated file only once"?

Never mind. It was thinking aloud, which I later realized is nonsense.

Aha, ok, so the current approach to conflict resolution is to only enable the checker if none of its dependencies were disabled on the command line. So if, say, DependentChecker depends on BaseChecker, once an -analyzer-disable-checker BaseChecker flag is passed, it needs to be reverted via -analyzer-checker BaseChecker before the -analyzer-checker DependentChecker directive would take effect, regardless of in which order it is with respect to the BaseChecker-enabling/disabling directives.

Exactly.

So we kinda choose to err on the side of disabling in ambiguous situations. Which kinda makes sense because the disable-argument is more rare and indicates an irritation of the user. But it is also kinda inconsistent with how the options interact in absence of dependencies: "the flag that was passed last overrides all previous flags". And we can kinda fix this inconsistency by introducing a different behavior:

  • whenever an -analyzer-checker flag is passed, remove the "disabled" marks from all checkers it depends on;
  • whenever an -analyzer-disable-checker flag is passed, remove the "enabled" marks from all checkers that depend on it.

This approach still ensures that the set of enabled checkers is consistent (i.e., users are still not allowed to shoot themselves in the foot by running a checker without its dependencies), but it also looks respects every flag in an intuitive manner. WDYT?

Sounds great! Got nothing against that.

Hmm, I didn't really add tests about the dependency related potential implicit disabling, not even a warning, so there still is more work to be done.

uabelho added inline comments.
cfe/trunk/include/clang/StaticAnalyzer/Checkers/Checkers.td
230

Any reason this checker shouldn't get a dependecy too?

If I run it with

clang -cc1 -analyze -analyzer-checker=core -analyzer-checker=nullability.NullReturnedFromNonnull empty.c

on an empty file empty.c I get

clang: ../tools/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h:171: CHECKER *clang::ento::CheckerManager::getChecker() [CHECKER = (anonymous namespace)::NullabilityChecker]: Assertion `CheckerTags.count(tag) != 0 && "Requested checker is not registered! Maybe you should add it as a " "dependency in Checkers.td?"' failed.

If I add
Dependencies<[NullabilityBase]>,
to it, then it doesn't trigger the assert.

I don't know anything about this code, what do you think about it?

Szelethus marked an inline comment as done.Jan 28 2019, 5:44 AM
Szelethus added inline comments.
cfe/trunk/include/clang/StaticAnalyzer/Checkers/Checkers.td
230

Yup, should be there. Thanks for catching this one! I'll be able to commit the fix in about 4ish hours, or if it blocks you, feel free to so before that.

Since I didnt add test for each and every affected checker, I wonder whether there is any more of these that I forgot/messed up while rebasing.

Thanks again! :)

uabelho added inline comments.Jan 28 2019, 5:50 AM
cfe/trunk/include/clang/StaticAnalyzer/Checkers/Checkers.td
230

Sounds good, no panic!

Thanks!

After this landed, options for RetainCountChecker stopped working - e.g. I can't use osx.cocoa.RetainCount:blah=X.
Do you know why is this the case / how to fix it?

After this landed, options for RetainCountChecker stopped working - e.g. I can't use osx.cocoa.RetainCount:blah=X.
Do you know why is this the case / how to fix it?

That's bad. I'll look into this ASAP, might take a day or two though if that's alright.

Szelethus added a comment.EditedJan 31 2019, 2:00 AM

After this landed, options for RetainCountChecker stopped working - e.g. I can't use osx.cocoa.RetainCount:blah=X.
Do you know why is this the case / how to fix it?

That's bad. I'll look into this ASAP, might take a day or two though if that's alright.

Nvm, just popped into my head. I suspect that this has been a bug for a while, this patch merely unearthed it. As you said:

registerChecker gets-or-creates a checker object. A checker name (used for getting the options) is set the first time it's created.
The checker which was created first "wins" and gets to name the resulting checker.
In practice it basically means that options and checkers reusing the same class do not work.

This patch now ensures that every time osx.cocoa.RetaiCount and/or osx.OSObjectREtainCount is enabled, osx.RetainCountBase is registered first, which makes the checker object's name just that, making (incorrectly ofc) AnalyzerOptions look for osx.RetainCountBase:blah=X.

Should we not have multiple checkers belong to the same checker object, the approach of requesting this checker object for it's options would make sense, but since this isn't the case, this bug exposes a weakness of AnalyzerOptions::getChecker*Option, namely that these subcheckers aren't able to possess any options at all.

So clearly, AnalyzerOptions should just take the actual checker name as parameter, which each checker knows via the Name field, or via CheckerManager::getCurrentCheckName in the checker registry function. I'll try to sort this out tonight.