Page MenuHomePhabricator

[analyzer] NFC: BugReporter Separation Ep.I.
ClosedPublic

Authored by NoQ on Aug 21 2019, 6:05 PM.

Details

Summary

The ultimate goal is to allow Clang-Tidy to reap the benefits of our BugReporter while also not being entirely dependent on the Static Analyzer (eg., be able to get compiled with CLANG_ENABLE_STATIC_ANALYZER=OFF). This would involve eventually moving the basic BugReporter and BugReport/BasicBugReport classes into libAnalysis.

This is the minimal patch that separates the BugReport class into BasicBugReport and PathSensitiveBugReport in a sensible manner and compiles. I looked through all their APIs and made a quick guess on whether they do or do not require path-sensitive reasoning and put them into the respective class based on that.

There's no common base class for the two BugReporters yet.

Checker API breakage was fairly bearable, in most cases i just had to specify whether i want a basic report or a path-sensitive report. Most path-insensitive checkers were already using the EmitBasicReport() wrapper so they are unaffected.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald Transcript
NoQ marked 6 inline comments as done.Aug 21 2019, 6:12 PM

Most of the patch is boring but here are a few places that i wanted attract attention to.

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
225 ↗(On Diff #216534)

Yes, we allow inheriting from these specific BugReport sub-classes, because there turned out to be one checker that was using it.

clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.h
56 ↗(On Diff #216534)

I'm not pointing any fingers...

77 ↗(On Diff #216534)

...and still not pointing any fingers.

clang/lib/StaticAnalyzer/Core/BugReporter.cpp
2713 ↗(On Diff #216534)

I vaguely remember that this pattern is frowned upon. Does anybody want me to extract this into a common non-virtual function?

2983–2984 ↗(On Diff #216534)

Same question about extracting into a non-virtual function.

2993–2996 ↗(On Diff #216534)

*crosses fingers*

Super high level question: CheckerManager knows whether a checker, and I suspect a checker callback is path sensitive or not, do you think we can automate this decision (whether the bug report is path sensitive of syntactic)? For the purposes of clang-tidy, can we say such a thing that if we can't prove a bug report to be path sensitive, it will not be that?

clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.h
56 ↗(On Diff #216534)

Mhm, I can see that :^)

NoQ added a comment.Aug 21 2019, 8:39 PM

Super high level question: CheckerManager knows whether a checker, and I suspect a checker callback is path sensitive or not, do you think we can automate this decision (whether the bug report is path sensitive of syntactic)? For the purposes of clang-tidy, can we say such a thing that if we can't prove a bug report to be path sensitive, it will not be that?

I want to reserve the ability for path-sensitive checkers to occasionally emit path-insensitive reports. This is due to my pipe dream of making a poor man's data flow engine by simply trusting ExprEngine when it says that it has explored all execution paths through a function. Cf. UnreachableCodeChecker, TestAfterDivZeroChecker. There are no other reasons because if we're not presenting a path we must admit that the problem happens on all paths, which implies some sort of must-problem, which implies no proper symbolic execution.

Maybe also we can consider reporting path-insensitive reports when we are about to emit a path-sensitive report but look at the AST at the last minute and it turns out that the problem is entirely local.

NoQ updated this revision to Diff 216706.Aug 22 2019, 2:04 PM

Rebase T__T

You got me convinced.

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
223 ↗(On Diff #216706)

I think I added a visitor range not long ago?

NoQ marked an inline comment as done.Aug 23 2019, 4:07 PM
NoQ added inline comments.
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
223 ↗(On Diff #216706)

Yup, i just moved it from BugReport to PathSensitiveBugReport.
(hint: this yellow bar on the left of the insertion indicates that, you can also hover it to see where was the code moved from)

Szelethus added inline comments.Aug 25 2019, 7:26 AM
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
75 ↗(On Diff #216706)

Shouldn't we make this an abstract class?

teemperor resigned from this revision.Aug 26 2019, 3:58 AM

Sorry for the silly comments, but my main point is, I guess, that I don't quite understand the design towards which you are refactoring, and to meaningfully review patches I need to be on the same page.

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
75 ↗(On Diff #216706)

I'm not sure that intrusive linked list is the right data structure for the job. I'd personally put bug reports into a vector and make a custom data structure if a vector becomes a performance problem.

122 ↗(On Diff #216706)

Where can I read about the uniqueing logic? Does it mean that out of two bug reports with the same location only one gets displayed, regardless of other properties?

126–135 ↗(On Diff #216706)

I don't think getUniqueingDecl() and getDeclWithIssue() make sense for most ClangTidy checkers.

159 ↗(On Diff #216706)

We don't care about MSVC 2013 now.

174 ↗(On Diff #216706)

Another location-related method... I guess I would appreciate a more abstract writeup about what BugReport's data model is (in its doc comment).

186 ↗(On Diff #216706)

Ranges should be associated with a message.

NoQ marked 11 inline comments as done.Aug 29 2019, 7:46 PM
NoQ added inline comments.
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
75 ↗(On Diff #216706)

Shouldn't we make this an abstract class?

Mmm, yeah, moved virtual methods from BugReport to BasicBugReport whenever PathSensitiveBugReport immediately overrides them and it looks much better now!

I'm not sure that intrusive linked list is the right data structure for the job.

Me neither, but it seems to be orthogonal to this patch, and this patch is already huge, so i'll do this in a follow-up patch, ok?

122 ↗(On Diff #216706)

Added some comments ^^

126–135 ↗(On Diff #216706)

getDeclWithIssue() corresponds to the "Function/Method" column we have in our scan-build index page:

This column is not super important but probably kinda nice to have. I don't mind leaving it empty for clang-tidy checks, or possibly auto-detect it post factum when possible (e.g., find the smallest decl in which the warning location is contained).

getUniqueingDecl() is, as far as i understand, only currently used by the issue hash mechanism which governs deduplication across translation units (e.g., we emit a warning against the same header in multiple translation units - it's vital for the static analyzer because it's not uncommon for an execution path that starts in the main file to end in a header). And even then, it's only present in plist output. I'm not convinced it's useful at all. It's likely that we can remove it.

Also clang-tidy checks wouldn't need to specify their UniqueingDecl manually as it silently defaults to their DeclWithIssue.

So basically we have these two methods on the base class for bureaucratic reasons: our existing algorithms handle all kinds of warnings uniformly based on these computed properties of bug reports. I think we should be perfectly fine if some kinds of bug reports return null from them, indicating that "this property doesn't make sense for them". We could instead reinvent protocols on top of our custom RTTI ("does this warning conform to Uniqueable? if so, i'll try to unique it"), but i don't think it's worth the complexity.

159 ↗(On Diff #216706)

Woohoo!

186 ↗(On Diff #216706)

Mmm, what do you mean?

Currently these ranges are attached to the warning message, path note messages can't have ranges, and extra path-insensitive notes can have a separate set of ranges attached to them by passing them through addNote().

NoQ updated this revision to Diff 218013.Aug 29 2019, 7:47 PM
NoQ marked an inline comment as done.

Fxd.

NoQ updated this revision to Diff 218176.Aug 30 2019, 3:20 PM

Change a hard cast to a soft cast because i suspect that we technically want to allow mixing path-sensitive and path-insensitive bug reports in a single equivalence class. Like, we're not doing it right now but there's nothing that really prevents it.

NoQ retitled this revision from [analyzer] BugReporter Separation Ep.I. to [analyzer] NFC: BugReporter Separation Ep.I..Aug 30 2019, 3:29 PM
NoQ marked an inline comment as done.
NoQ added inline comments.
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
75 ↗(On Diff #216706)

Thank you for the conversation so far! This is not a complete review from me, but I'm trying to avoid branching in too many directions at once.

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
122 ↗(On Diff #216706)

Thanks for the comments, but sorry, this one is still unclear.

(1)

// Imagine a ClangTidy checker that finds calls to math functions where it is 

double foo() {
  return sqrt(-1); // warning: 'sqrt' will always return NaN
}

What should be the "decl with issue" here? 'foo' or 'sqrt'?

(2)

#include <vector>
using std::vector; // warning: using names from 'std' is error-prone

(3)

void foo(Foo f) {
  bar(std::move(f)); // warning: passing result of 'std::move' as a const reference argument, no move will actually happen
}

Is it 'foo', 'bar', 'std::move', or the copy constructor of 'Foo'?

126–135 ↗(On Diff #216706)

getDeclWithIssue() corresponds to the "Function/Method" column we have in our scan-build index page:

Judging from the screenshot, that column seems to play a function of a human-readable summary of the warning location. Is that correct?

Should we determine it automatically from getLocation()? Is there a use case for getDeclWithIssue() be different from the function/class that getLocation() is pointing into?

(e.g., find the smallest decl in which the warning location is contained).

Oh you are saying the same thing. Why not have it always be auto-detected? Is there a use case for manual curation?

getUniqueingDecl() [...] I'm not convinced it's useful at all. It's likely that we can remove it.

I agree about eliminating it -- actually, replacing it with comparing source locations. In our experience running ClangTidy, uniquing bug reports based on source locations works quite well. Can you give it a go in a separate patch that we can land ahead of time? I see there are only 4 callers of the BugReport constructor that accepts the location.

Also the doc comment is somewhat contradictory. "declarations that ... contains ... the uniqueing location. This is not actively used for uniqueing..."

186 ↗(On Diff #216706)

I see. What looks weird to me is that methods related to the warning itself are on BugReport, but notes and fixits are their own data structures. It creates an inconsistent API, and makes notes and fixits feel bolted on.

Do you think it would make sense to change the API to be more uniform?

78 ↗(On Diff #218176)

Make NoteList private, or eliminate completely?

98 ↗(On Diff #218176)

I think using identical names for constructor parameters and member variables is idiomatic in LLVM.

108 ↗(On Diff #218176)

What's the difference between description and short description?

What are the expected grammatical conventions? For example, Clang and ClangTidy warning are sentence fragments that start with a lowercase letter and don't end with a period.

116 ↗(On Diff #218176)

The primary location of the bug report that points at the undesirable behavior in the code.

UIs should attach the warning description to this location. The warning description should describe bad behavior at this location.

142 ↗(On Diff #218176)

"Add a new note at the end of this path-insensitive report."

147 ↗(On Diff #218176)

I'm concerned that PathDiagnosticLocation is too complex of an abstraction for ClangTidy. It might even be too complex for Static Analyzer checkers. It is OK for the Static Analyzer core though.

Take a look at users of addNote -- they use helpers like makeLocation. I don't think that would scale to the number of checkers that we have. The API should be simpler.

157 ↗(On Diff #218176)

ArrayRef?

175 ↗(On Diff #218176)

s/iterator_range/ArrayRef/?

210 ↗(On Diff #218176)

I think the delegation should be done in the opposite way -- getLocation should be the source of truth, and getUniqueingLocation should call getLocation.

440 ↗(On Diff #218176)

Could it just be ArrayRef<...> visitors(), without exposing extra typedefs?

gribozavr added inline comments.Sep 2 2019, 5:34 AM
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
174 ↗(On Diff #216706)

What I meant by "data model" is a high-level description like this.

A bug report is one instance of a problem found by a checker. It is similar to a warning in Clang.

A bug report has:

  • a description which is ...
  • a short description (optional), which is ...
  • zero or more ranges, which ...
  • zero or more fixits, which provide advice about ...
  • zero or more notes, which ...

Each fixit has: ...

Each note has...

Szelethus added inline comments.Sep 3 2019, 4:29 AM
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
108 ↗(On Diff #218176)

I just did some work related to this! For 99% of the cases (and for one, I'm pretty sure this estimate is quite accurate) there is no difference. AFAIK CodeChecker doesnt even use the short description field emitted in the plist output.

There is one checker however that uses it, RetainCount, though I dislike that particular case (the short version is something like "the counter here isn't zero", and the full version is "the counter here is +2"). Could we get rid of this?

As for diagnostic messages, we start with uppercase but dont terminate the sentence.

Szelethus added a comment.EditedSep 3 2019, 1:13 PM

Also, thank you @gribozavr for your comments -- its very nice to have someone review a bigger part of our development interface who is knowledgeable about Clang, but not the this part of the Static Analyzer specifically. Looking at your inlines, these are very fair criticisms, I found (find) these confusing when I just started contributing as well.

NoQ marked an inline comment as done.Sep 3 2019, 2:47 PM
NoQ added inline comments.
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
186 ↗(On Diff #216706)

Hmm, i guess this is an artifact of how path-sensitive checkers usually emits warnings and their respective notes in completely different parts of their code (warnings come from the checker itself, path notes are generated by so-called "bug visitors" which aren't necessarily even a part of the checker).

Generally we need our notes to be attached to their respective warnings; say, in HTML report they need to be displayed on the same HTML page. But yeah, we should make our APIs more uniform because there's an obvious duplication of effort.

I also suspect that we'll need a new API in general, because in the current shape the BugReporter will look fairly alien and overly-complicated to clang-tidy developers that are used to the conciseness of diag() << .... I'm not sure if it'll boil down to providing convenient wrappers or i'll prefer to rewrite our checkers as well. I think we should talk about this separately on the mailing list.

NoQ marked 2 inline comments as done.Sep 3 2019, 4:42 PM
NoQ added inline comments.
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
186 ↗(On Diff #216706)

http://lists.llvm.org/pipermail/cfe-dev/2019-September/063229.html

Basically, PathDiagnostic is the uniform API that we've been looking for. It's a vector of PathDiagnosticPieces each of which represents a note of certain kind (path note, normal note, control flow note, etc.). BugReporter is a mechanism for converting a BugReport into a PathDiagnostic. For path-sensitive reports this mechanism is extremely sophisticated: the checker only supplies a single ExplodedNode that corresponds to the end of path and the BugReporter automatically adds notes (often dozens of them, sometimes hundreds) to explain the path. For path-insensitive reports the conversion is extremely trivial and therefore there's very little motivation to use the BugReporter when only path-insensitive reports are expected.

NoQ updated this revision to Diff 219009.Sep 5 2019, 5:18 PM
NoQ marked 15 inline comments as done and an inline comment as not done.

Fix most comments.

Yup, i'm also very grateful for the comments :)

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
122 ↗(On Diff #216706)

Made another attempt at the decl with issue comment :)

126–135 ↗(On Diff #216706)

Oh you are saying the same thing. Why not have it always be auto-detected? Is there a use case for manual curation?

I don't think anybody needs manual curation. We just literally always had a good decl to put in there, so i guess that's why we never bothered implementing autodetection so far.

P.S. despite http://lists.llvm.org/pipermail/cfe-dev/2019-September/063229.html this discussion is still relevant, i'll just duplicate these docs to PathDiagnostic.

I agree about eliminating it -- actually, replacing it with comparing source locations. In our experience running ClangTidy, uniquing bug reports based on source locations works quite well.

I'll take a closer look at the current messy state of this thing.

This thing is part of the "issue hash" facility that was supposed to be more than just a uniqueing effort: it was supposed to be stable across changes in the source code under analysis, so that people could easily understand how their commits introduce new issues. Source locations are too granular for that because they easily skew when you add unrelated code above the source location. But the "offset into the current decl" is a much more stable metric. Cf. https://github.com/llvm-mirror/clang/commit/97bfb558f69c09b01a5c1510f08dc91eb62329a7

I don't fully understand how much this was made use of, so i'll take a closer look.

174 ↗(On Diff #216706)

I guess i'll carry over this comment to the upcoming PathDiagnostic-related patch because we're not handing off BugReporter anyway.

108 ↗(On Diff #218176)

Added comments that capture the current state of things.

142 ↗(On Diff #218176)

Ugh, this whole comment is wrong. We support additional out-of-path notes on path-sensitive reports since basically forever.

147 ↗(On Diff #218176)

Yup, absolutely.

Even though i'm abandoning the idea of handing off BugReporter directly to Clang-Tidy, the problem is still going to be there, so i'll try to come up with something in later patches.

210 ↗(On Diff #218176)

Hmm, this annoying SourceManager argument can indeed be removed.

440 ↗(On Diff #218176)

Not immediately because when visitors are being actually run the code permits itself to move unique pointers out of this vector and then put it back. So it needs write access (at least in the form of non-const iterators) and it has problems with mutating the container while iterating on top of that (as visitors may add other visitors while they run). So i'd rather not touch it for now.

NoQ updated this revision to Diff 219192.Sep 6 2019, 3:49 PM

Just rebase.

Szelethus accepted this revision.Sep 6 2019, 4:27 PM

This patch set the goal of splitting BugReport into two different report kinds, and I think it did that well. Not only that, we drastically improved the documentation, formalized many things that weren't in the core before, so I wouldn't like you to bear the burdern of never ending rebases (it doesn't make reviewing easier either). Let's work on the rest of the code in a followup patch.

I agree with @gribozavr, we could do better, and should do better. Many core classes in the analyzer feel like everyone just added 1-2 functions, 1-2 branches to get something done quickly, and while those functions in the context of the patch they were added in may have been obvious, it lead us to a cluster on non-descriptive, undocumented, confusing interface and implementation code (n+1 location related functions in this instance). While adding new checkers, better support for C++17 and whatnot is what will ultimately make the end user experience better, I like how we're investing a lot of effort into the health of the codebase nowadays, and I think it'll pay off in the long term.

This revision is now accepted and ready to land.Sep 6 2019, 4:27 PM

But, of course, let's wait for @gribozavr to give his blessings as well, I'm only accepting because removing/changing other parts of the API seems to deserve a separate revision :)

gribozavr accepted this revision.Sep 9 2019, 9:57 AM

I think this patch is a good improvement, and I don't want to hold it back -- but like we discussed before, and like you wrote on the mailing list, I would want a more simple API for ClangTidy.

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
113 ↗(On Diff #219009)

Thanks for the doc comments! Please use three slashes here and in getShortDescription.

131 ↗(On Diff #219009)

Thanks for the explanation, just one question -- I don't understand what is meant by "canonical".

The bug can be found in a non-canonical declaration.

void f(); // canonical decl

void f() { // non-canonical decl
  *(int*)0 = 0; // bug
}
141 ↗(On Diff #219009)

Replace "is primarily used by" with "For example, leak checker that produces a different primary and uniqueing location. ..."

NoQ marked 5 inline comments as done.Sep 9 2019, 1:33 PM
NoQ added inline comments.
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
113 ↗(On Diff #219009)

Whoops.

131 ↗(On Diff #219009)

Ugh, i somehow keep thinking that the definition, if available, will always be the getCanonicalDecl(), even though i stepped into this a few times already >.<

That's why we have bugs like D57619.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 9 2019, 1:40 PM