This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] NFC: Improve upon the concept of BugReporterVisitor.
ClosedPublic

Authored by NoQ on Feb 18 2019, 9:04 PM.

Details

Summary

As i mentioned in D58067#1393674, i dislike the way bug reporter visitors have to reverse-engineer the checker's behavior instead of asking the checker what happened directly.

The approach suggested here is to allow the checker to take notes of what happens as it adds transitions, and then when the report is thrown, reconstruct path messages from these notes.

Because for the sake of future-proofness such messages should probably be allowed to be expensive to construct (even though i'm not aware of any specific examples of expensive-to-construct messages), the checker notes are defined to be lambdas that simply capture the necessary information but don't process it until a report is actually emitted. As a useful side effect, this allows the message to depend on the BugReport object itself, which is completely essential because most checkers won't emit path messages for every state update they make. For instance, when MallocChecker diagnoses a use-after-free, it only emits the "memory is freed" path message for the symbol that was accessed after free, but not for other symbols that were allocated or deallocated along the path. With lambda notes, we can check if the symbol is marked as "interesting" in the BugReport before emitting the path message. In the future we may want to extend the BugReport object to carry arbitrary Checker-specific data that the checker can take advantage of within its note lambdas.

Checker notes from which path messages are constructed are implemented as ProgramPointTags of special sub-kind: NoteTag. Then a special visitor is added to every report in order to scan the report for those tags and invoke the lambdas.

Here's how it looks in the checker in my next patch that actually makes use of the new functionality:

const NoteTag *T = C.getNoteTag([PVD]() -> std::string {
  SmallString<64> Str;
  llvm::raw_svector_ostream OS(Str);
  OS << "Deallocating object passed through parameter '" << PVD->getName() << '\'';
  return OS.str();
});

C.addTransition(C.getState()->set<ReleasedParameter>(true), T);

And there's no need to write all of this anymore:

class MyVisitor: public BugReporterVisitor {
/*Manual captures...*/
public:
  MyVisitor(/*Manual captures...*/) { ... }

  void Profile(llvm::FoldingSetNodeID &ID) {
    // The usual static int business.
    // And mention all manual captures.
    // Or maybe almost all, depends.
  }

  std::shared_ptr<PathDiagnosticPiece> VisitNode(const ExplodedNode *N,
                                               BugReporterContext &BRC,
                                               BugReport &BR) override {
    // The usual GDM update reverse-engineering idiom.
    if (N->getState()->get<MyTrait>(...) != N->getFirstPred()->get<MyTrait>(...)) {
       // Then the same message generating code.

       // Then a bunch of boilerplate to generate the piece itself:
       const Stmt *S = PathDiagnosticLocation::getStmt(N);
       if (!S)
         return nullptr;
       PathDiagnosticLocation Loc = PathDiagnosticLocation::create(S);
       return std::make_shared<PathDiagnosticEventPiece>(Loc, OS.str());
    }
  }
};

...

BR.addVisitor(MyVisitor(/*Manual captures...*/));

Diff Detail

Event Timeline

NoQ created this revision.Feb 18 2019, 9:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 18 2019, 9:04 PM

I love the idea! It looks way cleaner, tbh how messy visitors are have kept me away from implementing one for my own checker. Couple thoughts:

  1. It would be great to have unit tests for this. Side note, I have already managed to make CSA unit tests tun under check-clang-analysis, but it makes check-clang and check-all run them twice, either way, if you find it helpful, I have a branch for it here: * I need to get home where I can push it to github, imagine a nice link here *. That should make development a tad bit less painful :)
  2. I guess most of the visitors could be changed to this format, do you have plans to convert them? I'll happily land a hand, because it sounds like a big chore. I guess that would also test this implementation fairly well.
  3. Either in your workbook, or even better, in the new sphinx documentation, it would be great to see a how-to. Although I guess it's fine to leave some time for this to mature.

Go at it, this really is great! ^-^

NoQ marked an inline comment as done.
NoQ added a subscriber: Charusso.
  1. I guess most of the visitors could be changed to this format, do you have plans to convert them? I'll happily land a hand, because it sounds like a big chore. I guess that would also test this implementation fairly well.

I don't have an immediate plan but i'll definitely convert visitors when i touch them and get annoyed. Also i believe that this new functionality is much more useful for core visitors than for checker visitors, simply because there's much more information to reverse-engineer in every visitor. Eg., trackExpressionValue would have been so much easier if it didn't have to figure out where did an assignment happen, but instead relied on ExprEngine to write down this sort of info as a tag in, say, evalStore. There are just too many ways to introduce a store/environment binding that represents moving a value from one place to another and it hurts me when i see all of them duplicated in the visitor as a military-grade portion of spaghetti. The same apples to the ConditionBRVisitor that might probably even benefit from having ConstraintManager explain the high-level assumption that's being made as it's being made; it might have also made @Charusso's work on supporting various sorts of assumptions much easier. At the same time, there aren't that many ways to close a file descriptor, so these are much easier to catch and explain in a visitor, so the main problem in checkers is the boilerplate. Checker APIs, however, are much more important to get polished because they're used much more often.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
222–236

Note: you can't use this API in checker callbacks that don't provide a CheckerContext. Let's have a quick look at the list of such callbacks:

  • checkEndAnalysis(). I'm not super worried about this one because it's a really weird place to put an intermediate note. If someone really really needs this, it should be possible to access ExprEngine directly from this callback.
  • evalAssume(). This one's a bummer - it could really benefit from the new functionality, but we can't squeeze a checker context into it because it definitely doesn't make sense to add transitions in the middle of assume(). We should probably be able to allow returning a note tag from that callback somehow.
  • checkLiveSymbols(). I'm not worried about this one because it doesn't correspond to any actual event in the program and there's no way to change the program state at this point. If we want to extract some information from it anyway, i guess we can add opaque checker-specific data into SymbolReaper and transfer it to checkDeadSymbols().
  • checkPointerEscape(), checkRegionChanges(). These are usually used for invalidation that normally doesn't deserve a note. But it can be argued that it may deserve a note sometimes (eg., "note: function call might have changed the value of a global variable"), so i guess i'm a tiny bit worried, but we can have a closer look at this when we find any actual examples.
  • checkEndOfTranslationUnit(), checkASTDecl(),checkASTCodeBody(). These don't fire during path-sensitive analysis, so there's nothing to worry about.

Thank you for working this. I totally agree with you: whenever the checker spawns a new node in the exploded graph there is no point to leave it unmarked and then revers engineer it. BugReporterVisitors should only care for nodes which are not spawned by the checker reporting the bug. This way we could get rid of some unnecessary visitors, e.g. CXXSelfAssignmentBRVisitor. This also opens an easier way for checkers similar to CXXSelfAssignmentChecker which does not report any bug but creates a new execution path which may end in a bug reported by another checker. Using note tags there is no need to create a separate BugReporterVisitor for every such checker and add it globally to every bug report.

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
348

I am not sure whether BugReporterVisitors.h is the best place for this structure. I would rather put this into ProgramPoint.h or maybe BugReporter.h.

Charusso requested changes to this revision.Feb 20 2019, 12:52 PM

First of all, thanks you for working on this, as I wanted to do the same, but I did not know how to. I did not know also that 15 of the checkers already implements BugReporterVisitors, but I completely shocked it took 10 years of pain to rewrite it. It feels like this patch a little-bit brute-force, and I believe this should be the future direction of reporting. Also I believe in that we could hook a lot more useful information from *all* the checkers, and we do not care that much about the core BugReporterVisitors as they do their job enough well. So with that, I would out-chain this patch from the MIG-patches because it is a lot more serious problem-set. I also would like to ask you to take it more serious, as all your mentioned benefits will rely on this simple and cool approach, so it has to be flexible as possible. I am not into the internal stuff that much, so I cannot argue about the lack of CheckerContext functions, but I think this is a huge problem and breaks the flexibility a lot.

I tried out the patch and saw the very recommended source code of MallocChecker's so lightweight BugReporterVisitor implementation and I think your API is too strict about having a lambda function, so here is my ideas:

  • CheckerContext.h: As I see we only work with a string, nothing else, so I would extend that class:
ExplodedNode *addNote(ProgramStateRef State, StringRef Message) {
  return addTransitionImpl(State, false, nullptr, setNoteTag(Message)); // Note the 'setNoteTag()'
}

getNoteTag(): you could differentiate a setter with the true getter what you only use in TagVisitor::VisitNode() (where you truly hook your Callback to get extra information later on):

const NoteTag *setNoteTag(StringRef Message) {
  return Eng.getNoteTags().setNoteTag(Message);
}
  • BugReporterVisitors.h: basically a NoteTag is a string, nothing else, I think you would like to hook the Callback only when you are about to obtain the message:
class NoteTag {
  std::string Msg;
  NoteTag(StringRef Message) : ProgramPointTag(&Kind), Msg(Message)

  class Factory {
    const NoteTag *getNoteTag(Callback &&Cb)
    const NoteTag *setNoteTag(StringRef Message)
  };
};
  • Documentation: if we are about the rewrite the entire bug-reporting business, it worth some long doxygen comments.

The outcome is that cool I have to be the reviewer in two patches, and I hope it helps you creating a better API, even if I got something wrong.

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
364

It feels like it returns the NoteTag, so I would rename it as getMessage().

clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
223

It is very randomly spaced, lightweight.

This revision now requires changes to proceed.Feb 20 2019, 12:52 PM
NoQ added a comment.Feb 20 2019, 6:20 PM

Thx!~

This is a cool approach, but it is difficult to use this API in other checkers. If you do not out-chain D58367 I would like to see something like that here:

SmallString<64> Str;
llvm::raw_svector_ostream OS(Str);
OS << "Deallocating object passed through parameter '" << PVD->getName()
   << '\'';

C.addNote(C.getState()->set<ReleasedParameter>(true), OS.str());

The idea to allow passing a raw std::string instead of a lambda that generates a string sounds like a great improvement, i'll do this.

I don't think i want to disallow lambdas entirely, because

  • Some dynamic logic is definitely often necessary - eg., the MallocChecker example above. Essentially, in most cases we don't know what message should we produce (or even whether we should produce it at all) until the bug report is actually emitted. This specific checker is very lucky in that regard: every transition produced by this checker on the execution path does deserve a message, and this message doesn't depend on the bug report in any manner, but only on the event that caused it to appear.
  • I want to make the API flexible enough to avoid performance problems. Generating the string is usually cheap, but transitions are made much more often than warnings are issued (simply because most code doesn't cause us to produce a bug), so producing the message every time we make a transition is a much higher multiplier for the cost of generating the string than producing the message every time we make a transition-that-leads-to-a-bug.

Adding a comfy wrapper around addTransition that constructs the tag automatically is also a great idea, i'll do this, but i'd like to delay this a little bit because, well, addTransition API is generally a large source of pain for checker developers and could use a rework in order to make it more understandable.

In particular, the implementation you propose does a lot more than adding a note: it also adds, well, a transition. Namely, the reader would most likely expect the following code

State1 = State->set<Trait1>(Value1);
C.addNote(State1, "Message 1");
State2 = State1->set<Trait2>(Value2);
C.addNote(State2, "Message 2");

to make two updates to the state and display a note for each of them:

/-------------\
| Predecessor |
| State       |
\-------------/
      ||
      \/
/-------------\
| Node1       |
| State1      |
| "Message 1" |
\-------------/
      ||
      \/
/-------------\
| Node2       |
| State2      |
| "Message 2" |
\-------------/

The actual behavior, however, would be to split the execution path in two parallel paths, analyze them separately later, and mark each of them with the respective message:

         /-------------\
         | Predecessor |
         | State       |
         \-------------/
           ||       ||
           \/       \/
/-------------\   /-------------\
| Node1       |   | Node2       |
| State1      |   | State2      |
| "Message 1" |   | "Message 2" |
\-------------/   \-------------/

We already have this problem with generateNonFatalErrorNode(): even though it explicitly states that it will generate a node, it's very unobvious that generating two error nodes not only allows you to report two bugs, but also causes an accidental state split. Additionally, such accidental state splits are hard to notice because most of the time it doesn't cause any visible differences apart from a 2x slowdown of the remaining analysis on that execution path. In case of addNote() it should be a bit better because you'd be able to notice that one of the state updates isn't taking place, leading to false positives, but in case of generateNonFatalErrorNode() there are often no state updates being made (which is bad on its own, but for a separate reason).

I do have one idea on how to make this less confusing, but it's a bigger piece of work.

we do not care that much about the core BugReporterVisitors as they do their job enough well.

If only that was so :)

So with that, I would out-chain this patch from the MIG-patches because it is a lot more serious problem-set. I also would like to ask you to take it more serious, as all your mentioned benefits will rely on this simple and cool approach, so it has to be flexible as possible.

I've been thinking for about a month about this until now, and i'm actually very surprised that i see no obvious flexibility issues here. Stateful visitors (eg., the ones that only highlight the *last* interesting event) can be easily implemented by keeping via lambdas as a private state data in the bug reporter. Mutually recursive systems of multiple visitors that add each other dynamically during the visit (such as the trackExpressionValue that's infamous for that exact reason) should be (and most likely could be) untangled anyway, and once they're untangled (eg., by keeping just one instance of the visitor while dynamically updating its tracking information), the flexibility issue disappears; i'm almost happy that it would no longer be possible to entangle the code that way. Dunno, this is weird - usually i quickly come up with examples of why something wouldn't work and decide to implement it anyway, but this time i'm surprisingly secure about implementing it.

I am not into the internal stuff that much, so I cannot argue about the lack of CheckerContext functions, but I think this is a huge problem and breaks the flexibility a lot.

The issue here is that the checker doesn't always have an ability to mutate the graph by adding nodes. Now it becomes more of a problem because the checker may want to emit notes more often than it wants to generate nodes. This problem can be easily mitigated by passing a different sort of context (not CheckerContext but something else) into these callbacks that will store lambdas in a different manner. It slightly sucks that we'd have to come up with a duplicate API for that, but that's not something we need to solve immediately - i believe that we can always add these later.

Another way to describe this limitation is that "lambda visitors" are only capable of adding notes to nodes that were produced by the checker itself. Old-style visitors, on the other hand, can drop notes in arbitrary places - only once per node, but it's still something. Lambdas, of course, also produce at most one message per node, but for lambdas it's not a problem because the checker can make as many nodes as it wants. Whereas an arbitrary visitor cannot retroactively provide itself with more nodes if they weren't constructed artificially ahead of time. I think this is not a big flexibility issue on its own because it should be possible to make lambdas from multiple sources cooperate with each other by sharing a certain amount of arbitrary state information within the BugReport object.

getNoteTag(): you could differentiate a setter with the true getter what you only use in TagVisitor::VisitNode() (where you truly hook your Callback to get extra information later on):

const NoteTag *setNoteTag(StringRef Message) {
  return Eng.getNoteTags().setNoteTag(Message);
}

Mmm, i don't think i understand this part. Like, getNoteTag() is not really a getter. It's a factory method on an allocator that causes it to produce an object. What is the semantics of a setter that would correspond to it?

  • Documentation: if we are about the rewrite the entire bug-reporting business, it worth some long doxygen comments.

Yup, totally.

Like, we're not really rewriting anything yet, just trying out a new approach. I'll definitely not jump into rewriting everything until the new approach gets tested a lot and turns out to actually work.

One important thing to document is the lifetime of these lambdas (equal to ExprEngine's lifetime, which is a fairly popular lifetime to have in the Analyzer). Captures would need to live that long and can probably be cleaned up at top-level checkBeginFunction() if they are memory-managed manually.

In general I think it would be cool to think all of the problems in the same time as these really depends on the CoreEngine and how we operates with ExplodedNodes.

In D58367#1405185, @NoQ wrote:

In particular, the implementation you propose does a lot more than adding a note: it also adds, well, a transition.

I think a possible workaround to handle special nodes is adding them in CoreEngine as successors (instead of predecessors) so when we are traversing backwards it immediately known the next (pred) "not-special" node will be its parent. This is the same logic in ConditionBRVisitor where we only work with pair of nodes to see if we would report something.

The issue here is that the checker doesn't always have an ability to mutate the graph by adding nodes. Now it becomes more of a problem because the checker may want to emit notes more often than it wants to generate nodes. This problem can be easily mitigated by passing a different sort of context (not CheckerContext but something else) into these callbacks that will store lambdas in a different manner.
I think this is not a big flexibility issue on its own because it should be possible to make lambdas from multiple sources cooperate with each other by sharing a certain amount of arbitrary state information within the BugReport object.

I think it ties with the problem above, so having some special ExplodedNode could fix both of them because may you would introduce some ExplodedNodeContext to store more information.

const NoteTag *setNoteTag(StringRef Message) {
  return Eng.getNoteTags().setNoteTag(Message);
}

Mmm, i don't think i understand this part. Like, getNoteTag() is not really a getter. It's a factory method on an allocator that causes it to produce an object. What is the semantics of a setter that would correspond to it?

All the problems what you mentioned is not really a job of a factory, so I just emphasized we only get the true benefit of the callback in the get part. I think leave it as it is now makes sense but later on there would be a lot more function to hook crazy lambdas to *set* information.

Please note that it is a very high-level feedback, as I just checked the CoreEngine and saw why we are splitting states.

NoQ updated this revision to Diff 187877.Feb 21 2019, 3:19 PM

Unblock the unrelated MIGChecker patches by untangling them from this core change. This patch will land after them and now includes an update to the checker that demonstrates the use of (and, well, tests) the new API.

Comments were not addressed yet :)

NoQ updated this revision to Diff 189948.Mar 8 2019, 3:22 PM

Found a bug! The lambda has to manually make sure that the bug report actually does have something to do with the checker. The attached testcase shows that we definitely need to avoid attaching a MIG checker note to a core.DivideZero bug report. I stuffed a check into the lambda to demonstrate what sort of logic do we need to avoid this problem:

if (&BR.getBugType() != &BT)
  return "";

But of course i'd much rather automate that because most checkers will need this sort of logic and because it looks really ugly and is very easy to forget. My best idea so far, for the next patch, is to store a pointer to the Checker object in both the note and the bug report and automatically compare them in TagVisitor before even trying to emit a note. Such check should probably be optional, because otherwise only checkers would be able to use note tags, while i still want the Core to be able to use it as well.

In D58367#1423451, @NoQ wrote:

Found a bug! The lambda has to manually make sure that the bug report actually does have something to do with the checker.

I think message-semantic is better than storing some data in two places. BugReport has getCheckName() and may if the NoteTag knows the name of its author then it is comparable.
I am not sure but may you could hack the lifetime of that StringRef from CheckName to not to report on something purged out (/not exists?).

xazax.hun accepted this revision.Mar 9 2019, 1:14 AM

LG!

It is an interesting idea to use this facility for trackExpressionValue. But I would expect such a mechanism to trigger quite often. I wonder if the memory consumption would increase significantly by storing a lambda for almost every binding for each path.
Right now we reclaim the memory after we finished analyzing a top-level function. If memory proves to be a problem, we could maybe reclaim memory for every non-buggy path analyzed? Of course, I prefer the simplicity of the current solution and hope that we never need to consider more complicated cleanup logic :)

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
400

Isn't this redundant?

clang/lib/StaticAnalyzer/Checkers/MIGChecker.cpp
165–174

I am not very familiar with this check but wonder don't you miss an "isInteresting" check somewhere? Where do we filter the notes that are unrelated to the bug paths?

Szelethus added a comment.EditedMar 12 2019, 5:31 AM

Would NoteTags be displayed in a dumped exploded graph?

In D58367#1405185, @NoQ wrote:

So with that, I would out-chain this patch from the MIG-patches because it is a lot more serious problem-set. I also would like to ask you to take it more serious, as all your mentioned benefits will rely on this simple and cool approach, so it has to be flexible as possible.

I've been thinking for about a month about this until now, and i'm actually very surprised that i see no obvious flexibility issues here. Stateful visitors (eg., the ones that only highlight the *last* interesting event) can be easily implemented by keeping via lambdas as a private state data in the bug reporter. Mutually recursive systems of multiple visitors that add each other dynamically during the visit (such as the trackExpressionValue that's infamous for that exact reason) should be (and most likely could be) untangled anyway, and once they're untangled (eg., by keeping just one instance of the visitor while dynamically updating its tracking information), the flexibility issue disappears; i'm almost happy that it would no longer be possible to entangle the code that way. Dunno, this is weird - usually i quickly come up with examples of why something wouldn't work and decide to implement it anyway, but this time i'm surprisingly secure about implementing it.

In D58367#1402847, @NoQ wrote:
  1. I guess most of the visitors could be changed to this format, do you have plans to convert them? I'll happily land a hand, because it sounds like a big chore. I guess that would also test this implementation fairly well.

I don't have an immediate plan but i'll definitely convert visitors when i touch them and get annoyed. Also i believe that this new functionality is much more useful for core visitors than for checker visitors, simply because there's much more information to reverse-engineer in every visitor.

As I understand it, this solution could be used to entirely get rid of the current bugreporter visitor structure (at least for checkers), right? The discussion seems to conclude that this is just as general, far easier to understand, far easier to implement, and is basically better in every regard without an (edit: significant) hit to performance? Because if so, I'm definitely against supporting two concurrent implementations of the same functionality -- in fact, we should even just forbid checkers to add custom visitors.

I can't say much about the rest of the discussion, seems like you two are far more knowledgable on this topic than me :^)

it hurts me when i see all of them duplicated in the visitor as a military-grade portion of spaghetti.

Made my day :D

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
386–398

Hmmm, did you consider the other memory management options we have in LLVM, like BumpPtrAllocator? Sounds a bit better for this use case.

NoQ marked an inline comment as done.Mar 14 2019, 6:11 PM

Would NoteTags be displayed in a dumped exploded graph?

That's a great question. Tags themselves are always printed out, together with their description, and description for note tags is defined by this patch as follows:

StringRef getTagDescription() const override {
  if (MemoizedMessage)
    return *MemoizedMessage;
  else
    return "Untriggered Note Tag";
}

This "tag description" thing is only visible in exploded graph dumps; it doesn't have any other meaning. It is clear that it's very appealing to not only mention that the tag is attached, but also to print out the message it would produce. However, it is impossible to obtain the message until the specific bug report is provided. Not only the report must already be emitted, but also it needs to be chosen to represent its class of equivalent bug reports.

For now it means that if you simply do the usual -analyzer-viz-egraph-graphviz thing or the debug.ViewExplodedGraph thing, the graph will be visualized *before* the note tag lambdas are invoked, and it won't be able to show you the text:

However, if you take your debugger, break at the end of BugReporter::FlushReport() and execute p ((GRBugReporter *)this)->Eng.ViewGraph(0) (or 1 if you want it trimmed), it'll show you the exact message:

I think it'd be great to delay -analyzer-viz-egraph-graphviz so that all note tags were resolved by the time the graph is printed.

Now, why was this a great question? This question was great because it helped me realize that the memoization is a broken idea because the same lambda may produce multiple different messages if it participates in multiple bug reports. Which means we need a more sophisticated memoization, i.e. map<BugReport *, Optional<string>> or something like that (assuming that bug reports can actually be identified by pointers).

In D58367#1402847, @NoQ wrote:
  1. I guess most of the visitors could be changed to this format, do you have plans to convert them? I'll happily land a hand, because it sounds like a big chore. I guess that would also test this implementation fairly well.

I don't have an immediate plan but i'll definitely convert visitors when i touch them and get annoyed. Also i believe that this new functionality is much more useful for core visitors than for checker visitors, simply because there's much more information to reverse-engineer in every visitor.

As I understand it, this solution could be used to entirely get rid of the current bugreporter visitor structure (at least for checkers), right? The discussion seems to conclude that this is just as general, far easier to understand, far easier to implement, and is basically better in every regard without an (edit: significant) hit to performance? Because if so, I'm definitely against supporting two concurrent implementations of the same functionality -- in fact, we should even just forbid checkers to add custom visitors.

I suspect so. Supporting two different implementations doesn't sounds scary because both of them are fairly easy (at least ever since George has eradicated the fix-point-iteration for mutually recursive visitors), but once note tags start landing and settling down, i'll definitely try to avoid accepting new visitors :)

clang/lib/StaticAnalyzer/Checkers/MIGChecker.cpp
165–174

Yeah, this check is special; it only tracks a single boolean flag in the program state and doesn't have a notion of interesting symbols or regions. Which is kinda why i started with this checker - it's easier than other checkers with the new approach (and harder than other checkers with the old approach).

As I understand it, this solution could be used to entirely get rid of the current bugreporter visitor structure (at least for checkers), right? The discussion seems to conclude that this is just as general, far easier to understand, far easier to implement, and is basically better in every regard without an (edit: significant) hit to performance? Because if so, I'm definitely against supporting two concurrent implementations of the same functionality -- in fact, we should even just forbid checkers to add custom visitors.

I am not sure we could get rid of all the checker-specific visitors, but most probably many of them. However, there are cases where we should find something "last" which is best done bottom-up.

NoQ updated this revision to Diff 192372.Mar 26 2019, 3:20 PM
NoQ marked 6 inline comments as done.

Remove memoization for now. Address a few inline comments. I'm mostly done with this first patch, did i accidentally miss anything?

  1. It would be great to have unit tests for this.

Mm, i have problems now that checker registry is super locked down: i need a bug report object => i need a checker (or at least a CheckName) => i need the whole AnalysisConsumer => i can no longer override ASTConsumer methods for testing purposes (because AnalysisConsumer is a final sub-class of ASTConsumer). Do you have a plan B for that?

I also generally don't see many good unit tests we could write here, that would add much to the integration tests we already have, but this memoization problem could have made a nice unit test.

In D58367#1423451, @NoQ wrote:

Found a bug! The lambda has to manually make sure that the bug report actually does have something to do with the checker.

I think message-semantic is better than storing some data in two places. BugReport has getCheckName() and may if the NoteTag knows the name of its author then it is comparable.
I am not sure but may you could hack the lifetime of that StringRef from CheckName to not to report on something purged out (/not exists?).

Storing and comparing the checker pointer ought to be cheaper and more precise than storing and comparing its name as a string that can be easily extracted from it. Still, storing a string may be interesting when it comes to identifying non-checker note sources. I guess we'll have to think more about that.

Charusso accepted this revision.Mar 26 2019, 3:51 PM

Cool! I have found this message-semantic idea useful in Unity where every GameObject could talk with each other in a very generic way (https://docs.unity3d.com/ScriptReference/GameObject.SendMessage.html).
I am looking forward for more!

This revision is now accepted and ready to land.Mar 26 2019, 3:51 PM
Szelethus accepted this revision.Mar 27 2019, 9:51 AM

Amazing work! Thanks!

In D58367#1443783, @NoQ wrote:

Remove memoization for now. Address a few inline comments. I'm mostly done with this first patch, did i accidentally miss anything?

  1. It would be great to have unit tests for this.

Mm, i have problems now that checker registry is super locked down: i need a bug report object => i need a checker (or at least a CheckName) => i need the whole AnalysisConsumer => i can no longer override ASTConsumer methods for testing purposes (because AnalysisConsumer is a final sub-class of ASTConsumer). Do you have a plan B for that?

Saw this, will think about it! Though I'm a little unsure what you mean under the checker registry being locked down.

I also generally don't see many good unit tests we could write here, that would add much to the integration tests we already have, but this memoization problem could have made a nice unit test.

I don't insist, but unlike most of the subprojects within clang, we have very-very few infrastructural unit tests. I don't even find them to be that important for testing purposes, but more as minimal examples: I remember that unittests/StaticAnalyzer/RegisterCustomChecker.cpp was a godsent when I worked on checker dependencies.

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
600

Prefer using.

NoQ updated this revision to Diff 192501.Mar 27 2019, 12:51 PM
NoQ marked 2 inline comments as done.
NoQ added inline comments.
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporter.h
600

Thx!~

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
348

Moved to BugReporter.h.

ProgramPoint.h is too low-level, it's not even in libStaticAnalyzer*, so talking about stuff like BugReporterContext within it sounds a bit hard to get compiled.

386–398

Hmm, apparently i didn't. Fxd.

400

It isn't - i made a private constructor as usual.

xazax.hun added inline comments.Mar 27 2019, 12:58 PM
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
400

I had the impression that nested classes can access the members of their parent classes since C++11.

NoQ added a comment.Mar 27 2019, 1:08 PM

*un-forgets to actually post comments*

Cool! I have found this message-semantic idea useful in Unity where every GameObject could talk with each other in a very generic way (https://docs.unity3d.com/ScriptReference/GameObject.SendMessage.html).

I mean, for instance, the whole Objective-C is built in this very manner. But it's not always worth it to opt in into such highly dynamic system that bypasses all type checks - in many cases there's a more straightforward solution.

Amazing work! Thanks!

In D58367#1443783, @NoQ wrote:

Remove memoization for now. Address a few inline comments. I'm mostly done with this first patch, did i accidentally miss anything?

  1. It would be great to have unit tests for this.

Mm, i have problems now that checker registry is super locked down: i need a bug report object => i need a checker (or at least a CheckName) => i need the whole AnalysisConsumer => i can no longer override ASTConsumer methods for testing purposes (because AnalysisConsumer is a final sub-class of ASTConsumer). Do you have a plan B for that?

Saw this, will think about it! Though I'm a little unsure what you mean under the checker registry being locked down.

I mean, you hunted down checker names by making sure all objects are constructed properly, which probably means that it's harder to construct these objects improperly for testing purposes. I'm not really sure - maybe it has always been that way :) Anyway, all i need is a CheckName, which is merely a string under the hood, but a very hard-to-obtain one.

I also generally don't see many good unit tests we could write here, that would add much to the integration tests we already have, but this memoization problem could have made a nice unit test.

I don't insist, but unlike most of the subprojects within clang, we have very-very few infrastructural unit tests. I don't even find them to be that important for testing purposes, but more as minimal examples: I remember that unittests/StaticAnalyzer/RegisterCustomChecker.cpp was a godsent when I worked on checker dependencies.

Yeah, this one's really nice. Another place where i find unittests to be important is when it comes to checking the separation of concerns, eg. the story behind D23963 ("RegionStore isn't responsible for the semantics of brace initialization"). I'd love to have unit tests for our environment/store/constraint managers and i'll definitely make some when i find myself implementing a new data structure of this kind.

NoQ updated this revision to Diff 192502.Mar 27 2019, 1:09 PM
NoQ marked an inline comment as done.
NoQ marked an inline comment as done.
NoQ added inline comments.
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
400

Wait, seriously?!~ Thx!

This revision was automatically updated to reflect the committed changes.
NoQ reopened this revision.Mar 29 2019, 4:09 PM
NoQ marked an inline comment as done.

Reverted in rC357332!

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/30957/steps/check-clang%20asan/logs/stdio

=================================================================
==22233==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1088 byte(s) in 17 object(s) allocated from:
    #0 0xc770f8 in operator new(unsigned long) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:106
    #1 0x9c6feef in __libcpp_allocate /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/new:238:10
    #2 0x9c6feef in allocate /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/memory:1813
    #3 0x9c6feef in __value_func<(lambda at /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h:236:9), std::__1::allocator<(lambda at /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h:236:9)> > /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/functional:1716
    #4 0x9c6feef in function<(lambda at /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h:236:9), void> /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/functional:2290
    #5 0x9c6feef in clang::ento::CheckerContext::getNoteTag(std::__1::function<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > (clang::ento::BugReport&)>&&) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h:236
    #6 0x9c6f061 in checkPostCall /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Checkers/MIGChecker.cpp:165:24
    #7 0x9c6f061 in void clang::ento::check::PostCall::_checkCall<(anonymous namespace)::MIGChecker>(void*, clang::ento::CallEvent const&, clang::ento::CheckerContext&) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/include/clang/StaticAnalyzer/Core/Checker.h:183
    #8 0x9fbd78c in operator() /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h:69:12
    #9 0x9fbd78c in runChecker /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp:290
    #10 0x9fbd78c in expandGraphWithCheckers<(anonymous namespace)::CheckCallContext> /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp:138
    #11 0x9fbd78c in clang::ento::CheckerManager::runCheckersForCallEvent(bool, clang::ento::ExplodedNodeSet&, clang::ento::ExplodedNodeSet const&, clang::ento::CallEvent const&, clang::ento::ExprEngine&, bool) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Core/CheckerManager.cpp:307
    #12 0xa07d1ef in runCheckersForPostCall /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/include/clang/StaticAnalyzer/Core/CheckerManager.h:274:5
    #13 0xa07d1ef in clang::ento::ExprEngine::evalCall(clang::ento::ExplodedNodeSet&, clang::ento::ExplodedNode*, clang::ento::CallEvent const&) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp:578
    #14 0xa07c657 in clang::ento::ExprEngine::VisitCallExpr(clang::CallExpr const*, clang::ento::ExplodedNode*, clang::ento::ExplodedNodeSet&) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp:495:5
    #15 0xa01249f in clang::ento::ExprEngine::Visit(clang::Stmt const*, clang::ento::ExplodedNode*, clang::ento::ExplodedNodeSet&) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:1539:7
    #16 0xa003888 in clang::ento::ExprEngine::ProcessStmt(clang::Stmt const*, clang::ento::ExplodedNode*) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:743:5
    #17 0xa002d48 in clang::ento::ExprEngine::processCFGElement(clang::CFGElement, clang::ento::ExplodedNode*, unsigned int, clang::ento::NodeBuilderContext*) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:590:7
    #18 0x9fdcdfe in clang::ento::CoreEngine::HandlePostStmt(clang::CFGBlock const*, unsigned int, clang::ento::ExplodedNode*) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp:438:12
    #19 0x9fdaa85 in clang::ento::CoreEngine::dispatchWorkItem(clang::ento::ExplodedNode*, clang::ProgramPoint, clang::ento::WorkListUnit const&) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp:192:7
    #20 0x9fd9941 in clang::ento::CoreEngine::ExecuteWorkList(clang::LocationContext const*, unsigned int, llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp:148:5
    #21 0x987ae4f in ExecuteWorkList /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h:170:19
    #22 0x987ae4f in RunPathSensitiveChecks /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp:740
    #23 0x987ae4f in (anonymous namespace)::AnalysisConsumer::HandleCode(clang::Decl*, unsigned int, clang::ento::ExprEngine::InliningModes, llvm::DenseSet<clang::Decl const*, llvm::DenseMapInfo<clang::Decl const*> >*) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp:715
    #24 0x98619d5 in HandleDeclsCallGraph /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp:506:5
    #25 0x98619d5 in runAnalysisOnTranslationUnit /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp:553
    #26 0x98619d5 in (anonymous namespace)::AnalysisConsumer::HandleTranslationUnit(clang::ASTContext&) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp:584
    #27 0xa2a0d52 in clang::ParseAST(clang::Sema&, bool, bool) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/Parse/ParseAST.cpp:169:13
    #28 0x742e94d in clang::FrontendAction::Execute() /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/Frontend/FrontendAction.cpp:934:8
    #29 0x731950a in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/Frontend/CompilerInstance.cpp:949:11
    #30 0x764c8c8 in clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:271:25
    #31 0xc8b2ee in cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/tools/driver/cc1_main.cpp:218:13
    #32 0xc83732 in ExecuteCC1Tool /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/tools/driver/driver.cpp:309:12
    #33 0xc83732 in main /b/sanitizer-x86_64-linux-fast/build/llvm/tools/clang/tools/driver/driver.cpp:381
    #34 0x7facad0612e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)

SUMMARY: AddressSanitizer: 1088 byte(s) leaked in 17 allocation(s).
This revision is now accepted and ready to land.Mar 29 2019, 4:09 PM
This revision was automatically updated to reflect the committed changes.
NoQ reopened this revision.Mar 29 2019, 4:56 PM

Re-reopen after the Diffusion auto-close.

This revision is now accepted and ready to land.Mar 29 2019, 4:56 PM

Hmmm, interesting. Could there be an issue with NoteTag not being trivially destructible?

NoQ added a comment.Apr 18 2019, 3:00 PM

Hmmm, interesting. Could there be an issue with NoteTag not being trivially destructible?

Yeah, i think you're right. Which means we cannot allocate it in a BumpPtrAllocator.

NoQ updated this revision to Diff 195835.Apr 18 2019, 4:08 PM

Fall back to the previous std::vector<std::unique_ptr<NoteTag>> approach.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 19 2019, 1:25 PM