This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Introduce a new interface for tracking
ClosedPublic

Authored by vsavchenko on Jun 3 2021, 3:11 AM.

Details

Summary

Tracking values through expressions and the stores is fundamental
for producing clear diagnostics. However, the main components
participating in this process, namely trackExpressionValue and
FindLastStoreBRVisitor, became pretty bloated. They have an
interesting dynamic between them (and some other visitors) that
one might call a "chain reaction". trackExpressionValue adds
FindLastStoreBRVisitor, and the latter calls trackExpressionValue.

Because of this design, individual checkers couldn't affect what's
going to happen somewhere in the middle of that chain. Whether they
want to produce a more informative note or keep the overall tracking
going by utilizing some of the domain expertise. This all lead to two
biggest problems that I see:

  • Some checkers don't use it

This should probably never be the case for path-sensitive checks.

  • Some checkers incorporated their logic directly into those components

This doesn't make the maintenance easier, breaks multiple
architecture principles, and makes the code harder to read adn
understand, thus, increasing the probability of the first case.

This commit introduces a prototype for a new interface that will be
responsible for tracking. My main idea here was to make operations
that I want have as a checker developer easy to implement and hook
directly into the tracking process.

After that I intend to make the following changes:

  • Put all of the trackExpressionValue into one handler and replace calls to trackExpressionValue to Tracker::track
  • Change visitors that call trackExpressionValue to have a parent Tracker, so they re-use the actual tracker that spawned them.
  • Split trackExpressionValue implementation (at this point some big handler) into smaller handlers based on their semantics.
  • Refactor all of the note producing code out of the FindLastStoreBRVisitor into a big clunky StoreHandler and call Tracker::handle instead.
  • Split that one handler into smaller handlers based on their semantics.
  • Extract certain handlers into checkers

Diff Detail

Event Timeline

vsavchenko created this revision.Jun 3 2021, 3:11 AM
vsavchenko requested review of this revision.Jun 3 2021, 3:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2021, 3:11 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
vsavchenko edited the summary of this revision. (Show Details)Jun 3 2021, 3:13 AM
vsavchenko edited the summary of this revision. (Show Details)Jun 3 2021, 3:20 AM
vsavchenko updated this revision to Diff 349511.Jun 3 2021, 3:21 AM

Add docstring for StoreHandler

vsavchenko updated this revision to Diff 349529.Jun 3 2021, 6:02 AM

Tweak some parts of the interface

Great initiative! You haven't mention markInteresting functions, but I think it would be really important to incorporate that functionality here as well. IMHO, markInteresting shouldn't have been part of the public API ever, Checkers should have been calling only the trackExpressionValue (as internally that calls markInteresting anyway).

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
113

I think we should document this class' responsibility with a terse sentence.

231

This naming is a bit too bloated to me, what do you think about this:

enum class Position { Front, Back };
addExpressionHandler(Front, ...) {

Great initiative! You haven't mention markInteresting functions, but I think it would be really important to incorporate that functionality here as well. IMHO, markInteresting shouldn't have been part of the public API ever, Checkers should have been calling only the trackExpressionValue (as internally that calls markInteresting anyway).

Oh, yes, interesting-ness doesn't provide a very clear interface for the feedback loop that I'm trying to adjust. From the perspective of the use "mark interesting whatever your checker is interested in and the core will figure it out", it's fine. It has a clear message and you simply expect the engine to do its magic. The problem comes when you try to catch something interesting from your own checker-specific visitor. In that use case, you, first of all, really need to understand how trackers traverse the graph and what they might consider interesting, and rely on the fact that you will get the right interesting thingy. So, the predicate checking for interesting-ness available to the users, is not very helpful IMO.

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
113

Sure

231

Maybe addHighPriorityHandler and addLowPriorityHandler? There is probably no reason to put Expression and Store into the actual name.

vsavchenko updated this revision to Diff 349537.Jun 3 2021, 6:46 AM

Address review comments

vsavchenko marked an inline comment as done.Jun 3 2021, 6:46 AM
vsavchenko added inline comments.
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
231

What do you think about this solution?

vsavchenko updated this revision to Diff 349548.Jun 3 2021, 7:42 AM

Minor change

Great initiative! You haven't mention markInteresting functions, but I think it would be really important to incorporate that functionality here as well. IMHO, markInteresting shouldn't have been part of the public API ever, Checkers should have been calling only the trackExpressionValue (as internally that calls markInteresting anyway).

Oh, yes, interesting-ness doesn't provide a very clear interface for the feedback loop that I'm trying to adjust. From the perspective of the use "mark interesting whatever your checker is interested in and the core will figure it out", it's fine. It has a clear message and you simply expect the engine to do its magic. The problem comes when you try to catch something interesting from your own checker-specific visitor. In that use case, you, first of all, really need to understand how trackers traverse the graph and what they might consider interesting, and rely on the fact that you will get the right interesting thingy. So, the predicate checking for interesting-ness available to the users, is not very helpful IMO.

What I mean is that probably we should have the following function(s) provided for the user:

Result track(SVal InterestingSVal, ....);
Result track(const MemRegion* InterestingRegion, ....);
Result track(SymbolRef InterestingSym, ....);

Or even better, only track(SVal InterestingSVal, ....) would be enough to be public. Then we could deprecate markInteresting and all Checkers should use your new Tracker interface. As I see, both trackExpressionValue and markInteresting serves the same purpose: i.e. to do a backward slicing in the ExplodedGraph to select interesing/important nodes that should be presented to the user.

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
231

Sounds good.

Great initiative! You haven't mention markInteresting functions, but I think it would be really important to incorporate that functionality here as well. IMHO, markInteresting shouldn't have been part of the public API ever, Checkers should have been calling only the trackExpressionValue (as internally that calls markInteresting anyway).

Oh, yes, interesting-ness doesn't provide a very clear interface for the feedback loop that I'm trying to adjust. From the perspective of the use "mark interesting whatever your checker is interested in and the core will figure it out", it's fine. It has a clear message and you simply expect the engine to do its magic. The problem comes when you try to catch something interesting from your own checker-specific visitor. In that use case, you, first of all, really need to understand how trackers traverse the graph and what they might consider interesting, and rely on the fact that you will get the right interesting thingy. So, the predicate checking for interesting-ness available to the users, is not very helpful IMO.

What I mean is that probably we should have the following function(s) provided for the user:

Result track(SVal InterestingSVal, ....);
Result track(const MemRegion* InterestingRegion, ....);
Result track(SymbolRef InterestingSym, ....);

Or even better, only track(SVal InterestingSVal, ....) would be enough to be public. Then we could deprecate markInteresting and all Checkers should use your new Tracker interface. As I see, both trackExpressionValue and markInteresting serves the same purpose: i.e. to do a backward slicing in the ExplodedGraph to select interesing/important nodes that should be presented to the user.

We need to define semantics for those a bit better I think. Not that trackExpressionValue or FindLastStoreBRVisitor are defined very well! But I want to approach it incrementally and first of all make those two extendable.
But of course, whatever tracks something (want to perform backward slicing) should probably go through the tracker in the future.

Add forgotten piece of StoreInfo

NoQ added a comment.Jun 3 2021, 9:46 PM

Ok. Oof. Whoa. That's amazing.

Let me re-iterate to make sure that my understanding of this patchset is correct.

Chapter 1. "A is correlated with B (ρ = 0.56), given C, assuming D and under E conditions"

  • Two new entities are introduced: trackers and handlers.
  • Trackers are attached to bug reports. A single bug report may have multiple trackers.
    • Why not have only one tracker per report?
  • Handlers are attached to trackers. Each tracker can have multiple handlers with a priority system.
    • Do you envision it growing into a dependency system later?
  • A subclass of visitors known as "tracking visitors" are connected to trackers. One tracker may have multiple visitors attached to it.
  • Trackers are a tiny, almost uncustomizable class. It contains is a method called track() that is invoked when attention of handlers is required.
  • Each tracker comes with a default set of handlers.
  • The only way to customize a tracker is to add extra handlers to it.
  • Handlers implement a handle() method invoked by the tracker every time their attention is required in track().
  • Handlers may return a note from their handle() method. If a handler returns a note, this note is attached to the bug report and lower-priority handlers are not invoked at all.
  • There are two major classes of handlers: expression handlers and store handlers. Expression handlers pay attention to Environment entries, store handlers pay attention to Region Store bindings.

Chapter 2. "For immediate release: Scientists find potential link between A and B"

  • Bug visitors are still responsible for discovering connections across exploded nodes. On the contrary, track()/handle() workflow is instantaneous.
    • It sounds like this whole patchset is orthogonal to reimplementing value tracking visitors with note tags, as long as we can pass a reference to the appropriate tracker into the tag(?)
  • Tracking starts when a checker creates a tracker and invokes track() on it, which instantly causes the respective handle() to be invoked.
  • handle() may attach visitors. These visitors may in turn call track() as they reach a different exploded node.
  • The termination condition is as usual: no further visitors attached.
    • It sounds like this patchset is orthogonal to the problem of detecting the tracking termination point(?) It does make it easier to implement customized reactions to such termination though.

Chapter 3. "A causes B all the time. What will this means for Obama?"

  • trackExpressionValue() now boils down to creation of a single tracker and an initial invocation of track() on the expression which in turn causes expression handlers to immediately react.
  • Most of the old body of trackExpressionValue() is transformed into a default expression handler. So it's still part of an immediate reaction, just a bit more generalized.
  • Most of the visitors that previously composed trackExpressionValue() are still part of the process; they're attached by the default expression handler.
  • However, instead of adding notes directly, they simply invoke track(). All note-attaching work is moved into additional default handlers.

Chapter 4. "I'm wearing this to ward off A!"

  • Basic usage in checkers is basically unchanged. They still simply call trackExpressionValue() and it just works.
  • Advanced usage - which is the whole point of this patchset - probably implies adding custom handlers to the tracker created by trackExpressionValue().
  • Let's consider an example. Suppose you're implementing evalCall for llvm::dyn_cast. You could have your expression handler detect a modeled dyn_cast call and implement its handle() method to emit a checker-specific note and start tracking the argument.
  • With this implementation such note will only be emitted if the value is tracked back to dyn_cast. Not every modeled dyn_cast will receive a note but only the one that, say, caused a null dereference on its return value when it failed.
    • Now, how does this interact with note tags? The idea behind note tags is so that visitors didn't need to reverse-engineer the analysis.
    • It sounds like the handler is a step back to the visitor world: it doesn't know how exactly was dyn_cast modeled during analysis, so it has to reverse-engineer what happened.
    • For example, in order to find out whether dyn_cast succeeded or failed, it has to explore the program state to find out which values are passed into the cast and returned from the cast.
    • Whereas a note tag would have this information accessible directly.
      • Hence a suggestion. Maybe instead of having a system of handlers, allow visitors to ask the tracker whether track() was invoked at this node on a certain expression or region?
      • I.e., polling instead of callbacks.
      • This would naturally integrate with note tags.
      • This does not solve the priority problem. We could try to solve it separately by, say, enforcing at most one note per exploded node, and then implementing a similar priority system over visitors.
      • Or am I missing something about your solution that makes connection to note tags a non-issue?

Ok. Oof. Whoa. That's amazing.

Let me re-iterate to make sure that my understanding of this patchset is correct.

Chapter 1. "A is correlated with B (ρ = 0.56), given C, assuming D and under E conditions"

  • Two new entities are introduced: trackers and handlers.

Correct.

  • Trackers are attached to bug reports. A single bug report may have multiple trackers.
    • Why not have only one tracker per report?

That's a good question, we can actually even make it a part of BugReport for that matter. I can though think of one exotic case, where you wait till you encounter certain weird condition, where you need to spawn a very specific customized tracker only starting from that point.

  • Handlers are attached to trackers. Each tracker can have multiple handlers with a priority system.
    • Do you envision it growing into a dependency system later?

Yes, I think in order to correctly split some parts of the logic that already exists in trackExpressionValue

  • A subclass of visitors known as "tracking visitors" are connected to trackers. One tracker may have multiple visitors attached to it.

Correct.

  • Trackers are a tiny, almost uncustomizable class. It contains is a method called track() that is invoked when attention of handlers is required.

Correct. If you really want to, you can change the default behavior of those track functions.

  • Each tracker comes with a default set of handlers.

Correct. This is how we migrate existing functionality into the new interface.

  • The only way to customize a tracker is to add extra handlers to it.

It's not the only one, you can subclass it and add completely different logic. But the main way for the majority of use cases is indeed through handlers.

  • Handlers implement a handle() method invoked by the tracker every time their attention is required in track().

Correct.

  • Handlers may return a note from their handle() method. If a handler returns a note, this note is attached to the bug report and lower-priority handlers are not invoked at all.

Not entirely correct, this is correct for store handlers, these are intended to customize messages for events of symbolic value being stored into the region.
Expression handlers return Tracker::Result that simply tells if some visitor was added or not (backward compatibility with trackExpressionValue) and if we should prevent all other handlers from running here.

  • There are two major classes of handlers: expression handlers and store handlers. Expression handlers pay attention to Environment entries, store handlers pay attention to Region Store bindings.

I would say that expression handler is a way to externally extend trackExpressionValue functionality. Add new supported type of expression that we can track or add checker-specific visitor recursively.

Chapter 2. "For immediate release: Scientists find potential link between A and B"

  • Bug visitors are still responsible for discovering connections across exploded nodes. On the contrary, track()/handle() workflow is instantaneous.

It is tricky because track can add visitors and they can call it back from VisitNode, and so on to infinity. This is a pattern that we had between trackExpressionValue and FindLastBRVisitor (and ReturnVisitor) and every checker can create their own recursive tracking.

  • It sounds like this whole patchset is orthogonal to reimplementing value tracking visitors with note tags, as long as we can pass a reference to the appropriate tracker into the tag(?)

Correct, it just makes existing back-slicing tracking easier to customize.

  • Tracking starts when a checker creates a tracker and invokes track() on it, which instantly causes the respective handle() to be invoked.
  • handle() may attach visitors. These visitors may in turn call track() as they reach a different exploded node.

Exactly.

  • The termination condition is as usual: no further visitors attached.
    • It sounds like this patchset is orthogonal to the problem of detecting the tracking termination point(?) It does make it easier to implement customized reactions to such termination though.

Actually, tracker dies when this happens. But hey, can you provide a use case when the checker really needs to know when this happens?

Chapter 3. "A causes B all the time. What will this means for Obama?"

  • trackExpressionValue() now boils down to creation of a single tracker and an initial invocation of track() on the expression which in turn causes expression handlers to immediately react.

Correct.

  • Most of the old body of trackExpressionValue() is transformed into a default expression handler. So it's still part of an immediate reaction, just a bit more generalized.

Correct.

  • Most of the visitors that previously composed trackExpressionValue() are still part of the process; they're attached by the default expression handler.

Correction: "All of the visitors..."

  • However, instead of adding notes directly, they simply invoke track(). All note-attaching work is moved into additional default handlers.

No, they call track where they used to call trackExpressionValue or add FindLastStoreBRVisitor.
Right now, only stores received their special customization and can produce notes based on tracker customization.

Chapter 4. "I'm wearing this to ward off A!"

  • Basic usage in checkers is basically unchanged. They still simply call trackExpressionValue() and it just works.

Correct

  • Advanced usage - which is the whole point of this patchset - probably implies adding custom handlers to the tracker created by trackExpressionValue().

Correct

  • Let's consider an example. Suppose you're implementing evalCall for llvm::dyn_cast. You could have your expression handler detect a modeled dyn_cast call and implement its handle() method to emit a checker-specific note and start tracking the argument.

So, in here it would look like this: If I want to support dyn_cast for tracking, I will add extra ExpressionHandler that calls track on the expression inside of llvm::dyn_cast. Additionally, if you want to change the note's message for situations like a = dyn_cast<Ty>(b), you can add a StoreHandler that will produce a note like "a is casted from b".

  • With this implementation such note will only be emitted if the value is tracked back to dyn_cast. Not every modeled dyn_cast will receive a note but only the one that, say, caused a null dereference on its return value when it failed.

Correct.

  • Now, how does this interact with note tags? The idea behind note tags is so that visitors didn't need to reverse-engineer the analysis.

I mean, it's just an orthogonal mechanism.

  • It sounds like the handler is a step back to the visitor world: it doesn't know how exactly was dyn_cast modeled during analysis, so it has to reverse-engineer what happened.

Correct, but during the analysis, checker could've modeled a zillion of different dyn_casts, and we don't want to have notes on all of them.

  • For example, in order to find out whether dyn_cast succeeded or failed, it has to explore the program state to find out which values are passed into the cast and returned from the cast.
  • Whereas a note tag would have this information accessible directly.

Maybe we can access the note tag from the handler and kill two birds with one stone?

  • Hence a suggestion. Maybe instead of having a system of handlers, allow visitors to ask the tracker whether track() was invoked at this node on a certain expression or region?

There are still situations when you don't have tags, but want to hook into the tracking process, so you can add your custom feature. Let's say you modeled a system call foo(a, b) and you want to track a and b. Or more generally if you want to help tracking when it's stuck.
Additionally, handlers help to tear existing trackExpressionValue implementation apart.

  • I.e., polling instead of callbacks.
  • This would naturally integrate with note tags.

My counter-suggestion is to create a flag on NoteTag (maybe Prunable already can fit into that, I'm not very familiar with this interface) that means "this tag should appear only when part of the tracking trace", and make a handler-visitor pair that will produce notes with messages in those tags.
IMO, it will prevent a ton of checkers repeating very similar code: make visitor that will check if some expression is tracked and generate a note out of tag

  • This does not solve the priority problem. We could try to solve it separately by, say, enforcing at most one note per exploded node, and then implementing a similar priority system over visitors.
  • Or am I missing something about your solution that makes connection to note tags a non-issue?

My understanding is that tags don't rule out tracking. The other way around is also true. My take on this - let's make tracking a bit more flexible, it's not gonna hurt anyone.
Another aspect that I think is very important here is the ability to make this change incrementally and to extract some parts of trackExpressionValue and FindLastStoreBRVisitor back into checkers to make it at least a bit easier for the reader.

Ok. Oof. Whoa. That's amazing.

Let me re-iterate to make sure that my understanding of this patchset is correct.

Chapter 1. "A is correlated with B (ρ = 0.56), given C, assuming D and under E conditions"

  • Two new entities are introduced: trackers and handlers.
  • Trackers are attached to bug reports. A single bug report may have multiple trackers.
    • Why not have only one tracker per report?
  • Handlers are attached to trackers. Each tracker can have multiple handlers with a priority system.
    • Do you envision it growing into a dependency system later?
  • A subclass of visitors known as "tracking visitors" are connected to trackers. One tracker may have multiple visitors attached to it.
  • Trackers are a tiny, almost uncustomizable class. It contains is a method called track() that is invoked when attention of handlers is required.
  • Each tracker comes with a default set of handlers.
  • The only way to customize a tracker is to add extra handlers to it.
  • Handlers implement a handle() method invoked by the tracker every time their attention is required in track().
  • Handlers may return a note from their handle() method. If a handler returns a note, this note is attached to the bug report and lower-priority handlers are not invoked at all.
  • There are two major classes of handlers: expression handlers and store handlers. Expression handlers pay attention to Environment entries, store handlers pay attention to Region Store bindings.

Chapter 2. "For immediate release: Scientists find potential link between A and B"

  • Bug visitors are still responsible for discovering connections across exploded nodes. On the contrary, track()/handle() workflow is instantaneous.
    • It sounds like this whole patchset is orthogonal to reimplementing value tracking visitors with note tags, as long as we can pass a reference to the appropriate tracker into the tag(?)
  • Tracking starts when a checker creates a tracker and invokes track() on it, which instantly causes the respective handle() to be invoked.
  • handle() may attach visitors. These visitors may in turn call track() as they reach a different exploded node.
  • The termination condition is as usual: no further visitors attached.
    • It sounds like this patchset is orthogonal to the problem of detecting the tracking termination point(?) It does make it easier to implement customized reactions to such termination though.

Chapter 3. "A causes B all the time. What will this means for Obama?"

  • trackExpressionValue() now boils down to creation of a single tracker and an initial invocation of track() on the expression which in turn causes expression handlers to immediately react.
  • Most of the old body of trackExpressionValue() is transformed into a default expression handler. So it's still part of an immediate reaction, just a bit more generalized.
  • Most of the visitors that previously composed trackExpressionValue() are still part of the process; they're attached by the default expression handler.
  • However, instead of adding notes directly, they simply invoke track(). All note-attaching work is moved into additional default handlers.

Chapter 4. "I'm wearing this to ward off A!"

  • Basic usage in checkers is basically unchanged. They still simply call trackExpressionValue() and it just works.
  • Advanced usage - which is the whole point of this patchset - probably implies adding custom handlers to the tracker created by trackExpressionValue().
  • Let's consider an example. Suppose you're implementing evalCall for llvm::dyn_cast. You could have your expression handler detect a modeled dyn_cast call and implement its handle() method to emit a checker-specific note and start tracking the argument.
  • With this implementation such note will only be emitted if the value is tracked back to dyn_cast. Not every modeled dyn_cast will receive a note but only the one that, say, caused a null dereference on its return value when it failed.
    • Now, how does this interact with note tags? The idea behind note tags is so that visitors didn't need to reverse-engineer the analysis.
    • It sounds like the handler is a step back to the visitor world: it doesn't know how exactly was dyn_cast modeled during analysis, so it has to reverse-engineer what happened.
    • For example, in order to find out whether dyn_cast succeeded or failed, it has to explore the program state to find out which values are passed into the cast and returned from the cast.
    • Whereas a note tag would have this information accessible directly.
      • Hence a suggestion. Maybe instead of having a system of handlers, allow visitors to ask the tracker whether track() was invoked at this node on a certain expression or region?
      • I.e., polling instead of callbacks.
      • This would naturally integrate with note tags.
      • This does not solve the priority problem. We could try to solve it separately by, say, enforcing at most one note per exploded node, and then implementing a similar priority system over visitors.
      • Or am I missing something about your solution that makes connection to note tags a non-issue?

Now this sounds like a plan. Valeriy, this is something that I missed for such a massive refactor...

xazax.hun added inline comments.Jun 4 2021, 12:44 PM
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
110

I vaguely remember we wanting to put this defensive check suppression in the related checkers. If that is the case, I wonder if this option belongs here.

127

There are some other places were we might have effects that are very similar to stores, like invalidations during conservative function evaluation.

147

During path sensitive analysis we already have a callback for stores. We kind of replicating this logic for bug paths.
So my questions are:

  • Do we expect to have additional information here that were not available during the analysis earlier?
  • Do we want to make this as similar to the forward analysis part as possible for developer familiarity?
210

Not directly related to this patch, but I wonder if we want to have unknown values with identities at some point, so we can track them.

NoQ added a comment.Jun 4 2021, 1:10 PM
  • Trackers are attached to bug reports. A single bug report may have multiple trackers.
    • Why not have only one tracker per report?

That's a good question, we can actually even make it a part of BugReport for that matter. I can though think of one exotic case, where you wait till you encounter certain weird condition, where you need to spawn a very specific customized tracker only starting from that point.

Ok, let's take a simpler example:

int foo(int D) {
  int A = 1;
  int B = 1;
  int C = B - A;
  return D / C; // division by zero
}

And suppose we want to track A and B so that to explain why we think that C is zero. What are pros and cons of:
(1) tracking A and B within the same tracker that already helped us reach C?
(2) terminating the current tracker as it reaches C and spawning two separate trackers for A and B?

  • The termination condition is as usual: no further visitors attached.
    • It sounds like this patchset is orthogonal to the problem of detecting the tracking termination point(?) It does make it easier to implement customized reactions to such termination though.

Actually, tracker dies when this happens. But hey, can you provide a use case when the checker really needs to know when this happens?

Inlined defensive check suppressions are a major example. Basically null dereference warnings are suppressed based on properties of the program point at which the tracker terminates. If we know when this happens, we can refactor inlined defensive check suppressions into a checker specific handler.

  • However, instead of adding notes directly, they simply invoke track(). All note-attaching work is moved into additional default handlers.

No, they call track where they used to call trackExpressionValue or add FindLastStoreBRVisitor.
Right now, only stores received their special customization and can produce notes based on tracker customization.

What's the ultimate plan though, do we want to fully move all notes into handlers?

  • Let's consider an example. Suppose you're implementing evalCall for llvm::dyn_cast. You could have your expression handler detect a modeled dyn_cast call and implement its handle() method to emit a checker-specific note and start tracking the argument.

So, in here it would look like this: If I want to support dyn_cast for tracking, I will add extra ExpressionHandler that calls track on the expression inside of llvm::dyn_cast. Additionally, if you want to change the note's message for situations like a = dyn_cast<Ty>(b), you can add a StoreHandler that will produce a note like "a is casted from b".

  • With this implementation such note will only be emitted if the value is tracked back to dyn_cast. Not every modeled dyn_cast will receive a note but only the one that, say, caused a null dereference on its return value when it failed.

Correct.

  • Now, how does this interact with note tags? The idea behind note tags is so that visitors didn't need to reverse-engineer the analysis.

I mean, it's just an orthogonal mechanism.

  • It sounds like the handler is a step back to the visitor world: it doesn't know how exactly was dyn_cast modeled during analysis, so it has to reverse-engineer what happened.

Correct, but during the analysis, checker could've modeled a zillion of different dyn_casts, and we don't want to have notes on all of them.

  • For example, in order to find out whether dyn_cast succeeded or failed, it has to explore the program state to find out which values are passed into the cast and returned from the cast.
  • Whereas a note tag would have this information accessible directly.

Maybe we can access the note tag from the handler and kill two birds with one stone?

If we can, one way or another, I believe we absolutely should kill two birds with one stone. We shouldn't have two mutually exclusive solutions for two aspects of the same problem. In order to produce good notes, it's necessary to *both* know what happened during analysis *and* know whether the value is tracked.

  • Hence a suggestion. Maybe instead of having a system of handlers, allow visitors to ask the tracker whether track() was invoked at this node on a certain expression or region?

There are still situations when you don't have tags, but want to hook into the tracking process, so you can add your custom feature. Let's say you modeled a system call foo(a, b) and you want to track a and b. Or more generally if you want to help tracking when it's stuck.

Yes, so like in my a = dyn_cast<...>(b) example the proposed solution would be like:

if (isTracked(a)) {
  emitNote();
  track(b);
}

Such code can be put into either a visitor or a note tag depending on what you have. It is entirely equivalent to the current prevalent idiom

if (isInteresting(a)) {
  emitNote();
  track(b);
}

except it's asking the right question, with your facility allowing to answer it. This approach is very easy to understand for checker developers, basically adds no extra code (doesn't require people to inherit classes).

Additionally, handlers help to tear existing trackExpressionValue implementation apart.

  • I.e., polling instead of callbacks.
  • This would naturally integrate with note tags.

My counter-suggestion is to create a flag on NoteTag (maybe Prunable already can fit into that, I'm not very familiar with this interface) that means "this tag should appear only when part of the tracking trace", and make a handler-visitor pair that will produce notes with messages in those tags.

I think this works too. I like the idea that like note tags are for execution paths, we could make "track tags" for value paths (or, well, "value tracks"? - we should really use this concept more often, it's very effective; this patchset separates these two notions of a path that we were previously confusing).

It sounds a bit more restrictive on what kinds of conditions are allowed. For example, if we have auto [a, b] = foo(c), we may decide to track c either when both a and b are tracked, or when only one of them is tracked; I think both approaches could be reasonable.

In particular, a simple flag (a-la Prunable) is not enough. There needs to be a way to communicate *which* value in the state is of interest. The data structure describing such conditions functionally may potentially grow pretty large and this may be a reason to go imperative but I don't think we'll actually run into this problem soon.

IMO, it will prevent a ton of checkers repeating very similar code: make visitor that will check if some expression is tracked and generate a note out of tag

Yes, absolutely, any solution must eliminate the need for such visitors.

NoQ added inline comments.Jun 4 2021, 1:18 PM
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
127

Yes, this would be really interesting. Like, we probably don't want to add notes saying "we invalidated the Store so here's why we're now assuming stuff that seemingly contradicts our previous assumptions". But we may want to add suppressions based on this info, eg. "our control flow dependencies are based on invalidation, maybe let's try to find a different path to the same bug?"

210

UnknownVal is a stop-gap for situations when we've screwed up so badly we don't even have types anymore. The only thing I'd ever want from them is to disappear :)

I guess this could be useful for a debug checker that could explain where does an UnknownVal come from. In this case unknown values don't need identities; we can track other values without identities, such as null pointers, just fine.

  • Trackers are attached to bug reports. A single bug report may have multiple trackers.
    • Why not have only one tracker per report?

That's a good question, we can actually even make it a part of BugReport for that matter. I can though think of one exotic case, where you wait till you encounter certain weird condition, where you need to spawn a very specific customized tracker only starting from that point.

Ok, let's take a simpler example:

int foo(int D) {
  int A = 1;
  int B = 1;
  int C = B - A;
  return D / C; // division by zero
}

And suppose we want to track A and B so that to explain why we think that C is zero. What are pros and cons of:
(1) tracking A and B within the same tracker that already helped us reach C?
(2) terminating the current tracker as it reaches C and spawning two separate trackers for A and B?

  • The termination condition is as usual: no further visitors attached.
    • It sounds like this patchset is orthogonal to the problem of detecting the tracking termination point(?) It does make it easier to implement customized reactions to such termination though.

Actually, tracker dies when this happens. But hey, can you provide a use case when the checker really needs to know when this happens?

Inlined defensive check suppressions are a major example. Basically null dereference warnings are suppressed based on properties of the program point at which the tracker terminates. If we know when this happens, we can refactor inlined defensive check suppressions into a checker specific handler.

It seems to me that it is a hard concept to wrap your mind around. And even if I would tell someone how we suppress null pointer inline checks, I wouldn't use "when the tracker stops" as a description of the event. It is very vague and tightly coupled with the implementation of tracker stopping, and it is not connected semantically to the problem we are trying to solve. Correct me if I'm wrong, but better description would be track when we constrained the pointer value to null, and if it in the child stack frame, drop it.
So, maybe instead of very hard to understand "when the tracker stops", we can use "when the constraint changes" or "when the constraint became this". This is more similar to store handlers in their nature, we find a pair of states where some condition stops holding (man, that can have a generic implementation!), describe it, and call the handler.

  • However, instead of adding notes directly, they simply invoke track(). All note-attaching work is moved into additional default handlers.

No, they call track where they used to call trackExpressionValue or add FindLastStoreBRVisitor.
Right now, only stores received their special customization and can produce notes based on tracker customization.

What's the ultimate plan though, do we want to fully move all notes into handlers?

There was no such plan, honestly. I had one problem in mind: it is hard to detect stores yourself, FindLastStoreBRVisitor is very tightly coupled with trackExpressionValue, and you either put your code in it, or... forget about. Yep, there is no other option!
And it's not unheard of that checkers want to know how the value got into this very region, and it is not unheard of wanting to change messages on those. This component seems to be generic enough in identifying stores and classifying them, but horrible in the way that it gives you a fixed set of notes: leave it or take it.
I don't instantly see how other visitors from that set have necessities to be customized, but we already found one about changing constraints to be useful.

  • Let's consider an example. Suppose you're implementing evalCall for llvm::dyn_cast. You could have your expression handler detect a modeled dyn_cast call and implement its handle() method to emit a checker-specific note and start tracking the argument.

So, in here it would look like this: If I want to support dyn_cast for tracking, I will add extra ExpressionHandler that calls track on the expression inside of llvm::dyn_cast. Additionally, if you want to change the note's message for situations like a = dyn_cast<Ty>(b), you can add a StoreHandler that will produce a note like "a is casted from b".

  • With this implementation such note will only be emitted if the value is tracked back to dyn_cast. Not every modeled dyn_cast will receive a note but only the one that, say, caused a null dereference on its return value when it failed.

Correct.

  • Now, how does this interact with note tags? The idea behind note tags is so that visitors didn't need to reverse-engineer the analysis.

I mean, it's just an orthogonal mechanism.

  • It sounds like the handler is a step back to the visitor world: it doesn't know how exactly was dyn_cast modeled during analysis, so it has to reverse-engineer what happened.

Correct, but during the analysis, checker could've modeled a zillion of different dyn_casts, and we don't want to have notes on all of them.

  • For example, in order to find out whether dyn_cast succeeded or failed, it has to explore the program state to find out which values are passed into the cast and returned from the cast.
  • Whereas a note tag would have this information accessible directly.

Maybe we can access the note tag from the handler and kill two birds with one stone?

If we can, one way or another, I believe we absolutely should kill two birds with one stone. We shouldn't have two mutually exclusive solutions for two aspects of the same problem. In order to produce good notes, it's necessary to *both* know what happened during analysis *and* know whether the value is tracked.

I don't think that isTracked predicate is better than isInteresting. It is yet another way to mark things, and if we go this way, we should fix interesting-ness instead.
I still think that there is a use-case for expression handlers: custom handler -> custom visitor -> track -> custom handler -> custom visitor -> ...
This back-and-forth recursive addition of visitors is impossible right now. If the checker wants to implement their own FindLastStoreBRVisitor, they need to put it into trackExpressionValue.

  • Hence a suggestion. Maybe instead of having a system of handlers, allow visitors to ask the tracker whether track() was invoked at this node on a certain expression or region?

There are still situations when you don't have tags, but want to hook into the tracking process, so you can add your custom feature. Let's say you modeled a system call foo(a, b) and you want to track a and b. Or more generally if you want to help tracking when it's stuck.

Yes, so like in my a = dyn_cast<...>(b) example the proposed solution would be like:

if (isTracked(a)) {
  emitNote();
  track(b);
}

Such code can be put into either a visitor or a note tag depending on what you have. It is entirely equivalent to the current prevalent idiom

if (isInteresting(a)) {
  emitNote();
  track(b);
}

except it's asking the right question, with your facility allowing to answer it. This approach is very easy to understand for checker developers, basically adds no extra code (doesn't require people to inherit classes).

The existing pattern doesn't work, though. I honestly don't like how this tracking (as a fundamental process) is organized at all. It doesn't really matter what predicate or callback we introduce, it is still vague. It should be way more specific, like "I am tracking this value". Tracking involves a hell-lot of other things that we get "for free", but don't really understand them.
For this reason, "interesting-ness" and "being tracked" have (at least for now) exactly the same property, we get them "for free", but don't really understand. Not that callbacks don't have this problem, but they seem to be more flexible at least.

Additionally, handlers help to tear existing trackExpressionValue implementation apart.

  • I.e., polling instead of callbacks.
  • This would naturally integrate with note tags.

My counter-suggestion is to create a flag on NoteTag (maybe Prunable already can fit into that, I'm not very familiar with this interface) that means "this tag should appear only when part of the tracking trace", and make a handler-visitor pair that will produce notes with messages in those tags.

I think this works too. I like the idea that like note tags are for execution paths, we could make "track tags" for value paths (or, well, "value tracks"? - we should really use this concept more often, it's very effective; this patchset separates these two notions of a path that we were previously confusing).

We can sketch a plan for that.

It sounds a bit more restrictive on what kinds of conditions are allowed. For example, if we have auto [a, b] = foo(c), we may decide to track c either when both a and b are tracked, or when only one of them is tracked; I think both approaches could be reasonable.

We need to get these straight before we start changing things.

In particular, a simple flag (a-la Prunable) is not enough. There needs to be a way to communicate *which* value in the state is of interest. The data structure describing such conditions functionally may potentially grow pretty large and this may be a reason to go imperative but I don't think we'll actually run into this problem soon.

IMO, it will prevent a ton of checkers repeating very similar code: make visitor that will check if some expression is tracked and generate a note out of tag

Yes, absolutely, any solution must eliminate the need for such visitors.

I want to add something (because I might seem defensive). I do want to make the world peace, but this series of patchiest is addressing a lot of headache me (and Deep) had because of the way trackExpressionValue is organized.
If you think about it, all these patches just do one thing - extract piece of code into a separate function, class, or abstraction.
Fundamentally it is all the same.

I think we can discuss future plans, but also try to be realistic about the scope of this refactoring/redesign, since I'm trying to morph existing code into something a little better.

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
127

Sure, nothing actually prevents us from extending this enum (and I actually do this, since I forgot call arguments here).
However, at this point this models and tears apart FindLastStoreBRVisitor.

The main idea here was that many checkers are benefiting from understanding how certain value got bound with a certain region. But it is so intertwined with trackExpressionValue that you couldn't do anything like this in your own code. And even simply the code that understands the node where the store happens is hard enough to make it abstract it away.

147

Oof, I actually think that this one is trickier. If the checker modeled some operation (most likely evalCall), it can bind a value to the region by itself. And this handler triggers for such events as well because it simply finds the place where the switch was flipped and the region changed its binding.

210

+1 for not caring about UnknownVal.

xazax.hun added inline comments.Jun 4 2021, 3:58 PM
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
147

If the checker modeled some operation (most likely evalCall), it can bind a value to the region by itself. And this handler triggers for such events as well because it simply finds the place where the switch was flipped and the region changed its binding.

This is such a great point, I'd like that to be captured as a comment somewhere.

NoQ added a comment.Jun 6 2021, 6:58 PM

Actually, tracker dies when this happens. But hey, can you provide a use case when the checker really needs to know when this happens?

Inlined defensive check suppressions are a major example. Basically null dereference warnings are suppressed based on properties of the program point at which the tracker terminates. If we know when this happens, we can refactor inlined defensive check suppressions into a checker specific handler.

It seems to me that it is a hard concept to wrap your mind around. And even if I would tell someone how we suppress null pointer inline checks, I wouldn't use "when the tracker stops" as a description of the event. It is very vague and tightly coupled with the implementation of tracker stopping, and it is not connected semantically to the problem we are trying to solve. Correct me if I'm wrong, but better description would be track when we constrained the pointer value to null, and if it in the child stack frame, drop it.
So, maybe instead of very hard to understand "when the tracker stops", we can use "when the constraint changes" or "when the constraint became this". This is more similar to store handlers in their nature, we find a pair of states where some condition stops holding (man, that can have a generic implementation!), describe it, and call the handler.

Inlined defensive check suppressions are indeed pretty ugly; this problem should ideally be solved in a completely different manner.

That said, I strongly disagree that "when the tracker stops" is a hard-to-understand concept. It's an extremely natural and intuitive question to ask:

"Where does the value come from?"

It's fairly reasonable to wonder what happened during analysis that caused us to produce this value for the first time, which then traveled down to our bug point, and under what circumstances did it happen. Not much precise reasoning can be based on this information but it's a very good piece of information to drive imprecise reasoning and heuristics, such as (but not limited to) inlined defensive check suppressions (where it bears a relatively precise meaning that corresponds exactly to "state splits in nested stack frames are intrinsically less reliable as they can't ever be derived from presumption of no dead code").

Another example: a value of a top-level parameter variable is much more likely to be arbitrary than a return value of an unknown function call (if the user wanted pre-conditions they could have added an assert), which in turn is much more likely to be arbitrary than a value loaded from a heap region after store invalidation from unknown function call (we're 99% sure that the function doesn't touch this particular heap region so it's probably just the old value). In this sense, I'm pretty tempted to suppress bug paths where two branch conditions (ultimately going into opposite directions) are based on different values of the same heap variable after different number of invalidations. But I'm certainly against doing so for two different top-level parameter values. You may say that in these cases the information is most likely available from the structure of the symbol (SymbolRegionValue vs. SymbolConjured vs. SymbolDerived) but it only confirms my point: origins of values are so important that we even carry them as part of the value's identity! Direct access to the structure of the symbol for determining origins is also used a few times (eg., see call sites of getOriginRegion()).

I want to add something (because I might seem defensive). I do want to make the world peace, but this series of patchiest is addressing a lot of headache me (and Deep) had because of the way trackExpressionValue is organized.
If you think about it, all these patches just do one thing - extract piece of code into a separate function, class, or abstraction.
Fundamentally it is all the same.

I think we can discuss future plans, but also try to be realistic about the scope of this refactoring/redesign, since I'm trying to morph existing code into something a little better.

What I'm trying to do here is to see a general direction in which we're moving. We have the benefit of being able to predict things. We don't have to be agile; our problem is well-defined from the start and will never really change. There's a bit of flexibility required to support a variety of different future checkers, yes, but that's about it. And then, again, it's also not entirely arbitrary: we have a pretty good amount of data on what our checkers typically do and it's unlikely to expand drastically in the near future. So we don't always need to go for the most flexible solution, but there's often benefit in going for a simpler solution that simply covers our predictable needs and nothing else.

As a historical example of such process, currently bug visitors perform a single pass over the bug report, from bottom to top. This wasn't always this way: originally visitors traversed the bug report multiple times (occasionally 30+ times), adding new visitors along the way, until the set of visitors attached to the bug report reached a fixed point. The original solution was very flexible; it allowed us to backreference values at the bottom in a flash of hindsight and have all visitors restarted gracefully. But in the entirety of our foreseeable circumstances this wasn't necessary because the ultimate purpose of having visitors was to answer the question "why do we think this situation has indeed occured in the program?" by adding notes to the path, and the answer to that question is always above the current point in the path. Such simplification really helped.

This is why i'm trying to understand the entire thing before opting into a system. Especially when it comes to checker APIs for which overengineering is particularly dangerous - and I'm still not sure how exactly the new checker APIs are intended to work after a few pages of discussion. I don't care how they work as much as I care about knowing how they are ultimately intended to work and whether it's simple enough to understand and explain.

May i ask you to sketch the intended solution for @RedDocMD's problem and/or for your RetainCountChecker problem, i.e. how it'll look in the checker code? Or in any imaginary checker of your choice? Like, what's the motivating example? That could help me a lot with understanding your approach.

The existing pattern doesn't work, though. I honestly don't like how this tracking (as a fundamental process) is organized at all. It doesn't really matter what predicate or callback we introduce, it is still vague. It should be way more specific, like "I am tracking this value". Tracking involves a hell-lot of other things that we get "for free", but don't really understand them.

Interesting, can you give an example (say, in case of @RedDocMD's problem) where isTracked() would give an incorrect result but "I am tracking this value" would give a correct result? I guess such situations will arise more often as we add more tracking, but still.

I think it's a good time to step back and talk about statefulness of this whole thing.

Path sensitive checkers are stateless; their state is instead encoded in the ProgramState structure. This is essential because execution paths are branchy and nodes are modeled in a fairly arbitrary order.

Bug visitors are stateful. This is fine because every bug path is linear and traversed in order. Value tracks are linear too - whelp, most of the time, I suppose, but we don't mind basically forking our entire stateful machine whenever this is violated.

Note tags, on the other hand, are stateless again by design. This is why they are designed to rely on information in the bug report, such as interestingness, for communication across tags. Interestingness is an older concept that is simply re-used in note tags but from the perspective of note tags interestingness is a very simple thing: it's their state.

Information attached to the bug report is arbitrary. We can extend it as much as we want, make a data structure equivalent of the GenericDataMap that would track checker-specific variations over the concept of interestingness. It could decorate the specific role of every value as much as necessary. If we need to specify that this value is the same value that's seen by that other note tag (or participated in the emission of the warning itself in a certain manner), we can totally do that.

Hence the question, do we really need to preserve this whole imperative concept of "I" in "I am tracking this value", or is the functional concept of "This value is playing a certain role" entirely sufficient for our purposes?

Inlined defensive check suppressions are indeed pretty ugly; this problem should ideally be solved in a completely different manner.

That said, I strongly disagree that "when the tracker stops" is a hard-to-understand concept. It's an extremely natural and intuitive question to ask:

"Where does the value come from?"

It's fairly reasonable to wonder what happened during analysis that caused us to produce this value for the first time, which then traveled down to our bug point, and under what circumstances did it happen. Not much precise reasoning can be based on this information but it's a very good piece of information to drive imprecise reasoning and heuristics, such as (but not limited to) inlined defensive check suppressions (where it bears a relatively precise meaning that corresponds exactly to "state splits in nested stack frames are intrinsically less reliable as they can't ever be derived from presumption of no dead code").

Another example: a value of a top-level parameter variable is much more likely to be arbitrary than a return value of an unknown function call (if the user wanted pre-conditions they could have added an assert), which in turn is much more likely to be arbitrary than a value loaded from a heap region after store invalidation from unknown function call (we're 99% sure that the function doesn't touch this particular heap region so it's probably just the old value). In this sense, I'm pretty tempted to suppress bug paths where two branch conditions (ultimately going into opposite directions) are based on different values of the same heap variable after different number of invalidations. But I'm certainly against doing so for two different top-level parameter values. You may say that in these cases the information is most likely available from the structure of the symbol (SymbolRegionValue vs. SymbolConjured vs. SymbolDerived) but it only confirms my point: origins of values are so important that we even carry them as part of the value's identity! Direct access to the structure of the symbol for determining origins is also used a few times (eg., see call sites of getOriginRegion()).

OK, let's make a pause right here. We again start to go into other topics. You come from a pure perspective that track explains how the value got here and does only that. In this picture of the world, track has one starting point and one ending point. If it ends to early (due to the lack of domain knowledge), the checker can pick up and restart the tracking if needed. If it ends for real, this is the earliest where we got this value from and it can tell us about the warning and situation we are in. The problem is that tracking is fundamentally dirty the way it is. It is not linear, and when we track one simple value, it actually spawns a gigantic tree of traces. Thus when we get to the point where it all ended, it actually ended 100 times. It ended in multiple different visitors that were tracking all kinds of things not limited to the value itself. trackExpressionValue is hard also because of this reason. Inline defensive checks don't belong there and clutter it.

I DON'T KNOW how to simplify this ATM. Probably we can decouple ONE VALUE TRACKING from the rest of it, but it's not what I'm trying to do. I want to refactor the existing solution (with all of its inherent design flaws) into something more extensible. And I think that again ATM, with the current state of things, the concept of "tracking stopped here" is doomed.

What I'm trying to do here is to see a general direction in which we're moving. We have the benefit of being able to predict things. We don't have to be agile; our problem is well-defined from the start and will never really change. There's a bit of flexibility required to support a variety of different future checkers, yes, but that's about it. And then, again, it's also not entirely arbitrary: we have a pretty good amount of data on what our checkers typically do and it's unlikely to expand drastically in the near future. So we don't always need to go for the most flexible solution, but there's often benefit in going for a simpler solution that simply covers our predictable needs and nothing else.

As a historical example of such process, currently bug visitors perform a single pass over the bug report, from bottom to top. This wasn't always this way: originally visitors traversed the bug report multiple times (occasionally 30+ times), adding new visitors along the way, until the set of visitors attached to the bug report reached a fixed point. The original solution was very flexible; it allowed us to backreference values at the bottom in a flash of hindsight and have all visitors restarted gracefully. But in the entirety of our foreseeable circumstances this wasn't necessary because the ultimate purpose of having visitors was to answer the question "why do we think this situation has indeed occured in the program?" by adding notes to the path, and the answer to that question is always above the current point in the path. Such simplification really helped.

This is why i'm trying to understand the entire thing before opting into a system. Especially when it comes to checker APIs for which overengineering is particularly dangerous - and I'm still not sure how exactly the new checker APIs are intended to work after a few pages of discussion. I don't care how they work as much as I care about knowing how they are ultimately intended to work and whether it's simple enough to understand and explain.

It's not that it came from out of nowhere. That I just decided that trackExpressionValue is not extensible enough and put this all together. We saw with our own eyes that it didn't work for RetainCountChecker or SmartPtrChecker and started to come up with ridiculous workarounds to address it. I also see all the different pieces of other checkers that live inside of BugReportersVisitors.cpp. And they are there not because of checker developers failing to understand checker APIs, but rather tracking APIs failing them.

May i ask you to sketch the intended solution for @RedDocMD's problem and/or for your RetainCountChecker problem, i.e. how it'll look in the checker code? Or in any imaginary checker of your choice? Like, what's the motivating example? That could help me a lot with understanding your approach.

It is a bit weird to me after you correctly named all the properties of the proposed solution and described how it will look like in the new architecture, but OK.

Right now, if we go up the chain of tracking calls, we get to the point where tracker gets stuck or stops.
Instead of understanding that it stops, and modifying a hell-lot of code, and passing callbacks literally everywhere, I suggest we don't let it stop in the first place.
Let's say we have this example:

...
d = foo(e);
c = d;
Type b = c;
a = b;
reportPoint(a);

Here foo is either dyn_cast for RetainCountChecker or e.get() in SmartPtrChecker.
FindLastBRVisitor gets to d = foo(e) understands that this is assignment (maybe even can say that it gets the value stored into e), calls trackExpressionValue for foo(e) and it does nothing.

It is pretty high up the chain, so we probably don't want to repeat the logic of getting up there (@RedDocMD attempted that), but we couldn't insert our logic into this intimate dance of trackExpressionValue and FindLastBRVisitor.

With the new solution, if you want the most simple ways of addressing it, you create a handler:

class FooHandler : public ExpressionHandler {
public:
  using ExpressionHandler::ExpressionHandler;

  Tracker::Result handle(const Expr *E, const ExplodedNode *InputNode,
                         const ExplodedNode *ExprNode,
                         TrackingOptions Opts) {
    if (auto *Arg = getFooArgument(E)) {
      return getParentTracker().track(E, ExprNode, Opts);
    }
    return {};
};

Essentially these handlers is my way of first introducing new logic for these two checkers, and second to extract similar logic that exists there from all other checkers.

Then, let's say, I decided that I don't like the note "d is assigned here". It doesn't seem very specific and adding more notes on top will overwhelm our poor user even more. So, I decided to make notes: "d is foo'ed from e", "g is initialized to foo from e", and "p's foo is used as the 1st argument".
Right now, there is literally no way to do this except for adding this directly to already bloated FindLastStoreBRVisitor. Why do you think it already produces notes about "garbage values" and "null/nil pointers"?

So, the checker implements a StoreHandler where they handle all the cases. It is important to have it exactly like this because FindLastStoreBRVisitor has non-trivial logic of catching "stores" and we don't want this logic to get copied into different checkers.

And if we are talking where it can go further, this particular approach, is defining more "events" in the tracking process and decoupling "event identification" from the process of generating notes for such events.

Interesting, can you give an example (say, in case of @RedDocMD's problem) where isTracked() would give an incorrect result but "I am tracking this value" would give a correct result? I guess such situations will arise more often as we add more tracking, but still.

I'm not super sure, but I think smth like this will suffice:

// a and b are completely unrelated smart pointers
raw = a.get();
if (!b.get() && !a.get()) {
  use(*raw);
}

We can attempt to track b.get() because of the conditions. Checker wouldn't get that we are actually interested in conditions and will initiate a full-blown back-trace for b even though we don't need it.

I think it's a good time to step back and talk about statefulness of this whole thing.

Path sensitive checkers are stateless; their state is instead encoded in the ProgramState structure. This is essential because execution paths are branchy and nodes are modeled in a fairly arbitrary order.

Bug visitors are stateful. This is fine because every bug path is linear and traversed in order. Value tracks are linear too - whelp, most of the time, I suppose, but we don't mind basically forking our entire stateful machine whenever this is violated.

Note tags, on the other hand, are stateless again by design. This is why they are designed to rely on information in the bug report, such as interestingness, for communication across tags. Interestingness is an older concept that is simply re-used in note tags but from the perspective of note tags interestingness is a very simple thing: it's their state.

Information attached to the bug report is arbitrary. We can extend it as much as we want, make a data structure equivalent of the GenericDataMap that would track checker-specific variations over the concept of interestingness. It could decorate the specific role of every value as much as necessary. If we need to specify that this value is the same value that's seen by that other note tag (or participated in the emission of the warning itself in a certain manner), we can totally do that.

Hence the question, do we really need to preserve this whole imperative concept of "I" in "I am tracking this value", or is the functional concept of "This value is playing a certain role" entirely sufficient for our purposes?

I'm glad that you brought this up and don't understand why you don't see it, really. Visitors are stateful. Hmm, I don't like it because it can be confusing because of program state and so on. So, let's say "Visitors are mutable". They can actually change, carry one piece of data. Tracker/handler mechanism is the same, you are allowed to carry some mutable data through the process. Maybe it is bad, but I don't see the way to get isTracked to solve all the problems that I want to solve (it is probably trying to solve bigger problems). Can we move inline defensive checks out of the BugReportVisitors.cpp with isTracked only?

And yes, checkers should be able to say "I track this value". It is the operation that makes sense only if it is helpful for the checker, and the checker should be able to customize it however the checker pleases.

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
147

Gotcha! I'll add it!

NoQ added a comment.Jun 7 2021, 8:01 PM

OK, let's make a pause right here. We again start to go into other topics.

These are not other topics. We're discussing the overall direction into which this patchset is a large step. I absolutely welcome this step and am fascinated to see how it turns out but I want us to agree on the overall direction first.

You come from a pure perspective that track explains how the value got here and does only that. In this picture of the world, track has one starting point and one ending point. If it ends to early (due to the lack of domain knowledge), the checker can pick up and restart the tracking if needed. If it ends for real, this is the earliest where we got this value from and it can tell us about the warning and situation we are in. The problem is that tracking is fundamentally dirty the way it is. It is not linear, and when we track one simple value, it actually spawns a gigantic tree of traces. Thus when we get to the point where it all ended, it actually ended 100 times. It ended in multiple different visitors that were tracking all kinds of things not limited to the value itself. trackExpressionValue is hard also because of this reason. Inline defensive checks don't belong there and clutter it.

I DON'T KNOW how to simplify this ATM. Probably we can decouple ONE VALUE TRACKING from the rest of it, but it's not what I'm trying to do. I want to refactor the existing solution (with all of its inherent design flaws) into something more extensible. And I think that again ATM, with the current state of things, the concept of "tracking stopped here" is doomed.

The reason why I come from this perspective is because that's how I think it is supposed to work and I'm not aware of a single good reason why it can't. I think the reason why it's this way is that the original authors seem to have bombarded the code with extra clauses until it starts demonstrating the more or less expected behavior, without trying to solve the problem on paper first. Like with visitor fixpoint iteration, it sounds like an unnecessary piece of complexity that we will be ultimately able to eliminate.

Except, well, the distant constraints example a-la @RedDocMD's problem but for raw pointers. This is indeed an exceptional case which may cause two tracking efforts to run in parallel. This can be probably generalized to any constraint-like traits: if we want to explain a constraint, we'll have to be prepared that the constraint isn't part of the value track that we already have. This is the first time the above assumption was challenged and I definitely need time to process this but I still think there shouldn't be any other situations like this and there should be something special about this situation that will allow us to incorporate it into the linear-track solution.

"At the moment such refactoring is difficult to achieve" is a perfectly satisfying answer to me with respect to the current state of things. But I want us to agree as early as possible about whether we want to achieve this. Simply so that as we make every step, we keep this in mind and evaluate every step as taking us either closer to this goal or farther from this goal and try to avoid the latter. And I don't think that anything prevents us from discussing this goal right now; the problem we're trying to solve isn't really ever going to change and we already have a fairly large amount of data.

With the new solution, if you want the most simple ways of addressing it, you create a handler:

class FooHandler : public ExpressionHandler {
public:
  using ExpressionHandler::ExpressionHandler;

  Tracker::Result handle(const Expr *E, const ExplodedNode *InputNode,
                         const ExplodedNode *ExprNode,
                         TrackingOptions Opts) {
    if (auto *Arg = getFooArgument(E)) {
      return getParentTracker().track(E, ExprNode, Opts);
    }
    return {};
};

Essentially these handlers is my way of first introducing new logic for these two checkers, and second to extract similar logic that exists there from all other checkers.

Ok, do I understand correctly that:

  • Functionality-wise, this is entirely equivalent to my NoteTag solution with "isTracked()";
  • Boilerplate-wise it makes the isTracked() check implicit ("we were invoked at all therefore the value of the expression is tracked by... somebody") at the cost of asking the user to define a class;
  • Convenience-wise, it potentially requires the developer to reverse-engineer the symbolic execution step when the logic gets more complicated;
  • Flexibility-wise... I guess you can make custom trackers now, which will track the value in a different manner, and communicate with the tracker in order to produce different notes?

Then, let's say, I decided that I don't like the note "d is assigned here". It doesn't seem very specific and adding more notes on top will overwhelm our poor user even more. So, I decided to make notes: "d is foo'ed from e", "g is initialized to foo from e", and "p's foo is used as the 1st argument".
Right now, there is literally no way to do this except for adding this directly to already bloated FindLastStoreBRVisitor. Why do you think it already produces notes about "garbage values" and "null/nil pointers"?

So, the checker implements a StoreHandler where they handle all the cases. It is important to have it exactly like this because FindLastStoreBRVisitor has non-trivial logic of catching "stores" and we don't want this logic to get copied into different checkers.

And if we are talking where it can go further, this particular approach, is defining more "events" in the tracking process and decoupling "event identification" from the process of generating notes for such events.

Yeah, the store handler part seems perfectly legit because *I* don't see any better approaches here :) It's indeed a big problem that we can't customize existing notes and this sounds like the right data structure to solve the problem.

Interesting, can you give an example (say, in case of @RedDocMD's problem) where isTracked() would give an incorrect result but "I am tracking this value" would give a correct result? I guess such situations will arise more often as we add more tracking, but still.

I'm not super sure, but I think smth like this will suffice:

// a and b are completely unrelated smart pointers
raw = a.get();
if (!b.get() && !a.get()) {
  use(*raw);
}

We can attempt to track b.get() because of the conditions. Checker wouldn't get that we are actually interested in conditions and will initiate a full-blown back-trace for b even though we don't need it.
(...)
And yes, checkers should be able to say "I track this value". It is the operation that makes sense only if it is helpful for the checker, and the checker should be able to customize it however the checker pleases.

Ok, I don't understand! If we have anyway tracked the value back to b.get(), why not track further through b? What makes b.get() the exact place at which we want to stop tracking the value? If we wanted to know why we choose the true path at the branch, shouldn't we track further through b as well in order to answer that question? If we didn't want to know why we choose the true path, why did we track the value back to b.get() in the first place?

I guess one possible difference between a and b here would be in messaging, i.e. how we refer to that value. For example, we could refer to a as "the pointer value" (implying that this is the null-dereferenced pointer value) and refer to b as "a pointer value" (implying that it's some other value, maybe the one that participates in condition later). But even then, this distinction is much more coarse-grained than "I track the value" vs. "Someone else tracks the value". It's more like "Buggy value" vs. "Condition value" which is a distinction we already "almost" make with TrackingKind. Even if we wanted to identify our values as "pointer #1", "pointer #2", etc., we could totally do with a sufficiently advanced definition of isTracked().

Can we move inline defensive checks out of the BugReportVisitors.cpp with isTracked() only?

My original thinking was along the lines of, eg.,:

void ExprEngine::processBranch(...) {
  Bldr.addTransition(..., getNoteTag([] (BugReport &BR) {
    if (BR.isTracked(Condition) && !N->getLocationContext().inTopFrame())
      BR.markInvalid();
  }));
}

And this is how, basically, the entirety of trackExpressionValue() could be decomposed into individual stateless note tags, with all the necessary state stuffed into the bug report. Including the tracking process itself: if note tags are sufficient to help with tracking, they're sufficient to do the tracking from the start.

This code could also be moved into a checker's checkBranchCondition() so that to make it truly checker-specific. That would come at the cost of creating a lot of redundant exploded nodes just for the sake of attaching the tag that will probably never be used so this is rather bad, wouldn't recommend.

I still think the solution that relies on well-defined end of tracking is significantly more elegant. Which, again, could be implemented with note tags: we could have a note tag that'd call a callback stuffed directly into the bug report.

So I think it can work both ways given that the tags solution would rely on having a sufficiently flexible mutable state as part of the BugReport object.

These are not other topics. We're discussing the overall direction into which this patchset is a large step. I absolutely welcome this step and am fascinated to see how it turns out but I want us to agree on the overall direction first.

OK, fair point, but for further plan, we need to have an incremental model of how we want to achieve it.

The reason why I come from this perspective is because that's how I think it is supposed to work and I'm not aware of a single good reason why it can't. I think the reason why it's this way is that the original authors seem to have bombarded the code with extra clauses until it starts demonstrating the more or less expected behavior, without trying to solve the problem on paper first. Like with visitor fixpoint iteration, it sounds like an unnecessary piece of complexity that we will be ultimately able to eliminate.

Right now it is all very tightly coupled. It is a lot of moving pieces that I don't think anyone fully understands. If we take time and decompose existing tracking into handlers, it can be a step in getting to know some implicit connections between different tracking activities we have right now.

Except, well, the distant constraints example a-la @RedDocMD's problem but for raw pointers. This is indeed an exceptional case which may cause two tracking efforts to run in parallel. This can be probably generalized to any constraint-like traits: if we want to explain a constraint, we'll have to be prepared that the constraint isn't part of the value track that we already have. This is the first time the above assumption was challenged and I definitely need time to process this but I still think there shouldn't be any other situations like this and there should be something special about this situation that will allow us to incorporate it into the linear-track solution.

"At the moment such refactoring is difficult to achieve" is a perfectly satisfying answer to me with respect to the current state of things. But I want us to agree as early as possible about whether we want to achieve this. Simply so that as we make every step, we keep this in mind and evaluate every step as taking us either closer to this goal or farther from this goal and try to avoid the latter. And I don't think that anything prevents us from discussing this goal right now; the problem we're trying to solve isn't really ever going to change and we already have a fairly large amount of data.

I get it, but I think we got stuck and should put goal first and think about what can be done to achieve this. Is it bad that tracking exists and the analyzer reverse engineers what it did itself? - No doubt! Is note mechanism a good alternative? - Sure! Is it enough right now? - No. Let's think how we can get there with incremental changes.

Ok, do I understand correctly that:

  • Functionality-wise, this is entirely equivalent to my NoteTag solution with "isTracked()";

Not really, you also get the context of tracking: two nodes and options (so you can figure out if it's being tracked thoroughly or as a condition).

  • Boilerplate-wise it makes the isTracked() check implicit ("we were invoked at all therefore the value of the expression is tracked by... somebody") at the cost of asking the user to define a class;

I suppose we can put it this way.

  • Convenience-wise, it potentially requires the developer to reverse-engineer the symbolic execution step when the logic gets more complicated;

I don't see ANY difference with the situation where we use isTracked. If you talk about notes, you can access them here as well. If you don't, then both ways require the same.

  • Flexibility-wise... I guess you can make custom trackers now, which will track the value in a different manner, and communicate with the tracker in order to produce different notes?

Exactly, tracker object is the thing that is guaranteed to stay whenever you go with tracking, and since you can customize it, you can affect every part of the tracking process.

Yeah, the store handler part seems perfectly legit because *I* don't see any better approaches here :) It's indeed a big problem that we can't customize existing notes and this sounds like the right data structure to solve the problem.

And we do need to have a central customizable object (aka Tracker) that we pass around the tracking process to make it happen, right?

Ok, I don't understand! If we have anyway tracked the value back to b.get(), why not track further through b? What makes b.get() the exact place at which we want to stop tracking the value? If we wanted to know why we choose the true path at the branch, shouldn't we track further through b as well in order to answer that question? If we didn't want to know why we choose the true path, why did we track the value back to b.get() in the first place?

I guess one possible difference between a and b here would be in messaging, i.e. how we refer to that value.

Bingo!

For example, we could refer to a as "the pointer value" (implying that this is the null-dereferenced pointer value) and refer to b as "a pointer value" (implying that it's some other value, maybe the one that participates in condition later). But even then, this distinction is much more coarse-grained than "I track the value" vs. "Someone else tracks the value". It's more like "Buggy value" vs. "Condition value" which is a distinction we already "almost" make with TrackingKind. Even if we wanted to identify our values as "pointer #1", "pointer #2", etc., we could totally do with a sufficiently advanced definition of isTracked().

Correct!

Can we move inline defensive checks out of the BugReportVisitors.cpp with isTracked() only?

My original thinking was along the lines of, eg.,:

void ExprEngine::processBranch(...) {
  Bldr.addTransition(..., getNoteTag([] (BugReport &BR) {
    if (BR.isTracked(Condition) && !N->getLocationContext().inTopFrame())
      BR.markInvalid();
  }));
}

And this is how, basically, the entirety of trackExpressionValue() could be decomposed into individual stateless note tags, with all the necessary state stuffed into the bug report. Including the tracking process itself: if note tags are sufficient to help with tracking, they're sufficient to do the tracking from the start.

Maybe, or maybe not. Maybe it takes more context than we think. You essentially assume that the only thing that we need to reimplement is one boolean variable isTracked and the expression that is being tracked. I doubt it.
And it is harder to debug polling approach as opposed to the callback approach.

This code could also be moved into a checker's checkBranchCondition() so that to make it truly checker-specific. That would come at the cost of creating a lot of redundant exploded nodes just for the sake of attaching the tag that will probably never be used so this is rather bad, wouldn't recommend.

Why not model defensive checks in the checker, so we don't make undesired assumptions in the first place?

I still think the solution that relies on well-defined end of tracking is significantly more elegant. Which, again, could be implemented with note tags: we could have a note tag that'd call a callback stuffed directly into the bug report.

I'm not sure I understood the last thing, can you please elaborate?

So I think it can work both ways given that the tags solution would rely on having a sufficiently flexible mutable state as part of the BugReport object.

OK, let me re-iterate to get some things straight.
Refactoring of trackExpressionValue is good, but we need to think further (and even further). You have a dream of getting rid of visitors and tracking altogether so that users don't need to reverse engineer their own checkers (preferably using tags). As a matter of transformation, you see that tracking should provide two new concepts: isTracking predicate (similar to the existing interesting-ness mechanism) and the way to detect where the tracking stopped. You also don't want tracking to be a gigantic tree and have everything, but to have one source of value without anything else. All other parts should be decoupled and maybe even given a different name. We still want to remove all different parts of other checkers from BugReporterVisitors.cpp and probably can do it using just one abstraction of isTracking to do that.

And here is my take on things (it is a bit chaotic, sorry about that):

  • If we want to move to isTracking, we still need a central component that participates in the tracking or something that is shared through all tracking logic. In my patches it is a Tracker object. It can be done differently by passing such object directly into every trackExpressionValue and its sub-visitors, but probably something more extensible would be a preference here.
  • If we want to split trackExpressionValue and extract checker-specific parts, we need a good and reliable way of splitting it. I used a time-tested approach of function extraction: if we have foo(A a, B b, C c) { ... } with N independent parts, we can refactor them into foo(A a, B b, C c) { foo1(a, b, c); foo2(a, b, c); ... fooN(a, b, c); }. If one of the fooK is big, we can repeat the same procedure. Instead of doing it plainly, I decided that the user might be interested in putting their own fooK into the mix.
  • Can we implement isTracked first and then split trackExpressionValue? My answer is no. It implies bigger refactoring because isTracking doesn't provide all the bits of A a, B b, C c that the logic used to receive before. One example here is the SuppressInlineDefensiveChecksVisitor. When we add it, we give it a node where the tracking of the expression was initiated, not the node with expression itself. We also have a very specific comment there that this is a requirement and expression own node might be "too late". If you use isTracked in the Visitor you will be automatically in the "too late" situation. And it is only one example. So, I insist that isTracked couldn't be the initial step for refactoring.
  • Can we implement "tracking has stopped mechanism" first and then split trackExpressionValue? My answer is no. Tracking right now is a tree and it is very convoluted. There is no one single point where it stops, it stops 100 of times. The concept of "stopped" in this situation is not useful (I'm still not convinced that it's useful at all, but this is a part of a different conversation). In order to fix that we need to split trackExpressionValue. Contradiction.
  • Can we implement isTracked afterwards? Yes, we can. Implementation of isTracked on it's own is fairly simple. And we can have users of isTracked and ExpressionHandler at the same time. Maybe, once we migrate all uses, ExpressionHandler can be safely retired.
  • Can we implement "tracking has stopped mechanism". Probably. This is something that I doubt in usefulness, and don't really want to plan.
  • Is there a way to get rid of tracking? Honestly, I don't think so. I do believe, though, that it is possible to stop reverse engineering ourselves. While the analyzer explores different paths, both the core and checkers somehow affect the state. We can cal such a change of the state as "event". Events can be both relevant and irrelevant (e.g. happened to a different smart pointer) to the issue that is found. Many of these events are pivotal to the user's understanding of the issue and we should explain them with notes. The way to understand that it is relevant is through tracking. We might want to keep these traces as we go, but it might be some extra price that we don't want to pay.
  • Let's get back to the events! Can we build our entire solution around them? I do think so, actually. I think that the analyzer can produce a bunch of events (maybe implemented as note tags). Then each bug report can provide a visitor, not exploded graph visitor as it is now, but event visitor. It will be automatically called for relevant events only (with tracking hidden under the hood). Many events will have reasonable defaults so you don't need to write your own implementations every time (e.g. "a is assigned with b here"), but you can always override them. This is something that I started with StoreHandler because "store" is one of those events. It is different though from the perfect vision because we don't want to reverse engineer what the engine/the checker did before, perfectly we would have this information in the node (or on the edge) that "store event happened". Tracker will be very trivial in this plan.
  • Is it possible to be done incrementally? I do think so. We can extract different events from trackExpressionValue and change them into various EventHandlers (like StoreHandler). Probably, in the process, we will understand what kind of data describes each type of events. Another step is pretty hard, we need to design how we are going to generate (easier) and store (harder) events during analysis. It should be pretty implicit, so it has a lot of sensible defaults produced by the core. And checker developers should be able to generate their own events (maybe even for other checkers, if they model something that everyone can benefit from). I think it will be possible to generate one event type at a time, and change how we handle it in the tracker, i.e. move from reverse engineering to detecting relevant events in the graph. Once we are finished with this, we can extract a separate thing called EventVisitor and migrate a big chunk of Tracker and EventHandler interfaces there.
  • Why do we even need to introduce tracker and event handlers if we are going to hide/delete them in this plan? The trick is about making this whole plan manageable. One enormous refactoring is hard to plan and test. Big work can demotivate and drain if you don't see the end of it. I believe that we can make great things if we take a lot of baby steps.

Phew, I hope this answers it.

NoQ added a comment.Jun 9 2021, 9:37 PM
  • ... Implementation of isTracked on it's own is fairly simple. And we can have users of isTracked and ExpressionHandler at the same time. Maybe, once we migrate all uses, ExpressionHandler can be safely retired.

I think this sounds like a plan. More specifically, if we could maintain a map of all currently tracked expressions inside the PathSensitiveBugReport object, dynamically updated as tracking proceeds (expressions added as they're tracked and removed as soon as they're no longer necessary), with values in the map being arbitrary auxiliary information we want to pass around (eg., an explanation of how to refer to the value in notes), then all tracking machineries, regardless of how they're implemented, could communicate to each other easily through that map.

I still think the solution that relies on well-defined end of tracking is significantly more elegant. Which, again, could be implemented with note tags: we could have a note tag that'd call a callback stuffed directly into the bug report.

I'm not sure I understood the last thing, can you please elaborate?

I was thinking of something like:

void NullDereferenceChecker::checkLocation(...) {
  trackExpressionValue(
      BR, Ex, /*completionHandler=*/ [](const ExplodedNode *N) {
        if (N->getLocationContext()->inTopFrame())
          BR.markInvalid();
      }); // stashes completionHandler into the aforementioned tracking map
          // inside BR for Ex.
}

void ExprEngine::processBranch(...) {
  Bldr.addTransition(..., getNoteTag([](BugReport &BR) {
    if (BR.isTracked(Condition)) {
      // We've tracked a null pointer back to the assumption.
      // Our job here is done. Look up the callback in the tracking map.
      if (auto completionHandler = BR.getTrackingCompletionHandler(Condition))
        (*completionHandler)();
    }
  }));
}
  • Why do we even need to introduce tracker and event handlers if we are going to hide/delete them in this plan? The trick is about making this whole plan manageable. One enormous refactoring is hard to plan and test. Big work can demotivate and drain if you don't see the end of it. I believe that we can make great things if we take a lot of baby steps.

I completely agree. Incremental development ftw! I guess I'll keep an eye on whether we could convert some of your newly extracted handlers to note tags immediately (but most will probably require the expression tracking map to be implemented first).

This code could also be moved into a checker's checkBranchCondition() so that to make it truly checker-specific. That would come at the cost of creating a lot of redundant exploded nodes just for the sake of attaching the tag that will probably never be used so this is rather bad, wouldn't recommend.

Why not model defensive checks in the checker, so we don't make undesired assumptions in the first place?

That's pretty difficult. We can't enter (or skip) a branch without recording its respective condition. We can't continue analysis unless we enter (or skip) the branch. Once the branch has exited, we can't get rid of the constraint yet because we may encounter the same condition again and we're obliged to be consistent. Even when we've exited the nested stack frame we're still obliged to be consistent, unless we're also willing to completely drop the execution path inside the function and replace it with some form of summary. Or, you know, do summaries from the start.


Ok, I think I mostly understand where this is going, thanks!

I'll now dive into code.

NoQ added inline comments.Jun 9 2021, 9:47 PM
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
160–161

Empty deques can be pretty large:

https://en.cppreference.com/w/cpp/container/deque
e.g. 8 times the object size on 64-bit libstdc++; 16 times the object size or 4096 bytes, whichever is larger, on 64-bit libc++

If we're attaching multiple trackers per non-deduplicated bug report I suspect this can easily amount to like 20-30 MB per process.

I suspect that a list would be cheaper in every way, esp. given that we don't need random access.

210

That said, I'd rather not force the caller to perform a getAs<>(). It's pretty cumbersome and we can do the right thing (ignore it) in the callee anyway.

vsavchenko added inline comments.Jun 10 2021, 2:09 AM
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
110

It is not really about suppressing it at all for all checkers. Such option already exists in analyzer options.
This one is used specifically for situations when globally suppressions are on, but don't make sense for that particular type (like array index).

210

OK, I see, so we can call it with whatever value, but track only known values, correct?

vsavchenko marked 4 inline comments as done.

Fix review remarks

NoQ accepted this revision.Jun 10 2021, 11:39 PM

Ok this is amazing, no more comments!!

clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
210

Yup!

This revision is now accepted and ready to land.Jun 10 2021, 11:39 PM
This revision was landed with ongoing or failed builds.Jun 11 2021, 2:51 AM
This revision was automatically updated to reflect the committed changes.

Sorry for the absence, took my time to catch up with this series! Really interesting insight into how a new interface is being designed! I need some time to digest things, but can't wait to help on the remaining patches.

Sorry for the absence, took my time to catch up with this series! Really interesting insight into how a new interface is being designed! I need some time to digest things, but can't wait to help on the remaining patches.

Hey, thanks! Here is one example of how it can be useful for checkers (D104136), maybe it can help to form a full picture.