This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Reimplement UnreachableCodeChecker using worklists
Needs ReviewPublic

Authored by steakhal on Jun 15 2022, 9:36 AM.

Details

Summary

This checker suffers from many false-positives.
I decided to have a look at them and I found multiple areas in which
we could improve.

  1. We should ignore try statements for now, until #55621 is resolved by fixing the CFG generation for try-catch and other exception-related constructs.
  2. If some checker placed a sink node into some block, the successors of the sink block was marked unreachable previously. This leads to an unpleasant situation from the user's point of view since a different bugreport gets correlated with the FPs caused by the sink node of that. This effectively means that for each unconditional sink, we will also have (preferable) an unreachable code report as well. Now, we ignore unreachable node x if the predecessors of x are reachable, but none of the successors of the predecessors of x are reachable. This case is demonstrated in the magic_clamp example later, where the B10 block has a division by zero fatal bug, which sinks all execution paths making both successor blocks (B8, B9) unreachable.
  3. The sweep-back part cannot be implemented correctly via a DFS visitation. This part supposed to filter the unreachable nodes to keep only the first unreachable block in the CFG. This is done to minimize the number of dead statement reports. It needs to be done in a breath-search manner, using a worklist. I'll express later why in the magic_clamp example, where I demonstrate step-by-step the previous algorithm goes wide.
  4. Imagine a completely detached CFG block segment, e.g.:
void test10_chained_unreachable(void) {
  goto end;
a:
  goto b;
b:
  goto c;
c:
  goto a;
end:;
}

This produces this CFG:

#6(entry)      #2(goto a;)
 |              |        ^
#5(goto end;)   |         \
 |             #4(goto b;) |
#1(end:)        |          |
 |             #3(goto c;) |
#0(exit)         \________/

Previously, the checker only reported B2 as dead, since that was the
first block which it encountered.
Which means that the swipe-back part kept only the initial node as
dead, and marking the rest (B3, B4) as reachable.
IMO picking artificially that block makes no sense. We should
either mark all or none in this case. Why would be 'only' the B2 dead?


The previous algorithm:

  1. Walk each visited ExplodedNode, find which corresponds to basic-block enter events and mark the entered CFG block as reachable by adding it to the corresponding set.
  2. Walk all the CFG blocks starting with the artificial CFG exit node (ID 0). Do a backward DFS to find the start of each unreachable CFG block chain, to only report unreachable diagnostics to the very first unreachable block. I refer to this step as the sweep-back part. We do this by inserting the uninteresting CFG block IDs into the reachables set. Note that the algorithm also maintains a visited block set, to account for circles in the CFG.
  3. Iterate over the CFG blocks, and if the block is not present in the reachables set, then it's a candidate for diagnostic. We do some trivial checks to filter out common FPs, and emit the report otherwise.

Why do we need to do the sweep-back part in BFS order?
Here is some code demonstrating this:

int magic_clamp(int x) {
  if (x != 0) return -1;
  int v1 = 100 / x; // Div by zero

  int clamped;
  if (v1 < 0)
    clamped = 0;
  else if (v1 > 100)
    clamped = 100;
  else if (v1 % 2 == 0)
    clamped = 0;
  else if (v1 % 2 == -1)
    clamped = -1;
  else
    clamped = v1;

  return clamped;
}

The resulting CFG looks like this:

         <13>(entry)
          |
         <12>
         /  \      Legend:
       <10> |
       /  \ |       <N> : reachable   block with the id N
      8   9 |        N  : unreachable block with the id N
     / \  | |
    6  7  | |
   / \ |  | |
  4  5 |  | |
 / \ | |  | |
2  3 | |  | |
 \  \| | /  |
  \  |/ /   |
   \ | /    |
    \|/     |
     1    <11>
      \   /
       <0>(exit)

Initially, B13, B12, B10, B11, B0 are reachable, thus the rest are
unreachable.
The swipe-back phase starts from B0 and marks B1, B2, B4, B6
reachable until it reaches the first unreachable block (B8), whose parent
is reachable. After this, it backtracks and checks if B3 should be
refuted or not. It finds that the single predecessor block of B3 is
reachable, thus we should report the B3 as unreachable.
I leave the rest for the reader, but in the end, we end up having B3, B5,
B8, B9 unreachable, thus 4 reports were generated for this example
previously.

If we were doing the swipe-back phase in BFS order, we would have only
B8 and B9 blocks as unreachable - as we should.

However, I'm suppressing these since they only exist because some
checker (in this case DivByZeroChecker) sank all paths reaching B8 and
B9. Consequently, after fixing the div-by-zero bug, they would become
reachable again.


Unfortunately, even with this new implementation, the number of
false-positives is still too high to move this out of alpha.
I've checked a couple of those reports, and I could not find any obvious
patterns. They certainly are, but I'd need to reduce some cases and have
a deeper look - which I haven't done.
IMO even this is a good increment which is worth considering to land.

My measurements did not show any crashes, or runtime regressions.
Regarding the reports:

  • In both the baseline, and in this version: 13868
  • Disappeared: 4560 (32.88%)
  • New: 190 (1.37%)

So, this implementation would not flood the users with new reports,
'only' remove a couple of annoying ones.

Diff Detail

Event Timeline

steakhal created this revision.Jun 15 2022, 9:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2022, 9:36 AM
steakhal requested review of this revision.Jun 15 2022, 9:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2022, 9:36 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

If some checker placed a sink node into some block, the successors of the sink block was marked unreachable previously. This leads to an unpleasant situation from the user's point of view since a different bugreport gets correlated with the FPs caused by the sink node of that. This effectively means that for each unconditional sink, we will also have (preferable) an unreachable code report as well.

An example and/or a visualization would be useful for this.

Now, we ignore unreachable nodes if the predecessor is reachable, but none of its successors are.

Isn't there a contradiction here? The node's predecessor must have at least one reachable successor, the node itself; otherwise the node itself would be unreachable.

the sweep-back part cannot be implemented correctly via a DFS visitation

What is the purpose of this sweep-back? You should summarize that here.

... Do a backward DFS to find the start of each unreachable CFG block chain ...

How does it find the start if there is a cycle in the graph? Isn't it rather about finding strongly connected components?

The sweep-back part cannot be implemented correctly via a DFS visitation ... Why do we need to do the sweep-back part in BFS order?

Connected components can be found by both DFS or BFS. Please elaborate why should we prefer BFS? The problem you are trying to solve seems to be an unnecessarily convoluted problem to me. Shouldn't it be enough to simply pick one node (or all of them) from the connected component? It should not matter if we use DFS or BFS for finding the nodes in a connected component.

steakhal edited the summary of this revision. (Show Details)Jun 16 2022, 9:33 AM

If some checker placed a sink node into some block, the successors of the sink block was marked unreachable previously. This leads to an unpleasant situation from the user's point of view since a different bugreport gets correlated with the FPs caused by the sink node of that. This effectively means that for each unconditional sink, we will also have (preferable) an unreachable code report as well.

An example and/or a visualization would be useful for this.

It's quite hard. I could add step numbers so that I could mark which block is being entered and exited the visitation in the recursion.

Now, we ignore unreachable nodes if the predecessor is reachable, but none of its successors are.

Isn't there a contradiction here? The node's predecessor must have at least one reachable successor, the node itself; otherwise the node itself would be unreachable.

I tried to cheap out the mouthful definition. I'm not sure if it helps more than hinders this way.

the sweep-back part cannot be implemented correctly via a DFS visitation

What is the purpose of this sweep-back? You should summarize that here.

Yea, done.

... Do a backward DFS to find the start of each unreachable CFG block chain ...

How does it find the start if there is a cycle in the graph? Isn't it rather about finding strongly connected components?

I left circle detection to the reader. Now, I'm explicitly mentioning the maintenance of the visited block set.

The sweep-back part cannot be implemented correctly via a DFS visitation ...

Why do we need to do the sweep-back part in BFS order?

Due to the recursive visitation order, the basic blocks get inserted to the reachable blocks set sooner, than another unreachable block would get visited.
In the magic_clamp example the recursive (DFS) visitation starts from B0, enters B1, recurses further up to B2 then to B4 and B4. This is when the algorithm starts popping frames (backtracking).
It backtracks, backtracks ... back to B1, where it has another predecessor (B3), hence it recurses on that path as well. But this time, it will find that the only predecessor of B3 is reachable, hence it will leave B3 as unreachable. And this is the problem, because we should not report a dead code bug there. Consequently, we should not extend the reachable blocks set in DFS visitation order. However, if we were using BFS visitation order, we can never end up in this situation.

Connected components can be found by both DFS or BFS. Please elaborate why should we prefer BFS? The problem you are trying to solve seems to be an unnecessarily convoluted problem to me. Shouldn't it be enough to simply pick one node (or all of them) from the connected component? It should not matter if we use DFS or BFS for finding the nodes in a connected component.

I don't think any of the issues mentioned in this patch relates to strongly connected components, thus I don't think I can answer to this question.

I don't think any of the issues mentioned in this patch relates to strongly connected components, thus I don't think I can answer to this question.

In your example above, repeated here:

#6(entry)      #2(goto a;)
 |              |        ^
#5(goto end;)   |         \
 |             #4(goto b;) |
#1(end:)        |          |
 |             #3(goto c;) |
#0(exit)         \________/

[#2, #4, #3] is a strongly connected (and unreachable) component of the CFG, isn't it?

I don't think any of the issues mentioned in this patch relates to strongly connected components, thus I don't think I can answer to this question.

In your example above, repeated here:

#6(entry)      #2(goto a;)
 |              |        ^
#5(goto end;)   |         \
 |             #4(goto b;) |
#1(end:)        |          |
 |             #3(goto c;) |
#0(exit)         \________/

[#2, #4, #3] is a strongly connected (and unreachable) component of the CFG, isn't it?

Upsz, I've made a mistake. I wanted to write connected components without the strongly adverb. Please remove the strongly from all my comments above.

I don't think any of the issues mentioned in this patch relates to strongly connected components, thus I don't think I can answer to this question.

In your example above, repeated here:

#6(entry)      #2(goto a;)
 |              |        ^
#5(goto end;)   |         \
 |             #4(goto b;) |
#1(end:)        |          |
 |             #3(goto c;) |
#0(exit)         \________/

[#2, #4, #3] is a strongly connected (and unreachable) component of the CFG, isn't it?

Right; those three blocks are unreachable in the CFG.

Let me clarify that this (previous) example has nothing to do with the visitation order. For that, yes either BFS and DFS order would work.
The magic_clamp example supposed to underpin the rationale behind choosing BFS instead of DFS.
In the summary, you will find a step-by-step playthrough how the DFS visitation worked previously, and resulted in falsely leaving B3 and B5 unreachable due to the order in which their predecessor nodes were visited. Let me know if it helped.