This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Try to minimize the number of equivalent bug reports evaluated by the refutation manager
ClosedPublic

Authored by mikhail.ramalho on Jul 23 2018, 1:37 PM.

Details

Summary

This patch changes how the SMT bug refutation runs in an equivalent bug report class.

Now, all other visitor are executed until they find a valid bug or mark all bugs as invalid. When the one valid bug is found (and crosscheck is enabled), the SMT refutation checks the satisfiability of this single bug.

If the bug is still valid after checking with Z3, it is returned and a bug report is created. If the bug is found to be invalid, the next bug report in the equivalent class goes through the same process, until we find a valid bug or all bugs are marked as invalid.

Massive speedups when verifying redis/src/rax.c, from 1500s to 10s.

Diff Detail

Repository
rL LLVM

Event Timeline

This is an alternative to D49669.

Currently, we not only check the one path that is believed to be true, but all previous explored in the BFS. Maybe it's enough to check only the one believed to be true? @NoQ, @george.karpenkov

mikhail.ramalho retitled this revision from [analyzer] Check all equivalent bug reports in one formula, instead of individually to [analyzer] [WIP] Check all equivalent bug reports in one formula, instead of individually.Jul 23 2018, 1:38 PM
  1. Impressive.

Currently, we not only check the one path that is believed to be true, but all previous explored in the BFS. Maybe it's enough to check only the one believed to be true

That's what makes the problem hard. We have an equivalence class of paths, and we don't know yet which one the visitors will invalidate.
So we loop through all of them until we find the one which is not invalidated.
They are all "believed to be true" - until proven otherwise, but if we do it step by step we end up running the refutation thousands of times.

I don't see a way around having to insert auxiliary variables into the formula, interpreting them, and then figuring out which path did they correspond to back from the model the solver gives.

That's what makes the problem hard. We have an equivalence class of paths, and we don't know yet which one the visitors will invalidate.
So we loop through all of them until we find the one which is not invalidated.
They are all "believed to be true" - until proven otherwise, but if we do it step by step we end up running the refutation thousands of times.

I don't see a way around having to insert auxiliary variables into the formula, interpreting them, and then figuring out which path did they correspond to back from the model the solver gives.

But from the code in findValidReport, it seems to be the other way around:

  1. One path is popped from the TrimmedGraph.
  2. All visitors are executed in the popped path.
  3. If this path is valid, return it.
  4. If there are no other path to pop, return null.
  5. Otherwise, go to 1.

Then on GRBugReporter::generatePathDiagnostics, a PathDiagnostic is only created if a bug was marked as valid. Am I missing something?

My idea is: what if we only run the Refutation Manager when the bug is marked as valid? I've a patch that implements this idea, but I need to fully test this version before testing the new one.

My idea is: what if we only run the Refutation Manager when the bug is marked as valid?

But how's that different to what we had initially?
The bug is marked "valid". You run the refutation manager. It's marked as "invalid" now.
Then how do you avoid repeating this procedure for all the reports in the equivalence class?

But how's that different to what we had initially?
The bug is marked "valid". You run the refutation manager. It's marked as "invalid" now.
Then how do you avoid repeating this procedure for all the reports in the equivalence class?

Let me try to explain with an example.

On redis, there is a set of 497 equivalent reports: currently, Z3 runs on all of them and mark them all as invalid. No other visitor mark them as invalid. Since Z3 is marking them all as invalid, all 497 bug reports need to be converted to SMT and checked.

What I'm suggesting is: first run all visitors, since no other visitor mark them as invalid, the evaluation of all 497 bugs will be fast and will return only on bug report. We run the refutation manager in this one bug report.

What I'm suggesting is: first run all visitors, since no other visitor mark them as invalid, the evaluation of all 497 bugs will be fast and will return only on bug report. We run the refutation manager in this one bug report.

Maybe I am missing something but I think if this one bug report is on an infeasible path that does not imply that all other paths are infeasible. Nevertheless, it makes sense to run the visitors first that are both fast and can invalidate some paths.

@xazax.hun yeah, thanks.
@mikhail.ramalho basically the problem with your proposal is again you don't know which one is valid.
If you refute the report, it doesn't mean that the next one in the equivalence class will be actually valid.

NoQ added a comment.Jul 23 2018, 4:16 PM

So, if none is valid, then their conjunction is invalid as well, so we can make a single solver pass.

And if one is valid, we can fall back to the old behavior and scan them one-by-one and hopefully find a good one pretty quickly (?)

@mikhail.ramalho basically the problem with your proposal is again you don't know which one is valid.
If you refute the report, it doesn't mean that the next one in the equivalence class will be actually valid.

Sorry guys, I still don't get it. These lines are inside the loop that checks all the equivalent bug reports, one by one:

if (R->isValid())
  return std::make_pair(R, std::move(visitorNotes));

The first valid bug report found in the equivalence class is returned, without checking the others.

@NoQ

  1. Disjunction, not conjunction.
  2. Maybe, could still fall into the trap where the report #399/400 is valid.

But yeah, that case seems somewhat unlikely.

Can we iterate over all reports multiple times? If yes, then we don't really need a hack,
and a proper solution is quite easy: create auxiliary variables, identify # of the report which is

NoQ added a comment.Jul 23 2018, 4:30 PM

Disjunction, not conjunction.

Mmm right.

@mikhail.ramalho What is your proposal?

You wrote:

What I'm suggesting is: first run all visitors, since no other visitor mark them as invalid, the evaluation of all 497 bugs will be fast and will return only on bug report. We run the refutation manager in this one bug report.

Again: suppose Z3 marks this report as invalid. What do you do then?

@mikhail.ramalho What is your proposal?

You wrote:

What I'm suggesting is: first run all visitors, since no other visitor mark them as invalid, the evaluation of all 497 bugs will be fast and will return only on bug report. We run the refutation manager in this one bug report.

Again: suppose Z3 marks this report as invalid. What do you do then?

Let me upload the new patch. I can revert it back if it doesn't work.

In this version, the refutation manager is not added to the list of visitors.

Instead, if a bug is marked as valid, instead of returning it, the refutation visitor is added and executed once more. If it's still valid, we return it.

There is duplicated code in this patch:

BugReporterContext BRC(Reporter, ErrorGraph.BackMap);
visitorNotes = generateVisitorsDiagnostics(R, ErrorGraph.ErrorNode, BRC);

Small update:

  • Perform the refutation check inside the loop: no need to declare BugReport *R and std::unique_ptr<VisitorsDiagnosticsTy> visitorNotes outside the loop.
  • generateVisitorsDiagnostics is called but it does not override the notes written by other visitors. The refutation manager does not add any new note, so keep the old one.

Just tested this patch on tmux, redis, openssl, twin, git, postgresql and sqlite3: the verification time is almost the same with and without refutation!

The refutation manager is actually removing more bugs; all previously being removed are still being removed but I need to check if the new ones are actually false positives.

The refutation manager is actually removing more bugs

That seems suspicious; why would this change affect semantics?

The refutation manager is actually removing more bugs

That seems suspicious; why would this change affect semantics?

I agree it's suspicious but I don't have an answer yet. The number of new removed bugs:

  • redis: 5
  • twin: 1
  • postgresql: 4

Sooo, my reference build was wrong. I reran it (both the reference build and with refutation) and no new bug removed and the slowdown was negligible.

mikhail.ramalho retitled this revision from [analyzer] [WIP] Check all equivalent bug reports in one formula, instead of individually to [analyzer] Check all equivalent bug reports in one formula, instead of individually.Jul 24 2018, 5:24 AM
george.karpenkov requested changes to this revision.Jul 24 2018, 11:56 AM

OK so

  1. The code is now inconsistent with the revision title/description.
  2. What you are doing is just running other visitors first, then only running Z3 if the bug wasn't refuted by other visitors.

This is a valid optimization, but it does *not* solve the problem of having to refute all bugs in the equivalence class one by one.
Maybe it's sufficient for our purposes now, if the problem never occurs on your benchmarks.

This revision now requires changes to proceed.Jul 24 2018, 11:56 AM
mikhail.ramalho retitled this revision from [analyzer] Check all equivalent bug reports in one formula, instead of individually to [analyzer] Avoid running the refutation manager in every bug in an equivalent class.Jul 24 2018, 12:25 PM
mikhail.ramalho edited the summary of this revision. (Show Details)

Your description is still not accurate: we could still run cross-checking on every single report in the equivalence class.
E.g. what if on every report R all other visitors say "valid", and Z3 says "not valid".

mikhail.ramalho retitled this revision from [analyzer] Avoid running the refutation manager in every bug in an equivalent class to [analyzer] Try to minimize the number of equivalent bug reports evaluated by the refutation manager.Jul 24 2018, 1:17 PM
george.karpenkov accepted this revision.Jul 24 2018, 5:12 PM
This revision is now accepted and ready to land.Jul 24 2018, 5:12 PM
This revision was automatically updated to reflect the committed changes.