This is an archive of the discontinued LLVM Phabricator instance.

[llvm-diff] Fix false-positive diffs on forward-referencing phi nodes
ClosedPublic

Authored by jsilvanus on Nov 3 2022, 1:45 AM.

Details

Summary

When a phi node references a variable defined in a basic block dominated
by the the basic block containing the phi node, llvm-diff currently cannot determine
whether the variable is equivalent, and thus treats the phi node as different
and reports a difference. This leads to false positive differences as demonstrated
by the loop.ll diff, for which llvm-diff reports a diff when comparing the file
with itself.

Fix that issue by adding the concept of *equivalence assumptions*.
When encountering a pair of values which can neither be proven to be equivalent
nor to be non-equivalent, instead optimistically assume equivalence, and store
somewhere that the equivalence of the currently compared basic blocks depends
on this assumption.

Later, once all BBs have been processed, check all made assumptions and report
blocks as different whose equivalence was depending on an incorrect assumption,
or an assumption we could not prove to be correct.

In order to preserve the original diff report order, also schedule diffs
of blocks already known to be different using the same mechanism, so all block
diffs are now generated at the very end of function diffing.

In case an incorrect assumption was made, all further shown equivalences between
old and new values implictly depend on the incorrect assumption. Some of these
may in fact be not equivalent, but these are neither reverted nor reported,
because they are considered indirect diffs caused by an earlier direct diff.

See inline comments for an argument why we do not run into issues caused by circular
proof dependencies.

Diff Detail

Event Timeline

jsilvanus created this revision.Nov 3 2022, 1:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 3 2022, 1:45 AM
jsilvanus requested review of this revision.Nov 3 2022, 1:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 3 2022, 1:45 AM

Can't say I have much context here - but out of curiosity, what's your interest in this tool, @jsilvanus ? & @void is this library/tool still of interest to you? (@nickdesaulniers is this of interest to you? Looks like @void's work recently (last year, at least, which is recent relative to this code mostly not having been touched since 2010 or so... ) was related to maybe adding some live patching tool, which I guess might be kernel related in some way?)

Given how old the code is here, and it's relatively untouched, I'd like to make sure we've got invested folks working together on it.

Hi David, thanks for looking into this change!

I'm aware that there is very little active development for llvm-diff, and I chose you and @void based on the patch introducing phi node diffing.

While experimenting with llvm-diff (for context see below), I noticed this issue.
Since it seems to affect many inputs (essentially anything with a loop counter), and the fix seemed to be not too complicated, I decided to work on this.

My main interest, although I am not yet sure whether llvm-diff is the right tool for the job, is related to update_test_checks.py. I'd like to reduce diff sizes of lit tests caused by changed automatic variable names (%15 -> %16) of updated lit tests, with the goal of simplifying reviews.
Say you have an automatically generated, large lit test. A small change in the generated IR, say one added line, changes all following identifier names, and leads to a huge diff of text IR.
When updating the lit test with update_test_checks.py, these changes are propagated to the CHECK lines, since the generated FileCheck capture names (e.g. TMP15) are also incrementally numbered.
I'd like to have a mode where update_test_checks.py tries to preserve existing capture names to minimize the diff size of changed tests. The work flow would be like this:

  1. Implement LLVM change
  2. Update lit tests, preserving capture names
  3. Normalize capture names in lit tests using an ordinary update_test_checks.py call

The result of 1 + 2 would be one small commit that is the main interest of review. Step 3 would be in a different, possibly large commit that only contains the automatically generated capture renaming and would not needed to be reviewed in the same detail. Note that 3. is not absolutely necessary, but it seems to be nicer to have tests with sequentially named identifiers without any remainders of past versions of the test.

A possible implementation of the above would be to run both the old and new opt binary on the test, use llvm-diff to establish a mapping of variables, and use this correspondence to establish a mapping of FileCheck capture names.

dblaikie accepted this revision.Nov 3 2022, 2:10 PM

Can't say I've got quite the time to page in the algorithm, but it sounds like a reasonable idea & given this tool's probably relatively unused - I think maybe this is the right level of review & you should feel welcome to give it a go, see how it plays out in general/for your use case.

Thanks for the context & patch.

llvm/tools/llvm-diff/lib/DifferenceEngine.cpp
184

SmallVector?

287–289

Maybe?

291
349

Could you change this and others to use something more like this/aggregate initialization (otherwise it looks like it could be a ctor call, which makes reading a bit more difficult/need more care, perhaps)

545–546

would be nice to avoid these unrelated changes, if possible (could pre-commit them if you think they're valuable)

This revision is now accepted and ready to land.Nov 3 2022, 2:10 PM
jsilvanus updated this revision to Diff 473150.Nov 4 2022, 1:11 AM

Address review feedback.

jsilvanus updated this revision to Diff 473151.Nov 4 2022, 1:13 AM

Remove reformatting of existing code.

jsilvanus marked 5 inline comments as done.Nov 4 2022, 1:17 AM

Thanks for your review, I updated the patch to address it.

I'll wait a few days with committing so others have a chance to comment.

llvm/tools/llvm-diff/lib/DifferenceEngine.cpp
184

Thanks, fixed.

287–289

Thanks, that's better indeed.

291

Fixed.

349

Thanks, fixed.

545–546

I removed those.

This revision was automatically updated to reflect the committed changes.
jsilvanus marked 5 inline comments as done.