This is an archive of the discontinued LLVM Phabricator instance.

[CloneDetector] No longer reporting clones that don't have a common referenced variable pattern.
ClosedPublic

Authored by teemperor on Jul 29 2016, 5:12 PM.

Details

Summary

One of the current false-positives the CloneDetector produces is that the statements a < b ? b and b < a ? b are considered clones of each other, even though they represent two different kinds of algorithms. The reason for this is that the StmtDataCollector ignores variable names when collecting data, so that two clones that only differ in the names of the referenced variables are considered clones. The pattern in which they are referenced however is not taken into aspect so far and causes the mentioned false-positive.

This patch introduces a filter that removes clones which don't have the same variable pattern which prevents these kinds of false-positives.

It should be noted that this is intentionally done after the clones are grouped together by their collected data vector because another planned feature will find clones that explicitly violate the variable patterns of each other.

Diff Detail

Repository
rL LLVM

Event Timeline

teemperor updated this revision to Diff 66138.Jul 29 2016, 5:12 PM
teemperor retitled this revision from to [CloneDetector] No longer reporting clones that don't have a common referenced variable pattern..
teemperor updated this object.
teemperor added reviewers: v.g.vassilev, NoQ, zaks.anna.
teemperor added a subscriber: cfe-commits.
vsk added a subscriber: vsk.Jul 29 2016, 5:55 PM

This is neat!

I may be missing something, but could you explain why 'false-positives.cpp' was deleted? I'd expect the test to be incorporated into 'functions.cpp' with the 'expected-*' directives removed.

lib/Analysis/CloneDetection.cpp
92 ↗(On Diff #66138)

Extra 'std::' here.

test/Analysis/copypaste/var-patterns.cpp
7 ↗(On Diff #66138)

This looks duplicated from 'functions.cpp'. Perhaps you could modify that test case directly?

teemperor updated this revision to Diff 66262.Jul 31 2016, 6:33 PM
  • Fixed a few typos in the comments.
  • Removed unnecessary std::.
  • Added the new tests to functions.cpp instead of their own file.
teemperor marked 2 inline comments as done.Jul 31 2016, 6:34 PM

I've deleted false-positives.cpp because all false-positives in there are now resolved and the test did no longer serve a purpose. We could leave the test file in there, but I think new false-positives either belong to an existing test category or deserve their own named test file (e.g. macros.cpp would contain tests for macros and related false-positives with a FIXME remark). Opinions?

NoQ edited edge metadata.Aug 1 2016, 2:45 AM

Uhm. I've a feeling that now i realize how it should have been written from the start. That's a pretty bad feeling to have.

I wish we had a series of passes. Before the first pass, all statements are considered equivalent. After the first pass, they're split into smaller clone groups. The second pass would add further distinction between items in each clone group, producing more clone groups. Etc. Every subsequent pass would compute more performance-intensive properties of statement sequences (the first pass being very simple, eg. only statement kinds). Once all passes end, only true positives remain. Probably wrap up each pass into a class that conforms to some common interface, and we're done with a nicely structured and easy-to-extend copy-paste error detection framework. We could probably even allow the user to turn separate passes on and off through some kind of dynamic registry.

Never mind though, will have a look at what we have :)

NoQ added a comment.Aug 1 2016, 3:46 AM

Looks good, minor comments :)

lib/Analysis/CloneDetection.cpp
99 ↗(On Diff #66262)

I've a feeling this is essentially a map<const VarDecl *, size_t>, because that harmonizes with our access patterns (lookup number by variable, insert the new value Variables.size() if not found). So i think a map would express the intention more clearly.

168 ↗(On Diff #66262)

Because your code uses a lot of fancy C++11, i'd probably suggest to make it even prettier with std::all_of().

NoQ accepted this revision.Aug 3 2016, 8:05 AM
NoQ edited edge metadata.
NoQ added inline comments.
lib/Analysis/CloneDetection.cpp
99 ↗(On Diff #66262)

Discussed that we're going to have lookups in both directions in the next patch.

168 ↗(On Diff #66262)

Uhm, never mind, iterating two arrays here.

This revision is now accepted and ready to land.Aug 3 2016, 8:05 AM
v.g.vassilev accepted this revision.Aug 3 2016, 8:11 AM
v.g.vassilev edited edge metadata.

LGTM. My comments were clarified in a Skype chat.

NoQ requested changes to this revision.Aug 4 2016, 12:09 PM
NoQ edited edge metadata.

Whoops, because of a merge conflict while applying the patch, i noticed a few problems in the tests, they seem obvious, but could you have a quick look?

test/Analysis/copypaste/functions.cpp
21 ↗(On Diff #66262)

Not sure, do "different clones" mean "different code, should not be reported" or "different clones of the same code, should be reported"?

30 ↗(On Diff #66262)

This code is exactly identical, did you mean 'x > y' in any of those places?

This revision now requires changes to proceed.Aug 4 2016, 12:09 PM
teemperor updated this revision to Diff 66839.Aug 4 2016, 12:32 PM
teemperor edited edge metadata.
teemperor marked 3 inline comments as done.
  • Patch should no longer cause merge conflicts.
  • Improved comments and tests in functions.cpp.
teemperor marked 3 inline comments as done.Aug 4 2016, 12:34 PM

I think the updated patch should no longer have the problems mentioned in the inline comments :)

This revision was automatically updated to reflect the committed changes.
cfe/trunk/lib/Analysis/CloneDetection.cpp