This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Teach CloneDetection about Qt Meta-Object Compiler
ClosedPublic

Authored by xiangzhai on Mar 24 2017, 2:20 AM.

Details

Summary

Hi LLVM developers,

CloneChecker is very useful:

  • Potential copy-paste error; did you really mean to use 'track' here? screenshot
  • Duplicate code detected screenshot

It is able to detect Copy-paste issue acts as Coverity scan!

But too much False Positives! screenshot for Qt based project, for example, k3b so I simply add a filter for Qt Meta-Object Compiler moc_XXX.cpp please review my patch, give me some advice, thanks a lot!

Regards,
Leslie Zhai

Diff Detail

Repository
rL LLVM

Event Timeline

xiangzhai created this revision.Mar 24 2017, 2:20 AM
xiangzhai retitled this revision from Teach CloneDetection about Qt Meta-Object Compiler to [analyzer] Teach CloneDetection about Qt Meta-Object Compiler.Mar 24 2017, 2:24 AM
NoQ edited edge metadata.Mar 24 2017, 3:16 AM

This checker would definitely benefit from suppression in all auto-generated files, which definitely includes MOC files.

However, there were also plans to tweak minimum clone sizes, which would also most likely suppress a lot of false positives.

Also, should we do it in CloneDetector (better performance) or in CloneChecker (cleaner framework), or maybe as a separate constraint on clone groups (hmm, sounds perfect in all senses)?

xiangzhai added a comment.EditedMar 25 2017, 12:32 AM

Hi NoQ,

Sorry for my late reply! I have to look after my (almost 2 years old) little kid in the weekend :)

Also, should we do it in CloneDetector (better performance) or in CloneChecker (cleaner framework), or maybe as a separate constraint on clone groups (hmm, sounds perfect in all senses)?

I argue that it is better to do it in CloneDetector like my patch did, others' thoughts?

To Qt, there is not only auto-generation Meta Object Compiler, but also User Interface Compiler and D-Bus XML Compiler, please point out my fault if missed any, thanks! so we need a smarter Filter to get rid of moc_XXX, ui_XXX and dbus_XXX auto-generated source files.

But my monkey patch is very rough, just filter the moc_XXX fileName, for balancing performance and low false positive, I will update my patch next Monday, so welcome to discuss :)

Regards,
Leslie Zhai

teemperor requested changes to this revision.Mar 25 2017, 3:45 AM

The constraints should be able to do the same filtering with the same performance in a cleaner way. So once D23418 is merged, we can put this code into a constraint and just prepend it to the list of constraints we use in the checker.

This revision now requires changes to proceed.Mar 25 2017, 3:45 AM

Hi Raphael,

The father of CloneDetection :) thanks for your great job!

I will try your patch, is it rebase for the SVN trunk? almost for the nightly build, I want to detect Copy-paste issues for KDE projects, such as K3B, the real and big testcase for verifying whether or not False Positive! thanks again for your great job!

Regards,
Leslie Zhai

xiangzhai updated this revision to Diff 93447.EditedMar 29 2017, 11:48 PM
xiangzhai edited edge metadata.

Hi Raphael,

I updated my patch base on your job D23418 I put the code into a constraint AutoGeneratedCloneConstraint and just prepend it to the list of constraints used in the checker. and I will tweak it with a lot of real testcases: such as KDE projects. please review my patch, thanks a lot!

Regards,
Leslie Zhai

xiangzhai updated this revision to Diff 93448.Mar 30 2017, 12:32 AM

Typo... it should be !D->isInvalidDecl().

xiangzhai updated this revision to Diff 93453.Mar 30 2017, 2:06 AM

Fix my stupid copy-paste issue...

NoQ added inline comments.Apr 5 2017, 5:37 AM
lib/Analysis/CloneDetection.cpp
382 ↗(On Diff #93453)

Hmm. Raphael: How often is D unavailable, or is invalid? I believe the checker currently only takes decls that come from checkASTCodeBody, in which case it's always safe; however there may be future plans to lift this contract.

389 ↗(On Diff #93453)

Should we break here? There's no need to remove the same index twice.

lib/StaticAnalyzer/Checkers/CloneChecker.cpp
84 ↗(On Diff #93453)

Would it be beneficial to apply this constraint first, to quickly filter out autogenerated code and save some work for other constraints?

98 ↗(On Diff #93453)

Should we add the new constraint here as well?

xiangzhai updated this revision to Diff 94314.Apr 5 2017, 7:49 PM

Hi Artem,

Thanks for your review! I have updated my patch as you suggest, please review it, thanks a lot!

Regards,
Leslie Zhai

teemperor added inline comments.Apr 6 2017, 3:23 AM
lib/Analysis/CloneDetection.cpp
374 ↗(On Diff #94314)

You could just use CloneConstraint::filterGroups() instead of doing your own loop and erasing.

382 ↗(On Diff #93453)

No such plans for invalid declarations. And if we would want to filter them, we should in the CloneDetector::analyzeCodeBody()

lib/StaticAnalyzer/Checkers/CloneChecker.cpp
97 ↗(On Diff #94314)

This is indeed unnecessary as Artem said (we already removed it above from all clone groups).

unittests/Analysis/CloneDetectionTest.cpp
76 ↗(On Diff #94314)

It's not relevant for the test, please remove.

94 ↗(On Diff #94314)

It's not relevant for the test, please remove.

NoQ added inline comments.Apr 6 2017, 3:40 AM
lib/StaticAnalyzer/Checkers/CloneChecker.cpp
97 ↗(On Diff #94314)

Aha, my bad.

xiangzhai updated this revision to Diff 94478.Apr 6 2017, 9:03 PM

Hi Raphael,

Thanks for your review!

I have updated my patch as you suggested, and make check-clang-analysis for tripple check my patch, please review it, thanks a lot!

Regards,
Leslie Zhai

NoQ added a comment.Apr 27 2017, 9:13 AM

Could you add a test, like a regular test in test/Analysis/copypaste/? Like, make a file with _moc in it, and demonstrate that the checker doesn't warn, despite having obvious clones otherwise.

lib/Analysis/CloneDetection.cpp
378 ↗(On Diff #94478)

I guess you can use SourceManager::getFilename().

This revision now requires changes to proceed.Apr 27 2017, 9:13 AM
xiangzhai updated this revision to Diff 97047.Apr 27 2017, 11:13 PM
xiangzhai added a subscriber: cfe-commits.

Hi Artem,

Please review my updated patch, I use SourceManager's getFilename and add some testcases, thanks a lot!

Regards,
Leslie Zhai

NoQ added a comment.Apr 27 2017, 11:36 PM

Thanks! Because LLVM's Illinois license is rather permissive than copyleft, we try to avoid stuff copied from GPL/LGPL software (such as Qt or K3B) in our codebase. The code doesn't seem to have been copy-pasted from a copyleft project, but I guess it's better to rename the file to avoid referencing to K3B, and i added inline comments regarding moc intro comments.

lib/Analysis/CloneDetection.cpp
377 ↗(On Diff #97047)

Maybe .startswith("moc_") would be more accurate?

In fact, you could also try to use a single LLVM regex here.

test/Analysis/copypaste/k3blib_automoc.cpp
5–7 ↗(On Diff #97047)

Other LLVM contributors are free to edit this file, and i doubt it was autogenerated; i believe these comments should be removed.

That said, there should be a comment explaining why the file has no warning. Eg.:

// Because files that have `_automoc' in their names are most likely autogenerated,
// we suppress copy-paste warnings here.

// expected-no-diagnostics

(same in the other file)

16 ↗(On Diff #97047)

My idea regarding the comment above would make this comment unnecessary. (same in the other file)

test/Analysis/copypaste/moc_k3bactivepipe.cpp
5–13 ↗(On Diff #97047)

This file wasn't in fact created by moc, does not seem to be anyhow related to k3b; i believe these comments should be removed.

xiangzhai updated this revision to Diff 97059.Apr 28 2017, 1:19 AM

Hi Artem,

Thanks for your review!

Because LLVM's Illinois license is rather permissive than copyleft, we try to avoid stuff copied from GPL/LGPL software (such as Qt or K3B) in our codebase. The code doesn't seem to have been copy-pasted from a copyleft project, but I guess it's better to rename the file to avoid referencing to K3B.

Sorry for my mistake! I have rename the testcase to moc_autogenerated.cpp and autogenerated_automoc.cpp. I am the maintainer of K3B, if you find any bug when burning ISO please report bug to me :)

Other LLVM contributors are free to edit this file, and i doubt it was autogenerated; i believe these comments should be removed.

Sorry for my fault!

/* This file is autogenerated, do not edit*/

it is autogenerated by MOC, I removed it!

And I use a single line LLVM regex instead of std::string functions, please review my patch, thanks a lot!

Regards,
Leslie Zhai

xiangzhai marked 10 inline comments as done.Apr 28 2017, 1:20 AM
v.g.vassilev edited edge metadata.Apr 28 2017, 2:03 AM

In general, I think we should support such use-cases. It seems cleaner to me if we provide some sort of callbacks where the users can specify their custom false positive filters. I've discussed this in brief with Raphael and we were planning to write this up.

xiangzhai added a comment.EditedApr 28 2017, 2:19 AM

Hi Vassil,

The first one considered about CopyPaste detection Thanks for your reply! Let's do it together and Happy International Labor Day :)

Regards,
Leslie Zhai

zaks.anna edited edge metadata.Jun 14 2017, 3:47 PM

Should this revision be "abandoned" or is the plan to iterate on it?

Dear Anna,

Long time no see, miss you :)

Should this revision be "abandoned" or is the plan to iterate on it?

I wanna the plan to iterate on it, Artem reviewed the patch, and I fixed the issue as he suggested, ran make check-clang-analysis also used it for checking real project K3b's Copy-paste issue, worked!

Hi Vassil,

... Thanks for your reply! Let's do it together and Happy International Labor Day :)

But Vassil and Raphael dis not reply me.

Could you kindly review it, if LGTU, may I commit it? thanks a lot!

Regards,
Leslie Zhai

teemperor requested changes to this revision.Jun 15 2017, 2:53 AM

Sorry for the delay, we got stuck because hard coding a certain file pattern into the clang source code doesn't seem to be a optimal solution (e.g. every user that has a different set of generated files needs to patch clang to hide his specific reports). I would prefer if we could get this into a variable that the user can change dynamically.

Would it solve your use case if allow specifying file patterns via the command line like this -analyzer-config alpha.clone.CloneChecker:IgnoredFiles=moc_*;*.pb.h;*.pb.cc? If yes, please update this PR accordingly and then I think this patch is good to go.

Thanks for the work!

This revision now requires changes to proceed.Jun 15 2017, 2:53 AM

I agree with Raphael. If you need to have more fine grained control over the translation unit (TU) I think we can have suppressing false positives via comments (@NoQ would know more).

Eg.

// File.h defining moc_* yields a lot of false positives

// alpha.clone.CloneChecker:Ignored: some_regex
...

This would affect only the TU which include File.h.

xiangzhai updated this revision to Diff 102777.EditedJun 15 2017, 8:40 PM
xiangzhai edited edge metadata.

Dear Raphael,

Thanks for your suggestion!

Would it solve your use case if allow specifying file patterns via the command line like this -analyzer-config

Fixed!

$ /home/zhaixiang/project/llvm/build/./bin/clang -cc1 -internal-isystem /home/zhaixiang/project/llvm/build/lib64/clang/5.0.0/include -nostdsysteminc -analyze -analyzer-constraints=range -std=c++11 -analyzer-checker=alpha.clone.CloneChecker -analyzer-config alpha.clone.CloneChecker:IgnoredFilesPattern="moc_|ui_|dbus_|.*_automoc" -verify /home/zhaixiang/project/llvm/tools/clang/test/Analysis/copypaste/not-autogenerated.cpp

I ran make check-clang-analysis and tested it for checking the real K3B project's Copy-paste issue to filter Meta Object Compiler, User Interface Compiler, D-Bus XML Compiler, worked!

And it is able to specify the IgnoredFilesPattern to filter auto-generated files for different frameworks, please review my patch, if LGTU, may I commit it? thanks a lot!

Regards,
Leslie Zhai

See my inline comments about technical changes, but otherwise this looks ready to land. Please update and I'll give green light ASAP. Thanks!

lib/Analysis/CloneDetection.cpp
375 ↗(On Diff #102777)

You can skip the const Decl *D = S.getContainingDecl(); and just do const SourceManager &SM = S.getASTContext().getSourceManager();

378 ↗(On Diff #102777)

Let's get the basename with the LLVM path class instead:
http://llvm.org/docs/doxygen/html/namespacellvm_1_1sys_1_1path.html#a799b002e67dcf41330fa8d6fa11823dc

E.g. moc_test\.cpp is a valid filename on Linux and then this code isn't doing the right thing.

383 ↗(On Diff #102777)

Hmm, we can't filter by file extension this way? Can we remove this extension stripping and just make people type the file extension in the filter, this feels more intuitive.

384 ↗(On Diff #102777)

Artem suggested we could move this Regex out of the loop, I think we should even make it a member of the AutoGeneratedCloneConstraint so that we only parse/generate the regex state machine once per invocation. Currently we reparse this Regex a few thousand times and performance is quite important in this Checker.

xiangzhai updated this revision to Diff 102796.Jun 16 2017, 2:59 AM

Dear Raphael,

I updated my patch as you suggested, please review it, thanks a lot!

And I noticed that you are doing Performance optimizations for the CloneChecker D34182 I will rebase my patch if you kindly commit.

Regards,
Leslie Zhai

xiangzhai marked 4 inline comments as done.Jun 16 2017, 3:00 AM
teemperor accepted this revision.Jun 16 2017, 3:18 AM

Please check the last inline comment and then feel free to commit it with the suggested fix. And I wanted to wait for review on the other performance patches, so you can push this now.

Thanks for the work!

lib/Analysis/CloneDetection.cpp
375 ↗(On Diff #102796)

Sorry, I what I wanted to suggest is: make this a member variable of the AutoGeneratedCloneConstraint class. Moving this out of the loop actually doesn't change anything for the checker (because the first constraint get's a list of single-sequence groups, so we still call this function N times for N functions). So something like llvm::Regex AutoGeneratedCloneConstraint::IgnoredFilesRegex :)

This revision is now accepted and ready to land.Jun 16 2017, 3:18 AM

Dear Raphael,

Sorry I am not available, I am looking after my little kid, see you next week, rebase perhaps.

Thanks,
Leslie Zhai

xiangzhai updated this revision to Diff 102840.Jun 16 2017, 9:26 AM

Dear Raphael,

I updated my patch as you suggested, may I commit it? thanks!

Regards,
Leslie Zhai

xiangzhai marked an inline comment as done.Jun 16 2017, 9:26 AM
teemperor accepted this revision.Jun 18 2017, 2:21 PM

LGTM, thanks for the patch!

This revision was automatically updated to reflect the committed changes.
v.g.vassilev added inline comments.Jun 19 2017, 1:03 AM
cfe/trunk/include/clang/Analysis/CloneDetection.h
324

Shouldn't the name be more generic. What this essentially does is to filter out false positives according to a regex...

xiangzhai added inline comments.Jun 19 2017, 1:18 AM
cfe/trunk/include/clang/Analysis/CloneDetection.h
324

At the very beginning, it is by regex match the filename, it is still very rough! but in future it is able to filter by looking for QT_BEGIN_MOC_NAMESPACE macro or qt_meta_ prefix function in the ASTContext? and there might be other auto-generated mechanism for different framework, such as Gtk and sort of open source GUI libraries. so perhaps AutoGeneratedCloneConstraint is better name, I am not good at naming, it is difficult to name my little kid :)

teemperor added inline comments.Jun 19 2017, 3:14 AM
cfe/trunk/include/clang/Analysis/CloneDetection.h
324

Yeah, maybe we should have named this FilenamePatternConstraint or something like that...