This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Performance optimizations for the CloneChecker
ClosedPublic

Authored by teemperor on Jun 13 2017, 3:50 PM.

Details

Summary

This patch aims at optimizing the CloneChecker for larger programs. Before this patch we took around 102 seconds to analyze sqlite3 with a complexity value of 50. After this patch we now take 2.1 seconds to analyze sqlite3.

The biggest performance optimization is that we now put the constraint for group size before the constraint for the complexity. The group size constraint is much faster in comparison to the complexity constraint as it only does a simple integer comparison. The complexity constraint on the other hand actually traverses each Stmt and even checks the macro stack, so it is obviously not able to handle larger amounts of incoming clones. The new order filters out all the single-clone groups that the type II constraint generates in a faster way before passing the fewer remaining clones to the complexity constraint. This reduced runtime by around 95%.

The other change is that we also delay the verification part of the type II clones back in the chain of constraints. This required to split up the constraint into two parts - a verification and a hash constraint (which is also making it more similar to the original design of the clone detection algorithm). The reasoning for this is the same as before: The verification constraint has to traverse many statements and shouldn't be at the start of the constraint chain. However, as the type II hashing has to be the first step in our algorithm, we have no other choice but split this constrain into two different ones. Now our group size and complexity constrains filter out a chunk of the clones before they reach the slow verification step, which reduces the runtime by around 8%.

I also kept the full type II constraint around - that now just calls it's two sub-constraints - in case someone doesn't care about the performance benefits of doing this.

Diff Detail

Repository
rL LLVM

Event Timeline

teemperor created this revision.Jun 13 2017, 3:50 PM
v.g.vassilev added inline comments.Jun 14 2017, 1:22 AM
include/clang/Analysis/CloneDetection.h
263 ↗(On Diff #102442)

Could we typedef std::vector<CloneDetector::CloneGroup> into CloneDetector::CloneGroups?

teemperor added inline comments.Jun 14 2017, 1:36 AM
include/clang/Analysis/CloneDetection.h
263 ↗(On Diff #102442)

Yes, I'll do this in another patch for the whole CloneDetector code base.

teemperor marked an inline comment as done.
  • made saveHash static.
NoQ accepted this revision.Jul 2 2017, 12:43 AM

Totally makes sense :)

include/clang/Analysis/CloneDetection.h
258–260 ↗(On Diff #102700)

Since this constraint is a trivial composition of the other two now, is it useful at all as a separate class? Maybe just remove it?

266–273 ↗(On Diff #102700)

As a personal preference, i'd probably like to see a more straightforward answer to "what exactly does this do?", rather than "what is this part of?" and "how fast this is?".

Eg., "RecursiveCloneTypeIIHashConstraint computes a hash of each statement sequence; sequences with different hash values are moved into separate clone groups. Collisions are possible, and this constraint does nothing to address this them. Add the slower RecursiveCloneTypeIIVerifyConstraint later in the constraint chain, not necessarily immediately, to eliminate hash collisions through a more detailed analysis."

This revision is now accepted and ready to land.Jul 2 2017, 12:43 AM
  • Updated according to Artem's comments.

(Fixed diff)

This revision was automatically updated to reflect the committed changes.