
[analyzer][CrossTU] Lower CTUImportThreshold default value
ClosedPublic

Authored by gamesh411 on Jun 25 2020, 8:08 AM.

Details

Summary

The default value of 100 makes the analysis slow: projects of considerable
size can take longer to analyze than is practical. The new default of 8 is
based on analyzing LLVM itself; with the old default of 100, the analysis
time was over an order of magnitude slower. Thresholding the loading of
ASTUnits is to be extended in the future with a more fine-tunable solution
that accommodates the specifics of the analyzed project.

Diff Detail

Event Timeline

gamesh411 created this revision.Jun 25 2020, 8:08 AM
gamesh411 updated this revision to Diff 273372.Jun 25 2020, 8:11 AM

update test value

Szelethus accepted this revision.Jun 25 2020, 1:55 PM

Based on discussions I heard between @dkrupp, @martong and you, this seems appropriate.

This revision is now accepted and ready to land.Jun 25 2020, 1:55 PM

The analyzer inlines small functions within a TU regardless of the thresholds. I think it would be sensible to do the same across TUs in the case we don't do this already.

> The analyzer inlines small functions within a TU regardless of the thresholds. I think it would be sensible to do the same across TUs in the case we don't do this already.

Does that mean performing the CTU definition lookup only if the TU to be imported (where the function comes from) is small? Otherwise, no matter how small the function is, it can result in importing a large amount of code. Determining parameters (like "smallness") of the TU is probably not simple.

> The analyzer inlines small functions within a TU regardless of the thresholds. I think it would be sensible to do the same across TUs in the case we don't do this already.

> Does that mean performing the CTU definition lookup only if the TU to be imported (where the function comes from) is small? Otherwise, no matter how small the function is, it can result in importing a large amount of code. Determining parameters (like "smallness") of the TU is probably not simple.

Measuring the smallness of a function is currently not trivial if the function is in another TU. But in principle, the process that creates the CTU index has access to the number of declarations in a function, which could be used as a metric for complexity.

Another concern is memory usage. I am currently working on a way to approximate the memory usage of declarations by measuring the total memory usage and the number of declarations inside a TU. From that I can derive average memory-per-declaration values for the projects I test. I could measure multiple projects per language and build statistics that way, but I am surely not alone in feeling that there should be a better metric for memory usage than this roundabout method. All would be well if there were a way to get sizeof(ASTUnit) at runtime, but I am not aware of any.
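The roundabout estimate described above can be sketched in a few lines. This is only an illustration of the arithmetic; the TUStats record and all sample numbers are hypothetical, not measurements from any real project:

```python
from dataclasses import dataclass

@dataclass
class TUStats:
    total_bytes: int   # measured peak AST memory for the whole TU
    num_decls: int     # number of declarations parsed in the TU

def avg_bytes_per_decl(samples):
    """Average memory cost of one declaration across the measured TUs."""
    total = sum(s.total_bytes for s in samples)
    decls = sum(s.num_decls for s in samples)
    return total / decls

def estimate_tu_bytes(num_decls, per_decl):
    """Predict the memory footprint of loading a TU with num_decls decls."""
    return num_decls * per_decl

# Hypothetical measurements from two TUs:
samples = [TUStats(total_bytes=40_000_000, num_decls=10_000),
           TUStats(total_bytes=60_000_000, num_decls=15_000)]
per_decl = avg_bytes_per_decl(samples)          # 4000.0 bytes per declaration
estimate = estimate_tu_bytes(25_000, per_decl)  # predicted footprint of a new TU
```

The obvious weakness, as the comment above notes, is that per-declaration cost varies wildly between projects and languages, so the average is only as good as the sample set.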

martong accepted this revision.Jun 26 2020, 1:51 AM
martong added inline comments.
clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
324

typo: burder -> burden

> Does that mean performing the CTU definition lookup only if the TU to be imported (where the function comes from) is small? Otherwise, no matter how small the function is, it can result in importing a large amount of code. Determining parameters (like "smallness") of the TU is probably not simple.

> Measuring the smallness of a function is currently not trivial if the function is in another TU. But in principle, the process that creates the CTU index has access to the number of declarations in a function, which could be used as a metric for complexity.

The number of declarations is not a good proxy for complexity.
We are talking about three computations here:

  1. Loading the TU from disk (which might involve parsing, in case the analysis is on-demand)
  2. Merging the AST sub"trees"
  3. Actual inline analysis

Steps 1-2 are cached, and the complexity of step 2 is largely determined by the number of dependencies of a declaration. I suspect that the number of declarations in a function is not a good estimate of the number of unmerged dependencies (classes and so on).

For 3, we have the CFG node size as a complexity estimate.

The question is how the processing time is split between all these phases.
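The interplay between the import threshold and the caching of steps 1-2 can be illustrated with a toy model. This is not the actual CrossTranslationUnit implementation; the class and method names below are invented for the sketch, and the "ASTUnit" is just a placeholder string:

```python
class CTUContext:
    """Toy model: load at most `import_threshold` TUs, cache what is loaded."""

    def __init__(self, import_threshold=8):
        self.import_threshold = import_threshold
        self.loaded_tus = {}  # cache for steps 1-2: path -> loaded "ASTUnit"

    def get_cross_tu_definition(self, tu_path):
        """Return the loaded TU, or None once the import budget is spent."""
        if tu_path in self.loaded_tus:
            return self.loaded_tus[tu_path]          # cached: steps 1-2 are free
        if len(self.loaded_tus) >= self.import_threshold:
            return None                              # budget exhausted: no CTU inlining
        self.loaded_tus[tu_path] = f"ASTUnit({tu_path})"  # expensive load + merge
        return self.loaded_tus[tu_path]

ctu = CTUContext(import_threshold=2)
ctu.get_cross_tu_definition("a.cpp")   # loaded (1st import)
ctu.get_cross_tu_definition("b.cpp")   # loaded (2nd import)
ctu.get_cross_tu_definition("a.cpp")   # served from cache, no new load
ctu.get_cross_tu_definition("c.cpp")   # returns None: budget of 2 exhausted
```

In this model only step 3, the inlined analysis itself, is repeated for cached TUs, which is why how the processing time splits between the phases matters for choosing the threshold.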

> Another concern is memory usage. I am currently working on a way to approximate the memory usage of declarations by measuring the total memory usage and the number of declarations inside a TU. From that I can derive average memory-per-declaration values for the projects I test. I could measure multiple projects per language and build statistics that way, but I am surely not alone in feeling that there should be a better metric for memory usage than this roundabout method. All would be well if there were a way to get sizeof(ASTUnit) at runtime, but I am not aware of any.

Yeah, loading a big TU into memory just for a single small function might not be the best thing to do. However, when we dramatically reduce the CTU threshold, the analysis becomes more unstable in the sense that adding a new function call can change the order in which we analyze functions. That changes the order of CTU imports, which in turn changes the set of TUs that get imported. This might not be a problem, as the analyzer can already exhibit such behavior: a small code change can make unrelated diagnostics appear or disappear. But a small CTU threshold will likely make this phenomenon worse.
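The instability described above can be shown with a small simulation: under a tight import budget, the order in which functions are analyzed determines which TUs get imported. The function names, TU names, and analysis orders below are all invented for the illustration:

```python
def imported_tus(analysis_order, fn_to_tu, threshold):
    """Simulate which TUs get loaded for a given function analysis order."""
    loaded = []
    for fn in analysis_order:
        tu = fn_to_tu[fn]
        if tu in loaded:
            continue              # already imported, cache hit
        if len(loaded) < threshold:
            loaded.append(tu)     # import budget still available
    return loaded

# Three external functions living in three different TUs:
fn_to_tu = {"f": "a.cpp", "g": "b.cpp", "h": "c.cpp"}

# Adding a call somewhere can reorder analysis; with a budget of 2
# the resulting import set differs:
imported_tus(["f", "g", "h"], fn_to_tu, 2)  # ['a.cpp', 'b.cpp']
imported_tus(["h", "f", "g"], fn_to_tu, 2)  # ['c.cpp', 'a.cpp']
```

With a large threshold both orders would converge on the same import set; the smaller the budget, the more the set depends on analysis order, which is exactly the source of the diagnostic churn mentioned above.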

Just to be clear, I am on board with this patch; I was merely brainstorming about future improvements :)

This revision was automatically updated to reflect the committed changes.