This is an archive of the discontinued LLVM Phabricator instance.

[clang][dataflow] Exclude protobuf types from modeling in the environment.
AbandonedPublic

Authored by ymandel on Apr 4 2022, 7:22 AM.

Details

Summary

Google's protobufs are often quite large and their internal state is never worth
modeling in the environment. Exclude them during value construction.

Diff Detail

Event Timeline

ymandel created this revision.Apr 4 2022, 7:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 4 2022, 7:22 AM
ymandel requested review of this revision.Apr 4 2022, 7:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 4 2022, 7:22 AM
xazax.hun added inline comments.Apr 4 2022, 3:02 PM
clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
635

Not sure how often is this invoked but we could reduce the number of string comparisons by caching the identifier ptr and do a pointer comparison.

ymandel marked an inline comment as done.Apr 5 2022, 5:57 AM
ymandel added inline comments.
clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
635

Good question. It means an extra comparison for each type until the pointer is cached (to check if the cache is set) and then, afterwards, 2 comparisons vs ~10 for the common case where the class name is doesn't match. In the matching case, though, it is clearly saving much more.

For proto-heavy code, it seems a win, and a loss otherwise. But, the question is where to put the cache. It seems to me best to move this to be a method on DataflowAnalysisContext (since it is a global, not local env, property) and make the cached pointer a private member of DAC.

Thoughts?

xazax.hun added inline comments.Apr 5 2022, 9:01 AM
clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
635

I think it might be nice to have a context that is scoped to the translation unit rather than a function. We might have other stuff that we want to cache here.

An example how the static analyzer is dealing with this: https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp#L960

In case we do not want to be lazy and willing to populate the cache eagerly at the beginning of the analysis, we would not pay for the extra checks to see if the cache is populated.

xazax.hun added inline comments.Apr 5 2022, 9:03 AM
clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
635

I'm also fine with not doing caching for now and having a note to consider this in the future if it becomes measurable in the profiles. It is probably fine for now when we only have a couple of short strings to compare. But I guess the number of special cases we handle this way will only increase in the future.

ymandel updated this revision to Diff 426388.May 2 2022, 6:02 AM
ymandel marked an inline comment as done.

added FIXME

ymandel marked 2 inline comments as done.May 6 2022, 6:32 AM
ymandel added inline comments.
clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp
635

I'm also fine with not doing caching for now and having a note to consider this in the future if it becomes measurable in the profiles. It is probably fine for now when we only have a couple of short strings to compare. But I guess the number of special cases we handle this way will only increase in the future.

(sorry for the very long delay in responding). I think holding off on the cache may be best for now, pending performance analysis. I'm hesitant to put too much effort in to optimizing this, because ultimately we want to move to a lazy-initialization model which would obviate the need for this kind of optimization, because we would only model the fields that are used, making the size of the underlying struct irrelevant.

I've added a FIXME to this effect.

ymandel marked an inline comment as done.Jun 13 2022, 6:15 AM

Ping... Gabor, Stanislav: what are your thoughts on this patch? I think we're in a place where I could do performance comparisons before/after if you think that's justified.

Alternatively, I have a relatively simple proposal for lazy initialization that I expect I will implement in the next couple of months. If you're sufficiently concerned with this patch, waiting for that patch is an option as well.

xazax.hun accepted this revision.Jun 13 2022, 8:35 AM

Since there are many moving pieces and the whole framework is experimental without many users, I'm fine with landing this as is as long we are tracking the improvement opportunities (tickets or fixemes).

This revision is now accepted and ready to land.Jun 13 2022, 8:35 AM
sgatev accepted this revision.Jun 13 2022, 9:37 AM
ymandel updated this revision to Diff 484830.Dec 22 2022, 7:10 AM

Reviving this patch, addressing some of the earlier concerns. However, I may have a better solution which makes this patch irrelevant. So, still WIP.

ymandel abandoned this revision.Jan 6 2023, 6:52 AM

Abandoning in favor of https://reviews.llvm.org/D140694.