This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][NFC] Cache the result of getLocationType in TypedValueRegion
Needs ReviewPublic

Authored by ASDenysPetrov on Aug 11 2022, 11:41 AM.

Details

Summary

Fix FIXME. Produce QualType once in getLocationType at the first call. Then cache the result and return it for the next calls.

Diff Detail

Event Timeline

ASDenysPetrov created this revision.Aug 11 2022, 11:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 11 2022, 11:41 AM
ASDenysPetrov requested review of this revision.Aug 11 2022, 11:41 AM
ASDenysPetrov retitled this revision from [analyzer]{NFC} Cache the result of getLocationType in TypedValueRegion to [analyzer][NFC] Cache the result of getLocationType in TypedValueRegion.Aug 11 2022, 11:40 PM

Please measure the performance implications/gains of this patch to justify this change.
Preferably test it on multiple kinds of projects ranging from old-school C and even some modern C++ projects.
Let me know your findings and if you need some help setting it up.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
549–550

I would appreciate it if you assert that the resulting type cannot be null, which would imply that if CachedLocationType.isNull() means that we haven't cached it yet.

ASDenysPetrov added a comment.EditedAug 15 2022, 8:39 AM

@steakhal
Thank you for the review.
Could you please do testing for me, since I'm on Windows and have no prepared testing environment as I know you do.
I'll add an assertion.

Add assertion accroding to the remark.

This comment was removed by ASDenysPetrov.

@steakhal
Thank you for the review.
Could you please do testing for me, since I'm on Windows and have no prepared testing environment as I know you do.

ATM it seems like I don't have the bandwidth. What I meant is choosing the right projects to test etc.
Maybe @balazske could give some helping hands. AFAIK it shouldn't be hard for him to enqueue a benchmark.
What do you say @balazske, could you please have a differential measurement in release configuration on the test set demonstrating the performance gain of this patch compared to the baseline?

NoQ added a comment.Aug 15 2022, 4:48 PM

Hmm at a glance I disagree with the FIXME, this doesn't look like a valuable optimization. This operation is typically constant-time anyway so it's probably not worth the memory impact and the added complexity.

The easiest way to roughly estimate impact is to take sqlite3.c (which is a famous large-ish project that gets compiled as a single huge file to ensure that optimizations have access to the entire program) and measure time and memory with UNIX time. Not very representative but very easy to execute. If it shows promising results I'm open to further investigation in this area.

I also recommend consulting a performance profiler before implementing optimizations, to avoid premature optimizations. IIUC the picture hasn't changed much in the recent years:

  • From low-level perspective the majority of time is spent accessing and manipulating ImmutableMaps and other immutable data structures in the program state;
  • From high-level perspective almost half the time is spent in mark-and-sweep garbage collection in ExprEngine::removeDead(). Which can't really be cut as running removeDead() less often only increases the analysis time as it makes the maps larger and therefore slower to access. And it's also hard to optimize because the procedure is very complicated and fragile and very few people actually understand how it works.

So I think the most valuable optimizations are low-level optimizations to ImmutableMap. There were a few suggestions on the mailing list to use something more modern than the AVL trees under the hood but I don't think authors found much success with those.

xazax.hun added a comment.EditedAug 15 2022, 4:55 PM

So I think the most valuable optimizations are low-level optimizations to ImmutableMap. There were a few suggestions on the mailing list to use something more modern than the AVL trees under the hood but I don't think authors found much success with those.

I am not sure how optimized our ImmutableMap is, I would not be surprised if there were some small gains here and there. I don't remember the entire discussion, but I think some of the mentioned alternative data structures are:

  • Patricia Trees, see C. Okasaki, A. Gill. Fast Mergeable Integer Maps. In Workshop on ML (1998)
  • Finger Trees, see https://en.wikipedia.org/wiki/Finger_tree
  • Abandon the concept of trees and implement an immutable HashMap

Just wanted to dump all of this, in case there are some takers :)

So I think the most valuable optimizations are low-level optimizations to ImmutableMap. There were a few suggestions on the mailing list to use something more modern than the AVL trees under the hood but I don't think authors found much success with those.

Back then I was experimenting by replacing some of our ImmutableMap instances with an alternative implementation. I've started with Environment::ExprBindings because profiling showed that as the most frequently used. I've chosen immer::map as an alternative immutable map implementation. It seemed to be very promising because of

Results: I could make the LIT tests pass, except two checks. The performance was not promising though. The run-time was similar that we had with the AVL trees, however, the memory consumption was 5-10x higher! Consequently, I've abandoned the prototype. But, I think an efficient batch manipulation implementation with the existing AVL based ImmutableMap might be worth to experiment with further.

  • From high-level perspective almost half the time is spent in mark-and-sweep garbage collection in ExprEngine::removeDead(). Which can't really be cut as running removeDead() less often only increases the analysis time as it makes the maps larger and therefore slower to access. And it's also hard to optimize because the procedure is very complicated and fragile and very few people actually understand how it works.

Yes, this is also aligned with my measurements. There are analyzer options to manipulate when the removeDead is called, before every Stmt, or before every basic block, or never. Actually, the last option is meaningless, because renders both the engine and many checkers faulty. The before every basic block option might still be worth to experiment with, but I could not find reasonable arguments to change to that yet (I cannot recall, but some lit tests were failing miserably with that option too).