Fix the FIXME: produce the QualType in getLocationType() only on the first call, then cache the result and return it on subsequent calls.
Please measure the performance implications/gains of this patch to justify this change.
Preferably test it on multiple kinds of projects, ranging from old-school C to modern C++.
Let me know your findings and if you need some help setting it up.
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h, lines 549–550:
> I would appreciate it if you added an assertion that the resulting type cannot be null, which would imply that CachedLocationType.isNull() means we haven't cached it yet.
@steakhal
Thank you for the review.
Could you please run the testing for me? I'm on Windows and don't have a prepared testing environment, while I know you do.
I'll add an assertion.
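For illustration, the caching pattern under discussion (including the suggested assertion) might look roughly like the following sketch. The class and the QualType stand-in are simplified placeholders modeled on the review, not the actual MemRegion code:

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for clang::QualType: default-constructed means "null".
struct QualType {
  std::string Name;
  bool isNull() const { return Name.empty(); }
};

// Simplified stand-in for a region whose getLocationType() is
// expensive enough that we want to compute it only once.
class TypedRegion {
  mutable QualType CachedLocationType; // null = "not cached yet"

  QualType computeLocationType() const {
    // Imagine an expensive computation here.
    return QualType{"int *"};
  }

public:
  QualType getLocationType() const {
    if (CachedLocationType.isNull()) {
      CachedLocationType = computeLocationType();
      // Per the review suggestion: the computed type must never be null,
      // so isNull() can safely double as the "not yet cached" flag.
      assert(!CachedLocationType.isNull() && "cached type must not be null");
    }
    return CachedLocationType;
  }
};
```

The assertion is what makes the scheme sound: without it, a legitimately null result would be silently recomputed on every call.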
ATM it seems like I don't have the bandwidth. What I meant by helping was choosing the right projects to test, etc.
Maybe @balazske could give some helping hands. AFAIK it shouldn't be hard for him to enqueue a benchmark.
What do you say @balazske, could you please run a differential measurement in a release configuration on the test set, demonstrating the performance gain of this patch over the baseline?
Hmm, at a glance I disagree with the FIXME; this doesn't look like a valuable optimization. This operation is typically constant-time anyway, so it's probably not worth the memory impact and the added complexity.
The easiest way to roughly estimate impact is to take sqlite3.c (which is a famous large-ish project that gets compiled as a single huge file to ensure that optimizations have access to the entire program) and measure time and memory with UNIX time. Not very representative but very easy to execute. If it shows promising results I'm open to further investigation in this area.
I also recommend consulting a performance profiler before implementing optimizations, to avoid premature optimizations. IIUC the picture hasn't changed much in the recent years:
- From a low-level perspective, the majority of the time is spent accessing and manipulating ImmutableMaps and other immutable data structures in the program state;
- From a high-level perspective, almost half the time is spent in mark-and-sweep garbage collection in ExprEngine::removeDead(). It can't really be cut, because running removeDead() less often only increases analysis time: it makes the maps larger and therefore slower to access. It is also hard to optimize, because the procedure is very complicated and fragile, and very few people actually understand how it works.
So I think the most valuable optimizations are low-level optimizations to ImmutableMap. There were a few suggestions on the mailing list to use something more modern than the AVL trees under the hood but I don't think authors found much success with those.
I am not sure how optimized our ImmutableMap is; I would not be surprised if there were some small gains here and there. I don't remember the entire discussion, but I think some of the alternative data structures mentioned were:
- Patricia Trees, see C. Okasaki, A. Gill. Fast Mergeable Integer Maps. In Workshop on ML (1998)
- Finger Trees, see https://en.wikipedia.org/wiki/Finger_tree
- Abandon the concept of trees and implement an immutable HashMap
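To make the structural-sharing idea behind all of these concrete, here is a toy path-copying persistent binary search tree. It does no balancing, unlike the AVL trees underlying LLVM's ImmutableMap, so it is a sketch of the concept only, not a candidate replacement:

```cpp
#include <cassert>
#include <memory>

// Toy persistent (immutable) binary search tree. Every "mutation"
// returns a new version; unmodified subtrees are shared between the
// old and new versions, which is what keeps persistent maps cheap.
struct Node {
  int Key, Value;
  std::shared_ptr<const Node> Left, Right;
};
using Tree = std::shared_ptr<const Node>;

// Insertion copies only the root-to-insertion-point path; everything
// else is shared with the previous version.
Tree insert(const Tree &T, int Key, int Value) {
  if (!T)
    return std::make_shared<const Node>(Node{Key, Value, nullptr, nullptr});
  if (Key < T->Key)
    return std::make_shared<const Node>(
        Node{T->Key, T->Value, insert(T->Left, Key, Value), T->Right});
  if (Key > T->Key)
    return std::make_shared<const Node>(
        Node{T->Key, T->Value, T->Left, insert(T->Right, Key, Value)});
  return std::make_shared<const Node>(Node{Key, Value, T->Left, T->Right});
}

const Node *find(const Tree &T, int Key) {
  if (!T) return nullptr;
  if (Key < T->Key) return find(T->Left, Key);
  if (Key > T->Key) return find(T->Right, Key);
  return T.get();
}
```

The listed alternatives (Patricia tries, finger trees, immutable hash maps) differ mainly in how they bound the length of the copied path and the cost of rebalancing, not in this basic copy-on-write scheme.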
Just wanted to dump all of this, in case there are some takers :)
> So I think the most valuable optimizations are low-level optimizations to ImmutableMap. There were a few suggestions on the mailing list to use something more modern than the AVL trees under the hood but I don't think authors found much success with those.
Back then I was experimenting with replacing some of our ImmutableMap instances with an alternative implementation. I started with Environment::ExprBindings, because profiling showed it to be the most frequently used. I chose immer::map as the alternative immutable map implementation, which seemed very promising.
Results: I could make the LIT tests pass, except for two checks. The performance was not promising, though: the run-time was similar to what we had with the AVL trees, but memory consumption was 5-10x higher! Consequently, I abandoned the prototype. Still, I think an efficient batch-manipulation implementation on top of the existing AVL-based ImmutableMap might be worth experimenting with further.
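One way the batch-manipulation idea could work, sketched generically rather than against the actual ImmutableMap API: instead of N independent persistent insertions (each copying and rebalancing a root-to-leaf path), collect the pending key/value pairs, sort them once, and build a balanced tree in a single O(N) pass:

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Generic sketch of batch construction for an immutable tree map.
// Names and structure are illustrative, not LLVM's ImmutableMap.
struct Node {
  int Key, Value;
  std::shared_ptr<const Node> Left, Right;
};
using Tree = std::shared_ptr<const Node>;

// Build a perfectly balanced tree from a sorted range in one pass,
// avoiding per-insert path copying and rebalancing entirely.
Tree buildBalanced(const std::vector<std::pair<int, int>> &Sorted,
                   size_t Lo, size_t Hi) {
  if (Lo >= Hi)
    return nullptr;
  size_t Mid = Lo + (Hi - Lo) / 2;
  return std::make_shared<const Node>(
      Node{Sorted[Mid].first, Sorted[Mid].second,
           buildBalanced(Sorted, Lo, Mid),
           buildBalanced(Sorted, Mid + 1, Hi)});
}

Tree fromBatch(std::vector<std::pair<int, int>> Pending) {
  std::sort(Pending.begin(), Pending.end());
  return buildBalanced(Pending, 0, Pending.size());
}
```

Whether this pays off in the analyzer depends on how often bindings really arrive in batches; the sketch only shows the shape of the technique.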
> From high-level perspective almost half the time is spent in mark-and-sweep garbage collection in ExprEngine::removeDead(). Which can't really be cut as running removeDead() less often only increases the analysis time as it makes the maps larger and therefore slower to access. And it's also hard to optimize because the procedure is very complicated and fragile and very few people actually understand how it works.
Yes, this is also aligned with my measurements. There are analyzer options controlling when removeDead is called: before every Stmt, before every basic block, or never. The last option is meaningless in practice, because it renders both the engine and many checkers faulty. The before-every-basic-block option might still be worth experimenting with, but I could not find convincing arguments for switching to it yet (I cannot recall exactly, but some lit tests were failing miserably with that option too).