If SymbolMetadata is referenced but its region is dead it is removed from GDM entries making referenced information inconsistent. This patch resolves this issue.
This also resolves some FIXMEs in CStringChecker tests.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
The intent here was that a metadata symbol represents metadata about a region. That means if the region is dead, and the symbol isn't directly referenceable, we won't be able to recreate it. The advantage of && was that checkers didn't have to track when regions were no longer live.
Of course, the example shows that a symbol doesn't have to be directly referenceable to be used, so it didn't exactly pan out. But making this change without a corresponding change to every checker using metadata symbols (in-tree, just CStringChecker) means that those checkers will now keep the symbols alive forever. (And if we wanted to do that, we wouldn't have to use markInUse; just markLive.)
I think this needs a little more thought.
Thank you for you reply!
This version contains more radical solution. Also, I took an another look at the CStringChecker and its way of handling checkDeadSymbols.
lib/StaticAnalyzer/Core/SymbolManager.cpp | ||
---|---|---|
457 | Maybe we should just return false here? |
So we just throw out the whole notion of "metadata in use"? That can work. There are two things I'd be concerned about changing here:
- What happens when you take a string's length, test it against something, throw it away, then remeasure the string? In this case we'd be okay because CStringChecker still hangs on to the symbol in its map until the region is invalidated.
- Will we keep more symbols alive than we did previously, increasing the analyzer's memory use? I think the answer is "yes", but not by very much: the live/dead checking is two-pass, so we'll mark a metadata symbol live in checkLiveSymbols but then find out we could have dropped it in checkDeadSymbols when the region is found to be dead. It'll get cleaned up the next time around, but I think that sort of conditional liveness is what the old mechanism was trying to accomplish.
Do you have any ideas for how we could make that better? I can think of a complicated case where you use markInUse if the metadata symbol exactly matches the key, but markLive otherwise…but at some point I'm not sure it's worth the complexity cost. (In particular, after this change, I'm not sure metadata symbols behave any differently from regular conjured symbols. This is a good thing.)
(We've also talked about having better REGISTER_MAP_WITH_PROGRAMSTATE that auto-remove any keys for dead symbols or dead regions. I'm sure Anna and Devin would be happy to see someone pick that up.)
lib/StaticAnalyzer/Checkers/CStringChecker.cpp | ||
---|---|---|
2137–2141 | Huh, this is probably an improvement anyway! |
- What happens when you take a string's length, test it against something, throw it away, then remeasure the string? In this case we'd be okay because CStringChecker still hangs on to the symbol in its map until the region is invalidated.
We cannot throw out anything related to live region we reference in the checker. While region lives, checker will keep its length alive, so this case seem to be OK.
- Will we keep more symbols alive than we did previously, increasing the analyzer's memory use? I think the answer is "yes", but not by very much: the live/dead checking is two-pass, so we'll mark a metadata symbol live in checkLiveSymbols but then find out we could have dropped it in checkDeadSymbols when the region is found to be dead. It'll get cleaned up the next time around, but I think that sort of conditional liveness is what the old mechanism was trying to accomplish.
Yes, you are right. The lengths of dead regions will be garbage-collected only in the next collection so they will consume some memory.
Do you have any ideas for how we could make that better? I can think of a complicated case where you use markInUse if the metadata symbol exactly matches the key, but markLive otherwise…but at some point I'm not sure it's worth the complexity cost.
As I understand, to make a precise GC, we need to do it in a number of iterations but now we make only one iteration. This does not influence correctness because we are conservative and we don't remove information that is alive (but may keep some dead). However, iterative GC may dramatically decrease analyzer perfomance.
(In particular, after this change, I'm not sure metadata symbols behave any differently from regular conjured symbols. This is a good thing.)
Yes. But we extensively use SymbolMetadata's information in our summary implementation so don't hurry to throw it out :)
(We've also talked about having better REGISTER_MAP_WITH_PROGRAMSTATE that auto-remove any keys for dead symbols or dead regions. I'm sure Anna and Devin would be happy to see someone pick that up.)
I don't think it is a good idea because some checkers handle dead symbols in customizable ways to detect leaks. Some helper for remove all items with dead keys from state working with common collections (in CheckerHelpers, for example) may be useful to to avoid code duplication, but it is an item for another patch.
BTW, what about my comment for line 457? Should we consider a metadata symbol always dead after this patch?
Sorry, I misunderstood your first question. This case is OK because CStringChecker handles invalidation and shouldn't track invalidated regions until their lengths become known. Test strlen_global() (string.c:106) should confirm this.
Maybe it would be great to do a profiling before and after this patch?
It looks like this is a hotspot of the analyzer: http://clang-developers.42468.n3.nabble.com/Analyzer-pretty-profiling-pictures-td4052016.html
Huh, this is probably an improvement anyway!