This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer][solver] Fix equivalence class invariant violation in removeDeadBindings
AbandonedPublic

Authored by martong on Jul 16 2021, 2:44 AM.

Details

Summary

There is an invariant in the range based solver for equivalence classes.
We don't store class->member associations for trivial classes in the
state (ClassMembers). This means any SymbolSet stored in ClassMembers
must have at least two members. This invariant is violated in
removeDeadBindings, because we remove a class from ClassMembers only
once it became empty.

Fixing this invariant violation implies that we must prepare for element
removals from an equivalence class. An element removal might result in
downgrading a non-trivial class to trivial, also the representative
symbol might be changed. If the representative symbol has changed then
we have to re-key constraints and the disequality info.

Diff Detail

Event Timeline

martong created this revision.Jul 16 2021, 2:44 AM
martong requested review of this revision.Jul 16 2021, 2:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 16 2021, 2:44 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Note1: This issue is not related to and does not fix https://bugs.llvm.org/show_bug.cgi?id=51109
Note2: I am about to measure any performance penalty induced by the extra bookkeeping we must do.

vsavchenko added a comment.EditedJul 16 2021, 3:10 AM

Thanks for working on it, but it is a quite large change that I don't get the motivation for (it doesn't even fix the recently found bug).

Summary explains what the patch does, but for why it is done, it talks about an invariant that is not specified anywhere in the code. The invariant you are talking about in the summary is simply not true and was never intended to be true.

When I was implementing it, trivial class is a symbol that was never ever equal to anything else. The moment we make merge something into it, it stops being trivial, forever! Representative is an implementation detail and shouldn't be required to represent the actual information. What I'm saying is that representative can be long gone, but the class can still hold. If we use representative symbol for something other than its pointer or type after it's gone, the mistake is using representative in the first place, not the motivation to make it more complex and change representatives.

vsavchenko requested changes to this revision.Jul 16 2021, 3:14 AM
This revision now requires changes to proceed.Jul 16 2021, 3:14 AM

Thanks for taking your time to take a look. And I accept your statements. Nevertheless, let me explain my motivation.

I thought that a class is trivial if it has only one member. And it seemed perfectly logical to not store equivalence classes with one member. I.e a equals to a does not hold any meaningful information it just takes precious memory. When we add a new constraint we take careful steps to avoid adding a new class with one member. However, when remove dead kicks in, suddenly we end up having classes stored with one member, which is a somewhat inconsistent design IMHO. Perhaps some better documentation could have helped.

Thanks for taking your time to take a look. And I accept your statements. Nevertheless, let me explain my motivation.

I thought that a class is trivial if it has only one member. And it seemed perfectly logical to not store equivalence classes with one member. I.e a equals to a does not hold any meaningful information it just takes precious memory. When we add a new constraint we take careful steps to avoid adding a new class with one member. However, when remove dead kicks in, suddenly we end up having classes stored with one member, which is a somewhat inconsistent design IMHO. Perhaps some better documentation could have helped.

Representative symbol gives its equivalence class an ID. We use this ID for everything, for comparing and for mapping. Since we live in a persistent world, we can't just change this ID at some point, it will basically mean that we want to replace a class with another class. So, all maps where this class participated (constraints, class, members, disequality) should remove the old class and add the new class. This is a huge burden. You need to carefully do all this, and bloat existing data structures.

Trivial class is an optimization for the vast majority of symbols that never get into any classes. We still need to associate constraints with those. This is where implicit Symbol to Class conversion comes in handy.

Now let's imagine that something else is merged into it. We are obliged to officially map the symbol to its class, and the class to its members. When this happens, it goes into the data structure FOREVER. And when in the future, with only one member, instead of keeping something that we already have from other versions of the data structure, you decide to remove the item from the class map, you actually use more memory when it was there! This is how persistent data structures work, different versions of the same data structure share data. And I'm not even talking here about the fact that you now need to replace the class with one ID with the class with another ID in every relationship.

You can probably measure memory footprint in your example and see it for yourself.

martong abandoned this revision.Jul 20 2021, 12:32 AM

Thanks for taking your time to take a look. And I accept your statements. Nevertheless, let me explain my motivation.

I thought that a class is trivial if it has only one member. And it seemed perfectly logical to not store equivalence classes with one member. I.e a equals to a does not hold any meaningful information it just takes precious memory. When we add a new constraint we take careful steps to avoid adding a new class with one member. However, when remove dead kicks in, suddenly we end up having classes stored with one member, which is a somewhat inconsistent design IMHO. Perhaps some better documentation could have helped.

Representative symbol gives its equivalence class an ID. We use this ID for everything, for comparing and for mapping. Since we live in a persistent world, we can't just change this ID at some point, it will basically mean that we want to replace a class with another class. So, all maps where this class participated (constraints, class, members, disequality) should remove the old class and add the new class. This is a huge burden. You need to carefully do all this, and bloat existing data structures.

Trivial class is an optimization for the vast majority of symbols that never get into any classes. We still need to associate constraints with those. This is where implicit Symbol to Class conversion comes in handy.

Now let's imagine that something else is merged into it. We are obliged to officially map the symbol to its class, and the class to its members. When this happens, it goes into the data structure FOREVER. And when in the future, with only one member, instead of keeping something that we already have from other versions of the data structure, you decide to remove the item from the class map, you actually use more memory when it was there! This is how persistent data structures work, different versions of the same data structure share data. And I'm not even talking here about the fact that you now need to replace the class with one ID with the class with another ID in every relationship.

You can probably measure memory footprint in your example and see it for yourself.

Yeah, makes sense. Thanks for taking more time for further explanations. Still, IMHO, the definition of a trivial class needs a clear written documentation in the code itself, so other developers can easier understand and maintain this code. I am going to create an NFC patch that summarizes this discussion in a comment attached to the isTrivial function.

Thanks for taking your time to take a look. And I accept your statements. Nevertheless, let me explain my motivation.

I thought that a class is trivial if it has only one member. And it seemed perfectly logical to not store equivalence classes with one member. I.e a equals to a does not hold any meaningful information it just takes precious memory. When we add a new constraint we take careful steps to avoid adding a new class with one member. However, when remove dead kicks in, suddenly we end up having classes stored with one member, which is a somewhat inconsistent design IMHO. Perhaps some better documentation could have helped.

Representative symbol gives its equivalence class an ID. We use this ID for everything, for comparing and for mapping. Since we live in a persistent world, we can't just change this ID at some point, it will basically mean that we want to replace a class with another class. So, all maps where this class participated (constraints, class, members, disequality) should remove the old class and add the new class. This is a huge burden. You need to carefully do all this, and bloat existing data structures.

Trivial class is an optimization for the vast majority of symbols that never get into any classes. We still need to associate constraints with those. This is where implicit Symbol to Class conversion comes in handy.

Now let's imagine that something else is merged into it. We are obliged to officially map the symbol to its class, and the class to its members. When this happens, it goes into the data structure FOREVER. And when in the future, with only one member, instead of keeping something that we already have from other versions of the data structure, you decide to remove the item from the class map, you actually use more memory when it was there! This is how persistent data structures work, different versions of the same data structure share data. And I'm not even talking here about the fact that you now need to replace the class with one ID with the class with another ID in every relationship.

You can probably measure memory footprint in your example and see it for yourself.

Yeah, makes sense. Thanks for taking more time for further explanations. Still, IMHO, the definition of a trivial class needs a clear written documentation in the code itself, so other developers can easier understand and maintain this code. I am going to create an NFC patch that summarizes this discussion in a comment attached to the isTrivial function.

That's a great idea! Thanks!