One might think that by merging the equivalence classes (eqclasses) of Sym1 and Sym2 symbols we would end up with a State in which the eqclass of Sym1 and Sym2 should now be the same. Surprisingly, it's not always true.
Such an example triggered the assertion enforcing this unjustified invariant in https://github.com/llvm/llvm-project/issues/58677.
unsigned a, b; #define assert(cond) if (!(cond)) return void f(unsigned c) { /*(1)*/ assert(c == b); /*(2)*/ assert((c | a) != a); /*(3)*/ assert(a); // a = 0 => c | 0 != 0 => c != 0 => b != 0 }
I believe, that this assertion hold for reasonable cases - where both MemberSym and SimplifiedMemberSym refer to live symbols.
It can only fail if SimplifiedMemberSym refers to an already dead symbol. See the reasoning at the very end.
In this context, I don't know any way of determining if a symbol is alive/dead, so I cannot refine the assertion in that way.
So, I'm proposing to drop this unjustified assertion.
Let me elaborate on why I think the assertion is wrong in its current shape.
Here is a quick reminder about equivalence classes in CSA.
We have 4 mappings:
- ClassMap: map, associating Symbols with an EquivalenceClass.
- ClassMembers: map, associating EquivalenceClasses with its members - basically an enumeration of the Symbols which are known to be equal.
- ConstraintRange: map, associating EquivalenceClasses with the range constraint which should hold for all the members of the eqclass.
- DisequalityMap: I'm omitting this, as it's irrelevant for our purposes now.
Whenever we encounter/assume that two SymbolRefs are equal, we update the ClassMap so that now both SymbolRefs are referring to the same eqclass. This operation is known as merge or union.
Each eqclass is uniquely identified by the representative symbol, but it could have been just a unique number or anything else - the point is that an eqclass is identified by something unique.
Initially, all Symbols form - by itself - a trivial eqclass, as there are no other Symbols to which it is assumed to be equal. A trivial eqclass has notionally exactly one member, the representative symbol.
I'm emphasizing that notionally because for such cases we don't store an entry in the ClassMap nor in the ClassMembers map, because it could be deduced.
By merging two eqclasses, we essentially do what you would think it does. An important thing to highlight is that the representative symbol of the resulting eqclass will be the same as one of the two eqclasses.
This operation should be commutative, so that merge(eq1,eq2) and merge(eq2,eq1) should result in the same eqclass - except for the representative symbol, which is just a unique identifier of the class, a name if you will. Using the representative symbol of eq1 or eq2 should have no visible effect on the analysis overall.
When merging eq1 into eq2, we take each of the ClassMembers of eq1 and add them to the ClassMembers of eq2 while we also redirect all the Symbol members of eq1 to map to the eq2 eqclass in the ClassMap. This way all members of eq1 will refer to the correct eqclass.
After these, eq1 key is unreachable in the ClassMembers, hence we can drop it.
Let's get back to the example.
Note that when I refer to symbols a, b, c, I'm referring to the SymbolRegionValue{VarRegion{.}} - the value of that variable.
After (1), we will have a constraint c == b : [1,1] and an eqclass c,b with the c representative symbol.
After (2), we will have an additional constraint c|b != a : [1,1] with the same eqclass. We will also have disequality info about that c|a is disequal to a - and the other way around.
However, after the full-expression, c will become dead. This kicks in the garbage collection, which transforms the state into this:
- We no longer have any constraints, because only a is alive, for which we don't have any constraints.
- We have a single (non-trivial) eqclass with a single element b and representative symbol c. (Dead symbols can be representative symbols.)
- We have the same disequality info as before.
At (3) within the false branch, a get perfectly constrained to zero. This kicks in the simplification, so we try to substitute (simplify) the variable in each SymExpr-tree. In our case, it means that the c|a != a : [1,1] constraint gets re-evaluated as c|0 != 0 : [1,1], which is c != 0 : [1,1].
Under the hood, it means that we will call merge(c|a, c). where c is the result of simplifyToSVal(State, MemberSym).getAsSymbol() inside EquivalenceClass::simplify().
Note that the result of simplifyToSVal() was a dead symbol. We shouldn't have acquired an already dead symbol. AFAIK, this is the only way we can get one at this point.
Since c is dead, we no longer have a mapping in ClassMap for it; hence when we try to find() the eqclass of c, it will report that it's a trivial eqclass with the representative symbol c.
After merge(c|a, c), we will have a single (non-trivial) eqclass of b, c|a with the representative symbol c|a - because we merged the eqclass of c into the eqclass of c|a.
This means that find(c|a) will result in the eqclass with the representative symbol c|a. So, we ended up having different eqclasses for find(c|a) and find(c) after merge(c|a, c), firing the assertion.
I believe, that this assertion hold for reasonable cases - where both MemberSym and SimplifiedMemberSym refer to live symbols.
MemberSym should be live in all cases here, because it is from ClassMembers which is pruned in removeDeadBindings() so these must be alive. In this context, I don't know any way of determining if a symbol is alive/dead, so I cannot refine the assertion in that way.
So, I'm proposing to drop this unjustified assertion.
I'd like to thank @martong for helping me to conclude the investigation.
It was really difficult to track down.
PS: I mentioned that merge(eq1, eq2) should be commutative. We actually exploit this for merging the smaller eqclass into the bigger one within EquivalenceClass::merge().
Yea, for some reason, if you swap the operands, 3 tests break (only failures, no crashes) aside from the test which checks the state dump. But I believe, that is a different bug and orthogonal to this one. I just wanted to mention that.
- Analysis/solver-sym-simplification-adjustment.c
- Analysis/symbol-simplification-fixpoint-iteration-unreachable-code.cpp
- Analysis/symbol-simplification-reassume.cpp