This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Remove unjustified assertion from EQClass::simplify
ClosedPublic

Authored by steakhal on Nov 15 2022, 7:47 AM.

Details

Summary

One might think that by merging the equivalence classes (eqclasses) of Sym1 and Sym2 symbols we would end up with a State in which the eqclass of Sym1 and Sym2 should now be the same. Surprisingly, it's not always true.

Such an example triggered the assertion enforcing this unjustified invariant in https://github.com/llvm/llvm-project/issues/58677.

unsigned a, b;
#define assert(cond) if (!(cond)) return

void f(unsigned c) {
    /*(1)*/ assert(c == b);
    /*(2)*/ assert((c | a) != a);
    /*(3)*/ assert(a);
    // a = 0  =>  c | 0 != 0  =>  c != 0  =>  b != 0
}

I believe, that this assertion hold for reasonable cases - where both MemberSym and SimplifiedMemberSym refer to live symbols.
It can only fail if SimplifiedMemberSym refers to an already dead symbol. See the reasoning at the very end.
In this context, I don't know any way of determining if a symbol is alive/dead, so I cannot refine the assertion in that way.
So, I'm proposing to drop this unjustified assertion.


Let me elaborate on why I think the assertion is wrong in its current shape.

Here is a quick reminder about equivalence classes in CSA.
We have 4 mappings:

  1. ClassMap: map, associating Symbols with an EquivalenceClass.
  2. ClassMembers: map, associating EquivalenceClasses with its members - basically an enumeration of the Symbols which are known to be equal.
  3. ConstraintRange: map, associating EquivalenceClasses with the range constraint which should hold for all the members of the eqclass.
  4. DisequalityMap: I'm omitting this, as it's irrelevant for our purposes now.

Whenever we encounter/assume that two SymbolRefs are equal, we update the ClassMap so that now both SymbolRefs are referring to the same eqclass. This operation is known as merge or union.
Each eqclass is uniquely identified by the representative symbol, but it could have been just a unique number or anything else - the point is that an eqclass is identified by something unique.
Initially, all Symbols form - by itself - a trivial eqclass, as there are no other Symbols to which it is assumed to be equal. A trivial eqclass has notionally exactly one member, the representative symbol.
I'm emphasizing that notionally because for such cases we don't store an entry in the ClassMap nor in the ClassMembers map, because it could be deduced.

By merging two eqclasses, we essentially do what you would think it does. An important thing to highlight is that the representative symbol of the resulting eqclass will be the same as one of the two eqclasses.
This operation should be commutative, so that merge(eq1,eq2) and merge(eq2,eq1) should result in the same eqclass - except for the representative symbol, which is just a unique identifier of the class, a name if you will. Using the representative symbol of eq1 or eq2 should have no visible effect on the analysis overall.

When merging eq1 into eq2, we take each of the ClassMembers of eq1 and add them to the ClassMembers of eq2 while we also redirect all the Symbol members of eq1 to map to the eq2 eqclass in the ClassMap. This way all members of eq1 will refer to the correct eqclass.
After these, eq1 key is unreachable in the ClassMembers, hence we can drop it.


Let's get back to the example.
Note that when I refer to symbols a, b, c, I'm referring to the SymbolRegionValue{VarRegion{.}} - the value of that variable.

After (1), we will have a constraint c == b : [1,1] and an eqclass c,b with the c representative symbol.
After (2), we will have an additional constraint c|b != a : [1,1] with the same eqclass. We will also have disequality info about that c|a is disequal to a - and the other way around.
However, after the full-expression, c will become dead. This kicks in the garbage collection, which transforms the state into this:

  • We no longer have any constraints, because only a is alive, for which we don't have any constraints.
  • We have a single (non-trivial) eqclass with a single element b and representative symbol c. (Dead symbols can be representative symbols.)
  • We have the same disequality info as before.

At (3) within the false branch, a get perfectly constrained to zero. This kicks in the simplification, so we try to substitute (simplify) the variable in each SymExpr-tree. In our case, it means that the c|a != a : [1,1] constraint gets re-evaluated as c|0 != 0 : [1,1], which is c != 0 : [1,1].
Under the hood, it means that we will call merge(c|a, c). where c is the result of simplifyToSVal(State, MemberSym).getAsSymbol() inside EquivalenceClass::simplify().
Note that the result of simplifyToSVal() was a dead symbol. We shouldn't have acquired an already dead symbol. AFAIK, this is the only way we can get one at this point.
Since c is dead, we no longer have a mapping in ClassMap for it; hence when we try to find() the eqclass of c, it will report that it's a trivial eqclass with the representative symbol c.

After merge(c|a, c), we will have a single (non-trivial) eqclass of b, c|a with the representative symbol c|a - because we merged the eqclass of c into the eqclass of c|a.

This means that find(c|a) will result in the eqclass with the representative symbol c|a. So, we ended up having different eqclasses for find(c|a) and find(c) after merge(c|a, c), firing the assertion.

I believe, that this assertion hold for reasonable cases - where both MemberSym and SimplifiedMemberSym refer to live symbols.
MemberSym should be live in all cases here, because it is from ClassMembers which is pruned in removeDeadBindings() so these must be alive. In this context, I don't know any way of determining if a symbol is alive/dead, so I cannot refine the assertion in that way.
So, I'm proposing to drop this unjustified assertion.

I'd like to thank @martong for helping me to conclude the investigation.
It was really difficult to track down.

PS: I mentioned that merge(eq1, eq2) should be commutative. We actually exploit this for merging the smaller eqclass into the bigger one within EquivalenceClass::merge().
Yea, for some reason, if you swap the operands, 3 tests break (only failures, no crashes) aside from the test which checks the state dump. But I believe, that is a different bug and orthogonal to this one. I just wanted to mention that.

  • Analysis/solver-sym-simplification-adjustment.c
  • Analysis/symbol-simplification-fixpoint-iteration-unreachable-code.cpp
  • Analysis/symbol-simplification-reassume.cpp

Diff Detail

Event Timeline

steakhal created this revision.Nov 15 2022, 7:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 15 2022, 7:47 AM
steakhal requested review of this revision.Nov 15 2022, 7:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 15 2022, 7:47 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
steakhal updated this revision to Diff 475480.Nov 15 2022, 7:53 AM

Add test case.

I did not spend too much time thinking about this yet, but this sounds scary. I wonder if we should target the underlying problem instead, i.e., making sure we never have dead symbols as representatives for eq. classes. What do you think?

I did not spend too much time thinking about this yet, but this sounds scary. I wonder if we should target the underlying problem instead, i.e., making sure we never have dead symbols as representatives for eq. classes. What do you think?

Yes, indeed scary. The answer is a bit complicated. I think a similar concern came up here, and Valeriy summed up pretty well.

Representative symbol gives its equivalence class an ID. We use this ID for everything, for comparing and for mapping. Since we live in a persistent world, we can't just change this ID at some point, it will basically mean that we want to replace a class with another class. So, all maps where this class participated (constraints, class, members, disequality) should remove the old class and add the new class. This is a huge burden. You need to carefully do all this, and bloat existing data structures.

Trivial class is an optimization for the vast majority of symbols that never get into any classes. We still need to associate constraints with those. This is where implicit Symbol to Class conversion comes in handy.

Now let's imagine that something else is merged into it. We are obliged to officially map the symbol to its class, and the class to its members. When this happens, it goes into the data structure FOREVER. And when in the future, with only one member, instead of keeping something that we already have from other versions of the data structure, you decide to remove the item from the class map, you actually use more memory when it was there! This is how persistent data structures work, different versions of the same data structure share data. And I'm not even talking here about the fact that you now need to replace the class with one ID with the class with another ID in every relationship.

You can probably measure memory footprint in your example and see it for yourself.

However, we will never know for sure if these allegations are true unless we implement and measure it, but I'm not volunteering.

I guess there are some more options. We could try keeping representatives alive no matter what. It could be a good exercise to see if doing that makes any difference in the analysis results.

I guess there are some more options. We could try keeping representatives alive no matter what. It could be a good exercise to see if doing that makes any difference in the analysis results.

Yes, I should have mentioned it. I was thinking about exactly this, prior convincing myself for removing this assertion. However, my feeling is that it would keep too many symbols. Basically leaking them.
Think about that all symbol starts as a trivial eqclass. Whenever we would merge, we would leak the other eqclass. Idk but this also seems scarry.
On the other side, I dont have the infra for conducting a detailed performance analysis yet - unlike with making report diffs.

Hi @steakhal, thanks for the suggested change.
How we can help move this forward? From what I'm comprehending from the notes, perhaps we could try running this change through our internal systems level test and fuzzer. Unfortunately, I'd not be able to say more than "trust me, we saw no problems" if we see no problems. But if I do find additional cases I can make simplified reproducers and we can work on addressing those.
Best!

Hi @steakhal, thanks for the suggested change.
How we can help move this forward? From what I'm comprehending from the notes, perhaps we could try running this change through our internal systems level test and fuzzer. Unfortunately, I'd not be able to say more than "trust me, we saw no problems" if we see no problems. But if I do find additional cases I can make simplified reproducers and we can work on addressing those.
Best!

Well, I totally forgot about this change. I think we could evaluate if keeping alive the representatives would help or not.
However, I cannot push this. I dont have the bandwidth.

I'd be interrested to see other crashing cases you mentioned.
Keep me informed.

vabridgers accepted this revision.Feb 12 2023, 2:14 PM

I got back to testing this through a large, internal set of builds, and do not see anymore problems. If everyone is ok with that, could this be merged?

This revision is now accepted and ready to land.Feb 12 2023, 2:14 PM

@NoQ @xazax.hun I plan to merge this by the end of the week if you have no objection.

I would also suggest backporting this to the release/16.x branch as that has already branched off. WDYT?

I am ok with committing this to unblock people hitting this assert, but at the same time I wonder if we want to open a ticket on GitHub that we might want to rethink how some of this works.

I'm backporting this as #60834 to the 16.x branch.

I am ok with committing this to unblock people hitting this assert, but at the same time I wonder if we want to open a ticket on GitHub that we might want to rethink how some of this works.

I created the ticket #60836 to track this. I don't plan to work on it in the near and far future.

I'm backporting this as #60834 to the 16.x branch.

Backported with 1ab37a25e463ea144289a1cbcf32a7659b2d60bb.