This is an archive of the discontinued LLVM Phabricator instance.

[ASTImporter] Remove NonEquivalentDecls from ASTImporter
Needs ReviewPublic

Authored by martong on May 20 2019, 3:17 AM.

Details

Summary

Currently ASTImporter::NonEquivalentDecls member keeps its state and
shares it between consecutive calls of
StructuralEquivalenceContesxt::IsEquivalent(). Thus, we cache
inequivalent pairs from a previous snapshot of the AST. However, we
continuously build the AST and a pair which used to be inequivalent may
be equivalent later (or vica-versa). NonEquivalentDecls behaves
similarly to other internal states of StructuralEquivalenceContext
(TentativeEquivalences, DeclsToCheck). I.e this state too should be
reset before each IsEquivalent() call.

Event Timeline

martong created this revision.May 20 2019, 3:17 AM
Herald added a reviewer: shafik. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript

Hi Gabor,
Could you provide an example of an import sequence leading to this behavior? It's hard for me to imagine such a situation.

clang/test/ASTMerge/struct/test.c
37

It looks strange to me that these warnings are gone.

martong marked 2 inline comments as done.Jun 28 2019, 8:10 AM

Hi Gabor,
Could you provide an example of an import sequence leading to this behavior? It's hard for me to imagine such a situation.

Alexei, thanks for the review again. I don't think I could create an import sequence for this. However, I have experienced this kind of poisonous cache behavior during the debugging of a mysterious false positive structural inequivalency (during the analysis of protobuf).
The following happened: During the analysis we compared two Decls which turned out to be inequivalent, so we cached them. Later during the analysis, however, we added a new node to the redecl chain of one of these Decls which we previously compared. Then another structural equivalent check followed for the two Decls. And this time they should have been considered structurally equivalent, but the cache already contained them as nonequivalent. This resulted in a false positive NameConflict error.

By this change the error had gone, and we did not recognize any analysis slowdown. Remember, we still have a cache, but not a global cache in the ASTImporter object.

clang/test/ASTMerge/struct/test.c
37

For me, these warnings seems just like noise, by themselves line 37-40 does not show where is the exact mismatch.
The exact mismatch is shown in line 34-36, those warnings indicate that DeeperError has incompatible fields in the different TUs with different types (int vs float). That is the lowest level where the mismatch happens, I don't think we should indicate a warning for all other types which contain the mismatched types.

The same is true below (line 50-52) with the struct and union.

The following happened: During the analysis we compared two Decls which turned out to be inequivalent, so we cached them. Later during the analysis, however, we added a new node to the redecl chain of one of these Decls which we previously compared. Then another structural equivalent check followed for the two Decls. And this time they should have been considered structurally equivalent, but the cache already contained them as nonequivalent. This resulted in a false positive NameConflict error.

Should we reset the non-equivalence relation after a decl is imported for this decl and its redecls?

By this change the error had gone, and we did not recognize any analysis slowdown. Remember, we still have a cache, but not a global cache in the ASTImporter object.

Hm, I wonder if our cache is really useful or not. Unfortunately, I never did any measures.

Still the lack of a test disturbs me, even despite the fact that I got the idea.

Example about how to get wrong things into NonEquivalentDecls:
We want to compare class C for structural equivalence in the following codes:

class A {}; class B {int i;}; void x(A *); void y(A *); class C { friend void x(A *); friend void y(A *); };

and

class A {}; class B {int i;}; void x(A *); void y(B *); class C { friend void x(A *); friend void y(B *); };

The result is false for C but the NonEquivalentDecls will contain after the compare a void x(A*)<->void x(A*) pair.

The reason is that during the check we encounter first a A<->B pair (iterating over the friends and friend functions, first y is encountered and a check for A and B follows). The B is recorded as "tentative equivalence" to A. Then we try to check A to A (during check of x) but because there is a "tentative equivalence" with B from A the check returns false (not equivalent). This false result is recorded as a (incorrect) non-equivalence of the two x functions.
I want to replace the DeclsToCheck and TentativeEquivalences with a set of Decl pairs (like the NonEquivalentDecls) so it will be possible to record the same from Decl with multiple to values. (Is there a reason for why there is a NonEquivalentDecls cache but not a EquivalentDecls or simply a cache for result values?)

martong marked an inline comment as done.Aug 23 2019, 7:39 AM

Example about how to get wrong things into NonEquivalentDecls:
We want to compare class C for structural equivalence in the following codes:

class A {}; class B {int i;}; void x(A *); void y(A *); class C { friend void x(A *); friend void y(A *); };

and

class A {}; class B {int i;}; void x(A *); void y(B *); class C { friend void x(A *); friend void y(B *); };

The result is false for C but the NonEquivalentDecls will contain after the compare a void x(A*)<->void x(A*) pair.

The reason is that during the check we encounter first a A<->B pair (iterating over the friends and friend functions, first y is encountered and a check for A and B follows). The B is recorded as "tentative equivalence" to A. Then we try to check A to A (during check of x) but because there is a "tentative equivalence" with B from A the check returns false (not equivalent). This false result is recorded as a (incorrect) non-equivalence of the two x functions.
I want to replace the DeclsToCheck and TentativeEquivalences with a set of Decl pairs (like the NonEquivalentDecls) so it will be possible to record the same from Decl with multiple to values. (Is there a reason for why there is a NonEquivalentDecls cache but not a EquivalentDecls or simply a cache for result values?)

Yes, this is indeed a problem. But it is independent from the original problem this patch intends to solve.

martong abandoned this revision.Aug 23 2019, 8:10 AM

@a_sidorin Alexei,

With Balazs we had a long discussion and I think we reached a consensus and common understanding. Now I try to summarize it.

It turned out that this patch essentially renders the NonEquivalentDecls cache to be completely disabled, i.e. we never have a cache hit. (I tried an example test run with an assertion enabled for the cache hit.)
This is because in Finish(), when we find the first non-equivalent pair then we immediately break out from the while loop which pops out elements from the queue. This means we do not consider any elements left in the queue after this, so the "local" cache is never going to be used again.
This, however, explains why did we have better results with this patch: the cache was disabled, so those faulty entries which were cached (see Balazs's example above) could not cause a faulty behavior.

Secondly, I tried to fabricate a test case where the change of the redecl chain could render previously non-equivalent pairs to be equivalent. I could not create such. I also executed an experiment on llvm/master to see in case of a cache hit whether we would get a different result without the cache:

   // Check whether we already know that these two declarations are not
   // structurally equivalent.
   if (Context.NonEquivalentDecls.count(
-          std::make_pair(D1->getCanonicalDecl(), D2->getCanonicalDecl())))
+          std::make_pair(D1->getCanonicalDecl(), D2->getCanonicalDecl()))) {
+    llvm::errs() << "Cache hit\n";
+    llvm::DenseSet<std::pair<Decl *, Decl *>> LocalNeqDecls;
+    StructuralEquivalenceContext LocalCtx(Context.FromCtx, Context.ToCtx,
+                                     LocalNeqDecls, Context.EqKind, false,
+                                     false, false);
+    assert(!LocalCtx.IsEquivalent(D1, D2));
     return false;
+  }

The added assertion did not fire. (I executed it on Xerces for a ~100 TUs). So, it is highly probable that if some pairs were ever non-equivalent they will stay non-equivalent.
This is not true, however, the other way around. Equivalent decls may become non-equivalent (e.g. a fwd decl is being completed).

To solve the issue that is found by Balazs, I suggest to move on with https://reviews.llvm.org/D66538 and abandon this patch.
An alternative solution would be to cache after the Finish() has returned, but I prefer D66538 because it makes the whole algorithm simpler, plus we can get rid of the redecleration chain check. By getting rid of that, I believe we can have a more robust import mechanism, which is more resistant to lookup errors. ATM, if we have a lookup error then we end up having two identical declarations in different redecl chains, which may initiate a NameConflict error avalanche. With D66538 we can avoid that.

As a next step, probably we should remove the cache because it complicates the code and the added performance value is not recognizable. If you'd like I can do measurements to investigate the performance difference. However, on our internal fork we disabled the cache entirely for quite a long time (this patch is originated from there) and we did not experience any noticeable performance degradation during the analysis.

martong reclaimed this revision.Aug 23 2019, 8:13 AM

... it is highly probable that if some pairs were ever non-equivalent they will stay non-equivalent.

This may not be true in case of LLDB, because usually they do a minimal import, and in a later phase they do a normal import which imports e.g. the members of a class.
Clearly, in this case it is wrong to cache non-equivalent decls.