This is an archive of the discontinued LLVM Phabricator instance.

[ASTImporter] Fix poisonous structural equivalence cache
ClosedPublic

Authored by martong on Jul 13 2018, 8:43 AM.

Details

Summary

Implementation functions call into the member functions of
ASTStructuralEquivalence, thus they can falsely alter the DeclsToCheck state
(they add decls). This results that some leaf declarations can be stated as
inequivalent as a side effect of one inequivalent element in the DeclsToCheck
list. And since we store the non-equivalencies, any (otherwise independent)
decls will be rendered as non-equivalent. Solution: I tried to clearly
separate the implementation functions (the static ones) and the public
interface. From now on, the implementation functions do not call any public
member functions, only other implementation functions.

Diff Detail

Repository
rC Clang

Event Timeline

martong created this revision.Jul 13 2018, 8:43 AM

Hi Gabor,
Could you provide some tests for the issue?

martong added a comment.EditedJul 16 2018, 7:08 AM

Hi Aleksei,

Unfortunately, it seems that it is very to hard synthesize a lit or unit test for this fix.
I have found this flaw during the CTU analysis of Bitcoin, where the AST was immense.

Because of the lack of tests, now I am trying to persuade you based on some theory.
Let's consider this simple BFS algorithm from the s source:

void bfs(Graph G, int s)
{
  Queue<Integer> queue = new Queue<Integer>();
  marked[s] = true; // Mark the source
  queue.enqueue(s); // and put it on the queue.
  while (!q.isEmpty()) {
    int v = queue.dequeue(); // Remove next vertex from the queue.
    for (int w : G.adj(v))
      if (!marked[w]) // For every unmarked adjacent vertex,
      {
        marked[w] = true;
        queue.enqueue(w);
      }
  }
}

I believe , the structural equivalence check could be implemented as a parallel BFS on a pair of graphs.
And, I think that must have been the original approach at the beginning.
Indeed, it has it's queue, which holds pairs of nodes, one from each graph, this is the DeclsToCheck and it's pair is in TentativeEquivalences.
TentativeEquivalences also plays the role of the marking (marked) functionality above, we use it to check whether we've already seen a pair of nodes.

We put in the elements into the queue only in the toplevel decl check function:

static bool IsStructurallyEquivalent(StructuralEquivalenceContext &Context,
                                     Decl *D1, Decl *D2);

The while loop where we iterate over the children is implemented in Finish().
And Finish is called only from the two member functions which check the equivalency of two Decls or two Types. ASTImporter (and other clients) call only these functions.

The static implementation functions are called from Finish, these push the children nodes to the queue via static bool IsStructurallyEquivalent(StructuralEquivalenceContext &Context, Decl *D1, Decl *D2).
So far so good, this is almost like the BFS.
However, if we let a static implementation function to call Finish via another member function that means we end up with two nested while loops each of them working on the same queue. This is wrong and nobody can reason about it's doing. (This time it had an effect of poisoning the non-equivalencies cache.)

So, now TentativeEquivalences plays two roles. It is used to store the second half of the decls which we want to compare, plus it plays a role in closing the recursion. On a long term, I think, (after this change) we could refactor structural equivalency to be more alike to the traditional BFS.

a_sidorin accepted this revision.Jul 16 2018, 3:29 PM

Hi Gabor,

Thank you for this explanation, it sounds reasonable. Ithink it should be placed into the commit message or into the comment somewhere.

This revision is now accepted and ready to land.Jul 16 2018, 3:29 PM

@a_sidorin

Ok, thanks Aleksei for the review. I added the explanation as file comments into StructuralEquivalence.cpp.

martong updated this revision to Diff 155858.Jul 17 2018, 5:32 AM
  • Add comment about structural eq and BFS
This revision was automatically updated to reflect the committed changes.