Page MenuHomePhabricator

[ASTImporter] Import the whole redecl chain of functions

Authored by martong on May 30 2018, 6:13 AM.



With this patch when any FunctionDecl of a redeclaration chain is imported
then we bring in the whole declaration chain. This involves functions and
function template specializations. Also friend functions are affected. The
chain is imported as it is in the "from" tu, the order of the redeclarations
are kept. I also changed the lookup logic in order to find friends, but first
making them visible in their declaration context. We may have long
redeclaration chains if all TU contains the same prototype, but our
measurements shows no degradation in time of CTU analysis (Tmux, Xerces,
Bitcoin, Protobuf). Also, as further work we could squash redundant
prototypes, but first ensure that functionality is working properly; then
should we optimize.

This may seem like a huge patch, sorry about that. But, most of the changes are
new tests, changes in the production code is not that much. I also tried to
create a smaller patch which does not affect specializations, but that patch
failed to pass some of the clang-import-tests because there we import
function specializations. Also very importantly, we can't just change the
import of FunctionDecls without changing the import of function template
specializations because they are handled as FunctionDecls.

Diff Detail

rC Clang

Event Timeline

martong created this revision.May 30 2018, 6:13 AM
martong updated this revision to Diff 149107.May 30 2018, 6:20 AM
  • Add a missing "else"

I just wanted to give a detailed justification about why we should import the
whole redecl chain. Consider the following code:

void f(); // prototype
void f() { f(); }

Currently, when we import the prototype we end up having two independent
functions with definitions:

TranslationUnitDecl 0x25214c8 <<invalid sloc>> <invalid sloc>
|-FunctionDecl 0x255e3e8 <, col:27> col:16 f 'void ()'
| `-CompoundStmt 0x255e4f0 <col:20, col:27>
|   `-CallExpr 0x255e4c8 <col:22, col:24> 'void'
|     `-ImplicitCastExpr 0x255e4b0 <col:22> 'void (*)()' <FunctionToPointerDecay>
|       `-DeclRefExpr 0x255e488 <col:22> 'void ()' lvalue Function 0x255e3e8 'f' 'void ()'
`-FunctionDecl 0x255e300 <col:1, col:27> col:6 f 'void ()'
  `-CompoundStmt 0x255e570 <col:20, col:27>
    `-CallExpr 0x255e548 <col:22, col:24> 'void'
      `-ImplicitCastExpr 0x255e530 <col:22> 'void (*)()' <FunctionToPointerDecay>
        `-DeclRefExpr 0x255e508 <col:22> 'void ()' lvalue Function 0x255e3e8 'f' 'void ()'

Also, when we import a definition of the friend function below we again end up
with two different definitions:

struct X { friend void f(); };
void f(){}

And what's more, one of these definitions has the class as a parent:

`-CXXRecordDecl 0x1802358 <, col:8> col:8 struct X definition
  |-CXXRecordDecl 0x1802478 <col:1, col:8> col:8 implicit struct X
  |-FunctionDecl 0x1802560 parent 0x17c5618 <col:12, col:40> col:24 f 'void ()'
  | `-CompoundStmt 0x1802610 <col:39, col:40>
  `-FriendDecl 0x1802620 <col:12, col:40> col:24
    `-FunctionDecl 0x1802560 parent 0x17c5618 <col:12, col:40> col:24 f 'void ()'
      `-CompoundStmt 0x1802610 <col:39, col:40>

In previous versions, we had similar problems in case of virtual functions,
i.e. the virtual flag of an out-of-class definition was not set, because we
missed to import the prototype which had the explicit virtual keyword. That
seems to be fixed even before this patch, but we wouldn't have that problem if
we had import the whole redecl chain.

These redundant definitions can cause assertions and can mislead any tool which
tries to process such a malformed AST.

balazske added inline comments.May 31 2018, 5:34 AM

Assert for FunctionDecl?


This test imports the definition two times, the second should result in error. The import does not check the function body for similarness. Probably a check can be made for the result of the second import.

martong added inline comments.May 31 2018, 6:43 AM

cast itself will assert if it is not a FunctionDecl. (dyn_cast is the other formula which can result with a nullptr.)


I don't think this should result an error, because the second definition has the very same type as the first. And also the body is structurally equivalent too, but we do not check that. Perhaps, on a long term we should check the body too (?).

I think the importer should work in this way: if there is already a definition then just simply use that and omit any subsequent definitions if they are the same definitions (or report error if the body is different).
The key principle I followed is this: we can have arbitrary number of prototypes but only one definition.

Hello Gabor,

This patch is really cool. Unfortunately, right now I don't have enough time for reviewing large patches in a single review. Are you OK with incremental review?


We can drop llvm:: prefix for SmallVectors.


Please keep * where possible.


Does this code mean that we can get R == getFirstDecl() in the middle of the import chain?


This code should be unified with ImportTemplateInformation() to avoid duplication.


Can we just import getPreviousDecl()?


Unchecked import result, is it intentional?


Double space.


Has removed comment become irrelevant? (Same below)






So, we're importing a redeclaration. It imports its own redeclarations during import. This means that after the first redecl is imported, all other redecls should be imported too. So, do we need to have a loop here?


Why did the order change?


I think this is a right approach because we have ODR. Unless function signatures are different, only one possible definition should be used.
This situation is quite different from ASTImporter approach because ASTImporter checker for declaration compatibility, not definition compatibility. ASTImporter does not check function bodies for similarity, and I believe it shouldn't.


Misplaced *s appeared. Could you clang-format?


We can replace EXPECT_TRUE with negations with EXPECT_FALSE.

Are you OK with incremental review?

Hi Aleksei, thanks for reviewing this! Of course I am OK with an incremental review. Just a minor thing, I'll be back in the office only at the 18th, so I can address all of the comments only then.

martong updated this revision to Diff 152262.Jun 21 2018, 5:34 AM
martong marked 15 inline comments as done.
  • Addressing Alexei's comments.

D->redecls() gives a list of decls which always starts with D and the next item is actually the previous item in the order of source locations. But this is a circular list, so once we reached the first item then the next item will be the last in the order of the source locations.
So, D->getFirstDecl()->redecls() gives a reversed list which has D as the first element. We must make D as the last, so we can easily reverse.


OK, I've refactored the common code into a new import function.


In short, we can't.

The Redeclarable class provides the getPreviousDecl() function and the redecls()range.
The list of the declarations is singly linked circular list and the directions goes backwards.

///  #1 int f(int x, int y = 1); // previousDecl: <pointer to #3>
///  #2 int f(int x = 0, int y); // previousDecl: <pointer to #1>
///  #3 int f(int x, int y) { return x + y; } // previousDecl: <pointer to #2>

I think, it is very important to preserve the order of the Decls as it was in the from context. Keeping the order makes the imported AST to be structurally similar to the original AST. Also, the canonical Decl in the "From" context will be the canonical in the "To" context too (when we import to an empty context).

During the import of #2 if we used getPreviousDecl then we would import first #1 and then #2. The order would be fine. But, during the import of #2 there is no way to import #3, because we can't go forward in the linked list. We could however import #3 during the import of #1, but that would result a #3, #1, #2 order (i.e. a redecl chain where #3 is the canonical decl).

Alternatively we could start the import from the most recent decl (i.e the last) and then recursively import the previous decl up to the first decl, but then we should pass an extra parameter through the function calls of ASTImporter to avoid jumping back to the last decl if we are in the middle of a chain. We could perhaps follow this state in a member variable too, but I consider this way too complex and error prone.

This is the exact reason why we need the getCanonicalForwardRedeclChain() function which provides a list of declarations starting from the first (canonical) decl and ends with the latest redecl, this provides the same list for each element in the redecl chain.


Good catch, thanks! Changed it to check the result.


// FIXME: Actually try to merge the body and other attributes.
This FIXME comment (and some other comments) is in strong contrast to all the ODR diagnostics implemented in ASTImporter. Actually, (generally in the ASTImporter) rather than merging two definitions we diagnose an ODR violation.
Perhaps the ASTImporter is used or had been used previously to merge two compatible function or class definitions. But having two structurally equivalent definitions is against the ODR.
I think we should follow the ODR violation approach here and we should not merge two function body ever.


When we start with a middle element, then once we reach the first redecl then this loop will import the rest of the chain. So, this means when we are going back in the call stack then some Import calls will just return with the already imported decls. This way, we handle with the same code the import of any element in the chain.
This may seem like a recursive madness, but I hope it makes sense.

You may wonder if this will be quadratic in the length of the declaration chain? Will we import each member of the redecl chain and for each member of the chain will we enumerate all the decls in the declaration chain?
The answer is No, because once a member is imported then all other import calls return early with the already imported decl. Consider the import of C from this chain A<-B<-C.

  VisitFunctionDecl(A) // A is imported before any other recursive import to the chain 
       Import(A) // A is already imported, early return
          Import(A) // A is already imported, early return
          Import(B) // B is already imported, early return
          // C is imported

Each VisitFunctionDecl calls getCanonicalForwardRedeclChain which is O(n). And we go once backward the chain and from the first decl we go forward, so we call getCanonicalForwardRedeclChainat maximum 2`n` times . But 99% of the time n <= 2. n is the number of decls in the redecl chain in the actual "From" context.
Also, we plan to implement the squashing of equivalent prototypes in the future, which will limit the length of the chains.


We provide diagnostics during the call of isStructuralMatch check. Now the order of the import changed and isStructuralMatch is called during the import, thus the diagnostic is emitted later.


Good point, thanks!

a_sidorin accepted this revision.Jun 24 2018, 3:28 PM
a_sidorin added a subscriber: a_sidorin.

Hi Gabor,

The patch LG, thank you for addressing my questions. Just some stylish nits.


Could you add comments why these tests are disabled?


Please add a space after '//' and a dot in the end. Same below.


In this patch, sometimes '*' is used for auto, sometimes not. Could we make it consistent?

This revision is now accepted and ready to land.Jun 24 2018, 3:28 PM
martong updated this revision to Diff 152691.Jun 25 2018, 7:39 AM
martong marked 3 inline comments as done.
  • Address review comments
martong added inline comments.Jun 25 2018, 7:39 AM

Added the comment, plus copied the explanation here as well.

This test is disabled, because ATM we create a redundant FunctionDecl. We start the import with the definition of f then we continue with the import of the type of f which involves X. During the import of X we start again the import of the definition of f and then finally we create the node. But then in the first frame of VisitFunctionDecl we create a node again since we do not check if such a node exists yet or not. This is being fixed in a separate patch: .

Backtrace to support the above explanation:

#0  clang::ASTNodeImporter::VisitFunctionDecl (this=0x7fffffffae30, D=0xb02b68) at ../../git/llvm/tools/clang/lib/AST/ASTImporter.cpp:2372
#1  0x00007ffff694adbc in clang::declvisitor::Base<clang::declvisitor::make_ptr, clang::ASTNodeImporter, clang::Decl*>::Visit (this=0x7fffffffae30, D=0xb02b68) at tools/clang/include/clang/AST/
#2  0x00007ffff690fa64 in clang::ASTImporter::Import (this=0x7fffffffca18, FromD=0xb02b68) at ../../git/llvm/tools/clang/lib/AST/ASTImporter.cpp:6821
#3  0x00007ffff6920570 in clang::ASTNodeImporter::VisitFriendDecl (this=0x7fffffffb1e0, D=0xb02c20) at ../../git/llvm/tools/clang/lib/AST/ASTImporter.cpp:2844
#4  0x00007ffff694a95c in clang::declvisitor::Base<clang::declvisitor::make_ptr, clang::ASTNodeImporter, clang::Decl*>::Visit (this=0x7fffffffb1e0, D=0xb02c20) at tools/clang/include/clang/AST/
#5  0x00007ffff690fa64 in clang::ASTImporter::Import (this=0x7fffffffca18, FromD=0xb02c20) at ../../git/llvm/tools/clang/lib/AST/ASTImporter.cpp:6821
#6  0x00007ffff6916a27 in clang::ASTNodeImporter::ImportDeclContext (this=0x7fffffffb9b0, FromDC=0xb02970, ForceImport=true) at ../../git/llvm/tools/clang/lib/AST/ASTImporter.cpp:1161
#7  0x00007ffff69164a0 in clang::ASTNodeImporter::ImportDefinition (this=0x7fffffffb9b0, From=0xb02938, To=0xbcea58, Kind=clang::ASTNodeImporter::IDK_Default) at ../../git/llvm/tools/clang/lib/AST/ASTImporter.cpp:1287
#8  0x00007ffff691bdf0 in clang::ASTNodeImporter::VisitRecordDecl (this=0x7fffffffb9b0, D=0xb02938) at ../../git/llvm/tools/clang/lib/AST/ASTImporter.cpp:2215
#9  0x00007ffff6964678 in clang::declvisitor::Base<clang::declvisitor::make_ptr, clang::ASTNodeImporter, clang::Decl*>::VisitCXXRecordDecl (this=0x7fffffffb9b0, D=0xb02938) at tools/clang/include/clang/AST/
#10 0x00007ffff694abe0 in clang::declvisitor::Base<clang::declvisitor::make_ptr, clang::ASTNodeImporter, clang::Decl*>::Visit (this=0x7fffffffb9b0, D=0xb02938) at tools/clang/include/clang/AST/
#11 0x00007ffff690fa64 in clang::ASTImporter::Import (this=0x7fffffffca18, FromD=0xb02938) at ../../git/llvm/tools/clang/lib/AST/ASTImporter.cpp:6821
#12 0x00007ffff69128e7 in clang::ASTNodeImporter::VisitRecordType (this=0x7fffffffbac0, T=0xb02720) at ../../git/llvm/tools/clang/lib/AST/ASTImporter.cpp:857
#13 0x00007ffff694a254 in clang::TypeVisitor<clang::ASTNodeImporter, clang::QualType>::Visit (this=0x7fffffffbac0, T=0xb02720) at ../../git/llvm/tools/clang/include/clang/AST/TypeNodes.def:92
#14 0x00007ffff690ff05 in clang::ASTImporter::Import (this=0x7fffffffca18, FromT=...) at ../../git/llvm/tools/clang/lib/AST/ASTImporter.cpp:6765
#15 0x00007ffff691107c in clang::ASTNodeImporter::VisitPointerType (this=0x7fffffffbc50, T=0xb02740) at ../../git/llvm/tools/clang/lib/AST/ASTImporter.cpp:570
#16 0x00007ffff6949fb4 in clang::TypeVisitor<clang::ASTNodeImporter, clang::QualType>::Visit (this=0x7fffffffbc50, T=0xb02740) at ../../git/llvm/tools/clang/include/clang/AST/TypeNodes.def:64
#17 0x00007ffff690ff05 in clang::ASTImporter::Import (this=0x7fffffffca18, FromT=...) at ../../git/llvm/tools/clang/lib/AST/ASTImporter.cpp:6765
#18 0x00007ffff6911d21 in clang::ASTNodeImporter::VisitFunctionProtoType (this=0x7fffffffbf90, T=0xb027e0) at ../../git/llvm/tools/clang/lib/AST/ASTImporter.cpp:712
#19 0x00007ffff694a120 in clang::TypeVisitor<clang::ASTNodeImporter, clang::QualType>::Visit (this=0x7fffffffbf90, T=0xb027e0) at ../../git/llvm/tools/clang/include/clang/AST/TypeNodes.def:80
#20 0x00007ffff690ff05 in clang::ASTImporter::Import (this=0x7fffffffca18, FromT=...) at ../../git/llvm/tools/clang/lib/AST/ASTImporter.cpp:6765
#21 0x00007ffff691d55a in clang::ASTNodeImporter::VisitFunctionDecl (this=0x7fffffffc990, D=0xb02840) at ../../git/llvm/tools/clang/lib/AST/ASTImporter.cpp:2482
#22 0x00007ffff694adbc in clang::declvisitor::Base<clang::declvisitor::make_ptr, clang::ASTNodeImporter, clang::Decl*>::Visit (this=0x7fffffffc990, D=0xb02840) at tools/clang/include/clang/AST/
#23 0x00007ffff690fa64 in clang::ASTImporter::Import (this=0x7fffffffca18, FromD=0xb02840) at ../../git/llvm/tools/clang/lib/AST/ASTImporter.cpp:6821
#24 0x00000000004d4673 in clang::ast_matchers::ASTImporterTestBase::Import (this=0xa9e620, From=0xb02840, ToLang=clang::ast_matchers::Lang_CXX) at ../../git/llvm/tools/clang/unittests/AST/ASTImporterTest.cpp:300
This revision was automatically updated to reflect the committed changes.

Sure, looking into it.

This is not trivial to fix. Reverting until we can reproduce and fix it.
Reverted with commit: r335491

The broken lldb tests are fixed with a minor change. We no longer load the Decls from the
external source during the call of DeclContext::containsDecl. A new function
DeclContext::containsDeclAndLoad is added which does a load and calls

Re-apply commit: r335731

The broken lldb tests are fixed with a minor change. We no longer load the Decls from the
external source during the call of DeclContext::containsDecl. A new function
DeclContext::containsDeclAndLoad is added which does a load and calls

Re-apply commit: r335731

Thank you.