This is an archive of the discontinued LLVM Phabricator instance.

Make iteration over the DeclContext::lookup_result safe.
ClosedPublic

Authored by v.g.vassilev on Nov 16 2020, 2:48 AM.

Details

Summary

The idiom:

DeclContext::lookup_result R = DeclContext::lookup(Name);
for (auto *D : R) {}

is not safe when in the loop body we trigger deserialization from an AST file. The deserialization can insert new declarations in the StoredDeclsList whose underlying type is a vector. When the vector decides to reallocate its storage the pointer we hold becomes invalid.

This patch replaces a SmallVector with an std::list. The current approach stores a SmallVector<NamedDecl*, 4> which is around 8 pointers. The linked list is 3, 5, or 7. We do better in terms of memory usage for small cases (and worse in terms of locality -- the linked list entries won't be near each other, but will be near their corresponding declarations, and we were going to fetch those memory pages anyway). For larger cases: the vector uses a doubling strategy for reallocation, so will generally be between half-full and full. Let's say it's 75% full on average, so there's N * 4/3 + 4 pointers' worth of space allocated currently and will be 2N pointers with the linked list. So we break even when there are N=6 entries and slightly lose in terms of memory usage after that. We suspect that's still a win on average.

Many thanks to @rsmith.

Diff Detail

Event Timeline

v.g.vassilev requested review of this revision.Nov 16 2020, 2:48 AM
v.g.vassilev created this revision.
v.g.vassilev retitled this revision from Iterating over the DeclContext::lookup_result is unsafe. to Make iteration over the DeclContext::lookup_result is safe..
v.g.vassilev edited the summary of this revision. (Show Details)

Take another approach at solving this issue. Instead we are making the lookup_result safe.

v.g.vassilev retitled this revision from Make iteration over the DeclContext::lookup_result is safe. to Make iteration over the DeclContext::lookup_result safe..

Invert the diff, fix a typo.

v.g.vassilev added inline comments.Nov 21 2020, 3:29 PM
clang/include/clang/AST/DeclBase.h
1305–1306

I could not get rid of the now slow size() operation because in some cases we had if (Lookup1.size() == Lookup2.size())...

rsmith added inline comments.Nov 24 2020, 1:03 PM
clang/include/clang/AST/DeclContextInternals.h
34–35

It's not reasonable to use std::list for this; it will introduce too much allocation overhead.

Instead, how about a representation such as:

struct ListNode;
using Decls = llvm::PointerUnion<NamedDecl*, ListNode*>;
struct ListNode {
  NamedDecl *D;
  Decls Rest;
};

using DeclsAndHasExternalTy = llvm::PointerIntPair<Decls, 1, bool>;
DeclsAndHasExternalTy Data;
266–295

I think we should remove all assumptions about the order of declarations in a DeclContextLookupResult. The invariants maintained here can be expensive and are almost never useful.

clang/lib/Sema/SemaDeclCXX.cpp
15105–15111

The assumption made by this code was wrong, and is already fixed in trunk; you need to rebase.

clang/lib/Sema/SemaObjCProperty.cpp
1144–1145

This should presumably be a loop over the lookup results instead of assuming that the property decl, if it exists, will be the first declaration.

clang/lib/Sema/SemaTemplateInstantiate.cpp
3421–3424

This should be a loop over the results looking for the FieldDecl. (There can also be a TagDecl with the same name in the same scope.)

v.g.vassilev marked 2 inline comments as done.

Upload the current changes. I need to update the users such as lldb, clang-tools-extra and clangd. It has some rough edges wrt modules.

rsmith added inline comments.Dec 8 2020, 10:41 AM
clang/include/clang/AST/DeclBase.h
1234–1237

I think we never need to allocate a ListNode unless we have two or more declarations to track, so a ListNode can never be empty and neither D nor Rest can ever be null.

1249–1255

This doesn't look correct for a non-const iterator: modifying the result of this will modify the ListNode for all but the last list entry, but will modify only the data inside the iterator for the last item in the list.

Can we make this strictly a const iterator? I would hope people aren't mutating the lookup results they get back from DeclContext lookups :) We can open-code the list traversal in HandleRedeclaration.

1285–1294

Do we need this operation? It would be more efficient to support only push_front.

1285–1316

I don't think we should provide any of the operations to mutate a ListNode in DeclBase.h. Instead, how about we make StoredDeclsList a friend and move all this stuff to there?

1287

This should be allocated on the ASTContext's allocator.

1327

We should move this operation out of DeclContextLookupResult and give it a name that better expresses its purpose (eg: is an inline namespace qualifier necessary to disambiguate?).

How about moving this to a function such as:

bool NamespaceDecl::isRedundantInlineQualifierFor(DeclarationName Name) {
  if (!isInline())
    return false;
  auto X = lookup(Name);
  auto Y = getParent()->lookup(Name);
  return std::distance(X.begin(), X.end()) == std::distance(Y.begin(), Y.end());
}

... used as (for example):

-    if (Policy.SuppressInlineNamespace && NS->isInline() && NameInScope &&
-        DC->getParent()->lookup(NameInScope).equals(DC->lookup(NameInScope)))
+    if (Policy.SuppressInlineNamespace && NameInScope &&
+        NS->isRedundantInlineQualifierFor(NameInScope))
clang/include/clang/AST/DeclContextInternals.h
146–147

We don't need this complication with the new model; you should be able to replace this with Data.setInt(1);.

146–147

We used to clear the "has external decls" flag here. Should we still do so, or was that a bug?

152

This previously did not clear the "has external decls" flag. Presumably it still should not do so.

152

This seems impossible now. (Previously we only removed one element and this was checking there were no duplicates, but the new implementation of erase removes all matching elements rather than stopping after the first one.)

272–273

Looks like we end up with one too many ListNodes along this path. When we go from one declaration to two, we need only one ListNode but will allocate two of them.

clang/lib/AST/DeclBase.cpp
1508–1509

It seems wasteful to check whether the element is in the list here -- we're performing two passes where we only need one. Can we remove the assertion from StoredDeclsMap::remove that the element is present there, and replace all of this with a call to Pos->second.remove(ND)?

1900–1901

Please consider replacing setOnlyValue and AddSubsequentDecl with a single addDecl function. The caller shouldn't care about whether StoredDeclsList has a special case for there being only one declaration.

clang/lib/AST/ExternalASTMerger.cpp
73–81

I think this comment also applies to the return nullptr above. Consider moving the comment to before the new if.

v.g.vassilev marked 17 inline comments as done.

Store current state, address some comments, a few bugs still to address...

Minor typos, conciseness..

Add documentation, resolve address comments, update in-tree clients.

We still have several failures due to diag locations which seem to stem from the current design. The current failures are -- https://paste.ubuntu.com/p/MJSCQgm26P/

rsmith added inline comments.Jan 25 2021, 5:48 PM
clang/include/clang/AST/ASTContext.h
666

Should we clear out Alloc->Rest here too?

clang/include/clang/AST/DeclBase.h
1227
1232

Consider renaming DeclListNode::DeclListIterator to DeclListNode::iterator. I don't think we benefit from having both a public name and a private name for it.

1251–1257

This appears to be equivalent, and slightly simpler.

1257

I would expect it->x to mean the same as (*it).x (which is invalid), but it actually means (**it).x. Can you remove this, and replace the pointer type with void to indicate we don't support operator->?

1262–1263

Is this assert different from the previous one? Do we need both?

1266–1269

Looks like this can be simplified a little.

1286

Minor simplification, NFC.

1306
clang/include/clang/AST/DeclContextInternals.h
35

Can you give this a better name? Or remove it -- I'm not sure we need a typedef for this.

75–78

This seems to be adding the new declaration as the second-last declaration in the list, not as the first declaration as the name of the function suggests.

89–111

Maybe something like this.

108

We mostly maintain the invariant that the Rest pointer of a list node is not null. I think this is the only deviation from that, and it seems like we can address that here -- in this case, I think we want to remove and deallocate Prev, replacing it with its declaration. I'll suggest edits.

156–160

This can now be simplified.

162–212

I think this can be simplified too.

290–302

Looks like this will dump the same thing twice. I assume that's just for debugging and can be simplified?

clang/lib/ARCMigrate/ObjCMT.cpp
650–652

May as well simplify this further while we're here.

clang/lib/AST/CXXInheritance.cpp
390

The "uninitialized" E here makes me uncomfortable as a reader. Maybe we could add a static DeclContext::lookup_iterator::end() and use that here? (E = I.end() if you want to get a little cute with it.)

clang/lib/Sema/SemaObjCProperty.cpp
114–115

May as well use the shiny new toys :)

233–234

Can use find_first here too.

v.g.vassilev marked 18 inline comments as done.

Address most of the comments.

The current version of the patch causes the following failures -- https://paste.ubuntu.com/p/Z4yKpK6rXq/

They are three kinds:

  • the order of the overload candidates depends on the lookup order -- I'd rather make the candidate set resilient to the lookup order if that's feasible
  • CodeGen's multi version function support depends on the lookup order -- there we have some ordering that depends and we may be able to incorporate another criterion rather than adjusting the test failures (CodeGenModule::emitMultiVersionFunctions)
  • ObjC redeclaration implementation goes into an infinite loop -- the new model assumes new declarations to be prepended in the StoredDeclsList. However, in ObjC redecls are modeled in the ASTContext and when we are done we try finding the decl we started from but we find it via getMethod doing a lookup (however we started writing the middle decl and we never hit the termination rule).

For a) we can adjust the test cases but

v.g.vassilev added inline comments.Jan 26 2021, 11:55 AM
clang/include/clang/AST/DeclBase.h
1262–1263

Indeed! That patch had too implementation versions already :)

clang/include/clang/AST/DeclContextInternals.h
89–111

For the time being I fail to integrate the proposed change. My current diff is :

diff --git a/clang/include/clang/AST/DeclContextInternals.h b/clang/include/clang/AST/DeclContextInternals.h
index 63acc1d4396a..6aec759d87f0 100644
--- a/clang/include/clang/AST/DeclContextInternals.h
+++ b/clang/include/clang/AST/DeclContextInternals.h
@@ -32,7 +32,6 @@ class DependentDiagnostic;
 /// An array of decls optimized for the common case of only containing
 /// one entry.
 class StoredDeclsList {
-  using DeclsTy = DeclListNode;
   using Decls = DeclListNode::Decls;
 
   /// A collection of declarations, with a flag to indicate if we have
@@ -62,7 +61,7 @@ class StoredDeclsList {
     // to the next to point to OldD. This way we save an extra allocation.
     if (NamedDecl *OldD = getAsDecl()) {
       //assert(OldD != ND && "list already contains decl");
-      DeclsTy *Node = C.AllocateDeclListNode(OldD);
+      DeclListNode *Node = C.AllocateDeclListNode(OldD);
       Node->Rest = ND;
       Data.setPointer(Node);
       return;
@@ -71,40 +70,38 @@ class StoredDeclsList {
     // (void)Vec;
     // assert(llvm::find(Vec, ND) == Vec.end() && "list still contains decl");
 
-    DeclsTy *Node = C.AllocateDeclListNode(ND);
-    // DeclsTy *Cur = getAsList();
-    // while(Cur->Rest.dyn_cast<DeclsTy*>())
-    //   Cur = Cur->Rest.get<DeclsTy*>();
-    // Node->Rest = Cur->Rest;
-    // Cur->Rest = Node;
+    DeclListNode *Node = C.AllocateDeclListNode(ND);
     Node->Rest = Data.getPointer();
     Data.setPointer(Node);
   }
 
   template<typename Pred>
   void erase_if(Pred pred) {
-    assert(getAsList() && "Not in list mode!");
     ASTContext &C = getASTContext();
-    for (Decls N = Data.getPointer(), Prev = N; N; ) {
-      if (DeclsTy *L = N.dyn_cast<DeclsTy*>()) {
+    for (Decls N = Data.getPointer(), *Prev = nullptr; N; ) {
+      if (DeclListNode *L = N.dyn_cast<DeclListNode*>()) {
         if (pred(L->D)) {
-          if (L == getAsList())
-            Data.setPointer(L->Rest);
+          if (Prev)
+            Prev->get<DeclListNode*>()->Rest = L->Rest;
           else
-            Prev.get<DeclsTy*>()->Rest = L->Rest;
+            Data.setPointer(L->Rest);
           N = L->Rest;
           C.DeallocateDeclListNode(L);
         } else {
           N = L->Rest;
-          Prev = L;
+          if (Prev)
+            Prev = &Prev->get<DeclListNode*>()->Rest;
         }
       } else {
         NamedDecl *ND = N.get<NamedDecl*>();
         if (pred(ND)) {
-          if (getAsDecl())
+          if (Prev) {
+            DeclListNode *PrevNode = Prev->get<DeclListNode*>();
+            *Prev = PrevNode->D;
+            C.DeallocateDeclListNode(PrevNode);
+          } else {
             Data.setPointer(nullptr);
-          else
-            Prev.get<DeclsTy*>()->Rest = nullptr;
+          }
         }
         break;
       }
@@ -132,8 +129,8 @@ public:
     // If this is a list-form, free the list.
     ASTContext &C = getASTContext();
     Decls List = Data.getPointer();
-    while(List.is<DeclsTy*>()) {
-      DeclsTy *ToDealloc = List.get<DeclsTy*>();
+    while (List.is<DeclListNode*>()) {
+      DeclListNode *ToDealloc = List.get<DeclListNode*>();
       List = ToDealloc->Rest;
       C.DeallocateDeclListNode(ToDealloc);
     }
@@ -167,8 +164,8 @@ public:
     return getAsListAndHasExternal().getPointer().dyn_cast<NamedDecl *>();
   }
 
-  DeclsTy *getAsList() const {
-    return getAsListAndHasExternal().getPointer().dyn_cast<DeclsTy*>();
+  DeclListNode *getAsList() const {
+    return getAsListAndHasExternal().getPointer().dyn_cast<DeclListNode*>();
   }
 
   bool hasExternalDecls() const {
@@ -192,16 +189,10 @@ public:
 
   /// Remove any declarations which were imported from an external AST source.
   void removeExternalDecls() {
-    if (isNull()) {
-      // Nothing to do.
-    } else if (NamedDecl *Singleton = getAsDecl()) {
-      if (Singleton->isFromASTFile())
-        *this = StoredDeclsList();
-    } else {
+    if (!isNull())
       erase_if([](NamedDecl *ND){ return ND->isFromASTFile(); });
-      // Don't have any external decls any more.
-      Data.setInt(0);
-    }
+    // Don't have any external decls any more.
+    Data.setInt(0);
   }
 
   /// getLookupResult - Return an array of all the decls that this list
@@ -222,7 +213,7 @@ public:
     }
 
     // Determine if this declaration is actually a redeclaration.
-    for (DeclsTy* N = getAsList(); N; N = N->Rest.dyn_cast<DeclsTy*>()) {
+    for (auto N = getAsList(); N; N = N->Rest.dyn_cast<DeclListNode*>()) {
       if (D->declarationReplaces(N->D, IsKnownNewer)) {
         N->D = D;
         return true;

I get these failures -- https://paste.ubuntu.com/p/S6kzg5ngjy/

clang/lib/Sema/SemaObjCProperty.cpp
114–115

Good catch, thanks!

rsmith added inline comments.Feb 8 2021, 1:10 PM
clang/include/clang/AST/DeclBase.h
1239

I think this should be void to indicate that we don't support operator->.

clang/include/clang/AST/DeclContextInternals.h
60–68

This switches the order of the first two declarations in the list; after adding a few declarations in the order D1, D2, D3, D4, we'll have: {.D = D4, .Rest = {.D = D3, .Rest = {.D = D1, .Rest = D2}}}.

I think you'd get a more consistent ordering if you removed this if.

89–111

Suggested alternative: https://godbolt.org/z/szefda

v.g.vassilev marked 5 inline comments as done.

Address some of Richard's comments.

Make visitation of multiversion functions independent on the lookup ordering; adjust tests.

More fixes suggested by @rsmith

Adjust the two tests I failed to merge.

rsmith added inline comments.Mar 3 2021, 12:48 PM
clang/include/clang/AST/DeclBase.h
1261

This triggers a compile error for me: this use of Ptr requires us to compute the alignment of a NamedDecl in order to determine how to lay out a PointerUnion<NamedDecl*, ...>, but we've not seen the definition of NamedDecl yet.

Adding this resolves the problem, but it seems a bit awkward and I wonder if there's a more elegant fix:

namespace llvm {
template <> struct PointerLikeTypeTraits<::clang::NamedDecl *> {
  static inline void *getAsVoidPointer(::clang::NamedDecl *P) { return P; }
                  
  static inline ::clang::NamedDecl *getFromVoidPointer(void *P) {
    return static_cast<::clang::NamedDecl *>(P);
  }
                              
  static constexpr int NumLowBitsAvailable = 3;
};                      
}

Add changes proposed by @rsmith, add an assertion for duplicate entries.

rsmith accepted this revision.Mar 16 2021, 5:11 PM

Thanks! Looks good. I've tested this on our side and it seems to work well. Let's land this and see if any further problems shake out.

This revision is now accepted and ready to land.Mar 16 2021, 5:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2021, 5:11 PM
This revision was landed with ongoing or failed builds.Mar 17 2021, 2:00 AM
This revision was automatically updated to reflect the committed changes.
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 17 2021, 2:00 AM

argh... I only now see the pre-merge suggestions -- almost all of them seem not super essential -- let me know if I should address them in a subsequent commit...