This is an archive of the discontinued LLVM Phabricator instance.

[Modules][PR39287] Consolidate multiple std's
Needs ReviewPublic

Authored by modocache on Mar 4 2019, 12:41 PM.

Details

Reviewers
rsmith
andrewjcg
Summary

This patch addresses https://bugs.llvm.org/show_bug.cgi?id=39287.

Clang creates a 'std' namespace in one of two ways: either it parses a
namespace std { ... } declaration, or it creates one implicitly. For the
implicit case clang::Sema maintains a pointer StdNamespace and either
gets an existing namespace decl, lazily deserializes one from an external
AST, or lazily creates a 'std' namespace decl.

When using modules, however, a different instance of Sema is used, with a
different StdNamespace pointer. So a new instance of the 'std' namespace may
be created when compiling a module.

Normally, multiple definitions of the same namespace would be merged across
modules. However, finding multiple definitions is done using the standard
lookup machinery in Clang, which ignores the implicit 'std' namespaces lazily
created by Sema. This results in bugs in which members of one module's 'std'
namespace are not visible to another's.

One case in which issues may arise is when Clang creates an implicit 'std'
namespace when it encounters a virtual destructor in C++17. This results in
the implicit definition of operator new and delete overloads that take a
std::align_val_t argument (this behavior was added in rL282800). The tests
in this patch demonstrate two existing bugs that are fixed by this patch:

  • mod-virtual-destructor-bug defines std::type_info within the main translation unit. When it does so, there are two 'std' namespaces available: the implicitly defined one that is returned by Sema::getStdNamespace, and the explicit one defined in the a.h module. The std::type_info definition ends up being attached to the module's std namespace, and so the subsequent lookup of getStdNamespace's type_info member fails.
  • mod-virtual-destructor-bug-two defines std::type_info within a module. Later, the lookup of std::type_info finds the implicitly created getStdNamespace, which only defines std::align_val_t, not std::type_info.

To fix, this patch modifies ASTDeclReader::findExisting such that an
explicit check is made: "are we trying to merge multiple definitions of a
namespace named 'std'?" If so, it explicitly grabs Sema::StdNamespace
(taking care not to trigger its lazy deserialization) to merge the
definitions.

Diff Detail

Event Timeline

modocache created this revision.Mar 4 2019, 12:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2019, 12:41 PM

I realize this isn't the correct solution, but would any would-be reviewers like to comment on the problem? Whether it's here or on the Bugzilla report https://bugs.llvm.org/show_bug.cgi?id=39287, as a newcomer to Clang modules I could use some help understanding whether this sort of behavior is expected, or if there are known workarounds. Any and all comments appreciated!

Sorry for the delay. Just catching up on the code this covers, so apologies if the questions don't make sense.

lib/Sema/SemaDeclCXX.cpp
9214–9215 ↗(On Diff #189191)

I think you mentioned in another forum that one side effect of this change is that it's causing multiple candidates for names in std. Is this the implicit align constructros/destructors? Is this because we're marking the implicit namespace as being externally visible?

lib/Serialization/ASTReader.cpp
9291–9295 ↗(On Diff #189191)

How does modules normally "connect up" definitions of the same namspace occurring in the imported module? Is it done here (e.g. so that a name lookup will search all namespace definitions)? Is an alternate solution to this diff to make sure this handles the std implicit special case? Apologies for the naivety here -- still getting up to speed on this code.

I expect that we get this wrong in the same way for all the SemaDeclRefs declarations (StdNamespace, StdBadAlloc, and StdAlignValT).

I think the place where this is going wrong is ASTDeclReader::findExisting, which is assuming that the prior declaration for normal NamedDecls can be found by name lookup; that's not true for these particular implicitly-declared entities. Perhaps the simplest fix would be to add a check around here:

} else if (DeclContext *MergeDC = getPrimaryContextForMerging(Reader, DC)) {
  DeclContext::lookup_result R = MergeDC->noload_lookup(Name);
  for (DeclContext::lookup_iterator I = R.begin(), E = R.end(); I != E; ++I) {
    if (NamedDecl *Existing = getDeclForMerging(*I, TypedefNameForLinkage))
      if (isSameEntity(Existing, D))
        return FindExistingResult(Reader, D, Existing, AnonymousDeclNumber,
                                  TypedefNameForLinkage);
  }
  // [HERE]
}

... to see if D is one of our three special entities, and if so to ask Sema for its current declaration of that entity (without triggering deserialization) and if not, set D as the declaration of that entity.

lib/Sema/SemaDeclCXX.cpp
8915–8919 ↗(On Diff #189191)

This doesn't seem right to me. This change appears to affect the case where we already have two distinct std namespaces and are declaring a third, which is too late -- things already went wrong if we reach that situation.

9215 ↗(On Diff #189191)

This also doesn't look right: the "has external visible storage" flag should only be set by the external source, to indicate that it has lookup results for the decl context.

Thanks @rsmith for the guidance here! I appreciate it very much. One snag I ran into after following your suggestion, though, is that when I modify ASTDeclReader::findExisting to return Sema's existing implicit std namespace, I run into an assertion later on, when the decl chain is being marked as incomplete. That is, the following patch (debug output included):

diff --git a/lib/Serialization/ASTReaderDecl.cpp b/lib/Serialization/ASTReaderDecl.cpp
index 32bd82d077..9d447952e1 100644
--- a/lib/Serialization/ASTReaderDecl.cpp
+++ b/lib/Serialization/ASTReaderDecl.cpp
@@ -47,6 +47,7 @@
 #include "clang/Basic/SourceLocation.h"
 #include "clang/Basic/Specifiers.h"
 #include "clang/Sema/IdentifierResolver.h"
+#include "clang/Sema/Sema.h"
 #include "clang/Serialization/ASTBitCodes.h"
 #include "clang/Serialization/ASTReader.h"
 #include "clang/Serialization/ContinuousRangeMap.h"
@@ -3401,6 +3402,22 @@ ASTDeclReader::FindExistingResult ASTDeclReader::findExisting(NamedDecl *D) {
           return FindExistingResult(Reader, D, Existing, AnonymousDeclNumber,
                                     TypedefNameForLinkage);
     }
+    if (isa<NamespaceDecl>(D) && D->getName() == "std") {
+      llvm::outs() << "Found std namespace: ";
+      D->dump();
+      llvm::outs() << "Merging into Sema std namespace: ";
+      Sema *S = Reader.getSema();
+      S->getStdNamespace()->dump();
+      NamedDecl *Existing =
+          getDeclForMerging(S->getStdNamespace(), TypedefNameForLinkage);
+      llvm::outs() << "Found Existing: ";
+      Existing->dump();
+      if (isSameEntity(Existing, D)) {
+        llvm::outs() << "isSameEntity is true\n";
+        return FindExistingResult(Reader, D, Existing, AnonymousDeclNumber,
+                                  TypedefNameForLinkage);
+      }
+    }
   } else {
     // Not in a mergeable context.
     return FindExistingResult(Reader);

Results in the expected output, along with an unexpected assert:

Found std namespace: NamespaceDecl 0x55a391530910 <mod-virtual-destructor-bug/a.h:1:1, col:16> col:11 imported in a.h std
Merging into Sema std namespace: NamespaceDecl 0x55a391501ec8 <<invalid sloc>> <invalid sloc> implicit std
Found Existing: NamespaceDecl 0x55a391501ec8 <<invalid sloc>> <invalid sloc> implicit std
isSameEntity is true
clang-9: ../include/llvm/ADT/PointerUnion.h:135: T llvm::PointerUnion<PT1, PT2>::get() const [with T = clang::LazyGenerationalUpdatePtr<const clang::Decl*, clang::Decl*, &clang::ExternalASTSource::CompleteRedeclChain>; PT1 = llvm::PointerUnion<clang::Decl*, const void*>; PT2 = clang::LazyGenerationalUpdatePtr<const clang::Decl*, clang::Decl*, &clang::ExternalASTSource::CompleteRedeclChain>]: Assertion `is<T>() && "Invalid accessor called"' failed.
Stack dump:
0.      Program arguments: /home/modocache/Source/llvm/git/dev/llvm/build/bin/clang-9 -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -mrelax-all -disable-free -main-file-name mod-virtual-destructor-bug.cpp -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -dwarf-column-info -debugger-tuning=gdb -resource-dir /home/modocache/Source/llvm/git/dev/llvm/build/lib/clang/9.0.0 -I mod-virtual-destructor-bug -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/x86_64-linux-gnu/c++/8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/x86_64-linux-gnu/c++/8 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/backward -internal-isystem /usr/local/include -internal-isystem /home/modocache/Source/llvm/git/dev/llvm/build/lib/clang/9.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -std=c++17 -fdeprecated-macro -fdebug-compilation-dir /home/modocache/Source/tmp/mod -ferror-limit 19 -fmessage-length 195 -fmodules -fimplicit-module-maps -fmodules-cache-path=mod-virtual-destructor-cache -fmodules-validate-system-headers -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -o /tmp/mod-virtual-destructor-bug-0da92f.o -x c++ mod-virtual-destructor-bug.cpp -faddrsig
1.      mod-virtual-destructor-bug.cpp:10:17: current parser token 'class'
 #0 0x000055a388bfaaa5 llvm::sys::PrintStackTrace(llvm::raw_ostream&) /home/modocache/Source/llvm/git/dev/llvm/build/../lib/Support/Unix/Signals.inc:494:22
 #1 0x000055a388bfab38 PrintStackTraceSignalHandler(void*) /home/modocache/Source/llvm/git/dev/llvm/build/../lib/Support/Unix/Signals.inc:558:1
 #2 0x000055a388bf8b32 llvm::sys::RunSignalHandlers() /home/modocache/Source/llvm/git/dev/llvm/build/../lib/Support/Signals.cpp:68:20
 #3 0x000055a388bfa4fb SignalHandler(int) /home/modocache/Source/llvm/git/dev/llvm/build/../lib/Support/Unix/Signals.inc:357:1
 #4 0x00007f909ab86dd0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12dd0)
 #5 0x00007f909a249077 raise /build/glibc-B9XfQf/glibc-2.28/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #6 0x00007f909a22a535 abort /build/glibc-B9XfQf/glibc-2.28/stdlib/abort.c:81:7
 #7 0x00007f909a22a40f get_sysdep_segment_value /build/glibc-B9XfQf/glibc-2.28/intl/loadmsgcat.c:509:8
 #8 0x00007f909a22a40f _nl_load_domain /build/glibc-B9XfQf/glibc-2.28/intl/loadmsgcat.c:970:34
 #9 0x00007f909a23a142 (/lib/x86_64-linux-gnu/libc.so.6+0x32142)
#10 0x000055a388ec8ee7 clang::LazyGenerationalUpdatePtr<clang::Decl const*, clang::Decl*, &(clang::ExternalASTSource::CompleteRedeclChain(clang::Decl const*))> llvm::PointerUnion<llvm::PointerUnion<clang::Decl*, void const*>, clang::LazyGenerationalUpdatePtr<clang::Decl const*, clang::Decl*, &(clang::ExternalASTSource::CompleteRedeclChain(clang::Decl const*))> >::get<clang::LazyGenerationalUpdatePtr<clang::Decl const*, clang::Decl*, &(clang::ExternalASTSource::CompleteRedeclChain(clang::Decl const*))> >() const /home/modocache/Source/llvm/git/dev/llvm/build/../include/llvm/ADT/PointerUnion.h:135:5
#11 0x000055a38986268d clang::Redeclarable<clang::NamespaceDecl>::DeclLink::markIncomplete() /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/include/clang/AST/Redeclarable.h:156:50
#12 0x000055a38985907a void clang::ASTDeclReader::markIncompleteDeclChainImpl<clang::NamespaceDecl>(clang::Redeclarable<clang::NamespaceDecl>*) /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/Serialization/ASTReaderDecl.cpp:3651:1
#13 0x000055a38983e243 clang::ASTReader::markIncompleteDeclChain(clang::Decl*) /home/modocache/Source/llvm/git/dev/llvm/build/tools/clang/include/clang/AST/DeclNodes.inc:105:1
#14 0x000055a389795c25 clang::ASTReader::finishPendingActions() /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/Serialization/ASTReader.cpp:9309:5
#15 0x000055a3897a27df clang::ASTReader::FinishedDeserializing() /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/Serialization/ASTReader.cpp:11563:5
#16 0x000055a3897ad297 clang::ExternalASTSource::Deserializing::~Deserializing() /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/include/clang/AST/ExternalASTSource.h:90:5
#17 0x000055a38978d994 clang::ASTReader::FindExternalVisibleDeclsByName(clang::DeclContext const*, clang::DeclarationName) /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/Serialization/ASTReader.cpp:7561:35
#18 0x000055a38c4c0fa8 clang::DeclContext::lookup(clang::DeclarationName) const (.localalias.2) /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/AST/DeclBase.cpp:1635:60
#19 0x000055a38bb9dd35 LookupDirect(clang::Sema&, clang::LookupResult&, clang::DeclContext const*) /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/Sema/SemaLookup.cpp:845:23
#20 0x000055a38bba142f clang::Sema::LookupQualifiedName(clang::LookupResult&, clang::DeclContext*, bool) /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/Sema/SemaLookup.cpp:2055:3
#21 0x000055a38b86d705 clang::Sema::ActOnStartNamespaceDef(clang::Scope*, clang::SourceLocation, clang::SourceLocation, clang::SourceLocation, clang::IdentifierInfo*,clang::SourceLocation, clang::ParsedAttributesView const&, clang::UsingDirectiveDecl*&) /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/Sema/SemaDeclCXX.cpp:8915:25
#22 0x000055a38b41cbae clang::Parser::ParseNamespace(clang::DeclaratorContext, clang::SourceLocation&, clang::SourceLocation) /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/Parse/ParseDeclCXX.cpp:215:53
#23 0x000055a38b3fe8a5 clang::Parser::ParseDeclaration(clang::DeclaratorContext, clang::SourceLocation&, clang::Parser::ParsedAttributesWithRange&) /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/Parse/ParseDecl.cpp:1714:43
#24 0x000055a38b3e11e6 clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) (.localalias.1) /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/Parse/Parser.cpp:856:77
#25 0x000055a38b3e0861 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, bool) /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/Parse/Parser.cpp:674:42
#26 0x000055a38b3dc6dc clang::ParseAST(clang::Sema&, bool, bool) /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/Parse/ParseAST.cpp:158:37
#27 0x000055a3895f86ef clang::ASTFrontendAction::ExecuteAction() /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/Frontend/FrontendAction.cpp:1035:11
#28 0x000055a389e277f9 clang::CodeGenAction::ExecuteAction() /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/CodeGen/CodeGenAction.cpp:1057:1
#29 0x000055a3895f80de clang::FrontendAction::Execute() /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/Frontend/FrontendAction.cpp:938:38
#30 0x000055a3895966d6 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/Frontend/CompilerInstance.cpp:945:24
#31 0x000055a38974e262 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:271:38
#32 0x000055a387732e03 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/tools/driver/cc1_main.cpp:225:40
#33 0x000055a38772871d ExecuteCC1Tool(llvm::ArrayRef<char const*>, llvm::StringRef) /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/tools/driver/driver.cpp:309:64
#34 0x000055a387728dfb main /home/modocache/Source/llvm/git/dev/llvm/build/../tools/clang/tools/driver/driver.cpp:381:26
#35 0x00007f909a22c09b __libc_start_main /build/glibc-B9XfQf/glibc-2.28/csu/../csu/libc-start.c:342:3
#36 0x000055a387726eea _start (/home/modocache/Source/llvm/git/dev/llvm/build/bin/clang-9+0x2070eea)
fish: “"/home/modocache/Source/llvm/gi…” terminated by signal SIGABRT (Abort)

I'm looking into how to avoid this assertion, but for the time being I thought I'd post an update to let everyone know this patch isn't abandoned :)

The problem may well be that you're recursively triggering deserialization of the std namespace from its own deserialization.

@@ -3401,6 +3402,22 @@ ASTDeclReader::FindExistingResult ASTDeclReader::findExisting(NamedDecl *D) {
           return FindExistingResult(Reader, D, Existing, AnonymousDeclNumber,
                                     TypedefNameForLinkage);
     }
+    if (isa<NamespaceDecl>(D) && D->getName() == "std") {
+      llvm::outs() << "Found std namespace: ";
+      D->dump();
+      llvm::outs() << "Merging into Sema std namespace: ";
+      Sema *S = Reader.getSema();
+      S->getStdNamespace()->dump();

You shouldn't call getStdNamespace() here, because that will trigger deserialization. You should query the LazyDeclPtr directly to see if there's an already-deserialized std namespace.

+      NamedDecl *Existing =
+          getDeclForMerging(S->getStdNamespace(), TypedefNameForLinkage);

(No need to call this, std isn't an anonymous struct.)

+      llvm::outs() << "Found Existing: ";
+      Existing->dump();
+      if (isSameEntity(Existing, D)) {
+        llvm::outs() << "isSameEntity is true\n";
+        return FindExistingResult(Reader, D, Existing, AnonymousDeclNumber,
+                                  TypedefNameForLinkage);
+      }
+    }
   } else {
     // Not in a mergeable context.
     return FindExistingResult(Reader);

You should also change FindExistingResult::~FindExistingResult to update Sema::StdNamespace to point to your newly-deserialized namespace if you didn't find a prior declaration of it, so that Sema::getStdNamespace() finds the deserialized namespace.

modocache updated this revision to Diff 199708.May 15 2019, 6:17 PM

Thanks for the help, @rsmith! Your suggestions were spot-on. (It took me a little while to figure out why, even using the LazyDeclPtr directly, I was still triggering deserialization. It turns out dump() causes deserialization too -- whoops!)

You should also change FindExistingResult::~FindExistingResult to update Sema::StdNamespace to point to your newly-deserialized namespace if you didn't find a prior declaration of it, so that Sema::getStdNamespace() finds the deserialized namespace.

I haven't done this yet. I'm trying to think of a test case that would fail if this were not done properly -- or would there not be one?

Herald added a project: Restricted Project. · View Herald TranscriptMay 15 2019, 6:17 PM
modocache updated this revision to Diff 199709.May 15 2019, 6:18 PM

Oops, sent the patch from the wrong repository.

modocache edited the summary of this revision. (Show Details)May 15 2019, 6:21 PM
modocache removed a project: Restricted Project.
modocache removed a subscriber: llvm-commits.

Thanks for the help, @rsmith! Your suggestions were spot-on. (It took me a little while to figure out why, even using the LazyDeclPtr directly, I was still triggering deserialization. It turns out dump() causes deserialization too -- whoops!)

You should also change FindExistingResult::~FindExistingResult to update Sema::StdNamespace to point to your newly-deserialized namespace if you didn't find a prior declaration of it, so that Sema::getStdNamespace() finds the deserialized namespace.

I haven't done this yet. I'm trying to think of a test case that would fail if this were not done properly -- or would there not be one?

I think the failure case is a little complex to set up. I think you need:

  • two modules each of which contains an invisible (implicitly-declared) std namespace
  • arrange for Sema::StdNamespace to point to the ID of one of them
  • trigger the import of the other one
  • perform a name lookup for std to erase the placeholder std lookup result from the translation unit's lookup table
  • trigger the import of the first std (eg, by triggering the declaration of operator new)

At that point, there is nowhere for the newly-imported std namespace to find the old one: it's not in the translation unit's name lookup table, and it's not in Sema::StdNamespace, so we would presumably end up with two std namespaces not connected to one another.

modocache updated this revision to Diff 200518.May 21 2019, 8:19 AM

Hmm... alright, I'm not really sure how I could implement a test that fails without this, but I added a check in the FindExistingResult destructor.

@rsmith, what do you think of the patch as-is?

modocache updated this revision to Diff 227808.Nov 4 2019, 7:18 PM

Rebasing onto the monorepo. @rsmith, I confirmed the test cases that this diff adds still fail on trunk, and that the Clang source changes made in this diff fix the test case failures. Tomorrow I plan on poking around to see if I can reproduce similar issues in the C++20 modules implementation. But in the meantime, how do you feel about this patch? You suggested a change to ~FindExistingResult but also that it'd be difficult to test/verify such a change. Is that change still something you're looking for before accepting this diff?

urnathan added inline comments.
clang/lib/Serialization/ASTReaderDecl.cpp
3198–3199 ↗(On Diff #227808)

Don't we also have to check New's context is the global namespace? What happens for something like 'namespace evil::std {};'?

ayermolo added a subscriber: ayermolo.EditedDec 13 2021, 1:33 PM

@modocache Just checking is this diff abandoned?