This is an archive of the discontinued LLVM Phabricator instance.

[PCH] Avoid early VarDecl emission attempt if no owning module avaiable
ClosedPublic

Authored by bruno on Feb 9 2017, 12:33 AM.

Details

Summary

This fixes PR31863, a regression introduced in r276159.

Consider this snippet:

struct FVector;
struct FVector {};
struct FBox {

FVector Min;
FBox(int);

};
namespace {
FBox InvalidBoundingBox(0);
}

While parsing the DECL_VAR for 'struct FBox', clang recursively read all the
dep decls until it finds the DECL_CXX_RECORD forward declaration for 'struct
FVector'. Then, it resumes all the way up back to DECL_VAR handling in
ReadDeclRecord, where it checks if isConsumerInterestedIn for the decl.

One of the condition for isConsumerInterestedIn to return false is if the
VarDecl is imported from a module D->getImportedOwningModule(), because it
will get emitted when we import the relevant module. However, before checking
if it comes from a module, clang checks if Ctx.DeclMustBeEmitted(D), which
triggers the emission of 'struct FBox'. Since one of its fields is still
incomplete, it crashes.

Instead, check if D->getImportedOwningModule() is true before calling
Ctx.DeclMustBeEmitted(D).

Diff Detail

Repository
rL LLVM

Event Timeline

bruno created this revision.Feb 9 2017, 12:33 AM

I'm not comfortable signing off on this, but it seems like this should be set up as a blocker for LLVM 4.0 if it isn't already.

lib/Serialization/ASTReaderDecl.cpp
2518–2523 ↗(On Diff #87774)

It's not locally obvious why the order matters. Can you add a comment explaining why you need to check getImportedOwningModule first? It might be worth splitting Ctx.DeclMustBeEmitted into its own; e.g.,

// An ImportDecl or VarDecl imported from a module will get emitted when
// we import the relevant module.
if ((isa<ImportDecl>(D) || isa<VarDecl>(D)) && D->getImportedOwningModule())
  // Only check DeclMustBeEmitted if D has been fully imported, since it may
  // emit D as a side effect.
  if (Ctx.DeclMustBeEmitted(D))
    return false;

but anything that prevents someone from swapping the conditions when they refactor this would be good enough I think.

rsmith edited edge metadata.Feb 16 2017, 2:12 PM

It seems to me that the problem here is that DeclMustBeEmitted is not safe to call in the middle of deserialization if anything partially-deserialized might be reachable from the declaration we're querying, and yet we're currently calling it in that case. I don't see how this patch actually addresses that problem, though: if you build this testcase as a module instead of a PCH, won't it still fail in the same way that it does now?

I think we probably need to track all the declarations we deserialize in a top-level deserialization, and only check whether they're interesting when we finish recursive deserialization (somewhere around ASTReader::FinishedDeserializing). For the update record case, this will need some care in order to retain the property that we only pass a declaration to the consumer once, but I think we can make that work by observing that it should generally be safe to call DeclMustBeEmitted on a declaration that we didn't create in this deserialization step, and when we're loading update records we know whether that's the case.

hans added a subscriber: hans.Feb 21 2017, 3:55 PM

This seems to be stuck. Bruno, Richard, do you think there's a chance this can be fixed for 4.0?

To unblock 4.0, I'm leaning towards merging Bruno's patch as a stop-gap fix.

I realize it probably only fixes the problem for PCH, and not modules. But PCH is used more widely than modules, so maybe it's good enough for now?

We've run out of time for 4.0, and it seems fixing this properly might be involved.

What do folks think?

It seems to me that the problem here is that DeclMustBeEmitted is not safe to call in the middle of deserialization if anything partially-deserialized might be reachable from the declaration we're querying, and yet we're currently calling it in that case.

DeclMustBeEmitted name seems misleading since we does more than just checking, it actually tries to emit stuff. If it's a local module, shouldn't be everything already available? Do we have non-deserialized items for a local module? Maybe I get the wrong idea of what local means in this context..

I don't see how this patch actually addresses that problem, though: if you build this testcase as a module instead of a PCH, won't it still fail in the same way that it does now?

D->getImportedOwningModule calls isFromASTFile, which refers to PCH and modules, shouldn't it work for both then? I couldn't come up with a testcase that triggered it for modules though.

I think we probably need to track all the declarations we deserialize in a top-level deserialization, and only check whether they're interesting when we finish recursive deserialization (somewhere around ASTReader::FinishedDeserializing). For the update record case, this will need some care in order to retain the property that we only pass a declaration to the consumer once, but I think we can make that work by observing that it should generally be safe to call DeclMustBeEmitted on a declaration that we didn't create in this deserialization step, and when we're loading update records we know whether that's the case.

Right. This is a more involved fix and I don't fully understand all r276159 consequences yet. OTOH, the current patch improves the situation and unblock some big projects to compile with clang ToT, for instance, Unreal Engine 4.

What do folks think?

IMO we should do it.

hans accepted this revision.Feb 28 2017, 3:45 PM

What do folks think?

IMO we should do it.

Go ahead and commit this, but keep the bug open so we can work on fixing it properly eventually.

This revision is now accepted and ready to land.Feb 28 2017, 3:45 PM
rsmith accepted this revision.Feb 28 2017, 10:34 PM

Yes, I'm OK with this as a stopgap fix for 4.0. I would have preferred a more full fix for 4.0 but we've run out of time for that, and the PCH case does seem more pressing.

It seems to me that the problem here is that DeclMustBeEmitted is not safe to call in the middle of deserialization if anything partially-deserialized might be reachable from the declaration we're querying, and yet we're currently calling it in that case.

DeclMustBeEmitted name seems misleading since we does more than just checking, it actually tries to emit stuff.

It doesn't emit anything itself. But like most AST interactions, it *can* trigger more declarations being imported lazily from an external AST source, which can in turn result in them being passed to the AST consumer. And that's why the serialization code generally has to be careful to not call unbounded operations on the AST, because it doesn't want to reenter itself. But in this case we're violating that principle.

If it's a local module, shouldn't be everything already available? Do we have non-deserialized items for a local module? Maybe I get the wrong idea of what local means in this context..

With this patch, we're calling DeclMustBeEmitted in the *non-local* module case (where there can be non-deserialized items).

I don't see how this patch actually addresses that problem, though: if you build this testcase as a module instead of a PCH, won't it still fail in the same way that it does now?

D->getImportedOwningModule calls isFromASTFile, which refers to PCH and modules, shouldn't it work for both then?

It only fixes the cases where getImportedOwningModule returns 0, which is the normal case for a PCH, but is rare for a declaration from a module.

I couldn't come up with a testcase that triggered it for modules though.

Something like this triggers it for me:

$ cat test/PCH/empty-def-fwd-struct.modulemap
module M { header "empty-def-fwd-struct.h" }
$ clang -cc1 -x c++ -std=c++14 -fmodules -emit-module -fmodule-name=M test/PCH/empty-def-fwd-struct.modulemap -o tmp.pcm
$ cat use.cpp
#include "test/PCH/empty-def-fwd-struct.h"
$ clang -cc1 -x c++ -std=c++14 -fmodules -emit-llvm -fmodule-file=tmp.pcm use.cpp -o -
clang: [...]/src/tools/clang/lib/AST/RecordLayoutBuilder.cpp:2933: const clang::ASTRecordLayout &clang::ASTContext::getASTRecordLayout(const clang::RecordDecl *) const: Assertion `D && "Cannot get layout of forward declarations!"' failed.
test/PCH/empty-def-fwd-struct.h
2 ↗(On Diff #87774)

Since you don't care about what's in the produced LLVM IR, you can use -emit-llvm-only instead here.

This revision was automatically updated to reflect the committed changes.