Page MenuHomePhabricator

Set InvalidDecl directly when deserializing a Decl
ClosedPublic

Authored by aaronpuchert on Aug 19 2020, 4:55 AM.

Details

Summary

When parsing a C++17 binding declaration, we first create the
BindingDecls in Sema::ActOnDecompositionDeclarator, and then build the
DecompositionDecl in Sema::ActOnVariableDeclarator, so the contained
BindingDecls are never null. But when deserializing, we read the
DecompositionDecl with all properties before filling in the Bindings.
Among other things, reading a declaration reads whether it's invalid,
then calling setInvalidDecl which assumes that all bindings of the
DecompositionDecl are available, but that isn't the case.

Deserialization should just set all properties directly without invoking
subsequent functions, so we just set the flag without using the setter.

Fixes PR34960.

Diff Detail

Event Timeline

aaronpuchert created this revision.Aug 19 2020, 4:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2020, 4:55 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
aaronpuchert requested review of this revision.Aug 19 2020, 4:55 AM

Is my comment on the deserialization of BindingDecls in https://reviews.llvm.org/D85613?id=284364 related to this change?

Is my comment on the deserialization of BindingDecls in https://reviews.llvm.org/D85613?id=284364 related to this change?

Not sure. The FIXME comment on the code is correct, and it would be correct after this change. But notice that Decomp is also not set when constructing a BindingDecl from the parser: first we build the BindingDecls in Sema::ActOnDecompositionDeclarator (there Decomp remains unset), then build the DecompositionDecl from that in Sema::ActOnVariableDeclarator. The constructor of DecompositionDecl is then calling setDecomposedDecl on all bindings, so Decomp is set as soon as that's finished. But the BindingDecls exist without Decomp for a while.

The expression for the binding (Binding) will be deserialized when visiting the BindingDecl. This expression when non-null will always (as far as I can tell) contain a reference to the decomposition declaration so the decomposition will be deserialized which will set Decomp.

My impression is that Decomp is set by ASTDeclReader::VisitDecompositionDecl, after reading the individual BindingDecls, not from deserializing the Binding expression. Otherwise the BDs[I]->setDecomposedDecl(DD); above wouldn't be needed. But maybe I'm misunderstanding you.

Maybe you're saying it might be a problem if I read the BindingDecls first and then they can't reference the DecompositionDecl? I'm not sure how I can see the deserialized AST, when I use -ast-dump with -include-pch it doesn't show the AST of the included header. But the generated code looks fine for an example I tried out.

Is my comment on the deserialization of BindingDecls in https://reviews.llvm.org/D85613?id=284364 related to this change?

Not sure. The FIXME comment on the code is correct, and it would be correct after this change. But notice that Decomp is also not set when constructing a BindingDecl from the parser: first we build the BindingDecls in Sema::ActOnDecompositionDeclarator (there Decomp remains unset), then build the DecompositionDecl from that in Sema::ActOnVariableDeclarator. The constructor of DecompositionDecl is then calling setDecomposedDecl on all bindings, so Decomp is set as soon as that's finished. But the BindingDecls exist without Decomp for a while.

I agree, but I think that the BindingDecl should still be considered to be under construction until the corresponding DecompositionDecl is constructed. Any use of the BindingDecl in this interval has to be mindful that the BindingDecl is not fully constructed.

The expression for the binding (Binding) will be deserialized when visiting the BindingDecl. This expression when non-null will always (as far as I can tell) contain a reference to the decomposition declaration so the decomposition will be deserialized which will set Decomp.

My impression is that Decomp is set by ASTDeclReader::VisitDecompositionDecl, after reading the individual BindingDecls, not from deserializing the Binding expression. Otherwise the BDs[I]->setDecomposedDecl(DD); above wouldn't be needed. But maybe I'm misunderstanding you.

Maybe you're saying it might be a problem if I read the BindingDecls first and then they can't reference the DecompositionDecl? I'm not sure how I can see the deserialized AST, when I use -ast-dump with -include-pch it doesn't show the AST of the included header. But the generated code looks fine for an example I tried out.

My concern is:

Is it possible to see a deserialized BindingDecl when the corresponding DecompositionDecl has not been deserialized (which will as you say set BindingDecl::Decomp)? I have not been able to construct an example where this is the case but I am not at all confident that this is impossible.

(-ast-dump-all can be added to force deserialization)

rsmith added inline comments.Sep 2 2020, 11:46 AM
clang/lib/Serialization/ASTReaderDecl.cpp
585

The bug is here: we should not be calling Decl::setInvalidDecl here because it has invariants, and the Decl is incomplete at this point.

1500

BindingDecl might not be fully initialized here: if we enter deserialization to deserialize a BindingDecl, and then recurse into reading its binding expression, and then deserialize the DecompositionDecl, we can land here before we finish with the BindingDecl. If we called something that looked at the Binding expression on the BindingDecl, that'd be a problem.

But the general idea is that deserialization should never invoke AST functions with invariants (and generally should set state directly rather than using AST member functions in order to avoid accidentally calling a function with an invariant). So it shouldn't matter whether we deserialize the DecompositionDecl or the BindingDecls first.

1509

Hm, presumably BindingDecl::Decomp gets set here because readExpr always reads an expression that involves the DecompositionDecl, which calls setDecomposedDecl? That seems ... very subtle. If that's the intended way for this to work, we should at least add a comment for that (or better, an assert that Decomp got set), and BindingDecl::Decomp should be a regular pointer not a LazyDeclPtr. (But even then, is this chain of reasoning guaranteed to hold even for invalid declarations? Maybe we should be serializing the DecompositionDecl* and setting Decomp here?)

I'll go with what @rsmith proposed to fix the bug. If the entire deserialization process doesn't rely on invariants, the order should indeed be irrelevant.

I agree, but I think that the BindingDecl should still be considered to be under construction until the corresponding DecompositionDecl is constructed. Any use of the BindingDecl in this interval has to be mindful that the BindingDecl is not fully constructed.

That's probably advisable, they have a bit of a cyclic relationship.

Is it possible to see a deserialized BindingDecl when the corresponding DecompositionDecl has not been deserialized (which will as you say set BindingDecl::Decomp)? I have not been able to construct an example where this is the case but I am not at all confident that this is impossible.

Maybe it could be done with global variables: have an invalid global decomposition in the PCH, so the BindingDecls have no Binding expression, then reference one of those bindings in the source file. The expression should be empty, so readExpr would end up deserializing the DecompositionDecl.

But I don't see how it could work with local variables, because we're either reading the entire function or no part of it, right?

(-ast-dump-all can be added to force deserialization)

Thanks, that does it.

clang/lib/Serialization/ASTReaderDecl.cpp
585

Didn't notice that we're a friend and can access private members—but it makes sense. That change fixes the bug as well.

1500

Right, we could end up reading a binding that is already being read higher up in the stack. It would have been visited already, but we wouldn't have the expression yet.

1509

Or the BindingDecl is read via the DecompositionDecl in the first place, but otherwise I think that's right. An assertion works neither before nor after this change: if the DecompositionDecl is read first, or contains more than one binding, it's created before some BindingDecl. Then readExpr will presumably just build a DeclRefExpr to the still unfinished decomposition that we're building higher in the stack, so we can't assume the back reference to be set.

For invalid declarations, I think we should still have a correspondence between decomposition and bindings, but the expression is obviously nullptr in some cases. So maybe if we only load a BindingDecl (would have to be a global variable then, right?) and not the decomposition, Decomp might stay uninitialized, but I'm not sure where to check that.

If we want to deserialize the DecompositionDecl here, we'd probably need to store declarations both ways? I think we'd still want to find all bindings for a decomposition that we're reading.

Regarding the LazyDeclPtr I think you're right—I don't see us initializing this with an offset anywhere. So using a plain pointer should be trivially NFC.

Everything compiles with ValueDecl* Decomp instead of a LazyDeclPtr, but I'll leave it until we figure out whether we might actually need it.

Maybe we want to store the decomposition for a binding, just like we store the bindings for a decomposition. Although it's not clear if lazy-loading is actually helpful.

Fix as suggested by @rsmith instead: set InvalidDecl directly.

aaronpuchert retitled this revision from (De-)serialize BindingDecls before DecompositionDecl to Set InvalidDecl directly when deserializing a Decl.Sep 2 2020, 5:39 PM
aaronpuchert edited the summary of this revision. (Show Details)
aaronpuchert removed a reviewer: aaron.ballman.
rsmith accepted this revision.Sep 3 2020, 12:38 PM
This revision is now accepted and ready to land.Sep 3 2020, 12:38 PM
This revision was automatically updated to reflect the committed changes.
aaronpuchert marked an inline comment as done.