Page MenuHomePhabricator

Modifying ImportDeclContext(...) to ensure that we also handle the case when the FieldDecl is an ArrayType whose ElementType is a RecordDecl
ClosedPublic

Authored by shafik on Aug 26 2020, 1:53 PM.

Details

Summary

When we fixed ImportDeclContext(...) in D71378 to make sure we complete each FieldDecl of a RecordDecl when we are importing the definition we missed the case where a FeildDecl was an ArrayType whose ElementType is a record.

This fix was motivated by a codegen crash during LLDB expression parsing. Since we were not importing the definition we were crashing during layout which required all the records be defined.

Diff Detail

Event Timeline

shafik created this revision.Aug 26 2020, 1:53 PM
shafik requested review of this revision.Aug 26 2020, 1:53 PM
labath added a subscriber: labath.Aug 27 2020, 12:33 AM
labath added inline comments.
clang/lib/AST/ASTImporter.cpp
1754–1758

What about 2- or n-dimensional arrays?

lldb/test/API/commands/expression/codegen-crash-import-def-arraytype-element/main.cpp
23–28

What's the significance of this union?

martong added inline comments.Aug 27 2020, 8:08 AM
clang/lib/AST/ASTImporter.cpp
1736

ImportDefinition(...) here refers to ASTImporter::ImportDefinition(Decl*) and not any of the ASTNodeImporter::ImportDefinition funcitons. This is a very important distinction, because the former is part of the public interface, while the latter are private implementation functions. Could you please fix this in the comment?

1754–1758

@labath, this is a very good question! And made me realize that D71378 is fundamentally flawed (@shafik, please take no offense). Let me explain.

So, with D71378, we added the if (ImportedOrErr) { ... } block to import definitions specifically of fields' Record types. But we forget to handle arrays. Now we may forget to handle multidimensional arrays ... and we may forget to handle other language constructs. So, we would finally end up in re-implementing the logic of ASTNodeImporter::VisitFieldDecl.

So all this should have been handled properly by the preceding import call of the FieldDecl! Here

1735: ExpectedDecl ImportedOrErr = import(From);

I have a suspicion that real reason why this import call fails in case of the public ASTImporter::ImportDefinition() is that we fail to drive through the import kind (IDK_Everything) during the import process.
Below we set IDK_Everything and start a complete import process.

8784   if (auto *ToRecord = dyn_cast<RecordDecl>(To)) {
8785     if (!ToRecord->getDefinition()) {
8786       return Importer.ImportDefinition(   // ASTNodeImporter::ImportDefinition !
8787           cast<RecordDecl>(FromDC), ToRecord,
8788           ASTNodeImporter::IDK_Everything);
8789     }
8790   }

However, there may be many places where we fail to channel through that we want to do a complete import. E.g here

1957           ImportDefinitionIfNeeded(Base1.getType()->getAsCXXRecordDecl()))

we do another definition import and IDK_Everything is not set. So we may have a wrong kind of import since the "minimal" flag is set.

The thing is, it is really confusing and error-prone to have both the ASTImporter::Minimal flag and the ImportDefinitionKind. They seem to be in contradiction to each other some times.
I think we should get rid of the Minimal flag. And Kind should be either a full completion (IDK_Everythink) or just a minimal (IDK_Basic). So, I'd scrap the IDK_Default too. Alternatively we could have a Kind member in ASTNodeImporter.
I think we should go into this direction to avoid similar problems during CodeGen in the future. WDYT?

shafik marked 2 inline comments as done.Aug 27 2020, 2:19 PM
shafik added inline comments.
clang/lib/AST/ASTImporter.cpp
1754–1758

No offense, you definitely understand the Importer better than I, so I value your input here. I don't believe that should have other fields where we could have a record that effects the layout but the concern is still valid and yes I did miss multi-dimensional arrays.

Your theory on IDK_Everything not be pushed through everywhere, I did a quick look and it seems pretty reasonable.

I agree that the ASTImporter::Minimal flag and the ImportDefinitionKind seem to inconsistent or perhaps a work-around. That seems like a bigger change with a lot more impact beyond this fix but worth looking into longer term.

If pushing through IDK_Everything everywhere fixes the underlying issue I am very happy to take that approach. If not we can discuss alternatives.

lldb/test/API/commands/expression/codegen-crash-import-def-arraytype-element/main.cpp
23–28

Not sure, I was not able to discern why this is important to reproduce the crash. I wanted feedback on the approach so I left to further investigation later on.

labath added inline comments.Aug 28 2020, 1:28 AM
clang/lib/AST/ASTImporter.cpp
1754–1758

I've been looking at this code, but I'm still not very familiar with it, so what I am asking may be obvious, but... What is the expected behavior for non-minimal imports for code like this?

struct A { ...};
struct B { A* a; }; // Note the pointer

Should importing B also import the definition of the A struct ? As I think that should not happen in the minimal import... If we get rid of the minimal flag, and rely solely on argument passing, we will need to be careful, as it shouldn't be passed _everywhere_ (it should stop at pointers for instance). But then it may not be possible to reproduce the current non-minimal import, as it (I think) expects that A will be fully imported too...

I don't believe that should have other fields where we could have a record that effects the layout

This isn't exactly layout related, but there is the question of covariant methods. If a method is covariant, then its return type must be complete. Currently we handle the completion of these in LLDB, but that solution is a bit hacky. It might be interesting if that could be handled by the ast importer as well.

lldb/test/API/commands/expression/codegen-crash-import-def-arraytype-element/main.cpp
23–28

Yeah, I was afraid of that. :) BTW, something I've been discussing with with @teemperor a few weeks ago might be interesting here: add something like the clangs -dump-ast flag to the expr command. Then we could test that the type A got properly imported even if we did not trigger the exact code path which causes the crash.

That said, I would expect that an assertion like "type X got imported" should be expressible in the ASTImporter unit tests, without any lldb involvement.

martong added inline comments.Aug 28 2020, 5:45 AM
clang/lib/AST/ASTImporter.cpp
1754–1758

What is the expected behavior for non-minimal imports for code like this?

In this case we import the definition of A fully, even though that seems unnecessary. This may be an existing technical debt in clang::ASTNodeImporter::VisitPointerType. Perhaps instead of a full import we could just do a minimal import for the PointeeType always. Similarly to references. But I am not sure about whether we would know during a subsequent import (where the full type is really needed) that the type had been minimally imported before.
Actually, we had a similar discussion with Jaroslav about this before:
http://lists.llvm.org/pipermail/lldb-dev/2019-November/015738.html

If we get rid of the minimal flag, and rely solely on argument passing, we will need to be careful as it shouldn't be passed _everywhere_ (it should stop at pointers for instance)

Yes, I agree. Currently, the problem is that even though the flag is set, we require complete imports via the public ImportDefinition function. And during this supposedly full import we sometimes make import decisions based on the flag, ouch. Perhaps we could keep the flag, but in case of the ASTImporter::ImportDefinition we should set that to false at entry and before returning we should set that back to true .(?)
Anyhow, I really think we shall have only one mechanism to decide whether we want a full or just a partial import, this should be either the flag or the Kind.

This isn't exactly layout related, but there is the question of covariant methods. If a method is covariant, then its return type must be complete.

We already import all the base classes of a derived class (at least in full import). But, we do not import all the derived classes during the import of the base. Is this what you do in LLDB to support covariant return types?

I'm going to respond to the rest of your (very insightful) comment later. So far, I'm just responding to this:

This isn't exactly layout related, but there is the question of covariant methods. If a method is covariant, then its return type must be complete.

We already import all the base classes of a derived class (at least in full import). But, we do not import all the derived classes during the import of the base. Is this what you do in LLDB to support covariant return types?

This situation is a bit tricky to explain as there are two independent class hierarchies that need to be considered. Let me start with an example:

struct B1;
struct B2;
struct A1 { B1 *f(); };
struct A2 { B2 *f(); };

This is a perfectly valid C++ snippet. In fact it could be extended to also implement A1::f and A2::f and call them and it would still not require the definitions for structs B1 and B2. And I believe the ast importer will would not import their definitions even if they were available. It certainly wouldn't force them to be defined (getExternalSource()->CompleteType(B1)).

This changes if the methods become virtual. In this case, this code becomes valid only if B2* is convertible to B1* (i.e., B2 is derived from B1). To know that, both B1 and B2 have to be complete. Regular clang parser will enforce that, and codegen will depend on it. However, the AST importer will not automatically import these classes. Lldb's code to do that is here https://github.com/llvm/llvm-project/blob/fdc6aea3fd822b639baaa5b666fdf7598d08c8de/lldb/source/Plugins/ExpressionParser/Clang/ClangASTImporter.cpp#L994.

I don't know whether something like this would be in scope for the default ast importer, but it seemed useful to mention in the context of the present discussion.

shafik marked an inline comment as done.Sep 3 2020, 1:41 PM
shafik added a subscriber: rsmith.

@martong I have been experimenting w/ how we pass around ImportDefinitionKind and pushing it through to more places does not help. I also tried defaulting most locations to IDK_Everything to experiment to see if maybe I was missing some crucial point and no luck. So while it seemed like a good direction it has not worked out, unless I missed something in your idea.

So my current approach may be the best we have until we can do some larger refactoring.

I had explored trying to fix this from the codegen and sema side of things but after discussing this w/ @rsmith there is no simple fix there that would not require some large refactoring on the LLDB side.

martong added a comment.EditedSep 4 2020, 3:26 AM

The key question is this: Why exactly does 1735: ExpectedDecl ImportedOrErr = import(From); fail to do the import properly when called from ASTImporter::ImportDefinition? Would it be possible to debug that from your LLDB test cases? Maybe starting with the simpler tests in D71378?

@martong I have been experimenting w/ how we pass around ImportDefinitionKind and pushing it through to more places does not help. I also tried defaulting most locations to IDK_Everything to experiment to see if maybe I was missing some crucial point and no luck. So while it seemed like a good direction it has not worked out, unless I missed something in your idea.

These are bad news, probably I am missing something. Still, we should not give up the investigation, or not at least in a long term. I really think we have to fix this sooner or later. What about the other approaches (trying to get rid of the minimal flag, trying to get rid of the kind, moving the minimal flag to the nodeimporter, and could be other ideas as well)?

So my current approach may be the best we have until we can do some larger refactoring.

Okay, I understand this fixes the immediate issue and we don't have any other solution yet without significant more efforts.

I had explored trying to fix this from the codegen and sema side of things but after discussing this w/ @rsmith there is no simple fix there that would not require some large refactoring on the LLDB side.

There must be a way to fix this in ASTImporter and/or in LLDB. We should be able to provide an AST that is meaningful for the codegen. But yeah, some refactoring might be needed.

We are going to move forward with this approach (after dealing with the multi-dimensional array case) temporarily. We are seeing crash bugs from this from users and we want to fix it while we explore the solution space more.

This PR has brought up a lot of great questions and I have spent some time looking various approaches to fixing this and while I believe I have a better understanding of the underlying problem there are some pieces I still need to understand better. Mainly how other ExternalSource implementations work. Do they ever end up in the situation where they have a record that is defined but not complete.

My impression at this point is that because expression parsing is not really generating well-formed translation units, we are relying on clang calling back into us to do name look-ups and then during that process LLDB and the ASTImporter needs to be clever enough to figure out when a record should be completed. Expression paring can end up in non-trivial cases where we have not seen the complete type yet (because we are being lazy) but we need it and we have not figured that out.

martong accepted this revision.Sep 17 2020, 12:10 AM

We are going to move forward with this approach (after dealing with the multi-dimensional array case) temporarily. We are seeing crash bugs from this from users and we want to fix it while we explore the solution space more.

Okay, I understand the business reasons and I am fine with this patch if this is really a temporary solution. Just please add a FIXME that indicates this and D71378 are temporary, plus that this should be really handled by the preceding import call.

This revision is now accepted and ready to land.Sep 17 2020, 12:10 AM
shafik updated this revision to Diff 292607.Sep 17 2020, 1:35 PM

Fix handling of multi-dimensional arrays.

martong accepted this revision.Sep 17 2020, 10:36 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptSep 21 2020, 2:57 PM