This is an archive of the discontinued LLVM Phabricator instance.

[ASTImporter] Add ability to convert CXXRecordDecls to RecordDecls when importing to C language TUs
Changes PlannedPublic

Authored by teemperor on Mar 25 2021, 8:43 AM.

Details

Summary

Right now when using the ASTImporter to bring struct declarations from a C++ AST to a C AST, the
resulting imported Decl will just be a CXXRecordDecl even though in a C AST we only have RecordDecls.
When the 'to' AST already has a RecordDecl with the same name we currently also fail to merge the
CXXRecordDecl from the 'from' AST into it and instead just create a duplicated CXXRecordDecl.

Our use case for this is actually for Swift where the ASTImporter is used in a lot of rather adventurous
way to move things around clang::ASTContext instances. In LLDB we are (at the moment) always in
an AST that supports C++ so this won't have any effect on LLDB itself.

This patch introduces basic conversion functionality that converts a CXXRecordDecl that is just expressing
a valid C struct into a RecordDecl. If the target AST already has a equivalent RecordDecl this will now
also merge the 'from' record into that one.

There are a bunch of things that this patch doesn't try to fix:

  • Unsupported C++ nodes such as user-defined constructors are still imported when explicitly requested by the user.
  • C to C++ conversion for structs (i.e., adding the missing pieces to a RecordDecl and turn it into a CXXRecordDecl).

I'll try to address those in follow up patches. I also have some plans to at least implement the error handling for
importing from Objective-C to non-Objective-C contexts. The same goes for importing nodes that are exclusive
to one C/C++ standard to an AST with (usually earlier) standard where the node can't exist.

Diff Detail

Event Timeline

teemperor created this revision.Mar 25 2021, 8:43 AM
teemperor requested review of this revision.Mar 25 2021, 8:43 AM

This patch is also intended to get early feedback on how we should approach this. We could also discuss *if* we want to really support this whole cross-language
translation in the ASTImporter, but so far it hasn't been explicitly forbidden so people already started using it this way.

I think supporting the most common scenario of translating Record/CXXRecordDecl and making the rest an importer error doesn't seem so complicated
and that seems like reasonable behaviour for the ASTImporter.

(Also I plan to test all this x-language importing in its own test file, hence why the tests are in ASTImporterCrossLanguageTest.cpp).

teemperor added inline comments.Mar 25 2021, 8:51 AM
clang/unittests/AST/ASTImporterCrossLanguageTest.cpp
118

Do we actually have a test utility for checking *what* error actually happened? I could just implement the usual llvm::Error handling here otherwise.

I think it would be better to have a fully new module for converting AST between language versions. It would do only conversion, not AST merge. The ASTImporter is already too large and complicated (should be split into files) code, with this conversion logic embedded into other import functions and with multiple language versions (in the future) it would be really hard to debug (number of special cases at import increases and is already big).

teemperor planned changes to this revision.Mar 29 2021, 1:48 AM

So something like an ASTConverter that just sits on top of the ASTImporter? In theory that class could implement the ImportImpl callback and do the custom imports it needs to do on its own, so I think that would work. I'm putting this in "planned changes" for now until I get around to make an ASTConverter patch..

I did not mean to reuse ASTImporter, that is a possibility too. Probably it can be more easy to handle AST conversion only without importing, specially if during the conversion no kind of error can occur, so no error handling or ODR search are needed. The conversion can happen before doing the AST import (with converted AST).

I do not know how this feature will be used, is it not possible to simply set C++ mode for every used AST?

Hi Raphael,

Thanks for addressing this problem! In CTU, currently we simply do not allow to import anything from another TU if the two languages differ. What's more, we do this even if the languages are the same but the dialects are different (e.g. C++98 vs C++11). We might change this by the help of this new functionality and it could widen the usability of CTU analysis as well! So, I am excited to see this new feature evolving. On the other hand, I agree with @balazske that we should separate this functionality into it's own class without further complicating the existing ASTImporter, if possible.

So something like an ASTConverter that just sits on top of the ASTImporter? In theory that class could implement the ImportImpl callback and do the custom imports it needs to do on its own

I am seeing some difficulties with this approach b/c this would require to replace an existing node in the AST with a new one in some cases. And that might not be always possible without a full retraverse of the AST. Perhaps we could come up with some simplified visitation or could reuse some data from ASTImporter::ImportedDecls.

I do not know how this feature will be used, is it not possible to simply set C++ mode for every used AST?

AFAIK, the parser creates the AST node according to the set language/dialect. So, it is not possible to convert a RecordDecl into a CXXRecordDecl so simply.

I did not mean to reuse ASTImporter, that is a possibility too. Probably it can be more easy to handle AST conversion only without importing, specially if during the conversion no kind of error can occur, so no error handling or ODR search are needed. The conversion can happen before doing the AST import (with converted AST).
I do not know how this feature will be used, is it not possible to simply set C++ mode for every used AST?

Some downstream projects are using the ASTImporter to move types between ASTS (this is outside the normal LLDB architecture we have on top of the ASTImporter. It's more of a Clang + a bit of LLDB mix). The expectation there seems to be that the ASTImporter just handles the language conversion and that basic POD types can be copied from/to every AST. I believe the AST in Question has a 'To' AST that is Objective-C and the 'From' AST is Objective-C++. Both ASTs contain some basic shared struct definitions that need to be merged/imported. I don't think we can just parse everything as (Objective-)C++ as C++ introduces new keywords that might already be used as a normal identifier in the Objective-C code.

Also just to be clear: We are moving nodes from one AST to another AST with different LangOpts. So I don't think there is a way around using the ASTImporter (or reimplementing equivalent logic). Converting nodes in the same AST doesn't seem very useful, so the use case is similar to what the ASTImporter is doing + some extra work for converting the few Clang nodes that can be converted.

AFAIK, the parser creates the AST node according to the set language/dialect. So, it is not possible to convert a RecordDecl into a CXXRecordDecl so simply.

I assume @balazske meant setting the *parser* to C++ (at least that's what I assume in the answer above).

My idea was that if we have an already existing C AST then the mode of it could be changed to C++, without changing other nodes. Then it is possible to import C++ nodes.
For ASTConverter, I think about something similar to ASTImporter, but the To AST is always empty at start and one correct From AST can be imported into it. This "import" performs the conversion and build of AST nodes. This converted AST can then be processed with normal ASTImporter.