This is an archive of the discontinued LLVM Phabricator instance.

[ASTImporter] Add an ImportImpl method to allow customizing Import behavior.
ClosedPublic

Authored by teemperor on Mar 18 2019, 5:41 AM.

Details

Summary

We are currently implementing support in LLDB that reconstructs the STL templates from
the target program in the expression evaluator. This reconstruction happens during the
import process from our debug info AST into the expression evaluation AST, which means
we need a way to intercept the ASTImporter import process.

This patch adds an protected ImportImpl method that we can overwrite in LLDB to implement
our special importing logic (which is essentially just looking into a C++ module that is attached to
the target context). Because ImportImpl has to call MapImported/AddToLookup for the decls it
creates, this patch also exposes those via a new unified method and checks that we call it when
importing decls.

Diff Detail

Event Timeline

teemperor created this revision.Mar 18 2019, 5:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 18 2019, 5:41 AM

I do not know if "Shim" is the correct name for this, maybe "Strategy" is better?

The replace of Import calls may work if the internal state of the ASTImporter is updated correctly from the 'shim' object (especially if not every Decl is handled by the shim). For this to work some of the state of ASTImporter that was intended to be internal must be made public, for example map of imported decls and lookup table. The shim may corrupt the ASTImporter state for example when the lookup table is not updated correctly. (Probably for the LLDB special use cases this is not a problem?)

teemperor updated this revision to Diff 191257.Mar 19 2019, 1:11 AM
teemperor retitled this revision from [ASTImporter] Allow adding a shim to the ASTImporter to [ASTImporter] Allow adding a import strategy to the ASTImporter.
teemperor edited the summary of this revision. (Show Details)

Thanks for the quick review! I changed the code to also update the internal ASTImporter state like with a normal decl.

Changes:

  • Renamed from Shim -> Strategy
  • Now using the ImportedDecls map and also updating the internal ASTImporter state.

It looks better now.
One "problem" is that now there is the strategy and there is the Imported function. It would be possible to unify these into a strategy that contains maybe a "import" and a "post-import" callback. (But this requires big changes in the ASTImporter user code.)

clang/include/clang/AST/ASTImporter.h
140

'Shim' should be replaced with 'strategy' (and the nullptr means "no user-specified strategy" or "default strategy").

teemperor planned changes to this revision.Mar 19 2019, 2:49 AM

I think we could also refactor the strategy into a PreImport call. That way we don't break all the user-code out there and it seems less intrusive to the ASTImporter code. I'll update the patch.

Another observation: The Import function of the strategy now has no way to return an error. An even better version of it would be to include the possibility of import error (with ImportError, or other error type). Or the "PreImport" function could indicate if the Decl is handled by the strategy, and Import is called only if yes and Import can return an Expected<Decl *>. (The PostImport is called for every successfully imported Decl.)

By the way if the Import of the strategy uses recursive import of other things this can cause same problems as it was in ASTImporter before the GetImportedOrCreateDecl was introduced. So this should be avoided or something similar to GetImportedOrCreateDecl must be performed, maybe that function can be made public somehow.

But if this strategy is used only for special cases like in the test case (replace a specific Decl with something other) probably implementing this functionality in another way is a more simple solution.

We can't reliable import templates with the ASTImporter, so actually reconstructing the template in the debug info AST and then importing it doesn't work.

Could you please elaborate how the import of templates fails in ASTImporter? Is it because the AST you build from the DWARF is incomplete? Or is there a lookup problem? Is there an assertion in one of the CodeGen functions once the import is finished?
If we could fix that error in the ASTImporter then other clients of it (like CTU) could benefit from the fix too. Perhaps, the best would be to have a test which reproduces the template import related errors, maybe a lit test with clang-import-test?

If the crux of the problem is because of the incomplete AST from DWARF then probably we cannot fix that here in clang::ASTImporter. However, if that is not the case then I think we should try to fix the error here instead of duplicating lookup and import logic in LLDB. (As for the lookup problems, quite recently I just have discovered, that lookup during expression evaluation in LLDB does not work properly, I'll communicate this in a separate email/patch.)

It would also be much slower.

Could you please explain why it would be slower?

teemperor updated this revision to Diff 191765.Mar 21 2019, 1:14 PM
teemperor retitled this revision from [ASTImporter] Allow adding a import strategy to the ASTImporter to [ASTImporter] Add an ImportInternal method to allow customizing Import behavior..
teemperor edited the summary of this revision. (Show Details)

So I replaced the whole Strategy/Shim thing with a simple ImportInternal method that we can overwrite and is also no longer public. That also gets rid of all the boilerplate code from the first patches and also makes the follow-up LLDB patch nicer.

@martong It's not related to lookup or anything like that, it's just that we don't have enough information in our debug info AST to allow people to use meta programming. The custom logic we have in LLDB looks into the std C++ module to fill out these gaps in the AST by hand when they are used in an expression.

The remark about the alternative being slow just means that fixing all the templates we have in our debug info AST seems like a costly approach. I'm also not sure how well it would work, as I never tried mixing a C++ module in an ASTContext that isn't currently being parsed by Clang.

balazske added inline comments.Mar 22 2019, 1:43 AM
clang/lib/AST/ASTImporter.cpp
7738

Comment can be better: Import it using ASTNodeImporter.

clang/unittests/AST/ASTImporterTest.cpp
581

Is this using needed?

583

Is it possible to make this a static variable instead of function?

588

The static_cast should not be needed.

martong added a comment.EditedMar 22 2019, 3:51 AM

@martong It's not related to lookup or anything like that, it's just that we don't have enough information in our debug info AST to allow people to use meta programming. The custom logic we have in LLDB looks into the std C++ module to fill out these gaps in the AST by hand when they are used in an expression.

The remark about the alternative being slow just means that fixing all the templates we have in our debug info AST seems like a costly approach. I'm also not sure how well it would work, as I never tried mixing a C++ module in an ASTContext that isn't currently being parsed by Clang.

Well, I still don't understand how LLDB synthesis the AST.
So you have a C++ module in a .pcm file. This means you could create an AST from that with ASTReader from it's .clang_ast section, right? In that case the AST should be complete with all type information. If this was the case then I don't see why it is not possible to use clang::ASTImporter to import templates and specializations, since we do exactly that in CTU.

Or do you guys create the AST from the debug info, from the .debug_info section of .pcm module file? And this is why AST is incomplete? I've got this from https://youtu.be/EgkZ8PTNSHQ?list=PL85Cf-ok7HpppFWlp2wX_-A1dkyFVlaEB&t=1565
If this is the case, then comes my naiive quiestion: Why don't you use the ASTReader?

teemperor updated this revision to Diff 191859.Mar 22 2019, 4:47 AM
teemperor marked 4 inline comments as done.
  • Addressed feedback.

Well, I still don't understand how LLDB synthesis the AST.
So you have a C++ module in a .pcm file. This means you could create an AST from that with ASTReader from it's .clang_ast section, right? In that case the AST should be complete with all type information. If this was the case then I don't see why it is not possible to use clang::ASTImporter to import templates and specializations, since we do exactly that in CTU.

Or do you guys create the AST from the debug info, from the .debug_info section of .pcm module file? And this is why AST is incomplete? I've got this from https://youtu.be/EgkZ8PTNSHQ?list=PL85Cf-ok7HpppFWlp2wX_-A1dkyFVlaEB&t=1565
If this is the case, then comes my naiive quiestion: Why don't you use the ASTReader?

There are no C++ modules involved in our use case beside the generic std module. No user-written code is read from a module and we don't have any PCM file beside the std module we build for the expression evaluator.

We use the ASTReader to directly read the std module contents into the expression evaluation ASTContext, but this doesn't give us the decls for the instantiations the user has made (e.g. std::vector<Foo>). We only have these user instantiations in the 'normal' debug info where we have a rather minimal AST (again, no modules are not involved in building this debug info AST). The InternalImport in the LLDB patch just stitches the module AST and the debug info AST together when we import something that we also have (in better quality from the C++ module) in the target ASTContext.

clang/unittests/AST/ASTImporterTest.cpp
581

Yeah, otherwise we have to provide our own constructor that essentially just forwards to the ASTImporter constructor.

588

Thanks!

balazske added inline comments.Mar 22 2019, 8:24 AM
clang/unittests/AST/ASTImporterTest.cpp
348

This function is not needed?

482

This is unused, can be removed.

583

Another small thing: The comment above is now not updated ("Returns").

Well, I still don't understand how LLDB synthesis the AST.
So you have a C++ module in a .pcm file. This means you could create an AST from that with ASTReader from it's .clang_ast section, right? In that case the AST should be complete with all type information. If this was the case then I don't see why it is not possible to use clang::ASTImporter to import templates and specializations, since we do exactly that in CTU.

Or do you guys create the AST from the debug info, from the .debug_info section of .pcm module file? And this is why AST is incomplete? I've got this from https://youtu.be/EgkZ8PTNSHQ?list=PL85Cf-ok7HpppFWlp2wX_-A1dkyFVlaEB&t=1565
If this is the case, then comes my naiive quiestion: Why don't you use the ASTReader?

There are no C++ modules involved in our use case beside the generic std module. No user-written code is read from a module and we don't have any PCM file beside the std module we build for the expression evaluator.

We use the ASTReader to directly read the std module contents into the expression evaluation ASTContext, but this doesn't give us the decls for the instantiations the user has made (e.g. std::vector<Foo>). We only have these user instantiations in the 'normal' debug info where we have a rather minimal AST (again, no modules are not involved in building this debug info AST). The InternalImport in the LLDB patch just stitches the module AST and the debug info AST together when we import something that we also have (in better quality from the C++ module) in the target ASTContext.

Thank you for the explanation.
Now I understand you directly create specializations from the std module and intercept the import to avoid importing broken specializations from the debug info, this makes sense.

Well, I still don't understand how LLDB synthesis the AST.
So you have a C++ module in a .pcm file. This means you could create an AST from that with ASTReader from it's .clang_ast section, right? In that case the AST should be complete with all type information. If this was the case then I don't see why it is not possible to use clang::ASTImporter to import templates and specializations, since we do exactly that in CTU.

Or do you guys create the AST from the debug info, from the .debug_info section of .pcm module file? And this is why AST is incomplete? I've got this from https://youtu.be/EgkZ8PTNSHQ?list=PL85Cf-ok7HpppFWlp2wX_-A1dkyFVlaEB&t=1565
If this is the case, then comes my naiive quiestion: Why don't you use the ASTReader?

There are no C++ modules involved in our use case beside the generic std module. No user-written code is read from a module and we don't have any PCM file beside the std module we build for the expression evaluator.

We use the ASTReader to directly read the std module contents into the expression evaluation ASTContext, but this doesn't give us the decls for the instantiations the user has made (e.g. std::vector<Foo>). We only have these user instantiations in the 'normal' debug info where we have a rather minimal AST (again, no modules are not involved in building this debug info AST). The InternalImport in the LLDB patch just stitches the module AST and the debug info AST together when we import something that we also have (in better quality from the C++ module) in the target ASTContext.

Thank you for the explanation.
Now I understand you directly create specializations from the std module and intercept the import to avoid importing broken specializations from the debug info, this makes sense.

Really, just one last question to see the whole picture: If clang::ASTImporter were capable of handling a specialization/instantiation with missing info then we could use that. So the reason for this interception is that clang::ASTImporter::VisitClassTemplateSpecializationDecl cannot handle the specialization it receives because that or its dependent nodes has too many missing data, right?

teemperor updated this revision to Diff 191946.Mar 22 2019, 2:09 PM
teemperor marked 5 inline comments as done.
  • Removed unused functions.
  • Fixed two comments that I missed to update.

Hello Raphael,

Thank you for the explanation. I have +1 to Gabor's question to understand if this functionality can actually be added to the common ASTImporter.

clang/include/clang/AST/ASTImporter.h
149

I'd suggest to rename it to ImportImpl.

clang/unittests/AST/ASTImporterTest.cpp
588

I think it's better to merge these conditions: if (!ND || ND->getName() != "shouldNotBeImported")

590

LookupTable?

593

auto * (pointer sign is needed).

Well, I still don't understand how LLDB synthesis the AST.
So you have a C++ module in a .pcm file. This means you could create an AST from that with ASTReader from it's .clang_ast section, right? In that case the AST should be complete with all type information. If this was the case then I don't see why it is not possible to use clang::ASTImporter to import templates and specializations, since we do exactly that in CTU.

Or do you guys create the AST from the debug info, from the .debug_info section of .pcm module file? And this is why AST is incomplete? I've got this from https://youtu.be/EgkZ8PTNSHQ?list=PL85Cf-ok7HpppFWlp2wX_-A1dkyFVlaEB&t=1565
If this is the case, then comes my naiive quiestion: Why don't you use the ASTReader?

There are no C++ modules involved in our use case beside the generic std module. No user-written code is read from a module and we don't have any PCM file beside the std module we build for the expression evaluator.

We use the ASTReader to directly read the std module contents into the expression evaluation ASTContext, but this doesn't give us the decls for the instantiations the user has made (e.g. std::vector<Foo>). We only have these user instantiations in the 'normal' debug info where we have a rather minimal AST (again, no modules are not involved in building this debug info AST). The InternalImport in the LLDB patch just stitches the module AST and the debug info AST together when we import something that we also have (in better quality from the C++ module) in the target ASTContext.

Thank you for the explanation.
Now I understand you directly create specializations from the std module and intercept the import to avoid importing broken specializations from the debug info, this makes sense.

Really, just one last question to see the whole picture: If clang::ASTImporter were capable of handling a specialization/instantiation with missing info then we could use that. So the reason for this interception is that clang::ASTImporter::VisitClassTemplateSpecializationDecl cannot handle the specialization it receives because that or its dependent nodes has too many missing data, right?

Feel free to ask! I'm just traveling so my internet access (and my reply rate) is a bit low at the moment.

Not sure if we can easily merge the logic of D59537 into the ASTImporter. It has no logic at all to actually determine if a source AST node has missing data but solely relies on our knowledge that our AST in LLDB isn't very useful for std templates. Also instantiating a template instead of importing an existing instantiation seems like a very rare scenario. I'm not even sure how we would test having a broken AST with Clang (the parser won't produce one, so we would have to hack our own AST builder for broken nodes).

If there is a reasonable way to get this logic into the ASTImporter I have no objection against doing so. The only problem I see is that I can't come up with a reasonable way of merging/testing the logic in the ASTImporter (and what other application would benefit from it).

Well, I still don't understand how LLDB synthesis the AST.
So you have a C++ module in a .pcm file. This means you could create an AST from that with ASTReader from it's .clang_ast section, right? In that case the AST should be complete with all type information. If this was the case then I don't see why it is not possible to use clang::ASTImporter to import templates and specializations, since we do exactly that in CTU.

Or do you guys create the AST from the debug info, from the .debug_info section of .pcm module file? And this is why AST is incomplete? I've got this from https://youtu.be/EgkZ8PTNSHQ?list=PL85Cf-ok7HpppFWlp2wX_-A1dkyFVlaEB&t=1565
If this is the case, then comes my naiive quiestion: Why don't you use the ASTReader?

There are no C++ modules involved in our use case beside the generic std module. No user-written code is read from a module and we don't have any PCM file beside the std module we build for the expression evaluator.

We use the ASTReader to directly read the std module contents into the expression evaluation ASTContext, but this doesn't give us the decls for the instantiations the user has made (e.g. std::vector<Foo>). We only have these user instantiations in the 'normal' debug info where we have a rather minimal AST (again, no modules are not involved in building this debug info AST). The InternalImport in the LLDB patch just stitches the module AST and the debug info AST together when we import something that we also have (in better quality from the C++ module) in the target ASTContext.

Thank you for the explanation.
Now I understand you directly create specializations from the std module and intercept the import to avoid importing broken specializations from the debug info, this makes sense.

Really, just one last question to see the whole picture: If clang::ASTImporter were capable of handling a specialization/instantiation with missing info then we could use that. So the reason for this interception is that clang::ASTImporter::VisitClassTemplateSpecializationDecl cannot handle the specialization it receives because that or its dependent nodes has too many missing data, right?

Feel free to ask! I'm just traveling so my internet access (and my reply rate) is a bit low at the moment.

Not sure if we can easily merge the logic of D59537 into the ASTImporter. It has no logic at all to actually determine if a source AST node has missing data but solely relies on our knowledge that our AST in LLDB isn't very useful for std templates. Also instantiating a template instead of importing an existing instantiation seems like a very rare scenario. I'm not even sure how we would test having a broken AST with Clang (the parser won't produce one, so we would have to hack our own AST builder for broken nodes).

If there is a reasonable way to get this logic into the ASTImporter I have no objection against doing so. The only problem I see is that I can't come up with a reasonable way of merging/testing the logic in the ASTImporter (and what other application would benefit from it).

Raphael, thank you for your answer. This explanation makes so much more easier for me to understand the need for this patch.

teemperor updated this revision to Diff 192588.Mar 28 2019, 2:43 AM
teemperor marked an inline comment as done.
  • Addressed (most of) Aleksei's comments.
clang/unittests/AST/ASTImporterTest.cpp
590

Not sure if I can follow?

martong added inline comments.Mar 28 2019, 2:51 AM
clang/unittests/AST/ASTImporterTest.cpp
590

I think Alexey's comment relates to the parameter in ASTImporterOptionSpecificTestBase::ImporterConstructor. There is a typo in the name of the parameter an 'e' is missing: ASTImporterLookupTable *LookupTabl

teemperor updated this revision to Diff 192594.Mar 28 2019, 2:58 AM

Ah, I get it now. Fixed!

a_sidorin accepted this revision.Mar 28 2019, 4:31 PM

Hello Raphael,
I think we should accept this change. I don't see an easy way to merge the LLDB stuff into ASTImporter; also, this patch provides a good extension point for ASTImporter since it is designed to be a parent class. @martong @shafik Gabor, Shafik, what do you think?

This revision is now accepted and ready to land.Mar 28 2019, 4:31 PM
martong accepted this revision.Mar 29 2019, 1:29 AM

Hello Raphael,
I think we should accept this change. I don't see an easy way to merge the LLDB stuff into ASTImporter; also, this patch provides a good extension point for ASTImporter since it is designed to be a parent class. @martong @shafik Gabor, Shafik, what do you think?

Yes, I agree.

teemperor updated this revision to Diff 196955.Apr 27 2019, 2:29 AM
teemperor retitled this revision from [ASTImporter] Add an ImportInternal method to allow customizing Import behavior. to [ASTImporter] Add an ImportImpl method to allow customizing Import behavior..
teemperor edited the summary of this revision. (Show Details)
  • Added RegisterImportedDecl call as suggested in D59537
  • Added an assert that checks that we register any decl that an overwritten ImportInternal would create.
  • Fixed the tests that they no longer trigger that new assert.
martong accepted this revision.Apr 29 2019, 4:59 AM

Looks good, thanks!

martong added inline comments.Apr 29 2019, 6:08 AM
clang/unittests/AST/ASTImporterTest.cpp
617

I just realized that maybe we could simplify and make this part more elegant:
If ASTImporterTestBase had a data member with the type ImporterConstructor then we could overwrite that data member in the constructor of CustomImporter.
(The ASTImporter is created lazily when the first getImportedDecl is called, but by that time the creator is already set, so this will work.)
This way we could elide the last argument here (RedirectingImporter::Constructor) and in all subsequent TEST_P tests, it would be enough to set up the creator per test fixture.

  • Refactored test according to Gábor's feedback.

I'll land this as it seems only the tests are subject to change and I want to also land the dependencies of this patch. Please let me know if you want any other changes changes to the test and thanks for the review!

This revision was automatically updated to reflect the committed changes.