This is an archive of the discontinued LLVM Phabricator instance.

[clang-repl] Implement partial translation units and error recovery.
ClosedPublic

Authored by v.g.vassilev on Jun 25 2021, 7:34 AM.

Details

Summary

https://reviews.llvm.org/D96033 contained a discussion regarding efficient modeling of error recovery. @rjmccall has outlined the key ideas:

Conceptually, we can split the translation unit into a sequence of partial translation units (PTUs). Every declaration will be associated with a unique PTU that owns it.

The first key insight here is that the owning PTU isn't always the "active" (most recent) PTU, and it isn't always the PTU that the declaration "comes from". A new declaration (that isn't a redeclaration or specialization of anything) does belong to the active PTU. A template specialization, however, belongs to the most recent PTU of all the declarations in its signature - mostly that means that it can be pulled into a more recent PTU by its template arguments.

The second key insight is that processing a PTU might extend an earlier PTU. Rolling back the later PTU shouldn't throw that extension away. For example, if the second PTU defines a template, and the third PTU requires that template to be instantiated at float, that template specialization is still part of the second PTU. Similarly, if the fifth PTU uses an inline function belonging to the fourth, that definition still belongs to the fourth. When we go to emit code in a new PTU, we map each declaration we have to emit back to its owning PTU and emit it in a new module for just the extensions to that PTU. We keep track of all the modules we've emitted for a PTU so that we can unload them all if we decide to roll it back.

Most declarations/definitions will only refer to entities from the same or earlier PTUs. However, it is possible (primarily by defining a previously-declared entity, but also through templates or ADL) for an entity that belongs to one PTU to refer to something from a later PTU. We will have to keep track of this and prevent unwinding to later PTU when we recognize it.

Fortunately, this should be very rare; and crucially, we don't have to do the bookkeeping for this if we've only got one PTU, e.g. in normal compilation. Otherwise, PTUs after the first just need to record enough metadata to be able to revert any changes they've made to declarations belonging to earlier PTUs, e.g. to redeclaration chains or template specialization lists.

It should even eventually be possible for PTUs to provide their own slab allocators which can be thrown away as part of rolling back the PTU. We can maintain a notion of the active allocator and allocate things like Stmt/Expr nodes in it, temporarily changing it to the appropriate PTU whenever we go to do something like instantiate a function template. More care will be required when allocating declarations and types, though.

We would want the PTU to be efficiently recoverable from a Decl; I'm not sure how best to do that. An easy option that would cover most declarations would be to make multiple TranslationUnitDecls and parent the declarations appropriately, but I don't think that's good enough for things like member function templates, since an instantiation of that would still be parented by its original class. Maybe we can work this into the DC chain somehow, like how lexical DCs are.

This patch teaches clang-repl how to recover from errors by disconnecting the most recent PTU and update the primary PTU lookup tables. For instance:

./clang-repl
clang-repl> int i = 12; error;
In file included from <<< inputs >>>:1:
input_line_0:1:13: error: C++ requires a type specifier for all declarations
int i = 12; error;
                 ^
error: Parsing failed.
clang-repl> int i = 13; extern "C" int printf(const char*,...);
clang-repl> auto r1 = printf("i=%d\n", i);
i=13
clang-repl> quit

Diff Detail

Event Timeline

v.g.vassilev created this revision.Jun 25 2021, 7:34 AM
v.g.vassilev requested review of this revision.Jun 25 2021, 7:34 AM
rsmith accepted this revision.Jun 29 2021, 1:57 PM

I'm (happily) surprised this required so few changes!

clang/include/clang/Basic/LangOptions.h
702–703

I don't think this is clear enough about the difference between TU_Prefix and TU_Partial. I think the difference is:

  • TU_Prefix is an incomplete prefix of a translation unit. Because it's not complete, we don't perform (most) finalization steps (eg, template instantiation) at the end.
  • TU_Partial is a complete translation unit that we might nonetheless incrementally extend later. Because it is complete (and we might want to generate code for it), we do perform template instantiation, but because it might be extended later, we don't warn if it declares or uses undefined internal-linkage symbols.

I wonder if TU_Incremental would be a better name than TU_Partial.

This revision is now accepted and ready to land.Jun 29 2021, 1:57 PM

A few minor notes

clang/include/clang/Interpreter/Interpreter.h
60–61

ErrorOr<T> has different semantics and is still in use, so the name could be confusing. Idiomatic usage for Expected<T> would rather be like:

auto PTU = Parse(Code);
if (!PTU)
  return PTU.takeError();

The patch didn't introduce it, but it might be a good chance for improvement.

clang/lib/Interpreter/IncrementalParser.cpp
210

Where does it point, if the very first TU fails? Maybe worth noting in the assert and/or adding a test case?

262–264
auto PTU = ParseOrWrapTopLevelDecl();
if (!PTU)
  return PTU.takeError();
v.g.vassilev marked 4 inline comments as done.

Address review comments.

clang/include/clang/Basic/LangOptions.h
702–703

Good point. Fixed.

clang/include/clang/Interpreter/Interpreter.h
60–61

Ha, that made the code much nicer looking! Thanks!!

clang/lib/Interpreter/IncrementalParser.cpp
210

The very first TU contains the compiler builtins and it is created when the ASTContext is created. Not having a PreviousTU would mean that we did not initialize the CompilerInstance properly.

Added some description in the assert.

This revision was landed with ongoing or failed builds.Jul 11 2021, 3:24 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 11 2021, 3:24 AM

@v.g.vassilev For LLDB you need to change the TypeSystemClang::SetExternalSource function to this (it just moves the setHasExternalLexicalStorage one line up. You can just add that change and the compilation fix to this commit.

void TypeSystemClang::SetExternalSource(
    llvm::IntrusiveRefCntPtr<ExternalASTSource> &ast_source_up) {
  ASTContext &ast = getASTContext();
  ast.getTranslationUnitDecl()->setHasExternalLexicalStorage(true);
  ast.setExternalSource(ast_source_up);
}

The problem is that a few ASTs in LLDB change the ExternalSource that is originally set, but the LazyGenerationalUpdatePtr remembers the initial ExternalSource and tries to access it to complete the redecl chain (but the original ExternalSource just got replaced and deleted). The better fix is to not even create that original ExternalSource, but I can fix that in a follow up as that seems out of scope for this patch.

@v.g.vassilev For LLDB you need to change the TypeSystemClang::SetExternalSource function to this (it just moves the setHasExternalLexicalStorage one line up. You can just add that change and the compilation fix to this commit.

void TypeSystemClang::SetExternalSource(
    llvm::IntrusiveRefCntPtr<ExternalASTSource> &ast_source_up) {
  ASTContext &ast = getASTContext();
  ast.getTranslationUnitDecl()->setHasExternalLexicalStorage(true);
  ast.setExternalSource(ast_source_up);
}

The problem is that a few ASTs in LLDB change the ExternalSource that is originally set, but the LazyGenerationalUpdatePtr remembers the initial ExternalSource and tries to access it to complete the redecl chain (but the original ExternalSource just got replaced and deleted). The better fix is to not even create that original ExternalSource, but I can fix that in a follow up as that seems out of scope for this patch.

Thanks @teemperor! I will add your suggestion and recommit.