This is an archive of the discontinued LLVM Phabricator instance.

[clang-repl] Land initial infrastructure for incremental parsing
ClosedPublic

Authored by v.g.vassilev on Feb 4 2021, 6:31 AM.

Details

Summary

In http://lists.llvm.org/pipermail/llvm-dev/2020-July/143257.html we have mentioned our plans to make some of the incremental compilation facilities available in llvm mainline.

This patch proposes a minimal version of a REPL, clang-repl, which enables interpreter-like interaction for C++. For instance:

./bin/clang-repl
clang-repl> int i = 42;
clang-repl> extern "C" int printf(const char*,...);
clang-repl> auto r1 = printf("i=%d\n", i);
i=42
clang-repl> quit

The patch allows only very limited functionality; for example, it crashes on invalid C++. The design of the proposed patch closely follows the design of cling. The idea is to gather feedback and gradually evolve both clang-repl and cling toward what the community agrees upon.

The IncrementalParser class is responsible for driving the clang parser and codegen and allows the compiler infrastructure to process more than one input. Every input adds to the “ever-growing” translation unit. That model is enabled by an IncrementalAction, which prevents the usual teardown when HandleTranslationUnit is called.

The IncrementalExecutor class hides some of the underlying implementation details of the concrete JIT infrastructure. It exposes the minimal set of functionality required by our incremental compiler/interpreter.

The Transaction class keeps track of the AST and the LLVM IR for each incremental input. That tracking information will be later used to implement error recovery.

The Interpreter class orchestrates the IncrementalParser and the IncrementalExecutor to model interpreter-like behavior. It provides the public API which can be used (in future) when using the interpreter library.
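
To illustrate how the pieces are meant to compose from an embedder's point of view, here is a minimal sketch; Interpreter::create and ParseAndExecute are assumed names based on the class roles described above, not necessarily the exact signatures of this patch.

#include "clang/Frontend/CompilerInstance.h"
#include "clang/Interpreter/Interpreter.h"
#include "llvm/Support/Error.h"

// Hypothetical embedding: feed one line of C++ through the pipeline
// (IncrementalParser -> Transaction -> IncrementalExecutor).
llvm::Error demo(std::unique_ptr<clang::CompilerInstance> CI) {
  auto InterpOrErr = clang::Interpreter::create(std::move(CI));
  if (!InterpOrErr)
    return InterpOrErr.takeError();
  return (*InterpOrErr)->ParseAndExecute("int i = 42;");
}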

Diff Detail

Event Timeline


@v.g.vassilev Great to see this getting upstreamed! @teemperor Thanks for adding, I will take a look in the next few days.

v.g.vassilev marked 21 inline comments as done.

Address more review comments.

clang/include/clang/Frontend/FrontendAction.h
237

Done. Ideally I would like to avoid making these routines virtual. However, here we call EndSourceFile https://github.com/llvm/llvm-project/blob/1f6ec3d08f75dba6c93c291bd92552b807736eb3/clang/lib/Frontend/CompilerInstance.cpp#L952 and this shuts down some of the objects required by the IncrementalAction.

In fact, after some checking, it does not make sense to allow overriding BeginSourceFile -- it initializes the state of the Action class, and it does not make sense to forward it, for instance, in the WrapperFrontendAction. In principle, clients may be interested in initializing differently, but for clang-repl we only need EndSourceFile to be virtual.

Alternatively, we could somehow make the Inputs take an "external" iterator (https://github.com/llvm/llvm-project/blob/1f6ec3d08f75dba6c93c291bd92552b807736eb3/clang/lib/Frontend/CompilerInstance.cpp#L942), which would provide us with a hook to treat each input line as a separate input file. AFAICT, that has two disadvantages: 1) we would initialize/finalize more state in various objects, which can hinder performance; 2) each input file is considered a separate translation unit (TU), which clashes with the ever-growing single TU concept we introduce.

clang/include/clang/Interpreter/Transaction.h
2

I added some documentation.

Having a long name for the Transaction class will make future code clunky. This class will be used extensively throughout the interpreter codebase and API. I'd be in favor of going for a namespace, but then I think we would have confusing naming for, say, clang::interpreter::Interpreter...

clang/lib/Interpreter/IncrementalExecutor.h
37

The case we have is when there is no JIT -- currently we have such a case in IncrementalProcessingTest, I think. Another example, which will show up in future patches, is the registration of atexit handlers. That is, before we runCtors we make a pass over the LLVM IR, collect some specific details (and possibly change the IR), and then run.

I'd rather keep it separate for now if that's okay.

clang/lib/Interpreter/IncrementalParser.cpp
64

Cling, on which this patch is based, has a CodeCompleteConsumer and it seems to work.

I can leave it out but then we will have to remember where to put it. I have a slight preference to leave it as it is.

70

Indeed, should be fixed.

102

Nice catch! Thanks!

137

Not directly. A test case would be:

#pragma weak f
void f() {}

This is tested by the standard tests; we "just" record some decls in the vector. Do you have a specific test in mind?

166

That is good to know. I will try to remember this and use it if needed.

190

assert on the FID you mean?

196

I can work on that, but I'd be in favor of not holding up this patch due to this issue. I see two options:

  • remove the code -- that'd make it harder for cling to reuse this piece of code.
  • keep it as is -- the risk is that we have an untested codepath.

Either way, this may be a bug in clang that we need to fix, and then we could drop these lines altogether...

That particular issue is tracked here https://github.com/root-project/root/pull/7078 (some of the related context is here https://paste.ubuntu.com/p/bM2VRgmqSG/ in a very obscure way :) )

clang/lib/Interpreter/IncrementalParser.h
51

Yeah, much nicer.

clang/lib/Interpreter/Interpreter.cpp
44

Let's keep it here for now and we can always move somewhere else. Fixing the includes. Thanks!

83

I do not understand the problem entirely, could you propose wording for the FIXME?

141

No worries, fixed.

clang/tools/clang-repl/CMakeLists.txt
4

It compiles just fine for me. Should we add it back once we find the setup that requires it?

clang/tools/clang-repl/ClangRepl.cpp
60

Good point!

v.g.vassilev added inline comments. Feb 13 2021, 12:41 AM
clang/lib/CodeGen/CodeGenAction.cpp
888

@rjmccall, we were wondering if there is a better way to ask CodeGen to start a new module. The current approach seems to be drilling a hole through a number of abstraction layers.

In the past we have touched that area a little in https://reviews.llvm.org/D34444 and the answer may be already there but I fail to connect the dots.

Recently, we thought about having a new FrontendAction callback for beginning a new phase when compiling incremental input. We need to keep track of the created objects (needed for error recovery) in our Transaction. We can have a map of Transaction* to llvm::Module* in CodeGen. The issue is that new JITs take ownership of the llvm::Module* which seems to make it impossible to support jitted code removal with that model (cc: @lhames, @rsmith).

Hi Vassil, thanks for upstreaming this! I think it goes into a good direction.

The last time I looked at the Cling sources downstream, it was based on LLVM release 5.0. The IncrementalJIT class was based on what we call OrcV1 today. OrcV1 is long deprecated and, even though it's still in tree today, it will very likely be removed in the upcoming release cycle. So I guess one of the challenges will be porting the Cling internals to OrcV2 -- a lot has changed, mostly for the better :) Not all of this is relevant for this patch, but maybe it's worth mentioning for upcoming additions.

OrcV2 works with Dylibs, basically symbol namespaces. When you add a module to a Dylib, all its symbols will be added. Respectively, if you want to remove something from a Dylib, you have to remove the symbols (for fine-tuning it you can reach for a Dylib's ResourceTracker). Symbols won't be materialized until you look them up. I guess for incremental compilation you would keep on adding symbols, one increment at a time.

int var1 = 42; int f() { return var1; }
int var2 = f();

Let's say these are two inputs. The first line only adds definitions for symbols var1 and f() but won't materialize anything. The second line would have to lookup f(), execute it and emit a new definition for var2. I never got into Cling deep enough to find out how it works, but I assume it's high-level enough that it won't require large changes. One thing I'd recommend to double-check: if there is a third line that adds a static constructor, will LLJIT only run this one or will it run all previous static ctors again when calling initialize()? I assume the former but I wouldn't bet on it.

Another aspect is that downstream Cling is based on RuntimeDyld for linking Orc's output object files. I remember RemovableObjectLinkingLayer adding some object file removal code. Upstream OrcV2 grew its own linker in the meantime. It's called JITLink and gets pulled into LLJIT via ObjectLinkingLayer. RuntimeDyld-based linking is still supported with the RTDyldObjectLinkingLayer. JITLink is not complete for all platforms yet. Thus, LLJITBuilder defaults to JITLink on macOS and RuntimeDyld otherwise. Chances are that JITLink gets good enough for ELF to enable it by default on Linux (at least x86-64). I guess that's your platform of concern? The related question is whether you are aiming for JITLink straight away or staying with RuntimeDyld for now.

For the moment, I added a few pointers inline. Some are referring to my general comments above.

clang/lib/CodeGen/CodeGenAction.cpp
888

When compiling incrementally, does a "new phase" mean that all subsequent code will go into a new module from then on? How will dependencies on previous symbols be handled? Are all symbols external?

The issue is that new JITs take ownership of the llvm::Module*

That's true, but you can still keep a raw pointer to it, which will be valid at least as long as the module wasn't linked. Afterwards it depends on the linker:

  • RuntimeDyld can return ownership of the object's memory range via NotifyEmittedFunction
  • JITLink provides the ReturnObjectBufferFunction for the same purpose

seems to make it impossible to support jitted code removal with that model

Can you figure out what symbols are affected and remove these? A la: https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/include/llvm/ExecutionEngine/Orc/Core.h#L1020

I think @anarazel has ported a client with code removal to OrcV2 successfully in the past. Maybe there's something we can learn from it.

clang/lib/Interpreter/IncrementalExecutor.h
37

Should we maybe merge runCtors and addModule?

+1 even though there may be open questions regarding incremental initialization.

The case we have is when there is no JIT -- currently we have such a case in IncrementalProcessingTest

Can you run anything if there is no JIT? I think what you have in IncrementalProcessing.EmitCXXGlobalInitFunc is getGlobalInit(llvm::Module*), which checks for symbol names with a specific prefix.

before we runCtors we make a pass over the LLVM IR and collect some specific details and (possibly change the IR and then run).

The idiomatic solution for such modifications would use an IRTransformLayer as in:
https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/examples/OrcV2Examples/LLJITWithOptimizingIRTransform/LLJITWithOptimizingIRTransform.cpp#L108
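
For illustration, wiring such a transform into LLJIT looks roughly like this (a sketch following the linked example; the IR inspection itself is left as a comment):

#include "llvm/ExecutionEngine/Orc/IRTransformLayer.h"
#include "llvm/ExecutionEngine/Orc/LLJIT.h"

// Sketch: run a callback over every module before it is materialized.
void installTransform(llvm::orc::LLJIT &J) {
  J.getIRTransformLayer().setTransform(
      [](llvm::orc::ThreadSafeModule TSM,
         llvm::orc::MaterializationResponsibility &R)
          -> llvm::Expected<llvm::orc::ThreadSafeModule> {
        TSM.withModuleDo([](llvm::Module &M) {
          // Collect atexit details and/or rewrite the IR here.
        });
        return std::move(TSM);
      });
}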

Another example, which will show up in future patches, is the registration of atexit handlers

atexit handlers as well as global ctors/dtors should be covered by LLJIT PlatformSupport. The LLJITBuilder will inject a GenericLLVMIRPlatformSupport instance by default:
https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/lib/ExecutionEngine/Orc/LLJIT.cpp#L125

It's not as comprehensive as e.g. the MachO implementation, but should be sufficient for your use-case as you have IR for all your JITed code. (It would NOT work if you cached object files, reloaded them in a subsequent session and wanted to run their ctors.) So, your below call to initialize() should do it already.
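
In code, that amounts to a pair of calls on LLJIT (a sketch; GenericLLVMIRPlatformSupport rewrites llvm.global_ctors into an init function that initialize() discovers and runs):

#include "llvm/ExecutionEngine/Orc/LLJIT.h"

// Run the not-yet-run initializers of the main JITDylib.
llvm::Error runInitializers(llvm::orc::LLJIT &J) {
  return J.initialize(J.getMainJITDylib());
}

// Later, at teardown: run destructors/atexit handlers.
llvm::Error runFinalizers(llvm::orc::LLJIT &J) {
  return J.deinitialize(J.getMainJITDylib());
}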

clang/unittests/Interpreter/InterpreterTest.cpp
30

Warning: std::move prevents copy elision

Hi Stefan,

Thanks a lot for the details you shared. They are really helpful to me.

Hi Vassil, thanks for upstreaming this! I think it goes into a good direction.

The last time I looked at the Cling sources downstream, it was based on LLVM release 5.0. The IncrementalJIT class was based on what we call OrcV1 today. OrcV1 is long deprecated and, even though it's still in tree today, it will very likely be removed in the upcoming release cycle. So I guess one of the challenges will be porting the Cling internals to OrcV2 -- a lot has changed, mostly for the better :) Not all of this is relevant for this patch, but maybe it's worth mentioning for upcoming additions.

Cling is currently being upgraded to llvm9 (https://github.com/vgvassilev/cling/tree/upgrade_llvm90). I expect by the end of this year to be upgraded again to llvm12 and the big challenge is making good use of OrcV2. We need to support code removal of the kind:

[cling] struct Adder { double Add(double a, double b) { return a - b; } }; // comes in a "script" file.
[cling] Adder adder; printf("%f\n", adder.Add(1, 2)); // realize that we have a mistake.
[cling] .undo 2
[cling] struct Adder { double Add(double a, double b) { return a + b; } }; // comes in a "script" file.
[cling] Adder adder; printf("%f\n", adder.Add(1, 2));

Some more details can be seen here -- https://blog.llvm.org/posts/2020-11-30-interactive-cpp-with-cling/

In the example above the JIT will need to remove objects from its state (including already created machine code). Until recently that was not entirely possible with the OrcV2. Lang was actively working on this and maybe this now works well.

The other aspect of this is that upon unloading of these pieces of code we need to run the destructors (that's why we need some non-canonical handling of when we run the atexit handlers).

I'd very much want to jump on OrcV2 with this patch, but that would cause me a longer-term problem, as the initial version of the minimal cling (clang-repl) would then be substantially different from cling itself. My upstream strategy was to make a minimal patch with a design as close as possible to cling's. Then, depending on the comments, we will start evolving both systems in such a way that cling is not left substantially behind, as it still covers the majority of interactive C++ cases which we care about.

OrcV2 works with Dylibs, basically symbol namespaces. When you add a module to a Dylib, all its symbols will be added. Respectively, if you want to remove something from a Dylib, you have to remove the symbols (for fine-tuning it you can reach for a Dylib's ResourceTracker). Symbols won't be materialized until you look them up. I guess for incremental compilation you would keep on adding symbols, one increment at a time.

All this sounds super nice and I am eager to start gradually using it.

int var1 = 42; int f() { return var1; }
int var2 = f();

Let's say these are two inputs. The first line only adds definitions for symbols var1 and f() but won't materialize anything. The second line would have to lookup f(), execute it and emit a new definition for var2. I never got into Cling deep enough to find out how it works, but I assume it's high-level enough that it won't require large changes. One thing I'd recommend to double-check: if there is a third line that adds a static constructor, will LLJIT only run this one or will it run all previous static ctors again when calling initialize()? I assume the former but I wouldn't bet on it.

Would that capture your concern?

./bin/clang-repl 
clang-repl> extern "C" int printf(const char*,...);
clang-repl> int var1 = 42; int f() { return printf("init_once\n"); }
clang-repl> int var2 = f();
init_once
clang-repl> int var3 = f();
init_once
clang-repl> struct S{ S(const char*) {} } s("");
clang-repl>

I think the reason we do not rerun the static constructors is this tweak we have in codegen https://github.com/llvm/llvm-project/commit/188ad3ac02d06

Another aspect is that downstream Cling is based on RuntimeDyld for linking Orc's output object files. I remember RemovableObjectLinkingLayer adding some object file removal code. Upstream OrcV2 grew its own linker in the meantime. It's called JITLink and gets pulled into LLJIT via ObjectLinkingLayer. RuntimeDyld-based linking is still supported with the RTDyldObjectLinkingLayer. JITLink is not complete for all platforms yet. Thus, LLJITBuilder defaults to JITLink on macOS and RuntimeDyld otherwise. Chances are that JITLink gets good enough for ELF to enable it by default on Linux (at least x86-64). I guess that's your platform of concern?

We also care about COFF.

The related question is whether you are aiming for JITLink straight away or staying with RuntimeDyld for now.

I'd prefer to stay closer to cling, at least with this initial patch. That'd mean sticking to RuntimeDyld and switching when cling is ready (presumably by the end of this year).

clang/lib/CodeGen/CodeGenAction.cpp
888

When compiling incrementally, does a "new phase" mean that all subsequent code will go into a new module from then on? How will dependencies on previous symbols be handled? Are all symbols external?

There is some discussion on this here https://reviews.llvm.org/D34444#812418

I think the relevant bit is that 'we have just one ever growing TU [...] which we send to the RuntimeDyLD allowing only JIT to resolve symbols from it. We aid the JIT when resolving symbols with internal linkage by changing all internal linkage to external (We haven't seen issues with that approach)'.
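
A minimal sketch of that linkage promotion over a freshly generated module (cling's actual pass is more careful about name collisions and visibility; this only illustrates the technique):

#include "llvm/IR/Module.h"

// Promote local-linkage definitions to external so the JIT can resolve
// them across the incrementally generated modules.
void externalizeInternalSymbols(llvm::Module &M) {
  for (llvm::GlobalValue &GV : M.global_values())
    if (GV.hasLocalLinkage())
      GV.setLinkage(llvm::GlobalValue::ExternalLinkage);
}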

The issue is that new JITs take ownership of the llvm::Module*

That's true, but you can still keep a raw pointer to it, which will be valid at least as long as the module wasn't linked.

That was my first implementation when I upgraded cling to llvm9, where the shared_ptrs became unique_ptrs. This was quite problematic for many of the use cases we support, as the JIT is somewhat unpredictable to the high-level API user.

Afterwards it depends on the linker:

  • RuntimeDyld can return ownership of the object's memory range via NotifyEmittedFunction
  • JITLink provides the ReturnObjectBufferFunction for the same purpose

That's exactly what we ended up doing (I would like to thank Lang here who gave a similar advice).

seems to make it impossible to support jitted code removal with that model

Can you figure out what symbols are affected and remove these? A la: https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/include/llvm/ExecutionEngine/Orc/Core.h#L1020

I think @anarazel has ported a client with code removal to OrcV2 successfully in the past. Maybe there's something we can learn from it.

Indeed. That's not yet on my radar as it seemed somewhat distant in time.

clang/lib/Interpreter/IncrementalExecutor.h
37

Should we maybe merge runCtors and addModule?

+1 even though there may be open questions regarding incremental initialization.

The case we have is when there is no JIT -- currently we have such a case in IncrementalProcessingTest

Can you run anything if there is no JIT? I think what you have in IncrementalProcessing.EmitCXXGlobalInitFunc is getGlobalInit(llvm::Module*), which checks for symbol names with a specific prefix.

Yes, I'd think such a mode is useful for testing, but also for other cases where the user is handed a Transaction* and allowed to make some modifications before processing the llvm::Module.

before we runCtors we make a pass over the LLVM IR and collect some specific details and (possibly change the IR and then run).

The idiomatic solution for such modifications would use an IRTransformLayer as in:
https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/examples/OrcV2Examples/LLJITWithOptimizingIRTransform/LLJITWithOptimizingIRTransform.cpp#L108

That looks very nice. It assumes the JIT is exposed to the users; here we expose only the llvm::Module (I am not arguing whether that's a good idea in general).

Another example, which will show up in future patches, is the registration of atexit handlers

atexit handlers as well as global ctors/dtors should be covered by LLJIT PlatformSupport. The LLJITBuilder will inject a GenericLLVMIRPlatformSupport instance by default:
https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/lib/ExecutionEngine/Orc/LLJIT.cpp#L125

Does that give me control over when the atexit handlers are called? Can the interpreter call them at its choice?

It's not as comprehensive as e.g. the MachO implementation, but should be sufficient for your use-case as you have IR for all your JITed code. (It would NOT work if you cached object files, reloaded them in a subsequent session and wanted to run their ctors.) So, your below call to initialize() should do it already.

teemperor added inline comments. Feb 17 2021, 12:56 AM
clang/lib/Interpreter/IncrementalParser.cpp
64

Alright, given that this is just passing along the CompletionConsumer from Ci to Sema, I think this can stay then.

185

The + 2 here is probably not what you want. This will just give you a pointer into Clang's source buffers and will eventually point to random source buffers (or worse) once InputCount is large enough.

I feel like the proper solution is to just use the StartOfFile Loc and don't add any offset to it. I think Clang should still be able to figure out a reasonable ordering for overload candidates etc.

(I thought I already commented on this line before, but I can't see my comment or any replies on Phab so I'm just commenting again).

190

I meant that the EnterSourceFile call returns a bool error (along with a diagnostic). I think the only way it should fail is if the previous code somehow got messed up, hence the assert suggestion.

clang/lib/Interpreter/Interpreter.cpp
83

The PCM files generated with -gmodules are object files that contain the Clang AST inside them. To deal with the object-file wrapping and get to the AST inside, we need the ObjectFilePCHContainerReader. But that class is part of CodeGen, which isn't a dependency of the normal parsing logic, so these classes can't be hooked up automatically by Clang like the other 'ContainerReader' classes (well, there is only one other class, which just opens the file normally).

So to work around the fact that parsing code now requires a CodeGen dependency, every clang tool (which usually depends on CodeGen + parsing code) has to manually register the ObjectFilePCHContainer* classes.

My point is just that it's not sustainable to have everyone copy around these three lines as otherwise Clang won't handle any PCM file that was generated with -gmodules.

I think the FIXME for this could be:
FIXME: Clang should register these container operations automatically.
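
For reference, the registration every such tool currently repeats looks like this (a sketch using the clang classes named above):

#include "clang/CodeGen/ObjectFilePCHContainerOperations.h"
#include "clang/Frontend/CompilerInstance.h"

#include <memory>

// Manual hookup so -gmodules PCM files (ASTs wrapped in object files)
// can be read and written by this tool.
void registerPCHContainers(clang::CompilerInstance &CI) {
  CI.getPCHContainerOperations()->registerWriter(
      std::make_unique<clang::ObjectFilePCHContainerWriter>());
  CI.getPCHContainerOperations()->registerReader(
      std::make_unique<clang::ObjectFilePCHContainerReader>());
}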

clang/tools/clang-repl/CMakeLists.txt
4

On my Linux setup the dependencies were required (and it seems logical that they are required), so I would just add them.

lhames added inline comments. Feb 17 2021, 7:26 PM
clang/lib/CodeGen/CodeGenAction.cpp
888

Recently, we thought about having a new FrontendAction callback for beginning a new phase when compiling incremental input. We need to keep track of the created objects (needed for error recovery) in our Transaction. We can have a map of Transaction* to llvm::Module* in CodeGen. The issue is that new JITs take ownership of the llvm::Module* which seems to make it impossible to support jitted code removal with that model (cc: @lhames, @rsmith).

In the new APIs, in order to enable removable code, you can associate Modules with ResourceTrackers when they're added to the JIT. The ResourceTrackers then allow for removal. Idiomatic usage looks like:

auto Mod = /* create module */;
auto RT = JD.createResourceTracker();
J.addModule(RT, std::move(Mod));
//...
if (auto Err = RT.remove())
  /* handle Err */;

we have just one ever growing TU [...] which we send to RuntimeDyld...

So is a TU the same as an llvm::Module in this context? If so, how do you reconcile that with the JIT taking ownership of modules? Are you just copying the Module each time before adding it?

We need to keep track of the created objects (needed for error recovery) in our Transaction.

Do you need the Module* for error recovery? Or just the Decls?

clang/lib/Interpreter/IncrementalExecutor.cpp
30–52

I think this can be shortened to:

using namespace llvm::orc;
llvm::ErrorAsOutParameter EAO(&Err);

if (auto JitOrErr = LLJITBuilder.create())
  Jit = std::move(*JitOrErr);
else {
  Err = JitOrErr.takeError();
  return;
}

const auto &DL = Jit->getDataLayout();
if (auto PSGOrErr = DynamicLibrarySearchGenerator::GetForCurrentProcess(DL.getGlobalPrefix()))
  Jit->getMainJITDylib().addGenerator(std::move(*PSGOrErr));
else {
  Err = PSGOrErr.takeError();
  return;
}

You don't need the call to llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr); any more: DynamicLibrarySearchGenerator::GetForCurrentProcess does that for you.

57–60

This doesn't look right. The ThreadSafeContext has to contain the LLVMContext for the module, but here you're creating a new unrelated ThreadSafeContext.

clang/lib/Interpreter/IncrementalExecutor.h
37

Should we maybe merge runCtors and addModule?

+1 even though there may be open questions regarding incremental initialization.

In the long term constructors should be run via the Orc runtime (currently planned for initial release in LLVM 13 later this year). I like the idea of keeping "add module" and "run initializers" as two separate steps, with initializers being run only when you execute a top level expression. It would allow for workflows like this:

interpreter% :load a.cpp
interpreter% :load b.cpp

where an initializer in a.cpp depends on code in b.cpp. It would also allow for defining constructors with forward references in the REPL itself.

The Orc runtime is currently focused on emulating the usual execution environment: The canonical way to execute initializers is by calling jit_dlopen on the target JITDylib. I think the plan should be to generalize this behavior (either in the jit_dlopen contract, or by introducing a jit_dlopen_repl function) to allow for repeated calls to dlopen, with each subsequent dlopen call executing any discovered-but-not-yet-run initializers.

Does that give me control over when the atexit handlers are called? Can the interpreter call them at its choice?

It's not as comprehensive as e.g. the MachO implementation, but should be sufficient for your use-case as you have IR for all your JITed code. (It would NOT work if you cached object files, reloaded them in a subsequent session and wanted to run their ctors.) So, your below call to initialize() should do it already.

Yep -- initialize should run the constructors, which should call cxa_atexit. The cxa_atexit calls should be interposed by GenericLLVMIRPlatform, and the atexits run when you call LLJIT::deinitialize on the JITDylib. There are some basic regression tests for this, but it hasn't been stress tested yet.

GenericLLVMIRPlatform should actually support initializers in cached object files that were compiled from Modules added to LLJIT: the platform replaces llvm.global_ctors with an init function with a known name, then looks for that name in objects loaded from the cache. At least that was the plan; I don't recall whether it has actually been tested. What definitely doesn't work is running initializers in objects produced outside LLJIT. That will be fixed by JITLink/ELF and the Orc Runtime though (and already works for MachO in the runtime prototype).

clang/lib/Interpreter/IncrementalParser.cpp
156–160

Wherever this CodeGenAction is created, it's probably the missing piece of the ThreadSafeContext puzzle: CodeGenAction's constructor takes an LLVMContext*, creating a new LLVMContext if the argument is null. In your system (at least to start with) I guess you will want to create one ThreadSafeContext associated with the interpreter and pass a pointer to that context to all CodeGenActions.
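
A sketch of that wiring, using EmitLLVMOnlyAction as the public CodeGenAction subclass that forwards the LLVMContext*:

#include "clang/CodeGen/CodeGenAction.h"
#include "llvm/ExecutionEngine/Orc/ThreadSafeModule.h"
#include "llvm/IR/LLVMContext.h"

#include <memory>

// One ThreadSafeContext owned by the interpreter; every CodeGenAction
// emits its llvm::Module into the same underlying LLVMContext.
struct ContextHolder {
  llvm::orc::ThreadSafeContext TSCtx{std::make_unique<llvm::LLVMContext>()};

  std::unique_ptr<clang::CodeGenAction> createAction() {
    return std::make_unique<clang::EmitLLVMOnlyAction>(TSCtx.getContext());
  }
};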

v.g.vassilev marked 8 inline comments as done.

Address comments from Raphael and Stefan.

clang/lib/Interpreter/IncrementalParser.cpp
64

Cool, thanks!

185

The + 2 here is probably not what you want. This will just give you a pointer into Clang's source buffers and will eventually point to random source buffers (or worse) once InputCount is large enough.

Indeed.

I feel like the proper solution is to just use the StartOfFile Loc and don't add any offset to it. I think Clang should still be able to figure out a reasonable ordering for overload candidates etc.

That particular part of the input processing has been causing a lot of trouble for cling over the years. If we use StartOfFile, each new line will appear before the previous one, which can be problematic for diagnostics, as you say, but also for template instantiations.

Cling ended up solving this by introducing a virtual file with an "impossible to allocate" size of 1U << 15U (https://github.com/vgvassilev/cling/blob/da1bb78f3dea4d2bf19b383aeb1872e9f2b117ad/lib/Interpreter/CIFactory.cpp#L1516-L1527 and https://github.com/vgvassilev/cling/blob/da1bb78f3dea4d2bf19b383aeb1872e9f2b117ad/lib/Interpreter/IncrementalParser.cpp#L394). Then we are safe to take Loc + 1 (I do not remember how that + 2 actually came about).

I wonder if that's something we should do here?
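
For concreteness, a sketch of the cling approach in terms of the clang APIs involved (the file name and size mirror the linked cling sources and are not part of this patch):

#include "clang/Basic/FileManager.h"
#include "clang/Basic/SourceManager.h"

// One virtual file, never backed by real content, whose declared size
// bounds how many per-input source offsets can be handed out.
clang::FileID createVirtualInputFile(clang::FileManager &FM,
                                     clang::SourceManager &SM) {
  const clang::FileEntry *FE = FM.getVirtualFile(
      "<<< cling interactive line includer >>>", 1U << 15U,
      /*ModificationTime=*/0);
  return SM.createFileID(FE, clang::SourceLocation(), clang::SrcMgr::C_User);
}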

(I thought I already commented on this line before, but I can't see my comment or any replies on Phab so I'm just commenting again).

190

Got it now; I somehow overlooked that EnterSourceFile returns true on failure. I decided to return an error.

clang/tools/clang-repl/CMakeLists.txt
4

Ok, readded.

v.g.vassilev marked 7 inline comments as done.

Address Lang's comments.

clang/lib/CodeGen/CodeGenAction.cpp
888

Recently, we thought about having a new FrontendAction callback for beginning a new phase when compiling incremental input. We need to keep track of the created objects (needed for error recovery) in our Transaction. We can have a map of Transaction* to llvm::Module* in CodeGen. The issue is that new JITs take ownership of the llvm::Module* which seems to make it impossible to support jitted code removal with that model (cc: @lhames, @rsmith).

In the new APIs, in order to enable removable code, you can associate Modules with ResourceTrackers when they're added to the JIT. The ResourceTrackers then allow for removal. Idiomatic usage looks like:

auto Mod = /* create module */;
auto RT = JD.createResourceTracker();
J.addModule(RT, std::move(Mod));
//...
if (auto Err = RT.remove())
  /* handle Err */;

Nice, thanks!

we have just one ever growing TU [...] which we send to RuntimeDyld...

So is a TU the same as an llvm::Module in this context? If so, how do you reconcile that with the JIT taking ownership of modules? Are you just copying the Module each time before adding it?

Each incremental chunk with which the TU grows has a corresponding llvm::Module. Once clang's CodeGen is done with the particular module, it transfers ownership to the Transaction which, in turn, hands it to the JIT; once the JIT is done, the Transaction takes ownership back.

We need to keep track of the created objects (needed for error recovery) in our Transaction.

Do you need the Module* for error recovery? Or just the Decls?

Yes, we need an llvm::Module that corresponds to the Decls, as sometimes CodeGen will decide not to emit a Decl.

clang/lib/Interpreter/IncrementalExecutor.cpp
30–52

Cool, thanks!

57–60

Thanks. I think I fixed it now. Can you take a look?

clang/lib/Interpreter/IncrementalExecutor.h
37

@sgraenitz, @lhames, thanks for the clarifications.

I am marking your comments as resolved (for easier tracking on my end). If the intent was to change something in this patch could you elaborate a little more what specifically I need to do here?

The other aspect of this is that upon unloading of these pieces of code we need to run the destructors (that's why we need some non-canonical handling of when we run the atexit handlers).

I just noticed this comment. I think long term you could handle this by introducing an "initialization generation" -- each time you run jit_dlopen_repl you would increment the generation. You'd point the __cxa_atexit alias at a custom function that keeps a map: __dso_handle -> (generation -> [ atexits ]). Then you could selectively run atexits for each generation before removing them.
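
The bookkeeping could look roughly like this (a sketch of the data structure only; the names are illustrative):

#include <map>
#include <vector>

// atexit callbacks recorded per __dso_handle and per initialization
// generation, so that rolling back generation N runs (and discards)
// exactly N's handlers.
struct AtExitEntry {
  void (*Callback)(void *);
  void *Arg;
};
using GenerationAtExits = std::map<unsigned, std::vector<AtExitEntry>>;
std::map<void *, GenerationAtExits> AtExitsByDSO;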

clang/lib/CodeGen/CodeGenAction.cpp
888

Each incremental chunk with which the TU grows has a corresponding llvm::Module. Once clang's CodeGen is done with the particular module, it transfers ownership to the Transaction which, in turn, hands it to the JIT; once the JIT is done, the Transaction takes ownership back.

Yes, we need an llvm::Module that corresponds to the Decls, as sometimes CodeGen will decide not to emit a Decl.

Can you elaborate on this? (Or point me to the relevant discussion / code?)

Does CodeGen aggregate code into the Module as you CodeGen each incremental chunk? Or do you Link the previously CodeGen'd module into a new one?

clang/lib/Interpreter/IncrementalExecutor.cpp
57–60

Yep -- This looks right now.

clang/lib/Interpreter/IncrementalExecutor.h
37

I don't think there's anything to do here -- those notes were just background info.

v.g.vassilev marked 3 inline comments as done. Feb 23 2021, 4:27 AM
v.g.vassilev added inline comments.
clang/lib/CodeGen/CodeGenAction.cpp
888

Each incremental chunk with which the TU grows has a corresponding llvm::Module. Once clang's CodeGen is done with the particular module, it transfers ownership to the Transaction which, in turn, hands it to the JIT; once the JIT is done, the Transaction takes ownership back.

Yes, we need an llvm::Module that corresponds to the Decls, as sometimes CodeGen will decide not to emit a Decl.

Can you elaborate on this? (Or point me to the relevant discussion / code?)

Does CodeGen aggregate code into the Module as you CodeGen each incremental chunk? Or do you Link the previously CodeGen'd module into a new one?

Cling's "code unloading" rolls back the states of the various objects without any checkpointing. Consider the two subsequent incremental inputs: int f() { return 12; } and int i = f();; undo 1.

When we ask CodeGen to generate code for the first input, it will not, as f is not yet used. Transaction1 will contain the FunctionDecl* for f, but the corresponding llvm::Module will be empty. Then, when we get the second input line, Transaction2 will contain the VarDecl*, but the corresponding llvm::Module will contain both IR definitions -- of f and of i.

Having the clang::Decl is useful because we can restore the previous state of the various internal frontend structures such as lookup tables. However, we cannot just drop the llvm::Module as it might contain deferred declarations which were emitted due to a use.

That's pretty much the rationale behind this and the design dates back to pre-MCJIT times. I am all for making this more robust but that's what we currently have. The "code unloading" is mostly done in cling's DeclUnloader.

There was some useful discussion about the model here quite some time ago.

rsmith added inline comments. Mar 3 2021, 1:42 PM
clang/test/Interpreter/execute.c
1 (On Diff #325249)

Presumably here (and in all the interpreter tests) we will need to check that we configured Clang and LLVM so that they can actually JIT code for the host machine, and disable the test if not.

ychen added a subscriber: ychen. Mar 3 2021, 6:25 PM
v.g.vassilev marked 2 inline comments as done.

Add a lit feature to check whether LLVM has JIT support, to selectively run tests.

  • Do not rely on process exit code for --host-supports-jit but parse the true/false flag.
  • Move the IncrementalProcessingTest unittest from unittests/CodeGen to unittests/Interpreter.

Address most of the formatting suggestions.

I think this makes good progress. I found two details in the test code that need attention. The stdout issue might be addressed now or in a follow-up patch some time soon. Otherwise, this seems ready to land.

@teemperor What about your notes? Are there any open issues remaining?

clang/lib/Interpreter/Interpreter.cpp
94

It looks like clang-repl always dumps errors to stdout currently. This is fine for the interactive use case, but it will be impractical for input/output tests. As a result unit tests e.g. dump:

Note: Google Test filter = InterpreterTest.Errors
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from InterpreterTest
[ RUN      ] InterpreterTest.Errors
In file included from <built-in>:0:
input_line_0:1:1: error: unknown type name 'intentional_error'
intentional_error v1 = 42; 
^
[       OK ] InterpreterTest.Errors (9024 ms)
[----------] 1 test from InterpreterTest (9024 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (9025 ms total)
[  PASSED  ] 1 test.

It would be useful to have an option for streaming diagnostics to an in-memory buffer (and maybe appending them to returned llvm::Error instances in the future). Instead of createDiagnostics() you could pass a TextDiagnosticPrinter via setDiagnostics() here to accomplish that.

Not insisting on having it in this review, but it would be a good follow-up task at least.
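
A sketch of that setup; TextDiagnosticPrinter and createDiagnostics are the existing clang APIs, while the wiring shown is an assumption about how the interpreter would use them:

#include "clang/Frontend/CompilerInstance.h"
#include "clang/Frontend/TextDiagnosticPrinter.h"
#include "llvm/Support/raw_ostream.h"

// Route diagnostics into a caller-provided stream (e.g. a
// raw_string_ostream over a std::string) instead of the console.
void captureDiagnostics(clang::CompilerInstance &CI, llvm::raw_ostream &OS) {
  CI.createDiagnostics(
      new clang::TextDiagnosticPrinter(OS, new clang::DiagnosticOptions()),
      /*ShouldOwnClient=*/true);
}

A test could then wrap a std::string in llvm::raw_string_ostream and assert on its contents after an intentional error.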

clang/test/Interpreter/execute.c
9 (On Diff #329894)

*nit* this should be a cpp file, right? Otherwise, you should write struct S *m = nullptr; here. Also, C doesn't understand the extern "C" above :)

11 (On Diff #329894)

The %p format placeholder in printf is implementation-defined, so the output here varies. Maybe you can do something like this instead:

auto r2 = printf("S[f=%f, m=0x%llx]\n", s.f, (unsigned long long)s.m);
// CHECK-NEXT: S[f=1.000000, m=0x0]

Or reinterpret_cast<unsigned long long>(s.m) if you go the C++ way.

clang/test/lit.cfg.py
97

I couldn't test this on a host that doesn't support JIT, but it looks like a nice "duck typing" way of testing for the feature.

v.g.vassilev marked 2 inline comments as done.

Address comments -- rename test file; add a logging diagnostic consumer.

clang/lib/Interpreter/Interpreter.cpp
94

I should have addressed it; now I get:

[==========] Running 5 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 1 test from IncrementalProcessing
[ RUN      ] IncrementalProcessing.EmitCXXGlobalInitFunc
[       OK ] IncrementalProcessing.EmitCXXGlobalInitFunc (17 ms)
[----------] 1 test from IncrementalProcessing (17 ms total)

[----------] 4 tests from InterpreterTest
[ RUN      ] InterpreterTest.Sanity
[       OK ] InterpreterTest.Sanity (8 ms)
[ RUN      ] InterpreterTest.IncrementalInputTopLevelDecls
[       OK ] InterpreterTest.IncrementalInputTopLevelDecls (9 ms)
[ RUN      ] InterpreterTest.Errors
[       OK ] InterpreterTest.Errors (91 ms)
[ RUN      ] InterpreterTest.DeclsAndStatements
[       OK ] InterpreterTest.DeclsAndStatements (8 ms)
[----------] 4 tests from InterpreterTest (116 ms total)

[----------] Global test environment tear-down
[==========] 5 tests from 2 test cases ran. (133 ms total)
[  PASSED  ] 5 tests.
clang/test/Interpreter/execute.c
9 (On Diff #329894)

Thanks! It is C++ indeed and I have changed the file extension.

sgraenitz accepted this revision. Mar 12 2021, 4:03 PM

Thanks. From my side this looks good now.

@rjmccall I'd like your input on this patch in particular, if you have time.

I'm nervous in general about the looming idea of declaration unloading, but the fact that it's been working in Cling for a long time gives me some confidence that we can do that in a way that's not prohibitively expensive and invasive.

@rjmccall I'd like your input on this patch in particular, if you have time.

I've been specifically avoiding paying any attention to this patch because it sounds like an enormous time-sink to review. :) That's not to say it'd be time poorly spent, because it's an intriguing feature, but I just don't have the time to engage with it fully. I can try to give snippets of time if you can pose specific questions. I did at least go back and read the RFC from the summer. I'm not sure I have time to read the review thread up until now; it's quite daunting.

I'm nervous in general about the looming idea of declaration unloading, but the fact that it's been working in Cling for a long time gives me some confidence that we can do that in a way that's not prohibitively expensive and invasive.

I don't really know the jargon here. The biggest problem that I foresee around having a full-featured C REPL is the impossibility of replacing existing types — you might be able to introduce a *new* struct with a particular name, break the redeclaration chain, and have it shadow the old one, and we could probably chase down all the engineering problems that that causes in the compiler, but it's never going to be a particularly satisfying model. If we don't have to worry about that, then I feel like the largest problem is probably the IRGen interaction — in particular, whether we're going to have to start serializing IRGen state the same way that Sema state has to be serialized across PCH boundaries. But I'm sure the people who are working on this have more knowledge of what issues they're seeing than I can just abstractly anticipate.

I'm nervous in general about the looming idea of declaration unloading, but the fact that it's been working in Cling for a long time gives me some confidence that we can do that in a way that's not prohibitively expensive and invasive.

I don't really know the jargon here.

"Code unloading" is mentioned here https://reviews.llvm.org/D96033?id=323531#inline-911819

The biggest problem that I foresee around having a full-featured C REPL is the impossibility of replacing existing types — you might be able to introduce a *new* struct with a particular name, break the redeclaration chain, and have it shadow the old one, and we could probably chase down all the engineering problems that that causes in the compiler, but it's never going to be a particularly satisfying model.

Indeed. Cling did not have this feature until recently. We indeed "shadow" the reachable declaration while maintaining the redecl chain invariants (using inline namespaces; relevant code here). I am not sure that approach can work outside C++.

If we don't have to worry about that, then I feel like the largest problem is probably the IRGen interaction — in particular, whether we're going to have to start serializing IRGen state the same way that Sema state has to be serialized across PCH boundaries. But I'm sure the people who are working on this have more knowledge of what issues they're seeing than I can just abstractly anticipate.

We find ourselves patching that area in CodeGen often when upgrading llvm (e.g. here). The current cling model requires the state to be transferred to subsequent calls of IRGen. We briefly touched on that topic in https://reviews.llvm.org/D34444#812418 (and onward) and I thought we had a plan for how to move forward.

I'm nervous in general about the looming idea of declaration unloading, but the fact that it's been working in Cling for a long time gives me some confidence that we can do that in a way that's not prohibitively expensive and invasive.

I don't really know the jargon here.

"Code unloading" is mentioned here https://reviews.llvm.org/D96033?id=323531#inline-911819

I see. So you want to be able to sort of "roll back" Sema to a previous version of the semantic state, and yet you also need IRGen to *not* be rolled back because of lazy emission. That seems... tough. I don't think we're really architected to support that goal — frankly, I'm not sure C and C++ are architected to support that goal — and I'm concerned that it might require a ton of extra complexity and risk for us. I say this not to be dismissive, but to clarify how I see our various responsibilities here. Clang's primary mission is to be a static compiler, and it's been architected with that in mind, and it's acceptable for us to put that mission above other goals if we think they're in conflict. So as I see it, your responsibility here is to persuade us of one of the following:

  • that you can achieve your goals without introducing major problems for Clang's primary mission,
  • that the changes you'll need to make will ultimately have substantial benefits for Clang's primary mission, or
  • that we should change how we think about Clang's primary mission.

We probably need to talk about it.

The biggest problem that I foresee around having a full-featured C REPL is the impossibility of replacing existing types — you might be able to introduce a *new* struct with a particular name, break the redeclaration chain, and have it shadow the old one, and we could probably chase down all the engineering problems that that causes in the compiler, but it's never going to be a particularly satisfying model.

Indeed. Cling did not have this feature until recently. We indeed "shadow" the reachable declaration maintaining (using inline namespaces) the redecl chain invariants (relevant code here). I am not sure that approach can work outside C++.

If we don't have to worry about that, then I feel like the largest problem is probably the IRGen interaction — in particular, whether we're going to have to start serializing IRGen state the same way that Sema state has to be serialized across PCH boundaries. But I'm sure the people who are working on this have more knowledge of what issues they're seeing than I can just abstractly anticipate.

We find ourselves patching often that area in CodeGen when upgrading llvm (eg. here). The current cling model requires the state to be transferred to subsequent calls of IRGen. We briefly touched that topic in https://reviews.llvm.org/D34444#812418 (and onward) and I thought we had a plan how to move forward.

Ah, I sort of remember that conversation.

To be clear, what I'm trying to say is that — absent consensus that a major architecture shift is appropriate — we need to consider what functionality is reasonably achievable without it. I'm sure that covers most of what you're trying to do; it just may not include everything.

One of the fortunate things about working in a REPL is that many ABI considerations go completely out the window. The most important of those is the abstract need for symbol-matching; that is, there's practically no reason why a C REPL needs to use the simple C symbol mangling, because nobody expects code written outside the REPL to link up with user-entered declarations. So we can be quite aggressive about how we emit declarations "behind the scenes"; C mode really just means C's user-level semantics, calling convention, and type layout, but not any of C's ordinary interoperation/compatibility requirements.

v.g.vassilev added a comment (edited). Mar 18 2021, 7:32 AM

I'm nervous in general about the looming idea of declaration unloading, but the fact that it's been working in Cling for a long time gives me some confidence that we can do that in a way that's not prohibitively expensive and invasive.

I don't really know the jargon here.

"Code unloading" is mentioned here https://reviews.llvm.org/D96033?id=323531#inline-911819

I see. So you want to be able to sort of "roll back" Sema to a previous version of the semantic state, and yet you also need IRGen to *not* be rolled back because of lazy emission.

The state should match the state of Sema (at least that's the current case for cling).

That seems... tough. I don't think we're really architected to support that goal — frankly, I'm not sure C and C++ are architected to support that goal — and I'm concerned that it might require a ton of extra complexity and risk for us. I say this not to be dismissive, but to clarify how I see our various responsibilities here. Clang's primary mission is to be a static compiler, and it's been architected with that in mind, and it's acceptable for us to put that mission above other goals if we think they're in conflict. So as I see it, your responsibility here is to persuade us of one of the following:

  • that you can achieve your goals without introducing major problems for Clang's primary mission,
  • that the changes you'll need to make will ultimately have substantial benefits for Clang's primary mission, or
  • that we should change how we think about Clang's primary mission.

We have, over the years, deliberately taken a very conservative stance and tried to achieve that goal without major modifications to clang in general. Cling has been out-of-tree, so clang design changes were not on the table then, and are not now. In my opinion, we have managed to go a long way using this set of changes in clang, which in my view is fairly minimal considering the use case we enable. My understanding of clang is limited, so I cannot really judge if the approach we took is sustainable, but over the years we have mostly suffered from having to selectively reset CodeGen state across incremental inputs. Plus some state in Sema, which comes from another feature that requires recursive parsing, but I would not consider that feature central to this particular discussion.

We probably need to talk about it.

+1. Do you use discord/slack/skype?

To be clear, what I'm trying to say is that — absent consensus that it's a major architecture shift is appropriate — we need to consider what functionality is reasonably achievable without it. I'm sure that covers most of what you're trying to do; it just may not include everything.

I understand and would like to thank you for bringing this up! I concur with that preference.

One of the fortunate things about working in a REPL is that many ABI considerations go completely out the window. The most important of those is the abstract need for symbol-matching; that is, there's practically no reason why a C REPL needs to use the simple C symbol mangling, because nobody expects code written outside the REPL to link up with user-entered declarations. So we can be quite aggressive about how we emit declarations "behind the scenes"; C mode really just means C's user-level semantics, calling convention, and type layout, but not any of C's ordinary interoperation/compatibility requirements.

I have never thought about ABI being a bi-directional problem, and indeed we can probably define away one direction. Do I understand that correctly? If so, that would mean we can still embed the C REPL into third-party code and be able to call compiled code from libraries?

We probably need to talk about it.

+1. Do you use discord/slack/skype?

I will try to summarize the discussion here. @rjmccall, @rsmith, please feel free to correct me if I am wrong or add important points that I missed.
The discussion focused on supporting two major REPL-enabling features.

  1. Error recovery:

[cling] #include <vector> // #1
[cling] std::vector<int> v; v[0].error_here; // #2, we need to undo the template instantiation.
input_line_4:2:26: error: member reference base type 'std::__1::__vector_base<int, std::__1::allocator<int> >::value_type' (aka 'int') is not a structure or union
 std::vector<int> v; v[0].error_here;
                     ~~~~^~~~~~~~~~~

  2. Code unloading:

[cling] .L Adder.h // #1, similar to #include "Adder.h"
[cling] Add(3, 1) // int Add(int a, int b) {return a - b; }
(int) 2
[cling] .U Adder.h // reverts the state prior to #1
[cling] .L Adder.h
[cling] Add(3, 1) // int Add(int a, int b) {return a + b; }
(int) 4

The implementation of (1.) requires tracking of state in the clang Frontend, and (2.) requires tracking of state in both the clang Frontend and Backend. We discussed the current DeclUnloader implementation in Cling, which tracks the emission of declarations (from Sema & co) and, upon request, cleans up various data structures such as lookup tables, the AST, and CodeGen. It does not yet free the bump-allocated AST memory, but rather makes the AST nodes unreachable. It has a very similar conceptual design to the ASTReader, which adds declarations to various data structures under the hood; the DeclUnloader does the opposite -- it removes declarations from various internal data structures. We understood that the approach is not intrusive to the architecture of clang and can be very useful for a number of other tools and IDEs.
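
The core of that "inverse ASTReader" direction, reduced to a sketch (cling's DeclUnloader additionally rolls back redecl chains, Sema tables, and CodeGen state):

#include "clang/AST/DeclBase.h"

// Make a declaration unreachable by unlinking it from its context's
// decl chain and lookup table; the memory itself is not freed.
void makeUnreachable(clang::Decl *D) {
  clang::DeclContext *DC = D->getLexicalDeclContext();
  if (DC && DC->containsDecl(D))
    DC->removeDecl(D);
}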

In addition, John pointed out that we can allow freeing chunks of memory if Sema and CodeGen track the template instantiations and various other components more rigorously, which should not be a major challenge. What makes the implementation feasible are two assumptions:
a) each incremental input leaves the compiler in a valid state;
b) each state reversal transitions to a previously valid state.

We seem to have reached consensus that incremental compilation can be supported without major changes to clang's architecture or significant changes to its main mission of being a static compiler.

Sorry for the delay. That seems like a reasonable summary of our discussion. Let me try to lay out the right architecture for this as I see it.

Conceptually, we can split the translation unit into a sequence of partial translation units (PTUs). Every declaration will be associated with a unique PTU that owns it.

The first key insight here is that the owning PTU isn't always the "active" (most recent) PTU, and it isn't always the PTU that the declaration "comes from". A new declaration (that isn't a redeclaration or specialization of anything) does belong to the active PTU. A template specialization, however, belongs to the most recent PTU of all the declarations in its signature — mostly that means that it can be pulled into a more recent PTU by its template arguments.

The second key insight is that processing a PTU might extend an earlier PTU. Rolling back the later PTU shouldn't throw that extension away. For example, if the second PTU defines a template, and the third PTU requires that template to be instantiated at float, that template specialization is still part of the second PTU. Similarly, if the fifth PTU uses an inline function belonging to the fourth, that definition still belongs to the fourth. When we go to emit code in a new PTU, we map each declaration we have to emit back to its owning PTU and emit it in a new module for just the extensions to that PTU. We keep track of all the modules we've emitted for a PTU so that we can unload them all if we decide to roll it back.

Most declarations/definitions will only refer to entities from the same or earlier PTUs. However, it is possible (primarily by defining a previously-declared entity, but also through templates or ADL) for an entity that belongs to one PTU to refer to something from a later PTU. We will have to keep track of this and prevent unwinding the later PTU when we recognize it. Fortunately, this should be very rare; and crucially, we don't have to do the bookkeeping for this if we've only got one PTU, e.g. in normal compilation. Otherwise, PTUs after the first just need to record enough metadata to be able to revert any changes they've made to declarations belonging to earlier PTUs, e.g. to redeclaration chains or template specialization lists.

It should even eventually be possible for PTUs to provide their own slab allocators which can be thrown away as part of rolling back the PTU. We can maintain a notion of the active allocator and allocate things like Stmt/Expr nodes in it, temporarily changing it to the appropriate PTU whenever we go to do something like instantiate a function template. More care will be required when allocating declarations and types, though.

We would want the PTU to be efficiently recoverable from a Decl; I'm not sure how best to do that. An easy option that would cover most declarations would be to make multiple TranslationUnitDecls and parent the declarations appropriately, but I don't think that's good enough for things like member function templates, since an instantiation of that would still be parented by its original class. Maybe we can work this into the DC chain somehow, like how lexical DCs are.
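
A sketch of the bookkeeping this architecture implies (the type and field names are illustrative, not part of this patch):

#include "clang/AST/DeclBase.h"
#include "llvm/IR/Module.h"

#include <memory>
#include <vector>

struct PartialTranslationUnit {
  // Declarations this PTU owns, including specializations pulled in
  // here later by their template arguments.
  std::vector<clang::Decl *> OwnedDecls;
  // Every module emitted for this PTU: the initial one plus any
  // "extension" modules produced while later PTUs were active. All are
  // unloaded together if this PTU is rolled back.
  std::vector<std::unique_ptr<llvm::Module>> EmittedModules;
};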

Thank you for the details -- they will be really useful to me, and that architecture will define away several hard problems we have tried to solve over the years.

Sorry for the delay. That seems like a reasonable summary of our discussion. Let me try to lay out the right architecture for this as I see it.

Conceptually, we can split the translation unit into a sequence of partial translation units (PTUs). Every declaration will be associated with a unique PTU that owns it.

The first key insight here is that the owning PTU isn't always the "active" (most recent) PTU, and it isn't always the PTU that the declaration "comes from". A new declaration (that isn't a redeclaration or specialization of anything) does belong to the active PTU. A template specialization, however, belongs to the most recent PTU of all the declarations in its signature — mostly that means that it can be pulled into a more recent PTU by its template arguments.

The second key insight is that processing a PTU might extend an earlier PTU. Rolling back the later PTU shouldn't throw that extension away. For example, if the second PTU defines a template, and the third PTU requires that template to be instantiated at float, that template specialization is still part of the second PTU. Similarly, if the fifth PTU uses an inline function belonging to the fourth, that definition still belongs to the fourth. When we go to emit code in a new PTU, we map each declaration we have to emit back to its owning PTU and emit it in a new module for just the extensions to that PTU. We keep track of all the modules we've emitted for a PTU so that we can unload them all if we decide to roll it back.

Most declarations/definitions will only refer to entities from the same or earlier PTUs. However, it is possible (primarily by defining a previously-declared entity, but also through templates or ADL) for an entity that belongs to one PTU to refer to something from a later PTU. We will have to keep track of this and prevent unwinding the later PTU when we recognize it. Fortunately, this should be very rare; and crucially, we don't have to do the bookkeeping for this if we've only got one PTU, e.g. in normal compilation. Otherwise, PTUs after the first just need to record enough metadata to be able to revert any changes they've made to declarations belonging to earlier PTUs, e.g. to redeclaration chains or template specialization lists.

It should even eventually be possible for PTUs to provide their own slab allocators which can be thrown away as part of rolling back the PTU. We can maintain a notion of the active allocator and allocate things like Stmt/Expr nodes in it, temporarily changing it to the appropriate PTU whenever we go to do something like instantiate a function template. More care will be required when allocating declarations and types, though.

We would want the PTU to be efficiently recoverable from a Decl; I'm not sure how best to do that. An easy option that would cover most declarations would be to make multiple TranslationUnitDecls and parent the declarations appropriately, but I don't think that's good enough for things like member function templates, since an instantiation of that would still be parented by its original class. Maybe we can work this into the DC chain somehow, like how lexical DCs are.
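
To make the bookkeeping above concrete, here is a minimal sketch of the per-PTU record this design implies. It is hypothetical: the patch under review defines none of these names, and the member names are illustrative only.

#include <memory>
#include <vector>
#include "clang/AST/Decl.h"
#include "llvm/IR/Module.h"

// Hypothetical record for one partial translation unit (PTU).
struct PartialTranslationUnit {
  // Declarations owned by this PTU, including specializations and
  // definitions pulled into it by later PTUs (the second key insight above).
  std::vector<clang::Decl *> OwnedDecls;

  // Every module emitted for this PTU: the initial one plus one module per
  // later extension. Unloading all of them rolls the PTU back.
  std::vector<std::unique_ptr<llvm::Module>> EmittedModules;
};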

Would it make sense to have each Decl point to its owning PTU, similarly to what we do for the owning module (Decl::getOwningModule)?

In terms of future steps, do you prefer to try implementing what you suggested as part of this patch? I would prefer to land this patch and then add what we discussed here rather than keep piling onto this already bulky patch.

Would it make sense to have each Decl point to its owning PTU, similarly to what we do for the owning module (Decl::getOwningModule)?

I think that's the interface we want, but actually storing the PTU in every Decl that way is probably prohibitive in memory overhead; we need some more compact way to recover it. But maybe it's okay to do something like that if we can spare a bit in Decl. Richard, thoughts here?

In terms of future steps, do you prefer to try implementing what you suggested as part of this patch? I would prefer to land this patch and then add what we discussed here rather than keep piling onto this already bulky patch.

It depends on how much you think your patch is working towards that architecture. Since this is just infrastructure without much in the way of Sema/IRGen changes, it's probably fine. I haven't reviewed it yet, though, sorry.

I think that's the interface we want, but actually storing the PTU in every Decl that way is probably prohibitive in memory overhead; we need some more compact way to recover it. But maybe it's okay to do something like that if we can spare a bit in Decl. Richard, thoughts here?

Ha, each Decl has a getTranslationUnitDecl() which may be rewired to point to the PTU...
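
For illustration, a hedged sketch of that rewiring, reusing the hypothetical PartialTranslationUnit from the sketch above; getOwningPTU and lookupPTUForTU are made up for this example, and the member-function-template caveat raised earlier still applies.

// Hypothetical: if every PTU parents its declarations under its own
// TranslationUnitDecl, the owning PTU can be recovered through the TU chain.
const PartialTranslationUnit *getOwningPTU(const clang::Decl *D) {
  const clang::TranslationUnitDecl *TU = D->getTranslationUnitDecl();
  return lookupPTUForTU(TU); // hypothetical side table from TU to PTU
}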

It depends on how much you think your patch is working towards that architecture. Since this is just infrastructure without much in the way of Sema/IRGen changes, it's probably fine. I haven't reviewed it yet, though, sorry.

If you could skim through the patch it'd be great! I think the only bit that remotely touches on the new architecture is the Transaction class -- it is a pair of a vector of Decls and an llvm::Module. I think the vector of Decls would become a PTU in the future.
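
Roughly the shape described above, as a sketch: TheModule matches the member name used later in Interpreter::Execute, while Decls is a guess at the patch's actual spelling.

#include <memory>
#include <vector>
#include "clang/AST/Decl.h"
#include "llvm/IR/Module.h"

// Sketch of the Transaction described above, not its exact definition.
struct Transaction {
  std::vector<clang::Decl *> Decls;        // AST pieces of one incremental input
  std::unique_ptr<llvm::Module> TheModule; // the IR generated for them
};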

@teemperor, could you take another look at this patch -- I believe most of your concerns are addressed now.

Sorry for the delay. I think all my points have been resolved besides the insertion SourceLoc.

clang/lib/Interpreter/IncrementalParser.cpp
185

I think my point then is: if we changed Clang's behaviour to consider the insertion order in the include tree when deciding the SourceLocation order, wouldn't that fix the problem? IIUC that's enough to make this case work the intended way without requiring some fake large source file. It would also make this work in other projects such as LLDB.

So IMHO we could just use StartOfFile as the loc here and then consider the wrong ordering as a Clang bug (and add a FIXME for that here).
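
A sketch of what "use StartOfFile as the loc" could look like, assuming the standard SourceManager API; the exact spot in IncrementalParser where this would go is an assumption.

// Use the start of the main file as the insertion location for each new
// incremental input; treat any resulting mis-ordering as a Clang bug (FIXME).
clang::SourceManager &SM = CI.getSourceManager(); // CI: the CompilerInstance
clang::SourceLocation Loc = SM.getLocForStartOfFile(SM.getMainFileID());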

Address source location comment of teemperor.

v.g.vassilev marked 2 inline comments as done. May 3 2021, 11:48 AM

Sorry for the delay. I think all my points have been resolved besides the insertion SourceLoc.

Apologies, I overlooked this comment. Should be fixed now.

teemperor accepted this revision. May 4 2021, 5:23 AM

I believe everything I pointed out so far has been resolved. I still have one more nit about the unit test though (just to not make it fail on some Windows setups).

FWIW, given that this is in quite a raw state at the moment, I wonder if we should actually keep it out of the list of installed binaries until it reaches some kind of MVP state. Once this lands, it otherwise gets shipped as part of the next clang release and shows up in people's shells.

clang/unittests/Interpreter/InterpreterTest.cpp
87

You still need an #ifdef GTEST_HAS_DEATH_TEST around this (death tests aren't supported on some specific setups)
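
For reference, the guard looks roughly like this; the test name and body are placeholders, not the unit test's actual content.

#include <cstdlib>
#include "gtest/gtest.h"

#ifdef GTEST_HAS_DEATH_TEST
// Only the #ifdef guard is the point here; the body is illustrative.
TEST(InterpreterTest, GuardedDeathTest) {
  EXPECT_DEATH(std::abort(), "");
}
#endif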

This revision is now accepted and ready to land. May 4 2021, 5:23 AM

(Obviously this should still wait on Richard & John as I think they still have unaddressed objections)

v.g.vassilev marked an inline comment as done.
  • Check if we can use death tests;
  • Do not install clang-repl while it is in early-stage development.

Thanks, @teemperor. I have addressed your last comments, too.

I wanted to use the opportunity to ping @rjmccall and @rsmith.

rsmith added inline comments. May 10 2021, 6:45 PM
clang/lib/Interpreter/IncrementalParser.cpp
226–235

What are these tokens, exactly? Are we sure it's safe to discard them rather than parsing them?

rsmith accepted this revision. May 10 2021, 6:49 PM

Generally speaking, we have a plan that I'm happy for us to work towards, and I'm happy for our progress towards that plan to be incremental. Even though this might not be fully in that direction right now, I think that's OK.

clang-tidy and clang-format

This revision was landed with ongoing or failed builds. May 12 2021, 9:24 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. May 12 2021, 9:24 PM

Thanks everybody for reviewing and helping make this patch better!

@v.g.vassilev, the test does not appear to be appropriately set up for builds that default to a non-native target:
https://lab.llvm.org/staging/#/builders/126/builds/371/steps/5/logs/FAIL__Clang__execute_cpp

Can you fix or revert until there is a fix?

Apologies. Do you know if REQUIRES: native is sufficient to fix (trying to avoid churn)?

@lhames and I are working on a patch. We do not have easy access to such machines. Would you mind testing it on the bot?

Apologies. Do you know if REQUIRES: native is sufficient to fix (trying to avoid churn)?

A speculative application of the above probably helps and would be harmless (I think).

@lhames and I are working on a patch. We do not have easy access to such machines. Would you mind testing it on the bot?

I have a local build I can apply a patch to.

...
I have a local build I can apply a patch to.

Hi Hubert,

Could you apply the following patch and let me know the output from the failing test? I'm trying to work out whether the JIT is getting the triple or the data layout wrong.

diff --git a/clang/tools/clang-repl/ClangRepl.cpp b/clang/tools/clang-repl/ClangRepl.cpp
index b5b5bf6e0c6b..cbf67f0e163e 100644
--- a/clang/tools/clang-repl/ClangRepl.cpp
+++ b/clang/tools/clang-repl/ClangRepl.cpp
@@ -57,6 +57,12 @@ int main(int argc, const char **argv) {
   llvm::InitializeNativeTarget();
   llvm::InitializeNativeTargetAsmPrinter();
 
+  auto JTMB = ExitOnErr(llvm::orc::JITTargetMachineBuilder::detectHost());
+  llvm::errs() << "triple:     " << JTMB.getTargetTriple().str() << "\n";
+  llvm::errs() << "datalayout: "
+               << ExitOnErr(JTMB.getDefaultDataLayoutForTarget())
+                      .getStringRepresentation()
+               << "\n";
   if (OptHostSupportsJit) {
     auto J = llvm::orc::LLJITBuilder().create();
     if (J)

Hi Hubert,

Could you apply the following patch and let me know the output from the failing test? I'm trying to work out whether the JIT is getting the triple or the data layout wrong.

I've started the build. My tree was a bit stale, so it might not be the fastest.

******************** TEST 'Clang :: Interpreter/execute.cpp' FAILED ********************
Script:
--
: 'RUN: at line 1';   cat /home/hstong/.Liodine/llvmproj/clang/test/Interpreter/execute.cpp | /home/hstong/.Nrtphome/.Liodine/llcrossbld/dev/build/bin/clang-repl | /home/hstong/.Nrtphome/.Liodine/llcrossbld/dev/build/bin/FileCheck /home/hstong/.Liodine/llvmproj/clang/test/Interpreter/execute.cpp
--
Exit Code: 1

Command Output (stderr):
--
triple:     powerpc64-ibm-aix7.2.0.0
datalayout: E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512
error: Added modules have incompatible data layouts: e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)
error: Added modules have incompatible data layouts: e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)
error: Added modules have incompatible data layouts: e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)
error: Added modules have incompatible data layouts: e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)
error: Added modules have incompatible data layouts: e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)
error: Added modules have incompatible data layouts: e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)
error: Added modules have incompatible data layouts: e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)
error: Added modules have incompatible data layouts: e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)
error: Added modules have incompatible data layouts: e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)
error: Added modules have incompatible data layouts: e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)
error: Added modules have incompatible data layouts: e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)
error: Added modules have incompatible data layouts: e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)
error: Added modules have incompatible data layouts: e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)
/home/hstong/.Liodine/llvmproj/clang/test/Interpreter/execute.cpp:7:11: error: CHECK: expected string not found in input
// CHECK: i = 42
          ^
<stdin>:1:1: note: scanning from here
clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> 
^
<stdin>:1:9: note: possible intended match here
clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> 
        ^

Input file: <stdin>
Check file: /home/hstong/.Liodine/llvmproj/clang/test/Interpreter/execute.cpp

-dump-input=help explains the following input dump.

Input was:
<<<<<<
           1: clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl>  
check:7'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
check:7'1             ?                                                                                                                                                                 possible intended match
>>>>>>

--

********************

...
Command Output (stderr):

triple: powerpc64-ibm-aix7.2.0.0
datalayout: E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512
error: Added modules have incompatible data layouts: e-m:e-i64:64-n32:64-S128-v256:256:256-v512:512:512 (module) vs E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512 (jit)

Thanks @hubert.reinterpretcast!

Ok, looks like the JIT is getting the layout right, but clang-repl is constructing a module with a little-endian layout for some reason. I'm not sure why that would be but it's probably a question for @v.g.vassilev tomorrow.

In the meantime I have conditionally disabled this test on ppc64 in 71a0609a2b.
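
For reference, the disabling amounts to a lit directive at the top of the test; this matches the test header shown in the diff later in this thread.

// RUN: cat %s | clang-repl | FileCheck %s
// REQUIRES: host-supports-jit
// UNSUPPORTED: powerpc64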

Ok, looks like the JIT is getting the layout right, but clang-repl is constructing a module with a little-endian layout for some reason.

clang-repl is generating a module consistent with LLVM_DEFAULT_TARGET_TRIPLE:STRING=powerpc64le-linux-gnu.
So it seems this test might need some way of telling clang-repl to use the host triple.

In the meantime I have conditionally disabled this test on ppc64 in 71a0609a2b.

That will help our builds; thanks.

Hi @hubert.reinterpretcast,

Would you mind testing this patch:

diff --git a/clang/lib/Interpreter/Interpreter.cpp b/clang/lib/Interpreter/Interpreter.cpp
index 8de38c0afcd9..79acb5bd6898 100644
--- a/clang/lib/Interpreter/Interpreter.cpp
+++ b/clang/lib/Interpreter/Interpreter.cpp
@@ -157,7 +157,7 @@ IncrementalCompilerBuilder::create(std::vector<const char *> &ClangArgv) {
   ParseDiagnosticArgs(*DiagOpts, ParsedArgs, &Diags);
 
   driver::Driver Driver(/*MainBinaryName=*/ClangArgv[0],
-                        llvm::sys::getDefaultTargetTriple(), Diags);
+                        llvm::sys::getProcessTriple(), Diags);
   Driver.setCheckInputsExist(false); // the input comes from mem buffers
   llvm::ArrayRef<const char *> RF = llvm::makeArrayRef(ClangArgv);
   std::unique_ptr<driver::Compilation> Compilation(Driver.BuildCompilation(RF));
diff --git a/clang/test/Interpreter/execute.cpp b/clang/test/Interpreter/execute.cpp
index a9beed5714d0..81ab57e955cf 100644
--- a/clang/test/Interpreter/execute.cpp
+++ b/clang/test/Interpreter/execute.cpp
@@ -1,6 +1,5 @@
 // RUN: cat %s | clang-repl | FileCheck %s
 // REQUIRES: host-supports-jit
-// UNSUPPORTED: powerpc64
 
 extern "C" int printf(const char *, ...);
 int i = 42;

Hi @hubert.reinterpretcast,

Would you mind testing this patch:

Running it now. I've applied the first diff here to the base of my previously reported result to minimize build time.

Does the test try to generate native object files in some way? There is functionality (with some limitations) for that under 32-bit AIX; however, we're running a 64-bit build and we don't have integrated-as capability for that at this time. This is what I'm seeing:

******************** TEST 'Clang :: Interpreter/execute.cpp' FAILED ********************
Script:
--
: 'RUN: at line 1';   cat /home/hstong/.Liodine/llvmproj/clang/test/Interpreter/execute.cpp | /home/hstong/.Nrtphome/.Liodine/llcrossbld/dev/build/bin/clang-repl | /home/hstong/.Nrtphome/.Liodine/llcrossbld/dev/build/bin/FileCheck /home/hstong/.Liodine/llvmproj/clang/test/Interpreter/execute.cpp
--
Exit Code: 2

Command Output (stderr):
--
clang-repl: Driver initialization failed. Incremental mode for action is not supported
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/hstong/.Nrtphome/.Liodine/llcrossbld/dev/build/bin/FileCheck /home/hstong/.Liodine/llvmproj/clang/test/Interpreter/execute.cpp

--

********************

That looks like progress! Thanks a lot for investing the time to do that. I tried to get access to such a machine but we don't have them off the shelf.

The test is supposed to create an incremental Clang and a JIT. clang-repl takes the original ProgramAction and tries to turn it into an incremental action using the WrapperFrontendAction, adding an EmitLLVMOnlyAction. That is done here.

The diagnostic tells us that we cannot turn some ProgramAction into an incremental one. I am wondering what that action is. clang-repl -Xcc -v should be able to tell us.

@v.g.vassilev, thanks for working with me on this. I understand it is difficult to handle issues that appear on platforms and build configurations one does not have set up.

I've added -Xcc -v and the output is below. It seems it has to do with the implicit -fno-integrated-as currently used with AIX. I'll paste the result with -Xcc -fintegrated-as in my next comment.

$ cat /home/hstong/.Liodine/llvmproj/clang/test/Interpreter/execute.cpp | /home/hstong/.Nrtphome/.Liodine/llcrossbld/dev/build/bin/clang-repl -Xcc -v
clang version 13.0.0
Target: powerpc64-ibm-aix7.2.0.0
Thread model: posix
InstalledDir:
 "" -cc1 -triple powerpc64-ibm-aix7.2.0.0 -S -disable-free -main-file-name "<<< inputs >>>" -mrelocation-model pic -pic-level 2 -mframe-pointer=all -fmath-errno -fno-rounding-math -fno-verbose-asm -no-integrated-as -target-cpu pwr7 -mfloat-abi hard -mllvm -treat-scalable-fixed-error-as-warning -gstrict-dwarf -gno-column-info -debugger-tuning=dbx -v -fdata-sections -fcoverage-compilation-dir=/home/hstong/.Nrtphome/.Liodine/llcrossbld/dev/build -resource-dir lib/clang/13.0.0 -internal-isystem lib/clang/13.0.0/include -internal-isystem /usr/include -fdeprecated-macro -fno-dwarf-directory-asm -fdebug-compilation-dir=/home/hstong/.Nrtphome/.Liodine/llcrossbld/dev/build -ferror-limit 19 -fno-signed-char -fno-use-cxa-atexit -fgnuc-version=4.2.1 -fcxx-exceptions -fexceptions -fcolor-diagnostics -fxl-pragma-pack -o "/tmp/hstong/auto.2021W19/<<< inputs >>>-e534ce.s" -x c++ "<<< inputs >>>"
 "/usr/bin/as" -a64 -many -o "<<< inputs >>>.o" "/tmp/hstong/auto.2021W19/<<< inputs >>>-e534ce.s"
clang-repl: Driver initialization failed. Incremental mode for action is not supported

Once I add -Xcc -fintegrated-as, we get:

$ cat /home/hstong/.Liodine/llvmproj/clang/test/Interpreter/execute.cpp | /home/hstong/.Nrtphome/.Liodine/llcrossbld/dev/build/bin/clang-repl -Xcc -fintegrated-as
fatal error: error in backend: 64-bit XCOFF object files are not supported yet.
clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> 

I am not sure if there's something to avoid this other than to XFAIL the test somehow while the 64-bit XCOFF integrated-as capability is still pending.
If you have some ideas, please let me know. Meanwhile, I am trying out system-aix as the "feature" to XFAIL on.

I still think the switch to use the process triple in https://reviews.llvm.org/D96033#2759808 is perhaps correct (so even if it does not help currently for the configuration we have running, it could be worthwhile to commit).

If you have some ideas, please let me know. Meanwhile, I am trying out system-aix as the "feature" to XFAIL on.

https://reviews.llvm.org/D102560 posted to use system-aix so other PPC configurations will run the test. Possible further changes are for the code to choose the process triple, for the test to possibly specify -fintegrated-as, and (alternatively) for the system assembler step to be integrated into this use scenario.

Thanks for the fix. Indeed I will commit the change wrt getting the process triple although it may require further adjustments for out-of-process execution.

I'd like to avoid requiring the intricate -fintegrated-as in the test (and from users). It seems that the clang driver creates a program action. In cases where we have -fintegrated-as it appends -emit-obj, which is then converted to the frontend::EmitObj action. In that case clang-repl just ignores this and replaces it with frontend::EmitLLVMOnly. IIUC, AIX defaults to -fno-integrated-as and I am still puzzled (and cannot find the relevant code) about what the program action kind for AIX is in that case (frontend::???). FWIW, I tried to add some functionality to write it to the diagnostics message but it requires opening too many interfaces.

Maybe applying that diff can give us a hint:

diff --git a/clang/lib/Interpreter/IncrementalParser.cpp b/clang/lib/Interpreter/IncrementalParser.cpp
index 70baabfeb8fb..4c2292e0bde9 100644
--- a/clang/lib/Interpreter/IncrementalParser.cpp
+++ b/clang/lib/Interpreter/IncrementalParser.cpp
@@ -55,6 +55,7 @@ public:
                 std::errc::state_not_recoverable,
                 "Driver initialization failed. "
                 "Incremental mode for action is not supported");
+            printf("ActionKind=%d\n", CI.getFrontendOpts().ProgramAction);
             return Act;
           case frontend::ASTDump:
             LLVM_FALLTHROUGH;

IIUC, AIX defaults to -fno-integrated-as and I am still puzzled (and cannot find the relevant code) about what the program action kind for AIX is in that case (frontend::???).

This is rooted in IsIntegratedAssemblerDefault, and the relevant code is really a lack of code: the default implementation, ToolChain::IsIntegratedAssemblerDefault, returns false.
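
In other words, the default behaves roughly like this; a sketch of the description above, not a verbatim quote of the source.

// Toolchains that want the integrated assembler must override this;
// the base class conservatively answers no.
bool ToolChain::IsIntegratedAssemblerDefault() const { return false; }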

Maybe applying that diff can give us a hint:

Output is:

$ cat /home/hstong/.Liodine/llvmproj/clang/test/Interpreter/execute.cpp | /home/hstong/.Nrtphome/.Liodine/llcrossbld/dev/build/bin/clang-repl
ActionKind=7
clang-repl: Driver initialization failed. Incremental mode for action is not supported

Thanks!

That patch should probably get us to a point where we can mark the test as XFAIL: system-aix

diff --git a/clang/lib/Interpreter/IncrementalParser.cpp b/clang/lib/Interpreter/IncrementalParser.cpp
index 70baabfeb8fb..84b4d779d43c 100644
--- a/clang/lib/Interpreter/IncrementalParser.cpp
+++ b/clang/lib/Interpreter/IncrementalParser.cpp
@@ -54,7 +54,8 @@ public:
             Err = llvm::createStringError(
                 std::errc::state_not_recoverable,
                 "Driver initialization failed. "
-                "Incremental mode for action is not supported");
+                "Incremental mode for action %d is not supported",
+                CI.getFrontendOpts().ProgramAction);
             return Act;
           case frontend::ASTDump:
             LLVM_FALLTHROUGH;
@@ -63,6 +64,8 @@ public:
           case frontend::ParseSyntaxOnly:
             Act = CreateFrontendAction(CI);
             break;
+          case frontend::EmitAssembly:
+            LLVM_FALLTHROUGH;
           case frontend::EmitObj:
             LLVM_FALLTHROUGH;
           case frontend::EmitLLVMOnly:

If that works on your platform I will happily open a review for the changes.

That patch should probably get us to a point where we can mark the test as XFAIL: system-aix

I've applied that patch to my 64-bit LLVM build and it does cause the object writer to be used (which generates the "64-bit XCOFF object files" error seen before).

If that works on your platform I will happily open a review for the changes.

From the behaviour observed in the 64-bit build, the 32-bit case might not fail similarly. I need to see if I can get a 32-bit build going. For now, the patch would be welcome (without changing from UNSUPPORTED to XFAIL).

The 32-bit case fails elsewhere:

clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> clang-repl> Not yet implemented!
UNREACHABLE executed at /home/hstong/.Nrtphome/.Liodine/llcrossbld/dev/llvm-project/llvm/lib/Object/XCOFFObjectFile.cpp:219!
IOT/Abort trap (core dumped)

So, I don't think the 32-bit and 64-bit cases are going to be synchronized.

Looks like this is also failing on s390x:

error: Added modules have incompatible data layouts: E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-a:8:16-n32:64 (module) vs E-m:e-i1:8:16-i8:8:16-i64:64-f128:64-v128:64-a:8:16-n32:64 (jit)

The problem here is that on s390x we use a different data layout on machines with vector registers vs. machines without. The (module) string above is the version without vector registers (which is presumably selected because there is no -march= argument and the compiler therefore defaults to an old machine), and the (jit) string is the version with vector registers (which is presumably because the jit auto-detected that it is running on a new machine).

I guess we should either tell the JIT to not autodetect the current processor, or else tell the compiler to target the processor that the JIT autodetected?
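
A hedged sketch of the second option, i.e. the moral equivalent of -march=native: make codegen target the processor the JIT will auto-detect. The hook point (CI, a CompilerInstance inside clang-repl) is an assumption.

#include "llvm/Support/Host.h"

// Target the host CPU so the module and the JIT agree on features such as
// vector-register availability.
CI.getTargetOpts().CPU = llvm::sys::getHostCPUName().str();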

@hubert.reinterpretcast, thanks for the feedback. I have created a patch as discussed -- https://reviews.llvm.org/D102688

@uweigand, thanks for reaching out. I believe the patch above should fix your setup. Could you confirm?

Unfortunately, it does not. Changing the triple doesn't affect the architecture the compiler generates code for. If you wanted to change the compiler to generate code for the architecture the JIT detects, the easiest way would probably be to use (the equivalent of) "-march=native", which causes the compiler to also auto-detect the current processor in the same way as the JIT does.

phosek added a subscriber: phosek. May 18 2021, 9:52 AM

We've started seeing LLVM ERROR: out of memory on our 2-stage LTO Linux builders after this change landed. It looks like linking clang-repl always fails on our bot, but I've also seen OOM when linking ClangCodeGenTests and FrontendTests. Do you have any idea why this could be happening? We'd appreciate any help since our bots have been broken for several days now.

Ouch. Are the bot logs public? If not, maybe a stacktrace could be useful. clang-repl combines a lot of libraries across llvm and clang that are usually compiled separately. For instance we put in memory most of the clang frontend, the backend and the JIT. Could it be we are hitting some real limit?

Ouch. Are the bot logs public? If not, maybe a stacktrace could be useful. clang-repl combines a lot of libraries across llvm and clang that are usually compiled separately. For instance we put in memory most of the clang frontend, the backend and the JIT. Could it be we are hitting some real limit?

Yes, they are, see https://luci-milo.appspot.com/p/fuchsia/builders/prod/clang-linux-x64, but there isn't much information in there unfortunately. It's possible that we're hitting some limit, but these bots use 32-core instances with 128GB RAM which I'd hope is enough even for the LTO build.

gulfem added a subscriber: gulfem. May 18 2021, 11:09 AM

Unfortunately, it does not. Changing the triple doesn't affect the architecture the compiler generates code for. If you wanted to change the compiler to generate code for the architecture the JIT detects, the easiest way would probably be to use (the equivalent of) "-march=native", which causes the compiler to also auto-detect the current processor in the same way as the JIT does.

Ah, okay. Could you try this patch:

diff --git a/clang/lib/Interpreter/IncrementalExecutor.cpp b/clang/lib/Interpreter/IncrementalExecutor.cpp
index f999e5eceaed..9a368d9122bc 100644
--- a/clang/lib/Interpreter/IncrementalExecutor.cpp
+++ b/clang/lib/Interpreter/IncrementalExecutor.cpp
@@ -26,12 +26,14 @@
 namespace clang {
 
 IncrementalExecutor::IncrementalExecutor(llvm::orc::ThreadSafeContext &TSC,
-                                         llvm::Error &Err)
+                                         llvm::Error &Err,
+                                         const llvm::Triple &Triple)
     : TSCtx(TSC) {
   using namespace llvm::orc;
   llvm::ErrorAsOutParameter EAO(&Err);
 
-  if (auto JitOrErr = LLJITBuilder().create())
+  auto JTMB = JITTargetMachineBuilder(Triple);
+  if (auto JitOrErr = LLJITBuilder().setJITTargetMachineBuilder(JTMB).create())
     Jit = std::move(*JitOrErr);
   else {
     Err = JitOrErr.takeError();
diff --git a/clang/lib/Interpreter/IncrementalExecutor.h b/clang/lib/Interpreter/IncrementalExecutor.h
index c4e33a390942..b4c6ddec1047 100644
--- a/clang/lib/Interpreter/IncrementalExecutor.h
+++ b/clang/lib/Interpreter/IncrementalExecutor.h
@@ -14,6 +14,7 @@
 #define LLVM_CLANG_LIB_INTERPRETER_INCREMENTALEXECUTOR_H
 
 #include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/Triple.h"
 #include "llvm/ExecutionEngine/Orc/ExecutionUtils.h"
 
 #include <memory>
@@ -34,7 +35,8 @@ class IncrementalExecutor {
   llvm::orc::ThreadSafeContext &TSCtx;
 
 public:
-  IncrementalExecutor(llvm::orc::ThreadSafeContext &TSC, llvm::Error &Err);
+  IncrementalExecutor(llvm::orc::ThreadSafeContext &TSC, llvm::Error &Err,
+                      const llvm::Triple &Triple);
   ~IncrementalExecutor();
 
   llvm::Error addModule(std::unique_ptr<llvm::Module> M);
diff --git a/clang/lib/Interpreter/Interpreter.cpp b/clang/lib/Interpreter/Interpreter.cpp
index 79acb5bd6898..025bdb14c54f 100644
--- a/clang/lib/Interpreter/Interpreter.cpp
+++ b/clang/lib/Interpreter/Interpreter.cpp
@@ -16,6 +16,7 @@
 #include "IncrementalExecutor.h"
 #include "IncrementalParser.h"
 
+#include "clang/AST/ASTContext.h"
 #include "clang/Basic/TargetInfo.h"
 #include "clang/CodeGen/ModuleBuilder.h"
 #include "clang/CodeGen/ObjectFilePCHContainerOperations.h"
@@ -204,8 +205,11 @@ llvm::Expected<Transaction &> Interpreter::Parse(llvm::StringRef Code) {
 llvm::Error Interpreter::Execute(Transaction &T) {
   assert(T.TheModule);
   if (!IncrExecutor) {
+    const llvm::Triple &Triple =
+      getCompilerInstance()->getASTContext().getTargetInfo().getTriple();
     llvm::Error Err = llvm::Error::success();
-    IncrExecutor = std::make_unique<IncrementalExecutor>(*TSCtx, Err);
+    IncrExecutor = std::make_unique<IncrementalExecutor>(*TSCtx, Err, Triple);
+
     if (Err)
       return Err;
   }

Yes, they are, see https://luci-milo.appspot.com/p/fuchsia/builders/prod/clang-linux-x64, but there isn't much information in there unfortunately. It's possible that we're hitting some limit, but these bots use 32-core instances with 128GB RAM which I'd hope is enough even for the LTO build.

I think the specs are fine for just building with LTO, but I am not sure if that's enough for the worst case when running ninja -j320 with an LTO build (which is what your job is doing). Can you try limiting your link jobs to something like 16 or 32 (e.g., -DLLVM_PARALLEL_LINK_JOBS=32)?

(FWIW, your go build script also crashes with OOM errors so you really are running low on memory on that node)

Yes, this patch fixes the problem for me. Thanks!

I think the specs are fine for just building with LTO, but I am not sure if that's enough for the worst case when running ninja -j320 with an LTO build (which is what your job is doing). Can you try limiting your link jobs to something like 16 or 32 (e.g., -DLLVM_PARALLEL_LINK_JOBS=32)?

Right: On my system, linking clang with LTO takes 11.5 GB of RAM while clang-repl takes 8.4 GB. If the system has 128 GB, I agree that the issue is likely too many parallel links; the addition of clang-repl triggers this because it adds another large binary that the build system has to process at some point.

@uweigand, thanks again for confirming the patch works for you. The differential revision is here https://reviews.llvm.org/D102756

@teemperor and @Hahnfeld, thanks for the diagnosis. I forgot to mention that we had some exchange on Discord and @phosek kindly agreed to try this special flag.

I think the specs are fine for just building with LTO, but I am not sure if that's enough for the worst case when running ninja -j320 with an LTO build (which is what your job is doing). Can you try limiting your link jobs to something like 16 or 32 (e.g., -DLLVM_PARALLEL_LINK_JOBS=32)?

(FWIW, your go build script also crashes with OOM errors so you really are running low on memory on that node)

-j320 is only used for the first stage compiler, which uses distributed compilation and no LTO; the second stage, which uses LTO and where we see this issue, uses the Ninja default, so -j32 in this case.

-j320 is only used for the first stage compiler, which uses distributed compilation and no LTO; the second stage, which uses LTO and where we see this issue, uses the Ninja default, so -j32 in this case.

I admit I don't really know the CI system on your node, but I assumed you're using -j320 from this output, which I got by clicking on "execution details" on the aborted stage of this build:

Executing command [
  '/b/s/w/ir/x/w/cipd/ninja',
  '-j320',
  'stage2-check-clang',
  'stage2-check-lld',
  'stage2-check-llvm',
  'stage2-check-polly',
]
escaped for shell: /b/s/w/ir/x/w/cipd/ninja -j320 stage2-check-clang stage2-check-lld stage2-check-llvm stage2-check-polly
in dir /b/s/w/ir/x/w/staging/llvm_build
at time 2021-05-18T20:53:37.215574

(Tabbed on the Submit button instead of finishing that quote block...)

So I assume the stage2- targets are just running some ninja invocations in sequence?

Anyway, I think it would be helpful to see which link jobs were in progress. But I guess even with 32 jobs you could end up with enough memory pressure from 32 linker invocations that processes start getting OOM-killed.

We're still hitting the OOMs when building clang-repl with LTO even with -DLLVM_PARALLEL_LINK_JOBS=32. While we don't build this target explicitly in our toolchain, it is built when running tests via stage2-check-clang. Is there perhaps a simple cmake flag that allows us to not run clang-repl tests so we don't build it?

To clarify, this is on a machine with 512GB RAM.

@leonardchan, @phosek, I am not aware of such a flag. Do you know how much memory LTO + clang-repl consumes, and would it make sense to ping some of the LTO folks for their advice?

I played around a bit with adding a flag and it turned out to be not as difficult as I thought it would be: D108173. I think making this more of an "opt-in/out" tool like clang-staticanalyzer might be a good idea in general since not all downstream users may want to build all clang tools.

I would rather fix the underlying issue. @samitolvanen, @pcc do you know who can help us with debugging excessive memory use when linking clang-repl using LTO?