Page MenuHomePhabricator

[ORC] Add generic initializer/deinitializer support.
ClosedPublic

Authored by lhames on Feb 9 2020, 7:09 PM.

Details

Summary

Background:

Initializers and deinitializers may be used to implement static constructor
and destructor support, runtime registration (e.g. ObjC) and other tasks
that would typically be performed when a dylib/shared-object is loaded or
unloaded.

Until now, MCJIT and ORC have provided limited support for static constructors
and destructors by scanning the llvm.global_ctors and llvm.global_dtors
variables. This approach suffers from two main drawbacks: It requires LLVM IR to
enable initializer discovery (objects loaded directly or from a compile cache
will not have their initializers run), and not all initializers are described by
llvm.global_ctors (e.g. ObjC registration is described by special sections
instead).

This patch introduces several interdependent changes to ORCv2 to enable full
support for initializers and deinitializers:

(1) The MaterializationUnit and MaterializationResponsibility classes are
extended to describe an optional "initializer" symbol for the module (see the
getInitializerSymbol method on each class). The presence or absence of this
symbol indicates whether the module contains any initializers or
deinitializers. The initializer symbol otherwise behaves like any other:
searching for it triggers materialization.

(2) A new Platform interface is introduced in llvm/ExecutionEngine/Orc/Core.h
which provides the following callback interface:

  • Error setupJITDylib(JITDylib &JD): Can be used to install standard symbols in JITDylibs upon creation. E.g. __dso_handle.
  • Error notifyAdding(JITDylib &JD, const MaterializationUnit &MU): Generally used to record initializer symbols.
  • Error notifyRemoving(JITDylib &JD, VModuleKey K): Used to notify a platform that a module is being removed.
  • void notifyReady(JITDylib &JD, SymbolNameVector Symbols): Called to notify a platform that some symbols are ready but have not yet been initialized.

    Platform implementations can use these callbacks to track outstanding

initializers and implement a platform-specific approach for executing them. For
example, the MachOPlatform installs plugins in the JIT linker and uses them to
scan both the __mod_inits section (for C++ static constructors) and ObjC
metadata sections, then register and run initializers according to JITDylib
link order (i.e. in the same order that they would have been run if the modules
had been linked into corresponding dylibs on the command line).

(3) An initialized state for symbols. The initialized state comes after the
existing states (NeverSearched, Materializing, Resolved, Emitted, Ready) and
can be used to prevent symbol queries from returning until the searched-for
symbol has been initialized.

This patch updates LLJIT to use this scheme for execution of static initializers
and deinitializers. Two LLJIT::PlatformSupport classes are implemented: A
GenericIR platform and a MachO platform. The GenericIR platform implements a
modified version of the previous llvm.global-ctor scraping scheme to provide
support for Windows and Linux. LLJIT's MachO platform support uses the
MachOPlatform class to provide MachO specific initialization (currently just
static initializers scraped from the __mod_inits section, but will include
ObjC support in the near future).

Diff Detail

Event Timeline

lhames created this revision.Feb 9 2020, 7:09 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 9 2020, 7:09 PM

Hi Lang, I had a look at the first half of your changes. So far it looks really great!
A few mini notes inline and a larger one that's semi-related. I will continue with the second half and run this by the ThinLtoJIT example asap.

llvm/lib/ExecutionEngine/Orc/CompileOnDemandLayer.cpp
152

Is GV promotion here related to the initializers change? Is it always necessary or can we skip it in the whole module partitioning case?

165

This is an ObjC special case right? Can we have a comment for it?

llvm/lib/ExecutionEngine/Orc/Core.cpp
1137

Unintended newline I guess.

1924

The typical ErrorList vs. reportError() case as discussed in D72176. Isn't it a similar situation as with the OnComplete handler in ReExportsMaterializationUnit::materialize()? That one reports the incoming error.

While looking at this, should it be the handler's responsibility at all to think about incoming errors? Is there a deeper reason for this? Otherwise, couldn't the NotifyComplete callback bring a Optional<SymbolMap> instead of an Expected<SymbolMap>? IIUC AsynchronousSymbolQuery has two distinct code paths for success vs. error already: handleComplete() and handleFailed(). The latter could call NotifyComplete(None) and handle the error independently. However, it's not directly connected to this review.

llvm/lib/ExecutionEngine/Orc/LLJIT.cpp
37

Typo: tha -> that
This is how far I got.

As expected this doesn't affect my code much so far. Only the symbol promotion changes in CompileOnDemandLayer (see inline comment). What do you think?

llvm/lib/ExecutionEngine/Orc/CompileOnDemandLayer.cpp
167

ThinLtoJIT's lookups in the global index are based on name and flags of a symbol and now uses the promoted ones. Probably, I can work around this, but I wonder if promotion and renaming is at all necessary for the whole module partitioning case. Shouldn't translation units stay intact then and linkage be fine as it is?

llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp
91–92

LLVM_DEBUG([...])

lhames updated this revision to Diff 244801.Feb 14 2020, 6:25 PM
  • [ORC] Enable initializer support Objective C runtime registration.
lhames marked 5 inline comments as done.Feb 16 2020, 2:21 PM

Thanks very much for the review Stefan!

I'll apply your suggestions and update the patch shortly.

The part of this patch that I think still needs the most thought is the new state: Initialized.

Initially I thought I would want this to prevent symbols from being looked up (in the default case) until they were initialized. In practice though I'm not sure it's needed: dlsym interposes for JIT'd code can't use it: You're allowed to dlsym a symbol in initializers (i.e. before the initializer for that symbol is complete), and for clients they can just wait until their initialization call returns before looking up any symbols that depend on initialization and then they're safe.

So the question, since the code already been written, is: Can we think of any use for gating on initializers having run, or should we rip this part back out for now to simplify the patch? The benefit to removing it would be to eliminate some new APIs (e.g. Platform::notifySymbolsReady) and reduce some session locking since the platform won't need to re-lock the session to move the symbols to the initialized state.

Let me know if you have any thoughts on this. I probably shouldn't hold the patch up too much longer either way though: we can fix this in-tree after this lands easily enough.

llvm/lib/ExecutionEngine/Orc/CompileOnDemandLayer.cpp
152

I think it would be possible to skip it in the whole module case -- I will give that a try.

165

It's not ObjC specific, but I think it is hardcoding something that it shouldn't be:

The \01 bit comes from https://llvm.org/docs/LangRef.html#identifier : Symbols with that prefix should not be mangled. This is fine.

The 'L' that follows is meant to capture the assembler private prefix, but we should probably do that with DL.getPrivatePrefix rather than hardcoding it.

This issue is orthogonal to this patch so I'll fix it on trunk.

llvm/lib/ExecutionEngine/Orc/Core.cpp
1137

Fixed. Thanks!

1924

The typical ErrorList vs. reportError() case as discussed in D72176. Isn't it a similar situation as with the OnComplete handler in ReExportsMaterializationUnit::materialize()? That one reports the incoming error.

Yep. This was just laziness on my part while getting the prototype up and running. This should probably just log the errors and return a new "couldn't find some initializer symbols" error.

While looking at this, should it be the handler's responsibility at all to think about incoming errors? Is there a deeper reason for this? Otherwise, couldn't the NotifyComplete callback bring a Optional<SymbolMap> instead of an Expected<SymbolMap>?

I considered using Optional rather than Expected when I was designing the lookup API, but in the end decided that it was better to force clients to explicitly acknowledge a failure result: In some of my prototypes that used Optional I ran into issues when I accidentally continued past a failure result.

llvm/lib/ExecutionEngine/Orc/LLJIT.cpp
37

Fixed. :)

Will initializer symbols be looked up and run again when calling initialize() a second time on a JITDylib? More specifically, is it possible for a client to break this process down and initialize specific (or simply all uninitialized) MaterializationUnits? Otherwise, wouldn't it mean modules cannot be added (and initialized) to a JITDylib after initializiation (without re-init)?

lli orc-lazy             can we also do this?

add all modules          add some modules <--
run all initializers     run their initializers <--
lookup                   lookup
materialize              materialize
execute                  execute, 
call-through             call-through
lookup                   lookup
materialize              add more modules <--
execute                  run their initializers <--
...                      materialize
                         execute

Can we think of any use for gating on initializers having run, or should we rip this part back out for now to simplify the patch?

Not sure if this is related?

llvm/tools/lli/lli.cpp
927

If we called this again, would it re-initialize everything?

Will initializer symbols be looked up and run again when calling initialize() a second time on a JITDylib? More specifically, is it possible for a client to break this process down and initialize specific (or simply all uninitialized) MaterializationUnits? Otherwise, wouldn't it mean modules cannot be added (and initialized) to a JITDylib after initializiation (without re-init)?

lli orc-lazy             can we also do this?

add all modules          add some modules <--
run all initializers     run their initializers <--
lookup                   lookup
materialize              materialize
execute                  execute, 
call-through             call-through
lookup                   lookup
materialize              add more modules <--
execute                  run their initializers <--
...                      materialize
                         execute

Whether the initializers would be re-run is up to the platform.

The platform gets two key pieces of information from ORC: setupJITDylib -- called when new dylibs are created, and notifyAdding -- called when modules are added to the JITDylib. Generally, Platform implementations will need to keep a list of initializers to be run and how this list is managed will determine the answer to your question. Here are some example schemes:

(1) Prohibit any use of initializers

This can be achieved by returning an error from Platform::notifyAdding override. The platform wouldn't provide an initialization API, because it could never do anything useful.

(2) Allow support for exactly one round of initialization (this is your example above)

This can be achieved by maintaining a set of already-initialized JITDylibs in the Platform instance. In the notifyAdding method, if the JITDylib is already in the initialized set then we return an error, e.g.:

if (MU.getInitializerSymbol() != nullptr && AlreadyInitialized.count(&JD))
  make_error<StringError>("dylib " + JD.getName() + " has already been initialized. No new initializers can be added");

This is a sensible mode for a platform that's trying to emulate the execution environment of a normal statically compiled program. Arguably you could go even further and prohibit any new modules from being added after initialization, but this can be tricky: You'd need carve-outs for things like CompileOnDemandLayer's lazily compiled function/partition modules.

(3) On initialization, run newly added initializers only

This may be interesting to REPLs that want to allow new code (including initializers) to be added over time. In this case the Platform would maintain a list of not-yet-executed initializers. This would be appended to each time a new module is added via notifyAdding, and cleared when initializers are run. In this case a call to initialize is a no-op if no new initializers have been added since the last call.

(4) Re-run all initializers every time

I'm not sure who would want such a mode, but it's interesting to discuss as a prelude to something that is useful.

In this mode there would just be a list of initializers to run. New modules that contain initializers would append to this list, but it would never be cleared. Calling initialize (however it's defined for the platform) would run through the whole list each time, essentially resetting it to a state as if it had just been loaded, except that already compiled code would remain present in memory.

I don't think this is useful as a primary mode, but some variant of this might be useful as an addition to mode (2) if clients want to make dlclosing/re-dlopening a dylib relatively efficient by keeping the linked data and text in memory after dlclose is called. In that case you would modify (2) and replace the already-initialized set with a map from JITDylibs to their initializer list. Repeat dlopen operations would be no-ops, but a dlclose would transfer the already-run initializers back to a pending state so that they could be run again on the next dlopen.

Can we think of any use for gating on initializers having run, or should we rip this part back out for now to simplify the patch?

Not sure if this is related?

I don't think so. The initialized state gives JIT clients a way to do something that you can't do in static code, which is issue a lookup that's guaranteed not to return until the static initializers for the target have run.

I think I'm going to take this back out for now: We can always add it back in later, but there's no sense adding an experimental feature with no use case to an already complicated patch.

lhames updated this revision to Diff 245058.Feb 17 2020, 5:37 PM
  • [ORC] Remove extraneous whitespace
  • [ORC] Fix typo in comment
  • [ORC] Address Stefan's feedback, plus remove the 'initialized' state and associated transitions.
sgraenitz accepted this revision.Feb 18 2020, 7:55 AM

Whether the initializers would be re-run is up to the platform.

Great!

I think I'm going to take this back out for now: We can always add it back in later, but there's no sense adding an experimental feature with no use case to an already complicated patch.

Agree, so from my side this is ready once the on-demand layer skips promotion for globals in the whole module case.

This revision is now accepted and ready to land.Feb 18 2020, 7:55 AM
lhames updated this revision to Diff 245306.Feb 18 2020, 5:05 PM
  • [ORC][examples] Fix examples
  • [ORC] Don't add initializer symbols for llvm.global_ctors decls.
  • [ORC] Replace init symbols in MaterializationResponsibility::replace.
  • [ORC] Don't promote globals for whole-module partitions in CompileOnDemandLayer.

I've updated CompileOnDemandLayer. Tomorrow I will clang-format everything and land the patch when I have time to watch the bots for fallout.

Thanks very much for the feedback Stefan!

This revision was automatically updated to reflect the committed changes.

Follow-up commits to fix build failures on the bots:

9df65ca19e5a4f5e49d036b190d814ca8a1784a0 -- [ORC] Qualify nullptr_t.
63d0932c358012bcfc1a87f78a5f05bafc2dc86d -- [ORC] Fix a missing move.
f5efa08247c7b5dded4e6854c49c9f49b96398f2 -- [JITLink] Fix testcase for main JITDylib rename in 85fb997659b.

dyung added a subscriber: dyung.Feb 20 2020, 1:44 AM

I think there is still at least one bot failure due to this change:

http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/62923

7.710 [49/74/85] Building CXX object examples/ThinLtoJIT/CMakeFiles/ThinLtoJIT.dir/ThinLtoJIT.cpp.o
FAILED: examples/ThinLtoJIT/CMakeFiles/ThinLtoJIT.dir/ThinLtoJIT.cpp.o 
/usr/bin/clang++  -DBUILD_EXAMPLES -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Iexamples/ThinLtoJIT -I/home/buildslave/ps4-buildslave4/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/llvm-project/llvm/examples/ThinLtoJIT -Iinclude -I/home/buildslave/ps4-buildslave4/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/llvm-project/llvm/include -std=c++11 -Wdocumentation -Wno-documentation-deprecated-sync -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -O3     -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT examples/ThinLtoJIT/CMakeFiles/ThinLtoJIT.dir/ThinLtoJIT.cpp.o -MF examples/ThinLtoJIT/CMakeFiles/ThinLtoJIT.dir/ThinLtoJIT.cpp.o.d -o examples/ThinLtoJIT/CMakeFiles/ThinLtoJIT.dir/ThinLtoJIT.cpp.o -c /home/buildslave/ps4-buildslave4/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/llvm-project/llvm/examples/ThinLtoJIT/ThinLtoJIT.cpp
/home/buildslave/ps4-buildslave4/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/llvm-project/llvm/examples/ThinLtoJIT/ThinLtoJIT.cpp:161:12: error: taking the address of a temporary object of type 'Expected<llvm::orc::JITDylib &>' [-Waddress-of-temporary]
  MainJD = &ES.createJITDylib("main");
           ^~~~~~~~~~~~~~~~~~~~~~~~~~
/home/buildslave/ps4-buildslave4/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/llvm-project/llvm/examples/ThinLtoJIT/ThinLtoJIT.cpp:161:12: error: assigning to 'llvm::orc::JITDylib *' from incompatible type 'Expected<llvm::orc::JITDylib &> *'
  MainJD = &ES.createJITDylib("main");
           ^~~~~~~~~~~~~~~~~~~~~~~~~~
2 errors generated.

Can you take a look?

ftynse added a subscriber: ftynse.EditedFeb 20 2020, 5:44 AM

This also broke MLIR's usage of ORC (https://github.com/llvm/llvm-project/blob/master/mlir/lib/ExecutionEngine/ExecutionEngine.cpp, error https://buildkite.com/mlir/mlir-core/builds/2874#f5df5065-4124-4abe-8783-d9c2c1649e73/348-522). In particular, it can no longer resolve the symbols available in the parent process, such as malloc.

thakis added inline comments.
llvm/lib/ExecutionEngine/Orc/Core.cpp
1898

This broke building with clang and gcc 5.3 --gcc-toolchain:

/b/s/w/ir/cache/builder/src/third_party/llvm-bootstrap-install/bin/clang++  -DGTEST_HAS_RTTI=0 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/ExecutionEngine/Orc -I/b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/ExecutionEngine/Orc -I/usr/include/libxml2 -Iinclude -I/b/s/w/ir/cache/builder/src/third_party/llvm/llvm/include --gcc-toolchain=/b/s/w/ir/cache/builder/src/third_party/llvm-build-tools/gcc530trusty -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -fprofile-generate="/b/s/w/ir/cache/builder/src/third_party/llvm-instrumented/profiles" -O3 -DNDEBUG    -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -fno-rtti -std=c++14 -MD -MT lib/ExecutionEngine/Orc/CMakeFiles/LLVMOrcJIT.dir/Core.cpp.o -MF lib/ExecutionEngine/Orc/CMakeFiles/LLVMOrcJIT.dir/Core.cpp.o.d -o lib/ExecutionEngine/Orc/CMakeFiles/LLVMOrcJIT.dir/Core.cpp.o -c /b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/ExecutionEngine/Orc/Core.cpp


/b/s/w/ir/cache/builder/src/third_party/llvm/llvm/lib/ExecutionEngine/Orc/Core.cpp:1898:8: error: no type named 'condition_variable' in namespace 'std'
  std::condition_variable CV;
  ~~~~~^

For some reason, stage 1 (gcc 5.3) is happy, but stage 2 isn't.

Maybe it just needs an #include <condition_variable> up top?

thakis added inline comments.Feb 20 2020, 6:59 AM
llvm/lib/ExecutionEngine/Orc/Core.cpp
1898
hans added a subscriber: hans.Feb 20 2020, 7:36 AM

Maybe it just needs an #include <condition_variable> up top?

Yes, that seems to do it. Added in 1f984c83a41bb68b8af4998a7f68876d52bff872

Maybe it just needs an #include <condition_variable> up top?

Yes, that seems to do it. Added in 1f984c83a41bb68b8af4998a7f68876d52bff872

Thanks very much Hans!

@thakis, @ftnyse, @dyung — Sorry for the fallout. I’ve got a fix for the ThinLtoJIT example building now, and I’ll look in to the symbol resolution issue next.

@ftynse, @mehdi_amini, @rriddle — This patch changed the way that LLJIT’s DataLayout member is initialized, but I’m still not sure exactly how it’s affecting your use case.

What happens if you explicitly set the LLJIT instance’s data layout by adding a ‘setDataLayout’ call to your LLJITBuilder? E.g.

auto jit =
    cantFail(llvm::orc::LLJITBuilder()
                 .setDataLayout(dataLayout)
                 .setCompileFunctionCreator(compileFunctionCreator)
                 .setObjectLinkingLayerCreator(objectLinkingLayerCreator)
                 .create());

@lhames still failing with the same error. The use case should be similar to that of lli. Are we missing search order specification maybe?

The ThinLtoJIT example should be fixed by 70d8fec7c94.

@lhames still failing with the same error. The use case should be similar to that of lli. Are we missing search order specification maybe?

@ftynse -- Ok, the DataLayout change didn't break it. That's comforting on some level. I'll have a bit more of a read of the MLIR ExecutionEngine code and see what I can spot. Does this only reproduce on Linux?

Oops -- The ThinLtoJIT example should be fixed by 6de21c556d1, rather.

@ftynse I'll be on discord if you want to chat about this in real time. Side note: I noticed you're getting an assertion after the error:

/var/lib/buildkite-agent/builds/buildkite-6d6d45d7dd-4njcg-1/mlir/mlir-core/llvm/include/llvm/ExecutionEngine/Orc/SymbolStringPool.h:149: llvm::orc::SymbolStringPool::~SymbolStringPool(): Assertion `Pool.empty() && "Dangling references at pool destruction time"' failed.

That indicates that you're holding SymbolStringPtrs allocated by LLJIT after the JIT is torn down, but I can't see where they'd be held. If you can arrange for the LLJIT instance to outlive these strings it will eliminate this assertion.

Oops -- The ThinLtoJIT example should be fixed by 6de21c556d1, rather.

@ftynse I'll be on discord if you want to chat about this in real time. Side note: I noticed you're getting an assertion after the error:

I just pinged on discord :)

/var/lib/buildkite-agent/builds/buildkite-6d6d45d7dd-4njcg-1/mlir/mlir-core/llvm/include/llvm/ExecutionEngine/Orc/SymbolStringPool.h:149: llvm::orc::SymbolStringPool::~SymbolStringPool(): Assertion `Pool.empty() && "Dangling references at pool destruction time"' failed.

That indicates that you're holding SymbolStringPtrs allocated by LLJIT after the JIT is torn down, but I can't see where they'd be held. If you can arrange for the LLJIT instance to outlive these strings it will eliminate this assertion.

Yeah, the string pool errors are orthogonal, but very unfortunate. Those really need to be fixed.

Oops -- The ThinLtoJIT example should be fixed by 6de21c556d1, rather.

@ftynse I'll be on discord if you want to chat about this in real time. Side note: I noticed you're getting an assertion after the error:

I just pinged on discord :)

Thanks for picking this up! :) I'm in a different time zone so it may be tricky for me to be more reactive.

/var/lib/buildkite-agent/builds/buildkite-6d6d45d7dd-4njcg-1/mlir/mlir-core/llvm/include/llvm/ExecutionEngine/Orc/SymbolStringPool.h:149: llvm::orc::SymbolStringPool::~SymbolStringPool(): Assertion `Pool.empty() && "Dangling references at pool destruction time"' failed.

That indicates that you're holding SymbolStringPtrs allocated by LLJIT after the JIT is torn down, but I can't see where they'd be held. If you can arrange for the LLJIT instance to outlive these strings it will eliminate this assertion.

Yeah, the string pool errors are orthogonal, but very unfortunate. Those really need to be fixed.

Indeed, I remember seeing this quite a while ago, but it only manifests in the situations where the executor is going to abort anyway, so we did not spend too much time trying to fix it.