The blocking version of ExecutionSession::lookup() uses std::promise() to wait
for the result of a compilation.
In RTDyldObjectLinkingLayer::onObjEmit() another thread could have compiled
the object and notify the promise via R.notifyEmitted(), that the object was emitted, before
the memory manager of the object was registered with the resource tracker.
Thus the code could already be executed and the object removed from the
JIT, before the memory manager was registered, leading to a errors
about a defunct resource tracker.
Details
- Reviewers
lhames
Diff Detail
Unit Tests
Event Timeline
Added missing check on Expected result of lookup.
Apply ugly clang-format include reordering (this is not even alphabetic, is this just about editing distance to the containing file name?).
This is a really nice catch. Thank you for finding this!
I suspect the ObjectLinkingLayer has a similar issue. I'll look into that.
llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp | ||
---|---|---|
366–391 | This call to notifyObjectLoaded will reference the moved-from MemMgr argument if there are any listeners attached. The address of the MemMgr should be recorded in a local variable for use in the notifyObjectLoaded call. It would also be good if MemMgr were removed from the MemMgrs vector if the call to notifyEmitted fails. |
llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp | ||
---|---|---|
366–391 | Ouch, how did I miss that std::move()... But this would move the problem to the event listener. When the event listener is called, the other thread (unblocked by R.notifyEmitted()) could already have called ExecutionEngine::removeJITDylib() resulting in the memory manager to be removed in RTDyldObjectLinkingLayer::handleRemoveResources(). Hm, RtDyldObjectLinkingLayer::onObjEmit() calls three types of notification functions:
Do you think, it would be possible to change the order to
? This would then require to also undo adding the memory manager and other things by calling handleRemoveResources, if MaterializationResponsibility::notifyEmitted() failed. The OrcJITTests seem to be OK with this... |
llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp | ||
---|---|---|
366–391 |
Good point!
If the unit and regression tests pass then I'd be inclined to land that. If it breaks out-of-tree clients we may need to revert and reconsider. What's your JIT use case and architecture? Have you looked at ObjectLinkingLayer / JITLink yet? We're going to deprecate (and eventually remove) RuntimeDyld in favor of JITLink, so it's even more important that we get this right in ObjectLinkingLayer. |
Finally found some time to continue looking into this...
ObjectLinkingLayer doesn't have this problem because it uses this order in ObjectLinkingLayerJITLinkContext:
- ObjectLinkingLayer::notifyEmitted():
- ObjectLinkingLayer::Plugin::notifyEmitted()
- Add FinalizedAlloc to Allocs
- MaterializationResponsiblity::notifyEmitted() (note: The FinalizedAlloc is not explicitly removed from Allocs, if this fails. I didn't check, whether this happens indirectly)
We use the JIT to compile and execute shader code on the CPU (we also have backends for GPU). The problem occurred, when a mesh with 3 mio triangles was displaced on ARM with 8 CPUs and the same (we fixed that) displacement shader code was compiled 300 times in parallel.
Our JIT is based on the BuildingAJIT tutorial, not emitting lazily, using the ConcurrentIRCompiler, searching for symbols in specific modules to avoid problems with duplicate names and using an own copy of DefaultMMapper to avoid a use-after-free problem in the SectionMemoryManager, when things happen in the wrong order during application shutdown. All modules are linked with one "runtime" BareJITDylib, which only consists of absolute symbols pointing to hand-picked runtime functions. We're currently still using LLVM 12, but good to know, that I will have to rewrite the JIT integration again as with every LLVM upgrade ^_^
Update order as discussed to first EventListeners, then NotifyEmitted, then adding the MemMgr and then MaterializationResponsibility::notifyEmitted().
This is basically the same order as in ObjectLinkingLayer now.
ORCJitTests and LLVM regression tests are successful for me.
Ping!
I won't have access to my computer for two months starting on Wednesday, so if you still want anything to be changed before you can approve this, there's not much time for me left ^_^
This call to notifyObjectLoaded will reference the moved-from MemMgr argument if there are any listeners attached.
The address of the MemMgr should be recorded in a local variable for use in the notifyObjectLoaded call. It would also be good if MemMgr were removed from the MemMgrs vector if the call to notifyEmitted fails.