The pass claiming symbols has to run before the pass that marks all
claimed symbols as active. This fixes JIT codegen of code involving
exceptions on RISC-V (which was failing with a missing definition of
DW.ref.__gxx_personality_v0), but actually throwing and then catching
exceptions will need more work to register EH frames with libunwind.
Details
Diff Detail
Unit Tests
Time | Test | |
---|---|---|
60,060 ms | x64 debian > MLIR.Examples/standalone::test.toy |
Event Timeline
The code was added in rG4dc110a4b83c625502092b127ca0e9eb13824297 which mentions ELF/x86-64. I played a bit with the code and a trivial example still seems to work on x86 if I revert that commit. I'm not sure if there have been other changes in the backend since then, but this change seems to be necessary on RISC-V. However, if there are other ways of doing it I'd be happy to learn about them.
Can you share the object(s) that produce the error that you're seeing?
4dc110a4b83c625502092b127ca0e9eb13824297 caused us to claim responsibility for weak symbols that survived dead stripping. To trigger the issue that you're seeing I guess that we need to have a symbol that's (a) inserted late in the compiler pipeline, (b) unreferenced in the file that defines it, and (c) used elsewhere. Does that match what you're seeing?
@lhames I'm running lli with the LLVM IR emitted by Clang for something like
int main() { try { throw 1; } catch (...) { } return 0; }
4dc110a4b83c625502092b127ca0e9eb13824297 caused us to claim responsibility for weak symbols that survived dead stripping. To trigger the issue that you're seeing I guess that we need to have a symbol that's (a) inserted late in the compiler pipeline, (b) unreferenced in the file that defines it, and (c) used elsewhere. Does that match what you're seeing?
Yes, I've mentioned that commit in my previous comment. I argue that the change in there never worked, and nobody noticed because it's seemingly not needed for x86.
I think I botched the commit message for that commit and that might be contributing to the confusion: Prior to that commit any weak symbol that was referenced in the graph, but wasn't in the responsibility set was turned into an external symbol (not dead-stripped, as the commit message says). That would result in a missing definition error within the graph containing the definition. That was definitely a bug, and the original commit fixed that issue.
This looks different: The symbol doesn't appear to be referenced in the graph (or if it is then this seems like a dead-stripping bug), but does appear to be referenced somewhere else (otherwise why are we getting a missing symbol error?).
Can you share the IR that you're feeding in? I'd like to try to reproduce this and understand what's going on.
But then, what test case do I have to run to see the problem after reverting that commit locally?
Can you share the IR that you're feeding in? I'd like to try to reproduce this and understand what's going on.
I posted the C++ code above, just compile with clang++ -target riscv64-unknown-linux-gnu -S -emit-llvm throw.cpp...
As mentioned in the commit message, there was no testcase added because lli did not support JITLink at the time.
Stefan fixed that in 23973e0aac1e1, so now you should be able to run
lli -relocation-model=pic -jit-linker=jitlink eh.ll
where eh.ll is the IR for your testcase.
Can you share the IR that you're feeding in? I'd like to try to reproduce this and understand what's going on.
I posted the C++ code above, just compile with clang++ -target riscv64-unknown-linux-gnu -S -emit-llvm throw.cpp...
Thanks! I should be able to take a look this weekend.
It fails consistently for me if I revert the given patch. My test config is docker container running x86-64 Linux, with a Debug build of Clang/LLVM, compiling the test case with
% clang++ -emit-llvm -S -o eh.ll eh.cpp
and running it with
% lli -relocation-model=pic -jit-linker=jitlink eh.ll
With 4dc110a4b83 applied this works as expected. With that commit reverted it crashes (from memory the original failure mode was an error, but I may have misremembered, or intervening patches may have affected it).
What config are you testing with?
This is still a side-issue for the actual review though -- I just need to find time to look at what RISCV is doing with this IR that's producing a weak def that's not needed locally, but still needed somewhere.
Indeed, I did not run with -relocation-model=pic - I thought this was the default for JITlink anyway... I'm now building for RISC-V and will see if it makes a difference there, maybe that will give us a clue.
AHA, I finally see what's going on by comparing the debug output from JITLinkGeneric.cpp between x86_64 and riscv64: On riscv64, DW.ref.__gxx_personality_v0 is dead-stripped because it's referenced from .eh_frame, which is equally removed. On x86, you can get the same effect by commenting out the three PrePrunePasses related to EH frame handling in link_ELF_x86_64, and then I get the same error
JIT session error: Missing definitions in module catch.ll-jitted-objectbuffer: [ DW.ref.__gxx_personality_v0 ] ./bin/lli: Failed to materialize symbols: { (main, { DW.ref.__gxx_personality_v0, __clang_call_terminate, main }) }
I guess we're forgetting to remove some references to the dead-stripped symbols, somewhere?
And sure enough,
diff diff --git a/llvm/lib/ExecutionEngine/Orc/ObjectLinkingLayer.cpp b/llvm/lib/ExecutionEngine/Orc/ObjectLinkingLayer.cpp index 7b4da052ed36..e79dd2e5e74c 100644 --- a/llvm/lib/ExecutionEngine/Orc/ObjectLinkingLayer.cpp +++ b/llvm/lib/ExecutionEngine/Orc/ObjectLinkingLayer.cpp @@ -426,7 +426,7 @@ private: std::vector<std::pair<SymbolStringPtr, Symbol *>> NameToSym; auto ProcessSymbol = [&](Symbol *Sym) { - if (Sym->hasName() && Sym->getLinkage() == Linkage::Weak && + if (Sym->isLive() && Sym->hasName() && Sym->getLinkage() == Linkage::Weak && Sym->getScope() != Scope::Local) { auto Name = ES.intern(Sym->getName()); if (!MR->getSymbols().count(ES.intern(Sym->getName()))) {
seems to fix it. However, then I get crashes on x86_64 for lli -jit-linker=jitlink --relocation-model=pic catch.ll:
JIT session error: In catch.ll-jitted-objectbuffer, encountered duplicate section ".group" while building debug object JIT session error: Unexpected definitions in module catch.ll-jitted-objectbuffer: [ DW.ref.__gxx_personality_v0 ] ./bin/lli: Failed to materialize symbols: { (main, { __clang_call_terminate, main }) }
The first error about duplicate sections is there even without the change, but I can't remember it from September. Is that something recent?
Hi Jonas, thats interesting results! I don't have the time right now to dig into the details, but..
There was a discussion on Discord these days about it: https://discord.com/channels/636084430946959380/687692371038830597/1033427215448293396
The error originates from JITLink's DebugObjectManagerPlugin and it should be fixed with: https://github.com/llvm/llvm-project/commit/b26f45e5a49ae363164e7dbbf57eadd9e78d612c
Does it work for you? Otherwise, can you disable the plugin temporarily and try again?
JIT session error: Unexpected definitions in module catch.ll-jitted-objectbuffer: [ DW.ref.__gxx_personality_v0 ] ./bin/lli: Failed to materialize symbols: { (main, { __clang_call_terminate, main }) }
Just a guess: Maybe the EHFrame plugin adds the extra dependency here. Do you need exception handling and/or backtraces? Maybe check and disable the plugin temporarily as well?
Yes, the error is gone now. Thanks for letting me know!
Unfortunately, my patch above still doesn't work on x86 with the EHFrame support. I'll need to debug...
JIT session error: Unexpected definitions in module catch.ll-jitted-objectbuffer: [ DW.ref.__gxx_personality_v0 ] ./bin/lli: Failed to materialize symbols: { (main, { __clang_call_terminate, main }) }Just a guess: Maybe the EHFrame plugin adds the extra dependency here. Do you need exception handling and/or backtraces? Maybe check and disable the plugin temporarily as well?
Yes, it obviously does. Yes, eventually I need and want exception handling (after all, I'm compiling with exceptions and want to throw and catch them at some point), but right now this isn't implemented for RISC-V. So there is nothing that I could disable to fix the problem, but I pointed out that disabling the plugin on x86 leads to similar problems there, so it's a general problem (as opposed to specific to RISC-V).
Okay, I think this was just the pass running at the wrong time. Superseded by https://reviews.llvm.org/D136877