This patch is a first step in reducing redundancy in addRelocations() implementations across ELF JITLink backends. It factors out common logic for ELF relocation traversal into the new helper function forEachRelocation() in the ELFLinkGraphBuilder base class.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
As discussed in D108986. Furthermore the patch:
- aligns the format of debug output with the MachO implementation (seemed more comprehensible)
- aims to be NFC otherwise -- ping me if I introduced a functional change by accident
- replaces auto with concrete type names where they aren't obvious or irrelevant
- tries to use more consistent names
Thanks very much for this Stefan!
Is there a reason that forEachRelocation works one-section-at-a-time? I thought that forEachRelocation could embed the loop over the sections within it to simplify addRelocations further. I.e. in most cases it should reduce to something like:
Error addRelocations() override { LLVM_DEBUG(dbgs() << "Processing relocations:\n"); return forEachRelocation(RelSect, this, &ELFJITLinker_aarch64::addSingleRelocation); }
There may be a good reason not to do that in ELF though?
llvm/lib/ExecutionEngine/JITLink/ELFLinkGraphBuilder.h | ||
---|---|---|
113–118 | I think forEachRelocation should be a template method here, rather than taking a std::function: /// Func should be callable as: /// Error(const typename ELFT::Rela &, /// const typename ELFT::Shdr &, Section &) template <typename RelocHandlerFunction> Error forEachRelocation(const typename ELFT::Shdr &RelSect, RelocHandlerFunction &&Func, bool ProcessDebugSections) { ... } It might be handy to provide a method version for convenience too: template <typename ClassT, typename RelocHandlerMethod> Error forEachRelocation(const typename ELFT::Shdr &RelSect, ClassT *Instance, RelocHandlerMethod *Method, bool ProcessDebugSections) { return forEachRelocation( RelSect, [Instance, Method](const typename ELFT::Rela &Rel, const typename ELFT::Shdr &Target, Section &GS) { return (Instance->*Method)(Rel, Target, GS); }, ProcessDebugSections); } With that, I think the iteration below could be reduced to: using Base = ELFLinkGraphBuilder<ELFT>; for (const auto &RelSect : Base::Sections) if (Error Err = Base::forEachRelocation(RelSect, this, &ELFJITLinker_aarch64::addSingleRelocation)) return Err; |
Turn std::function into template parameter
llvm/lib/ExecutionEngine/JITLink/ELFLinkGraphBuilder.h | ||
---|---|---|
113–118 | Agree, that's much better! |
I started like this and then realized that the x86_64 backend does an additional check on the type of the relocation section: https://github.com/llvm/llvm-project/blob/15e9575fb5988a66aa6e57a55818b54b575dd795/llvm/lib/ExecutionEngine/JITLink/ELF_x86_64.cpp#L217 Not sure if this specific use-case is really relevant, but I started thinking it might be worth having this option. (Especially since, logically, this loop doesn't add any value to the forEachRelocation function.)
Thinking again now: Assuming we skip most sections immediately, this probably adds unreasonable overhead. Because now there's 2 function calls before checking the type against ELF::SHT_REL(A). And there's not much hope for inlining even with the templated parameter, right? Would be really bad with -ffunction-sections.
So, yes for performance reasons I'd change that and put the loop back inside. @lhames What do you think?
Thanks for making these changes Stefan. I'm happy with that approach. I think we would need to do some profiling to determine the practical performance impacts of these decisions before we bothered tuning further.
Now for a fun digression (only tangentially related to this, so you can skip it if you're busy): My motivation for asking you to remove the std::function was my intuition was that it would block inlining of the call to the per-relocation handling method. I didn't have any hard evidence to back that intuition up though, so for fun I thought I would write a quick test. In this test I let 'ints' stand in for relocation objects, since they should make things as easy as possible for the inliner:
#include <functional> void doSomethingWith(int); class MyLinker { public: void handleRelocation(int X) { doSomethingWith(X); } }; template <typename Func> void functionObjectWalk(const int *V, Func F) { while (*V != 0) F(*V++); } template <typename ClassT, typename MethT> void methodWalk(const int *V, ClassT *Instance, MethT Method) { functionObjectWalk(V, [Instance, Method](int X) { (Instance->*Method)(X); }); } void stdfunctionWalk(const int *V, std::function<void(int)> F) { while (*V != 0) F(*V++); } void testFunctionObjectWalk() { MyLinker ML; int V[] = { 1, 2, 3, 0 }; functionObjectWalk(V, [&](int X) { ML.handleRelocation(X); }); } void testMethodWalk() { MyLinker ML; int V[] = { 1, 2, 3, 0 }; methodWalk(V, &ML, &MyLinker::handleRelocation); } void testStdFunctionWalk() { MyLinker ML; int V[] = { 1, 2, 3, 0 }; stdfunctionWalk(V, [&](int X) { ML.handleRelocation(X); }); }
I compiled this with a released Xcode clang with: clang++ -std=c++17 -fno-exceptions -fno-rtti -fno-asynchronous-unwind-tables -O3 -S -o testcase.s testcase.cpp (the options helped to keep the assembly readable). As expected: functionObjectWalk and methodWalk produce equivalent results with neatly inlined calls to doSomethingWith, whereas stdfunctionWalk produces a longer loop with an indirect call. I'm going to call my intuition "tentatively confirmed". I'm not quite motivated enough to dig into the assembly for a release build of LLVM at the moment. ;)
I think forEachRelocation should be a template method here, rather than taking a std::function:
It might be handy to provide a method version for convenience too:
With that, I think the iteration below could be reduced to: