This is an archive of the discontinued LLVM Phabricator instance.

[JITLink] Factor out forEachRelocation() function from addRelocations() in ELF Aarch64 backend (NFC)
ClosedPublic

Authored by sgraenitz on Sep 9 2021, 8:57 AM.

Details

Summary

This patch is a first step in reducing redundancy in addRelocations() implementations across ELF JITLink backends. It factors out common logic for ELF relocation traversal into the new helper function forEachRelocation() in the ELFLinkGraphBuilder base class.

Diff Detail

Event Timeline

sgraenitz created this revision.Sep 9 2021, 8:57 AM
sgraenitz requested review of this revision.Sep 9 2021, 8:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 9 2021, 8:57 AM
sgraenitz added a comment.EditedSep 9 2021, 9:08 AM

As discussed in D108986. Furthermore the patch:

  • aligns the format of debug output with the MachO implementation (seemed more comprehensible)
  • aims to be NFC otherwise -- ping me if I introduced a functional change by accident
  • replaces auto with concrete type names where they aren't obvious or irrelevant
  • tries to use more consistent names

Thanks very much for this Stefan!

Is there a reason that forEachRelocation works one-section-at-a-time? I thought that forEachRelocation could embed the loop over the sections within it to simplify addRelocations further. I.e. in most cases it should reduce to something like:

Error addRelocations() override {
  LLVM_DEBUG(dbgs() << "Processing relocations:\n");
  return forEachRelocation(RelSect, this, &ELFJITLinker_aarch64::addSingleRelocation);
}

There may be a good reason not to do that in ELF though?

llvm/lib/ExecutionEngine/JITLink/ELFLinkGraphBuilder.h
113–118

I think forEachRelocation should be a template method here, rather than taking a std::function:

/// Func should be callable as:
///   Error(const typename ELFT::Rela &,
///         const typename ELFT::Shdr &, Section &)

template <typename RelocHandlerFunction>
Error forEachRelocation(const typename ELFT::Shdr &RelSect,
                        RelocHandlerFunction &&Func, bool ProcessDebugSections) {
  ...
}

It might be handy to provide a method version for convenience too:

template <typename ClassT, typename RelocHandlerMethod>
Error forEachRelocation(const typename ELFT::Shdr &RelSect,
                        ClassT *Instance, RelocHandlerMethod *Method,
                        bool ProcessDebugSections) {
  return forEachRelocation(
      RelSect,
      [Instance, Method](const typename ELFT::Rela &Rel,
                         const typename ELFT::Shdr &Target,
                         Section &GS) {
        return (Instance->*Method)(Rel, Target, GS);
      },
      ProcessDebugSections);
}

With that, I think the iteration below could be reduced to:

using Base = ELFLinkGraphBuilder<ELFT>;
for (const auto &RelSect : Base::Sections)
  if (Error Err =
        Base::forEachRelocation(RelSect, this,
                                &ELFJITLinker_aarch64::addSingleRelocation))
    return Err;
sgraenitz updated this revision to Diff 372099.Sep 11 2021, 3:33 PM
sgraenitz marked an inline comment as done.

Turn std::function into template parameter

llvm/lib/ExecutionEngine/JITLink/ELFLinkGraphBuilder.h
113–118

Agree, that's much better!

Is there a reason that forEachRelocation works one-section-at-a-time? I thought that forEachRelocation could embed the loop over the sections within it to simplify addRelocations further.

I started like this and then realized that the x86_64 backend does an additional check on the type of the relocation section: https://github.com/llvm/llvm-project/blob/15e9575fb5988a66aa6e57a55818b54b575dd795/llvm/lib/ExecutionEngine/JITLink/ELF_x86_64.cpp#L217 Not sure if this specific use-case is really relevant, but I started thinking it might be worth having this option. (Especially since, logically, this loop doesn't add any value to the forEachRelocation function.)

Thinking again now: Assuming we skip most sections immediately, this probably adds unreasonable overhead. Because now there's 2 function calls before checking the type against ELF::SHT_REL(A). And there's not much hope for inlining even with the templated parameter, right? Would be really bad with -ffunction-sections.

So, yes for performance reasons I'd change that and put the loop back inside. @lhames What do you think?

lhames accepted this revision.Sep 12 2021, 12:57 AM

Thinking again now: Assuming we skip most sections immediately, this probably adds unreasonable overhead. Because now there's 2 function calls before checking the type against ELF::SHT_REL(A). And there's not much hope for inlining even with the templated parameter, right? Would be really bad with -ffunction-sections.

So, yes for performance reasons I'd change that and put the loop back inside. @lhames What do you think?

Thanks for making these changes Stefan. I'm happy with that approach. I think we would need to do some profiling to determine the practical performance impacts of these decisions before we bothered tuning further.

Now for a fun digression (only tangentially related to this, so you can skip it if you're busy): My motivation for asking you to remove the std::function was my intuition was that it would block inlining of the call to the per-relocation handling method. I didn't have any hard evidence to back that intuition up though, so for fun I thought I would write a quick test. In this test I let 'ints' stand in for relocation objects, since they should make things as easy as possible for the inliner:

#include <functional>

void doSomethingWith(int);

class MyLinker {
public:
  void handleRelocation(int X) { doSomethingWith(X); }
};


template <typename Func>
void functionObjectWalk(const int *V, Func F) {
  while (*V != 0)
    F(*V++);
}

template <typename ClassT, typename MethT>
void methodWalk(const int *V, ClassT *Instance, MethT Method) {
  functionObjectWalk(V,
                     [Instance, Method](int X) {
                       (Instance->*Method)(X);
                     });
}

void stdfunctionWalk(const int *V,
                     std::function<void(int)> F) {
  while (*V != 0)
    F(*V++);
}

void testFunctionObjectWalk() {
  MyLinker ML;
  int V[] = { 1, 2, 3, 0 };
  functionObjectWalk(V, [&](int X) { ML.handleRelocation(X); });
}

void testMethodWalk() {
  MyLinker ML;
  int V[] = { 1, 2, 3, 0 };
  methodWalk(V, &ML, &MyLinker::handleRelocation);
}

void testStdFunctionWalk() {
  MyLinker ML;
  int V[] = { 1, 2, 3, 0 };
  stdfunctionWalk(V, [&](int X) { ML.handleRelocation(X); });
}

I compiled this with a released Xcode clang with: clang++ -std=c++17 -fno-exceptions -fno-rtti -fno-asynchronous-unwind-tables -O3 -S -o testcase.s testcase.cpp (the options helped to keep the assembly readable). As expected: functionObjectWalk and methodWalk produce equivalent results with neatly inlined calls to doSomethingWith, whereas stdfunctionWalk produces a longer loop with an indirect call. I'm going to call my intuition "tentatively confirmed". I'm not quite motivated enough to dig into the assembly for a release build of LLVM at the moment. ;)

This revision is now accepted and ready to land.Sep 12 2021, 12:57 AM
This revision was landed with ongoing or failed builds.Sep 13 2021, 5:59 AM
This revision was automatically updated to reflect the committed changes.