This is an archive of the discontinued LLVM Phabricator instance.

[WIP] Implement JITLoaderGDB ObjectLinkingLayer plugin for ELF x86-64
Abandoned · Public

Authored by sgraenitz on Feb 12 2021, 12:27 PM.

Details

Reviewers
lhames
dblaikie
Summary

First proposal for debugging support in JITLink. For now, only in-process x86-64 ELF. Please consider it a work-in-progress.

The JITLinkGDBLoaderPlugin forwards object files from ObjectLinkingLayer to an attached debugger through the GDB JIT Interface [1]. It implements ObjectLinkingLayer::Plugin::notifyLoaded() and expects an additional argument: a callback function that prepares an in-memory debug object for the given MaterializationResponsibility on request. The patch aims to keep overhead in generic execution paths minimal. Additional efforts for debug registration should reside in the callback.

Furthermore, responsibility for the preparation of the actual debug object buffer lies with the respective JITLink backend, so we can keep all format/architecture-specific code in one place. In this first sketch it's accomplished with another callback from linkPhase1 to the backend. While the JITLinkContext might be considered the natural owner of the function pointer, it is currently stored in the LinkGraph, because this is our only connection to the backend. The goal was to keep the patch small and seek feedback first.

[1] https://sourceware.org/gdb/current/onlinedocs/gdb/JIT-Interface.html
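
For readers unfamiliar with the GDB JIT Interface [1], the following is a minimal in-process sketch of the registration protocol it describes. The struct layouts and the names `__jit_debug_register_code` and `__jit_debug_descriptor` follow the interface documentation; the helpers `registerDebugObject` and `unregisterDebugObject` are hypothetical names used only for illustration and are not taken from this patch. The sketch assumes GCC or Clang (for the `noinline` attribute and inline asm) and does no locking.

```cpp
#include <cassert>
#include <cstdint>

extern "C" {
typedef enum { JIT_NOACTION = 0, JIT_REGISTER_FN, JIT_UNREGISTER_FN } jit_actions_t;

struct jit_code_entry {
  jit_code_entry *next_entry;
  jit_code_entry *prev_entry;
  const char *symfile_addr; // start of the in-memory debug object
  uint64_t symfile_size;    // size of the in-memory debug object
};

struct jit_descriptor {
  uint32_t version; // the interface documentation specifies version 1
  uint32_t action_flag;
  jit_code_entry *relevant_entry;
  jit_code_entry *first_entry;
};

// The debugger places a breakpoint in this function, so it must not be inlined.
__attribute__((noinline)) void __jit_debug_register_code() {
  asm volatile("" ::: "memory");
}

jit_descriptor __jit_debug_descriptor = {1, JIT_NOACTION, nullptr, nullptr};
}

// Hypothetical helper: link a new entry at the list head and notify the debugger.
// Not thread-safe; a real implementation would guard the descriptor with a lock.
jit_code_entry *registerDebugObject(const char *Buffer, uint64_t Size) {
  auto *Entry = new jit_code_entry{nullptr, nullptr, Buffer, Size};
  Entry->next_entry = __jit_debug_descriptor.first_entry;
  if (Entry->next_entry)
    Entry->next_entry->prev_entry = Entry;
  __jit_debug_descriptor.first_entry = Entry;
  __jit_debug_descriptor.relevant_entry = Entry;
  __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
  __jit_debug_register_code();
  return Entry;
}

// Hypothetical helper: unlink the entry, notify the debugger, then free it.
void unregisterDebugObject(jit_code_entry *Entry) {
  if (Entry->prev_entry)
    Entry->prev_entry->next_entry = Entry->next_entry;
  else
    __jit_debug_descriptor.first_entry = Entry->next_entry;
  if (Entry->next_entry)
    Entry->next_entry->prev_entry = Entry->prev_entry;
  __jit_debug_descriptor.relevant_entry = Entry;
  __jit_debug_descriptor.action_flag = JIT_UNREGISTER_FN;
  __jit_debug_register_code();
  __jit_debug_descriptor.relevant_entry = nullptr; // avoid a dangling pointer
  delete Entry;
}
```

The buffer passed to `registerDebugObject` must stay alive until the entry is unregistered, because the debugger reads the object from the address stored in `symfile_addr`.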

Event Timeline

sgraenitz created this revision. Feb 12 2021, 12:27 PM
sgraenitz requested review of this revision. Feb 12 2021, 12:27 PM
Herald added a project: Restricted Project. Feb 12 2021, 12:27 PM

This should work with the current state:

> cat jitbp.cpp
int jitbp() { return 0; }
int main() { return jitbp(); }
> clang -c -g -fPIC --target=x86_64-unknown-unknown-elf -o jitbp.o jitbp.cpp
> bin/lldb -o 'settings set plugin.jit-loader.gdb.enable on' -o 'b runAsMain' -o 'run jitbp.o' bin/llvm-jitlink
Process 65394 stopped
(lldb) b jitbp
Breakpoint 2: where = JIT(0x7ffff7fcd000)`jitbp() + 6 at jitbp.cpp:1:15, address = 0x00007ffff7fce006
(lldb) c
Process 65394 resuming
(lldb) Process 65394 stopped
* thread #1, name = 'llvm-jitlink', stop reason = breakpoint 2.1
    frame #0: 0x00007ffff7fce006 JIT(0x7ffff7fcd000)`jitbp() at jitbp.cpp:1:15
-> 1    int jitbp() { return 0; }
   2    int main() { return jitbp(); }

This is very exciting.

This is adopting RuntimeDyld's approach of patching section addresses to inform the debugger about final locations. I think this is ideal for a prototype, but probably not the right approach to the final version. I've only got two loosely formed thoughts on this so far:

(1) We shouldn't special-case the JITLinkContext callbacks except as a last resort. If we have to use them at all we should prefer making any information required available via generic structures like the LinkGraph. In this case I think you could build a SectionRange for each JITLink::Section rather than rely on the SegmentLayout.

(2) If we constructed the debug object file from scratch (rather than mutating the existing one) then we could

  • support debug info for raw LinkGraphs (very attractive for long term performance -- avoids serializing / deserializing to objects),
  • reduce memory usage by not redundantly representing non-debug sections in the debug object
  • potentially dead strip debug info to further reduce memory overhead

Whether any of this is reasonable, and how much work it would be, is another question. Probably time for me to learn some DWARF. :)

Dave -- do you have any immediate thoughts?

Hi Lang, thanks for your feedback!

(2) If we constructed the debug object file from scratch (rather than mutating the existing one)

Let me see if I understand that right. At the moment LinkGraphBuilder skips DWARF sections in createNormalizedSections() so they will not appear in the linker's resulting target memory allocation. Instead we decide about requesting the debug object in the plugin after we have built the LinkGraph and make a separate allocation for it. You propose that we could change that and let the LinkGraph prepare the debug sections in a format that could be passed to the debugger? The plugin could query the target memory addresses from the LinkGraph and pass them to the debugger.

I think we'd still need to patch section addresses to inform the debugger about final locations, but otherwise it sounds reasonable. We will need a way to tell the LinkGraphBuilder whether or not to include them (i.e. the ObjectLinkingLayer plugin is present/enabled or not) in order to avoid overhead in non-debug sessions, right? ObjectLinkingLayer::emit() could accomplish it by passing a flag to createLinkGraphFromObject(), but I don't see a way to hook into it from a plugin context currently. How could we solve that?

(1) We shouldn't special-case the JITLinkContext callbacks except as a last resort.

We'd avoid the extra callback parameter in notifyLoaded(), but we'd need some other API function instead. Or would you propose not to use an ObjectLinkingLayer plugin altogether?

  • support debug info for raw LinkGraphs (very attractive for long term performance -- avoids serializing / deserializing to objects),

Oh interesting. Thinking about synthetic debug info for compiler/linker injected code..

  • reduce memory usage by not redundantly representing non-debug sections in the debug object

I think we could already strip them in the current approach. LLDB at least doesn't seem to touch it.

lhames added a comment. Edited Feb 14 2021, 4:00 PM

Hi Lang, thanks for your feedback!

(2) If we constructed the debug object file from scratch (rather than mutating the existing one)

Let me see if I understand that right. At the moment LinkGraphBuilder skips DWARF sections in createNormalizedSections() so they will not appear in the linker's resulting target memory allocation. Instead we decide about requesting the debug object in the plugin after we have built the LinkGraph and make a separate allocation for it. You propose that we could change that and let the LinkGraph prepare the debug sections in a format that could be passed to the debugger? The plugin could query the target memory addresses from the LinkGraph and pass them to the debugger.

That's not quite what I had in mind (though it's also something we could discuss).

What I meant was that right now RuntimeDyld copies the entire incoming object (text, data and all) then mutates the load addresses in the header and passes the resulting object to the debugger. What I'm suggesting is that we consider constructing a new object file (ELF, MachO, COFF) from scratch to hold the debug info that we want to pass to the debugger. This synthesized object file should (ideally) not need to carry anything but the headers and debug info. How we transport that to the target process would be a separate question.
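
To make "mutates the load addresses in the header" concrete for ELF, here is a minimal sketch that overwrites the `sh_addr` field of a named section header in an in-memory copy of an object. It is an illustration only, not RuntimeDyld's or this patch's code: the struct names `Ehdr64` and `Shdr64` and the function `patchSectionAddress` are hypothetical, and the sketch assumes a little-endian ELF64 object on a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Minimal ELF64 header layouts (field order as in the ELF-64 object format).
struct Ehdr64 {
  unsigned char e_ident[16];
  uint16_t e_type, e_machine;
  uint32_t e_version;
  uint64_t e_entry, e_phoff, e_shoff;
  uint32_t e_flags;
  uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
};
struct Shdr64 {
  uint32_t sh_name, sh_type;
  uint64_t sh_flags, sh_addr, sh_offset, sh_size;
  uint32_t sh_link, sh_info;
  uint64_t sh_addralign, sh_entsize;
};
static_assert(sizeof(Ehdr64) == 64 && sizeof(Shdr64) == 64, "unexpected padding");

// Set sh_addr of the section called Name to LoadAddr.
// Returns false if the section is absent or the buffer looks malformed.
bool patchSectionAddress(std::vector<char> &Obj, const std::string &Name,
                         uint64_t LoadAddr) {
  Ehdr64 EH;
  if (Obj.size() < sizeof(EH))
    return false;
  std::memcpy(&EH, Obj.data(), sizeof(EH));
  if (std::memcmp(EH.e_ident, "\x7f" "ELF", 4) != 0 ||
      EH.e_shentsize != sizeof(Shdr64) || EH.e_shstrndx >= EH.e_shnum)
    return false;
  if (EH.e_shoff > Obj.size() ||
      uint64_t(EH.e_shnum) * sizeof(Shdr64) > Obj.size() - EH.e_shoff)
    return false;

  // Copy headers in and out with memcpy to avoid unaligned access.
  auto readShdr = [&](unsigned I) {
    Shdr64 SH;
    std::memcpy(&SH, Obj.data() + EH.e_shoff + I * sizeof(SH), sizeof(SH));
    return SH;
  };

  Shdr64 StrTab = readShdr(EH.e_shstrndx);
  if (StrTab.sh_offset > Obj.size() ||
      StrTab.sh_size > Obj.size() - StrTab.sh_offset)
    return false;

  for (unsigned I = 0; I < EH.e_shnum; ++I) {
    Shdr64 SH = readShdr(I);
    if (SH.sh_name >= StrTab.sh_size)
      continue;
    const char *SecName = Obj.data() + StrTab.sh_offset + SH.sh_name;
    std::size_t MaxLen = StrTab.sh_size - SH.sh_name;
    // Compare including the terminating NUL so only exact names match.
    if (Name.size() < MaxLen &&
        std::memcmp(SecName, Name.c_str(), Name.size() + 1) == 0) {
      SH.sh_addr = LoadAddr;
      std::memcpy(Obj.data() + EH.e_shoff + I * sizeof(SH), &SH, sizeof(SH));
      return true;
    }
  }
  return false;
}
```

Called once per loaded section (for example `.text` and `.eh_frame`) on the copied object, this gives the debugger the final target addresses without touching section contents.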

I think we'd still need to patch section addresses to inform the debugger about final locations, but otherwise it sounds reasonable.

In the scheme I'm suggesting we'd just be synthesizing new load commands for the sections in the LinkGraph. This would be a post-allocation pass, so we could inspect the graph to see where each section referred to by the debug info starts and ends.

We will need a way to tell the LinkGraphBuilder whether or not to include them (i.e. the ObjectLinkingLayer plugin is present/enabled or not) in order to avoid overhead in non-debug sessions, right? ObjectLinkingLayer::emit() could accomplish it by passing a flag to createLinkGraphFromObject(), but I don't see a way to hook into it from a plugin context currently. How could we solve that?

I think the aim would be that we just don't install the debugging support plugin if debugging is not enabled.

(1) We shouldn't special-case the JITLinkContext callbacks except as a last resort.

We'd avoid the extra callback parameter in notifyLoaded(), but we'd need some other API function instead. Or would you propose not to use an ObjectLinkingLayer plugin altogether?

No, this should definitely be an object linking layer plugin, but I think we should be synthesizing / modifying the debug object in a jitlink pass installed by the plugin. That's the way EH-frame registration now works in the runtime prototype: https://github.com/lhames/llvm-project/blob/315f9f15e7233a5c956022201ac898f94196802a/llvm/lib/ExecutionEngine/Orc/MachOPlatform.cpp#L825 .

  • support debug info for raw LinkGraphs (very attractive for long term performance -- avoids serializing / deserializing to objects),

Oh interesting. Thinking about synthetic debug info for compiler/linker injected code..

Exactly. One day

  • reduce memory usage by not redundantly representing non-debug sections in the debug object

I think we could already strip them in the current approach. LLDB at least doesn't seem to touch it.

Yeah. I think we're converging on the same solution from different ends: You can think of this as stripping the incoming object back to contain only what's needed for the debugger, or you can think of it as building up a new object (using some source of debug info) for the debugger. The latter generalizes to support debug info in LinkGraphs.

If we take the synthesizing-a-new-object approach it means that we'll need to represent the debug info in the graphs, at least initially. I guess a sketch of the approach would look like this:

  1. LinkGraphBuilders stop skipping debug info (this adds some overhead in the case where objects have debug info, but debug info is not used by the JIT. I think that's tolerable for now, we can revisit later if we want to make this optional).
  2. JITLinkGDBLoaderPlugin installs two plugins that cooperate to synthesize and register a debug object:
    1. A pre-allocation pass synthesizes a debug object by copying the data out of the debug sections in the graph. In the synthesized object, section and symbol target addresses are initially zero. The pass then deletes the debug sections from the graph.
    2. A post-allocation pass mutates the synthesized debug object based on the final target addresses in the graph. It then calls the runtime to register the debug info with the gdb registration API.

Does that sound reasonable?
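
The two cooperating passes sketched above can be modelled in a few lines. The following is a toy model only: `ToySection`, `ToyGraph`, `ToyPassConfig`, `ToyDebugObject`, `installDebugPasses` and `toyLink` are hypothetical stand-ins and do not correspond to the real JITLink or ORC types. It assumes debug sections can be recognised by a `.debug_` name prefix, and a boolean flag stands in for the call to the GDB registration API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for the link graph and pass configuration (not the LLVM API).
struct ToySection {
  std::string Name;
  std::vector<char> Content;
  uint64_t Address = 0; // assigned by the toy allocator below
};
struct ToyGraph {
  std::vector<ToySection> Sections;
};
using ToyPass = std::function<void(ToyGraph &)>;
struct ToyPassConfig {
  std::vector<ToyPass> PreAllocation, PostAllocation;
};

// The synthesized debug object: debug section data plus load addresses.
struct ToyDebugObject {
  std::map<std::string, std::vector<char>> DebugSections;
  std::map<std::string, uint64_t> LoadAddresses;
  bool Registered = false; // stands in for calling the registration API
};

static bool isDebugSection(const std::string &Name) {
  return Name.rfind(".debug_", 0) == 0; // true if Name starts with ".debug_"
}

// What a plugin would do when asked to modify the pass configuration.
void installDebugPasses(ToyPassConfig &Config, ToyDebugObject &Obj) {
  // Pass 1: copy debug sections out of the graph, then delete them from it.
  Config.PreAllocation.push_back([&Obj](ToyGraph &G) {
    std::vector<ToySection> Kept;
    for (auto &S : G.Sections) {
      if (isDebugSection(S.Name)) {
        Obj.DebugSections[S.Name] = std::move(S.Content);
      } else {
        Obj.LoadAddresses[S.Name] = 0; // addresses are initially zero
        Kept.push_back(std::move(S));
      }
    }
    G.Sections = std::move(Kept);
  });
  // Pass 2: record final target addresses, then register the debug object.
  Config.PostAllocation.push_back([&Obj](ToyGraph &G) {
    for (const auto &S : G.Sections)
      Obj.LoadAddresses[S.Name] = S.Address;
    Obj.Registered = true;
  });
}

// Toy link driver: run pre-allocation passes, allocate, run post-allocation passes.
void toyLink(ToyGraph &G, const ToyPassConfig &Config) {
  for (const auto &Pass : Config.PreAllocation)
    Pass(G);
  uint64_t Next = 0x1000;
  for (auto &S : G.Sections) {
    S.Address = Next;
    Next += 0x1000;
  }
  for (const auto &Pass : Config.PostAllocation)
    Pass(G);
}
```

Because the debug sections are removed before allocation, they never occupy target memory; only the synthesized object carries them.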

sgraenitz added a comment. Edited Feb 16 2021, 2:15 AM

we should be synthesizing / modifying the debug object in a jitlink pass installed by the plugin

Aha, plugins can install linker passes! I didn't give enough attention to this. Sounds like it could make some extra infra in the current proposal obsolete (i.e. the callback from the plugin to the linker driver).

  1. LinkGraphBuilders stop skipping debug info (this adds some overhead in the case where objects have debug info, but debug info is not used by the JIT. I think that's tolerable for now, we can revisit later if we want to make this optional).
  2. JITLinkGDBLoaderPlugin installs two plugins that cooperate to synthesize and register a debug object:
    1. A pre-allocation pass synthesizes a debug object by copying the data out of the debug sections in the graph. In the synthesized object, section and symbol target addresses are initially zero. The pass then deletes the debug sections from the graph.
    2. A post-allocation pass mutates the synthesized debug object based on the final target addresses in the graph. It then calls the runtime to register the debug info with the gdb registration API.

Does that sound reasonable?

Yes, I will have a look at the API to install linker passes and see how far I get.

Dave -- do you have any immediate thoughts?

No immediate thoughts - happy to chat more about it, though. We might have to meet in the middle with regards to learning things here - I don't know too much about these JIT debugger APIs or what options they might have with regards to their input formats, resolution of addresses, etc.

sgraenitz added a comment. Edited Feb 16 2021, 2:28 PM

If we constructed the debug object file from scratch (rather than mutating the existing one)

While this sounds reasonable to me, it appears to add quite some effort. I guess we could grow our own writer that can emit in-memory ELF objects in some specific format, but there's a load of details that would need consideration sooner or later (recalculating offsets, compressed sections, etc.). Eventually, doing it right would probably mean something like llvm-objcopy --only-keep-debug. The implementation of it is buried with the tool implementation currently, so I can hardly "just use" it: https://reviews.llvm.org/rG5ad0103d8a04cb066dfae4fc20b0dfcd9413f4d4

What do you think?

(I will try a hack in the next days to see whether it's viable at all, but I'd keep hopes low that we get there anytime soon.)

I think MachOWriter and ELFWriter seem very useful outside llvm-objcopy. @MaskRay -- Do you think it would be reasonable to move those classes into a library? Maybe Object? Or BinaryFormat?

MaskRay added a comment. Edited Feb 16 2021, 11:11 PM

Hey. I am happy to share what I know about ELF, but I have never looked into JIT in detail. (I should do that at some point..)

There is a feature request about "librarify llvm-objcopy": https://bugs.llvm.org/show_bug.cgi?id=41044
There was incomplete work in 2019. The latest attempt is https://lists.llvm.org/pipermail/llvm-dev/2021-January/147892.html (I haven't followed it).

As someone who implemented llvm-objcopy --only-keep-debug, I know that llvm-objcopy has two writers, one placing segments then sections, the other one (--only-keep-debug) placing sections then segments.
If you look at objcopy/llvm-objcopy --add-section and think it can do the JIT job, you may get disappointed 😓 The feature does not affect program headers (especially the most important PT_LOAD)...
so generally --add-section is probably only meaningful for relocatable object files (ET_REL) and non-SHF_ALLOC sections in executable and shared object files (ET_EXEC, ET_DYN).

With my current limited understanding about JIT, I think you may be able to take inspiration from some llvm-objcopy code, but using it as a library to do what you intend to do ... may be really difficult.

I think you can probably take more inspiration from the link editor LLD than llvm-objcopy... Placing sections & segments in a usable way is difficult. lld/ELF/Writer.cpp has fairly sophisticated (complicated) code doing that.

Hey. I am happy to share what I know about ELF, but I have never looked into JIT in detail. (I should do that at some point..)

Thanks for your pointers and estimation!

There is a feature request about "librarify llvm-objcopy": https://bugs.llvm.org/show_bug.cgi?id=41044
There was incomplete work in 2019. The latest attempt is https://lists.llvm.org/pipermail/llvm-dev/2021-January/147892.html (I haven't followed it).

Good to know that's basically in-progress already. It seems the overall refactoring in D88827 stagnated only because it requires some additional work in preparation (i.e. D91028 and D91693 which are both in active development). That's reinforcing my first impression: might be helpful, but it's a lot of work.

As someone who implemented llvm-objcopy --only-keep-debug, I know that llvm-objcopy has two writers, one placing segments then sections, the other one (--only-keep-debug) placing sections then segments.
If you look at objcopy/llvm-objcopy --add-section and think it can do the JIT job, you may get disappointed 😓 The feature does not affect program headers (especially the most important PT_LOAD)...
so generally --add-section is probably only meaningful for relocatable object files (ET_REL) and non-SHF_ALLOC sections in executable and shared object files (ET_EXEC, ET_DYN).

With my current limited understanding about JIT, I think you may be able to take inspiration from some llvm-objcopy code, but using it as a library to do what you intend to do ... may be really difficult.

JITLink does deal with relocatable object files (both, input and debug-output to LLDB), so maybe the functionality behind --add-section would be applicable in our case? However, I didn't have a closer look and it's really not something I'd want to judge.

Meanwhile, my quick hack is backing your thesis: I stripped my jitbp.o input object with llvm-objcopy --only-keep-debug on disk and fed that into LLDB as the debug-object for jitbp.o (including patched load-addresses for .text and .eh_frame). Now LLDB fails to calculate the correct address for the breakpoint site in my initial example:

(lldb) b jitbp
warning: failed to set breakpoint site at 0x6 for breakpoint 3.1: error: 9 sending the breakpoint request
Breakpoint 3: where = JIT(0x7ffff7fcd000)`jitbp() + 6 at jitbp.cpp:1:15, address = 0x0000000000000006

Maybe this is easy to fix. LLDB might just ignore .text sections of type NOBITS and so doesn't apply the load addresses or so, but given your concerns and my unfamiliarity with the code, I'd rather not debug that now. I'd say we keep the option in mind, let the refactoring land and consider it a task for further improvement.

BTW @MaskRay llvm-objcopy --only-keep-debug reduced the size of my minimal test object by only 128 bytes (4%). That matches the size of .text plus .eh_frame, which disappeared (but yes, they still get reported in the headers). However, their relocation sections .rela.text and .rela.eh_frame still exist and occupy 90 additional bytes (3%). How are they required for debug?

@lhames In my first proposal I used the ResourceKey value to identify a MaterializationResponsibility. Now I found that the EHFrameRegistrationPlugin uses ResourceKey and MaterializationResponsibility * as keys for different purposes (EHFrameRanges and InProcessLinks respectively).

Are there conceptual reasons for it? Any pros and cons?
Is the mapping between ResourceKey and MaterializationResponsibility always unique in both directions?

Thanks

sgraenitz updated this revision to Diff 324657. Feb 18 2021, 8:35 AM

Add notifyMaterializing() handlers for Plugin and JITLinkContext. The JITLoaderGDBPlugin uses it to obtain a copy of the input object file, so we don't need to touch Plugin::notifyLoaded(). Furthermore the plugin uses a JITLink post-allocation pass to record section load-addresses in target memory. Instead of reaching out to the backend, it now applies format-specific transformations of the debug object itself based on the triple provided by the pass configuration.

Unregistering, failure handling, tests and out-of-process support are still to do. I made some notes and highlighted current open questions inline.

Going forward, I think the most pressing question is testing. LLDB will have an end-to-end test, which checks that a JIT breakpoint in a minimal C++ file gets hit. It will cover all of the mandatory code paths. I am not sure though what's the best way to test cases with multiple MRs, splitting MRs, multi-threading, removing resources, etc. EHFrameRegistrationPlugin seems to have coverage for most of its code from LLJIT tests. Maybe that could be an option here too?

llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
1320

New API function. The ObjectLinkingLayerJITLinkContext forwards it to the plugins. It feels quite symmetric to notifyFailed().

1370

New API function. Doesn't give the API client access to the underlying buffer.

llvm/include/llvm/ExecutionEngine/Orc/ObjectLinkingLayer.h
75

New API function. We don't really need the LinkGraph, but only providing the context seems weird.

llvm/lib/ExecutionEngine/JITLink/ELF_x86_64.cpp
338

In order to make the plugin usable, we have to "ProcessAllSections". The good news is that the pruning step in linkPhase1 seems to remove all this.

llvm/tools/llvm-jitlink/llvm-jitlink-gdb-loader.cpp
207

It's quite far away from this patch, but I had a look at JITDylib::delegate(). It splits an MR in two and it looks like they both share one tracker. That might break my code currently. For now, I kept tracking debug objects by ResourceKey here, because I can get them from an MR, but not the other way around.

290

TODO: lock_guard

345

I can't reach the ExecutionSession here and there is no Error return value. Any chance we can report that?

This is due to the "never have more than one pending debug object" restriction above. Otherwise there is no way to determine the right debug object for an MR in the modifyPassConfig() and notifyLoaded() handlers. Or am I missing something?

sgraenitz updated this revision to Diff 324943. Feb 19 2021, 3:56 AM

Obtain reference to ExecutionSession in ctor and use it for error reporting

sgraenitz added inline comments. Feb 19 2021, 6:02 AM
llvm/tools/llvm-jitlink/llvm-jitlink-gdb-loader.cpp
345

Error reporting done. The "never have more than one pending debug object" restriction still holds.

sgraenitz abandoned this revision. Feb 23 2021, 2:05 PM
sgraenitz retitled this revision from [llvm-jitlink] Implement JITLoaderGDB ObjectLinkingLayer plugin for ELF x86-64 to [WIP] Implement JITLoaderGDB ObjectLinkingLayer plugin for ELF x86-64.

New proposal posted with D97335