Page MenuHomePhabricator

Initial (incomplete) implementation of JITLink - A replacement for RuntimeDyld.
ClosedPublic

Authored by lhames on Feb 26 2019, 5:49 PM.

Details

Summary

JITLink is a jit-linker and performs the same high-level task as RuntimeDyld:
it parses relocatable object files to make their contents runnable in a target
process.

JITLink aims to improve on RuntimeDyld in two primary ways:

(1) A clear design intended to maximize code-sharing while minimizing coupling.

RuntimeDyld has been developed in an ad-hoc fashion for a number of years and
this had led to intermingling of code for multiple architectures (e.g. in
RuntimeDyldELF::processRelocationRef) in a way that makes the code more
difficult to read, reason about, extend.

(2) Native support for asynchronous linking, where the symbol resolution and
finalization steps call back to linker continuations to complete the linker's
work. This allows for cleaner interoperation with the new concurrent ORC JIT
APIs, while still being easily implementable in synchronous style where
asynchrony is not needed.

To maximise sharing, the design has a hierarchy of common code:

(1) Generic atom-graph data structure and algorithms (e.g. dead stripping and

memory allocation) that are intended to be shared by all architectures.

+ -- (2) Shared per-format code that utilizes (1), e.g. Generic MachO to

|  atom-graph parsing.
|
+ -- (3) Architecture specific code that uses (1) and (2). E.g.
         JITLinkerMachO_x86_64, which adds x86-64 specific relocation
         support to (2) to build and patch up the atom graph.

To support asynchronous symbol resolution and finalization, the callbacks for
these operations take continuations as arguments:

using JITLinkAsyncLookupContinuation =
    std::function<void(Expected<AsyncLookupResult> LR)>;

using JITLinkAsyncLookupFunction =
    std::function<void(const DenseSet<StringRef> &Symbols,
                       JITLinkAsyncLookupContinuation LookupContinuation)>;

using FinalizeContinuation = std::function<void(Error)>;

virtual void finalizeAsync(FinalizeContinuation OnFinalize);

In addition to supporting the top-level design goals, JITLink also aims to fix
a number of nitpicks. The first of these (which appears in this patch) is GOT
and stub handling: RuntimeDyld's limited 'stub management' utility allows for
the creation of stubs OR GOT entries but not both, preventing the small-code
model from being implemented on some architectures (e.g. x86-64). JITLink
addresses this by allowing mutations on the atom graph prior to finalization,
which can be used to build both GOT entries and stubs.

This patch is incomplete, but I'm posting it now to allow for early feedback.
Once it is ready to land in-tree, my aim will be to develop JITLink to the point
where it could replace RuntimeDyld for MachO, then (time permitting) look at
tackling COFF and ELF too, though I will be hoping to enlist support for those
formats as I am less familiar with them.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptFeb 26 2019, 5:50 PM

My main concern with this design is how you plan to integrate this with the existing infrastructure. Are you planning on laying this under RuntimeDyLD? Or somehow factoring out common implementation to be used by both? I'm generally bothered by the seeming duplication of such a fragile piece of infrastructure.

Since you're rewrite this, is there any potential for sharing code with lld? Relocating instructions should essentially be the same between these two.

ruiu added a subscriber: ruiu.Mar 1 2019, 3:18 PM

Since you're rewrite this, is there any potential for sharing code with lld? Relocating instructions should essentially be the same between these two.

I haven't read this patch, but as a general comment, I could say that a static linker is fairly different from a JIT linker. A JIT linker is probably more close to a dynamic linker (or a loader) than a static linker. They are situated at different stages of making a program. In the pipeline of (1) compile, (2) link, (3) load and run, the static linker is the second stage and the dynamic loader and perhaps a JIT linker are the third. So, even though we call the second and the third stage "linker", they are not too similar. Even the relocation handling is pretty different -- a static linker reads relocations and possibly creates dynamic relocations if relocations cannot be resolved at link-time. A static linker may also create GOT/PLT entries and other data structures. All of them are not needed if you are creating an in-memory image of an executable. So a JIT linker and a static linker are naturally fairly different, and I don't think it is very easy to find large commonality between them.

I was actually thinking about handling of target-specific relocations, and applying them to instructions. We used to have a large chunk of auto-generated code that did that. The implementation in lld is much shorter, so it's probably not a big deal if it needs to be repeated.

ruiu added a comment.Mar 1 2019, 3:48 PM

I was actually thinking about handling of target-specific relocations, and applying them to instructions. We used to have a large chunk of auto-generated code that did that. The implementation in lld is much shorter, so it's probably not a big deal if it needs to be repeated.

Yeah, applying a target-specific relocation is essentially just one line of code per a relocation type in lld, and given that the types of relocations that a JIT linker has to handle is more limited than a static linker, the duplication is pretty limited.

(food for thought, not a "this must be answered now" sort of thing, but "Generic atom-graph data structure and algorithms " does sound a bit like the original LLD design, which didn't turn out to be a great fit (at least the way it ended up) for ELF, at least - any ideas on what might be done differently to avoid the same situation here when adding ELF support down the line? (my naive perspective is that the mismatch was around slicing up ELF sections into MachO-like atoms on symbol boundaries was the mismatch - rather than adding support for an atom with multiple symbols (that may not be at the start of the atom) and then having a one atom to one ELF section mapping))

include/llvm/ExecutionEngine/JITLink/JITLink.h
223–224 ↗(On Diff #188488)

Might be able to get away without having specific begin/end accessors - everyone can use the range accessor instead?

(similar feedback for other range accessors in this patch)

252–253 ↗(On Diff #188488)

Any chance this container could be one of unique_ptr? I guess DenseSet doesn't support that? Could it?

454 ↗(On Diff #188488)

Should this be 'mutable' (since it's modified from a const member function like "getAddrToAtomMap() const")?

Also, I'd tend to implement the non-const version of a function in terms of the const version, rather than the other way around. That way if the object is actually const, the code doesn't hit UB.

(currently calling getAddrToAtomMap() on an actually const object would be UB)

lib/ExecutionEngine/JITLink/JITLink.cpp
57–59 ↗(On Diff #188488)

Could use

= default?

here. No big deal, but might give the frontend extra info about the triviality, etc.

lib/ExecutionEngine/JITLink/JITLinkGeneric.cpp
230–234 ↗(On Diff #188488)

This leaks or double frees if this lambda is either not called, or called more than once.

Perhaps it'd be worth using shared_ptr here for simplicity/robustness?

(fixme would still apply and worth migrating to a unique_ptr captured by move at that point)

282–288 ↗(On Diff #188488)

might be easier written as:

return std::tie(LHSP.getSectionOrdinal(), LHS->getOrdinal()) < std::tie(RHSP.getSectionOrdinal(), RHS->getOrdinal());
293–297 ↗(On Diff #188488)

Indentation seems off here

381–382 ↗(On Diff #188488)

Seem to be a few negatives in here, maybe:

assert(std::all_of(... isResolved()); ...

would be a more direct expression/match the phrasing of the assertion text below?

Oh, and there's probably llvm::all_of that takes a range directly rather than needing to decompose into begin/end

388 ↗(On Diff #188488)

I tend to advocate for [&] whenever a lambda is used within the scope it's defined (ie: doesn't leak out via a std::function or similar) - it's just another scope & seems fine for it to have access to all of the names in the outer scope without having to explicitly opt in, etc.

408 ↗(On Diff #188488)

reinterpret_casts (especially of references, I think - I'd worry about potentially turning a non-reference/pointer/thing into a reference with such a cast) make me a bit twitchy - any chance of wrapping the reinterpret_casting up in something more type safe at both ends?

lib/ExecutionEngine/JITLink/JITLinkMachO_x86_64.cpp
46–82 ↗(On Diff #188488)

Worth a small .def file or local macro to reduce the duplication?

#define MY_CASE(n) case n: return #n
MY_CASE(NegDelta32);
MY_CASE(NegDelta64);

etc...

232–238 ↗(On Diff #188488)

elses-after-returns can be removed, I think

lib/ExecutionEngine/JITLink/MachOAtomGraphBuilder.cpp
121–127 ↗(On Diff #188488)

else-after-continues can be removed

unittests/ExecutionEngine/JITLink/JITLinkTestCommon.cpp
44–45 ↗(On Diff #188488)

Using '&' capture for a lambda that escapes the current scope makes me a bit twitchy - would it be worth enumerating the things being captured here? I guess they're all members, so it's capturing 'this' by value, but still - I'd be more comfortable with that spelled out.

unittests/ExecutionEngine/JITLink/JITLinkTestCommon.h
97–99 ↗(On Diff #188488)

else-after-return

Ah, I see, scoping because of "DA" - I'd probably move DA out of the condition and early return for the error case instead:

auto DA = G.find...;
if (!DA)
  return DA.takeError();
return support::endian::read<>(...);

Reduces indentation of the main path

lhames added a comment.Mar 4 2019, 8:21 AM

My main concern with this design is how you plan to integrate this with the existing infrastructure. Are you planning on laying this under RuntimeDyLD? Or somehow factoring out common implementation to be used by both? I'm generally bothered by the seeming duplication of such a fragile piece of infrastructure.

JITLink should be more-or-less a drop-in replacement for RuntimeDyld, with a few minor API changes. My intention is to substitute it for MachO handling, first in ORC (where switching out the JIT linker is easy) and then potentially in MCJIT with something like:

if (isa<MachOObjectFile>(Obj))
  jitLink(Obj)
else
  Dyld.loadObject(Obj);

The duplication should only be until we are confident in it. If/when JITLink has proved itself a viable replacement we should deprecate and then remove RuntimeDyldMachO. Ideally we will do the same for RuntimeDyldCOFF and RuntimeDyldELF too.

lhames added a comment.Mar 4 2019, 8:42 AM

Since you're rewrite this, is there any potential for sharing code with lld? Relocating instructions should essentially be the same between these two.

I haven't read this patch, but as a general comment, I could say that a static linker is fairly different from a JIT linker...

Actually they’ve got much more in common than you would think. The commonality comes from the input format: relocatable object files. The JIT linker does indeed need to create GOT entries, PLTs, etc. It is the output formats that are really different: A static linker needs to be able to output its data structures as executables, shared objects, or relocatable object files, which requires some non-trivial formatting. A JIT linker just needs to arrange the atom contents into memory segments with the right permissions. The other big differences are that the JIT linker only links one relocatable object file at a time (at least for now), and that it requires some asynchronous callbacks to make it efficient to use within the concurrent JIT APIs.

I did think about using the atom-based LLD for this, but it’s a big dependence to bring in to LLVM, and lacked the asynchronous API support that the JIT needed. In the end I opted for a cut down initial design inspired by the atom-based LLD.

I was actually thinking about handling of target-specific relocations, and applying them to instructions. We used to have a large chunk of auto-generated code that did that. The implementation in lld is much shorter, so it's probably not a big deal if it needs to be repeated.

Yeah, applying a target-specific relocation is essentially just one line of code per a relocation type in lld, and given that the types of relocations that a JIT linker has to handle is more limited than a static linker, the duplication is pretty limited.

As noted, we actually do need to handle the full scope of relocations supported by the static linker (on the input side). There is not a lot of duplication to eliminate, but I’m thinking about switching to a named function for each relocation expression to support sharing between JITLink and its test infrastructure.

lhames marked 13 inline comments as done.Mar 6 2019, 4:01 PM

Thanks Dave! I've got a patch that addresses your feedback that I will post from the bus in a minute.

include/llvm/ExecutionEngine/JITLink/JITLink.h
223–224 ↗(On Diff #188488)

Yeah — I like that idea.

252–253 ↗(On Diff #188488)

I had not thought of that. I can’t see any reason that would not work. I will try it out.

454 ↗(On Diff #188488)

Good call.

lib/ExecutionEngine/JITLink/JITLink.cpp
57–59 ↗(On Diff #188488)

It had never occurred to me to use =default in an out-of-line definition before. :P

Good call.

lib/ExecutionEngine/JITLink/JITLinkGeneric.cpp
230–234 ↗(On Diff #188488)

Yep. The problem is that once you convert to shared_ptr you cannot convert back. I think I opted to keep this as a raw pointer for ease of conversion, on the assumption that we're switching to c++14 *reasonably* soon.

282–288 ↗(On Diff #188488)

That's a neat trick, but once formatted for 80 columns I am not sure it's any more readable.

293–297 ↗(On Diff #188488)

Hrm. Clang format is doing awful things to this. Will hand-format.

381–382 ↗(On Diff #188488)

There is! Thanks. :)

408 ↗(On Diff #188488)

Huh. This was a think-o: it should have been a static_cast. Fixed in the latest iteration of the dead-stripping code.

lib/ExecutionEngine/JITLink/JITLinkMachO_x86_64.cpp
232–238 ↗(On Diff #188488)

Indeed.

lib/ExecutionEngine/JITLink/MachOAtomGraphBuilder.cpp
121–127 ↗(On Diff #188488)

Updated.

unittests/ExecutionEngine/JITLink/JITLinkTestCommon.cpp
44–45 ↗(On Diff #188488)

Yep -- 'this' would be better here. Fixed.

unittests/ExecutionEngine/JITLink/JITLinkTestCommon.h
97–99 ↗(On Diff #188488)

Interesting. I tend to prefer the if/else framing because it feels like the "success/failure" state on each path is more obvious. I only switch to early outs for Expected values if I think the indentation will make things really unreadable.

lhames updated this revision to Diff 189615.Mar 6 2019, 4:34 PM
  • Change the way dead stripping is performed.
  • Flesh out initial JITLink MachO/x86-64 unit test.
  • clang-format
  • Address dblaikie's review feedback, also make Atom/DefinedAtom/Section
This comment was removed by sgraenitz.
lib/ExecutionEngine/JITLink/JITLinkGeneric.cpp
293–297 ↗(On Diff #188488)
LLVM_DEBUG({
  ...
});

usually works well

As a side note: it would be very helpful if you could add links to the specs you used to implement relocations.

include/llvm/ExecutionEngine/JITLink/JITLink.h
242 ↗(On Diff #189615)

There seems to be a discrepancy: within the Edge, the Offset and Addend are defined as uint32_t and int64_t respectively, but here they are initialized with JITTargetAddress which is uint64_t.

414 ↗(On Diff #189615)

Just curious, is it necessary to have both getAtomByName/findAtomByName publicly available (same for ..AtomByAddress)?
As an LLVM APIs user, I always confused when I see such duality.
Also, I think the getAtomByName/Address are dangerous: if you have assertions disabled (as it is in release builds), then a user will get undefined behavior if the element is not in the container.

lib/ExecutionEngine/JITLink/JITLinkGeneric.h
35 ↗(On Diff #189615)

Seeing a unique_ptr here, does it mean that a user of the API would have to give up the ownership of the object file?
If that's the case, then, in my humble opinion, this API should be reconsidered: either take ownership from the very beginning or do not take it at all.

54 ↗(On Diff #189615)

How does dead-strip work? Is it possible to control what is stripped?
Here is the use-case I'm concerned about:

void *address = 0;
void driver() {
  void (*fun)() = address;
  fun();
}

void foo() {}
void bar() {}

There is a function that loads the address of another function from a global variable and calls it. Initially, the global address is zero, but it is changed by the JIT client, like this (pseudo-code):

address = jit.getAddress("foo")
jit.callFunction("driver")
address = jit.getAddress("bar")
jit.callFunction("driver")

Techically, the foo and bar are dead, but in fact they are used.

lib/ExecutionEngine/JITLink/JITLink_MachO_x86_64.cpp
562 ↗(On Diff #189615)

Does it make sense to at least add some logs here, so that an end user will see what's going wrong with their code?

dblaikie added inline comments.Mar 31 2019, 9:00 AM
lib/ExecutionEngine/JITLink/JITLinkGeneric.cpp
282–288 ↗(On Diff #188488)

FWIW, I find the tie version easier to read because I don't have to think about the interesting true/false/less than/greater than cases in the custom version you have here. I can't eyeball exactly what this ordering is.

unittests/ExecutionEngine/JITLink/JITLinkTestCommon.h
97–99 ↗(On Diff #188488)

*nod*

For myself, I prefer the "conditionalize the failure/error handling" as a consistent strategy (since it's more necessary in longer functions (where indenting every successful branch starts to get a bit arrow-y ( https://blog.codinghorror.com/flattening-arrow-code/ )) , but I like to adopt a format that's consistent across short and longer code (so it doesn't need to be restructured if there ends up being a second error path in the code)

Yeah, I wish there was slightly tidier syntax for it - for sure (the convenience of variable declared in if scopes is kinda nice - makes it very clear that the if/else block only occurs when the entity is true/false - rather than having to parse some (even just slightly, with a single !) non-trivial condition to check)

lhames marked 6 inline comments as done.Apr 8 2019, 5:01 PM

New patch coming soon: more complete relocation support, weak symbol support, testing infrastructure, and a new llvm-jitlink tool (the jitlink counterpart to llvm-rtdyld).

include/llvm/ExecutionEngine/JITLink/JITLink.h
242 ↗(On Diff #189615)

Good point. This will be fixed in the upcoming patch.

414 ↗(On Diff #189615)

Their purposes are slightly different: Sometimes you know from context that a certain address or symbol name is *definitely* present (usually because you would have error’d out earlier in the code if it wasn’t). In this case you can use the “get*” versions of the APIs and bypass the redundant error check. By contract you are only allowed to call these atoms if you are certain they will succeed.

Otherwise you should use the find* APIs, which do not assume that the query will succeed.

Of note: JITLink implementations should assume potentially malicious input: just because a well formed binary *should* have a particular symbol in it, does not mean that you are allow to assume that it will.

lib/ExecutionEngine/JITLink/JITLinkGeneric.cpp
293–297 ↗(On Diff #188488)

That's a handy tip. Thanks Stefan!

lib/ExecutionEngine/JITLink/JITLinkGeneric.h
35 ↗(On Diff #189615)

I am not quite sure what you mean here? Passing ownership into the constructor seems like taking ownership from the get-go?

In any case this has actually changed in the most recent version of the patch, which I will upload shortly: The new version takes ownership of a Context object instead, and its up to the client whether or not their context class owns the underlying buffer.

54 ↗(On Diff #189615)

The details of dead-stripping have changed a little bit in the yet-to-be-posted patch, but the basic idea is that the target provides a sensible default “live-marker” pass which sets the live symbol roots.

Clients are definitely free to add passes to mark additional definitions live (or remove the live markers), and there is a convenience pass for tagging all atoms live.

The default MachO live-marker pass marks all external symbols live, so in your example ‘foo’ and ‘bar’ would be kept by default.

lib/ExecutionEngine/JITLink/JITLink_MachO_x86_64.cpp
562 ↗(On Diff #189615)

I should have used a different comment for these: they are “precommit fixmes”. I wanted to get the high level API design up for review while I worked on the details. The intention is that all relocations should be supported by the time this review is complete. :)

lhames updated this revision to Diff 194414.Apr 9 2019, 3:57 PM
  • Lots of changes. Headlines:
lhames added a comment.Apr 9 2019, 3:57 PM
  • Lots of changes. Headlines:

Oops. The commit message got cut off in this update. Here it is:

Lots of changes. Headlines:

(1) Changes the way dead stripping is performed: Atoms now have a 'live' flag,
which should be set by a pass (that may be supplied by the target or the
client). The dead-strip algorithm propagates liveness from the original set of
live atoms along edges to all reachable atoms, then deletes any atoms not marked
live.

(2) Weak symbol support: JITLink now supports discarding duplicate weak
definitions (despite JITLink operating on a single file at a time) by setting a
"should-discard" flag on definitions to be dropped. If the should-discard flag
is set on an atom it will be discarded during dead-stripping. (Liveness flags
are not propagated through atoms marked should-discard, to prevent them from
spuriously keeping other atoms live).

(3) New 'llvm-jitlink' tool for testing: This is the JITLink equivalent of
llvm-rtdyld: it allows us to link object files from the command line and execute
their contents or perform tests.

(4) New JITLink based ObjectLinkingLayer in ORC: This is the JITLink equivalent
RTDyldObjectLinkingLayer, and is used in the implementation of llvm-jitlink.

(5) Most MachO/x86-64 relocations now supported. More to come before commit.
lhames updated this revision to Diff 195605.Apr 17 2019, 11:27 AM

Another big update.

  • Improved eh-frame support. C++ exception handling now works in JIT'd code (for MachO/x86-64).
  • The llvm-jitlink tool now supports basic RuntimeDyld checker tests (support for remapping sections and injecting symbols still to come)
  • Add a 'global' flag to atoms so that we don't try to resolve local symbols in Orc (avoiding 'resolving outside responsibility set' assertions in ORC Core).
  • Supports all MachO/x86-64 relocations except for TLV.

Regression test cases and full RuntimeDyldChecker support still to come.

lhames updated this revision to Diff 195669.Apr 17 2019, 8:42 PM
  • Fix some section layout and test infrastructure bugs, start work on an initial testcase.
lhames updated this revision to Diff 195962.EditedApr 19 2019, 10:30 PM

More updates:

Fix SUBTRACTOR relocation logic for anonymous atoms.

Add checks for all regular relocations to the MachO/x86-64 testcase (eh-frame and TLV still need tests).

Get the code building on Linux.

Now that the unit and regression test infrastructure is in place I think it's time to move development of this to trunk. I'm going to commit tomorrow morning when I can keep an eye on the bots.

This revision was not accepted when it landed; it landed in state Needs Review.Apr 20 2019, 10:10 AM
This revision was automatically updated to reflect the committed changes.
thakis added inline comments.
llvm/trunk/test/ExecutionEngine/JITLink/MachO_x86-64_relocations.s
3

Should llvm/test/CMakeLists.txt be updated to have a dependency on llvm-jitlink? Looks like check-llvm will now call llvm-jitlink but there's no dependency in place to make sure it's actually built as far as I can tell.

thakis added a comment.EditedApr 21 2019, 4:42 PM

I fixed the build on Windows; looks like maskray added llvm-jitlink to llvm/test/CMakeLists.txt -- thanks!

bjope added a subscriber: bjope.Apr 23 2019, 1:52 PM
bjope added inline comments.
llvm/trunk/lib/ExecutionEngine/JITLink/JITLinkGeneric.cpp
161

This is a post commit comment, but I just wanted to inform that the CodingStandards.rst says "As a rule of thumb, always make sure to use llvm::sort instead of std::sort.". Afaik there has been some general cleanup to replace all std::sort uses by llvm::sort. It might still be ongoing for some projects but I think "llvm" was cleaned up some time ago.

(I think main problem with std::sort is when having several equal keys, I haven't looked at the code here to see if it is a problem, I only noticed that some new uses of std::sort had popped up when browsing the code)

One crucial part that I found missing here is a way to connect JITEventListener (like GDBJITRegistration, PerfJITEvents, etc.), because the new ObjectLinkingLayer::NotifyLoaded function only provides the module key but no ObjectFile so we cannot create and pass over a LoadedObjectInfo as with RuntimeDyld. Please see my inline comments for some pointers.

While support for JITEventListener was never implemented for MachO, it has been working well for ELF and should be supported design-wise (while ELF support is in progress).

@lhames Can you advise a good way to pass out ownership of the ObjectFile created in JITLinker::buildGraph() back to ObjectLinkingLayer? IIUC we could allow to query it by ID from there and create something like a LoadedObjectInfo independently.

llvm/trunk/lib/ExecutionEngine/JITLink/JITLink_MachO_x86_64.cpp
460

ObjectFile is instantiated here.

464

ObjectFile gets deleted here.

llvm/trunk/lib/ExecutionEngine/Orc/ObjectLinkingLayer.cpp
137

Notification function only provides the module key.

154

Materialization can still fail in notifyFinalized() and ownership of allocation is only passed back to ObjectLinkingLayer here.

346

jitLink() instantiates the JITLinker that owns the memory buffer and creates an ObjectFile. None of this is accessible to the outside during materialization.

Maybe not the most elegant solution, but this seems to work: https://reviews.llvm.org/D61065